
ActiveRecord::Core "#find" now reuses "#find_by" cache key

 2 years ago
source link: https://blog.saeloun.com/2022/02/09/rails-prevent-duplicates-in-find_by-cache

When you query with #find or #find_by, Rails caches the statement it builds for that query. Later calls reuse the cached statement instead of rebuilding the query each time. Each method is responsible for generating its own cache key and storing its entry in the cache, which leads to some irregularities.

Before

One small oversight was that #find and #find_by(id: ...) used different cache keys. Both queries return the same result, but they stored their entries under separate cache keys.

Let’s look into how ActiveRecord Core works:

def find(*ids) # :nodoc:
  # We don't have cache keys for this stuff yet
  return super unless ids.length == 1
  return super if block_given? ||
                  primary_key.nil? ||
                  scope_attributes? ||
                  columns_hash.key?(inheritance_column) && !base_class?

  id = ids.first

  return super if StatementCache.unsupported_value?(id)

  key = primary_key

  statement = cached_find_by_statement(key) { |params|
    where(key => params.bind).limit(1)
  }

  record = statement.execute([id], connection)&.first
  unless record
    raise RecordNotFound.new("Couldn't find #{name} with '#{key}'=#{id}", name, key, id)
  end
  record
end

We can see here that the cache key is just primary_key (which in most cases is "id").

Let’s go through the #find_by method that accepts a hash of attributes.

def find_by(*args) # :nodoc:
  return super if scope_attributes? || reflect_on_all_aggregations.any? ||
                  columns_hash.key?(inheritance_column) && !base_class?

  hash = args.first

  return super if !(Hash === hash) || hash.values.any? { |v|
    StatementCache.unsupported_value?(v)
  }

  return super unless hash.keys.all? { |k| columns_hash.has_key?(k.to_s) }

  keys = hash.keys

  statement = cached_find_by_statement(keys) { |params|
    wheres = keys.each_with_object({}) { |param, o|
      o[param] = params.bind
    }
    where(wheres).limit(1)
  }
  begin
    statement.execute(hash.values, connection)&.first
  rescue TypeError
    raise ActiveRecord::StatementInvalid
  end
end

The cache key here is set to hash.keys, which returns an array of the columns that #find_by searches on.

This is where the duplication arises: #find uses the cache key "id", while #find_by(id: ...) uses the cache key ["id"].
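The mismatch is easy to reproduce outside Rails. The following is a minimal sketch in plain Ruby, not ActiveRecord's implementation: ToyStatementCache and its fetch method are hypothetical stand-ins for the statement cache, the SQL strings are illustrative, and column names are assumed to be strings.

```ruby
# Toy stand-in for a statement cache (hypothetical, not ActiveRecord's class).
# It memoizes whatever the block builds under the given key.
class ToyStatementCache
  attr_reader :builds

  def initialize
    @store = {}
    @builds = 0 # how many times a statement had to be built
  end

  def fetch(key)
    @store.fetch(key) do
      @builds += 1
      @store[key] = yield
    end
  end

  def size
    @store.size
  end
end

cache = ToyStatementCache.new

# Old #find: the key is the bare primary key string.
cache.fetch("id") { "SELECT * FROM users WHERE id = ? LIMIT 1" }

# #find_by(id: ...): the key is an array of column names.
cache.fetch(["id"]) { "SELECT * FROM users WHERE id = ? LIMIT 1" }

# "id" and ["id"] are different Hash keys, so the same statement is built twice.
puts cache.size   # 2
puts cache.builds # 2
```

Because a String and an Array never compare equal as Hash keys, the two lookups can never hit the same entry, even though the statement behind them is identical.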

After

ActiveRecord::Core#find now reuses the #find_by cache key, so both queries share the same cache entry.

Query                          Cache Key
find(123)                      ["id"]
find_by(id: 123)               ["id"]
find_by(id: 123, foo: true)    ["id", "foo"]

It was a simple fix to the #find method, which now wraps primary_key in an array.

def find(*ids) # :nodoc:
  # We don't have cache keys for this stuff yet
  return super unless ids.length == 1
  return super if block_given? || primary_key.nil? || scope_attributes?

  id = ids.first

  return super if StatementCache.unsupported_value?(id)

  cached_find_by([primary_key], [id]) ||
    raise(RecordNotFound.new("Couldn't find #{name} with '#{primary_key}'=#{id}", name, primary_key, id))
end
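To see the effect of wrapping the primary key in an array, here is a small plain-Ruby sketch. It only imitates the shape of the fix: ToyModel, its in-memory ROWS, and this version of cached_find_by are hypothetical and do not reflect the real ActiveRecord internals. It also assumes column names are normalized to strings before they are used as a key.

```ruby
# Hypothetical model imitating the shape of the fix; not ActiveRecord code.
class ToyModel
  RecordNotFound = Class.new(StandardError)
  ROWS = [{ "id" => 1, "name" => "Ada" }, { "id" => 2, "name" => "Grace" }].freeze

  class << self
    def statement_cache
      @statement_cache ||= {}
    end

    def primary_key
      "id"
    end

    # Both finders go through this method, so they share cache keys.
    def cached_find_by(keys, values)
      matcher = statement_cache[keys] ||= ->(vals) {
        ROWS.find { |row| keys.zip(vals).all? { |k, v| row[k] == v } }
      }
      matcher.call(values)
    end

    def find(id)
      cached_find_by([primary_key], [id]) ||
        raise(RecordNotFound, "Couldn't find ToyModel with '#{primary_key}'=#{id}")
    end

    def find_by(hash)
      # Assumption: column names are normalized to strings before keying.
      cached_find_by(hash.keys.map(&:to_s), hash.values)
    end
  end
end

ToyModel.find(1)
ToyModel.find_by(id: 1)
puts ToyModel.statement_cache.keys.inspect # [["id"]]

ToyModel.find_by(id: 2, name: "Grace")
puts ToyModel.statement_cache.keys.inspect # [["id"], ["id", "name"]]
```

In this sketch, find and find_by(id: ...) end up with a single shared entry, and only a lookup on a different set of columns adds a second one.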

Minor tweaks to core libraries can lead to huge benefits across applications!

