11

How ActiveRecord Uses Caching To Avoid Unnecessary Trips To The Database

 3 years ago
source link: https://www.honeybadger.io/blog/rails-activerecord-caching/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client
How ActiveRecord Uses Caching To Avoid Unnecessary Trips To The Database

A general way to describe caching is storing the result of some code so that we can quickly retrieve it later. In some cases, this means storing a computed value to avoid needing to recompute it later. However, we can also cache data by simply keeping it in memory, without performing any computations, to avoid having to read from a hard drive or perform a network request.

This latter form is particularly relevant for ActiveRecord, where the database often runs on a separate server. Thus, all requests incur network-traffic overhead, not to mention the load placed on the database server when the query is performed again.

Fortunately, for Rails developers, ActiveRecord itself already handles a lot of this for us, perhaps without us even being conscious of it. This is nice for productivity, but sometimes, it's important to know what is being cached behind-the-scenes. For example, when you know (or expect) a value is being changed by another process, or you absolutely must have the most up-to-date value. In cases like these, ActiveRecord provides a couple of 'escape hatches' to force an uncached read of the data.

ActiveRecord's Lazy Evaluation

ActiveRecord's lazy evaluation is not caching per se, but we will be encountering it in code examples later on, so we'll provide a brief overview. When you construct an ActiveRecord query, in many cases, the code does not issue an immediate call to the database. This is what allows us to chain multiple .where clauses without having to hit the database each time:

@posts = Post.where(published: true)
# no DB hit yet
@posts = @posts.where(publied_at: Date.today)
# still nothing
@posts.count
# SELECT COUNT(*) FROM "posts" WHERE...

There are some exceptions to this. For example, when using .find, .find_by, .pluck, .to_a, or .first, it is impossible to chain additional clauses. In most of the examples below, I will be using .to_a as a simple way to force a DB call.

Note that if you are experimenting with this in a Rails console, you will need to turn off 'echo' mode. Otherwise, the console (either irb or pry) calls .inspect on the object once you hit 'enter', which forces a DB query. To disable echo mode, you can use the following code:

conf.echo = false # for irb
pry_instance.config.print = proc {} # for pry

ActiveRecord Relations

The first part of ActiveRecord's built-in caching we'll look at is relations. For example, we have a typical User-Posts relationship:

# app/models/user.rb
class User < ApplicationRecord
  has_many :posts
end

# app/models/post.rb
class Post < ApplicationRecord
  belongs_to :user
end

This gives us the handy user.posts and post.user methods to perform a database query to find the related record(s). Let's say we're using these in a controller and view:

# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def index
    @user = User.find(params[:user_id])
    @posts = @user.posts
  end
...

# app/views/posts/index.html.erb
...
<%= render 'shared/sidebar' %>
<% @posts.each do |post| %>
  <%= render post %>
<% end %>

# app/views/shared/_sidebar.html.erb
...
<% @posts.each do |post| %>
  <li><%= post.title %></li>
<% end %>

We have a basic index action that grabs @user.posts. Similar to the previous section, the database query has not been run at this point. Rails then renders our index view, which, in turn, renders the sidebar. The sidebar calls @posts.each ..., and at this point, ActiveRecord fires off the database query to get the data.

We then return to the rest of our index template, where we have another @posts.each; however, this time, there is no database call. What's happening is that ActiveRecord is caching all these posts for us and does not bother trying to read from the database again.

Escape Hatch

There are times when we may want to force ActiveRecord to fetch the associated records again; perhaps, we know it is being changed by another process (a background job, for example). Another common situation is in automated tests where we want to get the latest value in the database to validate that the code has updated it correctly.

There are two common ways to do this, depending on the situation. I think the most common way is simply to call .reload on the association, which tells ActiveRecord that we want to ignore whatever it has cached and get the latest version from the database:

@user = User.find(1)
@user.posts # DB Call
@user.posts # Cached, no DB call
@user.posts.reload # DB call
@user.posts # Cached new version, no DB call

Another option is to simply get a new instance of the ActiveRecord model (e.g., by calling find again):

@user = User.find(1)
@user.posts # DB Call
@user.posts # Cached, no DB call
@user = User.find(1) # @user is now a new instance of User
@user.posts # DB Call, no cache in this instance

Caching relationships is good, but we often end up with complicated .where(...) queries beyond simple relationship lookups. This is where ActiveRecord's SQL cache comes in.

ActiveRecord's SQL Cache

ActiveRecord keeps an internal cache of queries it has performed to speed up performance. Note, however, that this cache is tied to the particular action; it is created at the start of the action and destroyed at the end of the action. This means you will only see this if you a performing the same query twice within one controller action. It also means the cache is not used in the Rails console. Cache hits are shown in the Rails log with a CACHE. For example,

class PostsController < ApplicationController
  def index
    ...
    Post.all.to_a # to_a to force DB query

    ...
    Post.all.to_a # to_a to force DB query

produces the following log output:

  Post Load (2.1ms)  SELECT "posts".* FROM "posts"
  ↳ app/controllers/posts_controller.rb:11:in `index'
  CACHE Post Load (0.0ms)  SELECT "posts".* FROM "posts"
  ↳ app/controllers/posts_controller.rb:13:in `index'

You can actually take a peek at what's inside the cache for an action by printing out ActiveRecord::Base.connection.query_cache (or ActiveRecord::Base.connection.query_cache.keys for just the SQL query).

Escape Hatch

There are probably not many reasons you would need to bypass the SQL Cache, but nevertheless, you can force ActiveRecord to bypass its SQL cache by using the uncached method on ActiveRecord::Base:

class PostsController < ApplicationController
  def index
    ...
    Post.all.to_a # to_a to force DB query

    ...
    ActiveRecord::Base.uncached do
      Post.all.to_a # to_a to force DB query
    end

As it's a method on ActiveRecord::Base, you could also call it via one of your model classes if it improves readability; for example,

  Post.uncached do
    Post.all.to_a
  end

Counter Cache

It's pretty common in web applications to want to count the records in a relationship (e.g., a user has X posts or a team account has Y users). Due to how common it is, ActiveRecord includes a way to automatically keep a counter up-to-date so that you don't have a bunch of .count calls using up database resources. It only takes a couple of steps to enable it. First, we add counter_cache to the relationship so that ActiveRecord knows to cache the count for us:

class Post < ApplicationRecord
  belongs_to :user, counter_cache: true
end

We also need to add a new column to User, where the count will be stored. In our example, this will be User.posts_count. You can pass a symbol to counter_cache to specify the column name if needed.

rails generate migration AddPostsCountToUsers posts_count:integer
rails db:migrate

The counters will now be set to 0 (the default). If your application already has some posts, you'll need to update them. ActiveRecord provides a reset_counters method to handle the nitty-gritty details, so you just need to pass it IDs and tell it which counter to update:

User.all.each do |user|
  User.reset_counters(user.id, :posts)
end

Finally, we have to check the places where this count is being used. This is because calling .count will bypass the counter and will always run a COUNT() SQL query. Instead, we can use .size, which knows to use the counter cache if it exists. As an aside, you may want to default to using .size everywhere, as it also doesn't reload associations if they are already present, potentially saving a trip to the database.

Conclusion

For the most part, ActiveRecord's internal caching "just works". I can't say I've seen many cases that need to bypass it, but as with all things, knowing what goes on "under-the-hood" can save you some time and agony when you stumble into a situation that requires something out of the ordinary.

Of course, the database is not the only place where Rails is doing some behind-the-scenes caching for us. The HTTP specification includes headers that can be sent between the client and server to avoid having to re-send data that hasn't changed. In the next article in this series on caching, we'll take a look at the 304 (Not Modified) HTTP status code, how Rails handles it for you, and how you can tweak this handling.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK