Avoiding ActiveRecord::PreparedStatementCacheExpired Errors

Adding columns to a database table without a default value or non-null constraint is generally considered a safe schema change — it won’t lock the database and is backwards compatible with code on deploy and won’t require taking the servers down for a maintenance window. However, we’ve noticed it is not completely zero downtime for us and occasionally run into a spike of ActiveRecord::PreparedStatementCacheExpired errors during deployment that resolve afterwards.

Digging into the error, we’ve discovered that it has to deal with our setup using a Postgres DB with prepared statements enabled in Rails. Sam Davies wrote an excellent post with a lot of background on the root of the error as well as how Rails handles it.

To summarize what we learned (original post here):

Rails is configured to use prepared statements for efficiency — most Rails apps use the same set of SQL queries over and over again, and using prepared statements lets the database cache the compiled plan for queries and reuse them to reduce the overhead on repeated queries.
The prepared statements become invalidated if the return result changes — for instance if you cache a SELECT * and then add columns on the table you select from
In this case, you have to deallocate the prepared statement and retry:
If this happens outside of a transaction, Rails will handle deallocation of the prepared statement and retry
If this happens inside of a transaction, Rails will handle deallocation of the prepared statement and raises ActiveRecord::PreparedStatementCacheExpired (this is what was happening to us)
At a framework level, Rails could not retry transactions, because it’s possible for users to write code with side effects such as external API calls, email sends, etc… inside of the transaction that could have unintended effects if repeated, but it was possible to check for transactions that only touched the DB and add a catch/retry there and transparently fix the issue.

The post also included this snippet of code that we could use to retry the errors:

# Make all transactions for all records automatically retriable in the event of
# cache failure
class ApplicationRecord
  class << self
    # Retry automatically on ActiveRecord::PreparedStatementCacheExpired.
    #
    # Do not use this for transactions with side-effects unless it is acceptable
    # for these side-effects to occasionally happen twice
    def transaction(*args, &block)
      retried ||= false
      super
    rescue ActiveRecord::PreparedStatementCacheExpired
      if retried
        raise
      else
        retried = true
        retry
      end
    end
  end
end

Armed with this knowledge, we thought we were well on our way to fixing this problem. However, we quickly discovered that it would be a huge task given our large monolithic codebase. There were transactions in a lot of different places and many transactions ran code owned by other teams that we could need their sign off on to verify the safety of retrying. We would have to dig through a lot of old code and coordinate across many teams to employ this solution and unfortunately our level of tech debt made it very costly — we wanted to get rid of this blip of downtime on our deploys but it was worth seeking out alternatives.

Looking back at our learning about the cause of the statement cache error, a second approach became apparent — we would not have to deal with ActiveRecord::PreparedStatementCacheExpired errors in the first place if our prepared statements didn’t get invalidated. “SELECT * from foo” will return different results after a schema change and become invalidated, but “SELECT bar FROM foo” will stay valid no matter how many columns we add to table foo (but it will become invalid if we remove column bar), all we had to do was get Rails to issue SELECT statements with a list of columns instead of *.

Luckily, we’ve noticed such queries before in our Rails logs when loading models that had ignored columns. Digging into the code confirmed that Rails will enumerate columns in a SELECT statement if there are any ignored columns:

  def build_select(arel)
    if select_values.any?
      arel.project(*arel_columns(select_values.uniq))
    elsif klass.ignored_columns.any?
      arel.project(*klass.column_names.map { |field| arel_attribute(field) })
    else
      arel.project(table[Arel.star])
    end
  end

Which lead to this initial solution we applied to some of our models that are most actively modified

class Model < ApplicationRecord
  self.ignored_columns = [:__fake_column__]
end

Which solved our problem, no transaction retries necessary, and could be done model by model, which aligned better with our team structure than going transaction by transaction. One risk with having columns listed in select queries is that ActiveRecord generates these queries using a cached list of columns generated on app start; if someone removes a column from the database in an unsafe way, our app would start erroring on reads for the model (there would only be errors on writes without our modification). However, that scenario is an exception that we already have guards against, we were ok with having the risk.

Hope this is useful to others! And curious to hear alternative approaches.

Avoiding ActiveRecord::PreparedStatementCacheExpired Errors

Avoiding ActiveRecord::PreparedStatementCacheExpired Errors

Recommend

What It’s Like to Work as an Infrastructure Software Engineer (at Flexport)

Sending Threaded Emails Using Rails ActionMailer

Happy 10th birthday to Everyday Rails!

5 great newsletters for the productive Rails developer

Work around flaky test failures in Rails with rspec-retry

Rails architecture tips from route helpers

Rails application templates made even easier with Rails Bytes

2 ways to test Rails log messages with RSpec

Start with what you know

Modern command line alternatives

About Joyk