Under the hood of the includes method in Rails

How the includes method works in Active Record is one of the most common interview questions, yet many developers are unaware of how it behaves. The method is interesting because it behaves differently depending on the situation.

This article is not a deep dive into ActiveRecord’s internals, but a straightforward explanation of how includes works and when we should use it.

The problem

We want to load a collection of records with some relations, but such an operation causes too many SQL queries to be performed, which takes a lot of time. We are building a blog application, and on the list of articles, we would like to put the number of comments for every article. The structure looks as follows:

class Article < ApplicationRecord
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :article
end

The database has two tables: articles, and comments, where each comment row points to its article through an article_id column.

A simple loop that prints each article with its comment count produces multiple SQL queries:

Article.all.each do |article|
  puts "#{article.title} - comments: #{article.comments.size}"
end


Such a situation is a perfect scenario for using includes to improve the performance of the code.
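To see why the number of queries grows with the number of articles, here is a minimal plain-Ruby sketch. It does not use Active Record at all: QueryCounter, its methods, and the sample rows are hypothetical stand-ins that only count how often the "database" is hit.

```ruby
# Hypothetical stand-in for a database connection: it serves rows from
# arrays and counts every simulated query. Not part of Active Record.
class QueryCounter
  attr_reader :count

  def initialize(articles, comments)
    @articles = articles
    @comments = comments
    @count = 0
  end

  # Simulates: SELECT * FROM articles
  def articles
    @count += 1
    @articles
  end

  # Simulates: SELECT COUNT(*) FROM comments WHERE article_id = ?
  def comments_count(article_id)
    @count += 1
    @comments.count { |comment| comment[:article_id] == article_id }
  end
end

# Made-up sample data: three articles, three comments.
sample_articles = [
  { id: 1, title: "First" },
  { id: 2, title: "Second" },
  { id: 3, title: "Third" }
]
sample_comments = [
  { id: 1, article_id: 1 },
  { id: 2, article_id: 1 },
  { id: 3, article_id: 3 }
]

db = QueryCounter.new(sample_articles, sample_comments)

# One query for the list, then one more per article: 1 + N in total.
lines = db.articles.map do |article|
  "#{article[:title]} - comments: #{db.comments_count(article[:id])}"
end

puts lines
puts "queries: #{db.count}"
```

With three articles the counter ends at four, and every additional article adds one more query.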

The solution

With includes added, the code looks as follows:

Article.includes(:comments).each do |article|
  puts "#{article.title} - comments: #{article.comments.size}"
end

Now two queries are performed instead of four: one that loads the articles and one that loads all of their comments.
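The two-query strategy can be sketched in plain Ruby. This is only an illustration of the idea, not Active Record's actual implementation: TinyDb, select_articles, select_comments, and the sample rows are hypothetical names and data made up for the example.

```ruby
# Hypothetical stand-in for a database: serves rows from arrays and
# counts every simulated query. Not part of Active Record.
class TinyDb
  attr_reader :query_count

  def initialize(articles, comments)
    @articles = articles
    @comments = comments
    @query_count = 0
  end

  # Simulates: SELECT * FROM articles
  def select_articles
    @query_count += 1
    @articles
  end

  # Simulates: SELECT * FROM comments WHERE article_id IN (...)
  def select_comments(article_ids)
    @query_count += 1
    @comments.select { |comment| article_ids.include?(comment[:article_id]) }
  end
end

db = TinyDb.new(
  [{ id: 1, title: "First" }, { id: 2, title: "Second" }, { id: 3, title: "Third" }],
  [{ id: 1, article_id: 1 }, { id: 2, article_id: 1 }, { id: 3, article_id: 3 }]
)

# Query 1: the primary records.
articles = db.select_articles

# Query 2: all associated records at once, grouped in memory by article.
comments_by_article = db
  .select_comments(articles.map { |article| article[:id] })
  .group_by { |comment| comment[:article_id] }

# Counting is now an in-memory lookup and needs no further queries.
report = articles.map do |article|
  size = comments_by_article.fetch(article[:id], []).size
  "#{article[:title]} - comments: #{size}"
end

puts report
puts "queries: #{db.query_count}"
```

The query count stays at two no matter how many articles are listed, because the comments are matched to their articles in memory.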

This is a high-level overview of the case where you can benefit from using includes. Now it’s time to explain why includes can behave differently in some cases.

Two faces of includes

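includes relies on two other methods: preload and eager_load. Each behaves differently, and includes chooses between them by asking whether the query also references the associated table, for example in a where condition. If it does, eager_load is used; otherwise preload is used.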

In the above case with articles and comments, we simply wanted to access associated records to get the number of comments per article. That’s why includes used preload. If we modify our query from:

Article.includes(:comments)

to

Article.includes(:comments).where(comments: { id: 1 }).references(:comments)

then a different query is produced, because eager_load is used instead of preload. So what’s the difference between preload and eager_load?

Preload versus eager load

As mentioned before, preload is used when we only want to access associated records. It performs two queries: one to load the primary records and a second to load the associated records.

Since two separate queries are performed, the query that loads articles cannot filter on columns of the associated table (comments):

Article.includes(:comments).where('comments.id != 1').map { |a| a.comments.size }
# => ActiveRecord::StatementInvalid (PG::UndefinedTable)

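We need to tell ActiveRecord that the query refers to another table, comments in our case; that’s why I used the references(:comments) part. It signals that we want to both load the associated records and filter the query by them, so two separate queries won’t work. Note that references is only required for string conditions like the one above; with a hash condition such as where(comments: { id: 1 }), ActiveRecord infers the reference on its own.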

When preload can’t be used, eager_load is used instead. It produces a single query with a LEFT OUTER JOIN that pulls only the articles matching the criteria, together with their matching comments.
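What a LEFT OUTER JOIN combined with a condition on the joined table returns can be sketched in plain Ruby. This is a simplified illustration rather than Active Record's implementation, and the sample rows are made up. It mimics the condition comments.id != 1 and shows two consequences: an article without comments drops out (in SQL, NULL != 1 is not true), and the remaining articles carry only the comments that matched.

```ruby
# Made-up sample data: article 2 has no comments.
articles = [
  { id: 1, title: "First" },
  { id: 2, title: "Second" },
  { id: 3, title: "Third" }
]
comments = [
  { id: 1, article_id: 1 },
  { id: 2, article_id: 1 },
  { id: 3, article_id: 3 }
]

# Simulates: articles LEFT OUTER JOIN comments
#            ON comments.article_id = articles.id
joined_rows = articles.flat_map do |article|
  matches = comments.select { |comment| comment[:article_id] == article[:id] }
  # An article without comments still yields one row, with nil in place
  # of the comment columns (NULL in SQL).
  matches.empty? ? [[article, nil]] : matches.map { |comment| [article, comment] }
end

# Simulates: WHERE comments.id != 1
# A row with a NULL comment does not satisfy the condition, so it is dropped.
filtered_rows = joined_rows.select do |_article, comment|
  comment && comment[:id] != 1
end

# Collapse the flat rows back into one entry per article.
result = filtered_rows.group_by { |article, _comment| article[:id] }.map do |_id, rows|
  { title: rows.first[0][:title], comment_ids: rows.map { |_article, comment| comment[:id] } }
end

p result
```

In this sketch the first article keeps only the comment with id 2, so a size computed from the loaded comments reflects the filtered set, not every comment the article has in the database.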

If you don’t want to use includes

If you want to make it explicit whether one query or two will be produced, you can call preload and eager_load directly instead of letting includes decide:

Article.includes(:comments).where('comments.id != 1').references(:comments)

# is the same as

Article.eager_load(:comments).where('comments.id != 1')

and the same for preload:

Article.includes(:comments)

# is the same as

Article.preload(:comments)