Under the hood of the includes method in Rails
How the includes method works in Active Record is one of the most common interview questions, yet many developers are unaware of how it actually behaves. The method is interesting because it behaves differently depending on the situation.
This article is not a deep dive into Active Record’s internals, but a straightforward explanation of how includes works and when we should use it.
The problem
We want to load a collection of records with some relations, but such an operation causes too many SQL queries to be performed, which takes a lot of time. We are building a blog application, and on the list of articles, we would like to put the number of comments for every article. The structure looks as follows:
class Article < ApplicationRecord
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :article
end
With the following database:
A simple loop that prints each article’s title and comment count produces multiple SQL queries:
Article.all.each do |article|
  puts "#{article.title} - comments: #{article.comments.size}"
end
Such a situation is a perfect scenario for using includes to improve the performance of the code.
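To see where the extra queries come from, here is a minimal plain-Ruby simulation of the loop above. It is not Active Record itself: `FakeDb`, `n_plus_one_report`, and the sample rows (three articles, three comments) are hypothetical and exist only to count the queries such a loop would trigger.

```ruby
# Plain-Ruby stand-in for a database that records every "query" it receives.
# FakeDb and its sample data are assumptions made for illustration.
class FakeDb
  attr_reader :queries

  def initialize
    @queries = []
    @articles = [
      { id: 1, title: "First" },
      { id: 2, title: "Second" },
      { id: 3, title: "Third" }
    ]
    @comments = [
      { id: 1, article_id: 1 },
      { id: 2, article_id: 1 },
      { id: 3, article_id: 2 }
    ]
  end

  # Stands in for: SELECT * FROM articles
  def all_articles
    @queries << "SELECT * FROM articles"
    @articles
  end

  # Stands in for: SELECT COUNT(*) FROM comments WHERE article_id = ?
  def count_comments_for(article_id)
    @queries << "SELECT COUNT(*) FROM comments WHERE article_id = #{article_id}"
    @comments.count { |comment| comment[:article_id] == article_id }
  end
end

# One query for the articles, then one more query per article.
def n_plus_one_report(db)
  db.all_articles.map do |article|
    "#{article[:title]} - comments: #{db.count_comments_for(article[:id])}"
  end
end

db = FakeDb.new
puts n_plus_one_report(db)
puts "queries issued: #{db.queries.size}"
```

With three articles in the sample data, the loop issues one query for the list plus one per article, so the query count grows with the number of records.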
The solution
With includes added, the code looks as follows:
Article.includes(:comments).each do |article|
  puts "#{article.title} - comments: #{article.comments.size}"
end
and two queries are performed instead of four:
It is a top-level overview of the case where you can benefit from using includes. Now it’s time to explain why includes can behave differently in some cases.
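The two-query behaviour can be sketched in plain Ruby as well. This is a simplified model rather than Active Record’s implementation: `PreloadDb`, `preload_report`, and the sample rows are hypothetical. The idea is to load all articles once, load all their comments in a second query, and then match them up in memory.

```ruby
# Plain-Ruby stand-in for a database that records every "query" it receives.
# PreloadDb and its sample data are assumptions made for illustration.
class PreloadDb
  ARTICLES = [
    { id: 1, title: "First" },
    { id: 2, title: "Second" },
    { id: 3, title: "Third" }
  ].freeze

  COMMENTS = [
    { id: 1, article_id: 1 },
    { id: 2, article_id: 1 },
    { id: 3, article_id: 2 }
  ].freeze

  attr_reader :queries

  def initialize
    @queries = []
  end

  # Stands in for: SELECT * FROM articles
  def all_articles
    @queries << "SELECT * FROM articles"
    ARTICLES
  end

  # Stands in for: SELECT * FROM comments WHERE article_id IN (...)
  def comments_for(article_ids)
    @queries << "SELECT * FROM comments WHERE article_id IN (#{article_ids.join(', ')})"
    COMMENTS.select { |comment| article_ids.include?(comment[:article_id]) }
  end
end

# Two queries in total, regardless of how many articles there are.
def preload_report(db)
  articles = db.all_articles
  comments = db.comments_for(articles.map { |article| article[:id] })
  by_article = comments.group_by { |comment| comment[:article_id] }

  articles.map do |article|
    # Counting happens in memory, so no further query is needed.
    count = by_article.fetch(article[:id], []).size
    "#{article[:title]} - comments: #{count}"
  end
end

db = PreloadDb.new
puts preload_report(db)
puts "queries issued: #{db.queries.size}"
```

Because the comments are grouped in memory after the second query, adding more articles does not add more queries.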
Two faces of includes
There are two methods used by includes: preload and eager_load. Each of them behaves differently, and includes chooses which one to use by answering those questions:
In the above case with articles and comments, we simply wanted to access associated records to get the number of comments per article. That’s why includes used preload. If we modify our query from:
Article.includes(:comments)
to
Article.includes(:comments).where(comments: { id: 1 }).references(:comments)
then a different query is produced, because eager_load is used instead of preload. So what's the difference between preload and eager load?
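The choice described above can be modelled roughly as follows. This is only a simplified sketch of the decision, not Active Record’s actual code: the helper name `loading_strategy` and its keyword arguments are hypothetical, and it assumes the only thing that matters is whether the query refers to the table of an included association.

```ruby
# Hypothetical helper: picks a loading strategy from the included
# associations and the tables the rest of the query refers to.
def loading_strategy(included:, referenced_tables: [])
  return :none if included.empty?

  # If the query filters on an included table, the data must come back
  # in a single joined query, so separate preload queries are not enough.
  needs_join = included.any? { |association| referenced_tables.include?(association) }
  needs_join ? :eager_load : :preload
end

# Mirrors Article.includes(:comments)
puts loading_strategy(included: [:comments])

# Mirrors Article.includes(:comments).where(...).references(:comments)
puts loading_strategy(included: [:comments], referenced_tables: [:comments])
```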
Preload versus eager load
As mentioned before, preload is used when we only want to access associated records, and it performs two queries: one to load the primary records and a second to load the associated records.
Since two separate queries are performed, it’s impossible to filter the primary records by conditions on the associated table (comments):
Article.includes(:comments).where('comments.id != 1').map { |a| a.comments.size }
# => ActiveRecord::StatementInvalid (PG::UndefinedTable)
We need to tell Active Record that we want to refer to another table, which is comments in our case; that’s why I used the references(:comments) part. I simply told Active Record that I want to access associated records and filter the query using them, so it can’t perform two separate queries.
When preload can’t be used, eager_load is used instead. It produces a single query with a left outer join that pulls only those articles that match the criteria, together with their associated comments.
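The single-query behaviour can also be imitated in plain Ruby. Again, this is a sketch under assumptions, not Active Record’s implementation: `EAGER_ARTICLES`, `EAGER_COMMENTS`, `left_outer_join_rows`, and `eager_load_where_comment_id_not` are hypothetical names. The sketch builds the joined rows, applies the filter on the comments side, and then collapses the rows back into articles.

```ruby
# Sample data; an assumption made for illustration.
EAGER_ARTICLES = [
  { id: 1, title: "First" },
  { id: 2, title: "Second" },
  { id: 3, title: "Third" }
].freeze

EAGER_COMMENTS = [
  { id: 1, article_id: 1 },
  { id: 2, article_id: 1 },
  { id: 3, article_id: 2 }
].freeze

# Stands in for:
#   SELECT ... FROM articles
#   LEFT OUTER JOIN comments ON comments.article_id = articles.id
# An article without comments still yields one row, with a nil comment.
def left_outer_join_rows
  EAGER_ARTICLES.flat_map do |article|
    matches = EAGER_COMMENTS.select { |comment| comment[:article_id] == article[:id] }
    matches = [nil] if matches.empty?
    matches.map { |comment| { article: article, comment: comment } }
  end
end

# Stands in for adding: WHERE comments.id != excluded_id
def eager_load_where_comment_id_not(excluded_id)
  rows = left_outer_join_rows.select do |row|
    # In SQL, NULL != 1 is not true, so rows without a comment drop out.
    row[:comment] && row[:comment][:id] != excluded_id
  end

  # Collapse the joined rows back into one entry per article.
  rows.group_by { |row| row[:article] }.map do |article, group|
    { title: article[:title], comment_ids: group.map { |row| row[:comment][:id] } }
  end
end

eager_load_where_comment_id_not(1).each do |article|
  puts "#{article[:title]} - comments: #{article[:comment_ids].size}"
end
```

Note that in this sketch the article without any comments disappears from the result, because the condition on the comments table is never true for its row.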
If you don’t want to use includes
If you want to make it explicit when two queries are produced and when only one, you can call preload and eager_load directly instead of letting includes decide:
Article.includes(:comments).where('comments.id != 1').references(:comments)
# is the same as
Article.eager_load(:comments).where('comments.id != 1')
and the same for preload:
Article.includes(:comments)
# is the same as
Article.preload(:comments)