Counting things in Active Record
Seemingly it may seem that counting records in the database with ActiveRecord is straightforward. In reality, it looks more complex as .count
, .size
, and .length
methods are not working the same way. The lack of knowledge about their behavior may lead to performance problems.
The count
method
Ruby implements its own count
method, which counts the number of elements in the given array. However, ActiveRecord::Relation
class implements its own implementation of the count
method. Calling it on the relation will perform the COUNT()
query:
User.all.count
# (4.8ms) SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000
The result of such call is not memoized so that it will perform the COUNT()
query every single time we would call it:
users = User.all
users.count
# (4.8ms) SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000
users.count
# (4.3ms) SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000
As I mentioned before, Ruby implements its own count
method, so if we would like to use it on the relation, we have to convert the relation first to the array and call the count
on it:
User.all.to_a.count
# User Load (18.8ms) SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000
Such call is almost four times slower than the count
called on the relation, so beware of that.
The size
method
Like the count
method, the size
method has its implementation in both Ruby and Active Record. This time the usage with ActiveRecord::Relation
is a little bit more tricky because it behaves differently depending on the case.
Not loaded records
When your records are not yet loaded, the size
method performs the COUNT()
query on the records:
users = User.all; nil
users.loaded? # => nil
users.size
# (4.8ms) SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000
Loaded records
In the opposite case, when your records are already loaded, the size
method behaves like the standard Ruby implementation and counts the objects without performing the query:
users = User.all; nil
users.loaded? # => nil
users.load
users.loaded? # => true
users.size
# => 20000
Records from association with counter cache
There is a case where the size
method called on not loaded records won't perform the COUNT()
query on the database. It happens when your association has counter_cache option defined:
class Address < ActiveRecord::Base
belongs_to :user, counter_cache: true
end
class User < ActiveRecord::Base
has_many :addresses
end
The User
model should also have the addresses_count
column defined, which holds the number of addresses records associated with the given user. Now, when the size
method is called on the association that it is not loaded, the COUNT()
query is not performed:
user = User.last
user.addresses.size
# => 5
Without the counter_cache
option the query will be performed:
user = User.last
user.addresses.size
# (0.3ms) SELECT COUNT(*) FROM `addresses` WHERE `addresses`.`user_id` = 14
# => 5
The length
method
ActiveRecord::Relation
does not implement own length
method so results are always converted to the array and then counted:
users = User.all; nil
users.loaded? # => nil
users.length
# User Load (18.7ms) SELECT `users`.* FROM `users` WHERE `templates`.`deleted_at` IS NULL
# => 20000
users.loaded? # => true
users.length
# => 20000
It's not recommended to call length
on the records that are not yet loaded as it will load all objects into the memory.
Checking if we have any records to count
What if we don't want to count records but verify if there are any records at all? We should also be careful here. Most likely, we have a few ways to perform this action:
- Using
any?
to verify if any of the elements in the collection is present - Using
empty?
to verify if the collection does not contain any element - Using
present?
to verify if the collection's elements are present - Using
.size == 0
,.count == 0
and.length == 0
expressions
Regarding the last option with size
, count
, and length
, we are already aware of the consequences of using them. How about the other three options?
Using any?
The any?
method checks every element of the array, and if at least one evaluates to true
then it returns true
as a result:
elements = [nil, false, true]
elements.any?
# => true
elements = [nil, false, false]
elements.any?
# => false
In terms of operating on the ActiveRecord::Relation
object, .any?
behaves the same way as .size
does. It performs the COUNT()
query when records are not loaded, it operates on the collection when records are loaded, and it does not serve the COUNT()
query when the relation is using the counter_cache
option.
users = User.all; nil
users.loaded?
# => nil
users.any?
# (4.8ms) SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => true
users.load
users.loaded?
# => true
users.any?
# => true
Using empty?
In dealing with the ActiveRecord::Relation
object, the empty?
method behaves the same way as any?
does. When records are not yet loaded, it performs the COUNT()
query; otherwise, it verifies if the collection contains at least one element.
The main difference between any?
and empty?
is that the first method is looking for at least one element that evaluates to true
while empty?
does not care about the collection's values. It only verifies if the collection has elements:
[nil, false, false].any?
# => false
[nil, false, false].empty?
# => false
Using present?
The usage of the present?
method is not recommended unless you are sure that the records are loaded. When the records are not loaded, similar to the length
method, the present?
method loads all objects into the memory and then check presence:
users = User.all; nil
users.loaded?
# => nil
users.present?
# User Load (19.2ms) SELECT `users`.* FROM `users` WHERE `templates`.`deleted_at` IS NULL
# => true
users.loaded?
# => true
users.present?
# => true
The most optimal choice
Considering all of those cases we discussed above, in most cases, the best choice to count things is to use the .size
method and verify if there are any elements in the collection to use the.empty?
or .any?
method.
The only exception for the above choice is when your records are not loaded yet, but you will load them anyway. The most common example is a simple view where before rendering records, we check if we have anything to render:
<% if @users.any? %>
<h1>List of users</h1>
<% @users.each do |user| %>
...
<% end %>
<% end %>
In the above case, we will perform the COUNT()
query and then another query to load the records, which we would like to avoid. Instead of using @users.any?
, we can use @users.present?
as this will load the records into the memory. The .present?
method uses the .empty?
method under the hood, so it's still an optimal choice.
Bonus: the performance comparsion of .count
, .length
and .size
In pure Ruby, all of those three methods are counting the elements in the array. The question is, which one is the fastest? Let's check it out. To test the performance, we will use two arrays: one array with 1k number elements and 1k elements representing simple object instance. We will check the number of elements in each of those arrays 50k times.
Benchmarks code:
# test data
array1 = Array.new(1_000) { rand(1..9) }
class User; end
array2 = Array.new(1_000) { User.new }
# benchmarks
require 'benchmark'
n = 50_000
# array of numbers
Benchmark.bm do |benchmark|
benchmark.report(".size") do
n.times { array1.size }
end
benchmark.report(".count") do
n.times { array1.count }
end
benchmark.report(".length") do
n.times { array1.length }
end
end
# array of objects
Benchmark.bm do |benchmark|
benchmark.report(".size") do
n.times { array2.size }
end
benchmark.report(".count") do
n.times { array2.count }
end
benchmark.report(".length") do
n.times { array2.length }
end
end
The results:
.size
- needed0.002902
to count numbers and0.002844
to count objects.count
- needed0.003553
to count numbers and0.003531
to count objects.length
- needed0.002339
to count numbers and0.002703
to count objects
The fastest method is .length
, and the slowest is .count