Counting things in Active Record

Seemingly it may seem that counting records in the database with ActiveRecord is straightforward. In reality, it looks more complex as .count, .size, and .length methods are not working the same way. The lack of knowledge about their behavior may lead to performance problems.

The count method

Ruby implements its own count method, which counts the number of elements in the given array. However, ActiveRecord::Relation class implements its own implementation of the count method. Calling it on the relation will perform the COUNT() query:

User.all.count
# (4.8ms)  SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000

The result of such call is not memoized so that it will perform the COUNT() query every single time we would call it:

users = User.all

users.count
# (4.8ms)  SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000

users.count
# (4.3ms)  SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000

As I mentioned before, Ruby implements its own count method, so if we would like to use it on the relation, we have to convert the relation first to the array and call the count on it:

User.all.to_a.count
# User Load (18.8ms)  SELECT `users`.* FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000

Such call is almost four times slower than the count called on the relation, so beware of that.

The size method

Like the count method, the size method has its implementation in both Ruby and Active Record. This time the usage with ActiveRecord::Relation is a little bit more tricky because it behaves differently depending on the case.

Not loaded records

When your records are not yet loaded, the size method performs the COUNT() query on the records:

users = User.all; nil
users.loaded? # => nil
users.size
# (4.8ms)  SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => 20000

Loaded records

In the opposite case, when your records are already loaded, the size method behaves like the standard Ruby implementation and counts the objects without performing the query:

users = User.all; nil
users.loaded? # => nil
users.load
users.loaded? # => true
users.size
# => 20000

Records from association with counter cache

There is a case where the size method called on not loaded records won't perform the COUNT() query on the database. It happens when your association has counter_cache option defined:

class Address < ActiveRecord::Base
  belongs_to :user, counter_cache: true
end

class User < ActiveRecord::Base
  has_many :addresses
end

The User model should also have the addresses_count column defined, which holds the number of addresses records associated with the given user. Now, when the size method is called on the association that it is not loaded, the COUNT() query is not performed:

user = User.last
user.addresses.size
# => 5

Without the counter_cache option the query will be performed:

user = User.last
user.addresses.size
# (0.3ms)  SELECT COUNT(*) FROM `addresses` WHERE `addresses`.`user_id` = 14
# => 5

The length method

ActiveRecord::Relation does not implement own length method so results are always converted to the array and then counted:

users = User.all; nil
users.loaded? # => nil
users.length
# User Load (18.7ms)  SELECT `users`.* FROM `users` WHERE `templates`.`deleted_at` IS NULL
# => 20000

users.loaded? # => true
users.length
# => 20000

It's not recommended to call length on the records that are not yet loaded as it will load all objects into the memory.

Checking if we have any records to count

What if we don't want to count records but verify if there are any records at all? We should also be careful here. Most likely, we have a few ways to perform this action:

  • Using any? to verify if any of the elements in the collection is present
  • Using empty? to verify if the collection does not contain any element
  • Using present? to verify if the collection's elements are present
  • Using .size == 0, .count == 0 and .length == 0 expressions

Regarding the last option with size, count, and length, we are already aware of the consequences of using them. How about the other three options?

Using any?

The any? method checks every element of the array, and if at least one evaluates to true then it returns true as a result:

elements = [nil, false, true]
elements.any?
# => true

elements = [nil, false, false]
elements.any?
# => false

In terms of operating on the ActiveRecord::Relation object, .any? behaves the same way as .size does. It performs the COUNT() query when records are not loaded, it operates on the collection when records are loaded, and it does not serve the COUNT() query when the relation is using the counter_cache option.

users = User.all; nil
users.loaded?
# => nil
users.any?
# (4.8ms)  SELECT COUNT(*) FROM `users` WHERE `users`.`deleted_at` IS NULL
# => true

users.load
users.loaded?
# => true
users.any?
# => true

Using empty?

In dealing with the ActiveRecord::Relation object, the empty? method behaves the same way as any? does. When records are not yet loaded, it performs the COUNT() query; otherwise, it verifies if the collection contains at least one element.

The main difference between any? and empty? is that the first method is looking for at least one element that evaluates to true while empty? does not care about the collection's values. It only verifies if the collection has elements:

[nil, false, false].any?
# => false

[nil, false, false].empty?
# => false

Using present?

The usage of the present? method is not recommended unless you are sure that the records are loaded. When the records are not loaded, similar to the length method, the present? method loads all objects into the memory and then check presence:

users = User.all; nil
users.loaded?
# => nil
users.present?
# User Load (19.2ms)  SELECT `users`.* FROM `users` WHERE `templates`.`deleted_at` IS NULL
# => true

users.loaded?
# => true
users.present?
# => true

The most optimal choice

Considering all of those cases we discussed above, in most cases, the best choice to count things is to use the .size method and verify if there are any elements in the collection to use the.empty? or .any? method.

The only exception for the above choice is when your records are not loaded yet, but you will load them anyway. The most common example is a simple view where before rendering records, we check if we have anything to render:

<% if @users.any? %>
  <h1>List of users</h1>
  <% @users.each do |user| %>
    ...
  <% end %>
<% end %>

In the above case, we will perform the COUNT() query and then another query to load the records, which we would like to avoid. Instead of using @users.any?, we can use @users.present? as this will load the records into the memory. The .present? method uses the .empty? method under the hood, so it's still an optimal choice.

Bonus: the performance comparsion of .count, .length and .size

In pure Ruby, all of those three methods are counting the elements in the array. The question is, which one is the fastest? Let's check it out. To test the performance, we will use two arrays: one array with 1k number elements and 1k elements representing simple object instance. We will check the number of elements in each of those arrays 50k times.

Benchmarks code:

# test data
array1 = Array.new(1_000) { rand(1..9) }

class User; end

array2 = Array.new(1_000) { User.new }

# benchmarks
require 'benchmark'
n = 50_000

# array of numbers
Benchmark.bm do |benchmark|
  benchmark.report(".size") do
    n.times { array1.size }
  end

  benchmark.report(".count") do
    n.times { array1.count }
  end

  benchmark.report(".length") do
    n.times { array1.length }
  end
end

# array of objects
Benchmark.bm do |benchmark|
  benchmark.report(".size") do
    n.times { array2.size }
  end

  benchmark.report(".count") do
    n.times { array2.count }
  end

  benchmark.report(".length") do
    n.times { array2.length }
  end
end

The results:

  • .size - needed 0.002902 to count numbers and 0.002844 to count objects
  • .count - needed 0.003553 to count numbers and 0.003531 to count objects
  • .length - needed 0.002339 to count numbers and 0.002703 to count objects

The fastest method is .length, and the slowest is .count