Thursday, April 21, 2016

Dealing with transient test failures due to database results order

Fox by kindl_jiri from flickr (CC-ND)
The most #mildlyinfuriating aspect of testing are tests which work most of the time, but fail occasionally. Even worse are tests which always work when you run them individually, but sometimes fail when ran as part of a test suite.

The most common category of transient test errors I've seen are Capybara Javascript testing, and I don't have a great solution for those, but there's another category of really obnoxious tests - tests which expect results from database to be in specific order.

They usually work, as databases do the laziest thing possible, so when you ask one for a bunch of records without specifying any particular ordering, it will usually return them in whichever order they are physically on the disk, which usually corresponds to order of their creation, which then usually corresponds to their serial or GUID primary key, so you get that implicit ORDER BY id, most of the time.

Which is just fine, except once in a while the database will reorder physical records to compact tables, causing physical order of record to no longer correspond to primary key order, and test to fail.

Then you rerun it individually over and over, and it works every time, as this kind of reordering is not going to happen mid-test, only once enough repeatedly created and deleted data accumulated in the table. Like once every 20 full test runs, each half an hour long. Very frustrating to debug.

What if you could force such tests to reveal themselves somehow? Most databases won't be of much help, and ORM is standing in the way of adding ORDER BY to every query which doesn't have one, but fortunately it's not too difficult to tell ActiveRecord to shuffle results of everything that didn't request specific order, by placing this kind of code in your spec/spec_helper.rb or equivalent:

Rails.application.eager_load!
ObjectSpace.each_object(Class) do |cls|
  if cls.ancestors[1..-1].include?(ActiveRecord::Base)
    begin
      cls.instance_eval do
        default_scope { order('rand()') }
      end
    rescue
      warn "Can't order #{cls}"
    end
  end
end

I wouldn't recommend running it like that every time, and especially not in production, as ORDER BY RAND() everywhere is going to have annoying performance impact, but enabling it temporarily just to debug already existing transient test failures might be just the right tool.

No comments:

Post a Comment