Friday, March 5, 2010

I've been had!

Here is a lesson for you that you should take to heart: Trust your mentors, but find out for yourself.

When I first heard about batched finding in ActiveRecord (#find_each and #find_in_batches), it was a great revelation. My buddy told me how it worked, and I loved it!

According to him, find_in_batches took a :batch_size hash option, defaulting to 1000, that decided how many records to load into memory at a time, while find_each would run a separate query for every single record.

Given those options, and wanting to balance memory conservation against responsible use of database time, I ended up with a lot of blocks like this:


Model.find_in_batches do |models|
  models.each do |model|
    ....
  end
end

It seemed tedious to write a nested block every time, but I didn't want to use find_each, which (I thought) was going to slam my database with a million single-row queries.

Eventually I'd repeated myself enough that I decided to do something about it. I figured I'd create a plugin that would monkeypatch in a method called "find_each_in_batches" that would do the extra block for you, so you could iterate over one model at a time but still have the finding occur in batches. In order to get the namespacing right, I opened up the source file for ActiveRecord's built-in batching methods (batches.rb), and imagine my surprise when I saw this:


def find_each(options = {})
  find_in_batches(options) do |records|
    records.each { |record| yield record }
  end
  self
end

What do you know! #find_each actually does exactly what I was planning on making my new plugin do! My buddy had been misinformed, and had in turn misinformed me, and I just took his word for it!
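If you'd rather see it than take my word for it, here is a minimal sketch that runs without a database. FakeModel, ROWS, and query_count are names I made up for illustration; they are not part of Rails. The class just slices an in-memory array and counts each slice as one "query". Its find_each has the same shape as the method quoted above, so this only demonstrates the delegation, not how ActiveRecord actually builds its SQL.

```ruby
# Hypothetical stand-in for an ActiveRecord model. FakeModel, ROWS and
# query_count are illustrative names only, not part of Rails.
class FakeModel
  ROWS = (1..2500).to_a # pretend table holding 2,500 records

  class << self
    attr_accessor :query_count

    # Yields arrays of up to :batch_size records and counts one
    # simulated query per batch.
    def find_in_batches(options = {})
      batch_size = options[:batch_size] || 1000
      ROWS.each_slice(batch_size) do |records|
        self.query_count += 1
        yield records
      end
    end

    # Same shape as the quoted source: delegate to find_in_batches,
    # then hand records to the caller one at a time.
    def find_each(options = {})
      find_in_batches(options) do |records|
        records.each { |record| yield record }
      end
      self
    end
  end

  self.query_count = 0
end

seen = 0
FakeModel.find_each { |_record| seen += 1 }
puts "records: #{seen}, queries: #{FakeModel.query_count}"
```

With 2,500 pretend rows and the default batch size of 1000, the block runs once per record but the counter only goes up once per batch, which is the opposite of what I had been told.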

If I had checked out this source code myself in the first place, I would have known this all along, and wouldn't have those double iteration blocks littered throughout my code base.

Reading the source code is a good habit anyway: it introduces you to idioms the pros are using that you may not be familiar with, and it's a great way to improve your own code. But even just a glance at the inline documentation to confirm what I'd been told would have prevented a world of hurt.

So, while I'm here slapping my forehead, I encourage you to take heed of what I said at the top: Trust your mentors, but find out for yourself.
