Monday, May 25, 2009

Curse you, rake db:migrate!

Have you ever been in this situation?

"Hmm, this feature will require a big change to my database! I mean, we're going to have to touch every single record in that table."

If you have, you've probably followed up with this thought:

"Good thing I'm using Rails! They make it so easy!"

And then you went and wrote this migration:


def self.up
  Model.all.each do |m|
    # ..some important function
    # performed on every object..
  end
end


And then you were really proud of how quickly that went, and you ran it on your development machine, and it worked really well. But THEN you pushed it to your staging or production server, which has way more data than your dev machine, and you got this staring back at you from your command line:


** [out :: 123.123.123.100:8063] == YourCrazyMigration: migrating =========================================


And you stare at that for about 30 seconds before shouting:

"Mother F%^&er! I did it again! I can't believe I did it again! I built a stupid migration that uses the stupid 'all' method which is now dominating the memory on that box and I can either kill it and pick up the pieces or let it run for the next 3 hours as it pages the hell out of the hard disk!"

Well, since I did EXACTLY THAT just now, I decided that from now on we'll be using a new migration task at our development shop called "safety_migrate", which you're welcome to take advantage of if you'd like. It runs through every file in your migration directory, checking for the dreaded "all" method, and WILL NOT run the migrations unless every file is clean!

check the gist:

http://gist.github.com/117625
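If you just want the flavor of it, here's a rough sketch of the idea. The implementation below is a from-memory approximation, not the actual contents of the gist, and it assumes your migrations live in the usual db/migrate directory:


namespace :db do
  desc "Refuse to migrate if any migration file calls the dreaded .all"
  task :safety_migrate => :environment do
    # Scan every migration file for a bare ".all" call
    offenders = Dir[File.join(RAILS_ROOT, "db", "migrate", "*.rb")].select do |file|
      File.read(file) =~ /\.all\b/
    end

    if offenders.empty?
      Rake::Task["db:migrate"].invoke
    else
      puts "NOT migrating. These files use .all:"
      offenders.each { |f| puts "  #{f}" }
    end
  end
end
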

Happy Migrating!

7 comments:

tadman said...

Using models within migrations is inherently dangerous, since the schema may be out of date while the model itself is always current, which can lead to problems saving records or validating columns that do not yet exist. Still, there are occasions where there is no other way.

In that case, a simple extension to ActiveRecord::Base to implement a memory-safe each method is best.

A rough idea of how to do this is http://gist.github.com/117653

A more thorough approach that's scope aware is implemented in http://github.com/theworkinggroup/rails_mysql_hacks/blob/a056df012a06162bcae7ae3b162eeb5281829d03/lib/mysql_hacks/extensions/base.rb
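For a sense of what that extension might look like, here is a rough sketch (the method name is made up and it assumes an integer id primary key; the gists above are more complete):


class ActiveRecord::Base
  # Walk the table in id order, one slice at a time, so only
  # batch_size records are ever instantiated at once.
  def self.each_in_batches(batch_size = 500)
    last_id = 0
    loop do
      batch = find(:all,
                   :conditions => ["id > ?", last_id],
                   :order      => "id ASC",
                   :limit      => batch_size)
      break if batch.empty?
      batch.each { |record| yield record }
      last_id = batch.last.id
    end
  end
end
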

Srdjan said...

I feel for you man, but I have a question: couldn't this have been done through a background job? Something that could be put in a queue and processed later?

FlyCoder said...

you are now on my blogroll :)

http://flycoder.blogspot.com

Ethan Vizitei said...

@Srdjan

Yes, you are correct, there are several better ways we could have done this. The point of the code here was to make sure you never attempt to make database changes in this way ever again. :)

A background job would work; so would paginating within the migration, or doing direct SQL statements with cursors. Just about anything is better than just using the "all" method.

Thanks for the comment.

remi said...

You shouldn't use models in your migrations in the first place. I have never run into an "occasion where there is no other way."

Your tables / models will change over time. When you use models in migrations, you can easily end up with migrations that fail because X attribute or association doesn't exist anymore.

When I really want to use a model, I use migration_model ... just a little trick to create a model class that has none of your custom methods/associations/etc, because those things like to blow up in migrations.

http://github.com/openrain/migration_model
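You can also do the same trick by hand. Here's a sketch (class, table, and column names are made up) of defining a bare-bones model right inside the migration so none of the app's methods or associations come along for the ride:


class MassageUsers < ActiveRecord::Migration
  # A clean-slate model: just the users table, no app code attached.
  class User < ActiveRecord::Base
    set_table_name "users"
  end

  def self.up
    User.reset_column_information
    User.find_each(:batch_size => 100) { |u| u.update_attribute(:name, u.name.to_s.strip) }
  end

  def self.down
    # nothing to undo for this example
  end
end
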

If you need to touch all of the records in a table ... it's not pretty, but I'd say try doing it with SQL. If you've got a couple million rows, that's the only way to do it :/
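For the big-table case, the straight-SQL version of a migration looks something like this (table and column names are made up; the point is that the database does all the work and no ActiveRecord objects get built):


class DowncaseUserEmails < ActiveRecord::Migration
  def self.up
    execute "UPDATE users SET email = LOWER(email)"
  end

  def self.down
    # not reversible; the original casing is gone
  end
end
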

That said ... if your table isn't huge, I'd use something like migration_model or just be sure not to use custom methods/associations when using models in your migrations. Unless you like it when migrations blow up.

Anonymous said...

Uh, so what's wrong with using a batched find?

Model.find_each(:batch_size => 100) do |m|
  # ..work on each record..
end

It will only pull 100 records at a time and won't kill your memory usage.

Ethan Vizitei said...

@Anonymous

There's nothing wrong with using a batched find; that's exactly what I would do. The problem is when you just forget that it's necessary. The code sample I gave would stop your migration from running until you changed it from using ".all" to a batched find. Sometimes it pays to read rather than skim. :)