Tuesday, May 26, 2009

Metric-Fu

The latest addition to my utility-belt of Ruby tools, Metric-Fu is giving me plenty of ideas on how to refactor my codebase.

Essentially, this library gives you the ability to take 8 of the most common code analyzing tools and run them all on your codebase at once, producing one consolidated report. I love it!

The full list can be found at the Metric_Fu page on rubyforge.org, but 2 of my favorites are listed here:

Roodi gives you some design help as it checks for all kinds of common programming problems. Method has too many parameters? Cyclomatic complexity too high? Forgot the else clause on a case statement? Roodi will give you the heads up you need.

Flay checks out your code for duplicate constructs and segments. It found several pieces of duplicated logic that I had never noticed before, and I was really impressed. It also does soft matches on "similar" code, things that could probably be combined and simplified if you're clever. Very nice.

The great thing about metric_fu is that you just run one command, and all the packages get run for you, giving you one page afterwards containing links to the results of whatever package you want. Check it out, you might be surprised at how much your fingers start itching to go back and fix all the problems you didn't even know you had.

Monday, May 25, 2009

Curse you, rake db:migrate!

Have you ever been in this situation?

"Hmm, this feature will require a big change to my database! I mean, we're going to have to touch every single record in that table."

If you have, you've probably followed up with this thought:

"Good thing I'm using Rails! They make it so easy!"

And then you went and wrote this migration:


def self.up
  Model.all.each do |m|
    # ...some important function
    # performed on every object...
  end
end


And then you were really proud of how quickly that went, and you ran it on your development machine, and it worked really well. But THEN you pushed it to your staging or production server, which has way more data than your dev machine, and you got this staring back at you from your command line:


** [out :: 123.123.123.100:8063] ==
** YourCrazyMigration: migrating
=========================================


And you stared at that for about 30 seconds before shouting:

"Mother F%^&er! I did it again! I can't believe I did it again! I built a stupid migration that uses the stupid 'all' method which is now dominating the memory on that box and I can either kill it and pick up the pieces or let it run for the next 3 hours as it pages the hell out of the hard disk!"

Well, since I did EXACTLY THAT just now, I decided that from now on we'll be using a new migration task at our development shop called "safety_migrate", which you're welcome to take advantage of if you'd like. It runs through every file in your migration directory, checking for the dreaded "all" method, and WILL NOT run the migrations unless every file is clean!

Check the gist:

http://gist.github.com/117625
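
The gist is the real thing. As a rough sketch of the idea only (this is not the gist's code), the check could look something like the following. The names `unsafe_migrations` and `safe_to_migrate?` and the pattern itself are assumptions made up for illustration, and a pattern this simple will also flag harmless mentions of `.all`, such as one in a comment.

```ruby
# Hypothetical pattern: a call to `.all` on anything.
# `\b` keeps it from matching names like `.allow` or `.all_ids`.
UNBOUNDED_ALL = /\.all\b/

# Hypothetical helper: returns the migration files that mention `.all`.
def unsafe_migrations(dir)
  Dir.glob(File.join(dir, "*.rb")).sort.select do |path|
    File.read(path) =~ UNBOUNDED_ALL
  end
end

# Hypothetical wrapper: report the offenders and say whether it is
# safe to go ahead with the real migration.
def safe_to_migrate?(dir)
  offenders = unsafe_migrations(dir)
  offenders.each { |path| warn "Uses .all: #{path}" }
  offenders.empty?
end
```

A rake task could call `safe_to_migrate?("db/migrate")` first and only invoke `db:migrate` when it returns true.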

Happy Migrating!

Sunday, May 17, 2009

Maximum Impact for Minimal Effort

Today was a day for working outside, and I got a lot of mulch spread over the landscaping at my house.

I love mulch because it's really easy to use, really cheap to buy, and really makes a difference in the way your yard looks. You can't get much more bang for your buck as far as landscaping goes.

Believe it or not, I was thinking a lot about work as I was hauling chips of cedar over to my flower beds, and I started considering what sorts of things are "mulchy" in the web development world.

Color

Unfortunately, people aren't often impressed by performance or functionality nearly as much as they are by aesthetics. A white page full of blue links that all do really cool stuff is just a huge turn off. The difference that can be made with a simple header, a 3-column layout, and a pleasant color scheme is phenomenal.

Central Navigation

Sometimes you have websites that have spaghetti links all over the place. Trying to get back to the homepage usually means either clicking the back arrow 14 times, or re-typing in the domain address. A very common and successful approach is to have a set of navigational links as part of the header, and it's successful for a reason: people know where to go. No matter where they are, the critical places on the website can be accessed easily.

Intelligent defaults

If you have a dropdown that has a list of states, and 80% of your users are local to your state, go ahead and default the selection to your state. If you have 25 reports users can run on your website, and 3 of them are used more often than any others, put those three at the top. It doesn't take a lot of work to reposition things, but it makes a big difference.

There you go. Three easy things to do that will make a large impact on your users.

Thursday, May 14, 2009

Joins and named_scopes in ActiveRecord

So many of us know how cool named_scopes are in ActiveRecord; they really make building complex queries quite pleasant compared to writing out big hairy SQL strings all over the place. However, in the past week as I have refactored my whole web application to use this excellent feature, I've run into some little-discussed items that I feel should be shared somewhere. Hope you enjoy, and anybody who is more advanced in their Rails-fu is welcome to give me some schooling as to where I've gone wrong on any of these, as most of my discoveries here are through trial and error.

Joining "through" associations

Say you have a three-layer association. The following would be an example of this:

class Company < ActiveRecord::Base
  has_many :employees
  has_many :children, :through => :employees

  validates_inclusion_of :industry,
    :in => ["TECHNOLOGY", "FINANCE", "AGRICULTURE"]
end

class Employee < ActiveRecord::Base
  belongs_to :company
  has_many :children
end

class Child < ActiveRecord::Base
  belongs_to :employee
  has_one :company, :through => :employee
end


So a company has many employees, and an employee has many children. This is pretty straightforward. Now, let's say that we want to find all the children whose parents work for any company in the “TECHNOLOGY” industry. How could we accomplish this? Well, certainly one way would be to iterate over the associations, starting by finding all the technology companies, then iterating over each employee, and adding their children to a growing array. It would look something like this:


children = []
Company.find_all_by_industry("TECHNOLOGY").each do |company|
  company.employees.each do |emp|
    emp.children.each do |child|
      children << child
    end
  end
end


But that's not great. That's a lot of lines of code to read for one query. Let's put the meat of this finding algorithm into a named scope instead:


class Employee < ActiveRecord::Base
  named_scope :works_in_technology,
    :joins => :company,
    :conditions => "companies.industry = 'TECHNOLOGY'"
end

children = []
Employee.works_in_technology.each do |emp|
  emp.children.each { |child|
    children << child
  }
end


That's a little better. You can add in joins to your query just by giving the symbol name of the association, so that's really nice looking. However, as long as we're joining on associations, why not just have another named scope in the Child model and do this whole thing without any iteration?


class Child < ActiveRecord::Base
  named_scope :parent_works_in_technology,
    :joins => :company,
    :conditions =>
      "companies.industry = 'TECHNOLOGY'"
end

children = Child.parent_works_in_technology


Hey, that's a lot better! Only one problem: it doesn't work. Yeah, we do have an association between the child and company models, but it's “through” the employee model, and the named_scope doesn't know how to assemble that query properly (try it, you'll see the SQL it generates isn't quite right). We can still do this as one named_scope join query, but we're going to have to take a little more detailed control:


class Child < ActiveRecord::Base
  # Join by hand: list the extra tables, then match up the foreign keys.
  named_scope :parent_works_in_technology,
    :joins => ", employees, companies",
    :conditions =>
      "employees.company_id = companies.id and
       children.employee_id = employees.id and
       companies.industry = 'TECHNOLOGY'"
end

children = Child.parent_works_in_technology


Now THAT works just fine. It's a little more verbose in the named_scope declaration itself, but it's really nice for wherever else you need to use that query in your codebase. Notice that you have to do the foreign key joining yourself in the conditions string. Don't forget to do that, or you will definitely not get the results you are expecting.
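
To see why the foreign key conditions matter, here is a plain-Ruby simulation (no ActiveRecord involved) of what a comma join does. The sample rows and the names `Co`, `Emp`, and `Kid` are made up for illustration; the database does the equivalent work in SQL.

```ruby
# Made-up sample rows standing in for the three tables.
Co  = Struct.new(:id, :industry)
Emp = Struct.new(:id, :company_id)
Kid = Struct.new(:id, :employee_id)

companies = [Co.new(1, "TECHNOLOGY"), Co.new(2, "FINANCE")]
employees = [Emp.new(10, 1), Emp.new(11, 2)]
children  = [Kid.new(100, 10), Kid.new(101, 11)]

# "FROM children, employees, companies" starts out as every combination.
rows = children.product(employees, companies)

# Only the industry condition: every child pairs with the technology
# company, whether or not a parent actually works there.
sloppy = rows.select { |kid, emp, co| co.industry == "TECHNOLOGY" }

# Adding the foreign key conditions keeps only real parent/employer rows.
joined = rows.select do |kid, emp, co|
  emp.company_id == co.id &&
    kid.employee_id == emp.id &&
    co.industry == "TECHNOLOGY"
end

sloppy_ids = sloppy.map { |kid, _, _| kid.id }  # [100, 100, 101, 101]
joined_ids = joined.map { |kid, _, _| kid.id }  # [100]
```

Leave the key conditions out and child 101, whose parent works in finance, comes back anyway, and every child comes back more than once.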

Watch out for duplicates in join queries

So now we're joining across multi-layer associations successfully, and now we have a new requirement: We're rolling out a new children's special to all the companies in our system and we need to get a list of all the companies who have employees with children younger than 12. Based on the last named_scope we built in the child model, it seems like we could do another one just like it in the company model.


class Company < ActiveRecord::Base
  named_scope :has_employees_with_young_children,
    :joins => ", employees, children",
    :conditions =>
      "employees.company_id = companies.id and
       children.employee_id = employees.id and
       children.age < 12"
end

companies = Company.has_employees_with_young_children


If you don't know how table joins work, it might seem like this would be fine. The problem comes up when you have a company whose employees have more than one child under age 12: because you're asking it to return you one record for every match of the conditions, you'll get a company record back for every child at that company that's under 12! If “Innitrode” has 50 employees, and those 50 employees have collectively a total of 75 children under age 12, then this named_scope will return 75 Innitrode records rather than the 1 you were expecting. This certainly would not be the behaviour you were looking for. Enter the “group” option. Grouping allows you to take the set of records you get back and compress them depending on a matching value in each one. For example:


class Company < ActiveRecord::Base
  named_scope :has_employees_with_young_children,
    :joins => ", employees, children",
    :conditions =>
      "employees.company_id = companies.id and
       children.employee_id = employees.id and
       children.age < 12",
    # Collapse the one-row-per-child result to one row per company.
    :group => "companies.id"
end

companies = Company.has_employees_with_young_children


Presto, now the results you get back will be flattened so that each record with a company id of n is grouped into a single record, so you will get no duplicate company records in the array you have returned.
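
The same effect can be reproduced in plain Ruby, with no database at all. This is only a simulation with made-up rows (the ids and ages below are assumptions for illustration), but it shows where the duplicates come from and what grouping on the company id does to them.

```ruby
# Made-up rows: one company, two employees, four children
# (three of them under 12).
companies = [{ :id => 1, :name => "Innitrode" }]
employees = [{ :id => 10, :company_id => 1 }, { :id => 11, :company_id => 1 }]
children  = [
  { :id => 100, :employee_id => 10, :age => 4 },
  { :id => 101, :employee_id => 10, :age => 9 },
  { :id => 102, :employee_id => 11, :age => 11 },
  { :id => 103, :employee_id => 11, :age => 15 }
]

# The join emits one row per matching (company, employee, child) triple.
matches = companies.product(employees, children).select do |co, emp, kid|
  emp[:company_id] == co[:id] &&
    kid[:employee_id] == emp[:id] &&
    kid[:age] < 12
end

# Without grouping, the company shows up once per young child.
ungrouped = matches.map { |co, _, _| co[:name] }

# Grouping by company id collapses those rows to one per company.
grouped = matches.group_by { |co, _, _| co[:id] }.map do |_, rows|
  rows.first.first[:name]
end
```

Here `ungrouped` holds "Innitrode" three times, once for each child under 12, while `grouped` holds it once.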

ReadOnly records

So now you're using these great joining named_scopes to find records based on all kinds of cross-table conditions, and things are working great. They will continue to do so until you need to update one of those records you retrieved, and then you will get a bit of a surprise when your save method throws an error saying that the record you are trying to update is read only.

Actually, there's a very good reason for this; you just have to know a little bit about how ActiveRecord works. When you run a query with joins in it, you're getting back single records that have WAY more attributes than just the ones found in the model table you're dealing with. If you join employees to companies, then your employee object will have a hash of attributes that has all of the data in its table, PLUS all the data in the table for the company he works for. So what if you update one of THOSE attributes instead of the ones that are part of the employee table? The update statement to the employee table wouldn't be able to find the column named “corporate address” or whatever, and would crash. So, the records that are returned from a joined query are just marked readonly to prevent that from ever happening.

In a Rails action, this would rarely be a big deal because you're usually just passing the ID of the object you want to update, plus the parameters that you need to change, so you'd be loading a new record out of the database anyway based on the ID, and that object will be writable. However, it's important to know how to combat this in case the situation ever arises where you DO need to update one of these records. There are 2 ways that I can think of around it.

First, you can just reload the record you want to update by running another find just using the ID of the object.


company = Company.find(company.id)


Another approach that works is to update your named_scope declaration to use the “select” option:


class Child < ActiveRecord::Base
  named_scope :parent_works_in_technology,
    :joins => ", employees, companies",
    :conditions =>
      "employees.company_id = companies.id and
       children.employee_id = employees.id and
       companies.industry = 'TECHNOLOGY'",
    # Pull back only the columns that belong to the children table.
    :select => "children.*"
end


Now that you're only pulling back the attributes specifically from your model table, the objects won't be read only. Be careful with this one, though, because you'll run into a runtime error if you try to use anything that modifies the query like the “size” method.


children_count = Child.parent_works_in_technology.size


This is going to alter your query to call “Select count(children.*)”, and your database will reject this as invalid.

Chaining Join Queries

Finally, what if you have a few named scopes on the same model that join the same tables? Is this a problem?


class Company < ActiveRecord::Base
  named_scope :has_employees_with_young_children,
    :joins => ", employees, children",
    :conditions =>
      "employees.company_id = companies.id and
       children.employee_id = employees.id and
       children.age < 12",
    :group => "companies.id"
  named_scope :has_employees_with_male_children,
    :joins => ", employees, children",
    :conditions =>
      "employees.company_id = companies.id and
       children.employee_id = employees.id and
       children.gender = 'M'",
    :group => "companies.id"
end


It's not a problem as long as you don't chain them together. But as soon as you do this, you're in trouble:


Company.has_employees_with_young_children.has_employees_with_male_children


Your SQL query will now try to join in both of those other tables twice, which will result in ambiguous column references. Usually the answer to this problem is just to not chain joined queries together, but if you need to (like I did in one particular case), you can have the join itself be a different named_scope, so that you only run the joins once:


class Company < ActiveRecord::Base
  # These two only filter; they rely on join_in_children for the tables.
  named_scope :has_young_children,
    :conditions => "children.age < 12"
  named_scope :has_male_children,
    :conditions => "children.gender = 'M'"
  named_scope :join_in_children,
    :joins => ", employees, children",
    :conditions =>
      "employees.company_id = companies.id and
       children.employee_id = employees.id",
    :group => "companies.id"
end


This will work, you just have to remember to chain it in every time:

Company.join_in_children.
  has_male_children.
  has_young_children

Hope this helps someone. Any reader with other tricks they've learned with joins, or better ways to do something listed here, put it in the comments and if it's good I'll move it to an update in the post (with credit given to the author, of course).

Monday, May 11, 2009

Git solves all your branching and merging problems! Almost!

The entire development team here at Research to Practice (that is, Ray and I) have been working entirely out of git for a little while now. The problem was, we were using it just like Subversion. You see, having experienced a bad branch merge before, it's tough to get up the gumption to try again.



Why? Imagine getting food poisoning from a questionable sushi place in the Midwest. Yeah, you know that it was totally the health code violations and 12-day-old fish at that place that made you vomit 9 shades of purple, but you'd still have a hard time bringing yourself to eat anything at an upscale sushi restaurant on the coast of Japan, wouldn't you?



Yeah, it's kind of like that.

We know that git is good at this stuff, we know that we want to take advantage of the great flexibility that using branches offers, but when it comes down to actually DOING it, there's always a reason why maybe we shouldn't try this just yet.

Last week, we decided to lay down the law. We wanted to have some features be worked on for a 2.0 release of our website, but we also wanted to continue deploying fixes and small enhancements to the currently running application. No way around it, it was time to branch out.

So it was with great fear and trepidation that we first unleashed this phrase upon our command line:


$> git checkout -b version_2_features


Oh Gods, what have we wrought?!

...Ok, so it wasn't that bad yet. The checkout command with the "b" option created a new branch for us, and after developing a small and inconsequential feature (a necessary precaution because we were fully fearing that the demons-of-parallel-development could swoop in at any moment and destroy everything we had done so far) we pushed this branch to the github repository so that we could both develop on it:


$> git push origin version_2_features


Alright, so far so good. Now we have our "master" branch, which was essentially "What is currently running in production", and our "version_2_features" branch where we could start developing everything we wanted to put into the version we were planning to release at the end of the summer.




Great Success!




So we started developing that way, and everything was moving along splendidly, until we hit something of a snag. Well, a boat anchor, really. See, the whole point of having this 2.0 branch be separate was so that its content would not be deployed until much later, but SOMEhow, SOMEone, did SOMEthing (I'll let you ask Ray for the details behind that), and suddenly the version_2_features branch ended up being accidentally merged into master.



This resulted in an hour of us assuring ourselves that we could probably find a way out of this without having to admit we (he) ever screwed up, and we came up with a list of options.

1) Use git reset on the repo to bring it back to before the bad merge
2) Check out a previous commit version (before the bad merge), and then commit that as the head
3) Use git reset on a local repo, and push the result to github
4) Revert the merge commit, then when you want to merge for real later, revert the revert.

Now out of the first 3 options, it turned out that exactly 0 of them were possible. There's no way (that we could find) to call "RESET" on our remote github repository, checking out a previous commit disconnected you from the head so you couldn't commit and call it the new head, and trying to push a branch that had been reset to the remote repo was met with a regular level of failure. Option 4 looked plausible, but beyond our ken, and neither of us is a git wizard just yet.

So we went with option 5: check out an old version of the repository, copy the entire directory structure into our working directory, commit the deltas, and bam, we're back to the pre-merge state.

NEVER DO THIS

Little did we know at the time that this would later take us beyond regular and straight into the epic level of failure.


"What's the matter?" you might ask. "Isn't your directory back to the way you wanted it?"

The answer to that question is yes, our working directory was back to the production version, which was great for the master branch....but not for the history. You see, we still needed to be able to merge features that were being added to the version_2_features branch back into the master at some point in the future, and that's really hard to do if your repository version believes that those 2 branches have already been merged!!!

By the time we realized this, we'd already pushed these changes to the repo, and were beginning to consider just how screwed we were. Thus began the wailing and gnashing of teeth.

"We were right! Branching was a horrible idea! We've done exactly what should never have been begun!"



So we rebased and merged and copied and merged and went through every combination possible of recursive/3-way/octopus merging until we finally were PRETTY sure that everything was ok (about half a day's worth of work), and do you know what I realized after that?

6) Delete the remote github repository and create a new one from the one on your local machine that's still in a good state.

Yes. It was possible. It was even a good idea. If only I had thought of it sooner.

Moral of the story? git is great for branching and merging, far better than any solution I've used in the past. Unfortunately, even git can't protect you from your own stupidity. No tool is that powerful (except Java, that's what it was designed for).