OK, new project. I believe it's dangerous to rely on code that you do not understand. As a Rails developer, I have tons of plugins and gems that I do not understand. See the problem?
To rectify this, I'm making it my goal to read through one of my main project's many dependencies each week. Two side benefits:
1) I will probably be better at writing my own open source libraries if I've seen a larger sample of how they're usually constructed.
2) Code reading is good for you, but it's tough to find time to just sit down and crack open a library. This will give me a good reason.
So without further ado, today I'm doing a read-through of http://github.com/sdsykes/slim_scrooge, a great ActiveRecord optimizing library that has made a difference in the performance of my current main project. Don't expect anything linear here. I'm just going to record my notes, and if you want to use them too, you're welcome to them.
The point of the slim_scrooge library is to monitor your ActiveRecord queries and optimize them so that they only pull back the columns that you end up using in that section of code. Let's find out how it works:
1) First thing I noticed: there is a test directory, but no tests. Problem? Maybe...
2) Scratch my first note. It appears that SlimScrooge::ActiveRecordTest actually runs the ActiveRecord tests that are included with Rails. I guess this makes sense as a regression test: anything that filters ActiveRecord should still pass the ActiveRecord test suite. Still, this means that the gem's own behavior is not under test. The gem could do nothing, and the tests would still go green. I'm not here to judge, though. I've written my own share of untested code.
3) The first file included in the main library is a C extension called 'callsite_hash', which lives in the /ext directory of the plugin. My C is a little rusty since I've been out of it for 3 years, but I think I get that it's defining the global Ruby function "callsite_hash" and mapping it to the C function "rb_f_callsite" in this callsite_hash.c file. I don't know what it does yet, as the rb_f_callsite function is a little dense for my limited C skills, but maybe it will make more sense in context. So, moving on.
4) Next inclusion is SlimScrooge::SimpleSet (a subclass of Hash, /lib/slim_scrooge/simple_set.rb). This class stores a set of keys based on a submitted array, all mapped to the value "true". Because a hash can only hold a given key once, adding an element only creates a new entry if it's not already in the set. So basically it's a set of unique elements with some helper methods to keep operations restricted to only the keys (like a collect method that only runs over the keys array). Knowing what the gem does, at this point I'm guessing this is the structure that column names are stored in so you know which ones were used and which ones weren't after a query. We'll see.
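To make the idea concrete, here is a minimal sketch of a Hash-backed set as I understand it from the description above. TinySet and its methods are my own hypothetical names, not the gem's actual source.

```ruby
# Hypothetical stand-in for SlimScrooge::SimpleSet; not the gem's real code.
class TinySet < Hash
  # Build the set from an array: every element becomes a key mapped to true.
  def initialize(elements = [])
    super()
    elements.each { |element| self[element] = true }
  end

  # Adding an element that already exists just overwrites true with true,
  # so the set never holds duplicates.
  def <<(element)
    self[element] = true
    self
  end

  # Run collect over the keys only, hiding the placeholder values.
  def collect(&block)
    keys.collect(&block)
  end

  # Expose the members as a plain array of keys.
  def to_a
    keys
  end
end

columns = TinySet.new(%w[id name])
columns << "name" << "email"
p columns.to_a                # => ["id", "name", "email"]
p columns.collect(&:upcase)   # => ["ID", "NAME", "EMAIL"]
```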
5) Moving on to /lib/slim_scrooge/callsites.rb, which defines the class SlimScrooge::Callsites (no parent class). This class only has class methods, so I guess it's never instantiated. It has a class-level variable called @@callsites, which is a hash. Write access to the hash is synchronized through the use of a Mutex, which is instantiated at the time of class definition as a class-level constant (SlimScrooge::Callsites::CallsitesMutex). Given that I don't know what's being stored here, I don't feel like I can accurately analyze it. Therefore, I'm jumping over to the top-level algorithm in /lib/slim_scrooge/slim_scrooge.rb
6) lib/slim_scrooge/slim_scrooge.rb is definitely the meat of the gem. SlimScrooge uses good old alias_method_chain to bring about "find_by_sql_with_slim_scrooge" (defined in the gem) and "find_by_sql_without_slim_scrooge" (the original "find_by_sql" method in ActiveRecord). This is how the gem inserts itself into every ActiveRecord call. In "find_by_sql_with_slim_scrooge", we see what's being done step by step:
A) If the sql passed in is an array (that is, a custom query directly from a programmer writing Model.find_by_sql("blah")), don't bother. Let it run like normal.
B) If this "callsite" has been seen before, try to optimize it.
C) If it hasn't been seen before, try to monitor it.
D) Otherwise, let it go (find_by_sql_without_slim_scrooge).
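The wrapping pattern in steps A through D can be sketched in plain Ruby. This is only my approximation: FakeModel, seen_callsites, the "_wrapper" suffix, the SELECT check, and the string return values are all hypothetical, and I've spelled out by hand the two alias_method calls that alias_method_chain would generate.

```ruby
# Toy model standing in for an ActiveRecord class (hypothetical).
class FakeModel
  def self.find_by_sql(sql)
    "plain:#{sql.inspect}"
  end
end

class FakeModel
  class << self
    # Hypothetical registry of callsites we have already seen.
    def seen_callsites
      @seen_callsites ||= {}
    end

    def find_by_sql_with_wrapper(sql)
      # A) array-style custom queries are passed straight through
      return find_by_sql_without_wrapper(sql) if sql.is_a?(Array)

      key = sql.hash # placeholder key; the gem derives its key via callsite_hash
      if seen_callsites[key]
        "optimized:#{sql}"                 # B) seen before: try to optimize
      elsif sql =~ /\A\s*SELECT/i
        seen_callsites[key] = true
        "monitored:#{sql}"                 # C) first visit: monitor it
      else
        find_by_sql_without_wrapper(sql)   # D) otherwise, let it go
      end
    end

    # What alias_method_chain :find_by_sql, :wrapper would set up for us.
    alias_method :find_by_sql_without_wrapper, :find_by_sql
    alias_method :find_by_sql, :find_by_sql_with_wrapper
  end
end

puts FakeModel.find_by_sql("SELECT * FROM users")  # monitored on the first call
puts FakeModel.find_by_sql("SELECT * FROM users")  # optimized on the second
```

The nice property of this pattern is that callers keep calling find_by_sql and never need to know that anything was inserted in between.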
7) So what is a "callsite"? How do you know if you've been here before? Well, apparently that's what the C extension ("callsite_hash.c") is for. The query is passed into this black-magic extension, which by some occult method creates a unique key for it (called a callsite_key). This is then stored in that class-level hash in the "Callsites" class.
8) There is logic written in here to pass the query through unoptimized if it is not "scroogable", and several conditions make a query unscroogable. For one, if there's any joining, it won't bother. It also won't bother if it's not a "select" query (that is, if it doesn't start with SELECT, include the expected table name, and have a "FROM" in it). [These were limitations I was unaware of before].
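I haven't decoded the C, so the following is only a guess at the general idea: one plausible way to build a "have I been at this spot in the code before?" key in pure Ruby is to hash the current call stack together with the SQL text. callsite_key_for, load_users, and load_admins are hypothetical names, and the real extension may compute its key quite differently.

```ruby
require "zlib"

# Hypothetical pure-Ruby approximation of a callsite key: a checksum of the
# call stack plus the SQL text. This is an assumption, not the gem's algorithm.
def callsite_key_for(sql, stack = caller)
  Zlib.crc32(stack.join("\n") + sql)
end

# Two different places in the code issuing the same SQL.
def load_users
  callsite_key_for("SELECT * FROM users")
end

def load_admins
  callsite_key_for("SELECT * FROM users")
end

# The same line of code, reached twice through the same stack, yields one key.
first, second = 2.times.map { load_users }
puts first == second

# The same SQL issued from a different method yields a different key.
puts load_users == load_admins
```

If this is close to what the extension does, it would explain why the optimization is per location in the code rather than per query string: two pages that run the same SQL can each end up with their own column list.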
9) The monitoring of a query is done by attaching a MonitoredHash to each row in the first query. This hash maintains a reference to the callsite, and can be configured to not monitor certain columns. Anytime a column is accessed that was previously unseen, the callsite is notified.
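Here is a small sketch of how I picture the monitoring working. ToyCallsite and ToyMonitoredHash are hypothetical stand-ins, and the gem's real MonitoredHash surely differs in the details.

```ruby
# Hypothetical callsite that just remembers which columns were read.
class ToyCallsite
  attr_reader :seen_columns

  def initialize
    @seen_columns = {}
  end

  def column_accessed(name)
    @seen_columns[name] = true
  end
end

# Hypothetical row wrapper: a Hash that reports reads back to its callsite.
class ToyMonitoredHash < Hash
  def self.wrap(row, callsite, ignored_columns = [])
    hash = new
    hash.update(row) # copy the row data without triggering the reader below
    hash.attach(callsite, ignored_columns)
  end

  def attach(callsite, ignored_columns)
    @callsite = callsite
    @ignored_columns = ignored_columns
    self
  end

  # Notify the callsite the first time a monitored column is read.
  def [](column)
    unless @ignored_columns.include?(column) || @callsite.seen_columns[column]
      @callsite.column_accessed(column)
    end
    super
  end
end

callsite = ToyCallsite.new
row = ToyMonitoredHash.wrap({ "id" => 1, "name" => "Ann", "bio" => "..." }, callsite, ["id"])
row["name"]
row["id"] # ignored column, so the callsite is not notified
p callsite.seen_columns.keys # => ["name"]
```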
10) The next time the query is run, the callsite has a record of which columns were used and uses "scrooged_sql()" to produce a select query for only those columns.
Well, this was fun. I feel like I've learned a bit about how my site works under the hood, and I'm a little more qualified to comment on the use of this gem in the future. Here are a few things I learned that are not directly about the slim_scrooge gem:
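As a rough illustration of that last step, here is a sketch of rewriting a "SELECT *" into an explicit column list. narrow_select is a hypothetical helper of my own, not the gem's scrooged_sql, and it skips details a real implementation would need (quoting column names, for one).

```ruby
# Hypothetical helper: swap "SELECT *" or "SELECT table.*" for explicit columns.
# Queries that don't match the pattern are returned unchanged.
def narrow_select(sql, table, columns)
  column_list = columns.map { |column| "#{table}.#{column}" }.join(", ")
  sql.sub(/\ASELECT\s+(?:#{Regexp.escape(table)}\.)?\*/i, "SELECT #{column_list}")
end

puts narrow_select("SELECT * FROM users WHERE id = 1", "users", %w[id name])
# prints: SELECT users.id, users.name FROM users WHERE id = 1
```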
1) The Mutex class can be used to synchronize access to an object.
2) ActiveRecord appears to direct all queries through the "find_by_sql" method. That's the place to hit it if you want to get in some sort of filtering.
3) C extensions for Ruby use an "Init_*" function to integrate themselves into the runtime.
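For the Mutex point, here is a minimal sketch modeled on my description of the Callsites class above: a class-level hash whose writes are guarded by a Mutex held in a constant. ToyCallsites, RegistryMutex, and the counter it stores are hypothetical.

```ruby
# Hypothetical registry: class methods only, shared state in @@callsites.
class ToyCallsites
  # Created once, when the class body is evaluated.
  RegistryMutex = Mutex.new
  @@callsites = {}

  def self.[](key)
    @@callsites[key]
  end

  # Only one thread at a time may run the read-modify-write inside the block.
  def self.increment(key)
    RegistryMutex.synchronize do
      @@callsites[key] = (@@callsites[key] || 0) + 1
    end
  end
end

threads = 4.times.map do
  Thread.new { 1000.times { ToyCallsites.increment(:home_page) } }
end
threads.each(&:join)
p ToyCallsites[:home_page] # => 4000
```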
Until next time,