Tuesday, November 25, 2008

Background Jobs with BackgroundJob

Most web-applications have some things that need to happen that just can't be done within the confines of a single web-request. Database maintenance is a good example of this; maybe you need to send automated and personalized emails to every user of your application, or generate a large PDF file. Sometimes these things can be picked up by cron jobs (if they are recurring and happen the same way every time), and that's fine. Other times, though, the task you want to accomplish is triggered by a user of the web app, so it could happen any time, and a cron job won't cut it. Let me give you an example: in our startup, as I mentioned in my last blog post, we have to give out large reports on our the data our clients have stored on our system. If you read my last post, you know that we're doing this now with simple excel files. The problem is, if our client wants to run a report over six months plus of data, it will take 15-90 seconds to generate depending on how complex the report is. Obviously, that's not something I want one of my mongrels blocking on, just hanging out not servicing any other requests for more than a minute. All it would take would be a few long reports being run at the same time to drastically bottleneck the application. So how can we delegate these tasks to be done in the background so they don't apply load to the web-server?

Simple Background with Rake

The first thing I tried for this sort of thing was taken directly from one of Ryan Bates excellent "RailsCasts" (Episode 127 to be precise): I just put everything I needed delegated into rake tasks and had a method like this in the application controller:


def call_rake(task,options={})
options[:rails_env] = RAILS_ENV
args = options.map{|n,v| "#{n.to_s.upcase}='#{v}'"}
system "rake #{task} #{args.join(' ')} &"
end


Easy enough, right? Now any controller action can just "call_rake" with the name of the task they want to run and the arguments they want to pass along to it. The ampersand in the system call makes it a forked process, so it works out well.

Now, this worked fine as long as the only things I wanted to run were administrative tasks on my end. Why? Because you can see how if 500 users were to call an action that used "call_rake" at around the same time then I would have 500 background processes spawned. Not good, right?

Better Background with BackgroundJob
Enter some other options: I was pretty familiar with "BackgrounDRb", and I'd heard about "Starling" on another Railscast, so I wrote to our awesome hosting provider (EngineYard) to ask which they preferred dealing with for their clients. The guy I talked to pointed out that they had plenty of experience with both, but that I might want to check out another library called BackgroundJob since that was the one they'd had the least trouble with. I did check it out, and the simplicity is compelling.

To sum it up in brief, a process hangs out on your server and queries this table from time to time. The table ("bj_jobs") is essentially a queue of jobs to be executed. You just submit jobs you want to be run in the background as a string system call, and it takes care of processing the jobs in the table one at a time. Here's the example of the cron job from BackgroundJob's docs:


cmd = bj run --forever \\
--rails_env=development \\
--rails_root=/Users/ahoward/rails_root

*/15 * * * * $cmd


Now commonly BJ is used to run worker code that you write in a new "jobs" directory with "script/runner". That's a good pattern, I think, but for me I already had all these rake tasks I was using for background stuff. So what did I do to my application controller?


def call_rake(task,options={})
options[:rails_env] = RAILS_ENV
args = options.map{|n,v| "#{n.to_s.upcase}='#{v}'"}
Bj.submit "rake #{task} #{args.join(' ')}"
end


Almost nothing. I just changed "system" to "Bj.submit" and, then all the rake tasks I was already using to run background jobs, plus any new ones I wanted to write for users (like generating big reports) can be called the same way they were before. After a week now of having BackgroundJob in production, I'm pretty happy with it, and I'd highly recommend giving it a try to see if it fits your needs before you mess with anything more complex.

Saturday, November 22, 2008

Reporting in Excel with Rails

At my startup, we recently have run into a problem. The deal is, our customers have large amounts of data stored on our servers, and they want a large array of reports to access it. That's no big deal in and of itself, but the problem is that in some cases they want a report that spans a years worth of data (around 1,000,000 records of data), and they want to be able to sort, arrange, perform calculations on, and graph all of these data points. So my first thought was that we were going to be in for a lot of really unpleasent javascripting, but it turns out there is a better way. What could we leverage that already can take in data and do all those things that I mentioned above? Spreadsheet programs, of course, Microsoft Excel being the most common (I use OpenOffice myself, but it can open .xls files too). So I went hunting for a library to help me produce Excel files. As it so happens, there's already a great library in the works called (unsuprisingly) spreadsheet. It's available on rubyforge as a Rails plugin, and for me it's getting the job done. [NOTE: Requires "ruby-ole" gem]

So here's what I started with, after reading the docs for the spreadsheet library. Basically this is the creation of one report:


require 'spreadsheet'

class Reporter

public

class << self

def report_all_transactions(district)
book = Spreadsheet::Workbook.new
sheet = book.create_worksheet
sheet.row(1).concat ["ALL TRANSACTIONS BY THERAPIST"]
index = 2
district.therapists.each do |t|
sheet.row(index).concat [t.full_name]
index += 1
sheet.row(index).concat ["ID","Date","Start",
"Stop","Status","Location"]
index += 1
district.transactions.each do |trans|
sheet.row(index).concat [trans.id,trans.date,
trans.start,trans.stop,
trans.status,trans.school.name]
index += 1
end
index += 2
end
book.write "#{RAILS_ROOT}/public/data/all_transactions_report.xls"
end

end
end


It's not bad on the whole, but it's a little wordy, and the one thing that REALLY bothered me about it was that index tracking I was having to do and the repetitive nature of the whole "add some data, increment the index" thing.

So, in order to make this spreadsheet library more pleasant to work with, I created the class below:


class SheetWrapper

attr_accessor :index

def initialize(worksheet)
@sheet = worksheet
@index = 0
end

def add_row(array)
@sheet.row(@index).concat array
@index += 1
end

def add_lines(count)
@index += count
end
end


It's pretty simple, just a couple methods, but look what it does to that first code example I showed:


require 'spreadsheet'

class Reporter

public

class << self

def report_all_transactions_by_date(district)
book = Spreadsheet::Workbook.new
sheet = SheetWrapper.new(book.create_worksheet)
sheet.add_row ["ALL TRANSACTIONS BY THERAPIST"]
sheet.add_lines(1)
district.therapists.each do |t|
sheet.add_row [t.full_name]
sheet.add_row ["ID","Date","Start",
"Stop","Status","Location"]
district.transactions.each do |trans|
sheet.add_row [trans.id,trans.date,
trans.start,trans.stop,
trans.status,trans.school.name]
end
sheet.add_lines(2)
end
book.write "#{RAILS_ROOT}/public/data/all_transactions_report.xls"
end

end
end


Better already. One more problem that I had, though, was that all my report methods were kind of following the same pattern of opening a work book, adding some data, and then writing it out to the data directory. In order to reduce that pattern down to one chunk of reusable code, I extracted the "run_report" method as follows:


require 'spreadsheet'

class Reporter

public

class << self

def report_all_transactions_by_date(district)
filename = "all_transactions"
self.run_report(filename,"ALL TRANSACTIONS") do |sheet|
district.therapists.each do |t|
sheet.add_row [t.full_name]
sheet.add_row ["ID","Date","Start","Stop","Status","Location"]
district.transactions.each do |trans|
sheet.add_row [trans.id,trans.date,
trans.start,trans.stop,
trans.status,trans.school.name]
end
sheet.add_lines 2
end
end
end

end

protected
class << self

def run_report(file_name,title,header_row = nil)
name = "#{file_name}.xls"
book = Spreadsheet::Workbook.new
sheet = SheetWrapper.new(book.create_worksheet)
sheet.add_row [title]
sheet.index = 2
sheet.add_row header_row if header_row
yield(sheet)
book.write "#{RAILS_ROOT}/public/data/#{name}"
end

end
end


And that's pretty much it. Now all the large reports can be generated as Excel files, and the users are happy because they can play with that data anyway they need to in order to extract the information they're looking for (and we didn't have to spend a month writing features that spreadsheet programs already have into our web app).

Friday, November 14, 2008

Keeping those users in line

So our business has been running for about six months now, we've just started to turn the corner towards revenue, and I've got a big lesson learned that I think every developer/entrepreneur should be thinking about from day one:
Users are fickle, impatient, self-important, cantankerous, destructive monsters who will rape your software if you give them the chance

Did that sound a little harsh? Well, it was intended to, but not because I think users are terrible people. On the contrary, users are almost always normal people with normal personalities; the problem is that it's too easy for us as software developers who know the codebase well to passively feel like all users have that same depth of knowledge and experience, which can easily lead to letting your guard down when you should be coding defensively. In the sentence above, I'm not talking about the way users are objectively, but the way we perceive them to be if we superimpose our knowledge of the code and the technology on top of them.

Fickle
You've put your heart and soul into this product. Late nights, long hours, sacrificing in other areas of your life to keep the business moving forward. You can think back through every incremental improvement and marvel at how far you've come. It's a shock when a user sees some teensy-tiny thing that you were planning on getting around to fixing one of these days, and they just up and leave, deciding they just can't take it. What the hell is up with that? What's one little glitch compared to all the great things about this software? It's a hard reality, but the fact of the matter is people EXPECT something they're paying for to work right. If the entire product works perfectly, than you have done exactly what they EXPECT, and no one is impressed by what they already expected. So any glitch however minor, means that your product actually comes in below their expectations. And what happens then? Well, when was the last time you went back to a restaurant after previously being given food that just didn't even meet your baseline expectations?

Impatient

I cut my software development teeth in an agile shop, so one of the things that was pounded into my head was the avoidance of premature optimization: make it work, make it clean, speed it up only if speed is a problem. This brings me to the number one problem I've caused in our startup's codebase so far: long-running actions. As a developer I know what a large amount of data is being queried, processed, and rendered, so I'll give it four or five seconds to generate one of the bigger reports. Nontechnical users? *Click*...*Click click*....*click click click*...*clickclickclickclickclickcli-"Man, this thing sucks!" (By this time they've generated and queued up so many long running requests that the cluster has to be restarted). They expect an immediate response, otherwise how will they know that the action they took caused anything to happen at all? (progress bars, waiting logos, optimized code, and sanity limits on request size are all possible remedies, the point is to do SOMETHING).

self-important

You have over 100 open items on your development list, you constantly have to be watching the live performance monitoring in case of problems, and on top of this some random user emails you saying that they're having trouble with something. They need to do this one thing, but they don't know how. Ignore them for half an hour while you finish up a development task, and by that time they've already told everyone they know that you have the worst customer service of any company they've ever run into and that they can't work with a company like yours. What makes them think that their time is so much more important than yours? After all, you're the one who's RUNNING this machine, you've got a million more important things to do than answer customer emails! News Flash: regular user can quite often be scared or intimidated by technology. If they don't know how to do something in your system, they probably won't just click around until they figure it out. They're going to try to get a hold of you, and if they can't then they'll be at a self-imposed standstill with your software, unable to make any progress and getting more frustrated by the minute. I wrote about this in a previous post, customer service is KING, and I can guarantee you that there is NOTHING more important than answering customer's emails, no matter how pressed for time you feel.

cantankerous

Most users are at least pleasant when they contact you; some are out for blood. No matter how nice you are, how quickly they solve your problem, they're angry and they're going to take it out on your sorry head. What gives them that right? Just because they're a customer doesn't give them permission to discard any semblance of decency when talking with you! OK, on this one, you may be right to an extent: users don't have the right to treat you like garbage. However, the fact is they're going to. Some understanding will be required on your part. Maybe (like in the case of our software startup) the company they work for mandated that they have to use your software, they don't like computers, and they can't yell at their boss because they'll get fired. Who does that leave in the crosshairs? You, of course. Maybe there are other circumstances you don't know about. Whatever the reason, the only thing you can do is take a deep breath remember a few things: 1) It's not personal, you're anonymous to them, and there's no point in getting worked up over it 2) Two wrongs don't make a right; no matter how your are treated you should respond by endeavoring to be as polite and helpful as possible 3) You can't please everyone. Deal.

destructive

My email box always gets a few gems every day of errors that got thrown by our system. Somebody tried to logon with no password, it threw an exception. Somebody tried to run a report from "This Year" date to "Blank" date (yeah, they really did type text instead of a date). Somebody picked a start date that came AFTER the end date of their report parameters. Yes, from our armchairs it appears that these people must be deliberately malicious or abysmally stupid. Remember, they didn't write the software, they don't know what does what to start with, and the fact of the matter is it's your job as a developer to make sure this kind of crap doesn't happen (I include myself in that "your job" part, as I've made plenty of mistakes along these lines). If your interface is difficult to understand, or you haven't prepared for some strange combination of horridly invalid input, it's not the users fault, it's yours for not having a product that has a clear workflow and good input scrubbing. You can't plan for all of the strange things that will occur, but you can make a habit out of fixing them as soon as they do.

I guess the point here was to throw a light-hearted spin on the fact that users can and will do pretty much anything to what you put in front of them, and it takes a big amount of empathy and intellect to be able to successfully put yourself in their position (both as a coder and as a customer support rep, because in a startup you'll be doing both). It's not fun or easy, but it is important, and your business will be better off for it.

Wednesday, November 12, 2008

Revenue, sweet Revenue

This week our startup turned a serious corner, as we managed to actually MAKE MONEY! That's right, months of effort have finally paid off as we have earned our first $100. Admittedly, this is a very small sum of money, but you have to start somewhere, and there's something exciting about knowing that we won't just be plunging into the open maw of insolvency forever. Now if we can just make this joyous event a frequently recurring situation, we might just make a business out of this little startup yet.