Tuesday, March 9, 2010

New Mini-Gem in the Wild

Ok, here's a problem I've had:

I want to deploy my application to a "cloud" setup rather than my current "slice" infrastructure. This is great, but means I have to refactor out many features that used to make use of the local filesystem because I cannot depend on it. I could spawn new servers at any time in a cloud infrastructure, and there's no GFS joining them. For most features, this is no big deal: store your assets on Amazon S3 and get over it. That's what I've done for user uploads, report files, and all manner of other assets.

The problem came with some of my data updates. We have a process at my company where some of our internal users go to a page on our application to update our data from non-web-enabled sources (mostly CSV files, since all our data sources seem to be able to generate those). Currently we store those files on the GFS, then kick off a background job, passing it the file name, and let the job do whatever processing it needs to do on that file.

It's harder to do on the cloud, though, because you can't just store the file locally: the utility server instance that's running our background jobs won't be able to get to it, since it's on a different filesystem entirely. You could put it in the database, and if I were using MongoDB on this project I probably would, but that's not a habit I want to get into with MySQL.

The patchwork solution we're going forward with for now is dropping the file into S3, passing the key to the background task, and re-downloading the file on that end for processing.
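
Here's a rough sketch of that hand-off in plain Ruby, just to make the shape of it concrete. It is not production code: `FakeBucket`, `store_upload`, and `process_upload` are names made up for illustration, and the in-memory hash stands in for a real S3 bucket so the example runs without any credentials. In a real setup the `put` and `get` calls would go through whatever S3 client you use.

```ruby
require "csv"
require "securerandom"

# Hypothetical stand-in for an S3 bucket: a shared key/value store that both
# the web server and the background worker can reach. This is an assumption
# for the sketch; a real version would call an S3 client here.
class FakeBucket
  def initialize
    @objects = {}
  end

  def put(key, body)
    @objects[key] = body
  end

  def get(key)
    @objects.fetch(key)
  end
end

# Web side: store the uploaded CSV under a unique key and return only the
# key, which is all the background job needs to be handed.
def store_upload(bucket, csv_text)
  key = "uploads/#{SecureRandom.uuid}.csv"
  bucket.put(key, csv_text)
  key
end

# Worker side: re-download the file by key and turn each row into a hash
# keyed by the header line.
def process_upload(bucket, key)
  CSV.parse(bucket.get(key), headers: true).map(&:to_h)
end

bucket = FakeBucket.new
key = store_upload(bucket, "name,price\nwidget,10\ngadget,25\n")
rows = process_upload(bucket, key)
rows.each { |row| puts "#{row['name']}: #{row['price']}" }
```

The point is that only the key crosses the boundary between the web process and the job, so neither side cares which machine the other is running on.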

To make this process a little more palatable, I've quickly built and released a mini-gem called Cumulus CSV. It just wraps a simple interface around storing an uploaded CSV file to S3 and iterating over it later. It's available from my GitHub account, or on gemcutter as "cumulus_csv", so if you're struggling with the same problem, see if it will help you out!
