Sunday, February 21, 2010

A New Playground

Tonight I decided it was time to get my skills up to speed again. With all the new ideas and gems being thrown around in this time of swift evolution in the Rails community, I've felt left in the dust as I continue to toil away on an old project using an old version of Rails and a set of older support gems.

So, in an effort to become current once again, I've started a new project (which I'll talk about in another post) utilizing the following:

Rails 2.3.5

Yeah, I know Rails 3 is coming out soon, so why bother coming up to this minor version at all? Well, my previous project was on Rails 2.2.3, and I haven't been getting to play with Rack as much as some of the more current developers have. I think that's one of the major modularity improvements that has pushed Rails in the direction it's going with 3.0, and I want on board.

MongoDB

There's a lot of hype around non-SQL databases right now. As an early user of DB4O, my whole developer career has been littered with occasional forays into the non-relational, and although I have no objection to relational databases per se, I love the idea of schema-less data. MongoDB does this beautifully, and it's very popular in the Ruby community right now, in no small part thanks to the MongoMapper gem. Having my data as one hash of properties appeals to my sense of simplicity, and the embedded documents and collections look nifty. Time will tell if I will end up feeling like it suits my needs on a larger project.

Heroku

Currently, my large startup project runs on EngineYard. We trust them because their customer service is great and they've done an amazing job of helping us along every step of the way. Their new cloud accounts are awesome (we're going to be switching over to that from our old "slice" architecture soon), and they have a track record a mile long of proven excellence with difficult scaling problems. And they cost what they're worth.

For a smaller project, either of a personal nature or one that's going to be bootstrapped with little revenue to kick it off, it's a little pricey. Heroku, however, is very reasonably priced, and has a slew of posts on the Ruby blog network right now that are titled along the lines of "How to deploy a crazy-cool-new-application on Heroku in under 30 minutes". Their development account is free, and I want to see what the buzz is about, so I'll be posting how I find the hosting provider after I've played around a bit.

RSpec

OK, this isn't exactly new, but I just haven't had a project yet where I've been willing to start with a totally new testing library that I'm unfamiliar with, so Test::Unit has been my default. RSpec comes highly recommended, though, and claims to make a nice paradigm shift into the BDD world. Will it make a difference? I don't know, but it's popular enough among rubyists that I'd be silly not to give it a try and see how I feel about the different nature of test presentation.

Delayed Job

I've been a BackgroundJob user ever since my first feature that needed a queue came into existence, and it's never let me down with its straightforward simplicity. However, I've felt on occasion that a fuller feature set would be desirable for some of my more granular queueing needs. Delayed Job is at the top of the list as far as widespread adoption goes, so if I'm going to jump to a fuller library this is the first place to look.

I love playing with new toys, so the weeks ahead promise to be filled with new and exciting blog material chronicling my exploration of that which is new and popular. Enjoy!

Wednesday, February 17, 2010

Time for a change

I've been going as "codeclimber" for a long time with the normal ".blogspot.com" domain, but honestly it's just because it was easier to set up that way and I've been too lazy to go register my freaking name as my domain name. Well, I've finally done it, so from now on "codeclimber.blogspot.com" redirects to "blog.ethanvizitei.com". Watch out, now it's personal!

Thursday, February 11, 2010

Making it Happen

I spend a lot of time trying to make my code good. Maybe too much. I love my codebase like a child. When presented with a problem, my first instinct is to make it somehow fit into what my system already does. Sometimes that's not going to be an easy proposition, though, and in SOME of those cases we need to let the business needs get top priority and just make it happen.

An Example

We have a vendor we ship claims to from time to time, and they use a manual process. Because they are in the public sector, they require a certain format for their claims, which includes a lot of unnecessary Excel formulas, because the formulas themselves are what they audit. Does it matter that we could generate an Excel file that has every number they need along the way? No, they must have the FORMULAS in the spreadsheet, or it isn't a valid claim.

Unfortunately for me, the spreadsheet library we use (the "spreadsheet" Ruby gem) doesn't support formulas yet. So our options were A) spend some time adding formula support to the gem (time we didn't really have), or B) make the spreadsheet generation a partly manual process on our end as well. (Blech. We actually did this for a quarter: generate our numbers and copy them into the claim template one at a time.)

This quarter, I went a different route:


Sub Main
  Dim Doc As Object
  Dim InputSheet As Object
  Dim TemplateSheet As Object
  Dim ClaimSheet As Object

  SetupVariables(Doc,InputSheet,TemplateSheet,ClaimSheet)
  ' getCellByPosition takes (column, row), both zero-based
  RowNum = 1
  NameCell = InputSheet.getCellByPosition(2,RowNum)
  ' Walk down the input rows until the name column is empty
  Do While NameCell.Type <> com.sun.star.table.CellContentType.EMPTY
    ClaimRow = InputSheet.Rows(RowNum)
    ' Add a new sheet named after the claim, then fill it from the template
    ClaimSheet = Doc.createInstance("com.sun.star.sheet.Spreadsheet")
    Doc.Sheets.insertByName(NameCell.getString(), ClaimSheet)
    CopyTemplateToNewClaim(TemplateSheet,ClaimSheet)
    ProcessClaim(InputSheet,ClaimSheet,1)
    RowNum = RowNum + 1
    NameCell = InputSheet.getCellByPosition(2,RowNum)
  Loop
End Sub

Yep, I used OpenOffice and BASIC and just scripted the damn thing. Ugly? Yes. But it worked, and we don't have to manually calculate our claims. I just generate our spreadsheet that has the claim numbers, and use the macro to copy each row into its own template sheet in the workbook. It doesn't fit into our Rails application at all, but we made it happen, and we're collecting on those claims.

If you're beating your head against the wall in frustration, sometimes it pays to step back and resort to plain old hackery, because at the end of the day your customers only really care that it works.

Thursday, February 4, 2010

Tricky Little find_in_batches (watch your :select clause)

If you are like me, you have a few background processes that deal with tons of data (reporting, etc). To run these processes, you may use ActiveRecord's "find_in_batches" method which allows you to only pull so many records into memory at a time (a good idea when processing large numbers of records). You may also explicitly use a "select" clause in your AR queries from time to time to only pull in the fields you need for a given purpose. Be wary, as I tripped up over something silly today and you should know about it.

You see, I had something like this:


class SomeModel < ActiveRecord::Base
  named_scope :only_essentials,
              :select=>"some_models.info,some_models.name"
end

class BackgroundProcess
  def run
    SomeModel.only_essentials.
              find_in_batches(:batch_size=>200) do |batch| 
      #....some processing of each record in the batch
    end
  end
end


The problem? It was running way too fast. Over in mere seconds. You might think, "Hey, that's not such a bad problem to have", until you realize that the reason it was running so fast is it was only processing the first 200 records (that is, the first batch).

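Why? It's simple if you know how "find_in_batches" works. It orders your models by primary key, limits each query to the batch size, and fetches the next batch by asking for records whose primary key is greater than the id of the last record in the previous batch. What field did I not include in my named scope? If you said "some_models.id", you're correct. Without it, every record in the batch comes back with a nil id, so the query for the second batch matches nothing. Add that field so your reference point is maintained, and everything runs as expected.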
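
To see the failure without a database, here is a rough plain-Ruby simulation of that batching loop. It is only a sketch, not ActiveRecord's actual code: "table", "fetch_batch", "each_batch" and "count_processed" are made-up names, and it assumes each batch after the first is fetched with an "id greater than the last id seen" condition.

```ruby
# Plain-Ruby simulation of primary-key batching. NOT ActiveRecord's code;
# every method name here is hypothetical and exists only for illustration.

# Stand-in for a table with five rows.
def table
  (1..5).map { |i| { :id => i, :name => "row#{i}" } }
end

# Mimics "WHERE id > ? ORDER BY id LIMIT ?". When select_id is false, the id
# column is left out of the result, like a :select clause without the primary key.
def fetch_batch(after_id, limit, select_id)
  return [] if after_id.nil? # in SQL, "id > NULL" matches no rows
  rows = table.select { |r| r[:id] > after_id }.sort_by { |r| r[:id] }.first(limit)
  select_id ? rows : rows.map { |r| { :name => r[:name] } }
end

# Yields batches until one comes back short or empty.
def each_batch(limit, select_id)
  batch = fetch_batch(0, limit, select_id)
  until batch.empty?
    yield batch
    break if batch.size < limit
    batch = fetch_batch(batch.last[:id], limit, select_id)
  end
end

# Counts how many rows the loop actually visits.
def count_processed(limit, select_id)
  total = 0
  each_batch(limit, select_id) { |batch| total += batch.size }
  total
end

puts count_processed(2, true)   # id selected
puts count_processed(2, false)  # id left out
```

With the id selected, all five rows get visited. With it left out, the loop quietly stops after the first batch of two, which is the same symptom I hit with my 200 records.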

Cheers,

~Ethan

Monday, February 1, 2010

Smart quotes and dumb errors

Wow, this one just sucked. :)

I'll make it quick. We have a model in our Rails app that captures large strings and then displays an abbreviated version to the user later. For example:

class Narrative < ActiveRecord::Base
  validates_presence_of :text

  def text_in_brief
    return nil if text.nil?
    (text.size > 20) ? (text[0..19] + "...") : text
  end
end


Hopefully that made sense. We want an ellipsis to indicate that this value continues on for some time, so if it's longer than 20 characters, just cut it off and add the three periods.

Enter JSON. We sometimes want to send this value to JSON. And every now and then the to_json method blares this exception:

JSON::GeneratorError: source sequence is illegal/malformed

But only very rarely. What the hell is going on here?

Analyzing the data shows that all the strings that cause this explosion have one thing in common - a smart quote as the 19th or 20th character. I mean the one that is actually represented as "\342\200\235". See where this is going?

The slice in text_in_brief works on bytes here, not characters, so it's possible to cut off the smart quote somewhere in the middle, because a smart quote is one character but actually 3 bytes long. This leaves an invalid string, which bombs the to_json call. F@(#!

Quick and Dirty solution?


class Narrative < ActiveRecord::Base
  validates_presence_of :text

  before_save :escape_smart_quotes

  # Swap the 3-byte closing smart quote for a plain ASCII double quote
  def escape_smart_quotes
    self.text = text.gsub("\342\200\235", '"')
  end

  def text_in_brief
    return nil if text.nil?
    (text.size > 20) ? (text[0..19] + "...") : text
  end
end

Maybe when I've cooled off I'll do this better and extract it to be usable in other models. For now, just be glad I figured it out at all. :)
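
For the record, the quick fix above only covers that one closing quote, and any other multi-byte character could be split the same way. Here is a rough sketch of a more general direction, assuming Ruby 1.9 or later, where String#size and String#[] count characters instead of bytes. The standalone "text_in_brief" and the "byte_cut" helper are illustrative names, not code from our app.

```ruby
# Character-aware truncation. Assumes Ruby 1.9+, where String#size and
# String#[] count characters rather than bytes.
def text_in_brief(text, limit = 20)
  return nil if text.nil?
  text.size > limit ? text[0, limit] + "..." : text
end

# Hypothetical helper that reproduces the old byte-level cut for comparison.
def byte_cut(text, limit = 20)
  text.byteslice(0, limit)
end

# 19 ASCII characters, then a closing smart quote as the 20th character.
sample = "0123456789012345678\u201D and more"

puts byte_cut(sample).valid_encoding?       # the cut lands inside the quote
puts text_in_brief(sample).valid_encoding?  # the quote survives whole
```

The byte-level cut leaves a broken string, while the character-level cut keeps the smart quote in one piece, so nothing needs to be escaped before saving.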