2011-08-23

A herd of Hairy Rhinos

Back in 2007, I was hired by Google and stuffed, with ceremony, into a team made up of people way more awesome than I could have hoped to be at the time.

But this post isn't about Google - not even close. It's about Mozilla Rhino. You see, my very first team at Google was working on a project built on top of Rhino: a JavaScript version of Rails. Niftycool Steve Yegge goes into it - in considerable depth - in his blog post: Rhino-On-Rails. Working on that project was my first real introduction to JavaScript, and to server-side JavaScript in particular. It hooked me - writing JavaScript for both the client and the server.

The one thing I didn't like was the Rails part. I am definitely not a Rails fan. For one thing, its power is offset by its complexity. For another, that complexity is there to make it easier for people new to web development to make nifty things. I prefer to live much closer to the metal, eschewing the lofty heights of MVC, class/method mapping and ActiveRecord for more power and control.

I love Ruby - don't get me wrong, it's a fantastic language. But where it sorely lacks is _implementation_. Ruby 1.8 and 1.9 aren't really ready for production. JRuby is making great strides now. MacRuby is awesome, but not fully Ruby. IronRuby I haven't touched, as I don't do .NET.

Also, JavaScript is just plain fun. It's pretty much as powerful and niftycool as Ruby: It was, after all, intended to be a Lisp dialect before nosy non-developers said: "Make it like that new Java language!"

So at home that summer, I spent time fiddling with Mozilla Rhino and Java Jetty. One super nifty feature about Rhino: You can create separate, sandboxed JavaScript "scopes". So I figured that's gotta be useful. I created a platform using virtual hosting to redirect HTTP requests to a number of JavaScript scopes, each one "containing" a website in itself.
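To make the virtual hosting idea concrete, here is a minimal sketch of dispatching a request to a per-site handler based on its Host header. It is not HairyRhino's actual code: the host names, makeSite and dispatch are made up for illustration, and a plain object stands in for each sandboxed Rhino scope so the example runs in any JavaScript engine.

```javascript
// Hypothetical sketch: each "site" stands in for a sandboxed Rhino scope.
// A plain object per site is used here so the example runs anywhere.
function makeSite(name) {
  var scope = { siteName: name, hits: 0 }; // per-site state, never shared
  return {
    scope: scope,
    handle: function (request) {
      scope.hits += 1;
      return name + " served " + request.path + " (hit " + scope.hits + ")";
    }
  };
}

// Map of virtual host names to sites (the host names are invented).
var sites = {
  "movies.example.com": makeSite("movies"),
  "blog.example.com": makeSite("blog")
};

// Pick the site from the request's Host header, ignoring any port.
function dispatch(request) {
  var host = String(request.headers.host || "").split(":")[0].toLowerCase();
  var known = Object.prototype.hasOwnProperty.call(sites, host);
  if (!known) {
    return "404: no site configured for host '" + host + "'";
  }
  return sites[host].handle(request);
}

console.log(dispatch({ headers: { host: "movies.example.com:8080" }, path: "/" }));
console.log(dispatch({ headers: { host: "blog.example.com" }, path: "/about" }));
console.log(dispatch({ headers: { host: "unknown.example.com" }, path: "/" }));
```

Each site only ever touches its own scope object, which is the property the separate Rhino scopes provide for real.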

I added a number of other features:

  • Built-in SQL support: Create a file named queries/find_movies containing "SELECT * FROM movies WHERE movie_name = :name", and call DB.find_movies({name: 'Toy Story 3.14'}) to receive an array of objects as the result: [{movie_name: "Toy Story 3.14", creator: "Pi-xar", length: "159.2"}]. Yes, I much prefer writing raw SQL over dealing with ORMs. I know that's not the case for everyone, but it is for me.
  • Template engine: It converts a template into a JavaScript function and compiles it to JVM bytecode, which makes for super-fast templates.
  • A Java-based HTTPRequest, so the server-side JavaScript can make JSON calls to remote hosts, too!
  • And finally, a knot of functions to tie them all together: A web framework. Web.add_page(name, path_regexp, function(request, response) { return render('hello.html', { name: 'world' }); });
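Two of the features above can be sketched in a few lines of plain JavaScript. This is a rough illustration under stated assumptions, not HairyRhino's implementation: bindNamed, makeDB, runQuery and compileTemplate are hypothetical names, the <%= name %> template syntax is invented for the example, and the database driver is faked so the code runs without a database.

```javascript
// Convert ":name" placeholders into positional "?" markers plus an ordered
// list of values, the shape a prepared statement expects.
// Limitation: this naive regex does not skip colons inside SQL string literals.
function bindNamed(sql, params) {
  var values = [];
  var text = sql.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, function (match, key) {
    if (!Object.prototype.hasOwnProperty.call(params, key)) {
      throw new Error("Missing query parameter: " + key);
    }
    values.push(params[key]);
    return "?";
  });
  return { text: text, values: values };
}

// Build a DB object with one method per named query. `runQuery` is a
// hypothetical callback standing in for the real database driver.
function makeDB(queries, runQuery) {
  var db = {};
  Object.keys(queries).forEach(function (name) {
    db[name] = function (params) {
      var bound = bindNamed(queries[name], params || {});
      return runQuery(bound.text, bound.values);
    };
  });
  return db;
}

// The post keeps the SQL in a file named queries/find_movies; an inline map
// is used here so the sketch has no file-system dependency.
var DB = makeDB(
  { find_movies: "SELECT * FROM movies WHERE movie_name = :name" },
  function (text, values) {
    return { text: text, values: values }; // fake driver: echo the request
  }
);

// Turn a template into a JavaScript function by generating its source.
// The "<%= name %>" syntax is an assumption made for this example.
function compileTemplate(source) {
  var re = /<%=\s*([A-Za-z_$][\w$]*)\s*%>/g;
  var body = "var out = '';\n";
  var last = 0;
  var m;
  while ((m = re.exec(source)) !== null) {
    body += "out += " + JSON.stringify(source.slice(last, m.index)) + ";\n";
    body += "out += String(data[" + JSON.stringify(m[1]) + "]);\n";
    last = re.lastIndex;
  }
  body += "out += " + JSON.stringify(source.slice(last)) + ";\nreturn out;";
  return new Function("data", body);
}

var hello = compileTemplate("Hello, <%= name %>!");

console.log(DB.find_movies({ name: "Toy Story 3.14" }));
console.log(hello({ name: "world" }));
```

The template is parsed once and every later render is an ordinary function call. Under Rhino, a function like this can additionally be compiled down to JVM bytecode, which is where the speed comes from.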


Fast forward about a year and a half, and I start DeafCode LLC with some friends. Our first attempt to create Captionfish is on Ruby on Rails. At first, all seems well, but it gets really slow. (Keep in mind, this was back in the days of 1.8.2, and we were running on a minimal VPS.) There's a _lot_ of overhead in ActiveRecord wasting time, Ruby 1.8.2 is neither all that fast nor production ready, and the Rails abstraction is jumping all over the place. So we look for solutions: Mongrel was a small speedup. Making a cluster was another speedup, but we can't do a massive cluster since it's a small VPS. Then we do some benchmarking.

We pointed ApacheBench at a page that runs one SQL query and simply returns its results run through a template. The cluster of 5 Ruby on Rails instances averaged 30 pages per second, and it took a considerable amount of resources to run those 5 in parallel.

Then, for comparison, I dust off Hairy Rhino and make it do the exact same thing. While it took the resources of two instances of RoR, ApacheBench reported a throughput of 80 pages per second. We thought that was super fast until we realized that ApacheBench was also reporting a transfer rate of 1Mbit/sec, which was, interestingly enough, the upload limit for our little VPS. We ran ApacheBench locally, and it soared up to 120/sec.
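A quick back-of-the-envelope check shows why the network was the bottleneck. The sketch below only uses the figures quoted above; it assumes a decimal megabit (1,000,000 bits) and that the transfer rate covers whole responses, headers included. The variable names are mine.

```javascript
// Figures from the benchmark runs described above.
var linkBitsPerSecond = 1000000; // 1 Mbit/sec upload cap (decimal megabit assumed)
var remotePagesPerSecond = 80;   // throughput measured over the network
var localPagesPerSecond = 120;   // throughput measured on the VPS itself

// Average response size implied by saturating the link at 80 pages/sec.
var bytesPerPage = linkBitsPerSecond / 8 / remotePagesPerSecond;

// Bandwidth the local rate would need for responses of the same size.
var localMbit = (localPagesPerSecond * bytesPerPage * 8) / 1000000;

console.log(bytesPerPage); // 1562.5
console.log(localMbit);    // 1.5
```

So each response was roughly 1.5 KB, and serving 120 of them per second would need about 1.5 Mbit/sec, more than the VPS could upload.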

It took us _one_ night to convert the entirety of Captionfish (which was, admittedly, much smaller then than it is now) to HR, and all of our speed and memory woes went away.

Then we converted two other sites, also using RoR, onto HR. With a single instance using two times the resources of a single RoR instance, HR was able to serve three different websites at considerably faster speed. (At the time of this writing, the sole HR instance on the DeafCode production VPS is serving 7 different websites, with network throughput still being the choke point.)

The JVM is a really nifty piece of work. And so is Mozilla Rhino. A big, hearty thanks to JVM developers everywhere and to the Mozilla Rhino team.

A side note: node.js is making great strides on the server side, too. But just like Ruby, V8 is single-threaded, and node.js is designed to scale by running in parallel as a cluster. Mozilla Rhino has two big advantages over it: the JVM is natively multithreaded, and strongly so, and Rhino has dirt simple access to any of the thousands of packages written for Java.

HairyRhino is available on GitHub: https://github.com/captdeaf/HairyRhino - I keep meaning to get around to writing basic documentation for it, but have yet to feel motivated enough to. :D

2 comments:

  1. Great work on CaptionFish and on YouTube captions!

    Can I find out more about timing and synchronizing captions with visual content or frames? Is this info available?

    thanks,
    Greg Rice
    DeafAccessFilms.com

    1. Hi Greg -

      It really depends on what information you want to know: anything from scanline 21 for broadcast cable to simple .srt files for video players, as standalone formats or encoded into the video files, etc.

      What would you like to know?
