Released: SnowMaker – a unique id generator for Azure (or any other cloud hosting environment)

What it solves

Imagine you’re building an e-commerce site on Azure.

You need to generate order numbers, and they absolutely must be unique.

A few options come to mind initially:

  • Let SQL Azure generate the numbers for you. The downside to this approach is that you’re now serializing all of your writes through a single database, and throwing away all of the possible benefits from something like a queuing architecture. (Sidenote: on my current project we’re using a NoSQL graph DB with eventual consistency between nodes, so this wouldn’t work for us anyway.)
  • Use a GUID. These are far from human-friendly. Seriously, can you imagine seeing an order form with a GUID on the top?
  • Prefix numbers with some form of machine-specific identifier. This now requires some way to uniquely identify each node, which isn’t very cloud-like.

As you can see, this gets complex quickly.

SnowMaker is here to help.

What it does

SnowMaker generates unique ids for you in a highly distributed and highly performant way.

  • Ids are guaranteed to be unique, even if your web/worker role crashes.
  • Ids are longs (and thus human-readable).
  • It requires absolutely no node-specific configuration.
  • Most id generation doesn’t even require any off-box communication.

How to get it

Library: Install-Package SnowMaker

(if you’re not using NuGet already, start today)

Source code: hg.tath.am/snowmaker or github.com/tathamoddie/snowmaker

How to use it

var generator = new UniqueIdGenerator(cloudStorageAccount);
var orderNumber = generator.NextId("orderNumbers");

The only caveat not shown here is that you need to take responsibility for the lifecycle of the generator. You should only have one instance of the generator per app domain. This can easily be done via an IoC container or a basic singleton. (Multiple instances still won’t generate duplicates; you’ll just see wasted ids and reduced performance.) Don’t create a new instance every time you want an id.
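If you’re going the basic singleton route rather than using an IoC container, a minimal sketch might look like the following. Lazy<T> builds the generator once, on first use, and hands the same instance to every caller in the app domain. IdGeneratorHolder is a hypothetical name, and StandInGenerator is a placeholder for SnowMaker’s UniqueIdGenerator so that the sketch compiles without the package; in a real app the factory lambda would construct the generator from your cloudStorageAccount instead.

```csharp
using System;
using System.Threading;

// Hypothetical placeholder for SnowMaker's UniqueIdGenerator so this sketch compiles
// on its own. It ignores the scope name and just counts upwards in memory;
// it is NOT safe across processes or machines.
public sealed class StandInGenerator
{
    private long _last;

    public long NextId(string scopeName) => Interlocked.Increment(ref _last);
}

// One generator per app domain: Lazy<T> creates it on first access (thread-safely)
// and every caller gets that same instance afterwards.
public static class IdGeneratorHolder
{
    private static readonly Lazy<StandInGenerator> _instance =
        new Lazy<StandInGenerator>(
            () => new StandInGenerator(), // real app: build the generator from your storage account here
            LazyThreadSafetyMode.ExecutionAndPublication);

    public static StandInGenerator Instance => _instance.Value;
}
```

Callers then write IdGeneratorHolder.Instance.NextId("orderNumbers") rather than creating a generator per request.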

Other interesting tidbits

The name is inspired by Twitter’s id generator, Snowflake. (Theirs is more scalable because it is completely distributed, but in doing so it requires node-specific configuration.)

Typical id generation doesn’t even use any locks, let alone off-box communication. It will only lock and talk to blob storage when the id pool has been exhausted. You can control how often this happens by tweaking the batch size (a property on the generator). For example, if you are generating 200 order ids per server per second, set the batch size to 2000 and it’ll only lock every 10 seconds.
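The fast path described above can be sketched as an in-process pool that hands out ids from its current batch with a single atomic increment, and only takes a lock when that batch is exhausted. This illustrates the technique and is not SnowMaker’s actual code: BatchedIdPool is a hypothetical class, and the reserveBatch delegate stands in for the round trip to blob storage. With the numbers from the example, the slow path runs once per 2000 / 200 = 10 seconds of traffic on each server.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a lock-free fast path with a locked refill path.
public sealed class BatchedIdPool
{
    // A reserved range of ids [start, End); Next is advanced atomically.
    private sealed class Batch
    {
        public readonly long End;
        public long Next;

        public Batch(long start, long end) { Next = start; End = end; }
    }

    // Stand-in for blob storage: reserves batchSize ids and returns the first one.
    private readonly Func<int, long> _reserveBatch;
    private readonly object _refillLock = new object();
    private volatile Batch _current;

    public int BatchSize { get; }

    public BatchedIdPool(int batchSize, Func<int, long> reserveBatch)
    {
        BatchSize = batchSize;
        _reserveBatch = reserveBatch;
        _current = new Batch(0, 0); // empty, so the first call triggers a refill
    }

    public long NextId()
    {
        while (true)
        {
            Batch batch = _current;

            // Fast path: no lock, just an atomic increment within the local batch.
            long id = Interlocked.Increment(ref batch.Next) - 1;
            if (id < batch.End) return id;

            // Slow path: batch exhausted. Only one thread refills; the others
            // see that _current has changed and retry against the new batch.
            lock (_refillLock)
            {
                if (_current == batch)
                {
                    long start = _reserveBatch(BatchSize);
                    _current = new Batch(start, start + BatchSize);
                }
            }
        }
    }
}
```

Any ids left in a batch when the process exits are simply never handed out, which is where the wasted ids come from.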

Node synchronisation is done via Azure blob storage; other than that, it can run anywhere. You could quite easily use this library from AppHarbor or on-premise hosting too. You’d just wear the cost of slightly higher latency when acquiring new id batches.

The data persistence is swappable. Feel free to build your own against S3, Ninefold Storage, or any other blob storage API you can dream up.

The original architecture and code came from an excellent MSDN article by Josh Twist. We’ve brushed it off, packaged it up for NuGet and made it production-ready.

Under the covers

SnowMaker allocates batches of ids to each running instance. Azure Blob Storage is used to coordinate these batches. It’s particularly good for this because it supports optimistic concurrency checks via standard HTTP headers (an ETag on each blob, checked with If-Match on write). At a persistence level, we just create a small text file for each id scope (e.g., the contents of /unique-ids/some-id-scope would just be “4”).
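Here’s a minimal sketch of that batch-reservation loop, with an in-memory dictionary standing in for blob storage. Each scope holds its text contents plus a version number playing the role of the blob’s ETag, and a write only succeeds if the version is unchanged since it was read (the in-memory equivalent of an If-Match conditional request). It assumes the stored number is the first id not yet handed out, and InMemoryOptimisticStore, TryWrite and ReserveBatch are hypothetical names rather than SnowMaker’s API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical in-memory stand-in for blob storage: scope -> (text contents, version).
// The version plays the role of the blob's ETag.
public sealed class InMemoryOptimisticStore
{
    private readonly Dictionary<string, (string Data, long Version)> _blobs =
        new Dictionary<string, (string Data, long Version)>();
    private readonly object _sync = new object();

    public (string Data, long Version) Read(string scope)
    {
        lock (_sync)
        {
            if (!_blobs.TryGetValue(scope, out var blob))
            {
                blob = ("1", 1L); // assumed seed: ids for a new scope start at 1
                _blobs[scope] = blob;
            }
            return blob;
        }
    }

    // Succeeds only if nobody has written since we read (like "If-Match: <etag>").
    public bool TryWrite(string scope, string data, long expectedVersion)
    {
        lock (_sync)
        {
            var current = Read(scope); // C# locks are re-entrant
            if (current.Version != expectedVersion) return false;
            _blobs[scope] = (data, expectedVersion + 1);
            return true;
        }
    }
}

public static class BatchReserver
{
    // Reserves batchSize ids for this node and returns the first one, retrying on conflicts.
    public static long ReserveBatch(InMemoryOptimisticStore store, string scope, int batchSize, int maxAttempts = 25)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            var (data, version) = store.Read(scope);
            long start = long.Parse(data, CultureInfo.InvariantCulture);
            string updated = (start + batchSize).ToString(CultureInfo.InvariantCulture);

            if (store.TryWrite(scope, updated, version))
                return start; // this node now owns [start, start + batchSize)

            // Another node wrote first: loop, re-read the new value and try again.
        }
        throw new InvalidOperationException("Could not reserve a batch after " + maxAttempts + " attempts.");
    }
}
```

Two clients with different batch sizes simply move the stored number forward by different amounts; a client that loses a race re-reads the new value and tries again.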

One issue worth noting is that not all ids will always be used. Once a batch is checked out, none of the ids in it can ever be reallocated by SnowMaker. If a process checks out a batch, uses only one id and then terminates, the remaining ids in that batch are lost forever.

Here’s a sequence diagram for one client:

[Image: sequence diagram for a single client]

Here’s a more complex sequence diagram that shows two clients interacting with the store, each using a different batch size:

[Image: sequence diagram for two clients using different batch sizes]

Twavatar – coming to a NuGet server near you

Yet another little micro-library designed to do one thing, and do it well:

twavatar.codeplex.com

Install-Package twavatar

I’ve recently been working on a personal project that lets me bookmark physical places.

To avoid having to build any of the authentication infrastructure, I decided to build on top of Twitter’s identity ecosystem. Any user on my system has a one-to-one mapping back to a Twitter account. Twitter get to deal with all the infrastructure around sign ups, forgotten passwords and so forth. I get to focus on features.

The other benefit I get is being able to easily grab an avatar image and display it on the ‘mark’ page like this:

[Image: a Twitter avatar displayed on a ‘mark’ page]

(Sidenote: You might also notice why I recently built relativetime and crockford-base32.)

Well, it turns out that grabbing somebody’s Twitter avatar isn’t actually as easy as one might hope. The images are stored on Amazon S3 under a URL structure that requires you to know the user’s Twitter Id (the numeric one) and the original file name of the image they uploaded. To throw another spanner in the works, if the user uploads a new profile image, the URL changes and the old one stops working.

For most Twitter clients this isn’t an issue because the image URL is returned as part of the JSON blob for each status. In our case, where all we have is the user’s Twitter handle, it’s a bit more annoying.

Joe Stump set out to solve this problem by launching tweetimag.es. This service lets you use a nice URL like http://img.tweetimag.es/i/tathamoddie_n and lets them worry about all the plumbing to make it work. Thanks Joe!

There’s a risk, though: this is a free service with no guarantees about its longevity. As such, I didn’t want to hardcode too many dependencies on it into my website.

This is where we introduce Twavatar. Here’s what my MVC view looks like:

 @Html.TwitterAvatar(Model.OwnerHandle) 

Ain’t that pretty?

We can also ask for a specific size:

 @Html.TwitterAvatar(Model.OwnerHandle, Twavatar.Size.Bigger) 

The big advantage here is that if / when tweetimag.es disappears, I can just push an updated version of Twavatar to NuGet and everybody’s site can keep working. We’ve cleanly isolated the current implementation into its own library.

It’s scenarios like this where NuGet really shines.
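To illustrate the kind of isolation this buys, here’s a minimal sketch of a URL builder that could sit behind Html.TwitterAvatar. It is not Twavatar’s actual source. The _n suffix for the normal size comes from the tweetimag.es example above; the _b suffix for Bigger is an assumption about that service’s naming, and AvatarUrls and AvatarSize are hypothetical names.

```csharp
using System;

// Hypothetical size enum; only a "Bigger" size is confirmed by the usage shown above.
public enum AvatarSize { Normal, Bigger }

public static class AvatarUrls
{
    // Builds a tweetimag.es URL for a Twitter handle.
    // "_n" (normal) comes from the example URL; "_b" (bigger) is an assumption.
    public static string For(string handle, AvatarSize size = AvatarSize.Normal)
    {
        if (string.IsNullOrWhiteSpace(handle))
            throw new ArgumentException("A Twitter handle is required.", nameof(handle));

        string suffix = size == AvatarSize.Bigger ? "b" : "n";
        string name = Uri.EscapeDataString(handle.Trim().TrimStart('@'));
        return "http://img.tweetimag.es/i/" + name + "_" + suffix;
    }
}
```

An HtmlHelper extension would then just wrap the returned URL in an img tag. Because views only ever see the helper, switching providers means changing this one method and shipping a new package version.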

Update 1: Paul Jenkins pointed out a reasonably sane API endpoint offered by Twitter in the form of http://api.twitter.com/1/users/profile_image/tathamoddie?size=bigger. There are two problems with this API. First up, it issues a 302 redirect to the image resource rather than returning the data itself. This adds an extra DNS resolution and HTTP round trip to the page load. Second, the documentation for it states that it “must not be used as the image source URL presented to users of your application” (complete with the bold). To meet this requirement you’d need to call it from your application server-side, implement your own caching and so forth.

The tweetimag.es service most likely uses this API under the covers, but they do a good job of abstracting all the mess away from us. If the tweetimag.es service were ever to be discontinued, I imagine I’d update Twavatar to use this API directly.