How Etsy Helped us Deploy

A Checkered Past

When our company was founded – long before the advent of GitHub or other cloud-based social coding hubs – we used to deploy our new code to the website by copying files over the network. And in ten years or so, not much changed.

We ‘upgraded’ the process to allow us to upload more than one file at a time, but we still had to choose each file from a list of thousands, and you can bet we occasionally made mistakes. When we added more servers, we added functionality to upload to each in parallel, but we still found we sometimes missed files, or uploaded dependencies in the wrong order. This nagged at us as a team, but we focused on other, more important issues, such as scaling and new features.

We accepted this as the status quo until a few years ago, when we read about how Etsy deploy using their Deployinator application. We realised then that we could make our deployment process better – way better. We too could have a ‘one click’ deploy! This is the short story of how we built an application to do it.

Getting our ducks in a row with Mercurial

The first thing we decided to do was move from SVN to Mercurial. This was something we had in the pipeline anyway, as we wanted to leverage more of the DVCS features, such as easy branching. Once we made that move, we created one golden rule: anything pushed to master could go live at any time.

We knew the next step: we would purge all the existing code on our servers; all the code that we had uploaded but had no easy way to delete; all the code uploaded to the incorrect location. We would then clone each web server from the clean main Mercurial repository. But before we did that, we had to have an automatic and fast way to deploy a given changeset to each server. So we took a step back to think about how we were going to build a web application to allow us ‘click to deploy’.

Instaploy

Since, as an organisation, we are moving from ColdFusion to JRuby, we decided to build our deployment application in Sinatra, with some custom wrapper classes around the Mercurial command line. We used Twitter Bootstrap to make it look pretty. But we needed a name for the application, and thus Instaploy was born: a web application that deploys instantly(ish).
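
To give a flavour of those wrapper classes, here is a minimal sketch of a Mercurial command-line wrapper. It assumes hg is on the PATH, and the class and method names are illustrative rather than Instaploy’s actual code:

    require 'open3'

    # Minimal wrapper around the hg command line (illustrative only).
    class HgRepo
      def initialize(path)
        @path = path
      end

      # Run an hg command against the repository and return its output.
      def hg(*args)
        out, status = Open3.capture2('hg', '-R', @path, *args)
        raise "hg #{args.first} failed" unless status.success?
        out
      end

      def pull(changeset)
        hg('pull', '-r', changeset)
      end

      def update(changeset)
        hg('up', '-r', changeset)
      end

      # Changeset ID of the current working copy.
      def current_changeset
        hg('id', '-i').strip
      end
    end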

Instaploy works as follows:

  • We break down our applications into ‘stacks’.
  • Each stack works from a different Mercurial repository, and Instaploy has a set of local repositories cloned from master so it has the big picture.
  • Each stack has different pre-deploy and post-deploy hooks, or ‘housekeeping tasks’ as we call them.
  • When someone wants to deploy their code they:
    • Push to master (default branch)
    • Log into Instaploy
    • They then see the last change deployed, and any pending deployments; we keep a list of completed deployments in a database table.
    • They click to deploy their changeset and they see a summary of their changes (files changed etc.) and a button asking them if they want to ‘lock’ the stack.
    • The application also determines from code paths in the changeset whether any housekeeping tasks are required post-deploy, such as application reloads for important cached data. These promises, to be fulfilled post-deploy, are displayed to the user.
    • If they choose to proceed, the stack is locked, preventing anyone else from deploying until they are done.
    • During this period, they are able to upload any dependencies their code might require, such as database changes.
    • Once they have satisfied any dependencies, they click the ‘Deploy Now’ button and, after about 30 seconds of waiting, their changesets are ‘pushed’ to all production machines in the cluster (a sketch of this flow follows below).
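
To make that lock-then-deploy flow concrete, here is a minimal Sinatra sketch. The routes, the in-memory lock, and the helper names (deploy_to_all_servers and record_deployment) are hypothetical illustrations, not Instaploy’s actual code:

    require 'sinatra'

    enable :sessions

    LOCKS = {} # stack name => user currently holding the lock

    post '/stacks/:name/lock' do
      halt 409, 'Stack is already locked' if LOCKS[params[:name]]
      LOCKS[params[:name]] = session[:user]
      'Stack locked: upload any dependencies, then deploy.'
    end

    post '/stacks/:name/deploy' do
      halt 403, 'Lock the stack first' unless LOCKS[params[:name]] == session[:user]
      changeset = params[:changeset]
      deploy_to_all_servers(params[:name], changeset)              # push to every server
      record_deployment(params[:name], changeset, session[:user]) # the deployments table
      LOCKS.delete(params[:name])                                  # release the lock
      "Deployed #{changeset}"
    end

In reality the lock would need to survive application restarts, but an in-memory hash keeps the sketch short.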

PsExec

Sounds simple, but it is complicated by the fact that we are working in a Windows environment. On *nix systems, the usual method for the actual deployment is to use an automation tool to SSH onto each machine and pull the new code in from the master repository. This doesn’t work for us, so we needed an alternative. We ended up using PsExec, a Windows tool from the SysInternals suite, to execute a remote process on each server. This process simply executes a batch script which (see the sketch after this list):

  • Starts cmd.exe to get a terminal session
  • Changes to the webroot
  • Runs hg pull -r <changeset_id> to get the deployed change into the repository on the server.
  • Runs hg up -r <changeset_id> to get the working copy reflecting the same change.
  • Then executes any other required tasks on the remote server, such as touching deployment descriptors.
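
Assuming psexec.exe is available on the deployment box and each server has a deploy.bat carrying out the steps above, the push to the cluster might look something like this sketch; the server names and script path are hypothetical:

    # Hypothetical sketch: run the deployment batch script on each
    # production server via PsExec (shown serially for simplicity).
    SERVERS = %w[web01 web02 web03]

    def deploy_to_all_servers(stack, changeset)
      SERVERS.each do |server|
        # PsExec starts a remote cmd.exe on the server and runs deploy.bat,
        # which changes to the webroot, runs `hg pull -r <changeset>`, then
        # `hg up -r <changeset>`, then any other housekeeping tasks.
        ok = system('psexec', "\\\\#{server}", 'cmd.exe', '/c',
                    'C:\\deploy\\deploy.bat', changeset.to_s)
        raise "Deploy of #{stack} failed on #{server}" unless ok
      end
    end

In practice each stack would get its own batch script and its own post-deploy housekeeping, per the hooks mentioned earlier.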

We’re still honing the process, but we are deploying to ColdFusion and JRuby stacks at the moment, each of which requires different post-deploy tasks. Overall it’s made a massive difference to how the team works, and deployment is no longer the error-prone chore it used to be.

It’s pretty much all good

  • We know exactly what code is actually on the live servers, eliminating the problem of long-forgotten code that might be a security risk.
  • We know exactly who deployed it, when, and what was in it, simply by looking up the deployed changeset ID in Mercurial.
  • We can roll back to any previous revision at the click of a button.
  • We can deploy faster and more accurately than ever before – a real benefit when reacting to emergencies at 3am.

So a big thanks to Etsy for showing us the way; it just shows the amazing benefit of sharing ideas like this with the broader development community. Long may it continue.


4 Responses to How Etsy Helped us Deploy

  1. Kris says:

    I read the Etsy articles on their Deployinator also. It has me thinking about how we can change our deployment process as well. The one outstanding thing that is missing some details for me is the handling of those items during the lock: database changes and the other dependencies to upload. It makes it sound like this still involves manual intervention.

    As I’ve tried to map out our deployment process, this is the bit that always gets stuck: how are you handling database schema and data changes? Strictly via script? For rollback, do you store a reverse-engineering script for removing these types of changes? It still doesn’t quite sound like a super-quick rollback to me, and I really am curious how other folks are handling it.

    Thanks for these continued posts on the changes in your shop.

    • ciaranarcher says:

      Hi Kris

      I know what you mean – all these articles never mention the _hard_ stuff, and I’d put database version management into that category 🙂

      Deploying dependencies *is* still manual for us. When we need to run some SQL against the database, to add a new column to a table or to add a new table, we do it while we have the stack locked.

      Regarding rollback: the thing is that we don’t often have to roll back, and when we do, we usually don’t have to roll back the database changes, as the old code will simply not reference the new table/column anymore. Of course if you are renaming columns etc. then it gets trickier, but we are just trying to cater for the common cases.

      In my opinion it is difficult to come up with an agile process that is completely automated. Instead, try to cater for the common use case and allow flexibility for the unusual cases. We’re not a big company, so it works for us.

      I hope that makes sense!

  2. Michael says:

    You say: “Once testing is complete, the developer can create a code review from the diff between the feature branch and the default branch (having of course merged default into the feature branch first).”

    Isn’t merging the default branch into a feature branch considered bad form? I thought the appropriate way of dealing with this was to rebase the feature branch on the tip of default before committing to the main branch (to avoid overly complex graphs). I know this is more of a git philosophy, but I am recently trying to develop a compatible workflow based on this idea (developers work from private forks which they rebase before code review) and would appreciate any arguments for why merging the default branch in might be preferable.

  3. ciaranarcher says:

    Hi Michael – the only reason we merge in default (which is like master in git) is to avoid doing a code review (i.e. a diff) that might show changes by others. Of course this is only relevant if you pull from the origin repository after you start your feature branch. As for git rebase, I’m not sure there is a direct equivalent in Mercurial. There is some discussion about it here: http://stackoverflow.com/questions/2672351/hg-how-to-do-a-rebase-like-gits-rebase – I hope that helps.
