We've been using the release automation scripts, aka Bootstrap, for the past several releases. We've hit some bumps, but overall we've improved quality: instead of documenting a large list of "gotchas" and remembering to re-check them whenever we run into problems or need to add new steps to the release (e.g. DLL/EXE signing), we push those fixes into the scripts themselves. Every release "step" has a set of verification tests, which we keep augmenting and then don't have to think about next release.
Repeating the same set of steps every 6-8 weeks sounds pretty terrible; it's frequent enough that it feels like you've done it a million times already, and just infrequent enough that you get a little fuzzy on the details and have to constantly refer to the documentation. Even worse, you have about a dozen individual files to edit and hundreds of commands to run, and if any of it is incorrect you (generally) need to go back and start over from there. After all, you can't build without a tag, repack without a build, or generate (all of the) updates without repacks.
The part that's exciting, challenging and fun is when you start to break down the big, scary problem into a set of small, manageable problems. Abstract the small problems into discrete steps and automate them, so that you don't need to worry about the individual details every time you use them. You can examine each step separately, optimize it, test it, and try to get as close to absolute consistency as possible. Do paranoid, pedantic tests for correctness that would drive a person mad.
In short, it's basically refactoring and unit testing. When you start doing it after-the-fact there's a high ramp-up cost, but once you've got the ball rolling it starts picking up serious momentum.
The next big hurdle is end-to-end automation. Right now, with the automation and infrastructure as-is, a human has to:
- log into Tag machine, kick off tag script
- log into win32, linux and mac tinderboxes and kick off build script
- verify builds and copy to the candidates directory
- configure l10n, update generation/verification
- kick off l10n build script on win32, linux and mac tinderboxes
- verify l10n builds and copy to the candidates directory
- sign win32 EXEs/DLLs
- log into update machine, kick off patch generator
- log into staging machine, kick off staging script
- turn on test updates
- sign installers
- create bouncer links, push bits to mirrors
- turn on updates
And this doesn't include the huge number of config/version bumps that go along with all of these disparate systems. If you spend much time going over all of these files, it becomes pretty obvious that we could be entering one set of information and generating all of this data from it.
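To make that concrete, here's a minimal sketch of the "one source of truth" idea. The keys, template strings, and file names below are made up for illustration; our actual bootstrap.cfg and generated configs look different.

```python
# Sketch: expand one release description into several per-tool config
# fragments, instead of hand-editing a dozen files. The field names and
# output formats here are hypothetical, not our real config syntax.
release = {
    "product": "firefox",
    "version": "2.0.0.3",
    "build": 1,
    "tag": "FIREFOX_2_0_0_3_RELEASE",
}

def mozconfig(info):
    # Tinderbox mozconfig stub derived from the single source of truth.
    return "mk_add_options MOZ_CO_TAG=%s\n" % info["tag"]

def patcher_config(info):
    # patcher2-style fragment naming the release being generated.
    return "release %s %s build%d\n" % (
        info["product"], info["version"], info["build"])

# Every generated file comes from the same dict, so a version bump is
# one edit plus a regeneration, not a dozen hand edits.
configs = {
    "mozconfig": mozconfig(release),
    "patcher.cfg": patcher_config(release),
}
```

The point isn't the templates themselves; it's that the version, build number, and tag each live in exactly one place.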
We're actually now at the point where we can do all the tagging/version bumping automatically, generate Tinderbox mozconfig/tinder-config.pl based on the single bootstrap.cfg, and generate the patcher2 configs (which creates partial updates and configures AUS).
However, we still need to log into the individual machines described above, check out/update the scripts, and run them. Each of these processes generally takes between 1 and 4 hours, so having them run back-to-back would not only reduce the total time to do a release (it should be fine running all night, or over weekends), it should also help reduce mistakes and eliminate the time-wasting polling we currently have to do (although Bootstrap does support sending email notifications now, so at least it can be event-driven).
We've been looking at Buildbot to help tie this into a seamless process. Buildbot supports both the idea of BuildSets (e.g. win32, linux and mac builders all operating as one pass/fail operation) as well as dependent steps e.g. Tag -> Source -> BuildSet(linux,mac,win32) -> Repack(linux,mac,win32) -> Updates -> Stage.
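To show the shape of that dependency chain, here's a toy runner, not actual Buildbot configuration: each stage only runs if its upstream stage succeeded, and a BuildSet-style stage passes only if all of its platform builders pass.

```python
# Toy illustration of dependent steps and BuildSets; not Buildbot code.
def run_pipeline(stages, run):
    """stages: list of (name, [builders]); run(stage, builder) -> bool.
    Returns the names of stages that completed successfully."""
    done = []
    for name, builders in stages:
        # A BuildSet-style stage fails if any one of its builders fails...
        if not all(run(name, b) for b in builders):
            break  # ...and every downstream stage is skipped.
        done.append(name)
    return done

pipeline = [
    ("Tag", ["tag"]),
    ("Build", ["linux", "mac", "win32"]),
    ("Repack", ["linux", "mac", "win32"]),
    ("Updates", ["updates"]),
    ("Stage", ["stage"]),
]

# Example: a win32 repack failure stops everything downstream of Build.
result = run_pipeline(pipeline,
                      lambda s, b: not (s == "Repack" and b == "win32"))
# result is ["Tag", "Build"]: Repack failed, so Updates/Stage never ran.
```

This is exactly the property we want from the real thing: no human polling, and no way to accidentally generate updates from a broken repack.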
My original idea for tying this all together was to send Changes into Buildbot from Bootstrap every time a new step was ready, but preed looked into it more and realized that buildsets and dependent steps already do what we need. This is great, because it moves us more incrementally from "human logging into 10 machines to run 1000 commands" to "human logging into 10 machines to run 1 script" to "Buildbot logging into 10 machines to run 1 script on each", without us having to write any additional code.
Anyway, we've got a lot of other things going on, but I'm really proud of all the work we've done to get this far, and confident that we'll be able to get this across the finish line soon. We've done it so incrementally that I don't feel like we've built this giant cathedral; it's more that we've just broken down our big problem into little bits that we can improve more quickly.