Simplifying and speeding up the serving of Firefox updates

A discussion on simplifying and improving the process of configuring and serving updates for Mozilla-based applications

For a few months now, I’ve been working in my spare time on a way to make configuring and serving updates to Mozilla-based applications easier.

Mozilla updates are MAR files, which the Automatic Update Service (aka AUS2) links to. Several tools are involved in making updates for production releases, chiefly Patcher (/mozilla/source/tools/patcher/), which is driven by the release automation framework. Nightly updates use a simpler script which automatically determines where builds should be updated to; Patcher needs every update path to be explicitly specified in its config file.

Both Patcher and the nightly script call the update-packaging tools (/mozilla/source/tools/update-packaging/) to do the work of generating MAR files, which in turn use the “mar” utility (supports tar-like arguments to manipulate MAR files, e.g. “mar -t file.mar”, “mar -x file.mar”, etc.) and the “mbsdiff” utility, which generates binary patches using a modified version of bsdiff.

The update-packaging tools are in need of a makeover too, but that is a story for another day.

Getting back to how updates are served: Patcher’s other job is to generate thousands of text files, which are used to configure AUS. The response for every possible update path is actually generated dynamically from two text files (partial.txt and complete.txt), which reside in a directory layout that carries the same information as the update URL (…/product/version/buildid/buildTarget/locale/channel/update.xml), but in a slightly different order. These complete.txt and partial.txt files have gone through two revisions of their file format. In the first, the variables for the generated XML (updateType, URL to the MAR file, etc.) each sit on a specific line number. In the second (“version=1”), key/value pairs are used.
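To make the URL layout concrete, here is a minimal sketch that splits an update URL path into named fields. The field names and their order come from the URL pattern above; the function name `parse_update_path` and the sample values in the usage note are hypothetical, and this is not how AUS itself parses requests.

```python
# Field order taken from the update URL pattern quoted above.
FIELDS = ("product", "version", "buildid", "buildTarget", "locale", "channel")


def parse_update_path(path):
    """Return a dict of the six fields that precede update.xml in the path.

    Anything before those six segments (the elided prefix) is ignored.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < len(FIELDS) + 1 or parts[-1] != "update.xml":
        raise ValueError("not an update.xml path: %r" % path)
    values = parts[-(len(FIELDS) + 1):-1]
    return dict(zip(FIELDS, values))
```

For example, `parse_update_path("/prefix/Firefox/2.0.0.14/0000000000/SomeTarget/en-US/release/update.xml")` returns a dict whose `"product"` is `"Firefox"` and whose `"channel"` is `"release"` (the build ID and build target here are made-up placeholders).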

However, it took me a very long time to get a handle on the above, and I think the separation between Patcher and the AUS server is not very useful. In fact, the method of explicit updates for all is downright unhelpful; for every single release (e.g. 2.0.0.15), the following happens:

1. partial updates are generated from 2.0.0.14->2.0.0.15
2. every previous release (2.0.0.[1,2,3,4,…]) is pointed to the same 2.0.0.15 update

That means generating and publishing two text files for each (release * platform * locale) combination, all of which contain exactly the same data. I also think that taking a hint from the way the nightly system works would be useful here: 2.x should automatically point to the latest release *unless* explicitly overridden; doing the normal thing should not require explicit configuration. Finally, the nightly and production systems should not be so different; every nightly update is a lost opportunity to test pre-releases of the production system, and having forked systems is bad for bugfixing and feature porting (note that there are no nightly updates for locales other than en-US, for example).
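The "point to the latest unless overridden" rule could look something like the sketch below. Everything here is an assumption for illustration: the function names, the idea of keying overrides by client version, and the simplification that versions are purely numeric dotted strings (so something like a beta suffix would need extra handling).

```python
def version_key(version):
    # Assumes purely numeric dotted versions such as "2.0.0.15".
    return tuple(int(part) for part in version.split("."))


def resolve_target(client_version, releases, overrides=None):
    """Pick the version a client should be updated to, or None if up to date.

    releases:  iterable of known version strings
    overrides: optional dict mapping a client version to an explicit target
    """
    overrides = overrides or {}
    if client_version in overrides:
        # Explicit configuration wins over the default rule.
        return overrides[client_version]
    # Default rule: the newest release on the same major branch ("2.x").
    branch = client_version.split(".")[0]
    candidates = [v for v in releases if v.split(".")[0] == branch]
    latest = max(candidates, key=version_key, default=None)
    if latest is None or version_key(latest) <= version_key(client_version):
        return None
    return latest
```

With this shape, publishing 2.0.0.15 only means adding one release; none of the older 2.0.0.x versions need their own configuration unless they should deviate from the default.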

So, I’ve been thinking for a long time about how to make tools that are easier to use, understand and extend. One idea is to have the AUS server configuration be a database, not a giant tree of text files, and have the data in one place (not stored in a config file which is expanded to a giant tree of text files by a separate app). Another is to provide a simple API, and a few command line tools which use this API to modify update data and export it.

The conceptual model right now is that each release contains one update, which contains two patches (one partial, one complete). Both the database schema and the API reinforce this model.
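As a rough illustration of that model, here is a minimal sketch using dataclasses. The class names Release, Update and Patch match the ones in the prototype API, but every field name below is a guess made for this example, not the prototype's actual attributes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patch:
    patch_type: str                     # "partial" or "complete"
    url: str                            # where the MAR file lives
    from_version: Optional[str] = None  # only meaningful for partials


@dataclass
class Update:
    # One update holds at most one partial and one complete patch.
    partial: Optional[Patch] = None
    complete: Optional[Patch] = None

    def patches(self) -> List[Patch]:
        """Return the patches that are present, partial first."""
        return [p for p in (self.partial, self.complete) if p is not None]


@dataclass
class Release:
    product: str
    version: str
    update: Optional[Update] = None     # each release contains one update
```

Keeping the partial and complete patches as two named slots, rather than a free-form list, is what enforces the "one update, two patches" rule in this sketch.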

Here’s what I have working so far. In case it’s not obvious, this is most definitely an early “throw the first one away” prototype:

an API for dealing with updates, in Python (Release, Update, Patch classes)
a simple database layer for storing and retrieving these objects from a MySQL database
an import plugin for AUS2 configuration, and an export plugin to straight update.xml files

The schema is based on Lars’ fine work on the subject, although I did modify it slightly. This schema is not totally done yet either (for example, foreign keys aren’t actually hooked up), but there’s enough there to see that it works. There’s a run.py command in that directory that calls the importer and exporter correctly.

This means that you can read existing AUS2 data into a database (if you have it), and create or manipulate update information using the API from Python (or directly with SQL, if you like). You can generate update.xml files and put them straight onto a webserver.
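The export step might be sketched like this, using only the standard library. The element and attribute names are my assumption of what an update.xml looks like (the post itself only mentions fields such as updateType and the URL to the MAR file), and `render_update_xml` and its dict keys are hypothetical, not the prototype's exporter.

```python
import xml.etree.ElementTree as ET


def render_update_xml(version, build_id, patches, update_type="minor"):
    """Build an update.xml document as a string.

    patches: list of dicts with the hypothetical keys
             "type", "url", "hash_function", "hash_value", "size".
    """
    root = ET.Element("updates")
    update = ET.SubElement(
        root,
        "update",
        {"type": update_type, "version": version, "buildID": build_id},
    )
    for patch in patches:
        ET.SubElement(
            update,
            "patch",
            {
                "type": patch["type"],
                "URL": patch["url"],
                "hashFunction": patch["hash_function"],
                "hashValue": patch["hash_value"],
                "size": str(patch["size"]),  # attribute values must be strings
            },
        )
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written to a file under the matching path on a webserver, or returned directly by a web service.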

What I’ve put together needs quite a lot more work, but I wanted to open it up for comment. Here’s what I think is remaining, at least:

database should hold the history of updates, not just the current state
need a web service which talks directly to the database, as an alternative to pre-generating all update.xml files.
should use existing libs for the DB ORM (SQLAlchemy maybe?), generating XML, etc. not the home-grown things I threw together
I think it would be advantageous to make the model/schema/API more sophisticated and normalized (e.g. updates could belong in multiple channels), but I don’t want to go beyond the essentials quite yet.
the new update-packaging tools should be able to read data from this system in order to automatically determine the appropriate “from” release to base partial MARs on. There should also be some way to register that new updates are available; that access would be internal and append-only (e.g. it only needs SELECT and INSERT).

I think that to solve the first, update paths should be explicitly configured once, but there needs to be business logic in the server app (or update.xml file generator) which overrides this when a newer release is available. For instance, if a user is on version 1.0 and version 1.1 is available which has a partial for 1.0, then the partial 1.0->1.1 should be served. However, if version 1.2 is available, then the complete 1.0->1.2 update should be served.
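That rule can be sketched as a small function. The names `LatestRelease` and `choose_patch` and the fields below are hypothetical, and the sketch only considers a single partial per release, as in the current conceptual model.

```python
from typing import NamedTuple, Optional, Tuple


class LatestRelease(NamedTuple):
    version: str
    complete_url: str
    partial_url: Optional[str] = None
    partial_from: Optional[str] = None  # version the partial was diffed against


def choose_patch(client_version: str,
                 latest: LatestRelease) -> Optional[Tuple[str, str]]:
    """Return (patch_type, url) for the client, or None if it is up to date."""
    if client_version == latest.version:
        return None
    # A partial only applies if it was built against the client's exact version.
    if latest.partial_url and latest.partial_from == client_version:
        return ("partial", latest.partial_url)
    # Otherwise fall back to the complete update.
    return ("complete", latest.complete_url)
```

With 1.1 as the latest release and a partial built from 1.0, a 1.0 client gets the partial; once 1.2 ships with a partial built from 1.1, the same 1.0 client gets the complete 1.2 update instead.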

The second problem has more to do with the burden inherent in handling tens of thousands of text files (e.g. backing them up or restoring them can take a very long time), although I believe that it is useful to have the option to pregenerate the path/update.xml files, especially for people who don’t have as many updates as mozilla.org pushes each release.

Anyway, comments welcome! Certainly feel free to nudge me if it looks like I’m going off the rails here, but I think this approach could make things a little better in update-land. I’ll take patches too, but if anything serious comes of this I’ll probably clean it up, move it over to Mozilla’s repo, and rewrite a bunch, so don’t take the current implementation too seriously.
