rhelmer's blog - mozilla (https://www.rhelmer.org/blog/)

Vectiv and the Browser Monoculture (2019-06-15), Robert Helmer

<p>So, so tired of the "hot take" that having a single browser engine implementation is good, and there is no value to having multiple implementations of a standard. I have a little story to tell about this.</p>
<p>In the late 90s, I worked for a company called Vectiv. There isn't much info on the web (the name has been used by other companies in the meantime), <a class="reference external" href="https://siteselection.com/ssinsider/webpick/wp010716.htm">this old press release</a> is one of the few I can find.</p>
<p>Vectiv was a web-based service for commercial real estate departments doing site selection. This was pretty revolutionary at the time, as the state-of-the-art for most of these was to buy a bunch of paper maps and put them up on the walls, using push-pins to keep track of current and possible store locations.</p>
<p>The story of Vectiv is interesting on its own, but the relevant bit to this story is that it was written for and tested exclusively in IE 5.5 for Windows, as was the style at the time. The once-dominant Netscape browser had plummeted to negligible market share, and was struggling to rewrite Netscape 6 to be based on the open-source Mozilla Suite.</p>
<p>Around this time, Apple was starting to have a resurgence. Steve Jobs had returned, and the candy-colored iMac was proving to be successful. Apple was planning to launch official stores, and the head of their real estate department was a board member of Vectiv, so we managed to land our first deal - a pilot project with Apple's nascent real estate department.</p>
<p>We picked up a few iMacs around the office for testing, and immediately hit a snag - Steve had ordered that everyone in the company, real estate dept included, had to use the new Mac OS X. The iMacs that the dept used (and that we tested on) were pretty slow, but serviceable. The real problem was that our product didn't really work on IE for Mac. Like, at all. Pages wouldn't load, and the browser would consistently crash on certain pages.</p>
<p>This was before Safari and its Webkit engine. We started debugging and rewriting bits of the product, while simultaneously talking to Microsoft about our problems. They were responsive, and hopeful that the upcoming update would fix some of our problems. Sadly, there were to be no further updates for IE 5 for Mac.</p>
<p>I was something of a Unix fanboy at the time, and had been using early releases of Mozilla Suite on my Solaris workstation, so I knew that our product basically worked, with some rough edges (mostly minor things like CSS, with a few less trivial problems around divergent web standards).</p>
<p>Long story short, our QA manager and I visited Apple's real estate and test folks, and we settled on using Mozilla 0.6 for the pilot, and the corresponding Netscape 6 when it was released (I think we ended up using Netscape 7.1, which I recall being a lot more usable, being based on Mozilla 1.4).</p>
<p>Vectiv had other clients like Dollar Tree and Quiznos, but getting over that initial pilot hurdle was key to proving that our product worked and had backing from a known brand. Vectiv was VC-backed and, like many startups caught up in the dot-com crash, ran out of runway, although the product was sold and did live on. I did a few consulting gigs setting up local installs for the remaining clients.</p>
<p>Most people reading this probably know the rest of the story - IE stagnated, AOL pulled the plug on Netscape, and Mozilla Suite was reborn as the Firefox browser. With Microsoft moving to Google Chrome's Blink browser engine, Mozilla Firefox's Gecko engine and Apple Safari's Webkit are the only remaining independent implementations of the various web standards.</p>
<p>(Blink is technically a fork of Webkit, but IE and Netscape were ultimately forks of NCSA Mosaic too, so I think it's fair to call it independent at this point.)</p>
<p>To be clear: having multiple browser engines didn't ultimately save Vectiv, but Firefox did open the door for Safari and Chrome, as Firefox's Firebug (the predecessor of today's integrated devtools) enticed web developers enough that they made their sites more standards-compliant just so they could have access to nice devtools.</p>
<p>It's easy for me to write a nice narrative of the past, complete with the moral of the story. The future isn't totally certain, but it's clear that the web will continue to play a large role in the world. Let's not (again) back ourselves into a corner and cede all meaningful control over that future.</p>
A new owner for add-ons manager (2017-11-15), Robert Helmer

<p>A little over a year ago, Mossop <a class="reference external" href="https://www.oxymoronical.com/blog/2016/08/A-new-owner-for-the-add-ons-manager">announced a change of ownership</a> of
the add-ons manager.</p>
<p>I have been honored to set direction and work on such an important part
of Firefox, and proud of the work I've done. A big part of this was to
help teams go faster in delivering their work to users, and there was
also quite a bit of performance work and review for 57 as well as
changes to better support WebExtensions.</p>
<p>Over this time, it's become clear that the WebExtensions team is more
than equipped to handle ownership of the add-ons manager itself.</p>
<p>In particular, Andrew Swan has been instrumental in setting and communicating
technical direction as well as contributing code and reviews. The add-ons
manager isn't really its own official module as such, but I believe that
Andrew has shown leadership here and would like to publicly pass the torch.</p>
<p>Kris Maglione has also been doing quite exceptional work here, so I think
either of them should be able to take a vacation (but not together) and not
leave a vacuum of technical leadership.</p>
<p>As Mossop did before me, I am going to be updating the suggested reviewers
in Bugzilla to be aswan and kmag, with me as a last resort.</p>
<p>Please join me in congratulating Andrew and sending him all of your add-on
manager related questions!</p>
about:addons in React (2016-11-30), Robert Helmer

<p>While working on tracking down some tricky UI bugs in <cite>about:addons</cite>, I wondered
what it would look like to rewrite it using web technologies. I've been
meaning to learn React (which the Firefox devtools use), and it seems like a
good choice for this kind of application:</p>
<ol class="arabic simple">
<li>easy to create reusable components</li>
</ol>
<blockquote>
XBL is used for this in the current <cite>about:addons</cite>, but this is a non-standard
Mozilla-specific technology that we want to move away from, along with XUL.</blockquote>
<ol class="arabic simple" start="2">
<li>manage state transitions, undo, etc.</li>
</ol>
<blockquote>
There is quite a bit of code in the current <cite>about:addons</cite> implementation
to deal with undoing various actions. React makes it pretty easy to track
this sort of thing through libraries like Redux.</blockquote>
<p>To explore this a bit, I made a simple <a class="reference external" href="https://github.com/rhelmer/aboutaddons/">React version of about:addons</a>. It's
actually <a class="reference external" href="https://addons.mozilla.org/en-US/firefox/addon/about-addons/">installable as a Firefox extension</a> which overrides <cite>about:addons</cite>.</p>
<p>Note that it's just a proof-of-concept and almost certainly buggy - the way
it's hooking into the existing sidebar in <cite>about:addons</cite> needs some work for
instance. I'm also a React newb, so I'm pretty sure I'm doing it wrong. Also,
I've only implemented #1 above so far, as of this writing.</p>
<p>I am finding React pretty easy to work with, and I suspect it'll take
far less code to write something equivalent to the current implementation.</p>
Toy Add-on Manager in Rust (2016-11-30), Robert Helmer

<p>I've been playing with Rust lately, and since I mostly work on the Add-on
Manager these days, I thought I'd combine these into a <a class="reference external" href="https://github.com/rhelmer/AddonManager">toy rust version</a>.</p>
<p>The Add-on Manager in Firefox is written in JavaScript. It uses a lot of
ES6 features, and has "chrome" (as opposed to "content") privileges, which
means that it can access internal Firefox-only APIs to do things like download
and install extensions, themes, and plugins.</p>
<p>One of the core components is a class named <a class="reference external" href="https://dxr.mozilla.org/mozilla-central/rev/8d8846f63b74eb930e48b410730ae088e9bdbee8/toolkit/mozapps/extensions/internal/XPIProvider.jsm#5356-5360">AddonInstall</a> which implements a
state machine to download, verify, and install add-ons. The main purpose of this
toy Rust project so far has been to model the design and see what it looks like.</p>
<p>So far it's mostly an exercise in how awesome Rust's enums are compared to the
JS equivalent (int constants), and how nice match is (versus switch statements).</p>
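<p>To make that concrete, here's a minimal sketch of an enum-driven install state
machine. The state names and transitions below are invented for illustration -
they are not the actual states from AddonInstall:</p>

```rust
// Sketch of an enum-based install state machine. States and transitions
// are illustrative only, not the real AddonInstall ones.
#[derive(Debug, PartialEq, Clone, Copy)]
enum InstallState {
    Downloading,
    Verifying,
    Installing,
    Done,
    Failed,
}

// Advance the state machine given whether the current step succeeded.
// match must be exhaustive: forget a (state, result) combination and the
// compiler rejects the program, whereas a JS switch over int constants
// just falls through silently.
fn next(state: InstallState, ok: bool) -> InstallState {
    match (state, ok) {
        (InstallState::Downloading, true) => InstallState::Verifying,
        (InstallState::Verifying, true) => InstallState::Installing,
        (InstallState::Installing, true) => InstallState::Done,
        (InstallState::Done, _) => InstallState::Done,       // terminal
        (InstallState::Failed, _) => InstallState::Failed,   // terminal
        (_, false) => InstallState::Failed,                  // any step can fail
    }
}

fn main() {
    // Drive a successful install through all three steps.
    let mut state = InstallState::Downloading;
    for step_ok in [true, true, true] {
        state = next(state, step_ok);
    }
    assert_eq!(state, InstallState::Done);
    println!("final state: {:?}", state);
}
```

<p>The nice part is that adding a new variant to the enum immediately produces
compile errors at every match that doesn't handle it, which is exactly the kind
of mistake int constants let slide.</p>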
<p>It's possible to compile the Rust app to a native binary, or alternatively to
asm.js/wasm, so one thing I'd like to try soon is loading a wasm version of
this Rust app inside a Firefox JSM (which is the type of JS module used for
internal Firefox code).</p>
<p>There's a <a class="reference external" href="https://crates.io/crates/webplatform">webplatform crate</a> on crates.io that allows for
easy DOM access; it'd be interesting to see if this works for Firefox
chrome code too.</p>
Better Source Code Browsing With FreeBSD and Mozilla DXR (2014-11-27), Robert Helmer

<p>Lately I've been reading about the <a class="reference external" href="http://www.amazon.com/Design-Implementation-FreeBSD-Operating-Edition/dp/0321968972">design and implementation of the FreeBSD
Operating System</a> (great book, you should read it).</p>
<p>However I find browsing the source code quite painful. Using vim or emacs is
fine for editing individual files, but when you are trying to understand and
browse around a large codebase, dropping to a shell and grepping/finding around
gets old fast. I know about ctags and similar, but I also find editors
uncomfortable for browsing large codebases for an extended amount of time -
web pages tend to be easier on the eyes.</p>
<p>There's an <a class="reference external" href="https://en.wikipedia.org/wiki/LXR_Cross_Referencer">LXR</a> fork called <a class="reference external" href="http://fxr.watson.org/">FXR</a> available, which is way better, and I am
very grateful for it - however, it has all the same shortcomings as LXR that we've
become very familiar with on the Mozilla LXR fork (<a class="reference external" href="http://mxr.mozilla.org/">MXR</a>):</p>
<ul class="simple">
<li>based on regex, not static analysis of the code - sometimes it gets things
wrong, and it doesn't really understand the difference between a variable
with the same name in different files</li>
<li><a class="reference external" href="http://fxr.watson.org/fxr/source/amd64/acpica/acpi_machdep.c?v=FREEBSD10">not particularly easy on the eyes</a> (shallow and easily fixable, I know)</li>
</ul>
<p>I've been an admirer of Mozilla's next gen code browsing tool, <a class="reference external" href="http://dxr.mozilla.org">DXR</a>, for a
long time now. DXR uses a clang plugin to do static analysis of the code,
so it produces the real call graph - this means it doesn't need to guess at the
definition of types or where a variable is used, it <em>knows</em>.</p>
<p>A good example is to contrast a file on MXR with the same file on DXR.
Let's say you wanted to know where <a class="reference external" href="http://dxr.mozilla.org/mozilla-central/source/dom/canvas/CanvasUtils.cpp#42">this macro</a> was first defined, that's
easy in DXR - just click on the word "NS_WARNING" and select "Jump to definition".</p>
<p>Now <a class="reference external" href="http://mxr.mozilla.org/mozilla-central/source/dom/canvas/CanvasUtils.cpp#42">try that on MXR</a> - clicking on "NS_WARNING" instead yields a search which
is <a class="reference external" href="http://mxr.mozilla.org/mozilla-central/ident?i=NS_WARNING">not particularly helpful</a>, since it shows every place in the codebase that
the word "NS_WARNING" appears (note that DXR has the ability to do this same
type of search, in case that's really what you're after).</p>
<p>So that's what DXR is and why it's useful. I got frustrated enough with the
status quo trying to grok the FreeBSD sources that I took a few days and, with
the help of folks in the #static channel on irc.mozilla.org (particularly
Erik Rose), got DXR running on FreeBSD and indexed a tiny part of the source
tree as a proof-of-concept (the source for "/bin/cat"):</p>
<p><a class="reference external" href="http://freebsdxr.rhelmer.org">http://freebsdxr.rhelmer.org</a></p>
<p>This is running on a FreeBSD instance in AWS.</p>
<p>DXR is currently undergoing major changes, the SQLite to ElasticSearch
transition being the central one. I am tracking how to get the "es" branch of
DXR going <a class="reference external" href="https://gist.github.com/rhelmer/60bc81c6cee9c507008a">in this gist</a>.</p>
<p>Currently I am able to get a LINT kernel build indexed on the DXR master branch, but
am still working through issues on the "es" branch.</p>
<p>Overall, I feel like I've learned way more about static analysis, how DXR works,
and the FreeBSD source code, produced some useful patches for Mozilla and the
DXR project, and hopefully will provide a useful resource for the FreeBSD
project, all along the way. Totally worth it; I highly recommend working
with all of the aforementioned :)</p>
Deploying Socorro quickly (2014-06-21), Robert Helmer

<p>I've been seeing a lot more people looking for help and information
about installing and running <a class="reference external" href="https://github.com/mozilla/socorro">Socorro</a> (the software that powers
<a class="reference external" href="https://crash-stats.mozilla.com">crash-stats.mozilla.com</a>).</p>
<p>We've done a lot of work the past few years on making the system
more flexible and are constantly working on improving the documentation,
especially the installation instructions - and the more people who are
able to get the system going, the more <a class="reference external" href="https://github.com/mozilla/socorro/graphs/contributors">contributions</a> we've seen.</p>
<p>Still, the docs have been mostly focused on getting a developer install
for hacking on the system, and less so on installing and upgrading the software
without having to configure and understand every component.</p>
<p>In response to some specific questions on the <a class="reference external" href="https://lists.mozilla.org/listinfo/tools-socorro">mailing list</a> about how to
install and then upgrade Socorro, we've released the deploy script that Mozilla
uses internally (with some modifications to work in a more vanilla environment).</p>
<p>The easiest way to get a system going is to spin up a <a class="reference external" href="http://socorro.readthedocs.org/en/latest/installation/vagrant.html">Vagrant VM</a> and
then follow the "<a class="reference external" href="http://socorro.readthedocs.org/en/latest/installation/install-binary.html">Installing from binary package</a>" instructions.</p>
<p>We also run a <a class="reference external" href="https://ci.mozilla.org/job/socorro-vagrant/">Jenkins bot</a> to ensure that the Vagrant config and deploy script
don't regress.</p>
<p>This is easy enough that it's making our "<a class="reference external" href="http://socorro.readthedocs.org/en/latest/installation/install-src-dev.html">Installing from source</a>"
instructions look quite baroque, so expect those to see some improvements soon
too!</p>
<p>I'd like to give a particular shout-out to <a class="reference external" href="https://twitter.com/jorgenpt">Jørgen P. Tjernø</a> who has been
doing <a class="reference external" href="https://github.com/mozilla/socorro/commits?author=jorgenpt">quite a bit of work</a> to make sure deploys are smooth - thanks
Jørgen!</p>
Etherpad 2013 Meetup Videos on Air Mozilla (2013-04-17), Robert Helmer

<p>I have been working for a few months now on migrating Mozilla's Etherpad
install from the <a class="reference external" href="http://en.wikipedia.org/wiki/Etherpad">original Etherpad</a> to <a class="reference external" href="http://etherpad.org">Etherpad Lite</a>. I got a chance to
work with a fair number of people from the excellent Etherpad community
while working on things like adding "Team Site" support to Etherpad Lite, and
met even more amazing people recently during the 2013 Etherpad meetup.</p>
<p>Videos have just been posted on Air Mozilla:</p>
<iframe frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen msallowfullscreen width="640" height="360" name="vidly-frame" src="https://vid.ly/embeded.html?link=4q5d3n&new=1&autoplay=false&hd=yes"><a target="_blank" href="https://vid.ly/4q5d3n"><img src="https://vid.ly/4q5d3n/poster" /></a></iframe><p><a class="reference external" href="https://air.mozilla.org/etherpad-meetup-part-1">Etherpad 2013 Meetup Part 1</a></p>
<iframe frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen msallowfullscreen width="640" height="360" name="vidly-frame" src="https://vid.ly/embeded.html?link=8s4p1k&new=1&autoplay=false&hd=yes"><a target="_blank" href="https://vid.ly/8s4p1k"><img src="https://vid.ly/8s4p1k/poster" /></a></iframe><p><a class="reference external" href="https://air.mozilla.org/etherpad-meetup-part-2">Etherpad 2013 Meetup Part 2</a></p>
<p>If you are at all interested in:</p>
<ul class="simple">
<li><a class="reference external" href="http://en.wikipedia.org/wiki/Etherpad">original Etherpad</a> (!)</li>
<li><a class="reference external" href="http://etherpad.org">Etherpad Lite</a></li>
<li>upcoming node.js features</li>
<li><a class="reference external" href="http://en.wikipedia.org/wiki/Operational_transformation">operational transformation</a></li>
<li>real-time multi-user wiki(pedia) editing (WYSIWYWIKI?)</li>
<li>hosted etherpad solutions</li>
<li>other cool stuff</li>
</ul>
<p>Then definitely check these out!</p>
capture and replay http post using tcpdump (2012-12-07), Robert Helmer

<p>Mozilla runs a crash-stats service, which accepts crash reports from clients
(mobile/desktop browsers, B2G, etc) and provides a reporting interface.</p>
<p>Recently, a change landed on the client side to enable multiple minidumps to
be attached to an incoming crash, and we want to add support to the server
to accept these as soon as possible.</p>
<p>Our usual test procedure is to pull an existing crash from production and
submit it as a new crash to our dev and staging instances. Unfortunately, we
had no easy way to test this particular scenario, since the current crash
collector only stores a single minidump, and discards any others. We really
want real data in this case - we of course have unit tests and synthetic
data, but the crash collector is a critical service so we want to get it right
the first time when we push updates.</p>
<p>We decided that the most expedient way to get real data would be to capture
from production using tcpdump, then replay this to the dev/staging servers.</p>
<p>There are tools readily available to do this - the major concern is that
we're capturing a large amount of traffic, so we want to filter out as much
as possible. Also, tcpdump has a built-in mechanism for rolling and gzipping
capture files (either every n seconds, or when the file gets over n bytes).</p>
<p>First, run tcpdump on the target (production) server:</p>
<pre class="literal-block">
tcpdump -i eth0 dst port 81 -C 100 -z "gzip" -w output.pcap
</pre>
<p>eth0 is the interface we're interested in, and "dst port 81" restricts the capture
to incoming traffic on port 81. The -C and -z options will cause tcpdump to roll
(and gzip) the output.pcap file every 100 megabytes.</p>
<p>This ends up producing a (potentially large) number of files:</p>
<pre class="literal-block">
output.pcap
output.pcap1.gz
output.pcap2.gz
</pre>
<p>When you feel you've captured enough data, stop the tcpdump process and
use tcpslice to rebuild a single capture file:</p>
<pre class="literal-block">
tcpslice -w full.pcap output.pcap*
</pre>
<p>Then use tcptrace to reassemble the packets into complete sessions (this
is necessary since TCP packets may be received out-of-order).
This will create one file per HTTP session:</p>
<pre class="literal-block">
tcptrace -e full.pcap
</pre>
<p>Now we have a set of files named e.g. fmekmf.dat - if you take a look inside
these you will see they are full HTTP sessions. They can be replayed against
a dev/stage server using netcat like so:</p>
<pre class="literal-block">
cat aaju2aajv_contents.dat | nc devserver 80
</pre>
<p>You may need to modify the files first, to change the Host header for example.
This is easy to do in-place with sed:</p>
<pre class="literal-block">
cat aaju2aajv_contents.dat | sed 's/Host: prodserver/Host: devserver/' | nc devserver 80
</pre>
<p>NOTE - this technique potentially uses a ton of disk space; I did this in
many stages so I could backtrack in case I made any mistakes. If disk space
(and overall time) are at a premium - for example, if you are setting up a continuous
pipeline - I'd investigate using named pipes instead of creating actual files
for uncompressing and running tcpslice + tcptrace.</p>
<p>Also, if you are doing this in a one-off manner then tcpflow or wireshark
(wireshark has a terminal version, tshark) are easier to work with - I wanted
to do the capture on a locked-down server which had tcpdump available, and
wanted to take advantage of the log rolling+compression feature.</p>
webkit using Perf-o-Matic 2.x (2012-03-01), Robert Helmer

<p>Thanks to the massive efforts of <a class="reference external" href="https://plus.google.com/u/0/105748986001435560355/">Ryosuke Niwa</a> (rniwa), the WebKit
project now uses the code from <a class="reference external" href="http://graphs.mozilla.org">graphs.mozilla.org</a>, check it out:</p>
<p><a class="reference external" href="http://webkit-perf.appspot.com">webkit-perf.appspot.com</a></p>
<p>We share all the same front-end code, the major differences are that
they have their own <a class="reference external" href="http://trac.webkit.org/browser/trunk/Websites/webkit-perf.appspot.com">backend graphserver</a> and the static dashboard
images are generated using the Google Charts API instead of node.js
(makes sense, since their server runs on Google App Engine).</p>
<p>rniwa has been doing fantastic work and contributing tons of great
features (while refactoring the code base appropriately), and he started
working on this at an excellent time - just as we kicked off the <a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise">Signal
From Noise</a> project which is leading the way to another major evolution
of our work on measuring and tracking performance for important Mozilla
projects like Firefox, Fennec and B2G.</p>
hacking on graphs 2.0 is fun and easy (2011-05-24), Robert Helmer

<p>Interested in adding features/fixing bugs/using your own data with
<a class="reference external" href="http://blog.mozilla.com/webdev/2011/02/04/perfomatic2-0/">perf-o-matic 2.0</a>? It's easy!</p>
<pre class="literal-block">
git clone git://github.com/rhelmer/graphs.git
open graphs/graph.html  # in your favorite browser; on Mac "open" will do the right thing
</pre>
<p>You can now hack on graph.html, js/common.js and js/graph-2.js (maybe
js/embed.js and js/dashboard.js, if you're working on the embed or
dashboard components). You'll be pulling live data from
graphs-new.mozilla.org by default.</p>
<p>For most cases, that's it! If you need more, read on:</p>
<p><strong>What about the dashboard? I loaded index.html but there are no
graphs!</strong></p>
<p>No problem; it's all in the INSTALL file, but here's the tl;dr version:</p>
<p>These images are generated by running node.js from cron, doing
server-side HTML5 canvas and saving the result to a static image (PNG).</p>
<p>You need to install <a class="reference external" href="http://nodejs.org/">node.js</a> and <a class="reference external" href="http://npmjs.org/">npm</a>, then:</p>
<pre class="literal-block">
npm install canvas htmlparser jquery jsdom
mkdir images/dashboard
node ./scripts/static_graphs.js
</pre>
<p>You should now have static graph images in ./images/dashboard/ and
index.html should look healthier.</p>
<p><strong>But I want to run the backend server, so I can post my own
results!</strong></p>
<p>Check out the INSTALL file; it has an example apache config and lists
the dependencies you'll need to install (note - only tested on RHEL 6,
will accept patches/pull requests if you get it running elsewhere
though).</p>
<p><strong>Ok, but I have my own backend server; can't I just provide my own
JSON feed?</strong></p>
<p>Yes! The manifest file (for building the menu on the "Custom Chart"
page) looks like
<a class="reference external" href="http://graphs-new.mozilla.org/api/test?attribute=short">http://graphs-new.mozilla.org/api/test?attribute=short</a> and the
individual test runs look like
<a class="reference external" href="http://graphs-new.mozilla.org/api/test/runs?id=16&branchid=1&platformid=12">http://graphs-new.mozilla.org/api/test/runs?id=16&branchid=1&platformid=12</a></p>
<p><strong>Ok! But I fixed/added/rewrote something, how can I send a patch?</strong></p>
<p>Excellent! Send me (<a class="reference external" href="http://github.com/rhelmer">http://github.com/rhelmer</a>) a pull request, or
file a bug at bugzilla in <a class="reference external" href="https://bugzilla.mozilla.org/enter_bug.cgi?product=Webtools&component=Graph%20Server&version=2.0">product Webtools component Graphserver
version 2.0</a>, and thanks for contributing!</p>
production graphs 2.0 server ready for use (2011-05-23), Robert Helmer

<p>Hello,</p>
<p>The 2.0 version of graphs.mozilla.org is ready for use:</p>
<p><a class="reference external" href="http://graphs-new.mozilla.org">http://graphs-new.mozilla.org</a></p>
<p>We're not quite ready to take over graphs.mozilla.org yet - the plan is
to do a phased rollout starting with this post, followed by advertising
the new URL on graphs.m.o, and finally taking over graphs.m.o and moving
the old server to graphs-old.m.o</p>
<p>This is the same version (with some minor tweaks based on feedback) as
described in this webdev blog post:</p>
<p><a class="reference external" href="http://blog.mozilla.com/webdev/2011/02/04/perfomatic2-0/">http://blog.mozilla.com/webdev/2011/02/04/perfomatic2-0/</a></p>
<p>The primary difference between this and the staging server (now at
graphs.allizom.org) is that graphs-new.m.o has realtime access to the
production DB rather than using a nightly snapshot. The dashboard images
are refreshed on 5-minute intervals, and custom charts are pretty much
real-time (though with several layers of caching they could be a few
minutes old in reality).</p>
<p>Thanks to everyone who has tested and provided feedback! More is
welcome, we plan to continue making incremental (and perhaps
not-so-incremental) improvements.</p>
<p>You can find more information at
<a class="reference external" href="https://wiki.mozilla.org/Perfomatic:UI">https://wiki.mozilla.org/Perfomatic:UI</a></p>
<p>Thanks!</p>
<p>rhelmer</p>
<p>P.S. one thing I should call out specifically - old-style graph URLs are
not compatible, primarily because the new graphserver automatically
shows the average of all machines in a platform rather than a separate
line for each, and the old-style URLs refer to individual machines. If
this is a show-stopper for anyone let's discuss, it's certainly in the
realm of possibility to support.</p>
Socorro development VMs available2011-05-18T12:03:00-07:002011-05-18T12:03:00-07:00Robert Helmertag:www.rhelmer.org,2011-05-18:/blog/socorro-development-vms-available.html<p>I have been working on a Vagrant virtual machine config for Socorro:</p>
<p><a class="reference external" href="https://github.com/rhelmer/socorro-vagrant">https://github.com/rhelmer/socorro-vagrant</a></p>
<p>Vagrant (<a class="reference external" href="http://vagrantup.com/">http://vagrantup.com/</a>) is a tool to automate setup of the
VirtualBox VMs. It uses puppet to set up and maintain the VM (the puppet
manifests are based on what we use at Mozilla for staging and
production). Puppet will install and set up all dependencies such as
HBase, Postgres, etc. and make sure the latest Socorro trunk is
installed and configured for your dev environment.</p>
<p>This is still a work-in-progress and I am hoping to get continuous
integration up soon (which will have the side-effect of generating
downloadable VM appliances!), so in the meantime please let me know how
it goes if you try it out.</p>
perf-o-matic 2.0 news2011-03-23T20:18:00-07:002011-03-23T20:18:00-07:00Robert Helmertag:www.rhelmer.org,2011-03-23:/blog/perf-o-matic-20-news.html<p>First of all, thank you for all the feedback for the <a class="reference external" href="http://blog.mozilla.com/webdev/2011/02/04/perfomatic2-0/">new perf-o-matic
2.0 interface</a>! It has been overwhelmingly positive and is utterly
invaluable.</p>
<p>To avoid the time and expense of having to wheedle information out of
me, here are the answers to some frequently asked questions:</p>
<ul class="simple">
<li>the data on staging updates every night from production, starting at
00:01 Pacific. This can take a few hours, so expect data to be spotty
until 3:00 or so.</li>
<li>staging has been moved to <a class="reference external" href="http://graphs.allizom.org">graphs.allizom.org</a> (the old advertised
link will redirect)</li>
<li>Still using <a class="reference external" href="https://wiki.mozilla.org/Perfomatic:UI">wiki.m.o/Perfomatic:UI</a> for tracking high-level issues,
but have started moving over to bugzilla and also triaging open bugs
targeted for the older (current) version</li>
<li>production machine acquired (<a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=627446">bug 627446</a>), waiting on database
access (<a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=642258">bug 642258</a>)</li>
</ul>
<p>Current plan is to have the new production server hosted at
graphs-new.m.o, which will have read-only access to production at first
(I'd like to get some good automated tests in place before we start
receiving data, since outages can mean tree closure).</p>
<p>Any thoughts? Feel free to comment here, ping me in irc, or <a class="reference external" href="https://bugzilla.mozilla.org/enter_bug.cgi?product=Webtools&component=Graph%20Server&version=2.0">file a bug
in product Webtools component Graphserver version 2.0</a></p>
gzip-encoding on tinderbox-stage needs testing2010-06-29T11:00:00-07:002010-06-29T11:00:00-07:00Robert Helmertag:www.rhelmer.org,2010-06-29:/blog/gzip-encoding-on-tinderbox-stage-needs-testing.html<p><a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=574524">Bug 574524</a> should make loading pages from Tinderbox much faster,
especially the brief and full log reports. If you use Tinderbox and are
interested in faster load times, please help test <a class="reference external" href="http://tinderbox-stage.mozilla.org/Firefox/">tinderbox-stage</a> and
comment in the bug if you think anything is broken due to this change.</p>
canvas love2009-02-18T23:39:00-08:002009-02-18T23:39:00-08:00Robert Helmertag:www.rhelmer.org,2009-02-18:/blog/canvas-love.html<p>I am reading about <a class="reference external" href="https://bespin.mozilla.com/">Bespin</a> all over the place, with a lot of focus on
SVG versus canvas, canvas not working in Internet Explorer, etc.</p>
<p>I don't know Bespin's plans in this area, but lots of projects which use
canvas (such as <a class="reference external" href="http://code.google.com/p/flot/">flot</a>) also test with and provide <a class="reference external" href="http://excanvas.sourceforge.net/">excanvas</a>, which
uses IE's <a class="reference external" href="http://en.wikipedia.org/wiki/Vector_Markup_Language">VML</a> support to provide the basic canvas API. I have read
that excanvas does not work in IE8's standards mode, however it does
work in quirks mode.</p>
<p>There seems to be lots of little explosions of creativity around the
combination of faster Javascript interpreters and canvas, like these
<a class="reference external" href="http://gyu.que.jp/jscloth/touch.html">"3D in 2D" demos</a>, <a class="reference external" href="http://box2d-js.sourceforge.net/">Box2D physics</a> (this works in IE8 thanks to
excanvas).</p>
<p>I've been working on a project which does graphs and other data
visualization in the browser. I ended up using jquery and flot although
<a class="reference external" href="http://raphaeljs.com/">raphael</a> (which uses SVG or VML, so supports IE) was in the running as
well. Working with raphael is neat because everything you create is a
DOM object so it's a lot like working with HTML, but in the end flot
just has many more out-of-the-box features like selection support,
timescales, and so on. Not having IE support is not an option, and I'd
rather not depend on Flash or another plugin if at all possible; I am
quite pleased that there are a ton of reasonable ways to achieve this
given those constraints.</p>
<p>I know this stuff is obvious to most of us around here, but I'm
surprised that excanvas doesn't come up more in these discussions. It is
obviously not as ideal as having honest-to-goodness canvas or SVG
support in all major browsers, but it's a very creative way to drag IE
along, putting a Javascript wrapper around their similar-but-different
native feature.</p>
Tinderbox2009-02-15T13:50:00-08:002009-02-15T13:50:00-08:00Robert Helmertag:www.rhelmer.org,2009-02-15:/blog/tinderbox.html<p>I have been meaning to respond to a bit of <a class="reference external" href="http://drkscrtlv.livejournal.com/302915.html#tinderbox">Aki's post</a> which <a class="reference external" href="http://roberthelmer.com/blog/?cat=9">linked
to me</a> a while back.</p>
<p>I totally agree on quite a bit, although I'd argue that unless someone
really steps up, takes a leadership role, and sets a clear future
direction, then sticking with Tinderbox indefinitely is going to
continue to give you diminishing returns. Tinderbox 1 has been in
maintenance mode for a very long time, although cls, bear and reed do a
great job of keeping it secure and limping along. Tinderbox 2 was
maintained by bear for a while when he was at OSAF but he suggested
Buildbot as a better alternative, and Tinderbox 3 looks like a great
proof of concept but has been inactive for a very long time.</p>
<p>I feel that it's better to contribute to an already active community
that has a lot of momentum behind it, instead of trying to build support
behind home-grown products like Tinderbox and Bonsai, given the amount
of work it is to build and maintain an active community and the current
state of these projects. There were no active competing projects when
these tools were released, and they really set the bar at a time when
"continuous integration" had yet to be coined. Overall they've been
hugely successful and delivered a lot of value to Mozilla and others,
but without a driving force behind new development, they are not keeping
up with demand. I could give you a bunch of little examples, but I think
that the fact that the "blame" column (which is a critical feature) has
been empty since the switch to hg says it all.</p>
<blockquote>
<p>rhelmer covered <a class="reference external" href="../?cat=9">the current tinderbox/buildbot split</a>, and is
among the voices I've heard/read calling for a move away from the
waterfall view, which I don't completely understand. I do understand
that the waterfall is far from ideal as a solitary view. But it does
represent the activity of builds and build machines over a brief
amount of time quite well. Even better when you have a guilty column
;-)</p>
<p>So, why not have both? Or multiple? Not to clutter, but to present
different ways of accessing the data. Each with their own strengths.</p>
</blockquote>
<p>I don't think that the waterfall is bad, it is actually quite brilliant
for certain use cases; however the waterfall is at one end of the
spectrum, with something like Dolske's <a class="reference external" href="http://isthetreegreen.com/">isthetreegreen.com</a> on the
other side, and things like <a class="reference external" href="http://tests.themasta.com/tinderboxpushlog/">tinderboxpushlog</a> somewhere in the middle.
So in essence I agree, but I think the waterfall is actually not that
useful in most cases. It's a pretty low-level, diagnostic type of
interface.</p>
<p>Why do people visit <a class="reference external" href="http://tinderbox.mozilla.org">Tinderbox</a>? Here is what I think:</p>
<ol class="arabic simple">
<li>Should I pull the tree ("Will It Build?")</li>
<li>Can I check in ("Is the tree open?")</li>
<li>Who broke the build (and how)?</li>
<li>Has there been a regression in performance or other metrics?</li>
</ol>
<p>Out of these, only the latter two are served by the waterfall, and
that's only a starting point for this kind of investigation (which the
waterfall does an OK job at).</p>
<p>I think that the first two are a much larger subset of users, and a huge
and complex display is actively hurting them. Regression hunters need a
much larger arsenal of tools, and the waterfall may not be the best
place for them to start, and certainly isn't the last place to visit
(they'll need build logs, graphs, etc.).</p>
<p>There's a ton of innovation going on around build and release right now,
for example I really like how <a class="reference external" href="https://hudson.dev.java.net/">Hudson</a> approaches the problems here,
and also has direct support for release processes. Like Buildbot, it
doesn't do everything Tinderbox does, and it has its own tradeoffs.
It's not a drop-in replacement for Tinderbox.</p>
<p>A drop-in replacement for Tinderbox is an interesting notion, but I
think it's worth taking a step back and figuring out if you're really
getting the value you could be. I think <a class="reference external" href="http://sethgodin.typepad.com/seths_blog/2009/02/solving-a-different-problem.html">this</a> says it better than I
can:</p>
<blockquote>
<p>The telephone destroyed the telegraph.</p>
<p>Here's why people liked the telegraph: It was universal,
inexpensive, asynchronous and it left a paper trail.</p>
<p>The telephone offered not one of these four attributes. It was far
from universal, and if someone didn't have a phone, you couldn't
call them. It was expensive, even before someone called you. It was
synchronous--if you weren't home, no call got made. And of course,
there was no paper trail.</p>
<p>If the telephone guys had set out to make something that did what
the telegraph does, but better, they probably would have failed.
Instead, they solved a different problem, in such an overwhelmingly
useful way that they eliminated the feature set of the competition.</p>
<p>The list of examples is long (YouTube vs. television, web vs.
newspapers, Nike vs. sneakers). Your turn.</p>
</blockquote>
making updates easier2008-07-30T13:37:00-07:002008-07-30T13:37:00-07:00Robert Helmertag:www.rhelmer.org,2008-07-30:/blog/making-updates-easier.html<p>For a few months now, I've been working in my spare time on a way to
make configuring and serving updates to Mozilla-based applications
easier.</p>
<p>Mozilla updates are <a class="reference external" href="http://wiki.mozilla.org/Software_Update:MAR">MAR</a> files, which are linked to by the <a class="reference external" href="http://wiki.mozilla.org/AUS">Automatic
Update Service</a> (aka AUS2). Several tools are involved in the making of
updates for production releases, chiefly <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/patcher/">Patcher</a>, driven by the
<a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation">release automation framework</a> for releases. Nightly updates use a
simpler script which automatically determines where builds should be
updated to; Patcher needs every update path to be explicitly specified
in its config file.</p>
<p>Both Patcher and the nightly script call the <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/update-packaging/">update-packaging tools</a>
to do the work of generating MAR files, which in turn use the "mar"
utility (supports tar-like arguments to manipulate MAR files, e.g. "mar
-t file.mar", "mar -x file.mar", etc.) and the "mbsdiff" utility, which
generates binary patches using a <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/other-licenses/bsdiff/">modified version</a> of <a class="reference external" href="http://www.daemonology.net/bsdiff/">bsdiff</a>.</p>
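<p>As a rough illustration, a thin Python wrapper around the "mar" utility might look like this (the -t and -x flags are from above; the -c flag and the helper names are assumptions for the sketch):</p>

```python
import subprocess

def mar_command(action, mar_file, *files):
    """Build an argument list for the "mar" utility, which takes
    tar-like flags: -t to list, -x to extract, -c to create (the -c
    flag is an assumption; -t and -x are mentioned above)."""
    flags = {"list": "-t", "extract": "-x", "create": "-c"}
    return ["mar", flags[action], mar_file, *files]

def run_mar(action, mar_file, *files):
    # Requires the mar binary on PATH; shown for illustration only.
    return subprocess.run(mar_command(action, mar_file, *files),
                          capture_output=True, text=True)
```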
<p>The update-packaging tools are in <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=444050">need of a makeover</a> too, but that is
a story for another day.</p>
<p>Getting back to how updates are served - Patcher's other job is to
generate thousands of text files, which are used to configure AUS. Every
possible update path, like <a class="reference external" href="https://aus2.mozilla.org/update/3/Firefox/3.0b3/2008020514/WINNT_x86-msvc/ar/beta/update.xml">this one for 3.0b3</a>, is actually generated
dynamically from two text files (partial.txt and complete.txt) which
reside in a directory layout similar to, but in a slightly different
order than, the information in that URL
(.../product/version/buildid/buildTarget/locale/channel/update.xml).
These complete.txt and partial.txt files have gone through two revisions
of their file format: in the first, variables for the generated XML
(updateType, URL to the MAR file, etc.) appear on specific line numbers;
in the second ("version=1"), key/value pairs are used.</p>
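<p>Unpacking the components of an update URL is straightforward; here is a minimal Python sketch assuming the path shape described above:</p>

```python
from urllib.parse import urlsplit

# Path order taken from the URL above:
# .../update/<schema-version>/product/version/buildid/buildTarget/locale/channel/update.xml
FIELDS = ("product", "version", "buildid", "build_target", "locale", "channel")

def parse_aus_url(url):
    """Split an AUS update URL into its named path components."""
    parts = urlsplit(url).path.strip("/").split("/")
    # Drop the leading "update/<schema-version>" prefix and the
    # trailing "update.xml" file name.
    assert parts[0] == "update" and parts[-1] == "update.xml"
    return dict(zip(FIELDS, parts[2:-1]))
```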
<p>AUS2 configuration files only reflect the current state of the system;
for releases the history is in <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/patcher-configs/moz19-branch-patcher2.cfg">Patcher config files</a>
(Config::General). The release automation scripts automatically update
and check this file into CVS, so it's not too painful to deal with in
most situations. There are some <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=384065">outstanding bugs</a> but overall it does
what it is supposed to do.</p>
<p>However, it took me a very long time to get a handle on the above, and I
think the separation between Patcher and the AUS server is not very
useful. In fact, the method of explicit updates for all is downright
unhelpful; with every single release (e.g. 2.0.0.15), the following happens:</p>
<ol class="arabic simple">
<li>partial updates are generated from 2.0.0.14->2.0.0.15</li>
<li>every previous release (2.0.0.[1,2,3,4,...]) is pointed to the same
2.0.0.15 update</li>
</ol>
<p>That means generating and publishing two text files for each (release *
platform * locale) combination, which all contain exactly the same
data. Also I think that taking a hint from the way the nightly system
works would be useful here; 2.x should automatically point to the latest
*unless* explicitly overridden, it should not require explicit
configuration to do the norm. Finally, the nightly and production system
should not be so different; every nightly update is a lost opportunity
to test pre-releases of the production system, and having forked systems
is bad for bugfixing and feature porting (note that there are no nightly
updates for locales other than en-US, for example).</p>
<p>So, I've been thinking for a long time about how to make tools that are
easier to use, understand and extend. One idea is to have the AUS server
configuration be a database, not a giant tree of text files, and have
the data in one place (not stored in a config file which is expanded to
a giant tree of text files by a separate app). Another is to provide a
simple API, and a few command line tools which use this API to modify
update data and export it.</p>
<p>The conceptual model right now is that each release contains one update,
which contains two patches (one partial, one complete). Both the
database schema and the API reinforce this model.</p>
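<p>Sketched in Python, that model might look like this (the class and field names are illustrative, not the actual aus.py classes):</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Patch:
    patch_type: str  # "partial" or "complete"
    url: str         # where the MAR file is served from
    size: int = 0

@dataclass
class Update:
    update_type: str  # e.g. "minor"
    patches: List[Patch] = field(default_factory=list)

@dataclass
class Release:
    product: str
    version: str
    # each release contains one update, which contains two patches
    update: Optional[Update] = None
```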
<p>Here's what I have working so far. In case it's not obvious, this is
most definitely an early "throw the first one away" prototype:</p>
<ul class="simple">
<li>an <a class="reference external" href="http://roberthelmer.com/svn/scripts/trunk/aus/aus.py">API</a> for dealing with updates, in Python (Release, Update, Patch
classes)</li>
<li>a <a class="reference external" href="http://roberthelmer.com/svn/scripts/trunk/aus/database.py">simple database layer</a> for storing and retrieving these objects
from a MySQL database</li>
<li>an <a class="reference external" href="http://roberthelmer.com/svn/scripts/trunk/aus/input/files.py">import plugin for AUS2 configuration</a>, and an <a class="reference external" href="http://roberthelmer.com/svn/scripts/trunk/aus/output/files.py">export plugin to
straight update.xml</a> files</li>
</ul>
<p>The schema is based on <a class="reference external" href="http://svn.mozilla.org/projects/aus/trunk/sql/aus.sql">Lars' fine work</a> on the <a class="reference external" href="http://wiki.mozilla.org/AUS:v3">subject</a>, although I
did <a class="reference external" href="http://roberthelmer.com/svn/scripts/trunk/db/aus.sql">modify it slightly</a>. This schema is not totally done yet either,
for example foreign keys aren't actually hooked up, but there's enough
there to see that it works. There's a <a class="reference external" href="http://roberthelmer.com/svn/scripts/trunk/aus/run.py">run.py</a> command in that
directory that calls the importer and exporter correctly.</p>
<p>This means that you can read existing AUS2 data into a database (if you
have it), and create or manipulate update information using the API from
Python (or directly with SQL, if you like). You can generate update.xml
files and put them straight onto a webserver.</p>
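<p>For instance, an export plugin could render an update.xml along these lines (the attribute names are modeled loosely on what AUS serves and should be treated as an approximation, not the exact schema):</p>

```python
import xml.etree.ElementTree as ET

def render_update_xml(update_type, version, patches):
    """Render a minimal update.xml document. Attribute names here
    approximate the AUS output; verify against the real schema."""
    updates = ET.Element("updates")
    update = ET.SubElement(updates, "update",
                           type=update_type, version=version)
    for p in patches:
        ET.SubElement(update, "patch", type=p["type"], URL=p["url"],
                      hashFunction="sha512", hashValue=p["hash"],
                      size=str(p["size"]))
    return ET.tostring(updates, encoding="unicode")
```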
<p>What I've put together needs quite a lot more work, but I wanted to open
it up for comment. Here's what I think is remaining, at least:</p>
<ul class="simple">
<li>database should hold the history of updates, not just the current
state</li>
<li>need a web service which talks directly to the database, as an
alternative to pre-generating all update.xml files.</li>
<li>should use existing libs for the DB ORM (SQLAlchemy maybe?),
generating XML, etc. not the home-grown things I threw together</li>
<li>I think it would be advantageous to make the model/schema/API more
sophisticated and normalized (e.g. updates could belong in multiple
channels), but I don't want to go beyond the essentials quite yet.</li>
<li>the new update-packaging tools should be able to read data from this
system in order to automatically determine the appropriate "from"
release to base partial MARs on, and also there should be some way to
register that new updates are available, that access would be
internal and append-only (e.g. only needs SELECT, INSERT).</li>
</ul>
<p>I think that to solve the first, update paths should be explicitly
configured once, but there needs to be business logic in the server app
(or update.xml file generator) which overrides this when a newer release
is available. For instance, if a user is on version 1.0 and version 1.1
is available which has a partial for 1.0, then the partial 1.0->1.1
should be served. However, if version 1.2 is available, then the
complete 1.0->1.2 update should be served.</p>
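<p>That business logic is simple enough to sketch directly (a toy version: it assumes dotted-integer version strings, and a hypothetical mapping from each release to the set of versions its partial MAR can upgrade from):</p>

```python
def choose_update(current, releases):
    """Pick the update to serve, per the rule above: always serve the
    newest release, using its partial if one exists for the user's
    current version and its complete otherwise. "releases" maps
    version -> set of versions its partial MAR upgrades from."""
    key = lambda v: tuple(int(x) for x in v.split("."))
    newest = max(releases, key=key)
    if key(newest) <= key(current):
        return None  # already up to date
    if current in releases[newest]:
        return ("partial", newest)
    return ("complete", newest)
```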
<p>The second problem has more to do with the burden inherent in handling
tens of thousands of text files (e.g. backing them up or restoring them
can take a very long time), although I believe that it is useful to have
the option to pregenerate the path/update.xml files, especially for
people not pushing as many updates as mozilla.org does each release.</p>
<p>Anyway, comments welcome! Certainly feel free to nudge me if it looks
like I'm going off the rails here, but I think this approach could make
things a little better in update-land. I'll take patches too, but if
anything serious comes of this I'll probably clean up and move over to
Mozilla's repo, and rewrite a bunch, so don't take the current
implementation too seriously..</p>
tinderbox json examples back online2008-07-15T12:17:00-07:002008-07-15T12:17:00-07:00Robert Helmertag:www.rhelmer.org,2008-07-15:/blog/tinderbox-json-examples-back-online.html<p>Thanks to the intrepid Mozilla IT Team (in particular Trevor and Justin)
for sending me the contents of people.mozilla.com/~rhelmer, I now have
the Tinderbox JSON examples back online.</p>
<p>Since it's on my own server now and I have to pay for the bandwidth, I
am not auto-refreshing the data anymore, because I don't want people
actually using it :) Maybe I can hook up some kind of access to a
Mozilla community server, I'll look into this later.</p>
<p>Here is the <a class="reference external" href="http://roberthelmer.com/mozilla/mockups/tinderbox/ajax.html">AJAX example</a>, which apparently still works :). The <a class="reference external" href="http://roberthelmer.com/mozilla/mockups/perf/">Perf
example</a> which uses the tboxJsonApi is apparently borken :( I did a
little debugging on it last night; not sure where it's breaking yet, but
it's probably the assumptions that my lame-o regex parsers use.</p>
<p>Anyway, I know that at least Cesar is working on stuff that uses this
data, and I'd like to continue to <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=445041">make it better</a> so file bugs.</p>
releases on tap2008-07-10T23:54:00-07:002008-07-10T23:54:00-07:00Robert Helmertag:www.rhelmer.org,2008-07-10:/blog/releases-on-tap.html<p>One of the things that was pounded into me while working at MoCo is the
idea of having a bug tracker and using it. I literally can't work
without one anymore. It's the first thing I really pushed for at my new
job (they were using various ad-hoc systems for project management, but
not a real bug tracker for the software dev side). I've realized that I
just can't keep everything in my head, various notepads and text files,
etc. and expect to get anything done, or let anyone know what my
priorities are.</p>
<p>In return, I really tried to hammer in the idea of fast, automated
release cycles. We spent a lot of time (and the release engineering team
does still spend a lot of time) wrapping the build system and other
tools so that they can be run and the output verified automatically,
chasing that ideal of the <a class="reference external" href="http://www.formula1.com/news/features/2008/7/8015.html">Formula One-style hand-off</a> to QA and to the
users.</p>
<p>The way releases work now is incredible, just night and day from when I
started at MoCo a little over two years ago. However, there's one thing
that's always bugged me, and since I just had the opportunity to set up
an automated build/release environment, I thought I'd expound a little
bit on it.</p>
<p>The one thing is that nightly builds of Firefox just aren't the same as
the release builds. The way updates work is different, branding is
turned on, bits are signed (on Windows), the directory structure for
files is different. Firefox releases are actually rebuilt from source
for each release.</p>
<p>So what? None of these, even added up, are a big deal, right? Obviously
releases work fine, and there are a ton of great people (and the tools
they've made) that make sure that nothing is missed because of this. But
wouldn't it be great if we could just take the nightly updates and
builds that have already been put through the wringer by thousands of
people, and give those straight to QA? Or if we can't have that, how
about at least have the release builds put through the same tests and
available to QA immediately after checkin?</p>
<p>Am I pushing some fanciful, architecture-astronaut utopian vision? I
don't think so, because this is how I've done releases in the past, and
this is how I do releases now. Let me tell you about it.</p>
<p>I use <a class="reference external" href="https://hudson.dev.java.net/">Hudson</a>, which I can't recommend highly enough (well, if you're
not allergic to Java, I guess). It makes this kind of process easy. It's
not necessary to use it to achieve this of course, I'm just throwing
this out as a data point.</p>
<p>On each checkin:</p>
<ul class="simple">
<li>a unique build number is generated</li>
<li>a new build is generated (I also have it run unit tests, and install
the software to run functional tests)</li>
<li>release files and other artifacts like build logs are archived, and
checksums of the files are stored</li>
<li>if anything goes wrong, the team and the developer who checked in the
latest change are notified</li>
</ul>
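<p>The checksum step, for example, is only a few lines of Python (a sketch with illustrative names, not Hudson's actual implementation):</p>

```python
import hashlib
import pathlib

def checksum_artifacts(paths):
    """Record SHA-1 checksums for archived build artifacts, one step
    of the per-checkin pipeline sketched above."""
    sums = {}
    for p in map(pathlib.Path, paths):
        sums[p.name] = hashlib.sha1(p.read_bytes()).hexdigest()
    return sums
```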
<p>The software is available to QA as soon as this automated process is
complete. When it's time to release, I can tag the build via the web UI
(although it's easy enough to do outside of Hudson if you have the build
number, which in turn contains the branch/datestamp/revision info
needed).</p>
<p>Having the next release always "on tap" makes it easy for me to largely
ignore the build/release side of things, and focus on developing
software, writing tests, and tracking down problems.</p>
<p>Now, Mozilla's situation is way more complicated, which I alluded to a
bit earlier. This post isn't a "see what I can do!" rant as much as a
"look what's possible!" idea. I think that this kind of setup is totally
doable for Mozilla's products, but there are some serious issues:</p>
<ul class="simple">
<li>branding is turned on at compile time. having nightly builds not
called "Firefox" is a *good* thing, as otherwise end-users would be
very confused.</li>
<li>"--enable-tests", needed for unit tests, cannot be run in release
builds at the moment (for technical reasons outside the scope of this
post; I'm sure there are bugs on this)</li>
<li>release builds are signed and have a different filename format and
directory structure (e.g. "firefox-3.0.pre.en-US.win32.installer.exe"
for nightly versus "3.0/win32/en-US/Firefox Setup 3.0.exe")</li>
<li>release builds are cryptographically signed, to assure users that
these files really were created by MoCo (regardless of what mirror or
download site they may have come from).</li>
<li>nightly updates are only for en-US, and use a different set of tools
to generate updates, and a different mode of the update server to
serve updates (some ideas for fixing these problems are in <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=410806">bug
410806</a>, but again this is outside of the scope of this post)</li>
</ul>
<p>So all of these are either good things (branding, signing, etc.) or
technical issues that could surely be fixed (nightly updates, unit
tests). Arguably, nightly users and release users tend to be very
different people, with very different needs and expectations, so all of
the "intentional problems" here are really features. As far as I can
see, this eliminates the possibility that Firefox release engineers
could take a nightly build and ship it as a release build.</p>
<p>Even if the branding issue were solved (e.g. by repackaging), signing
still needs to be done, partial diff files would need to be regenerated,
and probably other things that I'm overlooking. The automated tests that
were run on the nightlies may not be applicable. You may scoff at the
paranoia, but there was a <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=340976">bug</a> regarding the size of the Vista icon in
official branding, found late in the Fx3 beta cycle, which caused a
bunch of grief. That situation was improved by making a Minefield
version of the same icon, which is a good fix, but I think my point
still stands.</p>
<p>Here's another option - why not create a real, honest-to-god Firefox
release build on each checkin (or at least alongside each nightly
build)? That would at least make it available to QA as soon as humanly
possible, and it could probably be opened up somehow to interested
community testers (human-triggered builds already work this way; they
are just put into a special area).</p>
<p>Maybe I'm just spoiled from working on tiny little projects, but I
think even the already super-fast and extensively tested Firefox
releases could be made super-faster and the tests more extensive still,
while freeing the release engineers from the need to babysit the One
and Final Release Build.</p>
on moving to buildbot for reals2008-04-08T01:32:00-07:002008-04-08T01:32:00-07:00Robert Helmertag:www.rhelmer.org,2008-04-08:/blog/on-moving-to-buildbot-for-reals.html<p>People are often very confused by the state of where Mozilla is with
regard to Tinderbox versus Buildbot. They are both continuous
integration systems, and you'd think that just jumping wholesale would
be easier than the unholy marriage I've described in the past.</p>
<p>The big distinctions are these:</p>
<ul class="simple">
<li>server vs …</li></ul><p>People are often very confused by the state of where Mozilla is with
regard to Tinderbox versus Buildbot. They are both continuous
integration systems, and you'd think that just jumping wholesale would
be easier than the unholy marriage I've described in the past.</p>
<p>The big distinctions are these:</p>
<ul class="simple">
<li>server vs. client - Buildbot clients and server are tightly coupled,
and communicate over an active TCP connection (managed by
Twisted). Tinderbox clients simply send email to the server: one
message for build start and one for build stop (the build-stop message
specifies the status, which changes the column color on the Tinderbox
server). The logfile for the build may be attached to the "end"
email.</li>
<li>Tinderbox server vs. Buildbot server - tinderbox.mozilla.org puts up
with a lot of load, which the Buildbot server probably could not
handle. Also, the Tinderbox server has a bunch of features that Mozilla
developers depend on, like setting status, etc.</li>
</ul>
<p>Personally I feel that Tinderbox is the wrong way to visualize what
developers actually need, but I'll save that for a later and more
productive post :) For now, suffice to say that Tinderbox server does a
lot more and can handle way more load than Buildbot server.</p>
<p>However, Buildbot server does have some very nice qualities, like being
able to see the log in real-time, and being able to stop and force
builds. So, an interim solution is to have Buildbot server send email to
Tinderbox server on behalf of its clients, so you get Buildbot as an
administrative, developer-only interface, and Tinderbox server as the
general, public interface.</p>
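<p>The two-mail protocol is simple enough to sketch. Here's a minimal, hypothetical illustration in Python of the kind of status-mail body a bridge like this would send on a client's behalf; the field names are loosely patterned on Tinderbox v1's "tinderbox:" body headers, not taken from the real code:</p>

```python
def tinderbox_mail(tree, buildname, status, starttime, logfile=None):
    """Format the body of a Tinderbox-style status mail.

    status is "building" for the start mail, or a final state such as
    "success" / "busted" / "testfailed" for the end mail.
    """
    lines = [
        "tinderbox: tree: %s" % tree,
        "tinderbox: builddate: %d" % starttime,
        "tinderbox: status: %s" % status,
        "tinderbox: build: %s" % buildname,
        "tinderbox: END",
    ]
    body = "\n".join(lines)
    if logfile is not None:
        # the "end" mail may carry the build log after the headers
        body += "\n" + logfile
    return body

# start mail when the Buildbot build begins...
start_mail = tinderbox_mail("Mozilla1.8", "Linux nightly", "building", 1207000000)
# ...and end mail, with status and log, when it finishes
end_mail = tinderbox_mail("Mozilla1.8", "Linux nightly", "success", 1207000000,
                          logfile="make[1]: Leaving directory")
```

<p>The point of the bridge is that Tinderbox server only ever sees these two mails per build, so it never needs to know Buildbot is in the middle.</p>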
<p>The 1.8 and 1.9 nightly builders are already exposed to nightly users;
there are a couple kinks to work out, so I won't link to it right now
(I'll let the people that are actually maintaining it do that :P), but
the glorious future is that developers can stop and kick builds as well
as see real-time logs.</p>
<p>So, that's all well and good, and I think fairly well understood. Now
here's the hairy part - the 1.8 and 1.9 nightly Buildbot clients are
turning around and calling Tinderbox! WTF! (note that the unittest and
moz2 buildbots do not do this, only the 1.8/1.9 nightly boxes). This is
because Tinderbox client contains code to do a bunch of things:</p>
<ul class="simple">
<li>mozilla-specific build process</li>
<li>performance testing</li>
<li>create updates</li>
<li>publish updates (nightly AUS only)</li>
<li>rebooting Windows 9x between builds (not joking)</li>
<li>support for a bajillion products and platforms (mostly through huge
"if" blocks)</li>
<li>support for hybrid depend/clobber builders</li>
<li>support for uploading to various locations on FTP</li>
<li>much, much more</li>
</ul>
<p>Some of these features are very useful and not available elsewhere, and
some are obviously not useful anymore. The error and log handling leaves
a lot to be desired; it's not something trivially fixable, unfortunately
(lots of people have tried, resulting in not one but two attempted
rewrites).</p>
<p>Getting all of the useful bits of this into Buildbot has been a real
challenge, but <a class="reference external" href="http://blog.mozilla.com/bhearsum">Ben Hearsum</a> has all of the important bits worked out
for moz2. I'm hoping to spend some time packaging that up as a
BuildFactory, to make it easy to reuse this code for other branches and
products (mostly because I'd really like to see bug <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=421586">421586</a> get
fixed), strictly as a community member of course :)</p>
<p>You can read more about Buildbot <a class="reference external" href="http://buildbot.net/repos/release/docs/buildbot.html#GNUAutoconf">process-specific factories</a> (that's a
nice example, shipped with Buildbot, of what a GNU Autoconf-style
project could use), but suffice to say it's a way of encapsulating the basic
build process, so you don't need to copy and paste "cvs co client.mk"
and "make -f client.mk MOZ_CO_PROJECT=blah" for each builder in your
Buildbot master.cfg.</p>
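<p>To make that concrete, here's a toy sketch of what such a factory could encapsulate. It deliberately mimics Buildbot's BuildFactory interface in miniature rather than importing Buildbot, so the class name, step shapes, and branch/project values are illustrative assumptions, not the real moz2 code:</p>

```python
class MozillaBuildFactory:
    """Collects the checkout/build steps every Mozilla builder repeats,
    so master.cfg instantiates this instead of copy-pasting steps."""

    def __init__(self, project="browser", branch="HEAD"):
        self.steps = []
        # check out just client.mk, which drives the rest of the checkout
        self.addStep("cvs co -r %s mozilla/client.mk" % branch)
        self.addStep("make -f client.mk checkout MOZ_CO_PROJECT=%s" % project)
        self.addStep("make -f client.mk build")

    def addStep(self, command):
        self.steps.append(command)

# each builder just instantiates the factory with its own parameters
firefox_18 = MozillaBuildFactory(project="browser", branch="MOZILLA_1_8_BRANCH")
```

<p>In real Buildbot the steps would be ShellCommand instances rather than strings, but the reuse idea is the same: one factory, many builders.</p>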
<p>This brings up the other big missing piece, which is that Buildbot's
awesome Source class can't be used because it doesn't understand that it
can't just update the whole "mozilla" CVS module, but needs to use the
client.mk instead. This means that built-in clobber support and the
built-in "tryserver" support can't be used (the current Mozilla
implementations have a lot of custom code).</p>
<p>Bug <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=414031">414031</a> suggests a possible way to implement support for it.
Although it's kind of a pain to implement, using a driver script like
this is fairly common in Java projects, so I think some kind of generic
support might be feasible.</p>
<p>If you're not sure what I'm talking about here and why Source can't be
used out of the box: client.mk only does a partial checkout of the
"mozilla" CVS module, depending on which MOZ_CO_PROJECT is specified.
Also, it can and does check out different versions of subdirectories,
such as NSPR and NSS.</p>
<p>In other words, this is not your typical "checkout module && ./configure
&& make" project, although it is deceptively close in some ways :) It'd
probably be better to have basic support for this flow, just on the
principle of least surprise. I think it also has a material effect on
tool support and on new developers.</p>
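<p>A small sketch of why this flow resists a generic Source step: the command sequence depends on MOZ_CO_PROJECT and on clobber-versus-depend mode, so there's no single "update the module" command to run. The helper below is hypothetical; the make invocations follow the client.mk usage described above:</p>

```python
def checkout_commands(project, mode="clobber"):
    """Hypothetical helper: the command list a client.mk-aware checkout
    step would run, which a generic Source step has no way to express."""
    cmds = []
    if mode == "clobber":
        # start from scratch rather than updating in place
        cmds.append("rm -rf mozilla")
        cmds.append("cvs co mozilla/client.mk")
    # client.mk decides which subset of the "mozilla" module to pull,
    # and which versions of subdirectories like NSPR/NSS to use
    cmds.append("make -f client.mk checkout MOZ_CO_PROJECT=%s" % project)
    return cmds
```

<p>A driver script like the one suggested in bug 414031 would effectively own this decision, leaving Buildbot to run one opaque command per step.</p>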
rel-o-mation slideware!2008-04-04T10:39:00-07:002008-04-04T10:39:00-07:00Robert Helmertag:www.rhelmer.org,2008-04-04:/blog/rel-o-mation-slideware.html<p>I put <a class="reference external" href="http://people.mozilla.org/~rhelmer/presentations/2008Apr01_release_automation/slides.html">this set of slides</a> together to explain what state the release
automation project is in. It probably makes more sense when I am sitting
there to explain what each point means, but I figured I'd put it out
there anyway :)</p>
<p>The current setup mimics ye olde manual release …</p><p>I put <a class="reference external" href="http://people.mozilla.org/~rhelmer/presentations/2008Apr01_release_automation/slides.html">this set of slides</a> together to explain what state the release
automation project is in. It probably makes more sense when I am sitting
there to explain what each point means, but I figured I'd put it out
there anyway :)</p>
<p>The current setup mimics ye olde manual release process, forged by
Chase. Over the past few years we've worked on wrapping that process in
scripts with this <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/release">perl framework</a> (aka "bootstrap"), which
auto-generates configs for underlying systems like <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/tinderbox/">tinderbox</a> and
<a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/patcher/">patcher</a>, checks logs for errors, etc.</p>
<p>A lot of the current bugs come from underlying systems, especially the
tinderbox client. Reducing some of the complexity here would both make
the system more understandable and most likely faster as well. It's
pretty tough to make changes when you're doing this level of wrapping,
too.</p>
<p>Now that everything is driven by Buildbot, it probably makes the most
sense to call the build system directly, instead of the
buildbot->bootstrap->tinderbox->build_system chain we have today. There
are bugs on file for all of this already; hopefully the slides and this
post will make it clearer how they tie together.</p>
Breakout!2008-03-19T14:45:00-07:002008-03-19T14:45:00-07:00Robert Helmertag:www.rhelmer.org,2008-03-19:/blog/breakout.html<p>I was working through some <a class="reference external" href="http://pygame.org">Pygame</a> tutorials last week and thought
it'd be fun to see if Canvas/JS was fast enough in Fx3 to do some simple
games.</p>
<p>So, I spent a couple evenings last weekend and made a really dumb Sprite
class, and stole some reasonable "breakout physics …</p><p>I was working through some <a class="reference external" href="http://pygame.org">Pygame</a> tutorials last week and thought
it'd be fun to see if Canvas/JS was fast enough in Fx3 to do some simple
games.</p>
<p>So, I spent a couple evenings last weekend and made a really dumb Sprite
class, and stole some reasonable "breakout physics" from <a class="reference external" href="http://www.scriptedfun.com/arinoid-an-arkanoid-clone/">this
tutorial</a> to make this <a class="reference external" href="http://roberthelmer.com/src/js/jsgames/breakout.html">Breakout clone in JS</a>.</p>
<p>The collision detection for the bricks is a little sloppy (there's a
little damage on bricks from time to time) and I haven't done any perf
work yet, but it seems to work ok in Fx3 nightlies on my MBP. Safari
works ok too, just not quite as fast.</p>
<p>Any activity in other tabs seems to have a huge impact on performance;
there's probably a better way to do the sprite maneuvers etc., but I've
only had a few hours to spend on this so far. Pointers welcome :)</p>
moving 1.8 nightlies to release machines March 5 20082008-03-04T12:34:00-08:002008-03-04T12:34:00-08:00Robert Helmertag:www.rhelmer.org,2008-03-04:/blog/moving-18-nightlies-to-release-machines-march-5-2008.html<p>As previously announced on Tinderbox and planet, we're migrating nightly
production to running on the same machines as release production.</p>
<p>On the moz1.8 branch, we've been running the new nightlies in parallel
with the "traditional" nightlies since Feb 15 2008, and are going to
switch over live tomorrow.</p>
<p>The new …</p><p>As previously announced on Tinderbox and planet, we're migrating nightly
production to running on the same machines as release production.</p>
<p>On the moz1.8 branch, we've been running the new nightlies in parallel
with the "traditional" nightlies since Feb 15 2008, and are going to
switch over live tomorrow.</p>
<p>The new machines:</p>
<ul class="simple">
<li>production-pacifica-vm</li>
<li>production-prometheus-vm</li>
<li>bm-xserve05</li>
</ul>
<p>The old machines:</p>
<ul class="simple">
<li>pacifica-vm</li>
<li>prometheus-vm</li>
<li>bm-xserve02</li>
</ul>
<p>Starting tomorrow, the performance machines will begin following the new
machines. The new machines will publish updates and nightly builds to
the usual location, and the old machines will be disabled (but kept
around for a while, just in case).</p>
<p>If there is a reason that we should not proceed, or if you see any
problems after the migration, please update bug 417147 or email
<a class="reference external" href="mailto:build@mozilla.org">build@mozilla.org</a>.</p>
<p>Thanks!</p>
<p>Rob</p>
moving nightly Mozilla1.8 Firefox to release automation system2008-02-14T19:35:00-08:002008-02-14T19:35:00-08:00Robert Helmertag:www.rhelmer.org,2008-02-14:/blog/moving-nightly-mozilla18-firefox-to-release-automation-system.html<p>I've just enabled nightly builders from the release automation system on
the Mozilla 1.8 tree (see <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=417147">bug 417147</a> for details).</p>
<p>I've blogged on this previously, but just to reiterate some of the
reasons:</p>
<ul class="simple">
<li>unify the (very fragmented) nightly and final release processes
(tools, procedure, etc.)</li>
<li>move away from Tinderbox …</li></ul><p>I've just enabled nightly builders from the release automation system on
the Mozilla 1.8 tree (see <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=417147">bug 417147</a> for details).</p>
<p>I've blogged on this previously, but just to reiterate some of the
reasons:</p>
<ul class="simple">
<li>unify the (very fragmented) nightly and final release processes
(tools, procedure, etc.)</li>
<li>move away from Tinderbox client to Buildbot</li>
<li>use the same set of machines for both nightly and release</li>
</ul>
<p>The first point is a really big one for me. Using totally different
tools for nightly and release means that we don't get much testing of
our release-only procedures and tools, so we often hit unexpected bugs
on release day. It also leaves nightly users without the benefits we
provide for releases: automated update verification, updates for all
locales, thorough error checking and monitoring of build machines, and
automated staging runs before pushing changes live, for a start.</p>
<p>The current setup still uses Tinderbox; it's just being invoked by
Buildbot, so developers should notice no change besides new hostnames.
We're trying this out on the 1.8 branch before we tackle 1.9. So far it
has been quite smooth, but please let us know if you notice anything
out of the ordinary. We have not switched over the perf tests yet, but
we don't expect the results to change (although we may want to merge
some graphs for developer convenience, etc.). This will happen before
the old machines are turned off.</p>
<p>We're planning on turning off the older 1.8 builders sometime after
February 25th, so please do let us know if you see any problems. I've
left a note with the names of the new builders at the top of the
<a class="reference external" href="http://tinderbox.mozilla.org/Mozilla1.8">Mozilla1.8 Tinderbox tree</a>.</p>
<p>This is only one tiny step towards improving life for the
build&release group as well as for developers and nightly testers, but
it's quite significant from an infrastructure point of view, and has
been brewing for a long time. I'm not sure what the next steps are
going to be, but I've written up <a class="reference external" href="http://wiki.mozilla.org/User:Rhelmer:Migrating_Tinderbox_to_Buildbot">some thoughts</a> on where I think we
should go and why.</p>
tinderboxJsonApi 0.12008-01-17T00:56:00-08:002008-01-17T00:56:00-08:00Robert Helmertag:www.rhelmer.org,2008-01-17:/blog/tinderboxjsonapi-01.html<p>Many people have told me that they were excited about the JSON Tinderbox
feed, but were quickly discouraged from doing anything fun due to the
scary data structure that it presents; it's a straight dump of what the
server uses, and is obviously optimized towards making a waterfall
display (plus …</p><p>Many people have told me that they were excited about the JSON Tinderbox
feed, but were quickly discouraged from doing anything fun due to the
scary data structure that it presents; it's a straight dump of what the
server uses, and is obviously optimized towards making a waterfall
display (plus, it's just plain weird).</p>
<p>I set up an <a class="reference external" href="http://people.mozilla.org/~rhelmer/mockups/tinderbox/ajax.html">enhanced waterfall</a> as an example a while back, but it's
really hard to take it further without spending a lot of time digging
around inside the tinderbox_data object.</p>
<p>I've often wished that I could just sort by column in Tinderbox, so
instead of doing yet another one-off script I put together a little web
app that gives you a sortable table of the latest (non-talos) perf data:
<a class="reference external" href="http://people.mozilla.org/~rhelmer/mockups/perf/">Analysis paralysis</a></p>
<p>Click on the headers, and you get data sorted by your criteria. The data
is real-time, but does not auto-reload.</p>
<p>I started to hit a wall almost immediately due to the machinations
required for the tinderbox_data structure, so I stepped back and took
some time to write a <a class="reference external" href="http://people.mozilla.org/~rhelmer/mockups/perf/tboxJsonApi.js">tboxJsonApi.js</a> instead of dealing directly with
the data from Tinderbox. This lets me write code like:</p>
<pre class="literal-block">
&lt;script src="http://tinderbox.mozilla.org/Firefox/json.js"&gt;&lt;/script&gt;
&lt;script&gt;
tree = new Tree(tinderbox_data);
builds = tree.getBuilds();
for (i in builds) {
    build = builds[i];
    build.getName();
    build.getStartTime();
    build.getStatus();
}
&lt;/script&gt;
</pre>
<p>You can get checkins for a particular build, or test results (the scraped
data is processed; right now it only supports anchor tags with "key:
value" format link text, which is why Talos isn't yet supported).</p>
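<p>As a sketch of that scraping rule (in Python for brevity; the real tboxJsonApi.js is JavaScript, and the regex and names here are mine):</p>

```python
import re

# anchor tags whose link text looks like "key: value" become dict entries
ANCHOR = re.compile(r'<a[^>]*>([^<:]+):\s*([^<]+)</a>')

def parse_results(html):
    """Extract "key: value" pairs from anchor link text in a status cell."""
    return dict((k.strip(), v.strip()) for k, v in ANCHOR.findall(html))

# a made-up Tinderbox status cell with two perf numbers
row = '<a href="#">Ts: 1450ms</a> <a href="#">Tp: 812ms</a>'
```

<p>Talos links don't match this shape, which is why they fall through the scrape for now.</p>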
<p>There's a bunch more stuff I want to do before this will be generally
useful to me, e.g. CSV export, merging all build, perf and test data for
a checkin into one row, etc., but I think it's obvious that we could
have more useful tools for tracking and analyzing the absolute mountain
of data that mozilla.org produces every day.</p>
<p>Let me know if you find this useful, and/or have any questions or ideas
for improvements. I was able to throw this all together in a few hours
this evening, because I spent so much less time wrestling with data
structures and more time modeling the kind of app I wanted.</p>
summarizing build-on-checkin feedback2008-01-09T23:08:00-08:002008-01-09T23:08:00-08:00Robert Helmertag:www.rhelmer.org,2008-01-09:/blog/summarizing-build-on-checkin-feedback.html<p>Lots of feedback on the build-on-checkin idea in my blog, the newsgroup,
and especially joduinn's <a class="reference external" href="http://oduinn.com/2008/01/04/build-always-vs-build-on-checkin/">recent post</a> on the subject. The primary
concerns seem to be:</p>
<ul class="simple">
<li>we need as many performance tests per checkin as possible</li>
</ul>
<p>I've filed <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=410869">bug 410869</a> to track this. I think the way we do this …</p><p>Lots of feedback on the build-on-checkin idea in my blog, the newsgroup,
and especially joduinn's <a class="reference external" href="http://oduinn.com/2008/01/04/build-always-vs-build-on-checkin/">recent post</a> on the subject. The primary
concerns seem to be:</p>
<ul class="simple">
<li>we need as many performance tests per checkin as possible</li>
</ul>
<p>I've filed <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=410869">bug 410869</a> to track this. I think the way we do this now
is wrong, and we'd get more performance cycles if we fixed this by
separating the start time of the test from the revision that the test is
for. Also, we should do a separate perf test for each checkin, not just
the latest when the perf machine becomes available, to be able to track
down regressions to a specific changeset.</p>
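<p>The queueing change I'm arguing for can be sketched in a few lines (names hypothetical):</p>

```python
def drain_latest_only(pending):
    """What happens today: when the perf machine frees up, only the
    newest pending revision gets a run."""
    return pending[-1:]

def drain_every_checkin(pending):
    """What's argued for above: every pending checkin gets its own run,
    so a regression can be pinned to a single changeset."""
    return list(pending)

# three checkins landed while the perf machine was busy
pending = ["rev-101", "rev-102", "rev-103"]
```

<p>With the first policy a regression anywhere in rev-101..103 shows up as one blurred data point; with the second, each revision gets its own number.</p>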
<ul class="simple">
<li>sometimes the build breaks for non-checkin reasons, and someone needs
to be hunted down to correct it if it's build-on-checkin</li>
</ul>
<p>I think this is mainly a fault of not having adequate monitoring,
auto-recovery, and load-balancing of the server farm, and of not giving
the right people access to force builds directly. bhearsum is rocking
the monitoring side in <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=410019">bug 410019</a>, so we'll know as soon as anything
goes wrong at the machine level, and Buildbot can do the load-balancing
and give developers an interface to force/clobber/stop builds as
needed, without having to give everyone in the project a shell account
or wait until the next checkin picks up a CLOBBER file.</p>
<ul class="simple">
<li>some people will still be stuck waiting for build cycles, this just
moves the problem around</li>
</ul>
<p>I think this is absolutely a valid concern, and the more I think about
it, build-on-checkin isn't really all that valuable until we have
multiple buildslaves able to run in parallel, so no one has to wait for
the current cycle to finish in order to have their checkin tested. <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=411629">bug
411629</a> has been filed to track this.</p>
<ul class="simple">
<li>CVS commits are not atomic, what if we pull a partial checkin?</li>
</ul>
<p>Fortunately this goes away when we switch to hg for Moz2, but even on
the 1.8 and 1.9 branches we poll Bonsai (and can use the revision, aka
branch+timestamp, that it contains) instead of just blindly pulling CVS.
I don't *think* that Bonsai is susceptible to this kind of thing, due
to the way it groups checkins before reporting them, but please correct
me if this is wrong. Also, isn't this a problem today, since the
Tinderbox client just blindly picks a timestamp and pulls it?</p>
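<p>Here's a rough sketch of the kind of grouping I believe Bonsai does: commits by the same author landing within a short window are reported as one checkin, so a poller never triggers a build halfway through a multi-file CVS commit. The window size and data shapes are illustrative assumptions, not Bonsai's actual behavior:</p>

```python
WINDOW = 300  # seconds between commits to merge; an assumed value

def group_checkins(commits):
    """commits: list of (timestamp, author, path) tuples sorted by time."""
    groups = []
    for ts, author, path in commits:
        last = groups[-1] if groups else None
        if last and last["author"] == author and ts - last["end"] <= WINDOW:
            # same author, close enough in time: same logical checkin
            last["files"].append(path)
            last["end"] = ts
        else:
            groups.append({"author": author, "files": [path], "end": ts})
    return groups

# a two-file commit by one author, then an unrelated commit later
commits = [
    (100, "rhelmer", "mozilla/client.mk"),
    (130, "rhelmer", "mozilla/Makefile.in"),
    (900, "bhearsum", "mozilla/configure.in"),
]
```

<p>A poller built on groups like these would only fire once per logical checkin, instead of once per file as raw CVS timestamps would suggest.</p>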
<p>If I've missed or misrepresented anything, please let me know, and check
out the <a class="reference external" href="https://bugzilla.mozilla.org/showdependencytree.cgi?id=401936&hide_resolved=1">dependency tree</a> on <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=401936">bug 401936</a> for more information.</p>
perf impact on nightly release automation move2007-12-28T20:26:00-08:002007-12-28T20:26:00-08:00Robert Helmertag:www.rhelmer.org,2007-12-28:/blog/perf-impact-on-nightly-release-automation-move.html<p>If you care about the behavior of the Firefox perf test machines, please
check out my post <a class="reference external" href="http://groups.google.com/group/mozilla.dev.performance/browse_thread/thread/678d982f76b2e6de/05d8889947a42840#05d8889947a42840">moving Mozilla1.8 tinderboxes to Buildbot - perf
impact</a> in the mozilla.dev.performance newsgroup.</p>
<p>The big question is whether we can move to a model where we only build
on checkin rather than …</p><p>If you care about the behavior of the Firefox perf test machines, please
check out my post <a class="reference external" href="http://groups.google.com/group/mozilla.dev.performance/browse_thread/thread/678d982f76b2e6de/05d8889947a42840#05d8889947a42840">moving Mozilla1.8 tinderboxes to Buildbot - perf
impact</a> in the mozilla.dev.performance newsgroup.</p>
<p>The big question is whether we can move to a model where we only build
on checkin rather than continuously. This change would mean faster
build turnaround times for developers, and a reduced load on build
machines. It also means that the perf machines cycle less often.
Currently, there's no way to disambiguate the start time of the run
from the latest revision in the build (for CVS, "revision" in this
sense is branch+timestamp); Tinderbox and the graph servers all expect
a build and a perf run to be the same thing.</p>
<p>In case you're wondering why I'm worried about the Mozilla1.8 tree, if
all goes well with this rollout we'll want to do it this way on Firefox
tree as well; the Mozilla1.8 branch is stable and already on release
automation, so we think it makes sense to start there first.</p>
tinderbox to buildbot: moz18 branch2007-12-19T03:53:00-08:002007-12-19T03:53:00-08:00Robert Helmertag:www.rhelmer.org,2007-12-19:/blog/tinderbox-to-buildbot-moz18-branch.html<p>I've set up the <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation">release automation</a> staging server for the Mozilla 1.8
branch (Firefox 2.x) to also generate nightly builds and depend builds
on checkin to the branch (using buildbot's BonsaiPoller). I outlined
some of the advantages to this release automation/nightly+depend
integration in my <a class="reference external" href="http://roberthelmer.com/blog/?p=23">previous post …</a></p><p>I've set up the <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation">release automation</a> staging server for the Mozilla 1.8
branch (Firefox 2.x) to also generate nightly builds and depend builds
on checkin to the branch (using buildbot's BonsaiPoller). I outlined
some of the advantages to this release automation/nightly+depend
integration in my <a class="reference external" href="http://roberthelmer.com/blog/?p=23">previous post</a>.</p>
<p>You can see the results on the <a class="reference external" href="http://tinderbox.mozilla.org/Mozilla1.8-Staging/">Mozilla1.8-Staging Tinderbox tree</a>.</p>
<p>The main impediment to taking this live is the performance test
machines. These machines currently only cycle when a new build is
available, but ideally we'd want them to keep re-testing the same build
as many times as possible, to get more stable test results. Since the
Tinderbox-driven depend builds currently cycle continuously instead of
waiting for a checkin, we tend to get several test builds for the same
source code as a side effect.</p>
<p>These machines forge their start time to match that of the build they
came from, which allows for easily matching up checkins and build times
to performance results, but this doesn't really make sense if we're
doing multiple test runs per build.</p>
<p>I've started a thread in the mozilla.dev.builds newsgroup with the
subject "moving Mozilla1.8 tinderboxes to Buildbot" for general
discussion about this idea.</p>
tinderbox to buildbot, step 12007-12-06T00:09:00-08:002007-12-06T00:09:00-08:00Robert Helmertag:www.rhelmer.org,2007-12-06:/blog/tinderbox-to-buildbot-step-1.html<p>As I mentioned <a class="reference external" href="http://roberthelmer.com/blog/?p=21">previously</a>, I've been working on incrementally moving
our <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/tinderbox">Tinderbox</a> client installs over to <a class="reference external" href="http://buildbot.net">Buildbot</a>.</p>
<p>The <a class="reference external" href="http://wiki.mozilla.org/User:Rhelmer:Migrating_Tinderbox_to_Buildbot#Shorter_term">first step</a> is to switch to driving Tinderbox from Buildbot and
our <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation">release automation</a> system, instead of having it driven on each
machine by the <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/tinderbox/multi-tinderbox.pl">multi-tinderbox</a> script. The release automation still
calls Tinderbox …</p><p>As I mentioned <a class="reference external" href="http://roberthelmer.com/blog/?p=21">previously</a>, I've been working on incrementally moving
our <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/tinderbox">Tinderbox</a> client installs over to <a class="reference external" href="http://buildbot.net">Buildbot</a>.</p>
<p>The <a class="reference external" href="http://wiki.mozilla.org/User:Rhelmer:Migrating_Tinderbox_to_Buildbot#Shorter_term">first step</a> is to switch to driving Tinderbox from Buildbot and
our <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation">release automation</a> system, instead of having it driven on each
machine by the <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/tinderbox/multi-tinderbox.pl">multi-tinderbox</a> script. The release automation still
calls Tinderbox client underneath, so features like <a class="reference external" href="http://wiki.mozilla.org/Build:ClobberingATinderbox">CLOBBER</a> support
and all of your other favorites remain.</p>
<p>I have the staging automation builders publishing to the <a class="reference external" href="http://tinderbox.mozilla.org/MozillaStaging">Tinderbox
MozillaStage tree</a>. Note that it's using Mozilla1.8 builders but firing
off builds when it sees trunk checkins; this is because I want to make
sure the Bonsai polling is working and Mozilla1.8 doesn't get very many
checkins :) Also, I'm trying to stay out of the way of the ongoing trunk
automation work that <a class="reference external" href="http://blog.mozilla.com/bhearsum/">bhearsum</a> is driving (AKA, letting him find and
fix the trunk+release_automation bugs before I add nightly support).
Expect to see trunk nightlies up there in the next few weeks.</p>
<p>This has several advantages right off the bat:</p>
<ul class="simple">
<li>same release process we use for production (currently a very small
subset)</li>
<li>only build on checkin, should help cycle times</li>
<li>same builders used for nightly and production releases (admittedly,
this is how it used to be before release automation; but now we can
let Buildbot handle the queuing instead of either interrupting
depend/nightly builds or running multiple builds on the same machine,
which is slow)</li>
</ul>
<p>As we continue to make the nightly and final release process more
alike, we can start giving nightlies things that only final releases
have today:</p>
<ul class="simple">
<li>updates for l10n (only en-US gets updates currently :( )</li>
<li>update verification</li>
<li>publishing the source tarball used to build</li>
<li>using the same timestamp for all platforms</li>
</ul>
<p>On the administration side, it should let us manage builders centrally
and deploy new builders more quickly and easily, and with a little more
work it will let us parallelize builds (multiple build machines per
column, or buildslaves per builder in Buildbot parlance), which should
further help cycle time (no waiting for the current build to finish to
get a build started with your fresh checkin).</p>
<p>Comments/questions/concerns welcome! Feel free to email <a class="reference external" href="mailto:robert@roberthelmer.com">me</a>, the
<a class="reference external" href="mailto:build@mozilla.com">build group</a>, or post in mozilla.dev.builds newsgroup if you'd like to
discuss further.</p>
Migrating Tinderbox to Buildbot2007-11-25T01:53:00-08:002007-11-25T01:53:00-08:00Robert Helmertag:www.rhelmer.org,2007-11-25:/blog/migrating-tinderbox-to-buildbot.html<p>I've started working on migrating the Firefox nightly builds to use the
same <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation">release automation</a> system that we've been developing for the
past year or so for maintenance releases (Firefox and Thunderbird 1.5.0x
and 2.0.0.x). The reason this is important is that each nightly release …</p><p>I've started working on migrating the Firefox nightly builds to use the
same <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation">release automation</a> system that we've been developing for the
past year or so for maintenance releases (Firefox and Thunderbird 1.5.0x
and 2.0.0.x). The reason this is important is that each nightly release
(installer, update, etc.) is practice for the real thing, and we should
be using the same tools and verification processes wherever possible
(right now both Nightlies and Releases use <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/tinderbox">Tinderbox client [version
1]</a> for build and l10n repack; all other aspects of the release process
are not tested until the first release candidate. Well, we have a
staging server that tests the release automation in isolation, but it's
not the same as having real nightly testers looking at the results :) ).</p>
<p>The scope of the current release automation framework (<a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/release/">Bootstrap</a>) has
been to leave as much of our existing process in place as possible, and
not try to simplify or optimize. This kind of low-risk approach is the
right thing to do when you're overhauling the release process on a
maintenance branch, but it has created many layers of frameworks:</p>
<p><a class="reference external" href="http://buildbot.net">Buildbot</a>->Bootstrap->TinderboxClient->MozillaBuildSystem</p>
<p>As you can imagine, this can be quite nightmarish to debug and add
features to. I believe strongly in backwards compatibility and
incremental development, but the Bootstrap and TinderboxClient client
layers are largely invisible to anyone outside of the Mozilla
Build&amp;Release team.</p>
<p>I think where we really want to be is:</p>
<p>Buildbot->MozillaBuildSystem</p>
<p>Wherever possible, Buildbot should do the same things a developer would
do, and the configuration should be as clear as possible to read and
modify.</p>
<p>I have <a class="reference external" href="http://wiki.mozilla.org/User:Rhelmer:Migrating_Tinderbox_to_Buildbot">some thoughts</a> on how to get there on the wiki. The first step
is to slot Bootstrap into place, which is actually pretty easy as it just
calls Tinderbox Client anyway. The larger work here is moving to the
more direct "Buildbot->MozillaBuildSystem" scenario, for which I have a
<a class="reference external" href="http://buildbot.roberthelmer.com">working prototype</a> and its <a class="reference external" href="http://roberthelmer.com/svn/configs/trunk/mozilla-nightly-master.cfg">configuration</a>, if anyone is interested
in seeing more.</p>
<p>Note that I'm not talking about changing <a class="reference external" href="http://tinderbox.mozilla.org">Tinderbox Server</a> or any of
the existing mechanisms that developers use to clobber builds or check
build status. <a class="reference external" href="http://blog.mozilla.com/bhearsum">bhearsum</a> added Tinderbox Server and <a class="reference external" href="http://bonsai.mozilla.org">Bonsai</a> support
to Buildbot a while back, so builds show up on Tinderbox and we can
configure them to be triggered only on checkin (as opposed to the
continuous loop that Tinderbox Client currently does).</p>
<p>I have started a newsgroup thread in mozilla.dev.builds (subject:
"Migrating Tinderbox to Buildbot"), please follow up there if you'd like
to discuss.</p>
<p><em>EDIT - fix typo</em></p>
Tinderbox JSON - now with 100% more AJAX2007-09-01T23:10:00-07:002007-09-01T23:10:00-07:00Robert Helmertag:www.rhelmer.org,2007-09-01:/blog/tinderbox-json-now-with-100-more-ajax.html<p>As promised, I have published a more AJAXy example of the classic
Tinderbox waterfall, built using the new Tinderbox JSON output mode:</p>
<p><a class="reference external" href="http://people.mozilla.org/%7Erhelmer/mockups/tinderbox/ajax.html">http://people.mozilla.org/~rhelmer/mockups/tinderbox/ajax.html</a></p>
<p>This version uses gzip encoding for the JSON data, only reloads the page
when new data is available, and …</p><p>As promised, I have published a more AJAXy example of the classic
Tinderbox waterfall, built using the new Tinderbox JSON output mode:</p>
<p><a class="reference external" href="http://people.mozilla.org/%7Erhelmer/mockups/tinderbox/ajax.html">http://people.mozilla.org/~rhelmer/mockups/tinderbox/ajax.html</a></p>
<p>This version uses gzip encoding for the JSON data, only reloads the page
when new data is available, and I've cleaned up the code quite a bit
(split into separate functions for easier profiling, using innerHTML
instead of document.write(), etc.).</p>
<p>I'm hoping to use this as a base to start making more fundamental
improvements to the waterfall UI. Jesse suggests having the column
headers always at the top as you scroll, which sounds pretty awesome to
me. luser's <a class="reference external" href="http://mavra.perilith.com/~luser/tboxtest.html">test page</a> now shows the percentage change from the last
run for performance numbers, which I will merge into my version soon.</p>
Towards human-free releases2007-09-01T22:52:00-07:002007-09-01T22:52:00-07:00Robert Helmertag:www.rhelmer.org,2007-09-01:/blog/towards-human-free-releases.html<p>We took a big step towards truly hands-off releases by doing a (very
early) Firefox 2.0.0.7 RC1 with the <a class="reference external" href="http://buildbot.net">Buildbot</a>-enabled release
automation system. There are still <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation#Caveats">some kinks</a> to work out, but
overall things are looking great.</p>
<p>The elapsed machine time from "code freeze" to "ready …</p><p>We took a big step towards truly hands-off releases by doing a (very
early) Firefox 2.0.0.7 RC1 with the <a class="reference external" href="http://buildbot.net">Buildbot</a>-enabled release
automation system. There are still <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation#Caveats">some kinks</a> to work out, but
overall things are looking great.</p>
<p>The elapsed machine time from "code freeze" to "ready to ship" was ~15
hours; the actual elapsed time was another 12 hours or so, waiting for someone to do the signing.
This does not include time for QA, but a lot of that can be interleaved,
and hopefully further automated for maintenance releases (as they
generally include no new features).</p>
<p>I know that we're already very good (<a class="reference external" href="http://shaver.off.net/diary/2007/08/06/about-ten-days-at-black-hat/">10FD</a> ftw), but I know we can do
better. Imagine with me, if you will, that we had a timeline like this:</p>
<p>Day 1 - security exploit announced</p>
<p>Day 2 - RC available</p>
<p>Day 3 - fix available on auto-update</p>
<p>Are there any other software vendors that ship security fixes to a
locally-installed application on such a compressed schedule? I'd really
like to know; please leave me a comment or <a class="reference external" href="mailto:robert@roberthelmer.com">email me privately</a> if it's
sensitive. I'd love to be able to measure how we're doing, and that's
tough without knowing how others measure this.</p>
<p>On a more general note, I think that release automation software should
become a commodity just as web servers, continuous integration systems,
etc. have. If you want to help out or just see what we're doing, check
out the <a class="reference external" href="http://wiki.mozilla.org/Build:Release_Automation">Mozilla release automation docs</a>.</p>
Tinderbox waterfall in Javascript2007-08-31T00:37:00-07:002007-08-31T00:37:00-07:00Robert Helmertag:www.rhelmer.org,2007-08-31:/blog/tinderbox-waterfall-in-javascript.html<p>Just a quick follow-up on my earlier <a class="reference external" href="http://roberthelmer.com/blog/?p=16">post</a> about Tinderbox's new JSON
output mode - I've hacked up a quick working example of the waterfall
page in Javascript. You can view the source to see how you could extract
any of this info from Tinderbox.</p>
<p>Here's a screenshot in case it …</p><p>Just a quick follow-up on my earlier <a class="reference external" href="http://roberthelmer.com/blog/?p=16">post</a> about Tinderbox's new JSON
output mode - I've hacked up a quick working example of the waterfall
page in Javascript. You can view the source to see how you could extract
any of this info from Tinderbox.</p>
<p>Here's a screenshot in case it stops working :)</p>
<img alt="" src="http://people.mozilla.org/~rhelmer/mockups/tinderbox/tbox_json.png" />
<p>You can click that image to get to the live version. I used a bunch of
code, images and ideas from Ted Mielczarek's <a class="reference external" href="http://mavra.perilith.com/~luser/tboxtest.html">better-written example</a>.
In particular, the idea to parse out the OS and purpose of each column,
and display it more directly, and I took some bits of code that looked
better than the way I was doing things.</p>
<p>We're already discussing changes that will break these pages, so don't
get too cozy with them. I am pretty excited to have something to show
for this so quickly though, especially having spent very little time so
far.</p>
<p>I am anxious to start trying to <a class="reference external" href="http://roberthelmer.com/blog/?p=14">improve the experience for tinderbox
users</a> (in particular by taking use cases into account), which for me
is the whole point of this exercise. If you have ideas for better UI for
displaying the kind of information that developers, testers and other
Tinderbox users need, this is a great way to mock up a real-life
example, so have at it!</p>
Tinderbox JSON output2007-08-29T16:55:00-07:002007-08-29T16:55:00-07:00Robert Helmertag:www.rhelmer.org,2007-08-29:/blog/tinderbox-json-output.html<p>Thanks to justdave for getting our <a class="reference external" href="http://tinderbox.mozilla.org/Firefox">Tinderbox</a> server installation up
to date!</p>
<p>One of the new features I'd like to highlight is a quick&amp;dirty <a class="reference external" href="http://json.org">JSON</a>
output format.</p>
<p>This is different from all of the existing Tinderbox modes (e.g.
<a class="reference external" href="http://tinderbox.mozilla.org/Firefox/quickparse.txt">quickparse</a>) in that it's a dump of the internal data …</p><p>Thanks to justdave for getting our <a class="reference external" href="http://tinderbox.mozilla.org/Firefox">Tinderbox</a> server installation up
to date!</p>
<p>One of the new features I'd like to highlight is a quick&amp;dirty <a class="reference external" href="http://json.org">JSON</a>
output format.</p>
<p>This is different from all of the existing Tinderbox modes (e.g.
<a class="reference external" href="http://tinderbox.mozilla.org/Firefox/quickparse.txt">quickparse</a>) in that it's a dump of the internal data structure that
Tinderbox uses to build the waterfall output. This means two things:</p>
<ol class="arabic simple">
<li>it's fairly messy</li>
<li>it's 100% complete</li>
</ol>
<p>#1 we can deal with by cleaning up Tinderbox itself (this came from a
proposal by cls, which I initially disagreed with but now see the point
of).</p>
<p>#2 means that anything you can see on the waterfall page, even going
back in time, is accessible.</p>
<p>Hopefully that will make this a good choice for doing things like
creating alternatives to the waterfall display and any Tinderbox data
mining.</p>
<p>A word of warning - the JSON output will change, as Tinderbox itself
changes (and is hopefully cleaned up). This is a good thing though, as
this data structure is pretty funky as it stands.</p>
<p>Here's the cached version, which is what you'll want to use most of the
time:</p>
<p><a class="reference external" href="http://tinderbox.mozilla.org/Firefox/json.js">http://tinderbox.mozilla.org/Firefox/json.js</a></p>
<p>This should be updated every time Tinderbox receives an update from a
builder. It contains an object named "tinderbox_data" with a ton of
data in it. For example, here is how you can pull the latest build
status:</p>
<blockquote>
<pre class="literal-block">
&lt;script src="http://tinderbox.mozilla.org/Firefox/json.js"&gt;&lt;/script&gt;
&lt;script&gt;
for each (builder in tinderbox_data.build_table[0]) {
  if (builder.buildname != undefined) {
    document.write('build name: ' + builder.buildname);
  }
}
&lt;/script&gt;
</pre>
</blockquote>
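<p>If you'd rather poke at this data outside the browser, the same extraction is easy in a few lines of Python. This is just a rough sketch; the "tinderbox_data = {...};" wrapper shape and the sample data are assumptions based on the snippet above, not the exact on-disk format:</p>

```python
import json
import re

def parse_tinderbox_js(js_text):
    """Strip the JS assignment wrapper and parse the object as JSON."""
    body = re.sub(r'^\s*(var\s+)?tinderbox_data\s*=\s*', '', js_text.strip())
    return json.loads(body.rstrip().rstrip(';'))

def latest_build_names(data):
    """Mirror the in-browser example: build names from the newest row."""
    return [cell['buildname']
            for cell in data.get('build_table', [[]])[0]
            if isinstance(cell, dict) and 'buildname' in cell]

# Hypothetical sample in the shape the example above assumes:
sample = 'var tinderbox_data = {"build_table": [[{"buildname": "Linux Dep"}, {}]]};'
print(latest_build_names(parse_tinderbox_js(sample)))
```

<p>(The real json.js is much bigger and messier, of course; the point is only that stripping the assignment leaves something a JSON parser can eat.)</p>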
<p>We're working on enabling cross-site XMLHttpRequest (<a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=394207">bug 394207</a>), so
I'll give a more AJAXy example next time. I think that using
XMLHttpRequest is going to be the preferred way to access this data, not
only because you can build slicker UIs, but you can take advantage of
the "If-Modified-Since" header to only pull data as-needed, as the JSON
file is rather large.</p>
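<p>For the curious, here's roughly what a conditional fetch looks like; a hedged sketch using only the Python standard library (the helper names are mine, and the URL is just the cached json.js above):</p>

```python
import urllib.request
import urllib.error
from email.utils import formatdate, parsedate_to_datetime

JSON_URL = 'http://tinderbox.mozilla.org/Firefox/json.js'  # cached version above

def conditional_headers(last_modified_ts=None):
    """Headers for a conditional GET: only send the body if the resource
    changed since `last_modified_ts` (a Unix timestamp)."""
    if last_modified_ts is None:
        return {}
    # usegmt=True yields the RFC 1123 date format HTTP expects.
    return {'If-Modified-Since': formatdate(last_modified_ts, usegmt=True)}

def fetch_if_changed(url, last_modified_ts=None):
    """Return (body, new_ts) on 200, or (None, old_ts) on 304 Not Modified."""
    req = urllib.request.Request(url, headers=conditional_headers(last_modified_ts))
    try:
        with urllib.request.urlopen(req) as resp:
            lm = resp.headers.get('Last-Modified')
            new_ts = parsedate_to_datetime(lm).timestamp() if lm else None
            return resp.read(), new_ts
    except urllib.error.HTTPError as err:
        if err.code == 304:   # unchanged since our last poll; skip the download
            return None, last_modified_ts
        raise

print(conditional_headers(0))
```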
visual cues2007-05-29T15:22:00-07:002007-05-29T15:22:00-07:00Robert Helmertag:www.rhelmer.org,2007-05-29:/blog/visual-cues.html<p>The <a class="reference external" href="http://tinderbox.mozilla.org/Firefox/">Tinderbox waterfall page</a> is pretty detail-heavy. I see a lot of
complaints specifically that the page gets too wide, making it hard to
see at a glance if anything is failing to compile or failing a test
(which for Mozilla means that you can't check in until it's fixed …</p><p>The <a class="reference external" href="http://tinderbox.mozilla.org/Firefox/">Tinderbox waterfall page</a> is pretty detail-heavy. I see a lot of
complaints specifically that the page gets too wide, making it hard to
see at a glance if anything is failing to compile or failing a test
(which for Mozilla means that you can't check in until it's fixed).</p>
<p>One way to make this better is to collapse the information and provide
visual cues:</p>
<div class="figure align-center">
<img alt="visual cues" src="http://people.mozilla.com/~rhelmer/buildbot/visual_cues.png" />
<p class="caption">visual cues</p>
</div>
<p>Ok, well maybe not that exact picture :) But the point is that all
Linux, Windows and Mac machines are represented by one column per
build-type, not one column per machine.</p>
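<p>The collapsing itself is a simple grouping problem; here's a toy sketch (the tuple shape and status values are hypothetical, not Tinderbox's actual data model):</p>

```python
from collections import defaultdict

def collapse_columns(machines):
    """Group individual machines into one logical column per
    (OS, build type), the way the mockup above collapses them.
    A column is green only if every machine in it is green."""
    columns = defaultdict(list)
    for name, os_name, build_type, status in machines:
        columns[(os_name, build_type)].append(status)
    return {key: ('success' if all(s == 'success' for s in statuses)
                  else 'busted')
            for key, statuses in columns.items()}

# Hypothetical machine reports: (name, OS, build type, status)
reports = [
    ('fx-linux-01', 'Linux', 'depend', 'success'),
    ('fx-linux-02', 'Linux', 'depend', 'busted'),
    ('fx-win32-01', 'Windows', 'depend', 'success'),
]
print(collapse_columns(reports))
```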
<p>Another useful thing to do would be to make use cases, and provide
different front-ends instead of just the waterfall page. For example,
there's the "can I check in right now" use case, which is different than
the "are all machines reporting ok" and "what do the performance numbers
look like today" use cases.</p>
<p>Right now, we have the <a class="reference external" href="http://build-graphs.mozilla.org/">graph server</a>, and tracking down e.g. a perf
regression window can be pretty painful (mostly because you can't see
checkins along the graph; this is probably most appropriate for the new
graph server).</p>
<p>For the "can I check in now" use case, making the front page of
Tinderbox <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=366784">more friendly</a> would be a good first step.</p>
pretty pictures2007-05-25T14:18:00-07:002007-05-25T14:18:00-07:00Robert Helmertag:www.rhelmer.org,2007-05-25:/blog/pretty-pictures.html<p><a class="reference external" href="http://weblogs.mozillazine.org/preed/">preed</a> always makes fun of my release process diagram (as seen on
whiteboards everywhere):</p>
<div class="figure align-center">
<img alt="release process" src="http://people.mozilla.com/~rhelmer/release_process/process.png" />
<p class="caption">release process</p>
</div>
<p>So I made a fancier one, showing the inside of each of these steps:</p>
<div class="figure align-center">
<img alt="step" src="http://people.mozilla.com/~rhelmer/release_process/step.png" />
<p class="caption">step</p>
</div>
<p>However, I still feel that preed doesn't appreciate it, so I dedicate
this diagram to <a class="reference external" href="http://morgamic.com/">morgamic</a>.</p>
Bootstraps, pulling oneself up by one's2007-05-19T22:57:00-07:002007-05-19T22:57:00-07:00Robert Helmertag:www.rhelmer.org,2007-05-19:/blog/bootstraps-pulling-oneself-up-by-ones.html<p>We've been using the release automation scripts, aka <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/release/">Bootstrap</a>, for
the past several releases. We've hit some bumps but overall we've
improved quality as we've been pushing changes into the scripts instead
of having to document and remember to re-check a large list of "gotchas"
whenever we run into problems …</p><p>We've been using the release automation scripts, aka <a class="reference external" href="http://mxr.mozilla.org/mozilla/source/tools/release/">Bootstrap</a>, for
the past several releases. We've hit some bumps but overall we've
improved quality as we've been pushing changes into the scripts instead
of having to document and remember to re-check a large list of "gotchas"
whenever we run into problems, or need to add new steps to the release
(e.g. DLL/EXE signing). Every release "step" has a set of verification
tests, which we've been augmenting and then not having to think about
next release.</p>
<p>Repeating the same set of steps every 6-8 weeks sounds pretty terrible;
the cycle is short enough that it feels like you've done it a million times
already, yet just long enough that you get a little fuzzy on the
details and have to constantly refer to the documentation. Even worse,
you have about a dozen individual files to edit, hundreds of commands to
run, and if any of it is incorrect you (generally) need to go back and
start over from there. After all, you can't build without a tag, repack
without a build, or generate (all of the) updates without repacks.</p>
<p>The part that's exciting, challenging and fun is when you start to break
down the big, scary problem into a set of small, manageable problems.
Abstract the small problems into discrete steps and automate them, so
that you don't need to worry about the individual details every time you
use them. You can examine each step separately, optimize it, test it,
and try to get as close to absolute consistency as possible. Do
paranoid, pedantic tests for correctness that would drive a person mad.</p>
<p>In short, it's basically refactoring and unit testing. When you start
doing it after-the-fact there's a high ramp-up cost, but once you've got
the ball rolling it starts picking up serious momentum.</p>
<p>The next big hurdle is end-to-end automation. Right now, with the
automation and infrastructure as-is, a human has to:</p>
<ul class="simple">
<li>log into Tag machine, kick off tag script</li>
<li>log into win32, linux and mac tinderboxes and kick off build script</li>
<li>verify builds and copy to the candidates directory</li>
<li>configure l10n, update generation/verification</li>
<li>kick off l10n build script on win32, linux and mac tinderboxes</li>
<li>verify l10n builds and copy to the candidates directory</li>
<li>sign win32 EXEs/DLLs</li>
<li>log into update machine, kick off patch generator</li>
<li>log into staging machine, kick off staging script</li>
<li>turn on test updates</li>
<li>sign installers</li>
<li>create bouncer links, push bits to mirrors</li>
<li>turn on updates</li>
</ul>
<p>This is not including the huge number of <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=360034">config/version bumps</a> that go
along with all of these disparate systems. If you spend a lot of time
going over all of these files, it seems pretty obvious that we could be
putting in one set of info and generating all of this data.</p>
<p>We're actually now at the point where we can do all the tagging/version
bumping automatically, generate Tinderbox mozconfig/tinder-config.pl
based on the single bootstrap.cfg, and generate the patcher2 configs
(which creates partial updates and configures AUS).</p>
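<p>To give a flavor of what "one set of info in, all the configs out" means, here's a toy generator; the keys and file contents are hypothetical stand-ins, not the real bootstrap.cfg, Tinderbox, or patcher2 formats:</p>

```python
def bump_configs(release):
    """Render per-tool config fragments from a single source of truth.
    The file shapes below are illustrative stand-ins for the mozconfig,
    tinder-config.pl, and patcher2 config that get bumped each release."""
    mozconfig = ('ac_add_options --enable-official-branding\n'
                 'mk_add_options MOZ_CO_TAG={tag}\n').format(**release)
    tinder_config = ("$BuildTag = '{tag}';\n"
                     "$BuildName = '{product} {version}';\n").format(**release)
    patcher_config = '[release-{version}]\nto = {version}\n'.format(**release)
    return {'mozconfig': mozconfig,
            'tinder-config.pl': tinder_config,
            'patcher2.cfg': patcher_config}

release = {'product': 'Firefox', 'version': '2.0.0.7',
           'tag': 'FIREFOX_2_0_0_7_RELEASE'}
configs = bump_configs(release)
print(configs['tinder-config.pl'])
```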
<p>However, we still need to log into the individual machines described
above, check out/update the scripts, and run them. Each of these
processes generally takes between 1 and 4 hours, so having them run
back-to-back would not only reduce the total time to do a release
(it should be fine running all night, or over weekends), but also help to
reduce mistakes and eliminate the time-wasting polling that we currently
have to do (although bootstrap does support sending email notifications
now, so at least it can be event-driven).</p>
<p>We've been looking at <a class="reference external" href="http://buildbot.net">Buildbot</a> to help tie this into a seamless
process. Buildbot supports both the idea of BuildSets (e.g. win32, linux
and mac builders all operating as one pass/fail operation) as well as
dependent steps e.g. Tag -> Source -> BuildSet(linux,mac,win32) ->
Repack(linux,mac,win32) -> Updates -> Stage.</p>
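<p>To make the dependent-step idea concrete, here's a toy model (plain Python, not actual Buildbot configuration) where each stage is one pass/fail BuildSet across platforms, and downstream stages never fire after a failure:</p>

```python
def run_release(chain, platforms, run_step):
    """Toy model of dependent scheduling: each stage runs on every
    platform and the next stage fires only if all of them pass
    (one pass/fail BuildSet per stage)."""
    completed = []
    for stage in chain:
        results = {p: run_step(stage, p) for p in platforms}
        if not all(results.values()):
            return completed, stage  # downstream stages never fire
        completed.append(stage)
    return completed, None

chain = ['Tag', 'Source', 'Build', 'Repack', 'Updates', 'Stage']
# Hypothetical step runner where win32 breaks during Repack:
ok = lambda stage, platform: not (stage == 'Repack' and platform == 'win32')
done, failed_at = run_release(chain, ['linux', 'mac', 'win32'], ok)
print(done, failed_at)
```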
<p>My original idea for pushing this all together was to send Changes into
Buildbot from Bootstrap every time a new step was ready, but preed looked
into it more and realized that buildsets and dependent steps already do
what we need. This is great, because it moves us more incrementally from
"Human logging into 10 machines to run 1000 commands" to "Human logging
into 10 machines to run 1 script" to "Buildbot logging into 10 machines
to run 1 script on each", without us having to write any additional
code.</p>
<p>Anyway, we've got a lot of other things going on, but I'm really proud
of all the work we've done to get this far, and confident that we'll be
able to get this across the finish line soon. We've done it so
incrementally that I don't feel like we've built this giant cathedral;
it's more that we've just broken down our big problem into little bits
that we can improve more quickly.</p>
buildbot "try" support2007-02-09T16:49:00-08:002007-02-09T16:49:00-08:00Robert Helmertag:www.rhelmer.org,2007-02-09:/blog/buildbot-try-support.html<p>As many of you know by now, <a class="reference external" href="http://bhearsum.blogspot.com/">Ben Hearsum</a> has been doing awesome work
on <a class="reference external" href="http://buildbot.sf.net">Buildbot</a> integration, such as:</p>
<ul class="simple">
<li>bonsai support</li>
<li>publishing to tinderbox</li>
<li>setup and administration of the seneca cluster</li>
</ul>
<p>Now the awesomeness continues, as we're working on a <a class="reference external" href="http://buildbot.sourceforge.net/manual-0.7.5.html#try">Buildbot "try"
server</a> to allow developers to upload patches that …</p><p>As many of you know by now, <a class="reference external" href="http://bhearsum.blogspot.com/">Ben Hearsum</a> has been doing awesome work
on <a class="reference external" href="http://buildbot.sf.net">Buildbot</a> integration, such as:</p>
<ul class="simple">
<li>bonsai support</li>
<li>publishing to tinderbox</li>
<li>setup and administration of the seneca cluster</li>
</ul>
<p>Now the awesomeness continues, as we're working on a <a class="reference external" href="http://buildbot.sourceforge.net/manual-0.7.5.html#try">Buildbot "try"
server</a> to allow developers to upload patches that will generate
one-off builds, without having to check that patch in. This is great for
experimental builds for proof-of-concept type code, as well as making
sure your patch will compile on our supported OS environments.</p>
<p>Ben is doing <a class="reference external" href="http://ukm.spreadsheets.google.com/ccc?id=o09016850625816831570.6979258540609148598.01469457300266485452.6223612764698158517">most of the real work</a> here, and we're working hard to
document and publish details of our setup so others can benefit (and of
course, help us out when we hit problems!). Brian Warner just launched
<a class="reference external" href="http://buildbot.net">Buildbot.net</a> which looks much more useful to me than the old
sourceforge page, so expect to see more HOWTOs appearing there in the
near future.</p>
<p>Note that we're actually not using the "buildbot try" support, but
instead using "<a class="reference external" href="http://buildbot.sourceforge.net/manual-0.7.5.html#sendchange">buildbot sendchange</a>" (however it's worth reading about
"try", because it clearly explains what we're trying to do here). There
are several reasons for this; one is that "buildbot try" assumes that
Buildbot is able to check out directly from CVS (not through client.mk
like we do), and it also assumes that developers have buildbot installed
on their development machine and have direct access to the Buildbot
server on a special Buildbot-specific port. Finally, "buildbot try" does
not accept patches directly; it expects to be run in your checkout and
generate an appropriate patch itself.</p>
<p>We could work around the client.mk problem, but the rest are a bigger
deal; a lot of developers use different version control systems, and
Mozilla is definitely not new to managing patches.</p>
<p>So, the shortest path to happiness seems to be:</p>
<p>* give developers access to a "patch upload" web interface; the version
Ben put together looks like this:</p>
<div class="figure align-center">
<img alt="Try upload dialog" src="http://people.mozilla.org/%7Erhelmer/buildbot/try/try_upload.jpg" />
<p class="caption">Try upload dialog</p>
</div>
<p>* have a custom series of steps, which does:</p>
<ol class="arabic simple">
<li>clobber existing source tree</li>
<li>check out client.mk</li>
<li>download mozconfig</li>
<li>apply patch</li>
<li>configure</li>
<li>compile</li>
</ol>
<p>* upload the build to somewhere useful; for now we'll probably just
keep it on the "try" server, on the same webserver that hosts the patch
upload UI. Each "try" request gets a unique ID, so we can use that to
link to an appropriate output directory.</p>
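<p>The custom series of steps boils down to "run these in order, stop at the first failure." A rough sketch, with placeholder commands rather than the real try-server implementation:</p>

```python
import subprocess

# The six steps above, as (name, argv) pairs; the commands are
# illustrative placeholders, not the actual try-server scripts.
TRY_STEPS = [
    ('clobber',   ['rm', '-rf', 'mozilla']),
    ('checkout',  ['cvs', 'co', 'mozilla/client.mk']),
    ('mozconfig', ['cp', '/path/to/mozconfig', 'mozilla/.mozconfig']),
    ('patch',     ['patch', '-p0', '-i', 'try.patch']),
    ('configure', ['make', '-f', 'client.mk', 'configure']),
    ('compile',   ['make', '-f', 'client.mk', 'build']),
]

def run_try(steps, runner=subprocess.call):
    """Run each step in order; stop at the first nonzero exit so a bad
    patch fails fast instead of wasting a compile cycle."""
    for name, argv in steps:
        if runner(argv) != 0:
            return name   # the step that busted
    return None

# Exercise the control flow with a fake runner (the patch step fails):
fake = lambda argv: 1 if argv[0] == 'patch' else 0
print(run_try(TRY_STEPS, runner=fake))
```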
<p>What you'd see on the Buildbot server page is something like this (NOTE
- I hacked up this image to show you an interesting scenario, but didn't
feel like waiting around for a real checkin to coincide with my "try"
test... so the times not matching up 100% in the ETA section is OK.)</p>
<div class="figure align-center">
<img alt="Buildbot Try Waterfall example" src="http://people.mozilla.org/~rhelmer/buildbot/try/try_waterfall.jpg" />
<p class="caption">Buildbot Try Waterfall example</p>
</div>
<p>The left-most column is who made the change (the one from coop is from
the bonsaipoller, while the one from me is via the patch upload
interface). The left-most build column is a normal build triggered by
coop's change, while the right-most is triggered by my change.</p>
<p>In Buildbot, each build column represents a "Builder", behind which
there can be one or more buildslaves (buildslaves are the actual hosts,
similar to a tinderbox client). This means that you can have several
simultaneous builds on the same column that are actually happening on
different machines, and in fact you can share the same hosts between
different columns!</p>
<p>When we actually have Windows and Mac buildslaves hooked up to this,
you'll probably see one column for each OS, but that's it. We can add or
remove hosts as needed behind that, and Buildbot will handle queuing of
the incoming requests if all the available build machines are busy.</p>
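<p>That queuing behavior is easy to picture with a toy model: a builder with a pool of slaves, where requests wait in line only when every slave is busy (this is an illustration, not Buildbot's actual implementation):</p>

```python
from collections import deque

class Builder:
    """Toy model of one Buildbot column: several slaves can serve it,
    and requests queue up when every slave is busy."""
    def __init__(self, slaves):
        self.idle = deque(slaves)
        self.pending = deque()
        self.running = {}          # request -> slave working on it

    def submit(self, request):
        if self.idle:
            self.running[request] = self.idle.popleft()
        else:
            self.pending.append(request)   # queued until a slave frees up

    def finish(self, request):
        slave = self.running.pop(request)
        if self.pending:
            self.running[self.pending.popleft()] = slave
        else:
            self.idle.append(slave)

linux = Builder(['slave1', 'slave2'])
for change in ['c1', 'c2', 'c3']:   # three checkins, only two slaves
    linux.submit(change)
print(len(linux.running), list(linux.pending))
```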
The Automaton2006-11-01T15:24:00-08:002006-11-01T15:24:00-08:00Robert Helmertag:www.rhelmer.org,2006-11-01:/blog/the-automaton.html<p>The staging environment for the <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=355309">release automation project</a> (aka
"bootstrap") is up and running. This includes a CVS mirror, so
everything from tagging to build to updates (with verification each step
along the way) can be done without affecting the production environment.</p>
<p>The setup/teardown of the environment is scripted …</p><p>The staging environment for the <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=355309">release automation project</a> (aka
"bootstrap") is up and running. This includes a CVS mirror, so
everything from tagging to build to updates (with verification each step
along the way) can be done without affecting the production environment.</p>
<p>The setup/teardown of the environment is scripted out, and bootstrap can
now get through each individual release step and produce useful output
(well, using the patches in my private tree).</p>
<p>The first big deliverable of this system is to automate the currently
human-driven tasks we do as part of the Firefox and Thunderbird release
process, and it is almost there. The next phase is more about what kinds
of improvements and increased reliability/reproducibility can be brought
into the process, so I will have more to say on that later, once my
patches have all landed and we try actually using this thing.</p>
<p>I've been using this project as an opportunity to try out some more
interesting tools, the workflow I've been using is:</p>
<ul class="simple">
<li>keep a private repo in <a class="reference external" href="http://svk.bestpractical.com/view/HomePage">SVK</a> on my laptop, sync'd from Mozilla CVS</li>
<li>push to personal <a class="reference external" href="http://subversion.tigris.org/">Subversion</a> repo when ready for integration
testing</li>
<li><a class="reference external" href="http://buildbot.sf.net">Buildbot</a> on my laptop runs unit tests automatically when it sees
the Subversion checkin</li>
<li>pull changes from Subversion to staging environment and run staged
release</li>
</ul>
<p>When I am ready to post a patch to Bugzilla for review, I do it via "svk
diff" against the Mozilla CVS version. Admittedly this is a lot of overhead
for a small project, but each piece is simple enough that I can drop any one
of these tools if something goes horribly wrong without suffering a huge
setback.</p>
<p>Overall I like this workflow a lot, although I am thinking that next
time around I will try another version control system (most likely
<a class="reference external" href="http://www.selenic.com/mercurial/wiki/index.cgi">Mercurial</a>) instead of the SVK/SVN combo. I'd really like to try some
newer features like having a <a class="reference external" href="http://www.selenic.com/mercurial/wiki/index.cgi/MqExtension">patch queue</a>, and it'd be excellent to
write a few <a class="reference external" href="http://www.selenic.com/mercurial/wiki/index.cgi/ExtensionHowto">extensions</a> to do things like pull a patch from Bugzilla
and stick into the patch queue automatically (I do that now with "curl
| patch", but it'd be nicer to just "hg bzpatch 1234" or something like
that, and have it go into the queue).</p>
<p><a class="reference external" href="http://blog.vlad1.com/">Vlad</a> is currently importing the Mozilla CVS trunk (using <a class="reference external" href="http://www.cobite.com/cvsps/">cvsps</a> and
<a class="reference external" href="http://hg.beekhof.net/hg/cvs-import">hg-cvs-import</a>) into a testing Mercurial repo, which makes deciding
which system to try next an easier decision for me.</p>
<p>There is nothing wrong with the SVK/SVN combo and in fact I like it quite a
bit, but I really want to get away from the need to send patches around
before they are ready for review; I'd rather just give someone a URL to
my repo to pull from so we can stay integrated.</p>
vacation, state of release automation2006-10-13T22:23:00-07:002006-10-13T22:23:00-07:00Robert Helmertag:www.rhelmer.org,2006-10-13:/blog/vacation-state-of-release-automation.html<p>I will be on vacation from October 16th through the 23rd in Toronto, and
giving a <a class="reference external" href="http://zenit.senecac.on.ca/wiki/index.php/Guest_Lectures">lecture at Seneca</a> on the 20th. If you are in the area and
want to hang out, feel free to give me a buzz.</p>
<p>The release automation work is at a reasonably happy place …</p><p>I will be on vacation from October 16th through the 23rd in Toronto, and
giving a <a class="reference external" href="http://zenit.senecac.on.ca/wiki/index.php/Guest_Lectures">lecture at Seneca</a> on the 20th. If you are in the area and
want to hang out, feel free to give me a buzz.</p>
<p>The release automation work is at a reasonably happy place, I have
managed to write and test the following steps, and post patches for
review (the high-level steps are described in the app's <a class="reference external" href="http://lxr.mozilla.org/mozilla/source/tools/release/README">README</a> file):</p>
<ul class="simple">
<li>Tag</li>
<li>Build</li>
<li>Source</li>
<li>Repack</li>
<li>Updates</li>
</ul>
<p>The Stage step is actually mostly tested, but I keep running out of disk
space on the staging machine, so I'll need to get creative on that one.
Sign is fairly simple and mostly manual (which is desirable), and
Release is fairly simple (but obviously critical!) - it's the act of
copying the staged/signed bits to the official release directories.</p>
<p>One thing I feel I must mention is that this tool does not necessarily
support what we consider the ideal process - it instead supports the
process that we use, and that is known to work.</p>
<p>However, it is difficult to introduce beneficial changes and explore
alternatives since we haven't had a good staging environment or set of
verification tests to make sure that we haven't introduced any undesired
side-effects.</p>
<p>The trickiest bit of this isn't so much the steps themselves as having
some kind of automated verification that the step succeeded so we can
trust that running the next won't be a waste of time.</p>
<p>Our current process is very human-time-intensive, since a release
engineer needs to kick off and verify each step, and some of the steps
take several hours by themselves (builds and update
generation/verification, primarily). If something goes wrong (due to an
unexpected change in the product, a bug in one of the tools, or just
Murphy's Law) then we need to determine the last "good" step and restart
from there.</p>
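Determining the last "good" step and restarting from there is something the tool itself can do if it records each verified step as it completes. A hypothetical sketch, again not the real tooling; the state-file layout and function names are invented:

```python
# Hypothetical sketch of resume-from-last-good-step: the driver
# persists each verified step to a small state file, so a rerun
# skips everything already known to be good.

import json
import os
import tempfile

STATE = os.path.join(tempfile.gettempdir(), "release-state.json")
STEPS = ["Tag", "Build", "Source", "Repack", "Updates"]

def load_done():
    """Return the list of steps verified on previous runs."""
    if os.path.exists(STATE):
        with open(STATE) as f:
            return json.load(f)
    return []

def mark_done(done, name):
    """Record a step as verified, persisting state to disk."""
    done.append(name)
    with open(STATE, "w") as f:
        json.dump(done, f)

def run_release(fail_at=None):
    """Run the remaining steps; 'fail_at' simulates a mid-release failure."""
    done = load_done()
    ran = []
    for name in STEPS:
        if name in done:
            continue  # already verified on a previous run
        if name == fail_at:
            raise RuntimeError(f"simulated failure in {name}")
        ran.append(name)       # the real step work would happen here
        mark_done(done, name)  # only record it once it has verified
    return ran
```

With this shape, a release that dies during Repack can simply be rerun: the driver loads the state file, skips Tag/Build/Source, and picks up again at Repack, which is exactly the restart-from-last-good-step behavior a release engineer currently performs by hand.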
<p>Automated verification does of course have a point of diminishing
returns, and Mozilla-based products are complicated enough that this
doesn't really provide any direct QA benefit, besides not wasting our
testers' valuable time on something a dumb computer can catch (like a
bad tag, bad build, mismatched or nonexistent update paths, etc.).</p>
<p>The other big downside to a human operator being the default is that
humans function much better with sleep and time off (prolonged focus
being bad for overall concentration) and it's a bad use of creative
energies. An automated process doesn't need to pause between steps, and
won't introduce variation through attempts at creativity. The place to
be creative isn't in the scope of a release, but in thinking about and
improving the overall process (generally best done in between releases,
based on the lessons learned from the past).</p>
<p>It should of course be possible for a human to jump in and drive the
process if needed, especially fixing and rerunning steps which failed
for an intermittent reason, bug in the tools, etc. It should hopefully
not be the norm, but it's a reasonable use case for this kind of tool.</p>
<p>The ideal use case that I can think of right now would be: "code is
frozen; declare and obtain sign-off for names/numbers/locales/etc. and
kick off the release process". Respinning to pick up source changes is
achieved by a variant of the Tag step, and the process is restarted and
runs through the same Build-&gt;Stage steps until we're happy with it.</p>