Mozilla runs a crash-stats service, which accepts crash reports from clients
(mobile/desktop browsers, B2G, etc.) and provides a reporting interface.
Recently, a change landed on the client side to enable multiple minidumps to
be attached to an incoming crash, and we want to add support to the server
to accept these as soon as possible.
Our usual test procedure is to pull an existing crash from production and
submit it as a new crash to our dev and staging instances. Unfortunately, we
had no easy way to test this particular scenario, since the current crash
collector only stores a single minidump, and discards any others. We really
want real data in this case - we of course have unit tests and synthetic
data, but the crash collector is a critical service so we want to get it right
the first time when we push updates.
We decided that the most expedient way to get real data would be to capture
from production using tcpdump, then replay this to the dev/staging servers.
There are tools readily available to do this - the major concern is that
we're capturing a large amount of traffic, so we want to filter out as much
as possible. Also, tcpdump has a built-in mechanism for rolling and gzipping
capture files (either every n seconds, or when the file gets over n bytes).
First, run tcpdump on the target (production) server:
tcpdump -i eth0 -C 100 -z gzip -w output.pcap dst port 81
eth0 is the interface we're interested in, and the dst port 81 filter
expression limits the capture to incoming traffic on port 81 (note that the
filter expression must come after the options). The -C and -z options cause
tcpdump to roll the output.pcap file every 100 megabytes and gzip each file
as it rotates out.
This ends up producing a (potentially large) number of files:
output.pcap
output.pcap1.gz
output.pcap2.gz
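If you'd rather roll the capture on time instead of size, tcpdump can do that
too: -G rotates every n seconds, and the filename given to -w can include
strftime specifiers so each rotated file is timestamped. For example, rotating
hourly:
tcpdump -i eth0 -G 3600 -z gzip -w 'output-%Y%m%d-%H%M%S.pcap' dst port 81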
When you feel you've captured enough data, stop the tcpdump process,
decompress the rolled files (gunzip output.pcap*.gz), and use tcpslice to
rebuild a single capture file:
tcpslice -w full.pcap output.pcap*
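As a quick sanity check that the merge worked, you can read a few packets back
out of the combined file with tcpdump's -r (read from file) and -c (packet
count) options:
tcpdump -r full.pcap -c 5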
Then use tcptrace to reassemble the packets into complete sessions (this
is necessary since TCP packets may be received out-of-order).
This will create one file per direction of each TCP session:
tcptrace -e full.pcap
Now we have a set of files named e.g. aaju2aajv_contents.dat - if you take a
look inside these you will see the raw HTTP traffic, one direction of a
session per file. The request (client-to-server) files can be replayed against
a dev/stage server using netcat like so:
cat aaju2aajv_contents.dat | nc devserver 80
You may need to modify the files first, to change the Host header for example.
This is easy to do on the fly with sed:
cat aaju2aajv_contents.dat | sed 's/Host: prodserver/Host: devserver/' | nc devserver 80
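To replay a whole capture in one go, the same sed + netcat step can be wrapped
in a loop. A minimal sketch, reusing the prodserver/devserver placeholder
hostnames from above and assuming the crash submissions are HTTP POSTs:

for f in *_contents.dat; do
  # only replay the client-to-server half of each session,
  # i.e. the files that begin with an HTTP request line
  if head -c 5 "$f" | grep -q 'POST '; then
    sed 's/Host: prodserver/Host: devserver/' "$f" | nc devserver 80
  fi
done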
NOTE - this technique potentially uses a ton of disk space. I did this in
many stages so I could backtrack in case I made any mistakes. If disk space
(and overall time) are at a premium - for example, if you are setting up a
continuous pipeline - I'd investigate using named pipes instead of creating
actual files for uncompressing and running tcpslice + tcptrace.
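If you do go down that road, the shape would be something like the following -
untested, and assuming tcpslice and tcptrace are both happy doing sequential
writes/reads on a FIFO:

mkfifo full.pipe
tcpslice -w full.pipe output.pcap* &
tcptrace -e full.pipe
rm full.pipe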
Also, if you are doing this in a one-off manner then tcpflow or wireshark
(wireshark has a terminal version, tshark) are easier to work with - I wanted
to do the capture on a locked-down server which had tcpdump available, and
wanted to take advantage of the log rolling + compression feature.
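For reference, the equivalent one-off live capture with tcpflow, which writes
each TCP stream straight to its own file as it goes (same interface and filter
as above):
tcpflow -i eth0 dst port 81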