Compressing pcap files

Here at Sinodun Towers, we’re often dealing with pcap DNS traffic capture files created on a far distant server. These files need to be compressed, both to save space on the server, and also to speed transfer to our machines.

Traditionally we’ve used gzip for quick but basic compression, and xz when space was more important than CPU and we really needed the best compression we could get. At a recent conference, though, an enthusiastic Facebook employee suggested we take a look at zstd. We’d looked at it quite some time ago, but our contact said it has improved considerably recently. So we thought we’d compare the latest zstd (1.2.0) with current gzip (1.8) and xz (5.2.3) and see how they stack up when you’re dealing with pcap DNS traffic captures.

Compressing

We took what is (for us) a big file, a 662MB DNS traffic capture, and timed compressing it at every compression level offered by each compressor. We did three timed runs for each and averaged the time. Here are the results. Each point on the graph is a compression level.
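For anyone wanting to repeat this kind of measurement, here is a minimal sketch of the timing loop. It is not the script we used; the function names are made up for illustration, and it assumes the compressor accepts the `-<level> -c <file>` form, which gzip, xz and zstd all do. Output is discarded, so this measures runtime only; compressed size would need to be recorded separately.

```python
import subprocess
import time


def compression_commands(path, tool, levels):
    """Build one command line per level; -c writes to stdout so the input is kept."""
    return {level: [tool, f"-{level}", "-c", path] for level in levels}


def average_runtime(cmd, runs=3):
    """Run cmd `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the compressed output; only the elapsed time is of interest here.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def benchmark(path, tool, levels, runs=3):
    """Return {level: mean seconds} for one compressor across the given levels."""
    commands = compression_commands(path, tool, levels)
    return {level: average_runtime(cmd, runs) for level, cmd in commands.items()}
```

Something like `benchmark("capture.pcap", "gzip", range(1, 10))` would then give one averaged time per gzip level; the file name here is a placeholder.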

zstd turns in an impressive performance. For lower compression levels it’s both notably quicker than gzip and far more effective at compressing pcap DNS traffic captures. In the time gzip takes to compress the input pcap to roughly 25% of its original size, zstd gets it down to 10%. Put another way, in our test the compressed file size is 173MB for gzip versus 65MB for zstd at similar runtimes.
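As a quick sanity check, the percentages follow directly from the file sizes quoted in this post. The helper name is ours, purely for illustration.

```python
def fraction_of_original(compressed_mb, original_mb):
    """Compressed size as a fraction of the original size."""
    return compressed_mb / original_mb


ORIGINAL_MB = 662  # size of the test capture quoted above

for name, size_mb in (("gzip", 173), ("zstd", 65)):
    print(f"{name}: {fraction_of_original(size_mb, ORIGINAL_MB):.0%} of original")
# gzip: 26% of original
# zstd: 10% of original
```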

zstd is also competitive with xz at higher compression levels, though xz retains a slight lead in both file size and runtime there.

Decompressing

Of course, being able to compress is only half the problem. If you’re collecting data from a fleet of servers and bringing that data back to a central system for analysis, you may well find that decompressing your files becomes your main bottleneck. So we also checked decompression times.

There’s little to choose between zstd and gzip at any compression level, while xz generally lags.

Resource usage

So, if zstd gives better compression in similar times, what other costs does it have over gzip? The short answer is memory. Our measurements show that while gzip has much the same working set size regardless of compression level, zstd’s working set starts an order of magnitude larger and grows with compression level; by the time zstd is competing with xz, its working set is up to nearly 3x the size of xz’s.

That being said, by modern standards gzip’s working set size is absolutely tiny, comparable to a simple ls command. You can very probably afford to use zstd. As ever with resource usage, you pays your money and you takes your choice.

Conclusion

It looks to us as though, if you’re currently using gzip to compress pcap DNS traffic captures, you should definitely look at switching to zstd. If you are going for higher compression and currently using xz, the choice is less clear-cut, and depends on what compression level you are using.

A note on the comparison

We generated the above numbers using the standard command line tools for each compressor. Some capture tools like to build compression into their data pipeline, typically by passing raw data through a compression library before writing out. While attractive for some use cases, we’ve found that at higher compression levels you risk compression becoming the processing bottleneck. If server I/O load is not an issue (which it is not for many dedicated DNS servers), we prefer to write temporary uncompressed files and compress these once they are complete. Given sufficient cores, this allows you to parallelise compression, and employ much more expensive – but effective – compression than would be possible with inline compression.
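The "compress once complete, in parallel" approach might look something like the sketch below. This is an illustration rather than our production setup: it uses Python's standard lzma module as a stand-in for the xz command line tool so that it is self-contained, and the function names are invented here. A thread pool is used on the assumption that CPython's lzma releases the interpreter lock while compressing, so separate files should be able to occupy separate cores.

```python
import lzma
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def compress_file(path, preset=9):
    """Compress one finished capture to '<path>.xz' and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".xz")
    with src.open("rb") as fin, lzma.open(dst, "wb", preset=preset) as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large captures need not fit in memory
    return str(dst)


def compress_all(paths, workers=None):
    """Compress every file in paths concurrently, one file per worker.

    Results come back in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_file, paths))
```

The uncompressed originals are left in place here; a real pipeline would presumably remove each one after checking that its compressed copy was written successfully.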

Netgear ReadyNAS Pro

I bought one of these because I keep running out of disk space. With the ability to store 12TB, I thought it might keep me going for some time to come.

It allows you to build RAID 0, 1 and 5 arrays and Netgear’s own X-RAID something or other. Unfortunately, it doesn’t allow striping and mirroring combined. It is compatible with OS X and supports AFP shares and Time Machine as well as iSCSI, as I mentioned earlier. However, the GUI is a bit flaky and didn’t seem to like Initiator IQNs at all. If you download the root ssh plugin you can access the box as root over ssh and look at what it is actually doing.

The iSCSI config is held in /etc/ietd.conf:

Target iqn.2010-2.taurus.sinodun.com:calendarserver
 Lun 0 Path=/e/calendarserver,Type=fileio,ScsiSN=RN293R60037B-003,IOMode=wb
 HeaderDigest CRC32,None
 DataDigest CRC32,None
 IncomingUser user xxxxxxxxxxxx
 InitiatorIQN iqn.2010-02.com.sinodun.hydra:calendarserver

Target iqn.2010-2.taurus.sinodun.com:collaboration
 Lun 0 Path=/e/collaboration,Type=fileio,ScsiSN=RN293R60037B-001,IOMode=wb
 HeaderDigest CRC32,None
 DataDigest CRC32,None
 IncomingUser user xxxxxxxxxxxx
 InitiatorIQN iqn.2010-02.com.sinodun.hydra:collaboration

Firstly, their Target IQNs don’t look like the spec described on Wikipedia. I don’t know whether it is Netgear or Wikipedia that is wrong here, and I don’t care, as this doesn’t seem to break anything.

The real problem is the Initiator IQN. I had to add this by hand, and it gets stripped out of every entry in the file every time a new iSCSI target is created, and at other random times. According to the Netgear web GUI, these are needed for persistent reservation support.

However, once it is working it seems nice and stable. If your disks don’t mount just go and check those Initiator IQNs.
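Checking by eye gets tedious, so here is a rough sketch of a script that lists any Target stanza in the config with no InitiatorIQN line. It only assumes the layout shown in the listing above (a Target line followed by its settings); the function name is invented for this example, and it has not been tried on the ReadyNAS itself.

```python
def targets_missing_initiator(config_text):
    """Return the names of Target stanzas that contain no InitiatorIQN line."""
    missing = []
    current = None          # name of the Target stanza being read, if any
    has_initiator = False   # whether that stanza has an InitiatorIQN line
    for raw in config_text.splitlines():
        parts = raw.split(None, 1)  # keyword, then the rest of the line
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "Target":
            # A new stanza begins: report the previous one if it lacked an initiator.
            if current is not None and not has_initiator:
                missing.append(current)
            current = parts[1].strip() if len(parts) > 1 else ""
            has_initiator = False
        elif parts[0] == "InitiatorIQN":
            has_initiator = True
    if current is not None and not has_initiator:
        missing.append(current)
    return missing
```

Run as root on the box, something like `targets_missing_initiator(open("/etc/ietd.conf").read())` should return an empty list when every target still has its Initiator IQN.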

OS X Server

I have recently bought one of the new Mac Mini Servers, partly because I like everything Mac and partly because I wanted to try OS X Server for my business.

First impressions are great. Apple have taken the best (or very good) OSS and created a nice, if somewhat basic, GUI to sit on top. You can of course still go and edit the configuration by hand if need be.

Unfortunately, the graphics card failed in the first unit. However, Apple were very good and I had a new unit in a couple of days. This got me thinking. Normally, if I wanted to send a computer back for repair or replacement, I would remove the disks, but with a Mac mini you have to prise it open with a paint scraper. Not something I really want to be doing to a box which is still under warranty.

So, when the new one arrived, I decided to take advantage of the ability of my Netgear ReadyNAS Pro NAS box (of which more later) to do iSCSI and, using the globalSAN iSCSI Initiator, added disks for each service that could contain sensitive data. This took some time to get working, mostly due to issues with the Netgear, but I now have all data on hot-swap mirrored disks, and the Mac mini could go back to Apple with no sign of sensitive data.

Airport Extreme

I have never had much luck getting good wireless reception in my house. Until now I have used the wireless access point built in to my ADSL router. I have tried products from Vigor, Linksys and D-Link, and reception has always been terrible unless you are working in the same room as the access point. Even in the lounge, one room away from the access point and through a thin partition wall (it used to be all one room), I could only get intermittent reception at best.

So today I gave in and got an Apple Airport Extreme. I should have known that Apple would make yet another great product. First impressions are that I now have a strong, reliable signal throughout the house and, what is more, it was a pleasure to configure. Compared to the web pages in the Linksys and Vigor products I have tried, Apple’s Airport Utility is really simple and well designed, and as with all Apple stuff it “just works!”

Openfire

I upgraded my ejabberd server to Openfire today. It is really nice. ejabberd was very stable and worked extremely well but really lacked a management interface. Openfire has a fantastic interface that really makes it easy to configure. Even migrating my users across was painless.

Openfire have also announced a beta of their next release. It has an invisibility option. Once there is client support, this should allow you to control who can see your presence. This is a feature that I and others have been waiting for.