Here at Sinodun Towers, we’re often dealing with
pcap DNS traffic capture files created on a far distant server. These files need to be compressed, both to save space on the server, and also to speed transfer to our machines.
Traditionally we’ve used
gzip for quick but basic compression, and
xz when space was more important than CPU and we really needed the best compression we could get. At a recent conference, though, an enthusiastic Facebook employee suggested we take a look at
zstd. We’d looked at it quite some time ago, but our contact said it has improved considerably recently. So we thought we’d compare the latest
zstd (1.2.0) with current
gzip (1.8) and
xz (5.2.3) and see how they stack up when you’re dealing with
pcap DNS traffic captures.
We took what is (for us) a big file, a 662MB DNS traffic capture, and timed compressing it at every compression level offered by each compressor. We did three timed runs at each level and averaged the times. Here are the results. Each point on the graph is a compression level.
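In outline, the timing loop looks something like the sketch below. The filename is illustrative, and we assume GNU time is available as /usr/bin/time; note the tools accept different level ranges (gzip -1 to -9, xz -0 to -9, zstd -1 to -19, or up to -22 with --ultra).

```shell
#!/bin/sh
# Sketch of the benchmark loop: run the compressor at each level,
# three times per level, writing to /dev/null so disk writes
# don't skew the timings.
FILE=capture.pcap

for level in 1 2 3 4 5 6 7 8 9; do
    for run in 1 2 3; do
        /usr/bin/time -p gzip "-$level" -c "$FILE" > /dev/null
    done
done

# Equivalent invocations for the other two compressors:
#   xz   -$level -c "$FILE" > /dev/null    # levels 0-9
#   zstd -$level -c "$FILE" > /dev/null    # levels 1-19 (-22 with --ultra)
```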
zstd turns in an impressive performance. For lower compression levels it’s both notably quicker than
gzip and far more effective at compressing
pcap DNS traffic captures. In the time it takes
gzip to compress the input
pcap to 25% of its original size,
zstd manages 10% of the original size. Put another way, at similar runtimes the compressed file in our test is 173MB for
gzip versus 65MB for
zstd.
zstd is also competitive with
xz at higher compression levels, though
xz does retain a slight lead in both file size and runtime.
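For a like-for-like size comparison on your own captures, you can run all three tools at a mid-range level and list the results. The filename and level are illustrative; gzip and xz need -k to keep the input, while zstd keeps it by default.

```shell
# Compress the same capture with each tool, keeping the original
# so the output sizes can be compared side by side.
gzip -6 -k capture.pcap     # writes capture.pcap.gz
xz   -6 -k capture.pcap     # writes capture.pcap.xz
zstd -6    capture.pcap     # writes capture.pcap.zst
ls -l capture.pcap*
```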
Of course, being able to compress is only half the problem. If you’re collecting data from a fleet of servers and bringing that data back to a central system for analysis, you may well find that decompressing your files becomes your main bottleneck. So we also checked decompression times.
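Decompression can be timed the same way as compression, writing to /dev/null so only the decompressor itself is measured (filenames illustrative):

```shell
# -d decompresses, -c writes to stdout; timings exclude disk writes.
/usr/bin/time -p gzip -dc capture.pcap.gz  > /dev/null
/usr/bin/time -p xz   -dc capture.pcap.xz  > /dev/null
/usr/bin/time -p zstd -dc capture.pcap.zst > /dev/null
```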
There’s little to choose between
zstd and
gzip at any compression level, while
xz generally lags behind both.
If zstd gives better compression in similar times, what other costs does it have over
gzip? The short answer there is memory. Our measurements show that while
gzip has much the same working set size regardless of compression level,
zstd working sets begin an order of magnitude larger and increase with compression level; by the time
zstd is competing with
xz, its working set size is up to nearly 3x the size of xz’s.
That being said, by modern standards
gzip’s working set size is absolutely tiny, comparable to a simple
ls command. You can very probably afford to use
zstd. As ever with resource usage, you pays your money and you takes your choice.
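If you want to check the memory figures on your own data, GNU time (usually installed as /usr/bin/time, and distinct from the shell built-in) reports peak memory. This sketch assumes a Linux system with GNU time; the filename is illustrative.

```shell
# -v prints detailed statistics, including maximum resident set size,
# a reasonable proxy for the compressor's working set.
/usr/bin/time -v zstd -19 -c capture.pcap > /dev/null 2> zstd-stats.txt
grep 'Maximum resident set size' zstd-stats.txt
```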
It looks to us that if you’re currently using
gzip to compress
pcap DNS traffic captures, then you should definitely look at switching to
zstd. If you are going for higher compression, and currently using
xz, the choice is less clear-cut, and depends on what compression level you are using.
A note on the comparison
We generated the above numbers using the standard command line tools for each compressor. Some capture tools build compression into their data pipeline, typically by passing raw data through a compression library before writing out. While attractive for some use cases, we’ve found that at higher compression levels you risk compression becoming the processing bottleneck. If server I/O load is not an issue (and it is not for many dedicated DNS servers), we prefer to write temporary uncompressed files and compress them once they are complete. Given sufficient cores, this allows you to parallelise compression, and to employ much more expensive – but more effective – compression than would be possible inline.
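As a sketch of that approach, completed files can be fanned out across cores with xargs. The directory and level are illustrative; zstd’s --rm deletes each input once it has been compressed successfully, and -P sets the number of concurrent jobs.

```shell
# Compress every finished capture, running one zstd per core.
find /var/captures -name '*.pcap' -print0 |
    xargs -0 -P "$(nproc)" -n 1 zstd -19 --rm
```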