Using OpenSSL from inside a chroot

A little something we tripped over this week. We’re providing an experimental DNS-over-TLS server that supports TLS v1.3. Right now TLS v1.3 is still an Internet Draft; in other words, it’s not a finished standard, though close to it. The latest version of the draft is draft 23, support for which was merged into the OpenSSL master branch yesterday, January 25th. Yup, we’re living on the bleeding edge.

Support for the final standard TLS v1.3 will be in the next OpenSSL release, v1.1.1.

We’re providing the service by fronting a regular name server with haproxy v1.8.3 built against OpenSSL master.

For some time, our experimental server has happily accepted connections for an hour or two, but then stopped accepting new connections. To deepen the mystery, it’s configured in exactly the same way as two other servers that are working fine; the only difference is that those servers are using the standard packaged OpenSSL libraries from Ubuntu Xenial. In odd moments this week I’ve been digging into why.

The answer turns out to be entropy. OpenSSL needs a source of random bits for its crypto magic, and these are provided by a Deterministic Random Bit Generator (DRBG) seeded by some entropy. This part of OpenSSL has been completely rewritten for v1.1.1, and while I’m certainly not in a position to judge the technical details, the code looks far cleaner than the previous code, and appears to offer expanded possibilities for alternate entropy sources and hardware DRBG in the future. So, a thoroughly good thing.

However, there is a change in behaviour on Linux compared to OpenSSL v1.1.0 and earlier. In the old version, OpenSSL would attempt to read entropy from /dev/urandom (or /dev/random or /dev/srandom if not found). It would then mix in entropy from any Entropy Gathering Daemon (EGD) present, and then mix in further entropy based on the process PID, process UID and the current time. In v1.1.1 at present (the comments indicate an ongoing discussion on this), only the first configured entropy source is used, which in the case of a default Linux build means getting entropy from /dev/urandom (again falling back to /dev/random or /dev/srandom if not found).

We have haproxy configured to run in a chroot jail. And this chroot jail did not contain /dev/urandom and friends. As it happens, OpenSSL obtains its first slab of entropy before the chroot takes effect, so that succeeds and haproxy starts to run. When, however, OpenSSL needs to read more entropy (which by default will be after an hour at the latest), it cannot open /dev/urandom and friends, so it cannot reseed. This appears as the connection failing to open, because SSL_new() fails. This is generally reported as a memory allocation failure. If you print the OpenSSL error chain, it’s slightly more informative:

140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:2406B072:random number generator:RAND_DRBG_generate:in error state:crypto/rand/drbg_lib.c:479:
140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:2406B072:random number generator:RAND_DRBG_generate:in error state:crypto/rand/drbg_lib.c:479:
140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:140BA041:SSL routines:SSL_new:malloc failure:ssl/ssl_lib.c:839:

So, if you’re seeing mysterious OpenSSL failures and you are running in a chroot jail, make sure /dev/urandom at least is available.

# mkdir -p <chroot-base>/dev
# mknod <chroot-base>/dev/urandom c 1 9
# chmod 0666 <chroot-base>/dev/urandom
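
If you want to check an existing jail before trusting it, something along these lines may help. This is only a sketch: the function name check_chroot_urandom is ours, not part of any library, and it assumes the Linux device numbers used in the mknod command above (character device, major 1, minor 9). It inspects the device node from outside the jail; it does not prove the device can be opened from inside it.

```python
import os
import stat

# Linux device numbers for /dev/urandom: character device, major 1, minor 9.
URANDOM_MAJOR, URANDOM_MINOR = 1, 9


def check_chroot_urandom(chroot_base):
    """Return a list of problems with <chroot_base>/dev/urandom (empty if none)."""
    path = os.path.join(chroot_base, "dev", "urandom")
    try:
        st = os.stat(path)
    except OSError as exc:
        # Missing node, missing dev directory, permission problem, etc.
        return [f"{path}: {exc.strerror}"]

    problems = []
    if not stat.S_ISCHR(st.st_mode):
        problems.append(f"{path}: not a character device")
    elif (os.major(st.st_rdev), os.minor(st.st_rdev)) != (URANDOM_MAJOR, URANDOM_MINOR):
        problems.append(f"{path}: wrong device numbers")
    if stat.S_IMODE(st.st_mode) & 0o444 != 0o444:
        problems.append(f"{path}: not world-readable")
    return problems
```

Calling check_chroot_urandom() with your chroot base directory returns an empty list when the node looks right, and a list of human-readable complaints otherwise.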

In fact, we’d recommend you do the same if you’re using OpenSSL 1.1.0 or before in an application run in a chroot. If you don’t, and you aren’t running an EGD, the chances are that the only entropy you’re getting is from your PID, UID and the time. All of which may be guessable from a relatively small range.

At the very least, we recommend you read this page on the OpenSSL wiki, which discusses the issue in more detail.

More on Debian Jessie/Ubuntu Trusty packet capture woes

Back in September I wrote about a problem we’d come across when capturing traffic with pcap_dispatch() or pcap_next_ex() on Ubuntu Trusty or Debian Jessie. When the traffic was slow, we saw packets not being captured.

We’ve since done a bit more digging. The problem, we think, is a bug in the Linux kernel select() system call. With both pcap_dispatch() and pcap_next_ex() we’re using a central loop that is basically:

 pcap_dispatch();
 select(pcapfd, timeout);

The length of timeout in the select() call shouldn’t matter. But it does. In our test scenario, set it to 1ms and every packet in a ping to an otherwise idle network connection will be captured. Set it to 2s and most or all will be missed.
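
For comparison, here is a small sketch of the behaviour we expect from select(): a long timeout is only an upper bound, and the call should return as soon as the descriptor becomes readable. It uses Python’s select module, which wraps the system call, with a pipe standing in for the pcap descriptor; the function name wait_for_data and the 50ms delay are our own choices for illustration. It does not reproduce the kernel bug, it only shows what correct behaviour looks like.

```python
import os
import select
import threading
import time


def wait_for_data(timeout_s, delay_s=0.05):
    """Return (seconds spent in select(), whether the descriptor was readable)."""
    read_fd, write_fd = os.pipe()
    try:
        # Another thread writes one byte after delay_s, like a late packet.
        writer = threading.Timer(delay_s, os.write, args=(write_fd, b"x"))
        writer.start()
        start = time.monotonic()
        ready, _, _ = select.select([read_fd], [], [], timeout_s)
        elapsed = time.monotonic() - start
        writer.join()  # make sure the write has happened before closing
        return elapsed, bool(ready)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

With a 2 second timeout, a correctly behaving select() should come back after roughly the 50ms delay with the descriptor marked readable, not after the full 2 seconds.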

Robert Edmonds has suggested that it’s this kernel bug. Thanks, Robert – that looks like the problem to us. This was fixed in kernel 3.19. We’ve filed a Debian bug and an Ubuntu bug.

So, what can you do about it for now?

  • If using Ubuntu Trusty, consider switching to the LTS Enablement Stack. This has the fix applied.
  • If using Debian Jessie, consider switching to a 4.9 series kernel from Jessie backports.
  • Otherwise consider reducing the timeout in your call to select(). As noted above, this certainly improves the situation for our specific test scenario. However, we can’t be confident that it is a definitive fix; make sure you test your particular circumstances.

Compressing pcap files

Here at Sinodun Towers, we’re often dealing with pcap DNS traffic capture files created on a far distant server. These files need to be compressed, both to save space on the server, and also to speed transfer to our machines.

Traditionally we’ve used gzip for quick but basic compression, and xz when space was more important than CPU and we really needed the best compression we could get. At a recent conference, though, an enthusiastic Facebook employee suggested we take a look at zstd. We’d looked at it quite some time ago, but our contact said it has improved considerably recently. So we thought we’d compare the latest zstd (1.2.0) with current gzip (1.8) and xz (5.2.3) and see how they stack up when you’re dealing with pcap DNS traffic captures.

Compressing

We took what is (for us) a big file, a 662MB DNS traffic capture, and timed compressing it at all the different compression levels offered by each compressor. We did three timed runs for each and averaged the time. Here are the results. Each point on the graph is a compression level.
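
The timing procedure is easy to reproduce. The following is a minimal sketch rather than the script we actually used: the names LEVELS and time_compression are ours, it assumes gzip, xz and zstd are on your PATH, and it relies on each tool accepting a numeric level flag plus -c to write to standard output.

```python
import subprocess
import time

# Level ranges accepted by each command line tool without extra flags.
LEVELS = {"gzip": range(1, 10), "xz": range(0, 10), "zstd": range(1, 20)}


def time_compression(tool, level, path, runs=3):
    """Run `tool -LEVEL -c path`; return (mean seconds, compressed size in bytes)."""
    elapsed = 0.0
    size = 0
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run(
            [tool, f"-{level}", "-c", path],
            check=True,
            stdout=subprocess.PIPE,  # output is held in memory; fine for a sketch
        )
        elapsed += time.perf_counter() - start
        size = len(result.stdout)
    return elapsed / runs, size
```

Looping over LEVELS and calling time_compression() for each tool and level gives one (time, size) point per compression level.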

zstd turns in an impressive performance. For lower compression levels it’s both notably quicker than gzip and far more effective at compressing pcap DNS traffic captures. In the time gzip takes to compress the input pcap to 25% of its original size, zstd manages 10% of the original size. Put another way, in our test the compressed file size is 173MB for gzip versus 65MB for zstd at similar runtimes.

zstd is also competitive with xz at higher compression levels, though xz retains a slight lead there in both file size and runtime.

Decompressing

Of course, being able to compress is only half the problem. If you’re collecting data from a fleet of servers and bringing that data back to a central system for analysis, you may well find that decompressing your files becomes your main bottleneck. So we also checked decompression times.

There’s little to choose between zstd and gzip at any compression level, while xz generally lags.

Resource usage

So, if zstd gives better compression in similar times, what other costs does it have over gzip? The short answer there is memory. Our measurements show that while gzip has much the same working set size regardless of compression level, zstd working sets begin an order of magnitude larger and increase from there; by the time zstd is competing with xz, its working set size is up to nearly 3x the size of xz.

That being said, by modern standards gzip’s working set size is absolutely tiny, comparable to a simple ls command. You can very probably afford to use zstd. As ever with resource usage, you pays your money and you takes your choice.

Conclusion

It looks to us as if, if you’re currently using gzip to compress pcap DNS traffic captures, you should definitely look at switching to zstd. If you are going for higher compression, and currently using xz, the choice is less clear-cut, and depends on what compression level you are using.

A note on the comparison

We generated the above numbers using the standard command line tools for each compressor. Some capture tools like to build compression into their data pipeline, typically by passing raw data through a compression library before writing out. While attractive for some use cases, we’ve found that for higher compression you risk compression becoming the processing bottleneck. If server I/O load is not an issue (which it is not for many dedicated DNS servers), we prefer to write temporary uncompressed files and compress these once they are complete. Given sufficient cores, this allows you to parallelise compression, and employ much more expensive – but effective – compression than would be possible with inline compression.
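
As an illustration of that last approach, here is a minimal sketch that compresses a batch of completed capture files in parallel, one compressor process per file. The function name compress_all, the default of four workers and the choice of zstd level 19 are our assumptions for the example; --rm tells zstd to delete each source file after it has been compressed successfully, and -q keeps it quiet.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def compress_all(paths, command=("zstd", "-19", "--rm", "-q"), workers=4):
    """Compress each completed file with its own compressor process."""

    def compress(path):
        # The heavy lifting happens in the child process, so threads are enough.
        subprocess.run([*command, path], check=True)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any compressor failure.
        return list(pool.map(compress, paths))
```

Set workers to however many cores you can spare. Passing a different command, for example ("xz", "-9"), swaps in another compressor without changing anything else.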

Packet capture woes with libpcap on Ubuntu Trusty and Debian Jessie

Usually when you’re using libpcap to capture network traffic, your chief worry will be whether or not your application will keep up with the flow of traffic.

Today, though, I’ve stubbed my toe on a problem with traffic that’s too slow. It happens with both Ubuntu Trusty and Debian Jessie. If there’s a gap between packets of more than about 50 milliseconds, the first packet to arrive after the gap will be dropped and you’ll never see it. I was capturing DNS queries and responses, and found that with a query rate of under 20 queries per second you start dropping queries. By the time you’re down to 15 queries per second, nearly every query is dropped.

I spotted that tcpdump doesn’t have this problem, but after much experimentation it turns out to be not quite as simple as that. Whether or not you drop packets depends on the libpcap API you are using. If you’re using pcap_loop() to capture packets, you can stop worrying. This works properly. I guess that tcpdump is using pcap_loop() to capture packets and that’s why it works.

If, on the other hand, you’re using pcap_dispatch() or pcap_next_ex(), as the documentation urges you to do, then you’re doomed. This is regardless of whether you are using blocking or non-blocking mode.

So, what can you do? Your choices are limited.

  1. Switch your application to using pcap_loop(). If you were using non-blocking mode with either pcap_dispatch() or pcap_next_ex(), this will be non-trivial, as pcap_loop() doesn’t observe non-blocking, but always blocks. It won’t be straightforward either if you’re using pcap_next_ex() in your own loop.
  2. Upgrade. The problem is fixed if you upgrade Ubuntu to Xenial. I also found the problem apparently fixed by updating Jessie to the 4.7.0 kernel in Debian Backports.