Using OpenSSL from inside a chroot

A little something we tripped over this week. We’re providing an experimental DNS-over-TLS server that supports TLS v1.3. Right now TLS v1.3 is still an Internet Draft; in other words, it’s not a finished standard, though close to it. The latest version of the draft is draft 23, support for which was merged into the OpenSSL master branch yesterday, January 25th. Yup, we’re living on the bleeding edge.

Support for the final standard TLS v1.3 will be in the next OpenSSL release, v1.1.1.

We’re providing the service by fronting a regular name server with haproxy v1.8.3 built against OpenSSL master.

For some time, our experimental server has happily accepted connections for an hour or two, but then stopped accepting new connections. To deepen the mystery, it’s configured in exactly the same way as two other servers that are working fine; the only difference is that those servers are using the standard packaged OpenSSL libraries from Ubuntu Xenial. In odd moments this week I’ve been digging into why.

The answer turns out to be entropy. OpenSSL needs a source of random bits for its crypto magic, and these are provided by a Deterministic Random Bit Generator (DRBG) seeded by some entropy. This part of OpenSSL has been completely rewritten for v1.1.1, and while I’m certainly not in a position to judge the technical details, the code looks far cleaner than the previous code, and appears to offer expanded possibilities for alternate entropy sources and hardware DRBG in the future. So, a thoroughly good thing.

However, there is a change in behaviour on Linux compared to OpenSSL v1.1.0 and earlier. In the old versions, OpenSSL would attempt to read entropy from /dev/urandom (falling back to /dev/random or /dev/srandom if not found). It would then mix in entropy from any Entropy Gathering Daemon (EGD) present, and further entropy based on the process PID, process UID and the current time. In v1.1.1 at present (the comments indicate an ongoing discussion on this), only the first configured entropy source is used, which in a default Linux build means reading entropy from /dev/urandom (again falling back to /dev/random or /dev/srandom if not found).
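Stripped of the fallbacks, that default seed source is simple to picture: it is just bytes read from the kernel. This one-liner pulls 16 bytes from /dev/urandom and hex-dumps them, much as OpenSSL's seeding does internally:

```shell
# Read 16 bytes of kernel entropy and hex-dump them;
# the bytes differ on every run
head -c 16 /dev/urandom | od -An -t x1
```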

We have haproxy configured to run in a chroot jail, and this chroot jail did not contain /dev/urandom and friends. As it happens, OpenSSL obtains its first slab of entropy before the chroot takes effect, so that succeeds and haproxy starts to run. When, however, OpenSSL needs to read more entropy (which by default will be after an hour at the latest), it cannot open /dev/urandom and friends to reseed. This appears as a failure to open the connection, because SSL_new() fails, and it is generally reported as a memory allocation failure. If you print the OpenSSL error chain, it’s slightly more informative:

140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:2406B072:random number generator:RAND_DRBG_generate:in error state:crypto/rand/drbg_lib.c:479:
140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:2406B072:random number generator:RAND_DRBG_generate:in error state:crypto/rand/drbg_lib.c:479:
140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:140BA041:SSL routines:SSL_new:malloc failure:ssl/ssl_lib.c:839:
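Each line packs a numeric code alongside its decoded text; if all you have is the hex code, the openssl errstr utility will decode it for you (the reason strings come from whichever OpenSSL version does the decoding, so the wording can vary):

```shell
# Decode a packed OpenSSL error code into library/function/reason text
openssl errstr 2406C06E
```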

So, if you’re seeing mysterious OpenSSL failures and you are running in a chroot jail, make sure /dev/urandom at least is available.

# mkdir -p <chroot-base>/dev
# mknod <chroot-base>/dev/urandom c 1 9
# chmod 0666 <chroot-base>/dev/urandom
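After creating the device node, a pre-flight check along these lines can save an hour of head-scratching; this is a minimal sketch, with the check_chroot_rand name and messages our own invention (pass it whatever your haproxy chroot directive points at):

```shell
#!/bin/sh
# Verify a chroot contains a usable character-device /dev/urandom before
# starting the chrooted service. Usage: check_chroot_rand <chroot-base>
check_chroot_rand() {
    if [ -c "$1/dev/urandom" ]; then
        echo "$1: /dev/urandom present"
    else
        echo "$1: /dev/urandom missing - OpenSSL will fail on reseed" >&2
        return 1
    fi
}

# The host root always passes on a normal Linux system
check_chroot_rand /
```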

In fact, we’d recommend you do the same if you’re using OpenSSL v1.1.0 or earlier in an application run in a chroot. If you don’t, and you aren’t running an EGD, the chances are that the only entropy you’re getting is from your PID, UID and the time, all of which may be guessable from a relatively small range.

At the least, we recommend you read this page on the OpenSSL wiki, which discusses the issue in more detail.

More on Debian Jessie/Ubuntu Trusty packet capture woes

Back in September I wrote about a problem we’d come across when capturing traffic with pcap_dispatch() or pcap_next_ex() on Ubuntu Trusty or Debian Jessie. When the traffic was slow, we saw packets not being captured.

We’ve since done a bit more digging. The problem, we think, is a bug in the Linux kernel select() system call. With both pcap_dispatch() and pcap_next_ex() we’re using a central loop that is basically:

 select(pcapfd, timeout);

The length of the timeout in the select() call shouldn’t matter. But it does. In our test scenario, set it to 1ms and every packet in a ping to an otherwise idle network connection is captured; set it to 2s and most or all are missed.

Robert Edmonds has suggested that it’s this kernel bug. Thanks, Robert – that looks like the problem to us. This was fixed in kernel 3.19. We’ve filed a Debian bug and an Ubuntu bug.

So, what can you do about it for now?

  • If using Ubuntu Trusty, consider switching to the LTS Enablement Stack. This has the fix applied.
  • If using Debian Jessie, consider switching to a 4.9 series kernel from Jessie backports.
  • Otherwise consider reducing the timeout in your call to select(). As noted above, this certainly improves the situation for our specific test scenario. However, we can’t be confident that it is a definitive fix; make sure you test your particular circumstances.

Compressing pcap files

Here at Sinodun Towers, we’re often dealing with pcap DNS traffic capture files created on a far distant server. These files need to be compressed, both to save space on the server, and also to speed transfer to our machines.

Traditionally we’ve used gzip for quick but basic compression, and xz when space was more important than CPU and we really needed the best compression we could get. At a recent conference, though, an enthusiastic Facebook employee suggested we take a look at zstd. We’d looked at it quite some time ago, but our contact said it has improved considerably recently. So we thought we’d compare the latest zstd (1.2.0) with current gzip (1.8) and xz (5.2.3) and see how they stack up when you’re dealing with pcap DNS traffic captures.


We took what is (for us) a big file, a 662MB DNS traffic capture, and timed compressing it at all the different compression levels offered by each compressor. We did three timed runs for each and averaged the times. Here are the results; each point on the graph is a compression level.

zstd turns in an impressive performance. For lower compression levels it’s both notably quicker than gzip and far more effective at compressing pcap DNS traffic captures. In the time gzip takes to compress the input pcap to 25% of its original size, zstd manages 10%. Put another way, in our test the compressed file size is 173MB for gzip versus 65MB for zstd at similar runtimes.

zstd is also competitive with xz at higher compression levels, though xz does retain a slight lead there in both file size and runtime.
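A stripped-down version of the timing loop behind these comparisons looks something like the following sketch. The sample input, the chosen levels and the output names are illustrative, and compressors that aren’t installed are simply skipped:

```shell
#!/bin/sh
# Time one compression level per tool on a sample file and report the
# compressed size. The 1 MiB of zeros is a stand-in: substitute a real pcap.
f=$(mktemp)
head -c 1048576 /dev/zero > "$f"

for spec in "gzip -6" "zstd -3 -q" "xz -2"; do
    tool=${spec%% *}
    command -v "$tool" > /dev/null 2>&1 || continue   # skip missing tools
    start=$(date +%s%N)
    $spec -c "$f" > "$f.$tool"                        # compress to a new file
    end=$(date +%s%N)
    printf '%s: %d ms, %s bytes\n' "$tool" \
        $(( (end - start) / 1000000 )) "$(wc -c < "$f.$tool")"
done
rm -f "$f" "$f".*
```

Real measurements should of course use your own capture files and repeat each run several times, as we did.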


Of course, being able to compress is only half the problem. If you’re collecting data from a fleet of servers and bringing that data back to a central system for analysis, you may well find that decompressing your files becomes your main bottleneck. So we also checked decompression times.

There’s little to choose between zstd and gzip at any compression level, while xz generally lags.

Resource usage

So, if zstd gives better compression in similar times, what other costs does it have over gzip? The short answer is memory. Our measurements show that while gzip has much the same working set size regardless of compression level, zstd’s working set begins an order of magnitude larger and increases with level; by the time zstd is competing with xz, its working set is up to nearly 3x the size of xz’s.

That being said, by modern standards gzip’s working set size is absolutely tiny, comparable to that of a simple ls command. You can very probably afford to use zstd. As ever with resource usage, you pays your money and you takes your choice.


Our conclusion is that if you’re currently using gzip to compress pcap DNS traffic captures, you should definitely look at switching to zstd. If you are going for higher compression and currently using xz, the choice is less clear-cut, and depends on what compression level you are using.

A note on the comparison

We generated the above numbers using the standard command line tools for each compressor. Some capture tools like to build compression into their data pipeline, typically by passing raw data through a compression library before writing out. While attractive for some use cases, we’ve found that for higher compression you risk having the compression becoming the processing bottleneck. If server I/O load is not an issue (which it is not for many dedicated DNS servers), we prefer to write temporary uncompressed files and compress these once they are complete. Given sufficient cores, this allows you to parallelise compression, and employ much more expensive – but effective – compression than would be possible with inline compression.
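As a sketch of that last point, parallelising the compression of completed files needs nothing more exotic than xargs. The directory and file names here are stand-ins (a temporary directory of dummy captures replaces the real capture directory), and GNU xargs’ -r flag is assumed:

```shell
#!/bin/sh
# Sketch: compress finished capture files in parallel, one compressor
# per core, using dummy captures in a temporary directory.
dir=$(mktemp -d)
for i in 1 2 3 4; do head -c 262144 /dev/zero > "$dir/capture$i.pcap"; done

# -P caps concurrency at the core count; -r skips the run if nothing matches
ls "$dir"/*.pcap | xargs -r -P "$(nproc)" -n 1 gzip -9

ls "$dir"       # the .pcap files have been replaced by .pcap.gz files
rm -rf "$dir"
```

For higher-cost compressors such as xz at high levels, this is where the extra cores earn their keep.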

Packet capture woes with libpcap on Ubuntu Trusty and Debian Jessie

Usually when you’re using libpcap to capture network traffic, your chief worry will be whether or not your application will keep up with the flow of traffic.

Today, though, I’ve stubbed my toe on a problem with traffic that’s too slow. It happens with both Ubuntu Trusty and Debian Jessie. If there’s a gap between packets of more than about 50 milliseconds, the first packet to arrive after the gap will be dropped and you’ll never see it. I was capturing DNS queries and responses, and found that with a query rate of under 20 queries per second you start dropping queries. By the time you’re down to 15 queries per second, nearly every query is dropped.

After spotting that tcpdump doesn’t have this problem, and after much experimentation, it turns out not to be quite that simple. Whether or not you drop packets depends on the libpcap API you are using. If you’re using pcap_loop() to capture packets, you can stop worrying: it works properly. I guess tcpdump uses pcap_loop(), and that’s why it works.

If, on the other hand, you’re using pcap_dispatch() or pcap_next_ex(), as the documentation urges you to do, then you’re doomed. This is regardless of whether you are using blocking or non-blocking mode.

So, what can you do? Your choices are limited.

  1. Switch your application to using pcap_loop(). If you were using non-blocking mode with either pcap_dispatch() or pcap_next_ex(), this will be non-trivial, as pcap_loop() doesn’t observe non-blocking, but always blocks. It won’t be straightforward either if you’re using pcap_next_ex() in your own loop.
  2. Upgrade. The problem is fixed if you upgrade Ubuntu to Xenial. I also found the problem apparently fixed by updating Jessie to the 4.7.0 kernel in Debian Backports.

Comparing drill output

I wanted to compare the output of multiple drill queries. After a bit of trial and error, I came up with this sed command to remove the parts of drill’s output that change from query to query.

drill | grep -v "Query time" | grep -v WHEN \
        | sed 's/\t[[:digit:]]\{1,\}\t\|,\sid.\{1,\}$//'

FreeBSD with new xorg 1.12.X

A while ago I switched my FreeBSD desktop to the new xorg 1.12.X. In ports this requires adding WITH_NEW_XORG=yes to /etc/make.conf.

I only found out today that for packages you need to add a file called /etc/pkg/FreeBSD-new-xorg.conf containing:

FreeBSD_new_xorg: {
 url: "pkg+${ABI}/new_xorg",
 mirror_type: "srv",
 signature_type: "fingerprints",
 fingerprints: "/usr/share/keys/pkg",
 enabled: yes
}

Re-installing everything to see if it is more stable.

Patch to APC PowerChute Network Shutdown for Ubuntu

The following patch fixes the install script for Ubuntu 13.04

--- 2013-09-25 16:07:54.772107691 +0000
+++ 2013-09-26 12:05:46.214115186 +0000
@@ -1,4 +1,4 @@
 # PowerChute Network Shutdown v.3.0.1
@@ -22,6 +22,7 @@
@@ -288,6 +289,11 @@
 # We're a Linux derivative, decide which one.
 if [ -f /etc/vima-release ] || [ -f /etc/vma-release ]; then
+ elif [ -x /usr/bin/lsb_release ]; then
+ /usr/bin/lsb_release -i | grep Ubuntu > /dev/null
+ if [ $? -eq 0 ]; then
+ fi
 grep XenServer /etc/redhat-release > /dev/null 2>&1
 if [ $? -eq 0 ]; then
@@ -368,6 +374,11 @@
+ STARTUP=/etc/init.d/PowerChute
+ PCBE_STARTUP=/etc/init.d/PBEAgent
+ PCS_STARTUP=/etc/init.d/pcs
+ ;;
@@ -563,6 +574,9 @@
 /etc/rc.d/init.d/PowerChute stop
+ /etc/init.d/PowerChute stop
+ ;;
 /etc/rc2.d/S99PowerChute stop
@@ -977,6 +991,10 @@
 cp Linux/ notifier
 cp Linux/ shutdown
+ cp Linux/ notifier
+ cp Linux/ shutdown
+ ;;
 cp Solaris/ notifier
 cp Solaris/ shutdown
@@ -1057,6 +1075,14 @@
 rm -f /etc/rc.d/rc4.d/*99PowerChute
 rm -f /etc/rc.d/rc5.d/*99PowerChute
 rm -f /etc/rc.d/rc6.d/*99PowerChute
+ elif [ $OS = "$LINUX_UBUNTU" ] ; then
+ rm -f /etc/rc0.d/*99PowerChute
+ rm -f /etc/rc1.d/*99PowerChute
+ rm -f /etc/rc2.d/*99PowerChute
+ rm -f /etc/rc3.d/*99PowerChute
+ rm -f /etc/rc4.d/*99PowerChute
+ rm -f /etc/rc5.d/*99PowerChute
+ rm -f /etc/rc6.d/*99PowerChute
@@ -1069,6 +1095,14 @@
 elif [ -f /sbin/chkconfig ]; then
 chkconfig --add PowerChute
 chkconfig PowerChute on
+ elif [ $OS = "$LINUX_UBUNTU" ] ; then
+ ln -s /etc/init.d/PowerChute /etc/rc0.d/K99PowerChute
+ ln -s /etc/init.d/PowerChute /etc/rc1.d/K99PowerChute
+ ln -s /etc/init.d/PowerChute /etc/rc2.d/S99PowerChute
+ ln -s /etc/init.d/PowerChute /etc/rc3.d/S99PowerChute
+ ln -s /etc/init.d/PowerChute /etc/rc4.d/S99PowerChute
+ ln -s /etc/init.d/PowerChute /etc/rc5.d/S99PowerChute
+ ln -s /etc/init.d/PowerChute /etc/rc6.d/K99PowerChute
 ln -s /etc/rc.d/init.d/PowerChute /etc/rc.d/rc0.d/K99PowerChute
 ln -s /etc/rc.d/init.d/PowerChute /etc/rc.d/rc1.d/K99PowerChute
@@ -1120,6 +1154,14 @@
 if [ -f /sbin/chkconfig ]; then
 chkconfig PowerChute off
 chkconfig --del PowerChute
+ elif [ $OS = "$LINUX_UBUNTU" ] ; then
+ rm -f /etc/rc0.d/*99PowerChute
+ rm -f /etc/rc1.d/*99PowerChute
+ rm -f /etc/rc2.d/*99PowerChute
+ rm -f /etc/rc3.d/*99PowerChute
+ rm -f /etc/rc4.d/*99PowerChute
+ rm -f /etc/rc5.d/*99PowerChute
+ rm -f /etc/rc6.d/*99PowerChute
 rm -f /etc/rc.d/rc1.d/K99PowerChute
 rm -f /etc/rc.d/rc2.d/S99PowerChute

Juniper SRX mucking with DNS

I was getting some strange DNS answers on the servers in a trust zone on my SRX. All the servers are statically NATed to external IPs and run their own caching resolvers, but when I tried to query for a server’s A RR I kept getting the internal IP address. No name server, either internal or external, was serving that A RR. Eventually I realised that it was the SRX changing the answer section of the DNS response. I don’t know if it is on by default, or if I switched it on by mistake, but it was the DNS Application Layer Gateway (ALG) trying to help by making use of what it knew about the static NATs. Switching the DNS ALG off solved the issue. For a detailed description of what the ALG does, see the Junos OS Security Configuration Guide.
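For reference, on the Junos releases we’ve used, the DNS ALG can be switched off globally with something along these lines (check the configuration guide for your release, as ALG configuration has moved around between releases):

```
[edit]
user@srx# set security alg dns disable
user@srx# commit
```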

FreeBSD svn and passwords

I have been getting the message about storing unencrypted svn passwords for ages on FreeBSD and Ubuntu. I finally thought I would look into how to store encrypted passwords.

ATTENTION! Your password for authentication realm:
<> Sinodun Projects Subversion Repository
can only be stored to disk unencrypted! You are advised to configure
your system so that Subversion can store passwords encrypted, if
possible. See the documentation for details.
You can avoid future appearances of this warning by setting the value
of the 'store-plaintext-passwords' option to either 'yes' or 'no' in

Lots of googling later, the best I could find was an article about gnome-keyring-daemon and Subversion, which mentioned a dependency that a quick search showed was missing on my system. A moment of dumbness later I remembered make config, and found the port config option needed.


Re-installing the port unfortunately broke svn with the following message:

svn: E175002: OPTIONS of '': SSL handshake failed: SSL disabled due to library version mismatch (

Thanks to this, it was quickly fixed by rebuilding neon:

sudo portupgrade -f neon*