Wiki.js fails to render inline equations

I have been playing with Wiki.js and happened to notice that some math was not getting rendered correctly. Wiki.js uses KaTeX to render LaTeX-like math expressions. I wanted to produce an unordered list of equations like this:

  • \braket{v_i|v_i} = 1 because we are dealing with a unit vector
  • \braket{v_1|v_2}=\braket{v_2^*|v_1^*}
  • \braket{v_1\vert(a\vert v_2}+b\ket{v_3})=a\braket{v_1\vert v_2}+b\braket{v_1\vert v_3}

The markdown to do this looks like

* $\braket{v_i|v_i} = 1$ because we are dealing with a unit vector
* $\braket{v_1|v_2}=\braket{v_2^*|v_1^*}$
* $\braket{v_1\vert(a\vert v_2}+b\ket{v_3})=a\braket{v_1\vert v_2}+b\braket{v_1\vert v_3}$

On the preview pane it looked fine, but on the actual wiki page it was rendered like this:

  • \braket{v_i|v_i} = 1 because we are dealing with a unit vector
  • \braket{v_1|v_2}=\braket
  • \braket{v_1\vert(a\vert v_2}+b\ket{v_3})=a\braket{v_1\vert v_2}+b\braket

It turns out that there needs to be text after the closing $ symbol. Adding just a “.” to the end of each line fixed it and the rendering worked fine.
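With the workaround applied, the markdown becomes:

* $\braket{v_i|v_i} = 1$ because we are dealing with a unit vector.
* $\braket{v_1|v_2}=\braket{v_2^*|v_1^*}$.
* $\braket{v_1\vert(a\vert v_2}+b\ket{v_3})=a\braket{v_1\vert v_2}+b\braket{v_1\vert v_3}$.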

Working with Materialized View tables in ClickHouse

There must be something about January which makes John prod me into a blog post about something I’ve just teased out. So here we are: it’s 2020, it’s January, and this is fast (OK, not so fast) becoming an annual tradition.

Today’s post is a selection of snippets on Materialized Views. Materialized Views, if you haven’t met them, are tables automatically populated when data is inserted into some other table.

Materialized View inner tables

There are two ways to create a table populated from a Materialized View. The first, which we’ve used up to now, looks like this:

CREATE MATERIALIZED VIEW QueriesPerSecond
(
    Date Date,
    DateTime DateTime,
    NodeID UInt16,
    QueryCount UInt32,
    QueryDOCount UInt32
)
ENGINE = SummingMergeTree()
PARTITION BY toYearWeek(Date)
ORDER BY (Date, DateTime, NodeID)
AS SELECT
    Date,
    DateTime,
    NodeID,
    CAST(1 as UInt32) AS QueryCount,
    CAST(QueryDO AS UInt32) AS QueryDOCount
FROM QueryResponse
WHERE QueryResponseHasQuery;

When records are added to table QueryResponse, the SELECT runs and the results are added to QueriesPerSecond.

Well, almost. If you SHOW TABLES, you’ll see there’s now QueriesPerSecond and another table named .inner.QueriesPerSecond. The latter table is where the records actually end up.

How to refer to that inner table, a digression…

You need a little trick to refer to the inner table from SQL: you’ll need to enclose the table name in back-ticks. This also means that you can’t refer to inner tables in a database other than the current one.

So, to verify that the inner table is indeed the table containing the data for the view, run SELECT * FROM `.inner.QueriesPerSecond` LIMIT 10.
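In clickhouse-client that check looks something like this (illustrative output, trimmed):

:) SHOW TABLES

.inner.QueriesPerSecond
QueriesPerSecond

:) SELECT * FROM `.inner.QueriesPerSecond` LIMIT 10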

… back to Materialized Views

All this is fine and dandy. But what if you want to, say, rename a column in QueryResponse? How do you change the SELECT associated with the Materialized View? Well, you can drop and re-create the Materialized View, but if you drop the view then the inner table also gets dropped and you lose all the existing data. What you need to do instead is to find out where the definition is saved on the server:

:) select metadata_path from system.tables where name='QueriesPerSecond'
+-metadata_path--------------------------------------------+
| /var/lib/clickhouse/metadata/wombat/QueriesPerSecond.sql |
+----------------------------------------------------------+
:) detach table QueriesPerSecond

Head to the server, edit that .sql file to your taste, and reattach the table:

:) attach table QueriesPerSecond

So, you can do it, but it’s not great. You can’t do it from SQL, and you need to log into the server and directly edit files in the ClickHouse working space.

Materialized View without inner tables

Here’s what we’ll be doing from now on. It turns out you can create a normal table to hold the results of the materialized view updates:

CREATE TABLE QueriesPerSecond
(
    Date Date,
    DateTime DateTime,
    NodeID UInt16,
    QueryCount UInt32,
    QueryDOCount UInt32
)
ENGINE = SummingMergeTree()
PARTITION BY toYearWeek(Date)
ORDER BY (Date, DateTime, NodeID);

CREATE MATERIALIZED VIEW QueriesPerSecondMV
TO QueriesPerSecond
AS SELECT
    Date,
    DateTime,
    NodeID,
    CAST(1 as UInt32) AS QueryCount,
    CAST(QueryDO AS UInt32) AS QueryDOCount
FROM QueryResponse
WHERE QueryResponseHasQuery;

To change the view SELECT, drop QueriesPerSecondMV and re-create it. Unlike the materialized view with the inner table we saw earlier, this won’t delete the underlying table. So now we can modify the materialized view query from SQL, rather than having to monkey with files on the server.
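For example, if (hypothetically) the source column QueryDO were renamed to QueryHasDO in QueryResponse, updating the view is just a drop and re-create, and the data already in QueriesPerSecond is untouched:

DROP TABLE QueriesPerSecondMV;

CREATE MATERIALIZED VIEW QueriesPerSecondMV
TO QueriesPerSecond
AS SELECT
    Date,
    DateTime,
    NodeID,
    CAST(1 as UInt32) AS QueryCount,
    CAST(QueryHasDO AS UInt32) AS QueryDOCount
FROM QueryResponse
WHERE QueryResponseHasQuery;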

Convert from inner table Materialized View to a separate table Materialized View

So, can you convert from the inner table to a separate table Materialized View? Yes.

:) DETACH TABLE QueriesPerSecond
:) RENAME TABLE `.inner.QueriesPerSecond` TO QueriesPerSecond
:) CREATE MATERIALIZED VIEW QueriesPerSecondMV
TO QueriesPerSecond
AS SELECT
    Date,
    DateTime,
    NodeID,
    CAST(1 as UInt32) AS QueryCount,
    CAST(QueryDO AS UInt32) AS QueryDOCount
FROM QueryResponse
WHERE QueryResponseHasQuery;

Renaming the old inner table to the previous name of the view overwrites any old definition of that view, leaving us with a plain data table. Create a new materialized view to populate it, and we’re done.

JUNOS DHCPv6 DNS Search List

I have been looking at the DHCPv6 server in JUNOS 15.1X49-D160.2. It is easy enough to set up. Note: Step 5 of the setup instructions is wrong; the first two instructions should have dhcpv6 as the last argument, e.g.

set security zones security-zone untrust interfaces ge-0/0/0.0 host-inbound-traffic system-services dhcpv6

It would be nice to configure the DNS Search List option. The option code for this is 24, so I tried this:

set access address-assignment pool my-pool family inet6 dhcp-attributes option 24 array string [ "sinodun.com" "ipv4.sinodun.com" ]

but it didn’t work. There appear to be no examples of how to do this correctly, so after reading RFC3315 section 8 and RFC1035 section 3.1 I realised the value had to be in uncompressed DNS wire format, like this:

set access address-assignment pool my-pool family inet6 dhcp-attributes option 24 array hex-string [ "0773696e6f64756e03636f6d00" "04697076340773696e6f64756e03636f6d00" ]

which IMHO is not the easiest way to configure things.
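If you need to generate those hex strings for other names, here’s a minimal sketch in Python (my own helper, not a JUNOS tool) that encodes each search domain as an uncompressed RFC 1035 wire-format name:

def domain_to_hex(domain):
    """Encode a domain as an uncompressed RFC 1035 wire-format name, in hex."""
    wire = b''
    for label in domain.rstrip('.').split('.'):
        encoded = label.encode('ascii')
        wire += bytes([len(encoded)]) + encoded   # length byte, then the label
    wire += b'\x00'                               # the root label terminates the name
    return wire.hex()

for name in ['sinodun.com', 'ipv4.sinodun.com']:
    print(name, domain_to_hex(name))

Running that prints exactly the two hex strings used in the command above.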

Using executable external dictionaries in ClickHouse

External dictionaries, dictionaries populated from an external source, are a rather useful way to make data external to ClickHouse accessible when working in ClickHouse. One option for the source of the external data is an executable. I found, though, that the documentation doesn’t clearly tell you how to use this, so here I’m trying to rectify that.

There are two basic types of executable external dictionary, which I’ll call whole file and lookup dictionaries. Let’s look at each in turn. Oh, and I’m using ClickHouse version 18.5.1 for these examples.

Whole file dictionary

This is the most straightforward type. It is similar to an external dictionary populated from a local file. A command is run and the results are read from the command’s standard output in one of the formats supported by ClickHouse.

Keeping things as simple as possible, let’s just use the command to cat a file to standard output. Here’s a simple input file with two fields separated by a tab:

$ cat /home/jim/test
1	One
2	Two
3	Three

And here’s an extract of the configuration for a simple flat dictionary.

<source>
    <executable>
        <command>cat /home/jim/test</command>
        <format>TabSeparated</format>
    </executable>
</source>
<lifetime>3600</lifetime>
<layout>
    <flat/>
</layout>
<structure>
    <id>
        <name>item</name>
    </id>
    <attribute>
        <name>value</name>
        <type>String</type>
        <null_value></null_value>
    </attribute>
</structure>

As you might guess, the command is run every hour and the results stored in the dictionary. Here’s a lookup on that dictionary.

SELECT dictGetString('test', 'value', toUInt64(1))

+-dictGetString('test', 'value', toUInt64(1))-+
| One                                         |
+---------------------------------------------+

1 rows in set. Elapsed: 0.002 sec.

If your key isn’t a simple integer, you can switch to a complex_key_hashed dictionary.

<source>
    <executable>
        <command>cat /home/jim/test</command>
        <format>TabSeparated</format>
    </executable>
</source>
<lifetime>3600</lifetime>
<layout>
    <complex_key_hashed/>
</layout>
<structure>
    <key>
        <attribute>
            <name>item</name>
            <type>String</type>
        </attribute>
    </key>
    <attribute>
        <name>value</name>
        <type>String</type>
        <null_value></null_value>
    </attribute>
</structure>

And here’s a lookup on that dictionary.

SELECT dictGetString('test2', 'value', tuple('3'))

+-dictGetString('test2', 'value', tuple('3'))-+
| Three                                       |
+---------------------------------------------+

1 rows in set. Elapsed: 0.043 sec.

Lookup dictionary

So far, so straightforward. But ClickHouse has another trick up its sleeve. If you specify a dictionary layout of complex_key_cache and a cache size, ClickHouse requests key items from the command by passing the key value into the command on its standard input. The command reads the key, and prints the dictionary record to its standard output.

Here’s an example looking at disc usage. I’m using a simple Python3 application sys-info to read the disc usage; here’s the core part of that application.

import logging
import os
import sys

statfs = os.statvfs('/home/jim')

for l in sys.stdin.readlines():
    l = l.strip()
    if l == 'disc-block-size':
        val = statfs.f_bsize
    elif l == 'disc-size':
        val = statfs.f_blocks * statfs.f_frsize // 1024
    elif l == 'disc-available':
        val = statfs.f_bavail * statfs.f_frsize // 1024
    elif l == 'disc-percent-free':
        val = 100 * statfs.f_bavail // statfs.f_blocks
    elif l == 'disc-percent-used':
        val = 100 - (100 * statfs.f_bavail // statfs.f_blocks)
    else:
        print('Unknown key: {key}'.format(key=l))
        logging.error('Unknown key: {key}'.format(key=l))
        sys.exit(1)

    # The first field of each output line must be the key value.
    print('{key}\t{val}'.format(key=l, val=val))
sys.exit(0)

Running it from the command line looks like this:

$ echo "disc-percent-used" | sys-info
disc-percent-used	6

Note that the first field returned has to be the index value. Let’s match this with a dictionary definition.

<source>
    <executable>
        <command>sys-info</command>
        <format>TabSeparated</format>
    </executable>
</source>
<lifetime>3600</lifetime>
<layout>
    <complex_key_cache>
        <size_in_cells>20</size_in_cells>
    </complex_key_cache>
</layout>
<structure>
    <key>
        <attribute>
            <name>item</name>
            <type>String</type>
        </attribute>
    </key>
    <attribute>
        <name>value</name>
        <type>String</type>
        <null_value></null_value>
    </attribute>
</structure>

And now we’re set up for a query.

SELECT dictGetString('sysinfo', 'value', tuple('disc-percent-used'))

+-dictGetString('sysinfo', 'value', tuple('disc-percent-used'))-+
| 6                                                             |
+---------------------------------------------------------------+

1 rows in set. Elapsed: 0.087 sec.

Using OpenSSL from inside a chroot

A little something we tripped over this week. We’re providing an experimental DNS-over-TLS server that supports TLS v1.3. Right now TLS v1.3 is still an Internet Draft; in other words, it’s not a finished standard, though close to it. The latest version of the draft is draft 23, support for which was merged into the OpenSSL master branch yesterday, January 25th. Yup, we’re living on the bleeding edge.

Support for the final standard TLS v1.3 will be in the next OpenSSL release, v1.1.1.

We’re providing the service by fronting a regular name server with haproxy v1.8.3 built against OpenSSL master.

For some time, our experimental server would happily accept connections for an hour or two, but then stop accepting new connections. To deepen the mystery, it’s configured in exactly the same way as two other servers that are working fine; the only difference is that those servers are using the standard packaged OpenSSL libraries from Ubuntu Xenial. In odd moments this week I’ve been digging into why.

The answer turns out to be entropy. OpenSSL needs a source of random bits for its crypto magic, and these are provided by a Deterministic Random Bit Generator (DRBG) seeded by some entropy. This part of OpenSSL has been completely rewritten for v1.1.1, and while I’m certainly not in a position to judge the technical details, the code looks far cleaner than the previous code, and appears to offer expanded possibilities for alternate entropy sources and hardware DRBG in the future. So, a thoroughly good thing.

However, there is a change in behaviour on Linux compared to OpenSSL v1.1.0 and earlier. In the old versions, OpenSSL would attempt to read entropy from /dev/urandom (or /dev/random or /dev/srandom if not found). It would then mix in entropy from any Entropy Gathering Daemon (EGD) present, and then mix in further entropy based on the process PID, process UID and the current time. In v1.1.1 at present (the comments indicate an ongoing discussion on this), only the first configured entropy source is used, which for a default Linux build means reading entropy from /dev/urandom (again falling back to /dev/random or /dev/srandom if not found).

We have haproxy configured to run in a chroot jail, and this chroot jail did not contain /dev/urandom and friends. As it happens, OpenSSL obtains its first slab of entropy before the chroot takes effect, so that succeeds and haproxy starts to run. When, however, OpenSSL needs to read more entropy (which by default will be after an hour at the latest), it cannot open /dev/urandom and friends to get it. This appears as the connection failing to open, because SSL_new() fails, and that failure is generally reported as a memory allocation failure. If you print the OpenSSL error chain, it’s slightly more informative:

140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:2406B072:random number generator:RAND_DRBG_generate:in error state:crypto/rand/drbg_lib.c:479:
140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:2406B072:random number generator:RAND_DRBG_generate:in error state:crypto/rand/drbg_lib.c:479:
140135125062464:error:2406C06E:random number generator:RAND_DRBG_instantiate:error retrieving entropy:crypto/rand/drbg_lib.c:221:
140135125062464:error:140BA041:SSL routines:SSL_new:malloc failure:ssl/ssl_lib.c:839:

So, if you’re seeing mysterious OpenSSL failures and you are running in a chroot jail, make sure that at least /dev/urandom is available:

# mkdir -p <chroot-base>/dev
# mknod <chroot-base>/dev/urandom c 1 9
# chmod 0666 <chroot-base>/dev/urandom

In fact, we’d recommend you do the same if you’re using OpenSSL 1.1.0 or earlier in an application run in a chroot. If you don’t, and you aren’t running an EGD, the chances are that the only entropy you’re getting is from your PID, UID and the time, all of which may be guessable from within a relatively small range.

At the very least, we recommend you read this page on the OpenSSL wiki, which discusses the issue in more detail.

More on Debian Jessie/Ubuntu Trusty packet capture woes

Back in September I wrote about a problem we’d come across when capturing traffic with pcap_dispatch() or pcap_next_ex() on Ubuntu Trusty or Debian Jessie. When the traffic was slow, we saw packets not being captured.

We’ve since done a bit more digging. The problem, we think, is a bug in the Linux kernel select() system call. With both pcap_dispatch() and pcap_next_ex() we’re using a central loop that is basically:

 pcap_dispatch();
 select(pcapfd, timeout);

The length of timeout in the select() call shouldn’t matter. But it does. In our test scenario, set it to 1ms and every packet in a ping to an otherwise idle network connection will be captured. Set it to 2s and most or all will be missed.

Robert Edmonds has suggested that it’s this kernel bug. Thanks, Robert – that looks like the problem to us. This was fixed in kernel 3.19. We’ve filed a Debian bug and a Ubuntu bug.

So, what can you do about it for now?

  • If using Ubuntu Trusty, consider switching to the LTS Enablement Stack. This has the fix applied.
  • If using Debian Jessie, consider switching to a 4.9 series kernel from Jessie backports.
  • Otherwise consider reducing the timeout in your call to select(). As noted above, this certainly improves the situation for our specific test scenario. However, we can’t be confident that it is a definitive fix; make sure you test your particular circumstances.

Compressing pcap files

Here at Sinodun Towers, we’re often dealing with pcap DNS traffic capture files created on a far distant server. These files need to be compressed, both to save space on the server, and also to speed transfer to our machines.

Traditionally we’ve used gzip for quick but basic compression, and xz when space was more important than CPU and we really needed the best compression we could get. At a recent conference, though, an enthusiastic Facebook employee suggested we take a look at zstd. We’d looked at it quite some time ago, but our contact said it has improved considerably recently. So we thought we’d compare the latest zstd (1.2.0) with current gzip (1.8) and xz (5.2.3) and see how they stack up when you’re dealing with pcap DNS traffic captures.

Compressing

We took what is (for us) a big file, a 662Mb DNS traffic capture, and timed compressing it at all the different compression levels offered by each compressor. We did three timed runs for each and averaged the time. Here are the results; each point on the graph is a compression level.
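The runs themselves were along these lines (a sketch rather than our exact harness; capture.pcap stands in for the real capture, and the level argument was varied across each tool’s full range):

$ time gzip -k -6 capture.pcap
$ time xz -k -6 capture.pcap
$ time zstd -6 -o capture.pcap.zst capture.pcap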

zstd turns in an impressive performance. For lower compression levels it’s both notably quicker than gzip and far more effective at compressing pcap DNS traffic captures. In the time gzip takes to compress the input pcap to 25% of its original size, zstd manages 10%. Put another way, in our test the compressed file size is 173Mb for gzip versus 65Mb for zstd at similar runtimes.

zstd is also competitive with xz at higher compression levels, though xz does retain a slight lead there in both file size and runtime.

Decompressing

Of course, being able to compress is only half the problem. If you’re collecting data from a fleet of servers and bringing that data back to a central system for analysis, you may well find that decompressing your files becomes your main bottleneck. So we also checked decompression times.

There’s little to choose between zstd and gzip at any compression level, while xz generally lags.

Resource usage

So, if zstd gives better compression in similar times, what other costs does it have compared to gzip? The short answer is memory. Our measurements show that while gzip has much the same working set size regardless of compression level, zstd’s working set starts an order of magnitude larger and grows with the compression level; by the time zstd is competing with xz, its working set is nearly 3x the size of xz’s.

That being said, by modern standards gzip’s working set size is absolutely tiny, comparable to a simple ls command. You can very probably afford to use zstd. As ever with resource usage, you pays your money and you takes your choice.

Conclusion

It looks to us as though, if you’re currently using gzip to compress pcap DNS traffic captures, you should definitely look at switching to zstd. If you are going for higher compression and currently using xz, the choice is less clear-cut, and depends on the compression level you are using.

A note on the comparison

We generated the above numbers using the standard command line tools for each compressor. Some capture tools like to build compression into their data pipeline, typically by passing raw data through a compression library before writing out. While attractive for some use cases, we’ve found that for higher compression you risk having the compression becoming the processing bottleneck. If server I/O load is not an issue (which it is not for many dedicated DNS servers), we prefer to write temporary uncompressed files and compress these once they are complete. Given sufficient cores, this allows you to parallelise compression, and employ much more expensive – but effective – compression than would be possible with inline compression.
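As a sketch of that approach (assuming completed captures land in /captures as .pcap files; adjust the path and level to taste):

$ find /captures -name '*.pcap' -print0 | xargs -0 -n 1 -P 4 zstd -19 --rm

Each file is compressed by a separate zstd process, so with enough cores the expensive compression levels don’t hold up the capture pipeline.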

Packet capture woes with libpcap on Ubuntu Trusty and Debian Jessie

Usually when you’re using libpcap to capture network traffic, your chief worry will be whether or not your application will keep up with the flow of traffic.

Today, though, I’ve stubbed my toe on a problem with traffic that’s too slow. It happens with both Ubuntu Trusty and Debian Jessie. If there’s a gap between packets of more than about 50 milliseconds, the first packet to arrive after the gap will be dropped and you’ll never see it. I was capturing DNS queries and responses, and found that with a query rate of under 20 queries per second you start dropping queries. By the time you’re down to 15 queries per second, nearly every query is dropped.

After spotting that tcpdump doesn’t have this problem, and after much experimentation, it turns out it’s not quite as simple as that. Whether or not you drop packets depends on the libpcap API you are using. If you’re using pcap_loop() to capture packets, you can stop worrying: it works properly. I guess that tcpdump is using pcap_loop() to capture packets and that’s why it works.

If, on the other hand, you’re using pcap_dispatch() or pcap_next_ex(), as the documentation urges you to do, then you’re doomed. This is regardless of whether you are using blocking or non-blocking mode.

So, what can you do? Your choices are limited.

  1. Switch your application to using pcap_loop(). If you were using non-blocking mode with either pcap_dispatch() or pcap_next_ex(), this will be non-trivial, as pcap_loop() doesn’t observe non-blocking, but always blocks. It won’t be straightforward either if you’re using pcap_next_ex() in your own loop.
  2. Upgrade. The problem is fixed if you upgrade Ubuntu to Xenial. I also found the problem apparently fixed by updating Jessie to the 4.7.0 kernel in Debian Backports.

Comparing drill output

I wanted to compare the output of multiple drill queries. After a bit of trial and error, I came up with this sed to remove the parts of drill’s output that change from query to query.

drill www.sinodun.com | grep -v "Query time" | grep -v WHEN \
        | sed 's/\t[[:digit:]]\{1,\}\t\|,\sid.\{1,\}$//'
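To actually compare two queries, a convenient wrapper is something like this (bash process substitution; querying a second server such as @8.8.8.8 is just an example):

cleandrill() {
    drill "$@" | grep -v "Query time" | grep -v WHEN \
        | sed 's/\t[[:digit:]]\{1,\}\t\|,\sid.\{1,\}$//'
}

diff <(cleandrill www.sinodun.com) <(cleandrill www.sinodun.com @8.8.8.8)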