Book review: Reservoir 13

When Jon McGregor was a guest on A Good Read, a BBC radio program where two guests and the host discuss their book choice, he chose a book with short essays on plants, stars and animals. It was as if he wanted to make the point that he really does see poetry in everything.

https://images.gr-assets.com/books/1481095579l/32699538.jpg

Reservoir 13 is his fourth novel — and also the fourth one I read. The book is about an English village where one day a thirteen-year old girl, on holiday in the village, suddenly goes missing. The book describes the thirteen years following this incident.

Except that it is not really about the girl, or her disappearance. Though she is regularly referred to throughout the book, the fact that she’s always the thirteen-year old girl, while everyone else is getting older, is a clever way to show the passing of time, which is the book’s real theme.

In short snapshots — thirteen per each of the thirteen years of the book; that is poetry too — one sees children grow up, couples grow closer together or drift apart, annual traditions continue yet change a little bit every year. None of the characters receives special focus and none of the story lines is particularly interesting in itself; it is what they make together that is very beautiful.

I have never lived in a village myself and it is not one of the life experiences I am very sad to have missed, yet I do appreciate there is something uniquely beautiful about small communities: a certain kind of poetry that is less noticeable in towns and cities.

Book review: The Information: A History, a Theory, a Flood

Long before Samuel Morse invented the electric telegraph, people in West-Africa could send messages over long distances using ‘talking drums’. But while the telegraph requires human language to be encoded into ‘dots’ and ‘dashes’ (and spaces — Morse code isn’t binary), the drums actually do talk: the high and low pitches of the drums correspond to high and low tones in the language. Whatever is lost in the lack of consonants and vowels is made up for by making the words longer, not dissimilar to how we sometimes say ‘Alpha’ or ‘Bravo’ instead of ‘A’ and ‘B’, or how digital storage often contains extra bits to allow possible errors to be detected and corrected.

Few would argue that talking drums are part of IT, yet they are a technology to transmit information. I think it is thus right that James Gleick opens his truly fascinating book on Information (“The Information: A History, a Theory, a Flood“) with a chapter on these drums.

After talking drums, Gleick goes on to discuss early alphabetically-ordered dictionaries (a concept unique to languages that actually use alphabets), Charles Babbage’s analog computers, the mechanical and electrical telegraphs (that allowed information to be transfered faster than humans even for people who didn’t know how to use talking drums) and the telephone. But all throughout the book, the technology is made subordinate to information itself, which is implicitly treated as a philosophical concept.

The true hero of both the book, “father of information theory” Claude Shannon (1916-2001), makes his first appearance about halfway through. Shannon did important work on early computers and on code breaking (he cooperated with his British contemporary Alan Turing during the Second World War) but most importantly of all, he was the inventor of the ‘bit’ as the basic unit of information.

This is defined by Shannon not as the result of counting the number of zeros and ones used to store a particular message, but as a the amount of information actually contained in that message. Anyone who has ever used a program like WinZip to compress a text file, or who has ever found the need to rephrase a message so that it would fit into 140 (now 280) characters, knows that there is often less information contained in a message than that is used to store it. Though the exact amount of information is in most cases impossible to measure accurately, Shannon estimated that an average English text contained about 50% redundancy.

With the appearance of Shannon, things really took off for the information age. Computers came along, while at the same time DNA was discovered: the hard drives contained in every cell in every living organism. A full chapter of the book is dedicated to memes, a concept from evolutionary biology, and specific pieces of information ‘going viral’.

Again, Gleick’s book focuses less on “how does it work?” and more on “what does it mean?”, hence logicians like Bertrand Russel and Kurt Gödel feature prominently; I was intrigued to learn how the paradoxes studied by the former (for example the Berry Paradox) are very relevant to information theory. Important but surprisingly hard to grasp notions like randomness and entropy, often passed over briefly in more practically focused books, are also discussed intensively.

The book ends with a chapter on information overload, and it was somewhat of a relief to learn that concerns about more information not necessarily being better do in fact have a very long history.

Before that, Gleick has already cast a glimpse into the future of computing and the possible arrival of quantum computers. As a mathematician, I always feel a bit uncomfortable about the possibility that future computers may be based on physics rather than on mathematics and I hold on to the beliefs of some experts who say that practical quantum computers will never see the light of day. Conceptually, however, such computers are truly fascinating. And of course, Gleick’s focus is on that.

Setting up your WordPress blog as a Tor Hidden Service

This blog is now also available at http://bgaxaar7xx6dpptt.onion/, in other words, as a Tor Hidden Service (or location-hidden service, as the Tor Project itself calls it). It was surprisingly easy to set it up like this, and I thought I’d explain what I did.

First, however, let me make it clear that the location of this blog isn’t hidden. The Tor and non-Tor version run on the same server, which is hosted on a VPS at Bitfolk. There are a number of reasons why I decided to set it up like this as well, one of which is that it’s fun to do so. However, I believe it also helps support Tor in general and hidden services in particular — just like Facebook setting up a hidden service at their not-too-hidden servers did. Anonymity needs friends and by accessing this blog through Tor, you can be such a friend.

In what follows, I will assume that you are interested in setting things up for the same reasons. If you actually have something you really want hide from a powerful adversary, you should do some proper research and not just copy and paste things from a blog like mine. And at the very least, you shouldn’t run a non-Tor version of the blog on the same server.

I will also assume that you are running your WordPress site on a Linux server that you fully control, that the server runs Apache and that you aren’t afraid of the Linux command line. And I will assume that you have already installed Tor, which on most Linux distributions is trivial.

  1. Configure your hidden service

    (See here for a slightly more detailed version of the first two points.)

    Tor comes with a configuration file, which is probably called /etc/tor/torrc. Open this file in a text editor and look for the bit that says
    ############### This section is just for location-hidden services ###
    a few lines below, you should see see lines like
    # HiddenServiceDir /var/lib/tor/hidden_service/
    # HiddenServicePort 80 127.0.0.1:80

    Uncomment these two lines, by removing the # in front of them.

  2. Restart Tor

    You do this by running something like
    service tor restart

    You can now access your site as a Tor Hidden service. To find its .onion address, go to the directory /var/lib/tor/hidden_service/ configured above and look for the file hostname inside it.

    (If you don’t like the hostname, remove the whole directory, restart Tor and repeat until you’ve got one you like. The hostname some kind of hash of the randomly generated private key, so you can’t control it and you’re unlikely going to have the resources to repeat this until you’ve got something you really like.)

  3. Configure Apache
    Depending on how you have configured Apache, accessing your blog using the .onion address may redirect you to the default web server, outside of Tor. That’s not what you want, so you need to add a virtual server to your Apache settings.

    Apache changes its structure every few years, but in my case (I’m running Apache 2.4.10 on Debian Linux) I had to edit /etc/apache2/sites-enabled/000-default.conf. In this file, look for a block starting with <VirtualHost that defines your current WordPress site. Just copy this block, paste it into the same file and substitute your .onion domain for the one of your blog.

    In my case, the new block looked like this:

    <VirtualHost *:80>
    ServerName bgaxaar7xx6dpptt.onion
    ServerAdmin youremail@address

    DocumentRoot /var/www/lapsedordinary
    <Directory />
    Options FollowSymLinks
    AllowOverride None
    </Directory>
    <Directory /var/www/lapsedordinary>
    Options FollowSymLinks
    # Options -Indexes FollowSymLinks
    AllowOverride All
    Order allow,deny
    allow from all
    </Directory>

    ErrorLog ${APACHE_LOG_DIR}/tor_lapsedordinary_error.log

    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn

    CustomLog ${APACHE_LOG_DIR}/tor_lapsedordinary_access.log combined
    </VirtualHost>

    (Note that I created separate log files for people accessing my blog via the .onion domain. If you do so, you can see how many people access your blog this way and what pages they access. You don’t have to though.)

    Finally, restart Apache (service apache2 restart in my case).

  4. Set up the Domain Mirror plugin
    Now your blog should be accessible using the .onion domain. However, internal links will probably go to the domain you’ve set up in the WordPress settings. To let it work with both domains, you’ll need a plugin, of which there are a few. For me, the Domain Mirror plugin worked well (though I had to rename the installation directory). Once installed, it’s really easy to set up. Now internal links for those accessing your blog as a Tor hidden service should point to the .onion domain as well.

That’s all.

You will notice that, while I run HTTPS by default on the non-Tor version of the blog, the Tor version uses plain HTTP. That’s not a problem though: connections to hidden services are encrypted by design. One can get a certificate for a .onion domain (Facebook has one) but that’s just to get the Green lock in your browser’s address bar, which is something for people to look for before they enter their credentials.

Why WhatsApp won’t DGA

This week, messaging app WhatsApp was shut down by the government in Brazil for 48 hours. It is worrying that a government did this. It is also worrying that they technically could, at least to the point where it was unusable for most people in the country.

Rob Graham suggested WhatsApp should do what botnets have been doing for years to make the individual bots find the command and control server, even in the face of regular take-down by law enforcement and security firms: use domain generating algorithms (DGA).

Rob’s post is worth a read, if only because it shows that there may be things to be learned from how botnets operate, despite the fact that they generally use evil means to do evil things. But much as it is a nice idea, DGAs won’t work for WhatsApp.

Firstly, the pseudorandomly generated domains still need to point to one or more IP addresses. This quickly becomes a single point of failure: a government that wants to block the service can simply block this IP address. WhatsApp can respond by moving its servers somewhere else, but the number of places where it can go are limited. You can’t run WhatsApp from your parents’ basement. A game of whack-a-mole between the company and law enforcement wouldn’t give a great user experience either.

Secondly, DGAs make “sinkholing” very easy. This is the process where someone (typically a security researcher or a law enforcement officer) registers one of the pseudorandom domains used by the service and consequently find the individual nodes (bots or WhatsApp clients) connect to their server. Note that malware researchers regularly crack the DGAs used by botnets (which tend to hide their code far better than WhatsApp would ever want to do) and even if they can’t do that, they could just run the app in a sandbox and see what domains it performs DNS lookups against.

In theory, a service can be setup so that sinkholing can’t do any harm, for instance by using public key cryptography and only communicate to servers that show possession of a public key. In practice, things are more complicated. It turns every security vulnerability that requires a man-in-the-middle position to exploit into one that anyone can exploit at scale (man-in-the-middle scales badly for most attackers). It also provides all kinds of DDoS opportunities, both against WhatsApp and against unrelated third-party services that the domains can be pointed to.

It is often said that if you use a free product on the Internet that you are the product rather than the customer. Generally speaking, this is true. But unlike individual bots, who are rightly often called “zombies”, these product-customers can and will leave if the service becomes unavailable a lot of the time.

Should a service like WhatsApp really be concerned by governments trying to take them down, they will be pleased to know that Tor has become a lot faster in recent years. And while running your infrastructure as a Tor-hidden service won’t prevent a really powerful three-letter agency from taking it down, it should provide ample protection against silly judges with even sillier court orders.

(How) did they break Diffie-Hellman?

Earlier this year, a research paper presented a new attack against the Diffie-Hellman key exchange protocol. Among other things, the paper came with a reasonable explanation of how the NSA might be able to read a lot of the Internet’s VPN traffic. I wrote a blog about this in May.

Last month, the paper was presented at a conference and thus made the news again. Because I believed some of the news articles misunderstood the paper I just like writing about cryptography, I thought I’d explain what Diffie-Hellman is, what the paper showed and what its consequences are.

Chicken or the egg

Diffie-Hellman (named after its inventors Whitfield Diffie and Martin Hellman) attempts to solve the chicken-or-egg problem in cryptography: for Alice and Bob to communicate securely over a public channel such as the Internet they need to share a common encryption key. But for them to agree on such a key they need to be able to communicate securely over a public channel.

(N.B. in a typical situation where the protocol is used, Alice is a web browser or a VPN client; Bob is a web server using HTTPS, or a VPN server.)

Diffie-Hellman is called a key-exchange protocol, which is a bit of misnomer, as rather than exchange a previously generated key, the protocol actually generates the key.

In the first step, Alice and Bob both choose a (large) random number, which they both keep secret. Let’s call Alice’s number a and Bob’s number b.

Now using a ‘mechanism’ (more on which later) that is part of the protocol, Alice uses a to compute a second number, which is denoted ga. Bob uses the same mechanism to compute a number gb. Alice and Bob then share ga and gb with each other, over a public channel.

So we are now in the following situation:

  • Alice knows a, ga and gb;
  • Bob knows b, ga and gb;
  • Anyone being able to read their communication only knows ga and gb.

Now using the same mechanism as before, Alice uses her secret number a and the number gb, which Bob sent her, to create a new number (gb)a. Bob, likewise, uses b and ga to create a number (ga)b. Because maths can be kind like that, (gb)a and (ga)b are in fact the same number. This is the shared key they will use.

The reason this works is that, while it is easy for Alice to use a to compute ga, it is impossible for someone who only knows ga to compute a — and the same of course also holds for b and gb. It is also impossible to compute the shared secret key using only ga and gb.

Nothing is impossible

It is important to note that as so often in cryptography, ‘impossible’ doesn’t literally means that. It just means that it is extremely expensive and time-consuming. If the numbers involved are large enough, it can take the fastest clusters of computers millions of years to crack the algorithm; hence it is considered secure.

But computers keep getting faster, thus sometimes making the impossible possible. Many Diffie-Hellman implementations use numbers of a little over 300 digits long (1024 bits). These keys, the paper showed, can be cracked within a year for around 100 million US dollars. (Some people believe it can be done even more cheaply, but only the ballpark figure matters here.)

While 100 million dollars is not beyond the reach of the most powerful nation states (read: the NSA) it is unlikely that they ever pay this much to crack a single key. There’s almost always a cheaper way to get the same information; buying a very expensive wrench, for example.

Except for one essential detail. The mechanism used by Diffie-Hellman to generate the keys involves a choice: that of something mathematicians call an Abelian group, or, as they are more or less equivalent in this case, that of a prime number.

The attack the paper describes has two parts. The first part is the most expensive bit and involves doing a lot of computations that only depend on the chosen prime number. Only the second part involves the specific numbers ga and gb shared by Alice and Bob. An attacker who has done enough computations in the first part can perform the second part in more or less real time.

Sharing prime numbers

Imagine what would happen if many Diffie-Hellman implementations used the same fixed prime number: an adversary could spend a lot of time and money doing the required computations for this prime number and subsequently use that to crack key exchanges as they happen in real time. Knowledge of the secret key allows an adversary to read all the supposedly encrypted traffic between millions of Alices and Bobs around the world.

And this is exactly what was (and to some extent still is) the case: the paper showed for example that one in six of the most used HTTPS servers shared the same prime number. It’s even worse when it comes to VPN servers, of which 66% shared the same prime number. Although the latter figure has been disputed, if this was indeed what the NSA used to read VPN traffic, it would explain some Snowden slides rather well.

It is worth noting that sharing the same prime number is not as stupid as it may seem. Using your own unique prime number may be safer against this kind of attack, there are many traps to avoid when doing so. Most importantly, a lot of prime numbers are unsuitable and make the protocol a lot weaker. The paper also showed that several implementations used such unsuitable primes (and, somewhat intriguingly, some implementations used numbers that weren’t prime at all). There is thus a lot to say for choosing known safe (though widely used) prime numbers, despite the downsides we now understand well.

It would actually be a good rule of thumb to choose the strength of your encryption algorithms such that it would be too expensive for the most powerful adversary to attack, even if they would automatically crack all other implementations of the same protocol. For Diffie-Hellman, using longer numbers of 2048 bits (more than 600 digits) will do just fine.

It is thus not true that the researchers have broken Diffie-Hellman. Nor is it true that choosing “non-random prime numbers” (as I’ve seen someone claim somewhere) is inherently wrong.

However, longer numbers makes the algorithm more expensive to run. There would thus be a good argument to use Elliptic Curve Diffie-Hellman (ECDH) instead, a similar protocol that uses a different kind of maths. Its most important benefit is that it provides the same level of security with much smaller numbers.

Downgrading

On an aside, the paper also showed a related attack, which involves a “protocol downgrade”, where an adversary could convince Alice and Bob that the other could only use a mechanism with 160-digit (512-bit) numbers. The cost of cracking the secret key in this case, even if the mechanism used a unique prime number, is not beyond the means of a small criminal organisation.

Although this is a serious problem, from a cryptographic point of view it is less interesting. I also don’t know how likely it is for this attack to be used in the wild frequently: for a nation-state attacker a downgrade attack leaves too many traces, while for a criminal group, obtaining a man-in-the-middle position isn’t entirely trivial and often not needed to achieve their goals.

On Turkey, Twitter and SSL

Today, an attack on a peace rally (a peace rally!) in Ankara, Turkey left close to 100 people dead and many others injured. ‘Tragedy’ doesn’t even begin to describe what happened.

The Turkish government responded by banning the media from reporting on the issue. There were also rumours of Twitter being hard to reach from within Turkey, which wasn’t surprising given previous efforts by the Turkish government to ban the service.

Nicholas Weaver asked people to investigate what was going on. Using Hide My Ass, a VPN service, I was able to confirm I could reach Twitter from various Turkish IP addresses.

But then I noticed something odd. When using curl, I got an “Unknown SSL protocol error in connection to www.twitter.com:443” error. I got this error only when accessing Twitter from a Turkish VPN — I tried various Hide My Ass VPNs in difference countries — and only when accessing www.twitter.com, which normally redirects to twitter.com.

I don’t get the error in Firefox (Debian), but I do get the same error in the text browser w3m (which could use the same libraries). I’ve not been able to detect any difference between the server information and I get the same error when using curl -k, suggesting it is not a certificate issue. In verbose mode, curl gives the error right after reporting the sending of the client’s hello message.

I suspect this is entirely innocent — I assume Mozilla is doing a lot more to detect SSL/TLS shenanigans than curl, and they think everything’s fine — but I wanted to share this information, just in case.

NB As I only control the client side of the VPN connection, I’ve not been able to take useful PCAPs. There might be a way around this though. Suggestions are welcome.