June 27, 2014
Titus Isolation Techniques, Continued
In my previous blog post, I discussed the unique way in which titus, my high-security TLS proxy server, isolates the TLS private key in a separate process to protect it against Heartbleed-like vulnerabilities in OpenSSL. In this blog post, I will discuss the other isolation techniques used by titus to guard against OpenSSL vulnerabilities.
A separate process for every connection
The most basic isolation performed by titus is using a new and dedicated process for every TLS connection. This ensures that a vulnerability in OpenSSL can't be used to compromise the memory of another connection. Although most of the attention around Heartbleed focused on extracting the private key, Heartbleed also exposed other sensitive information, such as user passwords contained in buffers from other connections. By giving each TLS connection a new and dedicated process, titus confines an attacker to accessing memory related to only his own connection. Such memory would contain at most the session key material for the connection and buffers from previous packets, which is information the attacker already knows.
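To make this concrete, here's a rough sketch of a fork-per-connection accept loop. It's illustrative only: handle_tls_connection() is a placeholder for the code that actually performs the TLS handshake and proxies data, and child reaping is omitted.

#include <sys/socket.h>
#include <unistd.h>

// Hypothetical handler that performs the TLS handshake and proxies data
// for a single client; in titus the real logic is more involved.
extern void handle_tls_connection(int client_sock);

// Sketch of a fork-per-connection accept loop (not titus's actual code).
void serve_forever(int listen_sock)
{
    for (;;) {
        int client_sock = accept(listen_sock, NULL, NULL);
        if (client_sock == -1) {
            continue;
        }
        if (fork() == 0) {
            // Child: this process serves exactly one TLS connection, so a
            // memory disclosure bug can only leak this connection's data.
            close(listen_sock);
            handle_tls_connection(client_sock);
            _exit(0);
        }
        // Parent: never touches connection data; just keeps accepting.
        // (Reaping of exited children via SIGCHLD is omitted here.)
        close(client_sock);
    }
}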
Currently titus forks but does not call execve(). This simplifies the code greatly, but it means that the child process has access to all the memory of the parent process at the time of the fork. Therefore, titus is careful to avoid loading anything sensitive into the parent's memory. In particular, the private key is never loaded by the parent process.
However, some low-grade sensitive information may be loaded into the parent's memory before forking. For instance, the parent process calls getpwnam() during initialization, so the contents of /etc/passwd may persist in memory and be accessible by child processes. A future version of titus should probably call execve() to launch the child process so it starts off with a clean slate.
Another important detail is that titus must reinitialize OpenSSL's random number generator by calling RAND_poll() in the child process after forking. Failure to do so could result in multiple children generating the same random numbers, which would have catastrophic consequences.
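A minimal sketch of that reseeding, assuming the fork-per-connection structure sketched above:

#include <openssl/rand.h>
#include <sys/types.h>
#include <unistd.h>

// Sketch: fork a per-connection child and immediately reseed OpenSSL's RNG
// in the child, so no two children share the same PRNG state.
pid_t fork_connection_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        if (RAND_poll() != 1) {
            // If reseeding fails, abort rather than risk generating
            // predictable random numbers.
            _exit(1);
        }
        // ... child goes on to handle its TLS connection ...
    }
    return pid;
}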
Privilege separation
To protect against arbitrary code execution vulnerabilities, titus runs as dedicated non-root users. Currently two users are used: one for running the processes that talk to the network, and another for the processes that hold the private key.
Using the same user for all connections has security implications. The most serious problem is that, by default, users can use the ptrace() system call to access the memory of other processes running as the same user. To prevent this, titus disables ptracing using the PR_SET_DUMPABLE option to the prctl() syscall. This is an imperfect security measure: it's Linux-specific, and doesn't prevent attackers from disrupting other processes by sending signals.
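For illustration, clearing the dumpable flag looks roughly like this; it's a sketch of the technique, not titus's exact code:

#include <sys/prctl.h>
#include <stdlib.h>

// Sketch: mark the process non-dumpable so that other unprivileged processes
// running as the same user can no longer ptrace() it (Linux-specific).
void disable_ptrace(void)
{
    if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) == -1) {
        // If we can't protect ourselves, bail out rather than run exposed.
        abort();
    }
}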
Ultimately, titus should use a separate user for every concurrent connection. Modern Unix systems use 32-bit UIDs, making it completely feasible to allocate a range of UIDs to be used by titus, provided that titus reuses UIDs for future connections. To reuse a UID securely, titus would need to first kill off any latent process owned by that user. Otherwise, an attacker could fork a process that lies in wait until the UID is reused. Unfortunately, Unix provides no airtight way to kill all processes owned by a user. It may be necessary to leverage cgroups or PID namespaces, which are unfortunately Linux-specific.
Filesystem isolation
Finally, in order to reduce the attack surface even further, titus chroots into an empty, unwritable directory. Thus, an attacker who can execute arbitrary code is unable to read sensitive files, attack special files or setuid binaries, or download rootkits to the filesystem. Note that titus chroots after reseeding OpenSSL's RNG, so it's not necessary to include /dev/urandom in the chroot directory.
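A sketch of that ordering (reseed, then chroot, then drop root); the directory path and the UID/GID here are placeholders rather than titus's actual configuration:

#include <openssl/rand.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

// Sketch: reseed the RNG while /dev/urandom is still reachable, then confine
// the process to an empty, unwritable directory and drop root privileges.
// This must still be running as root for chroot() and setuid() to succeed.
void sandbox_filesystem(uid_t uid, gid_t gid)
{
    if (RAND_poll() != 1) {
        abort();
    }
    if (chroot("/var/empty") == -1 || chdir("/") == -1) {
        abort();
    }
    // Drop the group before the user; once we're no longer root,
    // setgid() would fail.
    if (setgid(gid) == -1 || setuid(uid) == -1) {
        abort();
    }
}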
Future enhancements
In addition to the enhancements mentioned above, I'd like to investigate using seccomp filtering to limit the system calls which titus is allowed to execute. Limiting titus to a minimal set of syscalls would reduce the attack surface on the kernel, preventing an attacker from breaking out of the sandbox if there's a kernel vulnerability in a particular syscall.
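As a rough illustration of the idea (not something titus does today), a syscall whitelist built with libseccomp could look like this; the particular set of allowed syscalls is invented for the example:

#include <seccomp.h>
#include <stdlib.h>

// Sketch: allow only the handful of syscalls the network-facing process
// needs, and kill the process if it attempts anything else.
void install_syscall_filter(void)
{
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
    if (ctx == NULL) {
        abort();
    }
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
    if (seccomp_load(ctx) != 0) {
        abort();
    }
    seccomp_release(ctx);
}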
I'd also like to investigate network and process namespaces. Network namespaces would isolate titus from the network, preventing attackers from launching attacks on systems on your internal network or on the Internet. Process namespaces would provide an added layer of isolation and make it easy to kill off latent processes when a connection ends.
Why titus?
The TLS protocol is incredibly complicated, which makes TLS implementations necessarily complex, which makes them inevitably prone to security vulnerabilities. If you're building a simple server application that needs to talk TLS, the complexity of the TLS implementation is going to dwarf the complexity of your own application. Even if your own code is securely written and short and simple enough to be easily audited, your application may nevertheless be vulnerable if you link with a TLS implementation. Titus provides a way to isolate the TLS implementation, so its complexity doesn't affect the security of your own application.
By the way, titus was recently discussed on the Red Hat Security Blog along with some other interesting approaches to OpenSSL privilege separation, such as sslps, a seccomp-based approach. The blog post is definitely worth a read.
May 5, 2014
Protecting the OpenSSL Private Key in a Separate Process
Ever since Heartbleed, I've been thinking of ways to better isolate OpenSSL so that a vulnerability in OpenSSL won't result in the compromise of sensitive information. This blog post will describe how you can protect the private key by isolating OpenSSL private key operations in a dedicated process, a technique I'm using in titus, my open source high-isolation TLS proxy server.
If you're worried about OpenSSL vulnerabilities, then simply terminating TLS in a dedicated process, such as stunnel, is a start, since it isolates sensitive web server memory from OpenSSL, but there's still the tricky issue of your private key. OpenSSL needs access to the private key to perform decryption and signing operations. And it's not sufficient to isolate just the key: you must also isolate all intermediate calculations, as Akamai learned when their patch to store the private key on a "secure heap" was ripped to shreds by security researcher Willem Pinckaers.
Fortunately, OpenSSL's modular nature can be leveraged to out-source RSA private key operations (sign and decrypt) to user-defined functions, without having to modify OpenSSL itself. From these user-defined functions, it's possible to use inter-process communication to transfer the arguments to a different process, where the operation is performed, and then transfer the result back. This provides total isolation: the process talking to the network needs access to neither the private key nor any intermediate value resulting from the RSA calculations.
I'm going to show you how to do this. Note that, for clarity, the code presented here lacks proper error handling and resource management. For production quality code, you should look at the source for titus.
Traditionally, you initialize OpenSSL using code like the following:
SSL_CTX* ctx;
FILE* cert_filehandle;
FILE* key_filehandle;

// ... omitted: initialize CTX, open cert and key files ...

X509* cert = PEM_read_X509_AUX(cert_filehandle, NULL, NULL, NULL);
EVP_PKEY* key = PEM_read_PrivateKey(key_filehandle, NULL, NULL, NULL);

SSL_CTX_use_certificate(ctx, cert);
SSL_CTX_use_PrivateKey(ctx, key);
The first thing we do is replace the call to PEM_read_PrivateKey, which reads the private key into memory, with our own function that creates a shell of a private key with references to our own implementations of the sign and decrypt operations. Let's call that function make_private_key_shell:
EVP_PKEY* make_private_key_shell (X509* cert)
{
    EVP_PKEY* key = EVP_PKEY_new();
    RSA* rsa = RSA_new();

    // It's necessary for our shell to contain the public RSA values (n and e).
    // Grab them out of the certificate:
    RSA* public_rsa = EVP_PKEY_get1_RSA(X509_get_pubkey(cert));
    rsa->n = BN_dup(public_rsa->n);
    rsa->e = BN_dup(public_rsa->e);

    static RSA_METHOD ops = *RSA_get_default_method();
    ops.rsa_priv_dec = rsa_private_decrypt;
    ops.rsa_priv_enc = rsa_private_encrypt;
    RSA_set_method(rsa, &ops);

    EVP_PKEY_set1_RSA(key, rsa);
    return key;
}
The magic happens with the call to RSA_set_method. We pass it a struct of function pointers from which we reference our own implementations of the private decrypt and private encrypt (sign) operations. These implementations look something like this:
int rsa_private_decrypt (int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
    return do_rsa_operation(1, flen, from, to, rsa, padding);
}

int rsa_private_encrypt (int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
    return do_rsa_operation(2, flen, from, to, rsa, padding);
}

int do_rsa_operation (char command, int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
    // Send the command and its arguments to the trusted process:
    write(sockpair[0], &command, sizeof(command));
    write(sockpair[0], &padding, sizeof(padding));
    write(sockpair[0], &flen, sizeof(flen));
    write(sockpair[0], from, flen);

    // Read back the length of the result, followed by the result itself:
    int to_len;
    read(sockpair[0], &to_len, sizeof(to_len));
    if (to_len > 0) {
        read(sockpair[0], to, to_len);
    }
    return to_len;
}
The arguments and results are sent to and from the other process over a socket pair that has been previously opened. Our message format is simply:
uint8_t command;          // 1 for decrypt, 2 for sign
int padding;              // the padding argument
int flen;                 // the flen argument
unsigned char from[flen]; // the from argument
The response format is:
int to_len;               // length of result buffer (to)
unsigned char to[to_len]; // the result buffer
Here's the code to open the socket pair and run the RSA private key process:
void run_rsa_process (const char* key_path)
{
    socketpair(AF_UNIX, SOCK_STREAM, 0, sockpair);
    if (fork() == 0) {
        // Child (trusted) process: load the private key and service requests.
        close(sockpair[0]);
        FILE* key_filehandle = fopen(key_path, "r");
        RSA* rsa = PEM_read_RSAPrivateKey(key_filehandle, NULL, NULL, NULL);
        fclose(key_filehandle);

        char command;
        while (read(sockpair[1], &command, sizeof(command)) == 1) {
            int padding;
            int flen;
            read(sockpair[1], &padding, sizeof(padding));
            read(sockpair[1], &flen, sizeof(flen));
            unsigned char* from = (unsigned char*)malloc(flen);
            read(sockpair[1], from, flen);

            unsigned char* to = (unsigned char*)malloc(RSA_size(rsa));
            int to_len = -1;
            if (command == 1) {
                to_len = RSA_private_decrypt(flen, from, to, rsa, padding);
            } else if (command == 2) {
                to_len = RSA_private_encrypt(flen, from, to, rsa, padding);
            }

            write(sockpair[1], &to_len, sizeof(to_len));
            if (to_len > 0) {
                write(sockpair[1], to, to_len);
            }
            free(to);
            free(from);
        }
        _exit(0);
    }
    // Parent (untrusted) process: keep sockpair[0] for sending requests.
    close(sockpair[1]);
}
In the function above, we first create a socket pair for communicating between the parent (untrusted) process and child (trusted) process. We fork, and in the child process, we load the RSA private key, and then repeatedly service RSA private key operations received over the socket pair from the parent process. Only the child process, which never talks to the network, has the private key in memory. If the memory of the parent process, which does talk to the network, is ever compromised, the private key is safe.
That's the basic idea, and it works. There are other ways to do the interprocess communication that are more complicated but may be more efficient, such as using shared memory to transfer the arguments and results back and forth. But the socket pair implementation is conceptually simple and a good starting point for further improvements.
This is one of the techniques I'm using in titus to achieve total isolation of the part of OpenSSL that talks to the network. However, this is only part of the story. While this technique protects your private key against a memory disclosure bug like Heartbleed, it doesn't prevent other sensitive data from leaking. It also doesn't protect against more severe vulnerabilities, such as remote code execution. Remote code execution could be used to attack the trusted child process (such as by ptracing it and dumping its memory) or your system as a whole. titus protects against this using additional techniques like chrooting and privilege separation.
My next blog post will go into detail on titus' other isolation techniques. Follow me on Twitter, or subscribe to my blog's Atom feed, so you know when it's posted.
Update: Read part two of this blog post.
April 8, 2014
Responding to Heartbleed: A script to rekey SSL certs en masse
Because of the Heartbleed vulnerability in OpenSSL, I'm treating all of my private SSL keys as compromised and regenerating them. Fortunately, certificate authorities will reissue a certificate for free that signs a new key and is valid for the remaining time on the original certificate.
Unfortunately, using the openssl commands by hand to rekey dozens of SSL certificates is really annoying and is not my idea of a good time. So, I wrote a shell script called openssl-rekey to automate the process. openssl-rekey takes any number of certificate files as arguments, and for each one, generates a new private key of the same length as the original key, and a new CSR with the same common name as the original cert.
If you have a directory full of certificates, it's easy to run openssl-rekey on all of them with find and xargs:
$ find -name '*.crt' -print0 | xargs -0 /path/to/openssl-rekey
Once you've done this, you just need to submit the .csr files to your certificate authority, and then install the new .key and .crt files on your servers.
By the way, if you're like me and hate dealing with openssl commands and cumbersome certificate authority websites, you should check out my side project, SSLMate, which makes buying certificates as easy as running sslmate buy www.example.com 2 and reissuing certificates as easy as running sslmate reissue www.example.com. I was able to reissue each of my SSLMate certs in under a minute. As my old certs expire I'm replacing them with SSLMate certs, and that cannot happen soon enough.
December 4, 2013
The Sorry State of Xpdf in Debian
I discovered today that Xpdf deterministically segfaults when you try to print any PDF under i386 Wheezy. It doesn't segfault on amd64 Wheezy, which is why I had not previously noticed this. Someone reported this bug over two years ago, which you'd think would have given ample time for this to be fixed for Wheezy. This bug led me to a related bug which paints quite a sad picture. To summarize:
- Xpdf uses its own PDF rendering engine. Some other developers thought it would be nice to be able to use this engine in other applications, so they ripped out Xpdf's guts and put them in a library called Poppler. Unfortunately, Xpdf continued to use its own rendering engine instead of linking with Poppler, so now there are two diverging codebases that do basically the same thing.
- Apparently this code is bad and has a history of security vulnerabilities. The Debian maintainers quite reasonably don't want to support both codebases, so they have attempted to patch Xpdf to use Poppler instead of its own internal engine.
- Unfortunately, this is hard and they haven't done a perfect job at it, which has led to a situation where Xpdf and Poppler each define their own, incompatible, versions of the same struct... with which they then attempt to interoperate.
- As a consequence, Xpdf accesses uninitialized memory, or accesses initialized memory incorrectly, so it's a miracle any time you manage to use Xpdf without it crashing.
The Debian maintainers have three options:
- Stop trying to patch Xpdf to use Poppler.
- Fix their patch of Xpdf so it actually works.
- Do nothing.
There is a patch for option 2, but it has been rejected for being too long and complicated. Indeed it is long and complicated, and will make it more difficult to package new upstream versions of Xpdf. But that's the price that will have to be paid as Xpdf and Poppler continue to diverge, if the maintainers insist that Xpdf use Poppler. Unfortunately they seem unwilling to pay this price, nor are they willing to go with option 1. Instead they have taken the easy way out, option 3, even though this results in a totally broken, and possibly insecure, package that is only going to get more broken.
Clearly I can't be using this quagmire of a PDF viewer, so I have uninstalled it after being a user for more than 10 years. Sadly, the alternatives are not great. Most of them take the position that UIs are for chumps, preferring instead to use keyboard shortcuts that I will never remember because I don't view PDFs often enough. Debian helpfully provides a version of Evince that isn't linked with GNOME and it works decently. Unfortunately, after quitting it, I noticed a new process in my process list called "evinced." As in "evince daemon." As in, a PDF viewer launched its own daemon. I don't even...
epdfview is like Evince but less cool because it doesn't have its own daemon; it also has buggy text selection. qpdfview is rather nice but copies selected text in such a way that it can't be pasted by middle-clicking (it copies into XA_CLIPBOARD instead of XA_PRIMARY).
I decided to go with Evince and pretend I never saw evinced.
Update (2014-01-20): The Xpdf bug has allegedly been fixed, by defining a new version of the incompatible struct. This is option 2, which means we can expect Xpdf to break again as Poppler and Xpdf continue to diverge. It appears to be a different patch from the one that was rejected earlier, though I can't find any of the discussion which led to this fix being adopted. I will update this blog post if I learn more information.
October 4, 2013
Verisign's Broken Name Servers Slow Down HTTPS for Google and Others
The problem was so bizarre that for a moment I suspected I was witnessing a man-in-the-middle attack using valid certificates. Many popular HTTPS websites, including Google and DuckDuckGo, but not all HTTPS websites, were taking up to 20 seconds to load. The delay occurred in all browsers, and according to Chromium's developer tools, it was occurring in the SSL (aka TLS) handshake. I was perplexed to see Google taking several seconds to complete the TLS handshake. Google employs TLS experts to squeeze every last drop of performance out of TLS, and uses the highly efficient elliptic curve Diffie-Hellman key exchange. It was comical to compare that to my own HTTPS server, which was handshaking in a fraction of a second, despite using stock OpenSSL and the more expensive discrete log Diffie-Hellman key exchange.
Not yet willing to conclude that it was a targeted man-in-the-middle attack that was affecting performance, I looked for alternative explanations. Instinctively, I thought this had the whiff of a DNS problem. After a slow handshake, there was always a brief period during which all handshakes were fast, even if I restarted the browser. This suggested to me that once a DNS record was cached, everything was fast until the cache entry expired. Since I run my own recursive DNS server locally, this hypothesis was easy to test by flushing my DNS cache. I found that flushing the DNS cache would consistently cause the next TLS handshake to be slow.
This didn't make much sense: using tools like host and dig, I could find no DNS problems with the affected domains, and besides, Chromium said the delay was in the TLS handshake. It finally dawned on me that the delay could be in the OCSP check. OCSP, or Online Certificate Status Protocol, is a mechanism for TLS clients to check if a certificate has been revoked. During the handshake, the client makes a request to the OCSP URI specified in the certificate to check its status. Since the URI would typically contain a hostname, a DNS problem could manifest here.
I checked the certificates of the affected sites, and all of them specified OCSP URIs that ultimately resolved to ocsp.verisign.net. Upon investigation, I found that of the seven name servers listed for ocsp.verisign.net (ns100.nstld.net through ns106.nstld.net), only two of them (ns100.nstld.net and ns102.nstld.net) were returning a response to AAAA queries. The other five servers returned no response at all, not even a response to say that an AAAA record does not exist. This was very bad, since it meant any attempt to resolve an AAAA record for this host required the client to try again and wait until it timed out, leading to unsavory delays.
If you're curious what an AAAA record is and why this matters, an AAAA record is the type of DNS record that maps a hostname to its IPv6 address. It's the IPv6 equivalent to the A record, which maps a hostname to its IPv4 address. While the Internet is transitioning from IPv4 to IPv6, hosts are expected to be dual-homed, meaning they have both an IPv4 and an IPv6 address. When one system talks to another, it prefers IPv6, and falls back to IPv4 only if the peer doesn't support IPv6. To figure this out, the system first attempts an AAAA lookup, and if no AAAA record exists, it tries an A record lookup. So, when a name server does not respond to AAAA queries, not even with a response to say no AAAA record exists, the client has to wait until it times out before trying the A record lookup, causing the delays I was experiencing here. Cisco has a great article that goes into more depth about broken name servers and AAAA records.
(Note: the exact mechanics vary between operating systems. The Linux resolver tries AAAA lookups even if the system doesn't have IPv6 connectivity, meaning that even IPv4-only users experience these delays. Other operating systems might only attempt AAAA lookups if the system has IPv6 connectivity, which would mitigate the scope of this issue.)
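To see where the stall shows up in an application, note that a typical client resolves hostnames with getaddrinfo() and AF_UNSPEC, which is what triggers both the AAAA and A lookups; this is a generic sketch, not the code any particular browser uses:

#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

// Sketch: a dual-stack hostname lookup. With AF_UNSPEC the resolver issues
// both AAAA and A queries, so a name server that silently drops AAAA queries
// forces this call to wait for a timeout before it can return.
struct addrinfo* resolve_host(const char* hostname)
{
    struct addrinfo hints;
    struct addrinfo* result = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // ask for both IPv6 (AAAA) and IPv4 (A)
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(hostname, "443", &hints, &result) != 0) {
        return NULL;
    }
    return result;  // caller frees with freeaddrinfo()
}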
A History of Brokenness
This is apparently not the first time Verisign's servers have had problems: A year ago, the name servers for ocsp.verisign.net exhibited the same broken behavior:
"The unofficial response from Verisign was that the queries are being handled by a GSLB, which apparently means that we should not expect it to behave correctly."
"GSLB" means "Global Server Load Balancing" and I interpret that statement to mean Verisign is using an expensive DNS appliance to answer queries instead of software running on a conventional server. The snarky comment about such appliances rings true for me. Last year, I noticed that my alma mater's website was taking 30 seconds to load. I tracked the problem down to the exact same issue: the DNS servers for brown.edu were not returning any response to AAAA queries. In the process of reporting this to Brown's IT department, I learned that they were using buggy and overpriced-looking DNS appliances from F5 Networks, which, by default, do not properly respond to AAAA queries under circumstances that appear to be common enough to cause real problems. To fix the problem, the IT people had to manually configure every single DNS record individually to properly reply to AAAA queries.
F5's "Global Traffic Manager": Because nothing says "optimized" like shipping with broken defaults that cause delays for users
I find it totally unconscionable for a DNS appliance vendor to be shipping a product with such broken behavior which causes serious delays for users and gives IPv6 a bad reputation. It is similarly outrageous for Verisign to be operating broken DNS servers that are in the critical path for an untold number of TLS handshakes. That gives HTTPS a bad reputation, and lends fuel to the people who say that HTTPS is too slow. It's truly unfortunate that even if you're Google and do everything right with IPv6, DNS, and TLS, your handshake speeds are still at the mercy of incompetent certificate authorities like Verisign.
Disabling OCSP
I worked around this issue by disabling OCSP (in Firefox, set security.OCSP.enabled to 0 in about:config). While OCSP may theoretically be good for security, since it enables browsers to reject certificates that have been compromised and revoked, in practice it's a total mess. Since OCSP servers are often unreliable or are blocked by restrictive firewalls, browsers don't treat OCSP errors as fatal by default. Thus, an active attacker who is using a revoked certificate to man-in-the-middle HTTPS connections can simply block access to the OCSP server and the browser will accept the revoked certificate. Frankly, OCSP is better at protecting certificate authorities' business model than protecting users' security, since it allows certificate authorities to revoke certificates for things like credit card chargebacks. As if this wasn't bad enough already, OCSP introduces a minor privacy leak because it reports every HTTPS site you visit to the certificate authority. Google Chrome doesn't even use OCSP anymore because it is so dysfunctional.
Finally Resolved
While I was writing this blog post, Verisign fixed their DNS servers and now every single one is returning a proper response to AAAA queries. I know for sure their servers were broken for at least two days. I suspect it was longer considering the slowness was happening for quite some time before I finally investigated.