Andrew Ayer

April 13, 2018

Making Certificates Easier and Helping the Ecosystem: Four Years of SSLMate

I'm not actually sure when SSLMate was born. I got the idea, registered the domain name, and wrote the first lines of code in August 2013, but I put it on the backburner until March 2014. I think I "launched" in early April, but since I thought of SSLMate as a side project mainly for my own use, I didn't do anything special.

I do know that I sold my first certificate on April 13, 2014, four years ago to this day. I sold it to a friend who needed to replace his certificates after Heartbleed and was fed up with how hard his certificate authority was making it. It turns out a lot of people were generally fed up with how hard certificate authorities made things, and in the last four years, SSLMate has exceeded my wildest expectations and become my full-time job.

Sadly, certificate resellers have a deservedly bad reputation, which has only gotten worse in recent months thanks to the likes of Trustico and other resellers with terrible security practices. But it's not entirely a hellscape out there, and for SSLMate's birthday, I thought it would be nice to celebrate the ways SSLMate has been completely unlike a typical certificate reseller, and how SSLMate has allowed me to pursue work over the last four years that lifts up the Web PKI as a whole.

SSLMate was different from the beginning. When SSLMate launched in 2014, it was the first, and only, way you could get a publicly-trusted SSL certificate entirely from the command line. SSLMate made automated certificate issuance accessible to anyone, not just large customers of certificate authorities. More importantly, it significantly improved the usability of certificate issuance at a time when getting a certificate meant running long OpenSSL commands, copy-and-pasting PEM blobs to and from websites, and manually extracting a Zip file and assembling the correct certificate chain. The state of the art in usability was resellers generating, and possibly storing, your private key on their servers. SSLMate showed that certificate issuance could be easy without having to resort to insecure practices.

In September 2014, SSLMate stopped selling multi-year certificates. This was unusual at a time when certificates could be valid for up to five years. The change was unpopular among a handful of customers, but it was unquestionably the right thing to do. Shorter-lived certificates are better for the ecosystem, since they allow security standards to advance more quickly, but they are also better for customers. I learned from the SHA-1 deprecation that a multi-year certificate might not remain valid for its entire term, and it felt wrong to sell a product that I might not be able to deliver on. Sure enough, the five-year Symantec certificates that SSLMate was reselling at the time will all become prematurely invalid as of the next Chrome release. (The affected customers all got free replacements, but it's still not what they were expecting.)

The industry is following SSLMate's lead: the CA/Browser Forum limited certificate lifetimes to three years beginning in 2015, and further limited lifetimes to two years beginning last month.

In April 2015, SSLMate released its first public REST API. (While we already had an API for use by the command-line tool, it wasn't previously documented.) As far as I know, this was the world's first fully self-service API for the automated issuance of publicly-trusted SSL certificates. Although major certificate authorities, and some resellers, had APIs (indeed, SSLMate used them under the hood), every one of the many APIs I looked at required you to ask a human to enable API access for your account. Some even required you to get on the phone to negotiate prices. With SSLMate's API, you could sign up yourself and immediately start issuing publicly-trusted certificates.

In June, SSLMate published a blog post explaining how to set up OCSP Stapling in Apache and nginx. Resources at the time were pretty bad, so I had to dive into the source code for Apache and nginx to learn how stapling really worked. I was rather horrified at what I saw. The worst bug was that nginx would sometimes staple an expired OCSP response, which would cause Firefox to reject the certificate. So, I submitted a patch fixing it.

In August, several folks asked me to review the ACME specification being worked on at the IETF and provide feedback based on my experience with automated certificate issuance APIs. While reading the draft, I was very bothered by the fact that RSA and ECDSA signatures were being used without any associated message. I had never heard of duplicate signature key selection attacks, but I knew that crypto wasn't being used properly, and when crypto isn't used properly, bad things tend to happen. So I dusted off my undergraduate number theory textbook and came up with an attack that broke ACME, allowing attackers to get unauthorized certificates. After my disclosure, ACME was fixed, before it was deployed in the Web PKI.

In March 2016, it occurred to me that signing OCSP responses with a weak hash function such as SHA-1 could probably lead to the forgery of a trusted certificate. It was already known that signing a certificate with a weak hash function could lead to the forgery of another certificate using a chosen-prefix collision attack, so CAs were forbidden from signing certificates using SHA-1. However, no one had demonstrated a collision attack against OCSP responses, and CAs were allowed to sign OCSP responses with SHA-1.

I figured out how to execute a chosen-prefix attack against OCSP responses, rented a GPU instance in EC2 to make a proof-of-concept with MD5, and scanned every OCSP responder I could find to see which ones could be used to forge a certificate with a SHA-1 collision attack. I reported my findings to the mozilla.dev.security.policy mailing list. This led to a change in Mozilla's Root Store Policy to forbid CAs from signing OCSP responses with SHA-1 except under safe conditions.

Since then, I've periodically scanned OCSP responders to ensure they remain in compliance.

In July 2016, SSLMate launched Cert Spotter, a Certificate Transparency monitor. The core of Cert Spotter is open source, because I wanted non-profits to be able to easily use Certificate Transparency without depending on a commercial service. I'm proud to say that the Wikimedia Foundation uses the open source Cert Spotter to watch for unauthorized certificates for wikipedia.org and their other domains.

Certificate Transparency was designed to be verifiable, but this only matters if a diverse set of people bother to actually do the verification. Cert Spotter has always verified log behavior, and it has detected log misbehavior that was missed by other monitors.

In March 2017, SSLMate started operating the world's second Certificate Transparency gossip endpoint (Graham Edgecombe gets credit for the first) to provide further resiliency to the Certificate Transparency ecosystem. SSLMate also released ct-honeybee, a lightweight program that queries each Certificate Transparency log for its current state and uploads it to Graham's and SSLMate's gossip endpoints. People are now running ct-honeybee on devices all around the world, helping ensure that logs do not present different views to different parts of the Internet.

In 2017, I attended both Certificate Transparency Policy Days hosted by Google to help hash out policy for the burgeoning Certificate Transparency ecosystem.

In September, to help with the upcoming CAA enforcement deadline, I released a free CAA Test Suite for CAs to use to test their implementations.

What's next for SSLMate? The biggest change over the last four years is that the price of certificates as individual goods has gone to zero. But SSLMate has never really been about selling certificates; it has always been about selling easy-to-use software, good support, and a service for managing certificates. And I still see a lot of work to be done to make certificates even easier to work with, particularly with all the new ways certificates are going to be used in the future. I'm pleased to be kicking off SSLMate's fifth year with the release of SSLMate for SaaS, a new service that provides an easy, high-level way for SaaS companies to get certificates for the customer domains they host. This is the first of many exciting announcements in store for this year.

Posted on 2018-04-13 at 18:06:02 UTC

March 29, 2018

These Three Companies Are Doing the Internet a Solid By Running Certificate Transparency Logs

When we use the Internet, we rely on the security of the certificate authority system to ensure we are talking with the right people. Unfortunately, the certificate authority system is a bit of a mess. One of the ways we're trying to clean up the mess is Certificate Transparency, an effort to put all SSL certificates issued by public certificate authorities in public, verifiable, append-only logs. Domain owners can monitor the logs for unauthorized certificates, and web browsers can monitor for compliance with the rules and take action against non-compliant certificate authorities. After ramping up for the last four years, Certificate Transparency is about to enter prime time: Google Chrome is requiring that all certificates issued on or after April 30, 2018 be logged.

But who is supposed to run these Certificate Transparency logs? Servers, electricity, bandwidth, and system administrators cost money. Although Google is spearheading Certificate Transparency and operates nine logs that are recognized by Chrome, Certificate Transparency is supposed to benefit everyone and it would be unhealthy for the Internet if Google ran all the logs. For this reason, Chrome requires that certificates be included in at least one log operated by an organization besides Google.

So far, three organizations have stepped up and are operating Certificate Transparency logs that are recognized by Chrome and are open to certificates from any public certificate authority:

DigiCert was the first non-Google organization to set up a log, and they now operate several logs recognized by Chrome. Their DigiCert 2 log accepts certificates from all public certificate authorities. They are also applying for recognition of their Nessie and Yeti log sets, which accept certificates from all public certificate authorities and are each split into five shards based on the expiration year of the certificate. (They also operate DigiCert 1, which only accepts certificates from some certificate authorities, and have three logs acquired from Symantec which they are shutting down later this year.)

DigiCert is notable because they've written their own Certificate Transparency log implementation instead of using an open source one. This is helpful because it adds diversity to the ecosystem, which ensures that a bug in one implementation won't take out all logs.

Comodo Certification Authority (which is thankfully no longer owned by the blowhard who thinks he invented 90 day certificates) operates two logs recognized by Chrome: Mammoth and Sabre. Both logs accept certificates from all public certificate authorities, and run SuperDuper, which is Google's original open source log implementation.

In addition to operating two open logs, Comodo CA runs crt.sh, a search engine for certificates found in Certificate Transparency logs. crt.sh has been an invaluable resource for the community when investigating misbehavior by certificate authorities.

Cloudflare is the latest log operator to join the ecosystem. They operate the Nimbus log set, which accepts certificates from all public certificate authorities and is split into four shards based on the expiration year of the certificate. Nimbus runs Trillian, Google's latest open source implementation, with some Cloudflare-specific patches.

Cloudflare is unique because, unlike DigiCert and Comodo CA, they are not a certificate authority. DigiCert and Comodo have an obvious motivation to run logs: they need somewhere to log their certificates so they will be trusted by Chrome. Cloudflare doesn't have such a need, but they've chosen to run logs anyway.

DigiCert, Comodo CA, and Cloudflare should be lauded for running open Certificate Transparency logs. None of them have to do this. Even DigiCert and Comodo could have adopted the strategy of their competitors and waited for someone else to run a log that would accept their certificates. Their willingness to run logs shows that they are invested in improving the Internet for everyone's benefit.

We need more companies to step up and join these three in running public Certificate Transparency logs. How about some major tech companies? Although we all benefit from the success of Certificate Transparency, large tech companies benefit even more: they are bigger targets than the rest of us, and they have more to gain when the public feels secure conducting business online. Major tech companies are also uniquely positioned to help, since they already run large-scale Internet infrastructure which could be used to host Certificate Transparency logs. And what kind of tech company doesn't want the cred that comes from helping the Internet out?

If you're a big tech company that knows how to run large-scale infrastructure, why aren't you running a Certificate Transparency log too?

Posted on 2018-03-29 at 16:36:23 UTC

January 21, 2018

Google's Certificate Revocation Server Is Down - What Does It Mean?

Earlier today, someone reported to the mozilla.dev.security.policy mailing list that they were unable to access any Google websites over HTTPS because Google's OCSP responder was down. David E. Ross says the problem started two days ago, and several Tweets confirm this. Google has since acknowledged the report. As of publication time, the responder is still down for me, though Ross reports it's back up. (Update: a fix is being rolled out.)

What's an OCSP Responder?

OCSP, which stands for Online Certificate Status Protocol, is the system used by SSL/TLS clients (such as web browsers) to determine if an SSL/TLS certificate is revoked or not. When an OCSP-using TLS client connects to a TLS server such as https://www.google.com, it sends a query to the OCSP responder URL listed in the TLS server's certificate to see if the certificate is revoked. If the OCSP responder replies that it is, the TLS client aborts the connection. OCSP responders are operated by the certificate authority which issued the certificate. Google has its own publicly-trusted certificate authority (Google Internet Authority G2) which issues certificates for Google websites.
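To make the mechanics concrete, here is a minimal sketch of the check an OCSP-using client performs, written in Go with the golang.org/x/crypto/ocsp package. It assumes the server certificate and its issuer have already been parsed, builds an OCSP request, POSTs it to the responder URL listed in the certificate, and inspects the status; a real client would also cache responses and decide how to handle network failures (see below).

    package example

    import (
        "bytes"
        "crypto/x509"
        "fmt"
        "io/ioutil"
        "net/http"

        "golang.org/x/crypto/ocsp"
    )

    // checkRevocation asks the certificate's OCSP responder whether cert has been
    // revoked. cert and issuer are assumed to have been parsed already.
    func checkRevocation(cert, issuer *x509.Certificate) error {
        // Build a DER-encoded OCSP request identifying cert by issuer and serial number.
        reqDER, err := ocsp.CreateRequest(cert, issuer, nil)
        if err != nil {
            return err
        }
        // The responder URL comes from the certificate's Authority Information Access extension.
        httpResp, err := http.Post(cert.OCSPServer[0], "application/ocsp-request", bytes.NewReader(reqDER))
        if err != nil {
            return err // the failure mode discussed below: responder unreachable
        }
        defer httpResp.Body.Close()
        respDER, err := ioutil.ReadAll(httpResp.Body)
        if err != nil {
            return err
        }
        // Parse the response and verify it was signed by (or on behalf of) the issuing CA.
        resp, err := ocsp.ParseResponse(respDER, issuer)
        if err != nil {
            return err
        }
        if resp.Status == ocsp.Revoked {
            return fmt.Errorf("certificate %v has been revoked", cert.SerialNumber)
        }
        return nil
    }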

I thought Chrome didn't support OCSP?

You're correct. Chrome famously does not use OCSP. Chrome users connecting to Google websites are blissfully unaware that Google's OCSP responder is down.

But other TLS clients do use OCSP, and as a publicly-trusted certificate authority, Google is required by the Baseline Requirements to operate an OCSP responder. However, certificate authorities have historically done a bad job operating reliable OCSP responders, and firewalls often get in the way of OCSP queries. Consequently, although web browsers like Edge, Safari, and Firefox do contact OCSP responders, they use "soft fail" and allow a connection if they don't get a well-formed response from the responder. Otherwise, they'd constantly reject connections that they shouldn't. (This renders OCSP almost entirely pointless from a security perspective, since an attacker with a revoked certificate can usually just block the OCSP response and the browser will accept the revoked certificate.) Therefore, the practical impact from Google's OCSP responder outage is probably very small. Nearly all clients are going to completely ignore the fact that Google's OCSP responder is down.

That said, Google operates some of the most heavily trafficked sites on the Internet. A wide variety of devices connect to Google servers (Google's FAQ for Certificate Changes mentions set-top boxes, gaming consoles, printers, and even cameras). Inevitably, at least some of these devices are going to use "hard fail" and reject connections when an OCSP responder is down. There are also people, like the mozilla.dev.security.policy poster, who configure their web browser to use hard fail. Without a doubt, there are people who are noticing problems right now.

I thought Google had Site Reliability Engineers?

Indeed they do, which is why this incident is noteworthy. As I mentioned, certificate authorities tend to do a poor job operating OCSP responders. But most certificate authorities run off-the-shelf software and employ no software engineers. Some regional European certificate authorities even complain when you report security incidents to them during their months-long summer vacations. So no one is surprised when those certificate authorities have OCSP responder outages. Google, on the other hand, sets higher expectations.

Posted on 2018-01-21 at 19:35:36 UTC

January 10, 2018

How will Certificate Transparency Logs be Audited in Practice?

Certificate Transparency, the effort to detect misissued SSL certificates by publishing all certificates in public logs, only works if TLS clients reject certificates that are not logged. Otherwise, certificate authorities could just not log the certificates that they misissue. TLS clients accomplish this by requiring that a certificate be accompanied by a "signed certificate timestamp" (SCT), which is a promise by a log to include the certificate within 24 hours of the SCT's issuance timestamp. (This period is called the Maximum Merge Delay. It doesn't have to be 24 hours, but it is for all logs currently trusted by Chrome.) The SCT can be embedded in the certificate, the OCSP response (which the TLS server must staple), or the TLS handshake.
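The timing math matters for everything that follows, so here is a trivial Go helper (an illustration, not part of any particular client) that computes the earliest moment a client is entitled to expect the certificate to appear in the log: the SCT's timestamp (milliseconds since the Unix epoch, per RFC 6962) plus the log's Maximum Merge Delay.

    package example

    import "time"

    // earliestInclusion returns the earliest time by which the log must have
    // incorporated the certificate covered by an SCT: the SCT timestamp plus
    // the log's Maximum Merge Delay (24 hours for all logs trusted by Chrome).
    func earliestInclusion(sctTimestampMillis uint64, mmd time.Duration) time.Time {
        issued := time.Unix(0, int64(sctTimestampMillis)*int64(time.Millisecond))
        return issued.Add(mmd)
    }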

But an SCT is only a promise. What if the log breaks its promise and never includes the certificate? After all, certificate authorities promise not to issue bad certificates, but they do so in droves. We don't want to replace the problem of untrustworthy certificate authorities with the problem of untrustworthy Certificate Transparency logs. Fortunately, Certificate Transparency was designed to be verifiable, making it possible, in theory, for the TLS client to verify that the log fulfills its promise. Unfortunately, SCT verification seems to be a rather elusive problem in practice. In this post, I'm going to explore some of the possible solutions, and explain their shortcomings.

Direct Proof Fetching

A Certificate Transparency log stores its certificates in the leaves of an append-only Merkle Tree, which means the log can furnish efficient-to-verify cryptographic proofs that a certificate is included in the tree, and that one tree is an appendage of another. The log provides several HTTP endpoints for acquiring these proofs, which a TLS client can use to audit an SCT as follows:

  1. Use the get-sth endpoint to retrieve the log's latest signed tree head (STH). Store the returned tree head.

  2. Use the get-sth-consistency endpoint to retrieve a consistency proof between the tree represented by the previously-stored tree head, and the tree represented by the latest tree head. Verify the proof. If the proof is invalid, it means the new tree is not an appendage of the old tree, so the log has violated its append-only property.

  3. Use the get-proof-by-hash endpoint to retrieve an inclusion proof for the SCT based on the latest tree head. Verify the proof. If the proof is invalid, it means the log has not included the corresponding certificate in the log.

Note that steps 1 and 2 only need to be done periodically for each log, rather than once for every certificate validation.
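As a rough Go sketch of steps 1 and 3 against the RFC 6962 endpoints named above, the code below fetches the latest STH and an inclusion proof for a leaf hash. It only fetches and decodes; verifying the STH signature, the consistency proof of step 2, and the audit path hashes themselves is left out, and computing the Merkle leaf hash from the SCT and certificate is assumed to have been done already.

    package example

    import (
        "encoding/base64"
        "encoding/json"
        "fmt"
        "net/http"
        "net/url"
    )

    // SignedTreeHead mirrors the JSON returned by get-sth (RFC 6962, section 4.3).
    type SignedTreeHead struct {
        TreeSize          uint64 `json:"tree_size"`
        Timestamp         uint64 `json:"timestamp"`
        SHA256RootHash    string `json:"sha256_root_hash"`
        TreeHeadSignature string `json:"tree_head_signature"`
    }

    // InclusionProof mirrors the JSON returned by get-proof-by-hash (section 4.5).
    type InclusionProof struct {
        LeafIndex uint64   `json:"leaf_index"`
        AuditPath []string `json:"audit_path"`
    }

    func getJSON(u string, out interface{}) error {
        resp, err := http.Get(u)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        return json.NewDecoder(resp.Body).Decode(out)
    }

    // auditSCT fetches the log's latest STH and an inclusion proof for leafHash,
    // the Merkle leaf hash of the certificate the SCT was issued for.
    func auditSCT(logURL string, leafHash []byte) error {
        // Step 1: retrieve the latest signed tree head.
        var sth SignedTreeHead
        if err := getJSON(logURL+"/ct/v1/get-sth", &sth); err != nil {
            return err
        }

        // Step 2 (omitted here): fetch get-sth-consistency between the previously
        // stored tree head and sth, and verify the consistency proof.

        // Step 3: retrieve an inclusion proof for the leaf hash against the new tree head.
        q := url.Values{}
        q.Set("hash", base64.StdEncoding.EncodeToString(leafHash))
        q.Set("tree_size", fmt.Sprint(sth.TreeSize))
        var proof InclusionProof
        if err := getJSON(logURL+"/ct/v1/get-proof-by-hash?"+q.Encode(), &proof); err != nil {
            return err // the log could not, or would not, prove inclusion
        }

        // The audit path must now be hashed together with the leaf hash and compared
        // against sth.SHA256RootHash, per the Merkle tree algorithms in RFC 6962.
        fmt.Printf("leaf %d of %d, audit path of %d nodes\n", proof.LeafIndex, sth.TreeSize, len(proof.AuditPath))
        return nil
    }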

There are several problems with direct proof fetching:

  1. Logs can't handle the load of every web browser everywhere contacting them for every TLS connection ever made.

  2. When the client asks for an inclusion proof in step 3, it has to reveal to the log which SCT it wants the proof for. Since each SCT corresponds to a certificate, and certificates contain domain names, the log learns every domain the client is visiting, which is a violation of the user's privacy.

  3. Since the log has up to 24 hours to include a certificate, the certificate might not be included at the time the client sees an SCT. Furthermore, even if 24 hours have elapsed, it would slow down the TLS client to retrieve a proof during the TLS handshake. Therefore, the client has to store the SCT and certificate and fetch the inclusion proof at some future time.

    This exposes the client to denial-of-service attacks, wherein a malicious website could try to exhaust the client's storage by spamming it with certificates and SCTs faster than the client can audit them (imagine a website which uses JavaScript to make a lot of AJAX requests in the background). To avoid denial-of-service, the client has to limit the size of its unverified SCT store. Once the limit is reached, the client has to stop adding to the store, or evict old entries. Either way, it exposes the client to flushing attacks, wherein an attacker with a misissued certificate and bogus SCT spams the client with good certificates and SCTs so the bad SCT never gets verified.

    Another question is what to do if the verification fails. It's too late to abort the TLS handshake and present a certificate error. Most users would be terribly confused if their browser presented an error message about a misbehaving log, so there needs to be a way for the client to automatically report the SCT to some authority (e.g. the browser vendor) so they know that the log has misbehaved and needs to be distrusted.

Proxied Auditing

To solve the scalability problem, clients can talk with servers operated by the client software supplier, rather than the logs themselves. (The client software supplier has to be comfortable operating scalable infrastructure; Google is, at least.)

For example, Chrome doesn't retrieve STHs and consistency proofs directly from logs. Instead, Google-operated servers perform steps 1 and 2 and distribute batches of STHs (called STHSets) to clients using Chrome's update system. Users have to trust the Google servers to faithfully audit the STHs, but since Chrome users already have to trust Google to provide secure software, there is arguably no loss in security with this approach.

Unfortunately, proxied auditing doesn't fully solve the privacy problem. Instead of leaking every domain the user visits to the log operator, requesting an inclusion proof leaks visited domains to the client software supplier.

DNS Proof Fetching

Chrome's solution to the privacy problem is to fetch inclusion proofs over DNS. Instead of making an HTTP request to the get-proof-by-hash endpoint, Chrome will make a series of DNS requests for sub-domains of ct.googleapis.com which contain the parameters of the inclusion proof request (e.g. D4S6DSV2J743QJZEQMH4UYHEYK7KRQ5JIQOCPMFUHZVJNFGHXACA.hash.pilot.ct.googleapis.com). The name servers for ct.googleapis.com respond with the inclusion proof inside TXT records.
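From the client's side, the lookup itself is nothing exotic; a hedged Go sketch using the example name from above is shown below. The exact encoding of the answer is defined by Google's CT-over-DNS protocol, so the sketch just prints the TXT data; a real client issues a short series of such queries to recover the leaf index and the audit path.

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // The base32-encoded proof parameters and the log's DNS zone form the
        // query name, exactly as in the example above.
        name := "D4S6DSV2J743QJZEQMH4UYHEYK7KRQ5JIQOCPMFUHZVJNFGHXACA.hash.pilot.ct.googleapis.com"

        txts, err := net.LookupTXT(name)
        if err != nil {
            fmt.Println("lookup failed:", err)
            return
        }
        // The TXT records carry the response data instead of an HTTP body.
        for _, txt := range txts {
            fmt.Println(txt)
        }
    }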

Chrome believes DNS proof fetching is better for privacy because instead of their servers learning the Chrome user's own IP address, they will learn the IP address of the user's DNS resolver. If the user is using an ISP's DNS resolver, its IP address will be shared among all the ISP's users. Although DNS is unencrypted and an eavesdropper along the network path could learn what inclusion proofs the user is fetching, they would already know what domains the user is visiting thanks to the DNS queries used to resolve the domain in the first place.

However, there are some caveats: if a user doesn't share a DNS resolver with many other users, it may be possible to deanonymize them based on the timing of the DNS queries. Also, if a user visits a domain handled by an internal DNS server (e.g. www.intranet.example.com), that DNS query won't leak onto the open Internet, but the DNS query for the inclusion proof will, causing a DNS privacy leak that didn't exist previously. A privacy analysis is available if you want to learn more.

Chrome's DNS proof fetching is still under development and hasn't shipped yet.

Embedded Proofs

The second version of Certificate Transparency (which is not yet standardized or deployed) allows inclusion proofs to be presented to clients alongside SCTs (embedded in the certificate, OCSP response, or TLS handshake), saving the client the need to fetch inclusion proofs itself. That solves the problems above, but creates new ones.

First, inclusion proofs can't be obtained until the certificate is included in the log, which currently takes up to 24 hours. If clients were to require inclusion proofs along with SCTs, a newly issued certificate wouldn't be usable for up to 24 hours. That's OK for renewals (as long as you don't wait until the last minute to renew), but people are used to being able to get new certificates right away.

Second, clients can no longer choose the STH on which an inclusion proof is based. With direct or DNS proof fetching, the client always requests an inclusion proof to the latest STH, and the client can easily verify that the latest STH is consistent with the previous STH. When the client gets an embedded inclusion proof, it doesn't immediately know if the STH on which it is based is consistent with other STHs that the client has observed. The client has to audit the STH.

Unfortunately, the more frequently a log produces STHs that incorporate recently-submitted certificates, the harder it is to audit them. Consider the extreme case of a log creating a new STH immediately after every certificate submission. Although this would let an inclusion proof be obtained immediately for an SCT, auditing the new STH has all the same problems as SCT auditing that were discussed above. It's bad for privacy, since a log can assume that a client auditing a particular STH visited the domain whose certificate was submitted right before the STH was produced. And if the client wants to avoid making auditing requests during the TLS handshake, it has to store the STH for later auditing, exposing it to denial-of-service and flushing attacks.

STH Frequency and Freshness

To address the privacy problems of excessive STH production, CTv2 introduces a new log attribute called the STH Frequency Count, defined as the maximum number of STHs a log may produce in any period equal to the Maximum Merge Delay. The CT gossip draft defines a fresh STH to be one that was produced less than 14 days in the past. Clients could require that embedded inclusion proofs be based on a fresh STH. Then, with an STH Frequency Count that permits one STH an hour, there are only 336 fresh STHs at any given time for any given log (24 per day over 14 days) - few enough that auditing them is practical and private. Auditing one of 336 STHs doesn't leak any information, and since the number of STHs is bounded, there is no risk of denial-of-service or flushing attacks.

It would also be possible for the client software supplier to operate a service that continuously fetches fresh STHs from logs, audits them for consistency, and distributes them to clients, saving clients the need to audit STHs themselves. (Just like Chrome currently does with the latest STHs.)

Unfortunately, this is not a perfect solution.

First, there would still be a delay before an inclusion proof can be obtained and a newly-issued certificate used. Many server operators and certificate authorities aren't going to like that.

Second, server operators would need to refresh the embedded inclusion proof every 14 days so it is always based on a fresh STH. That rules out embedding the inclusion proof in the certificate, unless the certificate is valid for less than 14 days. The server operator could use OCSP stapling, with the certificate authority responsible for embedding a new inclusion proof every 14 days in the OCSP response. Or the server operator's software could automatically obtain a new inclusion proof every 14 days and embed it in the TLS handshake. Unfortunately, there are no implementations of the latter, and there are very few robust implementations of OCSP stapling. Even if implementations existed, there would be a very long tail of old servers that were never upgraded.

One possibility is to make DNS proof fetching the default, but allow server operators to opt-in to embedded proofs, much like server operators can use HSTS to opt-in to enforced HTTPS. Server operators who run up-to-date software with reliable OCSP Stapling and who don't mind a delay in certificate issuance would be able to provide better security and privacy to their visitors. Maybe at some point in the distant future, embedded proofs could become required for everyone.

All of this is a ways off. CTv2 is still not standardized. Chrome still doesn't do any SCT auditing, and consequently its CT policy requires at least one SCT to be from a Google-operated log, since Google obviously trusts its own logs not to break their promises. Fortunately, even without widespread log auditing, Certificate Transparency has been a huge success, cleaning up the certificate authority ecosystem and making everyone more secure. Nevertheless, I think it would be a shame if Certificate Transparency's auditability were never fully realized, and I hope we'll be able to find a way to make it work.

Posted on 2018-01-10 at 00:41:54 UTC

September 28, 2017

Why Man-in-the-Middle Detection is Overrated

Last week, Nick Sullivan launched mitm.watch, a website that purports to tell you whether or not your HTTPS connection is being intercepted by a man-in-the-middle (MitM). mitm.watch uses Caddy's HTTPS MitM Detection Feature, which implements the techniques described in this paper. Basically, Caddy compares the browser name and version number advertised by the User-Agent header to the properties of the TLS handshake initiated by the client (e.g. ciphersuites). If the TLS handshake doesn't match the known properties of the purported browser, then the TLS handshake was probably not initiated by the browser, but by a man-in-the-middle. Caddy's documentation suggests that you could display an error message if a MitM is detected. mitm.watch displays either a green "No MITM!" page, or a red "Likely MITM!" page.
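The heuristic is easy to sketch. The Go code below is an illustration of the idea, not Caddy's implementation: it compares the cipher suites offered in a ClientHello against a fingerprint table keyed by browser family. The knownFingerprints contents, the browserFamily helper, and the plumbing that pairs a TLS handshake with the HTTP request that follows it are all assumed.

    package example

    import "crypto/tls"

    // knownFingerprints maps a browser family (derived from the User-Agent header)
    // to the cipher suites that browser is known to offer, in order. The entries
    // here are placeholders, not real fingerprints.
    var knownFingerprints = map[string][]uint16{
        "firefox-58": {
            tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
            tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
        },
    }

    // browserFamily would parse the User-Agent header into a key for the table
    // above; its implementation is omitted.
    func browserFamily(userAgent string) string { return "unknown" }

    // likelyMITM reports whether the handshake looks inconsistent with the browser
    // named in the User-Agent header: if the browser is known but the offered
    // cipher suites don't match its fingerprint, something other than that browser
    // (for example, an interception proxy) initiated the TLS connection.
    func likelyMITM(userAgent string, hello *tls.ClientHelloInfo) bool {
        expected, ok := knownFingerprints[browserFamily(userAgent)]
        if !ok {
            return false // unknown client: no basis for comparison
        }
        if len(hello.CipherSuites) != len(expected) {
            return true
        }
        for i, suite := range hello.CipherSuites {
            if suite != expected[i] {
                return true
            }
        }
        return false
    }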

Unfortunately, there is a significant and intractable shortcoming to MitM detection: a MitM can defeat the detection by making its TLS implementation work exactly like that of the browser it's proxying, or at least similar enough that the differences are not observable by the server. You should assume a malicious MitM (one designed to steal data) will conceal itself this way. And if websites start displaying errors when a MitM is detected, you should expect the makers of commercial TLS interception devices (e.g. Bluecoat) to respond by making their interception devices indistinguishable from browsers.

We need to stop obsessing over MitM detection. In addition to server-side MitM detection, another recurring idea is to apply HTTP Public Key Pinning (HPKP) to certificates issued by private certificate authorities (e.g. those used by MitM devices), or to display a special icon in the browser when an HTTPS connection uses a private certificate authority. These proposals are barking up the wrong tree. Short of protocol or implementation vulnerabilities, there are only two ways a TLS connection can be intercepted without the consent of the server operator: one, an unauthorized certificate is issued by a publicly-trusted certificate authority, or two, a private certificate authority has been added to the client's trust store. (I assume the website is using HSTS, which prevents certificate errors from being bypassed.) For the first case, we have Certificate Transparency, which is better than even pie-in-the-sky MitM detection, since it detects rogue certificates even if they are never used. And the second case can only happen if the client's trust store is modified. At that point, the client's security should be considered compromised, as the ability to modify the trust store typically implies the ability to do much worse, such as install spyware that monitors and exfiltrates everything you do, without so much as touching a TLS connection. It's pointless to try to ensure end-to-end encryption when the security of an endpoint is in doubt.

That said, there is one potential benefit to MitM detection. Despite claiming to improve security, many commercial TLS interception devices actually harm security by using TLS client implementations that are vastly inferior to those of modern browsers. For instance, they use old, insecure ciphers, or even fail to validate the certificate. If the makers of commercial TLS interception devices are forced to emulate the TLS implementations of browsers to avoid detection, they may end up improving their security in the process. However, if this is the goal, MitM detection is rather superfluous: servers might as well just check for insecure attributes of the connection and raise an error if found, MitM or not. After all, MitMs are not the only perpetrators of poor TLS security; there are plenty of old and insecure browsers out there as well. Jeff Hodges' How's My SSL, which recently launched a subscription service that lets you use it on your own site, is one example of this approach.

Posted on 2017-09-28 at 15:34:49 UTC

January 24, 2017

Thoughts on the Systemd Root Exploit

Sebastian Krahmer of the SUSE Security Team has discovered a local root exploit in systemd v228. A local user on a system running systemd v228 can escalate to root privileges. That's bad.

At a high level, the exploit is trivial:

  1. Systemd uses -1 to represent an invalid mode_t (filesystem permissions) value.
  2. Systemd was accidentally passing this value to open when creating a new file, resulting in a file with all permission bits set: that is, world-writable, world-executable, and setuid-root.
  3. The attacker writes an arbitrary program to this file, which succeeds because it's world-writable.
  4. The attacker executes this file, which succeeds because it's world-executable.
  5. The attacker-supplied program runs as root, because the file is setuid-root.

In mitigation: The vulnerability was fixed a year ago and less than three months after it was introduced. It is present only in v228.

In aggravation: The vulnerability was mislabeled at the time as a local denial-of-service and the systemd team did not request a CVE-ID for it. Had they requested a CVE-ID, someone may have noticed that this was more than a DoS. (Krahmer accurately points out that the systemd commit log is "really huge," which makes it hard to spot security-relevant commits.)

In mitigation: The vulnerability depends on a yet-unfixed hole in how Linux clears a file's setuid and setgid bits when writing to it. Systemd merely creates an empty setuid-root file. Gaining root requires writing to this file, and when a non-root user writes to a setuid-root file, the setuid bit is supposed to be cleared. halfdog found a clever way to circumvent this by tricking a root process into writing to the file instead. This is an extremely interesting vulnerability in itself and I can't wait to dive deeper into it.

In aggravation: The vulnerability would have been prevented if systemd used a fail-safe umask rather than setting it to 0, something I called out last September as evidence of systemd's poor security hygiene. A more sensible umask, such as 022, would have caused open to create the setuid-root file without world-writable permissions, preventing exploitation. However, systemd maintainer David Strauss rejected a safe umask with a completely illogical argument that shows his cluelessness over how systemd uses umask.

Lastly, this is yet another example of "The Billion Dollar Mistake": systemd was using a magic value (-1) to represent an invalid mode_t value, and C's type system did not prevent passing it to the mode argument of open. A language with a better type system, such as Rust or C++ (which has std::optional), can help prevent this kind of error.

That said, this is not about programming languages. Dovecot (among a handful of others) has demonstrated that adherence to good coding practices can produce secure software written in C. Rewriting systemd in a safer language would not transform it into quality software, although certain classes of bugs would likely be reduced or eliminated.

Rather, this is about lock-in. Systemd is introducing unprecedented lock-in to the Linux userspace. They are replacing previously-independent userspace services with ones whose development is controlled by the systemd project and which only work if systemd is PID 1. They are defining their own non-standard protocols and encouraging applications to use them. They have even replaced DNS with a dbus-based protocol, which they "strongly recommend" applications use instead of DNS. Sadly, the most recent version of Ubuntu ships with this travesty.

Systemd's developers have repeatedly demonstrated their poor judgment and unfitness to hold such responsibility. Unfortunately, the lock-in they're creating will deprive people of the ability to vote with their feet and switch to better alternatives.

Posted on 2017-01-24 at 17:54:16 UTC

October 2, 2016

Systemd is not Magic Security Dust

Systemd maintainer David Strauss has published a response to my blog post about systemd. The first part of his post is replete with ad hominem fallacies, strawmen, and factual errors. Ironically, in the same breath that he attacks me for not understanding the issues around threads and umasks, he betrays an ignorance of how the very project which he works on uses threads and umasks. This doesn't deserve a response beyond what I've called out on Twitter.

In the second part of his blog post, Strauss argues that systemd improves security by making it easy to apply hardening techniques to the network services which he calls the "keepers of data attackers want." According to Strauss, I'm "fighting one of the most powerful tools we have to harden the front lines against the real attacks we see every day." Although systemd does make it easy to restrict the privileges of services, Strauss vastly overstates the value of these features.

The best systemd can offer is whole application sandboxing. You can start a daemon as a non-root user, in a restricted filesystem namespace, with mandatory access control. Sandboxing an entire application is an effective way to run potentially malicious code, since it protects other applications from the malicious one. This makes sandboxing useful on smartphones, which need to run many different untrustworthy, single-user applications. However, since sandboxing a whole application cannot protect one part of the application from a compromise of a different part, it is ineffective at securing benign-but-insecure software, which is the problem faced on servers. Server applications need to service requests from many different users. If one user is malicious and exploits a vulnerability in the application, whole application sandboxing doesn't protect the other users of the service.

For concrete examples, let's consider Apache and Samba, two daemons which Strauss says would benefit from systemd's features.

First, Apache. You can start Apache as a non-root user provided someone else binds to ports 443 and 80. You can further sandbox it by preventing it from accessing parts of the filesystem it doesn't need to access. However, no matter how much you try to sandbox Apache, a typical setup is going to need a broad amount of access to do its job, including read permission to your entire website (including password-protected parts) and access to any credential (database password, API key, etc.) used by your CGI, PHP, or similar webapps.

Even under systemd's most restrictive sandboxing, an attacker who gains remote code execution in Apache would be able to read your entire website, alter responses to your visitors, steal your HTTPS private keys, and gain access to your database and any API consumed by your webapps. For most people, this would be the worst possible compromise, and systemd can do nothing to stop it. Systemd's sandboxing would prevent the attacker from gaining access to the rest of your system (absent a vulnerability in the kernel or systemd), but in today's world of single-purpose VMs and containers, that protection is increasingly irrelevant. The attacker probably only wants your database anyways.

To provide a meaningful improvement to security without rewriting in a memory-safe language, Apache would need to implement proper privilege separation. Privilege separation means using multiple processes internally, each running with different privileges and responsible for different tasks, so that a compromise while performing one task can't lead to the compromise of the rest of the application. For instance, the process that accepts HTTP connections could pass the request to a sandboxed process for parsing, and then pass the parsed request along to yet another process which is responsible for serving files and executing webapps. Privilege separation has been used effectively by OpenSSH, Postfix, qmail, Dovecot, and over a dozen daemons in OpenBSD. (Plus a couple of my own: titus and rdiscd.) However, privilege separation requires careful design to determine where to draw the privilege boundaries and how to interface between them. It's not something which an external tool such as systemd can provide. (Note: Apache already implements privilege separation that allows it to process requests as a non-root user, but it is too coarse-grained to stop the attacks described here.)

Next, Samba, which is a curious choice of example by Strauss. Having configured Samba and professionally administered Windows networks, I know that Samba cannot run without full root privilege. The reason why Samba needs privilege is not because it binds to privileged ports, but because, as a file server, it needs the ability to assume the identity of any user so it can read and write that user's files. One could imagine a different design of Samba in which all files are owned by the same unprivileged user, and Samba maintains a database to track the real ownership of each file. This would allow Samba to run without privilege, but it wouldn't necessarily be more secure than the current design, since it would mean that a post-authentication vulnerability would yield access to everyone's files, not just those of the authenticated user. (Note: I'm not sure if Samba is able to contain a post-authentication vulnerability, but it theoretically could. It absolutely could not if it ran as a single user under systemd's sandboxing.)

Other daemons are similar. A mail server needs access to all users' mailboxes. If the mail server is written in C, and doesn't use privilege separation, sandboxing it with systemd won't stop an attacker with remote code execution from reading every user's mailbox. I could continue with other daemons, but I think I've made my point: systemd is not magic pixie dust that can be sprinkled on insecure server applications to make them secure. For protecting the "data attackers want," systemd is far from a "powerful" tool. I wouldn't be opposed to using a library or standalone tool to sandbox daemons as a last line of defense, but the amount of security it provides is not worth the baggage of running systemd as PID 1.

Achieving meaningful improvement in software security won't be as easy as adding a few lines to a systemd config file. It will require new approaches, new tools, new languages. Jon Evans sums it up eloquently:

... as an industry, let's at least set a trajectory. Let's move towards writing system code in better languages, first of all -- this should improve security and speed. Let's move towards formal specifications and verification of mission-critical code.

Systemd is not part of this trajectory. Systemd is more of the same old, same old, but with vastly more code and complexity, an illusion of security features, and, most troubling, lock-in. (Strauss dismisses my lock-in concerns by dishonestly claiming that applications aren't encouraged to use their non-standard DBUS API for DNS resolution. Systemd's own documentation says "Usage of this API is generally recommended to clients." And while systemd doesn't preclude alternative implementations, systemd's specifications are not developed through a vendor-neutral process like the IETF, so there is no guarantee that other implementers would have an equal seat at the table.) I have faith that the Linux ecosystem can correct its trajectory. Let's start now, and stop following systemd down the primrose path.

Posted on 2016-10-02 at 18:19:33 UTC

September 28, 2016

How to Crash Systemd in One Tweet

The following command, when run as any user, will crash systemd:

NOTIFY_SOCKET=/run/systemd/notify systemd-notify ""

After running this command, PID 1 is hung in the pause system call. You can no longer start and stop daemons. inetd-style services no longer accept connections. You cannot cleanly reboot the system. The system feels generally unstable (e.g. ssh and su hang for 30 seconds since systemd is now integrated with the login system). All of this can be caused by a command that's short enough to fit in a Tweet.

Edit (2016-09-28 21:34): Some people can only reproduce if they wrap the command in a while true loop. Yay non-determinism!

The bug is remarkably banal. The above systemd-notify command sends a zero-length message to the world-accessible UNIX domain socket located at /run/systemd/notify. PID 1 receives the message and fails an assertion that the message length is greater than zero. Despite the banality, the bug is serious, as it allows any local user to trivially perform a denial-of-service attack against a critical system component.

The immediate question raised by this bug is what kind of quality assurance process would allow such a simple bug to exist for over two years (it was introduced in systemd 209). Isn't the empty string an obvious test case? One would hope that PID 1, the most important userspace process, would have better quality assurance than this. Unfortunately, it seems that crashes of PID 1 are not unusual: a quick glance through the systemd commit log reveals a number of commit messages describing fixes for crashes and assertion failures in PID 1.

Systemd's problems run far deeper than this one bug. Systemd is defective by design. Writing bug-free software is extremely difficult. Even good programmers would inevitably introduce bugs into a project of the scale and complexity of systemd. However, good programmers recognize the difficulty of writing bug-free software and understand the importance of designing software in a way that minimizes the likelihood of bugs or at least reduces their impact. The systemd developers understand none of this, opting to cram an enormous amount of unnecessary complexity into PID 1, which runs as root and is written in a memory-unsafe language.

Some degree of complexity is to be expected, as systemd provides a number of useful and compelling features (although they did not invent them; they were just the first to aggressively market them). Whether or not systemd has made the right trade-off between features and complexity is a matter of debate. What is not debatable is that systemd's complexity does not belong in PID 1. As Rich Felker explained, the only job of PID 1 is to execute the real init system and reap zombies. Furthermore, the real init system, even when running as a non-PID 1 process, should be structured in a modular way such that a failure in one of the riskier components does not bring down the more critical components. For instance, a failure in the daemon management code should not prevent the system from being cleanly rebooted.

In particular, any code that accepts messages from untrustworthy sources like systemd-notify should run in a dedicated process as an unprivileged user. The unprivileged process parses and validates messages before passing them along to the privileged process. This is called privilege separation and has been a best practice in security-aware software for over a decade. Systemd, by contrast, does text parsing on messages from untrusted sources, in C, running as root in PID 1. If you think systemd doesn't need privilege separation because it only parses messages from local users, keep in mind that in the Internet era, local attacks tend to acquire remote vectors. Consider Shellshock, or the presentation at this year's systemd conference which is titled "Talking to systemd from a Web Browser."
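For illustration, here is a rough Go sketch of the pattern (not how systemd or any other particular daemon implements it): the privileged parent hands the raw, untrusted message to a helper process running as an unprivileged user, and only the helper's validated output crosses back. The ./parse-helper path and the uid/gid of 65534 ("nobody" on many systems) are placeholders, and dropping credentials like this requires the parent to be running as root on Linux.

    package example

    import (
        "bytes"
        "os/exec"
        "syscall"
    )

    // parseUntrusted runs the message parser in a separate, unprivileged process.
    // The privileged caller never touches the raw message itself, so a parsing bug
    // triggered by a malicious message is contained in a process that owns nothing.
    func parseUntrusted(raw []byte) ([]byte, error) {
        cmd := exec.Command("./parse-helper") // placeholder helper binary
        cmd.Stdin = bytes.NewReader(raw)
        cmd.SysProcAttr = &syscall.SysProcAttr{
            // Drop to an unprivileged user before the helper runs (Linux; parent must be root).
            Credential: &syscall.Credential{Uid: 65534, Gid: 65534},
        }
        var validated bytes.Buffer
        cmd.Stdout = &validated
        if err := cmd.Run(); err != nil {
            return nil, err // the parser crashed or rejected the message
        }
        return validated.Bytes(), nil
    }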

Systemd's "we don't make mistakes" attitude towards security can be seen in other places, such as this code from the main() function of PID 1:

/* Disable the umask logic */
if (getpid() == 1)
        umask(0);

Setting a umask of 0 means that, by default, any file created by systemd will be world-readable and -writable. Systemd defines a macro called RUN_WITH_UMASK which is used to temporarily set a more restrictive umask when systemd needs to create a file with different permissions. This is backwards. The default umask should be restrictive, so forgetting to change the umask when creating a file would result in a file that obviously doesn't work. This is called fail-safe design. Instead systemd is fail-open, so forgetting to change the umask (which has already happened twice) creates a file that works but is a potential security vulnerability.
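As a small illustration of the fail-safe alternative (hypothetical code, not a patch to systemd): set a restrictive umask once at startup, and widen it only around the specific call that genuinely needs permissive bits, so a forgotten umask produces a file that is too locked down rather than too open.

    package example

    import (
        "os"
        "syscall"
    )

    func init() {
        // Fail-safe default: anything created without an explicit decision is
        // at most 0644/0755, never world-writable or setuid.
        syscall.Umask(022)
    }

    // createWorldWritable scopes the permissive umask to the one file that really
    // needs it (for example, a socket every local user must be able to write to),
    // then restores the restrictive default. Note that the umask is process-wide.
    func createWorldWritable(path string) (*os.File, error) {
        old := syscall.Umask(0)
        defer syscall.Umask(old)
        return os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0666)
    }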

The Linux ecosystem has fallen behind other operating systems in writing secure and robust software. While Microsoft was hardening Windows and Apple was developing iOS, open source software became complacent. However, I see improvement on the horizon. Heartbleed and Shellshock were wake-up calls that have led to increased scrutiny of open source software. Go and Rust are compelling, safe languages for writing the type of systems software that has traditionally been written in C. Systemd is dangerous not only because it is introducing hundreds of thousands of lines of complex C code without any regard to longstanding security practices like privilege separation or fail-safe design, but because it is setting itself up to be irreplaceable. Systemd is far more than an init system: it is becoming a secondary operating system kernel, providing a log server, a device manager, a container manager, a login manager, a DHCP client, a DNS resolver, and an NTP client. These services are largely interdependent and provide non-standard interfaces for other applications to use. This makes any one component of systemd hard to replace, which will prevent more secure alternatives from gaining adoption in the future.

Consider systemd's DNS resolver. DNS is a complicated, security-sensitive protocol. In August 2014, Lennart Poettering declared that "systemd-resolved is now a pretty complete caching DNS and LLMNR stub resolver." In reality, systemd-resolved failed to implement any of the documented best practices to protect against DNS cache poisoning. It was vulnerable to Dan Kaminsky's cache poisoning attack which was fixed in every other DNS server during a massive coordinated response in 2008 (and which had been fixed in djbdns in 1999). Although systemd doesn't force you to use systemd-resolved, it exposes a non-standard interface over DBUS which they encourage applications to use instead of the standard DNS protocol over port 53. If applications follow this recommendation, it will become impossible to replace systemd-resolved with a more secure DNS resolver, unless that DNS resolver opts to emulate systemd's non-standard DBUS API.

It is not too late to stop this. Although almost every Linux distribution now uses systemd for their init system, init was a soft target for systemd because the systems they replaced were so bad. That's not true for the other services which systemd is trying to replace such as network management, DNS, and NTP. Systemd offers very few compelling features over existing implementations, but does carry a large amount of risk. If you're a system administrator, resist the replacement of existing services and hold out for replacements that are more secure. If you're an application developer, do not use systemd's non-standard interfaces. There will be better alternatives in the future that are more secure than what we have now. But adopting them will only be possible if systemd has not destroyed the modularity and standards-compliance that make innovation possible.

Posted on 2016-09-28 at 19:14:27 UTC

February 5, 2016

Domain Validation Vulnerability in Symantec Certificate Authority

Symantec was disregarding + and = characters in email addresses when parsing WHOIS records, allowing certificate misissuance for domains whose WHOIS contacts contained these characters. The vulnerability has been reported and fixed. Read on for more...

There are three common ways for the requester of a domain-validated SSL certificate to prove control over the domain in the certificate request: add a record to the domain's DNS, publish a file on the domain's website, or respond to an email sent to an administrative address at the domain. The basic idea is that getting a DV certificate should require doing something that only the domain administrator can do. Adding a DNS record is the best way to ensure this: it's unlikely that anyone but the administrator could add records to a domain's DNS. Publishing a file on the domain's website is pretty good too, although some websites accept user uploads in a way that might be abused.

Email validation, on the other hand, is the worst way. Besides being impossible to automate, it's certainly the easiest to abuse. One problem is that the person receiving an email at an "administrative" address might not actually be an administrator. This is a big problem for email service providers who allow users to register arbitrary email addresses at their domains. In 2008, Mike Zusman was able to register the email address sslcertificates@live.com and use it to approve an SSL certificate for live.com. Back then, certificate authorities allowed you to choose from quite a large list of possible administrative addresses, which compounded the problem. Now, the Baseline Requirements (the rules governing public certificate issuance) define the administrative addresses as admin@, administrator@, hostmaster@, postmaster@, and webmaster@, which helps but is not a panacea: just last year, an unnamed Finn was able to register hostmaster@live.fi and use it to obtain a certificate for live.fi.

The Baseline Requirements also allow email addresses from the domain's WHOIS record to be used for certificate approval. These addresses don't suffer from the problem above: the WHOIS record is controlled by the domain's registrant, who wouldn't list an email address if it didn't belong to an administrator of the domain. Unfortunately, there's a pretty major implementation pitfall awaiting those who use WHOIS: WHOIS records are not machine-readable. They are unstructured, human-readable text. Even worse, every TLD uses its own format!

How does one take an unstructured, human-readable document, and extract some bit of information such as an email address from it in realtime? I suspect that most solutions are going to involve a regular expression that matches a sequence of characters that look like an email address. What constitutes a valid email address? The answer may surprise you. The relevant standard, RFC 5322, allows a shocking assortment of characters to appear in the local part (the part to the left of the at sign) of an email address, including, but not limited to, +, =, !, #, {, `, and ^. If you escape or quote the local part, you can even include control characters, although the madness of permitting control characters is considered obsolete.

If a certificate authority does not properly consider the full range of characters when parsing a WHOIS record, they risk extracting the wrong email address for a domain, allowing an unauthorized party to obtain certificates for it. Last October, I discovered that Symantec's DV certificate products (RapidSSL, QuickSSL) did not consider + and = characters when parsing WHOIS records. If an email address in WHOIS contained either character, Symantec would treat the part of the address following the character as a valid administrative address. For example, if a domain's WHOIS contact was this:

andrew+whois@gmail.com

then Symantec would allow the following address to approve certificates for the domain:

whois@gmail.com

This was a serious flaw. + is commonly used in email addresses for sub-addressing, which permits person+anything@example.com to be used as an alias for person@example.com. Several popular email service providers, including Gmail, Outlook.com, and Fastmail, support sub-addressing with the plus character, and I know from my experience running SSLMate that it is not uncommon for domain administrators to use an email address such as person+whois@gmail.com for their WHOIS contact. An attacker could register whois@gmail.com and fraudulently obtain certificates from Symantec for any domain whose WHOIS contact followed this pattern.

As a proof of concept, I set all three WHOIS email addresses of a test domain, cloudpork.com, to alice+bob@cloudpork.com:

$ whois cloudpork.com | grep Email
Registrant Email: ALICE+BOB@CLOUDPORK.COM
Admin Email: ALICE+BOB@CLOUDPORK.COM
Tech Email: ALICE+BOB@CLOUDPORK.COM
Registrar Abuse Contact Email: abuse@enom.com

I then went to RapidSSL to obtain a DV certificate for www.cloudpork.com, and was presented with the following choice of email addresses:

[Screenshot of the RapidSSL UI showing bob@cloudpork.com as an acceptable administrative email address]

I selected bob@cloudpork.com, received the approval email at that address, and, after approving the certificate, was issued a valid SSL certificate for www.cloudpork.com, despite bob@cloudpork.com not being a valid administrative address for cloudpork.com.

I reported the vulnerability to Symantec on October 21, 2015. I also reported it to the four major trust store operators (Google, Mozilla, Microsoft, and Apple), which I believe is appropriate for vulnerabilities in publicly-trusted certificate authorities. Symantec reported that the issue was fixed on October 28, which I confirmed. Symantec then conducted a rather lengthy audit of previously-issued certificates to ensure that this vulnerability had not been exploited. The vulnerability was publicly disclosed yesterday.

In the end, I was able to confirm that Symantec properly handled +, =, -, _, and . characters in email addresses. I wish I could have tested with additional non-alphanumeric characters, but I couldn't find a domain registrar who would let me include such characters in WHOIS. Fortunately, the likelihood of someone using a special character besides +, =, -, _, or . in their domain's WHOIS contact is pretty low. Furthermore, email providers tend not to allow such bizarre characters in email addresses, so if someone were to use such an email address, it would most likely be hosted at their own domain, where the risk of the email being misdirected to an unauthorized person would be low.

I'm glad this vulnerability is fixed, but it serves as a reminder that the certificate authority system still has much room for improvement. I have high hopes for Certificate Transparency, a system which would require certificate authorities to log all certificates they issue to public, append-only, and auditable logs (browsers would enforce this by only accepting a certificate if it was accompanied by cryptographic proof of the certificate's inclusion in a log). Domain owners could monitor these logs for certificates related to their domain, so if a vulnerability such as this one were exploited to misissue certificates, it would be detected. The Certificate Transparency experiment is already underway: many certificates have been logged, and can be searched using the awesome crt.sh tool from Comodo. I'm working on some of my own tools to help domain owners use Certificate Transparency; stay tuned!
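As a sketch of what such monitoring might look like, here is a small Python example that queries crt.sh for certificates matching a domain. It assumes crt.sh exposes a JSON output mode at the URL shape shown, with fields like issuer_name and name_value; treat the endpoint and field names as assumptions and check the site for its actual interface:

import json
import urllib.request

# Sketch only: assumes crt.sh offers "output=json" and that entries carry
# "issuer_name" and "name_value" fields. Verify against the real service.
def logged_certificates(domain):
    url = "https://crt.sh/?q=%25." + domain + "&output=json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

for entry in logged_certificates("example.com"):
    # Alert on any certificate you don't recognize for your domain.
    print(entry.get("issuer_name"), entry.get("name_value"))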

Posted on 2016-02-05 at 14:50:03 UTC | Comments

December 2, 2015

Duplicate Signature Key Selection Attack in Let's Encrypt

Cryptography is notorious for its sharp edges. It's easy to make a minor mistake that totally dooms your security. The situation is improving thanks to the development of easier-to-use libraries like libsodium which provide a high-level interface instead of forcing the user to combine basic building blocks. However, you still need to know exactly what security guarantees your cryptographic primitives provide and be sure not to go beyond their guarantees.

As an example of what can go wrong when you assume too much from a primitive, consider the duplicate signature key selection attack which I discovered in ACME, the protocol used by Let's Encrypt. The vulnerability was severe and would have allowed attackers to obtain SSL certificates for domains they didn't control. Fortunately, it was mitigated before Let's Encrypt was publicly trusted, and was definitively fixed a couple weeks ago.

The vulnerability was caused by a misuse of digital signatures. The guarantee provided by digital signatures is the following:

Given a message, a signature, and a public key, a valid digital signature tells you that the message was authored by the holder of the corresponding private key.

This guarantee is handy for many use cases, such as verifying that an email is authentic. If you receive a signed email that claims to be from Bob, you can use Bob's public key to verify the signature. If an attacker, Mallory, alters the email, the signature is no longer valid. It is computationally infeasible for Mallory to compute a valid signature since she doesn't know Bob's private key.

What if Mallory could trick you into using her public key, not Bob's, to verify the message? Clearly, this would doom security. After altering the email, Mallory could replace Bob's signature with a signature from her own private key. When you verify it with Mallory's public key, the message will appear authentic.

But what if Mallory were able to alter the message and trick you into using her public key, but she was not able to replace the signature, perhaps because it was delivered out-of-band? The obvious attack, re-signing the message with her private key, won't work. So is this system secure? Is Mallory stymied?

No. Mallory just needs to find a private key which produces the same signature for her altered message as Bob's private key produced for his original message, and nothing says this can't be done. Digital signatures guarantee that a message came from a particular private key. They do not guarantee that a signature came from a particular private key, and with RSA it's quite easy to find a private key that produces a desired signature for a particular message. This means that a signature does not uniquely identify a message, which is interesting because it's easy to naively think of signatures as "hashes with public key crypto" but in this way they are very unlike hashes. Similarly, a signature alone does not identify a key, which makes digital signatures unlike handwritten signatures, which (theoretically) uniquely identify a person.

A system that gets this wrong may be vulnerable to a duplicate signature key selection attack. Let's see how this works with RSA.

Brief recap of RSA

RSA signatures work using exponentiation modulo an integer. RSA public keys consist of the modulus n (typically a 2048-bit integer that is the product of two random primes) and the public exponent e (typically 65537). Private keys consist of the same modulus n, plus the private exponent d, such that (x^d)^e = x (mod n) for all x. It's easy to calculate d from e if you know the prime factorization of n, which only the person who generated the key pair should know. Without this information, calculating d is considered infeasible.

To sign a message m, you raise it to the power of d (mod n) to produce the signature s:

s = m^d (mod n)

To verify a message, you take the signature, raise it to the power of e (mod n), and compare it against the message:

s^e ≟ m (mod n)

Since s = m^d, and (x^d)^e = x (mod n) for all x, raising s to the power of e should produce m, as long as neither the message nor the signature was altered.

Note that m has to be just the right length, so you never sign the message itself. Instead you sign a cryptographic hash of the message that has been padded using a padding scheme such as PKCS#1 v1.5 or PSS. This detail doesn't matter for understanding the attack so I will henceforth assume that the message to be signed has already been hashed and padded.
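As an illustration, here is a toy Python example of the arithmetic above, using a deliberately tiny, made-up key; real keys are 2048 bits or more, and m would be a padded hash:

# Toy parameters, far too small to be secure.
p, q = 61, 53
n = p * q              # modulus: 3233
e = 17                 # public exponent
d = 413                # private exponent: e*d = 1 (mod lcm(p-1, q-1)) = 1 (mod 780)

m = 1234               # the already-hashed-and-padded "message", m < n

s = pow(m, d, n)           # sign:   s = m^d (mod n)
assert pow(s, e, n) == m   # verify: s^e (mod n) equals m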

Crafting an RSA key

In a duplicate signature key selection attack, the signature s is fixed. The attacker gets to choose the message m, and then has to construct an RSA key under which s is a valid signature for m. In other words, find e, d, and n such that:

s^e = m (mod n)

and:

(x^d)^e = x (mod n) for all x

There's a trivial solution which is silly but works with some RSA implementations. Just set e = 1, d = 1, and n = s - m. Clearly, the second equation is satisfied. It's not hard to see that the first equation is satisfied too:

s = m (mod s - m)
s - m = 0 (mod s - m)
0 = 0 (mod s - m)

This requires m < s, but since the first byte of PKCS#1 v1.5 padding is always zero, m < s will be true with high probability if you use PKCS#1 v1.5 padding (note that the choice of padding is controlled by the attacker; it doesn't matter what padding the victim's signature uses).

This produces a highly implausible RSA key pair. e and d are 1, which means that signing doesn't do anything, and the modulus n is less than the signature s, which should never happen, since a properly-generated signature is reduced modulo n. However, not all RSA implementations are picky about these details. For example, Go's RSA implementation happily validates such signatures (Let's Encrypt's backend is written in Go). Note that this is not a bug in Go, since these details don't matter when signatures are used properly.
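To see the arithmetic in action, here's a short Python sketch of the trivial construction; the signature and message values are made up purely for illustration:

# The signature is fixed (e.g., published out of band); the attacker
# chooses the message. Both values here are invented for illustration.
s = 0x7a3f9c51d28b6e04a1c5      # the victim's existing signature
m = 0x0000cafe0000f00d          # the attacker's padded message; leading zero bytes keep m small

e = d = 1
n = s - m                       # the attacker's "modulus"

assert m < n                    # holds because m is much smaller than s
assert pow(s, e, n) == m        # "verification" succeeds: s^e (mod n) == m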

There is a more sophisticated way to pick the RSA key that produces a valid key pair that would be accepted by all RSA implementations. Finding e such that s^e = m (mod n) is an instance of the discrete logarithm problem. Whether or not the discrete logarithm problem is difficult depends on n, which the attacker gets to choose. The attacker can choose n such that it's easy to find the corresponding e and d. Although the resulting key pair will look slightly odd to the human eye (since e is conventionally 3 or 65537), it will be a perfectly valid key pair. For more details about this technique, see page 4 of this paper by Koblitz and Menezes.

Attacking ACME

ACME is a protocol for the automated issuance of SSL certificates. It was developed for and is used by Let's Encrypt, and is currently undergoing standardization at the IETF. In ACME, messages from the client are signed using the client's ACME account key, which is typically an RSA or ECDSA key. When an ACME client asks the server to issue a certificate for a particular domain, the server replies with one or more "challenges" which the client must complete successfully to prove that it controls that domain.

One of the challenges is the DNS challenge. In an earlier draft of ACME, the client signed a "validation object" with its ACME account key, published the signature in a TXT record under the domain, and then sent the validation object and signature to the ACME server. The server would verify the signature using the client's account key and then query the TXT record. If the signature was valid, and the value of the TXT record matched the signature, the challenge would succeed. Since only the administrator of a domain can create DNS records, it was presumed that this challenge was secure.

As we saw above, such a scheme is vulnerable to a duplicate signature key selection attack. A digital signature does not uniquely identify a key or a message. So if Mallory wants to obtain a certificate for Bob's domain, she doesn't need to alter Bob's DNS records if Bob has already published his own signature in the DNS. Mallory just needs to choose her ACME account key so that her validation object has the same signature as Bob's. When Mallory sends her validation object to the ACME server, the server will query Bob's TXT record, see that Bob's signature matches the signature of Mallory's validation object, and conclude incorrectly that Mallory put the signature in Bob's DNS, and is therefore authorized to obtain certificates for Bob's domain.

For a more in-depth description of my attack, see my report to the IETF ACME list.

Resolution

Shortly after I reported the vulnerability to the IETF ACME mailing list on August 11, 2015, Let's Encrypt mitigated the attack by removing the ability to start a challenge with one account key and finish it with a different one, which deprived the attacker of the ability to pick an account key that would produce the right signature for the validation object. Since Let's Encrypt was not yet publicly trusted, at no point was the integrity of the public certificate authority system at risk from this attack. Still, the underlying misuse of signatures remained, so ACME has been redesigned so that a hash of the ACME account public key (plus a random token) is published in the DNS instead of a signature. The old challenges were disabled on November 19, 2015.
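As a simplified illustration of the shape of the fix (the exact encoding in the ACME specification differs, and the function and parameter names here are my own), the value published in DNS can be thought of as a digest committing to both the server-chosen token and the account public key:

import base64
import hashlib

# Simplified illustration only; see the ACME spec for the real encoding.
def dns_challenge_value(token, account_key_fingerprint):
    keyauth = token + "." + account_key_fingerprint
    digest = hashlib.sha256(keyauth.encode()).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

# Hypothetical values for illustration.
print(dns_challenge_value("rAnd0mT0ken", "account-key-thumbprint"))

Because the DNS value is a hash rather than a signature, an attacker can no longer search for an account key that makes her own request match a value already sitting in the victim's DNS.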

Edited (2015-12-04): Removed incorrect mention of modular inverses from my recap of RSA. Thanks to reader Sam Edwards for pointing out my error.

Posted on 2015-12-02 at 04:07:27 UTC | Comments

October 8, 2015

I Don't Accept the Risk of SHA-1

Website operators have to configure a dizzying number of security properties for their website: protocol versions, TLS ciphers, certificate hash algorithm, and so on. Most of these properties provide an individual benefit: when you configure your server to require secure protocol versions and strong ciphers, connections to your website are immediately made more secure. It doesn't affect your website's security if some other schmuck is still using SSLv3 with RC4 and 1024 bit Diffie-Hellman on their website.

However, other security properties, particularly those related to certificates, provide more of a collective security benefit, where everyone's security is determined by the security of the lowest common denominator. A timely example is the hash algorithm used in certificate signatures. Until recently, SHA-1 was the most common algorithm. Unfortunately, SHA-1 is dangerously weak so the Internet is transitioning to the more secure SHA-2. Under the current deprecation schedule, certificate authorities must stop issuing SHA-1 certificates on January 1, 2016, and SHA-1 certificates that are issued before then must not be valid past January 1, 2017, which means that on January 1, 2017, browsers can stop trusting SHA-1 certificates.

Unfortunately, since this is a collective security property, there's nothing an individual website operator can do in the meantime to improve the security of their site. This site, www.agwa.name, uses a SHA-2 certificate, but the truth is that it's no more secure than a site using a SHA-1 certificate. That's because an attacker who can generate a SHA-1 collision can forge a SHA-1 certificate for www.agwa.name. Since so many websites still use SHA-1 certificates, and it's not 2017 yet, web browsers will accept the forged certificate and be none the wiser. None of us will be more secure until certificate authorities stop signing, and web browsers stop accepting, certificates with SHA-1 signatures.

For this reason, I was dismayed by the recent proposal from Symantec to allow certificate authorities to issue SHA-1 certificates through the end of 2016, because some of their "very large enterprise customers" can't complete the migration in time. Although their proposal would not change the date on which browsers would stop trusting SHA-1, it would extend the period during which new collisions could be created. This was troubling enough when the proposal was made last week, and is even more troubling in light of the research released today that estimates the cost of finding a SHA-1 collision on EC2 to be between just $75,000 and $120,000.

What made me really angry about the proposal was the following statement:

These customers accept the risk of continuing to use new SHA-1 certificates

"These customers" accept the risk? As I explained above, the use of SHA-1 is a collective risk shared by the entire Internet, not just the "very large enterprise customers" who want to keep using SHA-1. What about the rest of the Internet, who want their TLS connections to be secure and who have dutifully migrated to SHA-2 in time for the deadline? Did anyone ask them? I sure as hell don't accept the risk.

The statement is therefore vacuous and thoroughly unpersuasive to anyone who understands how certificates work. But to someone who doesn't understand or isn't reading too closely, it makes the proposal seem less bad for the Internet at large than it really is. I hope that the other members of the CA/Browser Forum see through this and reject the proposal.

Posted on 2015-10-08 at 22:12:39 UTC | Comments

August 7, 2015

Hardening OpenVPN for DEF CON

As people head off to DEF CON this week, many are probably relying on OpenVPN to safely tunnel their Internet traffic through "the world's most hostile network" back to an ordinarily hostile network. While I believe OpenVPN itself to be quite secure, the way in which it interacts with the operating system to route your traffic is quite unrobust and can be subverted in numerous ways on a hostile local area network. This article will describe some of the problems and suggest countermeasures. The article is Linux-centric, since I'm most familiar with Linux, but many of these concerns apply to other operating systems as well.

IPv6

Unless you explicitly configure your OpenVPN tunnel to support IPv6 (which is only possible if your server has IPv6 connectivity), then all IPv6 traffic from your client will bypass the VPN and egress over the local network. This should concern you as more and more websites are available over IPv6 (including this blog), and clients generally prefer to use IPv6 if it's available.

The easiest countermeasure is to just disable IPv6 while you're at DEF CON. As a bonus, you'll reduce your network stack's attack surface and will be safe in the unlikely event someone drops an IPv6-specific 0day at DEF CON.

DNS

If you're using a VPN, you want to make sure you're using a trusted DNS server. If you use an attacker-controlled DNS server, they can return rogue IP addresses and redirect all your traffic back to a network they control after it passes through your VPN, rendering your VPN moot. Unfortunately, Unix-based operating systems (including OS X, though it may have improved since I last looked at this a few years ago) handle DNS server configuration incredibly poorly. Even if your VPN server specifies the address of a trusted DNS server, the DHCP server on the local network might return the address of a rogue DNS server. Which DNS server you end up using depends on too many factors to discuss here, but needless to say it does not inspire confidence.

I recommend doing whatever it takes to disable the retrieval of DNS information over DHCP, and hard-coding the IP address of a trusted DNS server in /etc/resolv.conf. On Debian, if isc-dhcp-client is your DHCP client, the most airtight way to stop DHCP messing with your DNS settings is to place the following in /etc/dhcp/dhclient-enter-hooks.d/zzz-preserve-resolvconf:

make_resolv_conf() {
	true
}

I don't know enough about other operating systems/DHCP clients to provide specific instructions.

Denial of service

An attacker can always block your VPN, preventing you from using it. If they did this continuously, you'd probably notice that your VPN failed to start, and then proceed cautiously (or not at all) knowing you didn't have the protection of a VPN. A more clever attacker would let you establish the VPN connection, and only start blocking it later. If the OpenVPN client times out and quits, you'll start sending traffic over the untrusted network, and you might not notice.

I will present a countermeasure for this along with the countermeasure for the next attack.

Attacks on redirect-gateway

The usual way of telling OpenVPN to route all Internet traffic over the VPN is to use the redirect-gateway def1 option. When this option is used, the OpenVPN client adds three routes to your system's main routing table:

  1. A specific route for the VPN server, via the local network's default gateway.
  2. A route for 0.0.0.0/1 via the VPN.
  3. A route for 128.0.0.0/1 via the VPN.

The first route prevents the encrypted VPN traffic from being routed via the VPN itself, which would cause a feedback loop. The last two routes are a clever hack: together, 0.0.0.0/1 and 128.0.0.0/1 cover the entire IPv4 address space, and since they are more specific than the default route for 0.0.0.0/0 that came from the local DHCP server, they take precedence.

However, a DHCP server can also push its own routes (called "classless static routes") to the DHCP client. So a rogue DHCP server can push routes even more specific than the OpenVPN routes, such as for 0.0.0.0/2, 64.0.0.0/2, 128.0.0.0/2, and 192.0.0.0/2. These routes cover the entire IPv4 address space, and take precedence over the less-specific OpenVPN routes.

You could tell your DHCP client to ignore classless static routes, but there's another attack: a rogue DHCP server could push a subnet mask for an extremely large subnet, such as /2. Then the interface route for the local network would be more specific than your OpenVPN routes. The attacker can only grab 25% of the IPv4 address space this way, but that's a sizable percentage of the Internet.

A better countermeasure is to take advantage of Linux's advanced routing and use multiple routing tables. Although rarely used, a Linux system can have multiple routing tables, and you can use routing policy rules to specify which routing table a packet should use. The idea is to put all your OpenVPN routes in a dedicated routing table, and then add routing policy rules that say:

  1. If a packet is destined for the VPN server, use the main routing table.
  2. Otherwise, use the OpenVPN routing table.

This keeps your OpenVPN routes safely segregated from routes pushed by the DHCP server. The only packets that will ever use the routing table controlled by the DHCP server will be encrypted packets to the VPN server itself. Everything else will use a routing table controlled only by OpenVPN.

The first step is to configure the routing rules. Unfortunately, distros don't provide a good way of managing these, leaving you to run a series of ip rule commands by hand. The changes made by these commands are lost when the system reboots, so I suggest placing them in a system startup script such as rc.local.

ip rule add to 203.0.113.41 table main pref 1000
ip rule add to 203.0.113.41 unreachable pref 1001
ip rule add table 94 pref 1002
ip rule add unreachable pref 1003

Replace 203.0.113.41 with the IP address of your VPN server. The preferences (1000-1003) ensure the rules are sorted correctly. 94 is the number of the OpenVPN routing table, which we'll reference below. The second rule prevents VPN server packets from being routed over the VPN itself in case the main routing table is empty, and the final rule prevents packets from using the main routing table in case the OpenVPN routing table is empty (which would happen if OpenVPN quit unexpectedly).

The next step is to configure the OpenVPN client to add its routes to table 94 instead of the main routing table. OpenVPN itself lacks support for this, but I wrote a routing hook that provides support. Download the hook and install it to /usr/local/lib/openvpn/route. Make it executable with chmod +x. Then, add the following options to your OpenVPN client config:

setenv OPENVPN_ROUTE_TABLE 94
route-noexec
route-up /usr/local/lib/openvpn/route
route 0.0.0.0 0.0.0.0

Remove the existing redirect-gateway option (also check the server config in case it's being pushed to the client).

The first option sets the routing table number. This has to match the number used in the ip rule command above. The second and third options tell OpenVPN to use my routing hook instead of its builtin routing code. The final option tells OpenVPN to route all traffic over the VPN.

Even worse attacks

If you want to be really careful, you should assign your network device to an isolated VM and run all of your networking software (e.g. DHCP client, wireless supplicant) inside it. The Linux userspace networking stack is pretty hairy, and it all runs as root. A vulnerability there would allow an attacker to take over your system before you even start your VPN.

Using a dedicated network VM is pretty complicated and beyond the scope of this blog post. Fortunately, if you're using an up-to-date operating system you're probably safe, since it seems unlikely anyone would burn a 0day at DEF CON just to take over random conference-goers' laptops. I'd be much more worried about the other attacks, which are straightforward enough to be in script kiddie territory.

Posted on 2015-08-07 at 03:00:52 UTC | Comments

March 21, 2015

How to Responsibly Publish a Misissued SSL Certificate

SSL certificate misissuance is in the news again. This time, it's not the certificate authorities who messed up, but rather email service providers who allowed their users to register one of the five administrative email addresses that can be used to approve certificates (admin@, administrator@, hostmaster@, postmaster@, and webmaster@). Last week, Windows Live's Finnish domain (live.fi) fell victim, and today, Remy van Elst got a misissued cert for xs4all.nl, a Dutch ISP.

Unfortunately, van Elst, at the end of an otherwise good blog post, published the private key for the misissued cert. This was irresponsible, because although the cert has been revoked, certificate revocation doesn't work (see also my own WTF moment with revocation), so the certificate will still be usable for man-in-the-middle attacks until it expires on March 19, 2016. Fortunately, browsers can push out updates to explicitly blacklist it (Chrome is especially good since it has the CRLSet system, and in fact, the cert is already blacklisted), but this doesn't help non-browser clients or users running out-of-date browsers.

If I ever came across an email service provider allowing administrative addresses to be registered, I would contact them first and try to spare everyone the headache of a misissued cert. Unfortunately, sometimes a proof-of-concept is needed to get security problems fixed in a reasonable (or even finite) amount of time. If you need to go this route, you should do it responsibly, and remember that as a responsible security researcher, your goal is not to actually MitM the target, but to demonstrate that you were able to obtain a certificate for it. That means destroying the private key immediately, and using a method besides publishing the key to prove you had possession of it.

Conveniently, the CSR, which you need to generate anyway, is signed with the private key, providing proof of possession. When you generate the CSR, put something unique in the CSR's organization field, such as "YOURNAME's Fake Certs, Inc." (This field is ignored for DV certificates so this shouldn't affect your ability to get a certificate.) The CSR's signature will prove that whoever generated the CSR had possession of the private key, and putting your name in the CSR will prove that you generated the CSR, thus proving that you had possession of the private key. Submit the CSR to the CA, and when you get the misissued certificate back, publish it along with the CSR as your proof.

Here's the OpenSSL command to generate a private key and CSR for www.example.com. Note that I tell OpenSSL to write the private key to /dev/null, ensuring that it's immediately discarded, without even hitting the filesystem.

$ openssl req -new -nodes -newkey rsa:2048 -keyout /dev/null -out www.example.com.csr
Generating a 2048 bit RSA private key
....................................................+++
..................................+++
writing new private key to '/dev/null'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:California
Locality Name (eg, city) []:
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Andrew's Fake Certs, Inc.
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:www.example.com
Email Address []:

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

For reference, here's the resulting CSR, and here's the corresponding certificate (signed by an untrusted certificate authority that I use for testing).

Here's how to compare the public key in the CSR against the public key in the certificate:

$ openssl req -noout -pubkey -in www.example.com.csr | openssl pkey -pubin -pubout -outform DER | sha256sum
a501928ab50fb9c0e8c8f006816acb462eb90cfea00c5139ba7333046260bff0  -
$ openssl x509 -noout -pubkey -in www.example.com.crt | openssl pkey -pubin -pubout -outform DER | sha256sum
a501928ab50fb9c0e8c8f006816acb462eb90cfea00c5139ba7333046260bff0  -

And here's how to verify the CSR's signature and print its subject:

$ openssl req -noout -verify -subject -in www.example.com.csr
verify OK
subject=/C=US/ST=California/O=Andrew's Fake Certs, Inc./CN=www.example.com

There you go - proof that I got a certificate for a private key under my control, without having to publish, or even keep around, the private key.

Posted on 2015-03-21 at 18:45:51 UTC | Comments

October 17, 2014

Renewing an SSL Certificate Without Even Logging in to My Server

Yesterday I renewed the about-to-expire SSL certificate for one of my websites. I did so without running a single command, filling out a single form, taking out my credit card, or even logging into any server. All I did was open a link in an email and click a single button. Soon after, my website was serving a renewed certificate to visitors.

It's all thanks to the auto-renewal feature offered by my SSL certificate management startup, SSLMate, which I finally had a chance to dogfood yesterday with a real, live expiring certificate.

There are two halves to auto-renewals. The first, relatively straightforward half, is that the SSLMate service is constantly monitoring the expiration dates of my certificates. Shortly before a certificate expires, SSLMate requests a new certificate from a certificate authority. As with any SSL certificate purchase, the certificate authority emails me a link which I have to click to confirm that I still control the domain. Once I've done that, the certificate authority delivers the new certificate to SSLMate, and SSLMate stores it in my SSLMate account. SSLMate charges the credit card already on file in my account, just like any subscription service.

But a certificate isn't very useful sitting in my SSLMate account. It needs to be installed on my web servers so that visitors are served with the new certificate. This is the second half of auto-renewals. If I were using a typical SSL certificate vendor, this step would probably involve downloading an email attachment, unzipping it, correctly assembling the certificate bundle, installing it on each of my servers, and restarting my services. This is tedious, and worse, it's error-prone. And as someone who believes strongly in devops, it offends my sensibilities.

Fortunately, SSLMate isn't a typical SSL certificate vendor. SSLMate comes with a command line tool to help manage your SSL certificates. Each of my servers runs the following script daily (technically, it runs from a configuration management system, but it could just as easily run from a cron job in /etc/cron.daily):

#!/bin/sh

if sslmate download --all
then
	service apache2 restart
	service titus restart
fi

exit 0

The sslmate download command downloads certificates from my SSLMate account to the /etc/sslmate directory on the server. --all tells sslmate to look at the private keys in /etc/sslmate and download the corresponding certificate for each one (alternatively, I could explicitly specify certificate common names on the command line). sslmate download exits with a zero status code if new certificates were downloaded, or a non-zero status if the certificates were already up-to-date. The if condition tests for a zero exit code, and restarts my SSL-using services if new certificates were downloaded.

On most days of the year, this script does nothing, since my certificates are already up-to-date. But on days when a certificate has been renewed, it comes to life, downloading new certificate files (not just the certificate, but also the chain certificate) and restarting services so that they use the new files. This is devops at its finest, applied to SSL certificate renewals for the first time.

You may be wondering - is it a good idea to do this unattended? I think so. First, the risk of installing a broken certificate is very low, and certainly lower than when an installation is done by hand. Since SSLMate takes care of assembling the certificate bundle for you, there's no risk of forgetting to include the chain certificate or including the wrong one. Chain certificate problems are notoriously difficult to debug, since they don't materialize in all browsers. While tools such as SSL Labs are invaluable for verifying certificate installation, they can only tell you about a problem after you've installed a bad certificate, which is too late to avoid downtime. Instead, it's better to automate certificate installation to eliminate the possibility of human error.

I'm also unconcerned about restarting Apache unattended. sslmate download contains a failsafe that refuses to install a new certificate if it doesn't match the private key, ensuring that it won't install a certificate that would prevent Apache from starting. And I haven't done anything reckless with my Apache configuration that might make restarts unreliable. Besides, it's already essential to be comfortable with restarting system services, since you may be required to restart a service at any time in response to a security update.

One more thing: this certificate wasn't originally purchased through SSLMate, yet SSLMate was able to renew it, thanks to SSLMate's upcoming import feature. A new command, sslmate import, will let you import your existing certificates to your SSLMate account. Once imported, you can set up your auto-renewal cron job, and you'll be all set when your certificates begin to expire. And you'll be charged only when a certificate renews; importing certificates will be free.

sslmate import is in beta testing and will be released soon. If you're interested in taking part in the beta, shoot an email to sslmate@sslmate.com. Also consider subscribing to the SSLMate blog or following @SSLMate on Twitter so you get future announcements - we have a lot of exciting development in the pipeline.

Posted on 2014-10-17 at 17:34:55 UTC | Comments

September 30, 2014

CloudFlare: SSL Added and Removed Here :-)

One of the more infamous leaked NSA documents was a slide showing a hand-drawn diagram of Google's network architecture, with the comment "SSL added and removed here!" along with a smiley face, written underneath the box for Google's front-end servers.

[NSA slide depicting Google's network architecture]

"SSL added and removed here! :-)"

The point of the diagram was that although Google tried to protect their users' privacy by using HTTPS to encrypt traffic between web browsers and their front-end servers, they let traffic travel unencrypted between their datacenters, so their use of HTTPS was ultimately no hindrance to the NSA's mass surveillance of the Internet.

Today the NSA can draw a new diagram, and this time, SSL will be added and removed not by a Google front-end server, but by a CloudFlare edge server. That's because, although CloudFlare has taken the incredibly generous and laudable step of providing free HTTPS by default to all of their customers, they are not requiring the connection between CloudFlare and the origin server to be encrypted (they call this "Flexible SSL"). So although many sites will now support HTTPS thanks to CloudFlare, by default traffic to these sites will be encrypted only between the user and the CloudFlare edge server, leaving plenty of opportunity for the connection to be eavesdropped beyond the edge server.

Arguably, encrypting even part of the connection path is better than the status quo, which provides no encryption at all. I disagree, because CloudFlare's Flexible SSL will lull website visitors into a false sense of security, since these partially-encrypted connections will appear to the browser as normal HTTPS connections, padlock and all. There will be no distinction made whatsoever between a connection that's protected all the way to the origin, and a connection that's protected only part of the way. Providing a false sense of security is often worse than providing no security at all, and I find the security of Flexible SSL to be quite lacking. That's because CloudFlare aims to put edge nodes as close to the visitor as possible, which minimizes latency, but also minimizes the fraction of the connection path that is actually encrypted. So although Flexible SSL will protect visitors against malicious local ISPs and attackers snooping on coffee shop Wi-Fi, it provides little protection against nation-state adversaries. This point is underscored by a map of CloudFlare's current and planned edge locations, which shows a presence in 37 different countries, including China. China has an abysmal human rights record and pervasive Internet surveillance, which is troubling because CloudFlare explicitly mentions human rights organizations as a motivation for deploying HTTPS everywhere:

Every byte, however seemingly mundane, that flows encrypted across the Internet makes it more difficult for those who wish to intercept, throttle, or censor the web. In other words, ensuring your personal blog is available over HTTPS makes it more likely that a human rights organization or social media service or independent journalist will be accessible around the world.

It's impossible for Flexible SSL to protect a website of a human rights organization from interception, throttling, or censoring when the connection to that website travels unencrypted through the Great Firewall of China. What's worse is that CloudFlare includes the visitor's original IP address in the request headers to the origin server, which of course is unencrypted when using Flexible SSL. A nation-state adversary eavesdropping on Internet traffic will therefore see not only the URL and content of a page, but also the IP address of the visitor who requested it. This is exactly the same situation as unencrypted HTTP, yet as far as the visitor can tell, the connection is using HTTPS, with a padlock icon and an https:// URL.

It is true that HTTPS has never guaranteed the security of a connection behind the host that terminates the SSL, and it's already quite common to terminate SSL in a front-end host and forward unencrypted traffic to a back-end server. However, in almost all instances of this architecture, the SSL terminator and the back-end are in the same datacenter on the same network, not in different countries on opposite sides of the world, with unencrypted connections traveling over the public Internet. Furthermore, an architecture where unencrypted traffic travels a significant distance behind an SSL terminator should be considered something to fix, not something to excuse or encourage. For example, after the Google NSA slide was released, Google accelerated their plans to encrypt all inter-datacenter traffic. In doing so, they strengthened the value of HTTPS. CloudFlare, on the other hand, is diluting the value of HTTPS, and in astonishing numbers: according to their blog post, they are doubling the number of HTTPS sites on the Internet from 2 million to 4 million. That means that after today, one in two HTTPS websites will be using encryption where most of the connection path is actually unencrypted.

Fortunately, CloudFlare has an alternative to Flexible SSL which is free and provides encryption between CloudFlare and the origin, which they "strongly recommend" site owners enable. Unfortunately, it requires manual action on the part of website operators, and getting users to follow security recommendations, even when strongly recommended, is like herding cats. The vast majority of those 2 million website operators won't do anything, especially when their sites already appear to be using HTTPS and thus benefit from the main non-security motivation for HTTPS, which is preference in Google search rankings.

This is a difficult problem. CloudFlare should be commended for tackling it and for their generosity in making their solution free. However, they've only solved part of the problem, and this is an instance where half measures are worse than no measures at all. CloudFlare should abolish Flexible SSL and make setting up non-Flexible SSL easier. In particular, they should hurry up the rollout of the "CloudFlare Origin CA," and instead of requiring users to submit a CSR to be signed by CloudFlare, they should let users download, in a single click, both a private key and a certificate to be installed on their origin servers. (Normally I'm averse to certificate authorities generating private keys for their users, but in this case, it would be a private CA used for nothing but the connection between CloudFlare and the origin, so it would be perfectly secure for CloudFlare to generate the private key.)

If CloudFlare continues to offer Flexible SSL, they should at least include an HTTP header in the response indicating that the connection was not encrypted all the way to the origin. Ideally, this would be standardized and web browsers would not display the same visual indication as proper HTTPS connections if this header is present. Even without browser standardization, the header could be interpreted by a browser extension that could be included in privacy-conscious browser packages such as the Tor Browser Bundle. This would provide the benefits of Flexible SSL without creating a false sense of security, and help fulfill CloudFlare's stated goal to build a better Internet.

Posted on 2014-09-30 at 17:47:16 UTC | Comments
