Blog Posts from 2018 - Andrew Ayer

Blog

April 13, 2018

Making Certificates Easier and Helping the Ecosystem: Four Years of SSLMate

I'm not actually sure when SSLMate was born. I got the idea, registered the domain name, and wrote the first lines of code in August 2013, but I put it on the backburner until March 2014. I think I "launched" in early April, but since I thought of SSLMate as a side project mainly for my own use, I didn't do anything special.

I do know that I sold my first certificate on April 13, 2014, four years ago to this day. I sold it to a friend who needed to replace his certificates after Heartbleed and was fed up with how hard his certificate authority was making it. It turns out a lot of people were generally fed up with how hard certificate authorities made things, and in the last four years, SSLMate has exceeded my wildest expectations and become my full-time job.

Sadly, certificate resellers have a deservedly bad reputation, which has only gotten worse in recent months thanks to the likes of Trustico and other resellers with terrible security practices. But it's not entirely a hellscape out there, and for SSLMate's birthday, I thought it would be nice to celebrate the ways SSLMate has been completely unlike a typical certificate reseller, and how SSLMate has allowed me to pursue work over the last four years that lifts up the Web PKI as a whole.

SSLMate was different from the beginning. When SSLMate launched in 2014, it was the first, and only, way you could get a publicly-trusted SSL certificate entirely from the command line. SSLMate made automated certificate issuance accessible to anyone, not just large customers of certificate authorities. More importantly, it significantly improved the usability of certificate issuance at a time when getting a certificate meant running long OpenSSL commands, copy-and-pasting PEM blobs to and from websites, and manually extracting a Zip file and assembling the correct certificate chain. The state of the art in usability was resellers generating, and possibly storing, your private key on their servers. SSLMate showed that certificate issuance could be easy without having to resort to insecure practices.

In September 2014, SSLMate stopped selling multi-year certificates. This was unusual at a time when certificates could be valid for up to five years. The change was unpopular among a handful of customers, but it was unquestionably the right thing to do. Shorter-lived certificates are better for the ecosystem, since they allow security standards to advance more quickly, but they are also better for customers. I learned from the SHA-1 deprecation that a multi-year certificate might not remain valid for its entire term, and it felt wrong to sell a product that I might not be able to deliver on. Sure enough, the five year Symantec certificates that SSLMate was reselling at the time will all become prematurely invalid as of the next Chrome release. (The affected customers all got free replacements, but it's still not what they were expecting.)

The industry is following SSLMate's lead: the CA/Browser Forum limited certificate lifetimes to three years beginning in 2015, and further limited lifetimes to two years beginning last month.

In April 2015, SSLMate released its first public REST API. (While we already had an API for use by the command-line tool, it wasn't previously documented.) As far as I know, this was the world's first fully self-service API for the automated issuance of publicly-trusted SSL certificates. Although major certificate authorities, and some resellers, had APIs (indeed, SSLMate used them under-the-hood), every one of the many APIs I looked at required you to ask a human to enable API access for your account. Some even required you to get on the phone to negotiate prices. With SSLMate's API, you could sign up yourself and immediately start issuing publicly-trusted certificates.

In June, SSLMate published a blog post explaining how to set up OCSP Stapling in Apache and nginx. Resources at the time were pretty bad, so I had to dive into the source code for Apache and nginx to learn how stapling really worked. I was rather horrified at what I saw. The worst bug was that nginx would sometimes staple an expired OCSP response, which would cause Firefox to reject the certificate. So, I submitted a patch fixing it.

In August, several folks asked me to review the ACME specification being worked on at the IETF and provide feedback based on my experience with automated certificate issuance APIs. While reading the draft, I was very bothered by the fact that RSA and ECDSA signatures were being used without any associated message. I had never heard of duplicate signature key selection attacks, but I knew that crypto wasn't being used properly, and when crypto isn't used properly, bad things tend to happen. So I dusted off my undergraduate number theory textbook and came up with an attack that broke ACME, allowing attackers to get unauthorized certificates. After my disclosure, ACME was fixed, before it was deployed in the Web PKI.

In March 2016, it occurred to me that signing OCSP responses with a weak hash function such as SHA-1 could probably lead to the forgery of a trusted certificate. It was already known that signing a certificate with a weak hash function could lead to the forgery of another certificate using a chosen-prefix collision attack, so CAs were forbidden from signing certificates using SHA-1. However, no one had demonstrated a collision attack against OCSP responses, and CAs were allowed to sign OCSP responses with SHA-1.

I figured out how to execute a chosen-prefix attack against OCSP responses, rented a GPU instance in EC2 to make a proof-of-concept with MD5, and scanned every OCSP responder I could find to see which ones could be used to forge a certificate with a SHA-1 collision attack. I reported my findings to the mozilla.dev.security.policy mailing list. This led to a change in Mozilla's Root Store Policy to forbid CAs from signing OCSP responses with SHA-1 except under safe conditions.

Since then, I've periodically scanned OCSP responders to ensure they remain in compliance.

In July 2016, SSLMate launched Cert Spotter, a Certificate Transparency monitor. The core of Cert Spotter is open source, because I wanted non-profits to be able to easily use Certificate Transparency without depending on a commercial service. I'm proud to say that the Wikimedia Foundation uses the open source Cert Spotter to watch for unauthorized certificates for wikipedia.org and their other domains.

Certificate Transparency was designed to be verifiable, but this only matters if a diverse set of people bother to actually do the verification. Cert Spotter has always verified log behavior, and it has detected log misbehavior that was missed by other monitors.

In March 2017, SSLMate started operating the world's second Certificate Transparency gossip endpoint (Graham Edgecombe gets credit for the first) to provide further resiliency to the Certificate Transparency ecosystem. SSLMate also released ct-honeybee, a lightweight program that queries each Certificate Transparency log for its current state and uploads it to Graham's and SSLMate's gossip endpoints. People are now running ct-honeybee on devices all around the world, helping ensure that logs do not present different views to different parts of the Internet.

In 2017, I attended both Certificate Transparency Policy Days hosted by Google to help hash out policy for the burgeoning Certificate Transparency ecosystem.

In September, to help with the upcoming CAA enforcement deadline, I released a free CAA Test Suite for CAs to use to test their implementations.

What's next for SSLMate? The biggest change over the last four years is that the price of certificates as individual goods has gone to zero. But SSLMate has never really been about selling certificates, but about selling easy-to-use software, good support, and a service for managing certificates. And I still see a lot of work to be done to make certificates even easier to work with, particularly with all the new ways certificates are going to be used in the future. I'm pleased to be kicking off SSLMate's fifth year with the release of SSLMate for SaaS, a new service that provides an easy, high-level way for SaaS companies to get certificates for the customer domains they host. This is the first of many exciting announcements in store for this year.

Comments

March 29, 2018

These Three Companies Are Doing the Internet a Solid By Running Certificate Transparency Logs

When we use the Internet, we rely on the security of the certificate authority system to ensure we are talking with the right people. Unfortunately, the certificate authority system is a bit of a mess. One of the ways we're trying to clean up the mess is Certificate Transparency, an effort to put all SSL certificates issued by public certificate authorities in public, verifiable, append-only logs. Domain owners can monitor the logs for unauthorized certificates, and web browsers can monitor for compliance with the rules and take action against non-compliant certificate authorities. After ramping up for the last four years, Certificate Transparency is about to enter prime time: Google Chrome is requiring that all certificates issued on or after April 30, 2018 be logged.

But who is supposed to run these Certificate Transparency logs? Servers, electricity, bandwidth, and system administrators cost money. Although Google is spearheading Certificate Transparency and operates nine logs that are recognized by Chrome, Certificate Transparency is supposed to benefit everyone and it would be unhealthy for the Internet if Google ran all the logs. For this reason, Chrome requires that certificates be included in at least one log operated by an organization besides Google.

So far, three organizations have stepped up and are operating Certificate Transparency logs that are recognized by Chrome and are open to certificates from any public certificate authority:

DigiCert was the first non-Google organization to set up a log, and they now operate several logs recognized by Chrome. Their DigiCert 2 log accepts certificates from all public certificate authorities. They are also applying for recognition of their Nessie and Yeti log sets, which accept certificates from all public certificate authorities and are each split into five shards based on the expiration year of the certificate. (They also operate DigiCert 1, which only accepts certificates from some certificate authorities, and have three logs acquired from Symantec which they are shutting down later this year.)

DigiCert is notable because they've written their own Certificate Transparency log implementation instead of using an open source one. This is helpful because it adds diversity to the ecosystem, which ensures that a bug in one implementation won't take out all logs.

Comodo Certification Authority (which is thankfully no longer owned by the blowhard who thinks he invented 90 day certificates) operates two logs recognized by Chrome: Mammoth and Sabre. Both logs accept certificates from all public certificate authorities, and run SuperDuper, which is Google's original open source log implementation.

In addition to operating two open logs, Comodo CA runs crt.sh, a search engine for certificates found in Certificate Transparency logs. crt.sh has been an invaluable resource for the community when investigating misbehavior by certificate authorities.

Cloudflare is the latest log operator to join the ecosystem. They operate the Nimbus log set, which accepts certificates from all public certificate authorities and is split into four shards based on the expiration year of the certificate. Nimbus runs Trillian, Google's latest open source implementation, with some Cloudflare-specific patches.

Cloudflare is unique because unlike DigiCert and Comodo CA, they are not a certificate authority. DigiCert and Comodo have an obvious motivation to run logs: they need somewhere to log their certificates so they will be trusted by Chrome. Cloudflare doesn't have such a need, but they've chosen to run logs anyways.

DigiCert, Comodo CA, and Cloudflare should be lauded for running open Certificate Transparency logs. None of them have to do this. Even DigiCert and Comodo could have adopted the strategy of their competitors and waited for someone else to run a log that would accept their certificates. Their willingness to run logs shows that they are invested in improving the Internet for everyone's benefit.

We need more companies to step up and join these three in running public Certificate Transparency logs. How about some major tech companies? Although we all benefit from the success of Certificate Transparency, large tech companies benefit even more: they are bigger targets than the rest of us, and they have more to gain when the public feels secure conducting business online. Major tech companies are also uniquely positioned to help, since they already run large-scale Internet infrastructure which could be used to host Certificate Transparency logs. And what kind of tech company doesn't want the cred that comes from helping the Internet out?

If you're a big tech company that knows how to run large-scale infrastructure, why aren't you running a Certificate Transparency log too?

Comments

January 21, 2018

Google's Certificate Revocation Server Is Down - What Does It Mean?

Earlier today, someone reported to the mozilla.dev.security.policy mailing list that they were unable to access any Google websites over HTTPS because Google's OCSP responder was down. David E. Ross says the problem started two days ago, and several Tweets confirm this. Google has since acknowledged the report. As of publication time, the responder is still down for me, though Ross reports it's back up. (Update: a fix is being rolled out.)

What's an OCSP Responder?

OCSP, which stands for Online Certificate Status Protocol, is the system used by SSL/TLS clients (such as web browsers) to determine if an SSL/TLS certificate is revoked or not. When an OCSP-using TLS client connects to a TLS server such as https://www.google.com, it sends a query to the OCSP responder URL listed in the TLS server's certificate to see if the certificate is revoked. If the OCSP responder replies that it is, the TLS client aborts the connection. OCSP responders are operated by the certificate authority which issued the certificate. Google has its own publicly-trusted certificate authority (Google Internet Authority G2) which issues certificates for Google websites.

I thought Chrome didn't support OCSP?

You're correct. Chrome famously does not use OCSP. Chrome users connecting to Google websites are blissfully unaware that Google's OCSP responder is down.

But other TLS clients do use OCSP, and as a publicly-trusted certificate authority, Google is required by the Baseline Requirements to operate an OCSP responder. However, certificate authorities have historically done a bad job operating reliable OCSP responders, and firewalls often get in the way of OCSP queries. Consequentially, although web browsers like Edge, Safari, and Firefox do contact OCSP responders, they use "soft fail" and allow a connection if they don't get a well-formed response from the responder. Otherwise, they'd constantly reject connections that they shouldn't. (This renders OCSP almost entirely pointless from a security perspective, since an attacker with a revoked certificate can usually just block the OCSP response and the browser will accept the revoked certificate.) Therefore, the practical impact from Google's OCSP responder outage is probably very small. Nearly all clients are going to completely ignore the fact that Google's OCSP responder is down.

That said, Google operates some of the most heavily trafficked sites on the Internet. A wide variety of devices connect to Google servers (Google's FAQ for Certificate Changes mention set-top boxes, gaming consoles, printers, and even cameras). Inevitably, at least some of these devices are going to use "hard fail" and reject connections when an OCSP responder is down. There are also people, like the mozilla.dev.security.policy poster, who configure their web browser to use hard fail. Without a doubt, there are people who are noticing problems right now.

I thought Google had Site Reliability Engineers?

Indeed they do, which is why this incident is noteworthy. As I mentioned, certificate authorities tend to do a poor job operating OCSP responders. But most certificate authorities run off-the-shelf software and employ no software engineers. Some regional European certificate authorities even complain when you report security incidents to them during their months-long summer vacations. So no one is surprised when those certificate authorities have OCSP responder outages. Google, on the other hand, sets higher expectations.

Comments

January 10, 2018

How will Certificate Transparency Logs be Audited in Practice?

Certificate Transparency, the effort to detect misissued SSL certificates by publishing all certificates in public logs, only works if TLS clients reject certificates that are not logged. Otherwise, certificate authorities could just not log the certificates that they misissue. TLS clients accomplish this by requiring that a certificate be accompanied by a "signed certificate timestamp" (SCT), which is a promise by a log to include the certificate within 24 hours of the SCT's issuance timestamp. (This period is called the Maximum Merge Delay. It doesn't have to be 24 hours, but it is for all logs currently trusted by Chrome.) The SCT can be embedded in the certificate, the OCSP response (which the TLS server must staple), or the TLS handshake.

But an SCT is only a promise. What if the log breaks its promise and never includes the certificate? After all, certificate authorities promise not to issue bad certificates, but they do so in droves. We don't want to replace the problem of untrustworthy certificate authorities with the problem of untrustworthy Certificate Transparency logs. Fortunately, Certificate Transparency was designed to be verifiable, making it possible, in theory, for the TLS client to verify that the log fulfills its promise. Unfortunately, SCT verification seems to be a rather elusive problem in practice. In this post, I'm going to explore some of the possible solutions, and explain their shortcomings.

Direct Proof Fetching

A Certificate Transparency log stores its certificates in the leaves of an append-only Merkle Tree, which means the log can furnish efficient-to-verify cryptographic proofs that a certificate is included in the tree, and that one tree is an appendage of another. The log provides several HTTP endpoints for acquiring these proofs, which a TLS client can use to audit an SCT as follows:

Use the get-sth endpoint to retrieve the log's latest signed tree head (STH). Store the returned tree head.
Use the get-sth-consistency endpoint to retrieve a consistency proof between the tree represented by the previously-stored tree head, and the tree represented by the latest tree head. Verify the proof. If the proof is invalid, it means the new tree is not an appendage of the old tree, so the log has violated its append-only property.
Use the get-proof-by-hash endpoint to retrieve an inclusion proof for the SCT based on the latest tree head. Verify the proof. If the proof is invalid, it means the log has not included the corresponding certificate in the log.

Note that steps 1 and 2 only need to be done periodically for each log, rather than once for every certificate validation.

There are several problems with direct proof fetching:

Logs can't handle the load of every web browser everywhere contacting them for every TLS connection ever made.
When the client asks for an inclusion proof in step 3, it has to reveal to the log which SCT it wants the proof for. Since each SCT corresponds to a certificate, and certificates contain domain names, the log learns every domain the client is visiting, which is a violation of the user's privacy.
Since the log has up to 24 hours to include a certificate, the certificate might not be included at the time the client sees an SCT. Furthermore, even if 24 hours have elapsed, it would slow down the TLS client to retrieve a proof during the TLS handshake. Therefore, the client has to store the SCT and certificate and fetch the inclusion proof at some future time.

This exposes the client to denial-of-service attacks, wherein a malicious website could try to exhaust the client's storage by spamming it with certificates and SCTs faster than the client can audit them (imagine a website which uses JavaScript to make a lot of AJAX requests in the background). To avoid denial-of-service, the client has to limit the size of its unverified SCT store. Once the limit is reached, the client has to stop adding to the store, or evict old entries. Either way, it exposes the client to flushing attacks, wherein an attacker with a misissued certificate and bogus SCT spams the client with good certificates and SCTs so the bad SCT never gets verified.

Another question is what to do if the verification fails. It's too late to abort the TLS handshake and present a certificate error. Most users would be terribly confused if their browser presented an error message about a misbehaving log, so there needs to be a way for the client to automatically report the SCT to some authority (e.g. the browser vendor) so they know that the log has misbehaved and needs to be distrusted.

Proxied Auditing

To solve the scalability problem, clients can talk with servers operated by the client software supplier, rather than the logs themselves. (The client software supplier has to be comfortable operating scalable infrastructure; Google is, at least.)

For example, Chrome doesn't retrieve STHs and consistency proofs directly from logs. Instead, Google-operated servers perform steps 1 and 2 and distribute batches of STHs (called STHSets) to clients using Chrome's update system. Users have to trust the Google servers to faithfully audit the STHs, but since Chrome users already have to trust Google to provide secure software, there is arguably no loss in security with this approach.

Unfortunately, proxied auditing doesn't fully solve the privacy problem. Instead of leaking every domain the user visits to the log operator, requesting an inclusion proof leaks visited domains to the client software supplier.

DNS Proof Fetching

Chrome's solution to the privacy problem is to fetch inclusion proofs over DNS. Instead of making an HTTP request to the get-proof-by-hash endpoint, Chrome will make a series of DNS requests for sub-domains of ct.googleapis.com which contain the parameters of the inclusion proof request. For example:

D4S6DSV2J743QJZEQMH4UYHEYK7KRQ5JIQOCPMFUHZVJNFGHXACA.hash.pilot.ct.googleapis.com

The name servers for ct.googleapis.com respond with the inclusion proof inside TXT records.

Chrome believes DNS proof fetching is better for privacy because instead of their servers learning the Chrome user's own IP address, they will learn the IP address of the user's DNS resolver. If the user is using an ISP's DNS resolver, its IP address will be shared among all the ISP's users. Although DNS is unencrypted and an eavesdropper along the network path could learn what inclusion proofs the user is fetching, they would already know what domains the user is visiting thanks to the DNS queries used to resolve the domain in the first place.

However, there are some caveats: if a user doesn't share a DNS resolver with many other users, it may be possible to deanonymize them based on the timing of the DNS queries. Also, if a user visits a domain handled by an internal DNS server (e.g. www.intranet.example.com), that DNS query won't leak onto the open Internet, but the DNS query for the inclusion proof will, causing a DNS privacy leak that didn't exist previously. A privacy analysis is available if you want to learn more.

Chrome's DNS proof fetching is still under development and hasn't shipped yet.

Embedded Proofs

The second version of Certificate Transparency (which is not yet standardized or deployed) allows inclusion proofs to be presented to clients alongside SCTs (embedded in the certificate, OCSP response, or TLS handshake) saving the client the need to fetch inclusion proofs itself. That solves the problems above, but creates new ones.

First, inclusion proofs can't be obtained until the certificate is included in the log, which currently takes up to 24 hours. If clients were to require inclusion proofs along with SCTs, a newly issued certificate wouldn't be usable for up to 24 hours. That's OK for renewals (as long as you don't wait until the last minute to renew), but people are used to being able to get new certificates right away.

Second, clients can no longer choose the STH on which an inclusion proof is based. With direct or DNS proof fetching, the client always requests an inclusion proof to the latest STH, and the client can easily verify that the latest STH is consistent with the previous STH. When the client gets an embedded inclusion proof, it doesn't immediately know if the STH on which it is based is consistent with other STHs that the client has observed. The client has to audit the STH.

Unfortunately, the more frequently a log produces STHs that incorporate recently-submitted certificates, the harder it is to audit them. Consider the extreme case of a log creating a new STH immediately after every certificate submission. Although this would let an inclusion proof be obtained immediately for an SCT, auditing the new STH has all the same problems as SCT auditing that were discussed above. It's bad for privacy, since a log can assume that a client auditing a particular STH visited the domain whose certificate was submitted right before the STH was produced. And if the client wants to avoid making auditing requests during the TLS handshake, it has to store the STH for later auditing, exposing it to denial-of-service and flushing attacks.

STH Frequency and Freshness

To address the privacy problems of excessive STH production, CTv2 introduces a new log attribute called the STH Frequency Count, defined as the maximum number of STHs a log may produce in any period equal to the Maximum Merge Delay. The CT gossip draft defines a fresh STH to be one that was produced less than 14 days in the past. Clients could require that embedded inclusion proofs be based on a fresh STH. Then, with an STH Frequency Count that permits one STH an hour, there are only 336 fresh STHs at any given time for any given log - few enough that auditing them is practical and private. Auditing one of 336 STHs doesn't leak any information, and since the number of STHs is bounded, there is no risk of denial-of-service or flushing attacks.

It would also be possible for the client software supplier to operate a service that continuously fetches fresh STHs from logs, audits them for consistency, and distributes them to clients, saving clients the need to audit STHs themselves. (Just like Chrome currently does with the latest STHs.)

Unfortunately, this is not a perfect solution.

First, there would still be a delay before an inclusion proof can be obtained and a newly-issued certificate used. Many server operators and certificate authorities aren't going to like that.

Second, server operators would need to refresh the embedded inclusion proof every 14 days so it is always based on a fresh STH. That rules out embedding the inclusion proof in the certificate, unless the certificate is valid for less than 14 days. The server operator could use OCSP stapling, with the certificate authority responsible for embedding a new inclusion proof every 14 days in the OCSP response. Or the server operator's software could automatically obtain a new inclusion proof every 14 days and embed it in the TLS handshake. Unfortunately, there are no implementations of the latter, and there are very few robust implementations of OCSP stapling. Even if implementations existed, there would be a very long tail of old servers that were never upgraded.

One possibility is to make DNS proof fetching the default, but allow server operators to opt-in to embedded proofs, much like server operators can use HSTS to opt-in to enforced HTTPS. Server operators who run up-to-date software with reliable OCSP Stapling and who don't mind a delay in certificate issuance would be able to provide better security and privacy to their visitors. Maybe at some point in the distant future, embedded proofs could become required for everyone.

All of this is a ways off. CTv2 is still not standardized. Chrome still doesn't do any SCT auditing, and consequentially its CT policy requires at least one SCT to be from a Google-operated log, since Google obviously trusts its own logs not to break its promises. Fortunately, even without widespread log auditing, Certificate Transparency has been a huge success, cleaning up the certificate authority ecosystem and making everyone more secure. Nevertheless, I think it would be a shame if Certificate Transparency's auditability were never fully realized, and I hope we'll be able to find a way to make it work.

Comments

Andrew Ayer

Sections

Blog

Making Certificates Easier and Helping the Ecosystem: Four Years of SSLMate

These Three Companies Are Doing the Internet a Solid By Running Certificate Transparency Logs

Google's Certificate Revocation Server Is Down - What Does It Mean?

What's an OCSP Responder?

I thought Chrome didn't support OCSP?

I thought Google had Site Reliability Engineers?

How will Certificate Transparency Logs be Audited in Practice?

Direct Proof Fetching

Proxied Auditing

DNS Proof Fetching

Embedded Proofs

STH Frequency and Freshness