
June 25, 2020

Writing an SNI Proxy in 115 Lines of Go

The very first message sent in a TLS connection is the Client Hello record, in which the client greets the server and tells it, among other things, the server name it wants to connect to. This is called Server Name Indication, or SNI for short, and it's quite handy as it allows many different servers to be co-located on a single IP address.

The server name is sent in plaintext, which is unfortunately really bad for privacy and censorship resistance, but does enable something very useful: a proxy server can read the server name and use it to decide where to route the connection, without having to decrypt the connection. You can leverage this to make many different physical servers accessible from the Internet even if you have only one public IPv4 address: the proxy listens on your public IP address and forwards connections to the appropriate private IP address based on the SNI.

I just finished writing such a proxy server, which I plan to run on my home network's router so that I can easily access my internal servers from anywhere on the Internet, without a VPN or SSH port forwarding. I was pleased by how easy it was to write this proxy server using only Go's standard library. It's a great example of how well-suited Go is for programs involving networking and cryptography.

Let's start with a standard listen/accept loop (right out of the examples for Go's net package):

func main() {
	l, err := net.Listen("tcp", ":443")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := l.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go handleConnection(conn)
	}
}

Here's a sketch of the handleConnection function, which reads the Client Hello record from the client, dials the backend server indicated by the Client Hello, and then proxies the client to and from the backend. (Note that we dial the backend using the SNI value, which works well with split-horizon DNS where the proxy sees the backend's private IP address and external clients see the proxy's public IP address. If that doesn't work for you, you can use more complicated routing logic.)

func handleConnection(clientConn net.Conn) {
	defer clientConn.Close()

	// ... read Client Hello from clientConn ...

	backendConn, err := net.Dial("tcp", net.JoinHostPort(clientHello.ServerName, "443"))
	if err != nil {
		log.Print(err)
		return
	}
	defer backendConn.Close()

	// ... proxy clientConn <==> backendConn ...
}

Let's assume for now we have a convenient function to read a Client Hello record from an io.Reader and return a tls.ClientHelloInfo:

func readClientHello(reader io.Reader) (*tls.ClientHelloInfo, error)

We can't simply call this function from handleConnection, because once the Client Hello is read, the bytes are gone. We need to preserve the bytes and forward them along to the backend, which is expecting a proper TLS connection that starts with a Client Hello record.

What we need to do instead is "peek" at the Client Hello record, and thanks to some simple but powerful abstractions from Go's io package, this can be done with just six lines of code:

func peekClientHello(reader io.Reader) (*tls.ClientHelloInfo, io.Reader, error) {
        peekedBytes := new(bytes.Buffer)
        hello, err := readClientHello(io.TeeReader(reader, peekedBytes))
        if err != nil {
                return nil, nil, err
        }
        return hello, io.MultiReader(peekedBytes, reader), nil
}

What this code does is create a TeeReader, which is a reader that wraps another reader and writes everything that is read to a writer, which in our case is a byte buffer. We pass the TeeReader to readClientHello, so every byte read by readClientHello gets saved to our buffer. Finally, we create a MultiReader which essentially concatenates our buffer with the original reader. Reads from the MultiReader initially come out of the buffer, and when that's exhausted, continue from the original reader. We return the MultiReader to the caller along with the ClientHelloInfo. When the caller reads from the MultiReader it will see a full TLS connection stream, starting with the Client Hello.
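
If it's not obvious how these pieces fit together, here's a tiny, self-contained program (unrelated to the proxy, with made-up input) that demonstrates the same peek-and-replay trick on an ordinary string:

package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"strings"
)

func main() {
	original := strings.NewReader("hello, world")
	peeked := new(bytes.Buffer)

	// "Peek" at the first five bytes; the TeeReader saves everything
	// we read into the peeked buffer.
	head := make([]byte, 5)
	if _, err := io.ReadFull(io.TeeReader(original, peeked), head); err != nil {
		panic(err)
	}
	fmt.Printf("peeked: %q\n", head) // peeked: "hello"

	// Reassemble the stream: the MultiReader replays the buffer first,
	// then continues with the unread remainder of original.
	rest, err := ioutil.ReadAll(io.MultiReader(peeked, original))
	if err != nil {
		panic(err)
	}
	fmt.Printf("replayed: %q\n", rest) // replayed: "hello, world"
}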

Now we just need to implement readClientHello. We could open up the TLS RFCs and learn how to parse a Client Hello record, but it turns out we can let crypto/tls do the work for us, thanks to a callback function in tls.Config called GetConfigForClient:

// GetConfigForClient, if not nil, is called after a ClientHello is
// received from a client.
GetConfigForClient func(*ClientHelloInfo) (*Config, error) // Go 1.8

Roughly, what we need to do is create a TLS server-side connection with a GetConfigForClient callback that saves the ClientHelloInfo passed to it. However, creating a TLS connection requires a full-blown net.Conn, and readClientHello is passed merely an io.Reader. So let's create a type, readOnlyConn, which wraps an io.Reader and satisfies the net.Conn interface:

type readOnlyConn struct {
        reader io.Reader
}
func (conn readOnlyConn) Read(p []byte) (int, error)         { return conn.reader.Read(p) }
func (conn readOnlyConn) Write(p []byte) (int, error)        { return 0, io.ErrClosedPipe }
func (conn readOnlyConn) Close() error                       { return nil }
func (conn readOnlyConn) LocalAddr() net.Addr                { return nil }
func (conn readOnlyConn) RemoteAddr() net.Addr               { return nil }
func (conn readOnlyConn) SetDeadline(t time.Time) error      { return nil }
func (conn readOnlyConn) SetReadDeadline(t time.Time) error  { return nil }
func (conn readOnlyConn) SetWriteDeadline(t time.Time) error { return nil }

readOnlyConn forwards reads to the reader and simulates a broken pipe when written to (as if the client closed the connection before the server could reply). All other operations are no-ops.

Now we're ready to write readClientHello:

func readClientHello(reader io.Reader) (*tls.ClientHelloInfo, error) {
        var hello *tls.ClientHelloInfo

        err := tls.Server(readOnlyConn{reader: reader}, &tls.Config{
                GetConfigForClient: func(argHello *tls.ClientHelloInfo) (*tls.Config, error) {
                        hello = new(tls.ClientHelloInfo)
                        *hello = *argHello
                        return nil, nil
                },
        }).Handshake()

        if hello == nil {
                return nil, err
        }

        return hello, nil
}

Note that Handshake always fails because the readOnlyConn is not a real connection. As long as the Client Hello is successfully read, the failure should only happen after GetConfigForClient is called, so we only care about the error if hello was never set.

Let's put everything together to write the full handleConnection function. I've added deadlines (thanks, Filippo!) and a check that the SNI value ends with .internal.example.com to prevent this from being used as an open proxy. When I deploy this, I will use the DNS suffix of my home network.

func handleConnection(clientConn net.Conn) {
	defer clientConn.Close()

	if err := clientConn.SetReadDeadline(time.Now().Add(5 * time.Second)); err != nil {
		log.Print(err)
		return
	}

	clientHello, clientReader, err := peekClientHello(clientConn)
	if err != nil {
		log.Print(err)
		return
	}

	if err := clientConn.SetReadDeadline(time.Time{}); err != nil {
		log.Print(err)
		return
	}

	if !strings.HasSuffix(clientHello.ServerName, ".internal.example.com") {
		log.Print("Blocking connection to unauthorized backend")
		return
	}

	backendConn, err := net.DialTimeout("tcp", net.JoinHostPort(clientHello.ServerName, "443"), 5*time.Second)
	if err != nil {
		log.Print(err)
		return
	}
	defer backendConn.Close()

	var wg sync.WaitGroup
	wg.Add(2)

	go func() {
		io.Copy(clientConn, backendConn)
		clientConn.(*net.TCPConn).CloseWrite()
		wg.Done()
	}()
	go func() {
		io.Copy(backendConn, clientReader)
		backendConn.(*net.TCPConn).CloseWrite()
		wg.Done()
	}()

	wg.Wait()
}

Here's the complete Go source code - just 115 lines! (Not counting copyright legalese)


June 18, 2020

Security Review of CFSSL Signer Code

Certificate signing is the most security-sensitive task performed by a certificate authority. The CA has to sign values, like DNS names, that are provided by untrusted sources. The CA must rigorously validate these values before signing them. If an attacker can bypass validation and get untrusted data included in a certificate, the results can be dire. For example, if an attacker can trick a CA into including an arbitrary SAN extension, they can get a certificate for domains they don't control.

Unfortunately, there is a history of CAs including unvalidated information in certificates. A common cause is CAs copying information directly from CSRs instead of from more constrained information sources. Since CSRs can contain both subject identity information and arbitrary certificate extensions, directly ingesting CSRs is extremely error-prone for CAs. For this reason, CAs would be well-advised to extract the public key from the CSR very early in the certificate enrollment process and discard everything else. In a perfect world, CAs would accept standalone public keys from subscribers instead of CSRs. (Before you say "proof-of-possession", see the appendix.)

I decided to review the signing code in CFSSL, the open source PKI toolkit used by Let's Encrypt's Boulder, to see how it stacks up against this advice. Unfortunately, I found that CFSSL copies subject identity information from CSRs by default, has features that are hard to use safely, and uses complicated logic that obfuscates what is included in certificate fields. I recommend that publicly-trusted CAs not use CFSSL.

Scope of review

I reviewed the CFSSL Git repository as of commit 6b49beae21ff90a09aea3901741ef02b1057ee65 (the HEAD of master at the time of my review). I reviewed the code in the signer and signer/local packages.

The signing operation

In CFSSL, you sign a certificate by invoking the Sign function on the signer.Signer interface, which has this signature:

Sign(req SignRequest) (cert []byte, err error)

There is only one actual implementation of signer.Signer: local.Signer. (The other implementations, remote.Signer and universal.Signer, are ultimately wrappers around local.Signer.)

Inputs to the signing operation

At a high level, inputs to the signing operation come from three to four places:

  • The Signer object, which contains:

    • The private key and certificate of the CA
    • A list of named profiles plus a default profile
    • A signature algorithm

    I will refer to this object as Signer.

  • The SignRequest argument, whose relevant fields are:

    Hosts       []string
    Request     string // The CSR
    Subject     *Subject
    Profile     string
    CRLOverride string
    Serial      *big.Int
    Extensions  []Extension
    NotBefore   time.Time
    NotAfter    time.Time

    I will refer to this object as SignRequest.

  • The Signer's default certificate profile, represented by an instance of the SigningProfile struct. I will refer to the default profile as defaultProfile.

  • The effective certificate profile, represented by an instance of the SigningProfile struct. I will refer to the effective profile as profile. If the profile named by SignRequest.Profile exists in Signer, then profile is that profile. If it doesn't exist, then profile equals defaultProfile.

The Sign function takes values from these places and combines them to produce the input to x509.CreateCertificate in Go's standard library. There is overlap - for instance SANs can be specified in the CSR, SignRequest.Hosts, or SignRequest.Extensions. How does Sign decide which source to use when constructing the certificate?

Certificate construction logic

To understand how Sign works, I looked at each certificate field and worked backwards to figure out how Sign decides to populate the field. Below are my findings.

Serial Number
  • If profile.ClientProvidesSerialNumbers is true: use SignRequest.Serial (error if not set).
  • Else: generate a random 20-byte serial number.
Not Before
  • If SignRequest.NotBefore is non-zero: use it.
  • Else if profile.NotBefore is non-zero: use it.
  • Else if profile.Backdate is non-zero: use current time - profile.Backdate.
  • Else: use current time - 5 minutes.
Not After
  • If SignRequest.NotAfter is non-zero: use it.
  • Else if profile.NotAfter is non-zero: use it.
  • Else if profile.Expiry is non-zero: use not before + profile.Expiry.
  • Else: use not before + defaultProfile.Expiry.
Signature Algorithm
  • If profile.CSRWhitelist is nil or profile.CSRWhitelist.SignatureAlgorithm is true: Use Signer's signature algorithm.
  • Else: CFSSL leaves the signature algorithm unspecified and the Go standard library picks a sensible algorithm.

Comments: it's weird how something named CSRWhitelist is used to decide whether to use a value that comes not from the CSR, but from Signer. This is probably because CFSSL's ParseCertificateRequest function gets this field from Signer rather than from the CSR that it is parsing. This sort of indirection and misleading naming makes the code hard to understand.

Public Key
  • If profile.CSRWhitelist is nil or profile.CSRWhitelist.PublicKey is true: Use the CSR's public key.
  • Else: the certificate won't have a public key (this probably causes x509.CreateCertificate to return an error).

Comments: it's unclear why you'd ever want profile.CSRWhitelist.PublicKey to be false. The public key is literally the only piece of information that should be taken from the CSR.

SANs

This one's a doozy...

  • If profile.CopyExtensions is true and profile.CSRWhitelist is nil and the CSR contains a SAN extension and SignRequest.Extensions contains a SAN extension, and the SAN OID is present in profile.ExtensionWhitelist: add two SAN extensions to the certificate, one from the CSR and one from SignRequest.Extensions. Note that SignRequest.Hosts is ignored and profile.NameWhitelist is bypassed.
  • Else if profile.CopyExtensions is true and profile.CSRWhitelist is nil and the CSR contains a SAN extension: use the SAN extension verbatim from the CSR. Note that SignRequest.Hosts is ignored and profile.NameWhitelist is bypassed.
  • Else if SignRequest.Extensions contains a SAN extension, and the SAN OID is present in profile.ExtensionWhitelist: use the SAN extension verbatim from SignRequest.Extensions. Note that SignRequest.Hosts is ignored and profile.NameWhitelist is bypassed.
  • Else if profile.CAConstraint.IsCA is true: the certificate will not contain a SAN extension.
  • Else if SignRequest.Hosts is non-nil:
    1. Use each string in SignRequest.Hosts as follows:
      • If string parses as an IP address: make it an IP Address SAN.
      • Else if string parses as an email address: make it an email address SAN.
      • Else if string parses as a URI, make it a URI SAN.
      • Else: make it a DNS SAN.
    2. If profile.NameWhitelist is non-nil: return an error unless the string representation of every DNS, email, and URI SAN matches the profile.NameWhitelist regex (IP address SANs are not checked).
  • Else if profile.CSRWhitelist is nil and the CSR contains a SAN extension:
    1. Copy the DNS names, IP addresses, email addresses, and URIs from the CSR's SAN extension.
    2. If profile.NameWhitelist is non-nil: enforce whitelist as described above.
  • Else if profile.CSRWhitelist is non-nil and the CSR contains a SAN extension:
    1. If profile.CSRWhitelist.DNSNames is true: use DNS names from the CSR's SAN extension.
    2. If profile.CSRWhitelist.IPAddresses is true: use IP addresses from the CSR's SAN extension.
    3. If profile.CSRWhitelist.EmailAddresses is true: use email addresses from the CSR's SAN extension.
    4. If profile.CSRWhitelist.URIs is true: use URIs from the CSR's SAN extension.
    5. If profile.NameWhitelist is non-nil: enforce whitelist as described above.
Subject

For each supported subject attribute (common name, country, province, locality, organization, organizational unit, serial number):

  • If the attribute was specified in SignRequest.Subject: use it.
  • Else if profile.CSRWhitelist is nil or profile.CSRWhitelist.Subject is true: use the attribute from the CSR's subject, if present.

Common name only: if profile.NameWhitelist is non-nil: return an error unless the common name matches the profile.NameWhitelist regex.

Note: SignRequest.Hosts does not override the common name.

Basic Constraints
  • If SignRequest.Extensions contains a basic constraints extension, and the basic constraints OID is present in profile.ExtensionWhitelist: copy the basic constraints extension verbatim from SignRequest.Extensions.
  • Else: use the values from profile.CAConstraint.

Comments: given how security-sensitive this extension is, it's a relief that there's no way for the value to come from the CSR. Despite this, there is code earlier in the signing process that looks at the CSR's Basic Constraints extension. First it's extracted from the CSR in ParseCertificateRequest and then it's validated in Sign. This code ultimately has no effect, but it makes the logic harder to follow (and gave me a mild heart attack when I saw it).

Extensions besides SAN and Basic Constraints

For a given extension FOO:

  • If profile.CopyExtensions is true and profile.CSRWhitelist is nil and the CSR contains a FOO extension and SignRequest.Extensions contains a FOO extension, and FOO is present in profile.ExtensionWhitelist: add two FOO extensions to the certificate, one from the CSR and one from SignRequest.Extensions. Note that fields in SignRequest (like CRLOverride) or profile (like OCSP, CRL, etc.) that would normally control the FOO extension are ignored.
  • Else if profile.CopyExtensions is true and profile.CSRWhitelist is nil and the CSR contains a FOO extension: copy it verbatim from the CSR. Note that fields in SignRequest (like CRLOverride) or profile (like OCSP, CRL, etc.) that would normally control the FOO extension are ignored.
  • Else if SignRequest.Extensions contains a FOO extension, and FOO is present in profile.ExtensionWhitelist: copy it verbatim to the certificate. Note that fields in SignRequest (like CRLOverride) or profile (like OCSP, CRL, etc.) that would normally control the FOO extension are ignored.
  • Else: use fields from SignRequest (like CRLOverride) and profile (like OCSP, CRL, etc.) to decide what value the extension should have, if any.

Other comments

By default, CSRWhitelist is nil. This is a bad default, as it means SANs will be copied from the CSR unless SignRequest.Hosts is set. Likewise, any subject attribute not specified in SignRequest.Subject will be copied from the CSR. This is practically impossible to use safely: to avoid including unvalidated subject information you have to specify a value for every attribute in SignRequest.Subject - and if you don't want the attribute included in the final certificate you're out of luck. If CFSSL ever adds support for a new attribute type, you had better update your code to specify a value for the attribute or unvalidated information might slip through. This is exactly the sort of logic that makes it so easy to accidentally issue certificates with "Some-State" in the subject.

If the profile specified by SignRequest.Profile doesn't exist, the default profile is used. This could lead to an unexpected certificate profile being used if a CA deletes a profile from their configuration but there are still references to it elsewhere. Considering the trouble that CAs have with profile management (see the infamous TURKTRUST incident or the CA that discovered they had a whopping 85 buggy profiles), I think it would be much safer if a non-existent profile resulted in an error.

SignRequest.Hosts is untyped - everything is a string and there is no distinction between IP addresses, email addresses, URIs, and DNS names. (Also, URIs and email addresses aren't "hosts".) CFSSL decides what type of SAN to include based on what the string in Hosts successfully parses as, and assumes it's a DNS name if it doesn't parse as anything else. This could lead to unexpected SAN types in the certificate. Determining if a string was intended to be a URI by trying to parse it is an especially bad idea considering how hellish URIs are to parse, and how much variation there is between different URI parsing implementations. If the user of CFSSL adds a string which they believe to be a valid URI to SignRequest.Hosts, but Go's URI parser rejects it, the URI will end up in a DNS SAN instead.

Variable names are inconsistent and often unhelpful. In Sign, req is used for values from SignRequest and safeTemplate is used for values from the CSR. But in PopulateSubjectFromCSR (which is called by Sign), req is used for values from the CSR, and s is used for values from the SignRequest. This increases the likelihood of accidentally using data from the wrong source.

ParseCertificateRequest blindly and unconditionally copies the extensions from the CSR to the Extensions field of the x509.Certificate template - even if profile.CopyExtensions is false. Fortunately, this field is ignored by x509.CreateCertificate so it's probably harmless. It just means that attacker-controlled input is propagated further through the program, increasing the opportunity for it to be misused.

CopyExtensions is a foot cannon

I am extremely concerned by the presence of the CopyExtensions option. Enabling it practically guarantees misissuance because all extensions (except Basic Constraints) are copied verbatim from the CSR, overriding any value specified in the profile or the SignRequest. In particular, SignRequest.Hosts and profile.NameWhitelist are ignored if the CSR contains a SAN extension. Also, profile.ExtensionWhitelist only applies to extensions specified in SignRequest - not those specified in the CSR. I think it's quite likely that users of CopyExtensions will be surprised when neither of these whitelists are effective.

Lack of documentation

As I showed above, the logic for constructing a certificate is very complicated, and you have to use CFSSL in exactly the right way to avoid copying unvalidated information from CSRs. Unfortunately, documentation is practically non-existent and I could only figure out CFSSL's logic by reading the source code. Obviously, the lack of documentation makes it hard to use CFSSL safely. But the more fundamental problem is that documentation writing wasn't a core part of CFSSL's engineering process. Had documentation been written in tandem with the design and implementation of CFSSL, it would have been evident that incomprehensibility was spiraling out of control. This information could have been fed back into the engineering process and used to redesign or even reject features that made the system too hard to understand. I have personally saved myself many times from releasing overly-complicated software just by writing the documentation for it.

Final thoughts

CFSSL has some nice features, like its friendly command line interface and its certificate bundler for building optimal certificate chains. However, I struggle to see the value provided by its signer package. Its truly useful functionality, like Certificate Transparency submission and pre-issuance linting, could be extracted into standalone libraries. The rest of the signer is just a complicated wrapper around Go's x509.CreateCertificate that obscures what gets included in certificates and will include the wrong thing if you hold it wrong. A long history of misissuance shows us why we need better. If you're a CA, just call x509.CreateCertificate directly - it will be much easier to ensure you are only including validated information in your certificates.
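
To make that concrete, here's a rough sketch of what direct issuance might look like. The function name, validity period, and key usages are illustrative choices of mine, not CFSSL code; the only thing taken from the CSR is its public key:

import (
	"crypto"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// issueCert signs a leaf certificate for dnsNames, which the CA must have
// already validated out-of-band. Nothing is taken from the CSR except the
// public key; per the appendix below, we don't even check the CSR signature.
func issueCert(caCert *x509.Certificate, caKey crypto.Signer, csrPEM []byte, dnsNames []string) ([]byte, error) {
	block, _ := pem.Decode(csrPEM)
	if block == nil || block.Type != "CERTIFICATE REQUEST" {
		return nil, errors.New("not a CSR")
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return nil, err
	}

	// Random serial number with 159 bits of entropy (fits in 20 bytes).
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 159))
	if err != nil {
		return nil, err
	}

	template := &x509.Certificate{
		SerialNumber:          serial,
		DNSNames:              dnsNames,
		NotBefore:             time.Now().Add(-5 * time.Minute),
		NotAfter:              time.Now().Add(90 * 24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true, // IsCA stays false: never a CA certificate
	}
	return x509.CreateCertificate(rand.Reader, template, caCert, csr.PublicKey, caKey)
}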

Appendix: Proof-of-Possession and TLS

A common but unfounded objection to discarding everything in a CSR except the public key is that checking the CSR's signature is necessary because it ensures proof-of-possession of the private key. If a CA doesn't verify proof-of-possession, then someone could obtain a certificate for a key which belongs to someone else. (In fact, someone recently got a certificate containing Let's Encrypt's public key.) For TLS, this doesn't matter. (Other protocols, like S/MIME, may be different.) The TLS protocol ensures proof-of-possession every time the certificate is used.

For TLS 1.3, this is easy to see: the server or client has to send a Certificate Verify message which contains a signature from their private key over a transcript of the handshake. The handshake includes their certificate, which is a superset of the information in a CSR. Therefore, the Certificate Verify message proves at least as much as the CSR signature does. In fact it's better, since the proof is fresh and not reliant on a trusted third party doing its job correctly.

In earlier versions of TLS, client certificates are verified in the same way (signing a handshake transcript which includes the certificate). Server certificates are used differently, but ultimately the handshake transcript (which includes the server certificate) is authenticated by a shared secret that is known only to the client and the holder of the certificate private key (provided neither party deliberately sabotages their security). So as with TLS 1.3, private key possession is proven, rendering the CSR signature unnecessary.


May 30, 2020

Fixing the Breakage from the AddTrust External CA Root Expiration

A lot of stuff on the Internet is currently broken on account of a Sectigo root certificate expiring at 10:48:38 UTC today. Generally speaking, this is affecting older, non-browser clients (notably OpenSSL 1.0.x) which talk to TLS servers which serve a Sectigo certificate chain ending in the expired certificate. See also this Twitter thread by Ryan Sleevi.

This post is going to explain what you should do to avoid problems, from the perspectives of both server operators (tldr: test your server with What's My Chain Cert? and do what it says) and client operators (tldr: upgrade your TLS libraries if possible, otherwise remove AddTrust External CA Root from your trust store).

Quick primer on certificate chains

When you connect to a TLS server, the server sends the client a certificate that proves its identity. The client needs to build a chain of certificates from the server certificate to a root certificate that the client trusts. To help the client build this chain, the server sends back one or more intermediate certificates after its own certificate.

For example, my website sends the following two certificates:

Subject                                          Issuer                                           Expiration
www.agwa.name                                    Sectigo RSA Domain Validation Secure Server CA   2021-04-03
Sectigo RSA Domain Validation Secure Server CA   USERTrust RSA Certification Authority            2030-12-31

The first certificate is mine and is issued by Sectigo RSA Domain Validation Secure Server CA. The second certificate is Sectigo RSA Domain Validation Secure Server CA and is issued by USERTrust RSA Certification Authority, which is a root certificate. These two certificates form a complete chain to a trusted root.

However, USERTrust RSA Certification Authority is a relatively new root. It was created in 2010, and it took many years for it to become trusted by all clients. As recently as last year I heard reports of clients not trusting this root.

For this reason, some servers send back a chain with an additional intermediate certificate:

Subject                                          Issuer                                           Expiration
www.agwa.name                                    Sectigo RSA Domain Validation Secure Server CA   2021-04-03
Sectigo RSA Domain Validation Secure Server CA   USERTrust RSA Certification Authority            2030-12-31
USERTrust RSA Certification Authority            AddTrust External CA Root                        2020-05-30

This sequence of certificates forms a chain to another root called AddTrust External CA Root, which was created in 2000 and is trusted by many client platforms. Or rather, it was trusted before it expired today.

Fortunately, modern clients with well-written certificate validators (this includes all mainstream web browsers) won't have a problem with the expiration. Since they trust the USERTrust RSA Certification Authority root, they will build a chain to that root and ignore the fact that the server sent an expired intermediate certificate.

Other clients, notably anything using OpenSSL 1.0.x or GnuTLS, will have a problem. Even if these clients trust the USERTrust RSA Certification Authority root, and could build a chain to it if they wanted, they'll end up building a chain to AddTrust External CA Root instead, causing the certificate validation to fail with an expired certificate error.

Fixing this problem as a server operator

Basically, you need to remove the intermediate certificate issued by AddTrust External CA Root from your certificate chain.

If you get your certificates from SSLMate, you don't need to worry. I saw this coming over a year ago, and configured SSLMate to start providing a chain without AddTrust External CA Root. As certificates renewed, SSLMate customers received the new chain, and since SSLMate has long capped certificate lifetimes at one year, the older chain was cycled out before the intermediate expired.

But if your server is using Sectigo certificates from another source, you might need to worry. You can quickly test if your server is affected using What's My Chain Cert?. If your server is OK, it will say "correct chain". If it's sending the expired intermediate, it will say "trusted chain containing an expired certificate" and provide you with a link to download a correct, non-expired chain.
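
If you'd rather test from a script, here's a quick Go sketch of the same check (the hostname is a placeholder): it dials your server, prints the chain it actually sends, and flags any expired certificate:

package main

import (
	"crypto/tls"
	"fmt"
	"log"
	"time"
)

func main() {
	host := "www.example.com" // placeholder: put your server here

	// InsecureSkipVerify lets us inspect the chain exactly as sent, even
	// though verification would fail. Don't do this for real connections.
	conn, err := tls.Dial("tcp", host+":443", &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	for _, cert := range conn.ConnectionState().PeerCertificates {
		status := "ok"
		if time.Now().After(cert.NotAfter) {
			status = "EXPIRED"
		}
		fmt.Printf("%-8s %s (expires %s)\n", status, cert.Subject.CommonName, cert.NotAfter.Format("2006-01-02"))
	}
}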

Fixing this problem as a client operator

In a perfect world, all of your libraries would be up-to-date and you wouldn't be using clownish TLS implementations like GnuTLS. But the world isn't perfect. OpenSSL 1.0.x is still common, and curl used it as recently as Debian Stretch. And APT, the package manager used by Debian and Ubuntu, links with GnuTLS.

Fortunately, OpenSSL 1.0.x and GnuTLS (at least on Debian) only choke on the expired intermediate if the AddTrust External CA Root root is in the local trust store. If it isn't, they will build a chain to USERTrust RSA Certification Authority instead. On Debian (and probably Ubuntu but I haven't tested), you can easily remove this root from the trust store as follows:

  1. Edit /etc/ca-certificates.conf and put a bang/exclamation mark (!) before mozilla/AddTrust_External_Root.crt
  2. Run update-ca-certificates

For Fedora and RHEL, see this Tweet by Christian Heimes.


February 8, 2020

Short Take: Why Trust-On-First-Use Doesn't Work (Even for SSH)

Considering all the progress that has been made over the last decade making SSL certificates on the Web easy, free, automated, and transparent, it's a bit jarring to see someone arguing in 2020 that trust-on-first-use (TOFU) would be better for the Web:

Unpopular opinion. Most people would be better off with a Trust On First Use system for accessing sites. Like SSH, perhaps with some unique (per user) OOB addition to it. Would we really design it this way of starting again?

— Nick Hutton @nickdothutton, Feb 6, 2020

First, be wary of any comparison with SSH, because in the grand scheme of things, very few people use SSH. *nix sysadmins do, obviously. Many, but not all, software developers do. Some people in engineering/science fields might. But that's a drop in the bucket compared to the Web, which basically everyone uses. So just because something appears to work for SSH doesn't mean it will work for the Web.

And I would argue that TOFU actually doesn't work very well for SSH, and the only reason we put up with it is because of SSH's low deployment. SSH server host keys rarely change (which is bad for post-compromise security, so this is nothing to celebrate), but when they do, SSH handles it very poorly. The user gets a big scary message about a possible man-in-the-middle attack. And then what do you think they do? They do this:

Hi all,

It appears that as of midnight last night, SSH and login are working. However, there were a couple students last night who were getting errors such as “REMOTE HOST IDENTIFICATION HAS CHANGED!” or “POSSIBLE DNS SPOOFING DETECTED!” when trying to SSH in.

To fix this, you can run `ssh-keygen -R [REDACTED]` then try to SSH in again. I believe someone else mentioned last night that you could also just delete the entire ~/.ssh/known_hosts file as well to fix the issue, but this seems to be a less destructive solution.

That's from a real email that I once received. I would not be at all surprised if TOFU actually devolves to opportunistic encryption in practice, because users just bypass any man-in-the-middle error they receive.

You could make it really hard to bypass man-in-the-middle errors, but then people would brick their servers, as happened with HTTP public key pinning, which is one of the reasons why that technology is now extinct.

Proponents of TOFU might say that even if TOFU devolves to opportunistic encryption, the man-in-the-middle errors at least make attacks noisy. True, but the errors are seen by people who generally don't know what they mean and even if they did, can't evaluate whether an error is a legitimate key change or an actual attack. In contrast, a PKI with Certificate Transparency (i.e. the system currently deployed on the Web) also makes attacks noisy, but alerts about new certificates go to server operators, who actually know whether a new certificate is legitimate or not. They just need to be monitoring Certificate Transparency logs.

So yes, I do believe we would design the Web this way if starting again.


February 3, 2020

When Will Your DNS Record Be Published?

When publishing a DNS record through an API, it's often useful to know when the DNS record has been fully published and is visible to DNS resolvers. A perfect example which comes up at SSLMate is automatically validating a certificate request by publishing a DNS record. SSLMate must be sure that the DNS record is visible before it tells the certificate authority to validate it, or the certificate request may fail.

Unfortunately, I know of only one DNS provider that has an API to tell you when a change is published: Route 53. After submitting a DNS change request to Route 53, the API returns a ChangeInfo object which contains a status of either "PENDING" or "INSYNC". You can poll the change until its status becomes "INSYNC", which means the change has taken effect on all Route 53 servers. SSLMate has published a lot of DNS records through Route 53 and this API has never let me down, which makes me happy.
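
For illustration, here's roughly what that polling loop looks like with the aws-sdk-go route53 package; this is a sketch of mine (with backoff and timeouts omitted), not SSLMate's actual code:

import (
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/route53"
)

// waitForChange polls a Route 53 change until its status becomes INSYNC,
// i.e. the change has taken effect on all Route 53 servers.
func waitForChange(svc *route53.Route53, changeID string) error {
	for {
		out, err := svc.GetChange(&route53.GetChangeInput{Id: aws.String(changeID)})
		if err != nil {
			return err
		}
		if aws.StringValue(out.ChangeInfo.Status) == route53.ChangeStatusInsync {
			return nil
		}
		time.Sleep(5 * time.Second)
	}
}

You'd call this with the Id from the ChangeInfo that ChangeResourceRecordSets returns.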

Other DNS providers offer absolutely nothing to help you determine when a DNS change is visible. In these cases, SSLMate can do nothing but sleep for 10-120 seconds (depending on the provider) and hope for the best. Unfortunately, it doesn't help for SSLMate to try to resolve the DNS record to see if the record has been published - modern authoritative DNS services use many different servers, often with anycast or load balancing, so just because SSLMate sees the record doesn't mean that others will.

And then there's Google Cloud DNS, which deserves a special mention because they offer an API that looks very similar to Route 53's: after submitting a change request, the API returns a change object with a status of "pending" that you can poll until the status becomes "done". Sounds perfect! Except if you read the fine print, it says:

A status of "done" means that the request to update the authoritative servers has been sent, but the servers might not be updated yet

Sure enough, I found that it often takes two minutes after a change becomes "done" for it to be fully visible. The change object also contains a very bizarre boolean called "isServing", which is documented as:

If the DNS queries for the zone will be served.

I'm not sure what this means, or why information about the zone's status would be present in a record change object. In my testing I never once saw a value besides false, even long after queries for both the individual record and the zone as a whole were being served.

So the change object API is completely useless, and I don't know why it exists - who cares if the "request to update the authoritative servers has been sent"? That's an internal implementation detail. It only matters to users of the API if the change has been fully applied everywhere. So, SSLMate doesn't use change objects. It sleeps for 2 minutes after adding the record and hopes for the best.

All of this is exacerbated when requesting a certificate using the ACME protocol. With ACME, if you tell the server the DNS record is published, but the server doesn't see the record, your certificate order is invalidated. You have to create a new order, and you're given a different DNS record that you have to publish. That means your ACME client could potentially get in a situation where it never makes forward progress, because on each attempt it fails to wait long enough before telling the ACME server to check the record.

SSLMate has a workaround for this when talking to an ACME-using certificate authority such as Let's Encrypt. Instead of publishing the record returned by the ACME server, SSLMate publishes an NS record that delegates the record to a custom-built authoritative DNS server operated by SSLMate. SSLMate's authoritative server returns the record provided by the ACME server. The NS record never changes, so if checking the record fails and SSLMate has to create a new ACME order, it doesn't need to republish a DNS record in the customer's zone; instead it just has to update the record that SSLMate's authoritative server returns, which can be done instantaneously. Therefore, every retry is more likely to succeed than the previous one since more time has elapsed since publishing the NS record. All of this happens completely automatically and transparently to the user of SSLMate, and is one of the ways that SSLMate provides great dependability. (Another benefit is that if the customer's DNS provider doesn't provide an API, they can publish the NS record manually and never have to touch it again, even for renewals.)
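
For a rough idea of what the serving half of this trick involves, here's a minimal authoritative responder built on the github.com/miekg/dns package. The library choice and every name here are mine; I don't know what SSLMate's actual server looks like:

import (
	"log"
	"sync/atomic"

	"github.com/miekg/dns"
)

// currentToken holds the ACME challenge value. Store a value before serving;
// updating it takes effect instantaneously, with no propagation delay, since
// this server is authoritative for the delegated name.
var currentToken atomic.Value

func serveChallenge(zone string) {
	dns.HandleFunc(zone, func(w dns.ResponseWriter, r *dns.Msg) {
		m := new(dns.Msg)
		m.SetReply(r)
		m.Authoritative = true
		if q := r.Question[0]; q.Qtype == dns.TypeTXT {
			m.Answer = append(m.Answer, &dns.TXT{
				Hdr: dns.RR_Header{Name: q.Name, Rrtype: dns.TypeTXT, Class: dns.ClassINET},
				Txt: []string{currentToken.Load().(string)},
			})
		}
		w.WriteMsg(m)
	})
	log.Fatal((&dns.Server{Addr: ":53", Net: "udp"}).ListenAndServe())
}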

Nevertheless, it would be really nice if more DNS providers offered an API like Route 53 to report when a DNS record has been published.

