Skip to Content [alt-c]

November 12, 2021

It's Now Possible To Sign Arbitrary Data With Your SSH Keys

Did you know that you can use the ssh-keygen command to sign and verify signatures on arbitrary data, like files and software releases? Although this feature isn't super new - it was added in 2019 with OpenSSH 8.0 - it seems to be little-known. That's a shame because it's super useful and the most viable alternative to PGP for signing data. If you're currently using PGP to sign data, you should consider switching to SSH signatures.

Here's why I like SSH signatures:

  • It's not PGP. For years, security professionals have been sounding the alarm on PGP, including its most popular implementation, GnuPG/GPG. PGP is absurdly complex, has an awful user experience, and is full of crufty old cryptography which shouldn't be touched with a ten foot pole.

  • SSH is everywhere, and people already have SSH keys. If you use Debian Bullseye or Ubuntu 20.04 or newer, you already have a new enough version of SSH installed. And if you use GitHub, or any other service that uses SSH keys for authentication, you already have an SSH key that can be used to generate signatures. This is why I'm more excited about SSH signatures than other PGP signature alternatives like signify or minisign. Signify and minisign are great, but require you to install new software and generate new keys, which will hinder widespread adoption.

  • SSH key distribution is easy. SSH public keys are one line strings that are easy to copy around. You don't need to use the Web of Trust or worry about configuring "trust levels" for keys. GitHub already acts as a key distribution service which is far easier to use and more secure than any of the PGP key servers ever were. You can retrieve the SSH public keys for any GitHub user by visiting a URL like https://github.com/USERNAME.keys. (For example, my public keys are at https://github.com/AGWA.keys.)

    (GitHub acts as a trusted third party here, and you have to trust them not to lie about people's public keys, so it may not be appropriate for all use cases. But relying on a trusted third party with a professional security team like GitHub seems like a way better default than PGP's Web of Trust, which was nigh impossible to use. Key Transparency would address the concerns with trusted third parties, if anyone ever figures out how to audit transparency logs in practice.)

  • SSH has optional lightweight certificates. You don't have to use SSH certificates (and most people shouldn't) but if certificates would make your life easier, SSH has a lightweight certificate system that is considerably simpler than X.509. This makes SSH signatures a good alternative to S/MIME as well!

You'll soon be able to sign Git commits and tags with SSH

Signing Git commits and tags gives consumers of your repository assurance that your code hasn't been tampered with. Unfortunately, you currently have to use either PGP or S/MIME, and personally I haven't bothered to sign Git tags since my PGP keys expired in 2018.

But that will soon change in Git 2.34, which adds support for SSH signatures.

Signing files

Signing a file is straightforward:

ssh-keygen -Y sign -f ~/.ssh/id_ed25519 -n file file_to_sign

Here are the arguments you may need to change:

  • ~/.ssh/id_ed25519 is the path to your private key. This is the standard path to your SSH Ed25519 private key. If you have an RSA key, use id_rsa instead.

  • file is the "namespace", which describes the purpose of the signature. SSH defines file for signing generic files, and email for signing emails. Git uses git for its signatures.

    If you are using the signature for a different purpose, such as a custom protocol, you must specify your own namespace. This prevents cross-protocol attacks whereby a valid signature is removed from a message for one protocol and attached to a message from a different protocol. If the protocols don't use distinct namespaces for their signatures, there's a risk that the signature is considered valid by the second protocol even though it was meant for the first protocol.

    Namespaces can be arbitrary strings. To ensure global uniqueness of namespaces, SSH recommends that you structure them like an email address under a domain that you own. For example, I would use a namespace like protocolname-v1@agwa.name.

  • file_to_sign is the path to the file to be signed.

The signature is written to a new file called file_to_sign.sig, which looks like this:

-----BEGIN SSH SIGNATURE----- U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAg2rirQQddpzEzOZwbtM0LUMmlLG krl2EkDq4CVn/Hw7sAAAAEZmlsZQAAAAAAAAAGc2hhNTEyAAAAUwAAAAtzc2gtZWQyNTUx OQAAAEDyjWPjmOdG8HJ8gh1CbM8WDDWoGfm+TTd8Qa8eua9Bt5Cc+43S24i/JqVWmk98qV YXoQmOYL4bY8t/q7cSNeMH -----END SSH SIGNATURE-----

If you specify - for the filename, the file to sign is read from standard in and the signature is written to standard out.

Verifying signatures

Verifying signatures is a bit more involved. First you need to create an allowed signers file which maps email addresses to public keys, like this:

alice@example.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINq4q0EHXacxMzmcG7TNC1DJpSxpK5dhJA6uAlZ/x8O7 alice@example.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCfHGCK5jjI/Oib4vRBLB9rG30A8y/Br9U75rfAYsitwFPFfl/CaTAvfRlW1lIBqOCshLWxGsN+PFiJCiCWzpW4iILkD5X5KcBBYHTq1ojYXb70BrQXQ+QBDcGxqQjcOp/uTq1D9Z82mYq/usI5wdz6f1KNyqM0J6ZwRXMu6u7NZaAwmY7j1fV4DRiYdmIfUDIyEdqX4a1Gan+EMSanVUYDcNmeBURqmTkkOPYSg8g5xYgcXBMOZ+V0ZUjreV9paKraUD/mVDlZbb/VyWhJGT4FLMNXHU6UHC2FFgqANMUKIlL4vhqc23MoygKbfF3HgNB6BNfv3s+GYlaQ3+66jc5j bob@example.net ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBgQuuEvhUXerOTIZ2zoOx60M/HHJ/tcHnD84ZvTiX5b eve@example.org ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFxsKcWHB9hamTXCPWKVUw0WM0S3IXH0YArf8iJE0dMG

Once you have your allowed signers file, verification works like this:

ssh-keygen -Y verify -f allowed_signers -I alice@example.com -n file -s file_to_verify.sig < file_to_verify

Here are the arguments you may need to change:

  • allowed_signers is the path to the allowed signers file.

  • alice@example.com is the email address of the person who allegedly signed the file. This email address is looked up in the allowed signers file to get possible public keys.

  • file is the "namespace", which must match the namespace used for signing as described above.

  • file_to_verify.sig is the path to the signature file.

  • file_to_verify is the path to the file to be verified. Note that this file is read from standard in. In the above command, the < shell operator is used to redirect standard in from this file.

If the signature is valid, the command exits with status 0 and prints a message like this:

Good "file" signature for alice@example.com with ED25519 key SHA256:ZGa8RztddW4kE2XKPPsP9ZYC7JnMObs6yZzyxg8xZSk

Otherwise, the command exits with a non-zero status and prints an error message.

Is it safe to repurpose SSH keys?

Short answer: yes.

Always be wary of repurposing cryptographic keys for a different protocol. If not done carefully, there's a risk of cross-protocol attacks. For example, if the structure of the messages signed by Git is similar to the structure of SSH protocol messages, an attacker might be able to forge Git artifacts by misappropriating the signature from an SSH transcript.

Fortunately, the structure of SSH protocol messages and the structure of messages signed by ssh-keygen are dissimilar enough that there is no risk of confusion.

To convince ourselves, let's consult RFC 4252 section 7, which specifies how SSH keys are traditionally used by SSH to authenticate a user logging into a server. The RFC specifies that the input to the signature algorithm has the following structure:

string session identifier byte SSH_MSG_USERAUTH_REQUEST string user name string service name string "publickey" boolean TRUE string public key algorithm name string public key to be used for authentication

The first field is the session identifier, a string. In the SSH protocol, strings are prefixed by a 32-bit big endian length. The session identifier is a hash. Since hashes are short, the first three bytes of the above signature input will always be zero.

Meanwhile, the PROTOCOL.sshsig file the OpenSSH repository specifies how SSH keys are used with ssh-keygen-generated signatures. It specifies that the input to the signature algorithm has this structure:

#define MAGIC_PREAMBLE "SSHSIG" byte[6] MAGIC_PREAMBLE string namespace string reserved string hash_algorithm string H(message)

Here, the first three bytes are SSH, from the magic preamble. Since the first three bytes of the SSH protocol signature input are different from the ssh-keygen signature input, the SSH client and ssh-keygen will never produce identical signatures. Therefore, there is no risk of cross-protocol attacks, and I am totally comfortable using my existing SSH keys to sign messages with ssh-keygen.

Comments

July 9, 2021

How Certificate Transparency Logs Fail and Why It's OK

Last week, a Certificate Transparency log called Yeti 2022 suffered a single bit flip, likely due to a hardware error or cosmic ray, which rendered the log unusable. Although this event will have zero impact on Web users and website operators, and was reported on an obscure mailing list for industry insiders, it captured the interest of people on Hacker News, Twitter, and Reddit. Certificate Transparency plays an essential role in ensuring security on the Web, and numerous commentators were concerned that logs could be wiped out by a single bit flip. I'm going to explain why these concerns are misplaced and why log failure doesn't worry me.

Background: Certificate Transparency (CT) is a system to log publicly-trusted SSL certificates in public, append-only logs. Website owners can monitor these logs and take action if they discover an unauthorized certificate for one of their domains. Thanks to Certificate Transparency, several untrustworthy certificate authorities have been distrusted, and the ecosystem has improved enormously compared to the pre-CT days when misissued certificates usually went unnoticed.

To ensure that CT logs remain append-only, submitted certificates are placed in the leaves of a data structure called a Merkle Tree. The leaves of the Merkle Tree are recursively hashed together with SHA-256 to produce a root hash that represents the contents of the tree. Periodically, the CT log publishes a signed statement, called a Signed Tree Head or STH, containing the current tree size and root hash. The STH is a commitment that at the specified size, the log has the specified contents. To enforce the commitment, monitors collect STHs and verify that the root hashes match the certificates downloaded from the log. If the downloaded certificates don't match the published STHs, or if a monitor detects two STHs for the same tree size with different root hashes, it means the log contents have been altered - perhaps to conceal a malicious certificate.

On June 30, 2021 at 01:02 UTC, my CT monitor, Cert Spotter, raised an alert that the root hash it calculated from the first 65,569,149 certificate entries downloaded from Yeti 2022 did not equal the root hash in the STH that Yeti 2022 had published for tree size 65,569,149.

I noticed the alert the next morning and reported the problem to the ct-policy mailing list, where these matters are discussed. The Google Chrome CT team reported that they too had observed problems, as did a user of the open source certspotter. Later that day, I drilled down that the problem was with entry 65,562,066 - the hash of the certificate returned by Yeti at this position was not part of the Merkle Tree.

On Thursday, the log operator reported that this entry had "shifted one bit" before the STH was signed. Curious, I calculated the correct hash of entry 65,562,066 and then tried flipping every bit and seeing if the resulting hash was part of the Merkle Tree. Sure enough, flipping the lowest bit of the first byte of the hash resulted in a hash that was part of the Merkle Tree and would ultimately produce the root hash from the STH.

There is no way for the log operator to fix this problem: they can't change entry 65,562,066 to match the errant hash, as this would require breaking SHA-2's preimage resistance, which is computationally infeasible. And since they've already published an STH for tree size 65,569,149, they can't publish an updated STH with a root hash that correctly reflects entry 65,562,066.

Consequentially, Yeti 2022 is toast. It has been made read-only and web browsers will soon cease to rely on it.

Yeti 2022 is not the first Certificate Transparency log to fail. Seven logs have previously failed, all without causing any impact to Web users:

The largest risk of log failure is that previously-issued certificates will stop working and require replacement prior to their natural expiration dates. CT-enforcing browsers only accept certificates that contain receipts, called Signed Certificate Timestamps or SCTs, from a sufficient number of approved (i.e. non-failed) CT logs. Each SCT is a promise by the respective log to publish the certificate (Technically, the precertificate, which contains the same information as the certificate). What if one or more of the SCTs in a certificate is from a log which failed after the certificate was issued?

Fortunately, browser Certificate Transparency policies anticipated the possibility. At a high level, the Chrome and Apple policies require the following:

  1. At least one SCT from a log that is approved at time of certificate validation
  2. At least 2-5 SCTs (depending on certificate lifetime) from logs that were approved at time of SCT issuance

(The precise details are a bit more nuanced but don't matter for this blog post. Also, some very advanced website operators deliver SCTs using alternative mechanisms, which are subject to different rules, but this is extremely rare. Read the policies if you want the nitty gritty.)

Consequentially, a single log failure can't cause a certificate to stop working. The first requirement is still satisfied because the certificate still has at least one SCT from a currently-approved log. The second requirement is still satisfied because the failed log was approved at time of SCT issuance. The minimum number of SCTs from approved-at-issuance logs increases with certificate lifetime to reflect the increased probability of log failure: a 180 day certificate only needs 2 SCTs, whereas a 3 year certificate (back when they were allowed) needed 5 SCTs.

Note that when a log fails, its public key is not totally removed from browsers like a distrusted certificate authority's key would be. Instead, the log transitions to a state called "retired" (previously known as "disqualified"), which means its SCTs are still recognized for satisfying the second requirement. This led to an interesting question in 2020 when a log's private key was compromised: should the log be retired, or should it be totally distrusted? Counter-intuitively, it was retired, even though SCTs could have been forged to make it seem like a certificate was included in the log when it really wasn't. But that's OK, because the second requirement isn't about making sure certificates are logged, but about making sure certificate authorities aren't dumbasses. In a world of competent CAs, the second requirement wouldn't be necessary since CAs would have the good sense to include SCTs from extra logs in case some logs failed. But we do not live in a world of competent CAs - indeed, that's why CT exists - and there would no doubt be CAs embedding a single SCT in certificates if browsers didn't require redundancy.

Of course, there is still a chance that all of the SCTs in a certificate come from logs that end up failing. That would suck. But I don't think the solution is to change Certificate Transparency. Catastrophic CT failure is just one of several reasons that a certificate might need to be replaced before its natural expiration, and empirically it's the least likely reason. When a certificate authority is distrusted, as has happened several times, all of its certificates must be replaced. When a certificate is misissued, it has to be revoked and replaced, and there have been numerous incidents since 2019 in which a considerable number of certificates have required revocation - sometimes as many as 100% of a CA's active certificates:

The ecosystem is currently ill-prepared to handle mass replacement events like these, and in many of the above cases CAs missed the revocation deadline or declined to revoke entirely. Although the above misissuances had relatively low security impact, other cases, such as distrusting a compromised certificate authority, or events like Heartbleed or the Debian random number fiasco, are very security critical. This makes the inability to quickly replace certificates at scale a serious problem, larger than the problem of CT logs failing. To address the problem, Let's Encrypt is working on a specification called ACME Renewal Info (ARI) that would allow CAs to instruct TLS servers to request new certificates prior to their normal expiration. They've committed to deploying ARI or a similar technology in their staging environment by 2021-11-12.

Comments

December 17, 2020

Security Vulnerabilities in Smallstep PKI Software

I recently did a partial security review of Smallstep, a commercially-backed open source private certificate authority written in Go. I found that Smallstep is vulnerable to JSON injection, misuses JWTs, and relies on client-side enforcement of server-side security. These vulnerabilities can be exploited to obtain unauthorized certificates. This post is a full disclosure of the issues.

I reviewed the certificates repository as of commit 1feb4fcb26dc78d70bc1d9e586237a36a8fbea9c and the crypto repository as of commit 162770cad29063385cb768b0191814e4c6a94e45. The vulnerabilities are present in version 0.15.5 of step-certificates and version 0.15.3 of step-cli, and have been fixed as of version 0.15.6.

JSON Injection in Certificate Templates

Like many PKI systems, Smallstep supports user-definable certificate profiles (which they call certificate templates) to specify the contents of a certificate (the subject, SANs, extensions, etc.). Smallstep's certificate profiles are implemented as JSON objects which are templated using Go's text/template package. Using templates, you can substitute a variety of information into the profile, including information from the certificate request.

Here's an example template from Smallstep's blog post announcing the feature:

{ "subject": {"commonName":"{{ .Insecure.CR.Subject.CommonName }}"}, "sans": {{ toJson .SANs }}, "keyUsage": ["digitalSignature", "keyAgreement"], "extKeyUsage": ["clientAuth"] }

20 years of HTML and SQL injection vulnerabilities have shown that it's a bad idea to use raw text templating to construct syntactic data like HTML, SQL, or JSON, which is why best practice is to use context-aware templating like SQL prepared statements or Go's html/template package. When using raw text templates, it's way too easy to suffer an injection vulnerability when the author of the template inevitably forgets to escape a value. Indeed, in the above example, the commonName field is unescaped, and Smallstep is rife with other examples of unescaped data in their documentation and in Go string literals that define the default SSH and X.509 templates.

Two factors make it easy for attackers to exploit the injection vulnerability. First, if a JSON object has more than one field with the same name, Go's JSON decoder takes the value from the last one. Second, Smallstep uses json.Decode to decode the object instead of json.Unmarshal, which means trailing garbage is ignored (this is an unfortunate foot gun in Go). Thus, an attacker who can inject values into a template has total control over the resulting object, as they can override the values of earlier fields, and then end the object so later fields are ignored. For example, signing a CSR containing the following common name using the above template results in a CA:TRUE certificate with a SAN of wheeeeeee.example and no extended key usage (EKU) extension:

"}, "basicConstraints":{"isCA":true}, "sans":[{"type":"dns","value":"wheeeeeee.example"}]}

Fortunately, despite the use of unescaped values in the default templates, I found that in virtually all cases the unescaped values come from trusted sources or are validated first. The one exception is the AWS provisioner, which in the default configuration will inject an unvalidated common name from the CSR into the template. However, in this configuration the DNS SANs are also completely unvalidated, so clients are already pretty trusted. Still, the vulnerability could be used to get a certificate with arbitrary or no EKUs, giving an attacker more capabilities than they would otherwise have. (The attacker could also get a CA:TRUE certificate, although by default the issuing CA has a pathlen:0 constraint so the CA:TRUE certificate wouldn't work.)

In any case, this approach to templates is extremely concerning. This issue cannot be fixed just by updating the built-in templates to use proper escaping, as users who write their own templates are likely to forget to escape, just as scores of developers have forgotten to escape HTML and SQL. The responsible way to offer this feature is with a context-aware JSON templating system akin to template/html that automatically escapes values.

Weaknesses in AWS Provisioner

This section applies to Smallstep's AWS provisioner, which enables EC2 instances to obtain X.509 certificates which identify the instance.

To prove that it is authorized to obtain a certificate, the Smallstep client retrieves a signed instance identity document from the EC2 metadata service and sends it to the Smallstep server. The Smallstep server verifies that the instance identity document was validly signed by AWS, and that the EC2 instance belongs to an authorized AWS account. If so, the Smallstep server issues a certificate.

EC2 instance identity documents aren't bound to a particular purpose when they are signed and they never expire. An attacker who obtains a single instance identity document from any EC2 instance in their victim's account has a permanent capability to obtain certificates. Instance identity documents are not particularly protected; anyone who can make an HTTP request from the instance can obtain them. Smallstep attempts to address this in a few ways. Unfortunately, the most effective protection is off by default, and the others are ineffective.

First, instance identity documents contain the date and time that the instance was created, and Smallstep can be configured to reject instance identity documents from instances that are too old. This is a good idea and works well with architectures where an instance obtains its certificate after first bootup and never needs to obtain future certificates. However, this protection is off by default. (Also, the documentation for the configuration parameter is not very good: it's described as "the maximum duration to grant a certificate in AWS and GCP provisioners" which sounds like maximum certificate lifetime. This is not going to help people choose a sensible value for this option.)

Second, by default Smallstep only allows an EC2 instance to request one certificate. After it requests a certificate, the instance is not supposed to be able to request any more certificates. Unfortunately, this logic is enforced client-side. Specifically, the server relies on the client to generate a unique ID, which in a non-malicious implementation is derived from the instance ID. Of course, a malicious client can just generate any ID it wants and the server will think the instance has never requested a certificate before.

(Aside: Smallstep claims that the one-certificate-per-instance logic "allows" them to not validate certificate identifiers by default. This doesn't make much sense to me, as an instance which has never requested a certificate before would still have carte blanche ability to request a certificate for any identifier, including those belonging to instances which already have requested certificates. It seems to me that to get the TOFU-like security properties that they desire, they should be enforcing a one-certificate-per-identifier rule rather than one-certificate-per-instance.)

Finally, Smallstep tries to use JWT in a bizarre and futile attempt to limit the lifetime of identity documents to 5 minutes. Instead of simply sending the identity document and signature to the server, the client sticks the identity document and signature inside a JSON Web Token which is authenticated with HMAC, using the signature as the HMAC key - the same signature that is in the payload of the JWT.

This leads to an incredible sequence of code on the server side which first deserializes the JWT payload without verifying the HMAC, and then immediately deserializes the payload again, this time verifying the HMAC using the key just taken out of the unverified payload:

var unsafeClaims awsPayload
if err := jwt.UnsafeClaimsWithoutVerification(&unsafeClaims); err != nil {
	return nil, errs.Wrap(http.StatusUnauthorized, err, "aws.authorizeToken; error unmarshaling claims")
}

var payload awsPayload
if err := jwt.Claims(unsafeClaims.Amazon.Signature, &payload); err != nil {
	return nil, errs.Wrap(http.StatusUnauthorized, err, "aws.authorizeToken; error verifying claims")
}

Obviously, this code adds zero security, as an attacker who wants to use an expired JWT can just adjust the expiration date and then recompute the HMAC using the key that is right there in the JWT.

Edited to add: Smallstep explains that they used JWT here because their other cloud provisioners use JWT natively and therefore it was easier to use JWT here as well. It was not intended as a security measure. Indeed, the 5 minute expiration is not documented anywhere, so I wouldn't call this a vulnerability. However, it would be wrong to call it a harmless no-op: the JWT provided the means for an untrusted, client-generated ID to be transmitted to the server, which was accidentally used instead of the trusted ID from the signed instance identity document, causing the vulnerability above. This is why it's so important to avoid unnecessary code and protocol components, especially in security-sensitive contexts.

Timeline

  • 2020-12-17 18:17 UTC: full disclosure and notification to Smallstep
  • 2020-12-17 21:02 UTC: Smallstep merges patch to escape JSON in default X.509 template
  • 2020-12-17 23:52 UTC: Smallstep releases version 0.15.6 fixing the vulnerabilities, plans to address other feedback
  • 2020-12-27 19:04 UTC: Ten days later, Smallstep's blog post and documentation still show examples of unescaped JSON, giving me doubts that they have fully understood the issue

Comments

November 30, 2020

The Lengths People Go To Just To Avoid DNSSEC

Connecting to a website, say example.com, over TLS is a relatively straightforward affair. The client looks up the DNS A/AAAA record for example.com, connects to the IP address over TLS, and confirms that the presented certificate is valid for example.com.

In contrast, connecting to other services, like XMPP or SMTP, over TLS is less straightforward. That's because clients don't directly look up the A/AAAA record for example.com. Instead they look up a SRV record (for XMPP) or an MX record (for SMTP) which contains the hostname of the XMPP or SMTP server. Then they look up the A/AAAA record of that hostname and connect to it. This layer of indirection makes it easy to delegate the operation of certain services to other hosts. For instance, if example.com wants to use Gmail for their email, their MX record would contain aspmx.l.google.com.

This raises the question of which hostname the certificate should certify: the original hostname (example.com), or the hostname listed in the SRV/MX record (aspmx.l.google.com). Both options have problems.

Approach 1, using the original hostname (example.com), is undesirable because there is no automated way for the operator of a service like SMTP or XMPP to obtain certificates for the hostnames which are delegated to them. Google can automatically get a certificate for aspmx.l.google.com because they own google.com, but they can't for example.com. The admins of example.com would have to request the certificate themselves and give it to their SMTP and XMPP providers. Such a manual approach is bound to cause outages as people forget to renew expiring certificates. But there's another problem: the certificate would permit the SMTP and XMPP providers to impersonate all services for example.com. You probably don't want your instant messaging provider to be able to impersonate your website.

Approach 2, using the SRV/MX hostname (aspmx.l.google.com), doesn't have these problems. It's quite easy for the service operator to automatically obtain a certificate for their own domain. Unfortunately, this approach is not secure. Since the DNS lookup for the SRV/MX record is most likely unauthenticated, a man-in-the-middle attacker could intercept the query and return a rogue record that says the SMTP or XMPP service for example.com is handled by a server operated by the attacker. The attacker would have no problem obtaining a valid certificate for their own domain.

Perhaps the most obvious solution to this problem is to just make the DNS lookup authenticated. The IETF has been trying to do this since 1997 with DNSSEC. More than 20 years later, the results are not promising: fewer than 2% of .com domains support DNSSEC, only 25% of Internet users validate DNSSEC even when it is supported, the use of insecure crypto like 1024 bit RSA and SHA-1 is still rampant, and DNSSEC is so hard to deploy correctly that outages are common.

Unsurprisingly, many people would like to avoid the DNSSEC quagmire, which has led to some very interesting workarounds...

POSH

One workaround is POSH, or "PKIX over Secure HTTP", standardized in RFC 7711. With POSH, the owner of example.com publishes a JSON document at https://example.com/.well-known/posh/SERVICE.json containing a list of certificate fingerprints which are allowed to be used for the given SERVICE (e.g. xmpp-server) on example.com. To connect to example.com's XMPP server, a POSH-aware client would first retrieve https://example.com/.well-known/posh/xmpp-server.json, ensuring that the HTTPS server presents a valid, publicly-trusted certificate for example.com. It would then connect to the XMPP server indicated in example.com's XMPP SRV record, and ensure that it presents a certificate whose fingerprint is listed in the JSON document.

POSH solves the problems presented above. It's secure, because the JSON document containing the fingerprints is authenticated by a certificate for example.com, which remains under the control of the owner of example.com. The operator of the XMPP service doesn't need to obtain a publicly-trusted certificate for example.com. To facilitate certificate rotation, POSH supports delegation: the JSON file at example.com can reference the URL of a different JSON file which is hosted by the XMPP operator. This gives the XMPP operator flexibility to rotate certificates at any time without needing to inform their customers to update their JSON documents.

Although POSH was designed to be protocol agnostic, it was only ever used with XMPP, and even then I could only find a few XMPP clients which support it. It's fair to say POSH never caught on.

MTA-STS

A more recent workaround, for server-to-server SMTP only, is MTA-STS, standardized in RFC 8461. The underlying concept of MTA-STS is the same as POSH: the owner of example.com publishes a document over HTTPS with information about how to validate secure SMTP connections to example.com's mail servers. Several details are different. One cosmetic difference is that the document is published at mta-sts.example.com rather than example.com, which simplifies deployment for domain owners who can't easily make changes to their main website. A more fundamental difference is that instead of listing certificate fingerprints, the document lists the hostnames which are allowed in the MX record set. When connecting to an SMTP server for example.com, the client verifies both that it's connecting to a server listed in the MTA-STS document at mta-sts.example.com, and that the server presents a publicly-trusted certificate valid for the SMTP server's hostname.

The amusing thing about MTA-STS is that it basically boils down to duplicating the contents of the MX record in a document that is published over HTTPS, leveraging the WebPKI to authenticate the MX record rather than DNSSEC. It's kind of incredible that this is considered easier than using DNSSEC, despite having more moving parts and requiring duplication. That says way more about DNSSEC being a failure than about MTA-STS being good, and I've written before about how I think MTA-STS will prove hard to deploy in practice. I suggested how DNS providers could alleviate the problems by automating MTA-STS for their customers, but I'm not aware of any DNS providers to do so.

I am happy to report that SSLMate now offers MTA-STS automation as part of Cert Spotter. It's not quite as seamless as what a DNS provider could offer, but it's pretty good: Cert Spotter continuously monitors your domains' MX records and automatically publishes an appropriate MTA-STS policy for you, obtaining and renewing the necessary SSL certificates. All you need to do is publish two CNAME records delegating the MTA-STS-related subdomains (mta-sts and _mta-sts) to SSLMate-operated servers. Cert Spotter updates the policy automatically any time it detects a change to your MX records, ensuring the policy never falls out of sync with your DNS. For transparency, Cert Spotter emails you when this happens, so you can detect unauthorized MX record changes. Cert Spotter will also alert you if any of your MX servers have a TLS or certificate problem that would prevent MTA-STS from working.

Looking forward: SRVName certificates

There is another solution which, if it's ever deployed, will be much nicer than POSH or MTA-STS: SRVName certificates. SRVName certificates authenticate not just a domain name, like normal certificate, but a particular service running on that domain. For example, you could get a certificate that's valid for only SMTP on example.com, or only XMPP on example.com. This solves the security problem of the first approach above: the owner of example.com can give their mail server operator a SRVName certificate that's valid only for SMTP, allowing the mail server to operate an SMTP service for example.com, but not impersonate any other example.com services. Assuming the validation rules are flexible enough, the SMTP service operator could even obtain the SRVName certificate themselves provided the certificate authority validates that they are listed in the MX record for example.com.

Technically speaking, SRVName is a type of subject alternative name (SAN) which can be placed in a certificate, akin to the DNS SAN which certificates use today for authenticating domain names. It's possible for a certificate to contain both SRVName and DNS SANs, and here I use "SRVName certificate" to mean a certificate containing a SRVName SAN.

Unfortunately, there's a major obstacle blocking SRVName certificates: technically-constrained subordinate certificate authorities. A technically-constrained sub-CA is a certificate authority which is restricted to issuing certificates only for namespaces that are enumerated in the sub-CA certificate's name constraints field. For example, an enterprise that needs to issue a large number of certificates might operate a publicly-trusted, technically-constrained sub-CA that is constrained to the enterprise's domains and IP address ranges. Since their sub-CA can only issue certificates for namespaces that they control, they're allowed to operate it under looser security standards than unconstrained publicly-trusted CAs.

The problem is that if a particular type of SAN (in this case, the SRVName SAN) isn't listed in the name constraints as either allowed or denied, the standard says that all instances of that SAN type are allowed by default. Allow-by-default is usually a bad idea when security is concerned, and this case is no different. Clients can't accept SRVName certificates because it would be unsafe: every existing technically-constrained sub-CA that doesn't have SRVName in its name constraints field has unconstrained ability to issue SRVName certificates. Unfortunately, the Baseline Requirements (the rules governing public certificate issuance) only require technically-constrained sub-CAs to have DNS, IP, and Directory Name constraints. Consequentially, there are many existing technically-constrained sub-CAs out there that would need to be revoked and reissued with SRVName constraints before it's safe to deploy SRVName certificates.

Will that ever happen? Who knows. In the meantime, we're stuck with hacks like MTA-STS.

Comments

June 25, 2020

Writing an SNI Proxy in 115 Lines of Go

The very first message sent in a TLS connection is the Client Hello record, in which the client greets the server and tells it, among other things, the server name it wants to connect to. This is called Server Name Indication, or SNI for short, and it's quite handy as it allows many different servers to be co-located on a single IP address.

The server name is sent in plaintext, which is unfortunately really bad for privacy and censorship resistance, but does enable something very useful: a proxy server can read the server name and use it to decide where to route the connection, without having to decrypt the connection. You can leverage this to make many different physical servers accessible from the Internet even if you have only one public IPv4 address: the proxy listens on your public IP address and forwards connections to the appropriate private IP address based on the SNI.

I just finished writing such a proxy server, which I plan to run on my home network's router so that I can easily access my internal servers from anywhere on the Internet, without a VPN or SSH port forwarding. I was pleased by how easy it was to write this proxy server using only Go's standard library. It's a great example of how well-suited Go is for programs involving networking and cryptography.

Let's start with a standard listen/accept loop (right out of the examples for Go's net package):

func main() {
	l, err := net.Listen("tcp", ":443")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := l.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go handleConnection(conn)
	}
}

Here's a sketch of the handleConnection function, which reads the Client Hello record from the client, dials the backend server indicated by the Client Hello, and then proxies the client to and from the backend. (Note that we dial the backend using the SNI value, which works well with split-horizon DNS where the proxy sees the backend's private IP address and external clients see the proxy's public IP address. If that doesn't work for you, can use more complicated routing logic.)

func handleConnection(clientConn net.Conn) {
	defer clientConn.Close()

	// ... read Client Hello from clientConn ...

	backendConn, err := net.Dial("tcp", net.JoinHostPort(clientHello.ServerName, "443"))
	if err != nil {
		log.Print(err)
		return
	}
	defer backendConn.Close()

	// ... proxy clientConn <==> backendConn ...
}

Let's assume for now we have a convenient function to read a Client Hello record from an io.Reader and return a tls.ClientHelloInfo:

func readClientHello(reader io.Reader) (*tls.ClientHelloInfo, error)

We can't simply call this function from handleConnection, because once the Client Hello is read, the bytes are gone. We need to preserve the bytes and forward them along to the backend, which is expecting a proper TLS connection that starts with a Client Hello record.

What we need to do instead is "peek" at the Client Hello record, and thanks to some simple but powerful abstractions from Go's io package, this can be done with just six lines of code:

func peekClientHello(reader io.Reader) (*tls.ClientHelloInfo, io.Reader, error) {
        peekedBytes := new(bytes.Buffer)
        hello, err := readClientHello(io.TeeReader(reader, peekedBytes))
        if err != nil {
                return nil, nil, err
        }
        return hello, io.MultiReader(peekedBytes, reader), nil
}

What this code does is create a TeeReader, which is a reader that wraps another reader and writes everything that is read to a writer, which in our case is a byte buffer. We pass the TeeReader to readClientHello, so every byte read by readClientHello gets saved to our buffer. Finally, we create a MultiReader which essentially concatenates our buffer with the original reader. Reads from the MultiReader initially come out of the buffer, and when that's exhausted, continue from the original reader. We return the MultiReader to the caller along with the ClientHelloInfo. When the caller reads from the MultiReader it will see a full TLS connection stream, starting with the Client Hello.

Now we just need to implement readClientHello. We could open up the TLS RFCs and learn how to parse a Client Hello record, but it turns out we can let crypto/tls do the work for us, thanks to a callback function in tls.Config called GetConfigForClient:

// GetConfigForClient, if not nil, is called after a ClientHello is
// received from a client.
GetConfigForClient func(*ClientHelloInfo) (*Config, error) // Go 1.8

Roughly, what we need to do is create a TLS server-side connection with a GetConfigForClient callback that saves the ClientHelloInfo passed to it. However, creating a TLS connection requires a full-blown net.Conn, and readClientHello is passed merely an io.Reader. So let's create a type, readOnlyConn, which wraps an io.Reader and satisfies the net.Conn interface:

type readOnlyConn struct {
        reader io.Reader
}
func (conn readOnlyConn) Read(p []byte) (int, error)         { return conn.reader.Read(p) }
func (conn readOnlyConn) Write(p []byte) (int, error)        { return 0, io.ErrClosedPipe }
func (conn readOnlyConn) Close() error                       { return nil }
func (conn readOnlyConn) LocalAddr() net.Addr                { return nil }
func (conn readOnlyConn) RemoteAddr() net.Addr               { return nil }
func (conn readOnlyConn) SetDeadline(t time.Time) error      { return nil }
func (conn readOnlyConn) SetReadDeadline(t time.Time) error  { return nil }
func (conn readOnlyConn) SetWriteDeadline(t time.Time) error { return nil }

readOnlyConn forwards reads to the reader and simulates a broken pipe when written to (as if the client closed the connection before the server could reply). All other operations are a no-op.

Now we're ready to write readClientHello:

func readClientHello(reader io.Reader) (*tls.ClientHelloInfo, error) {
        var hello *tls.ClientHelloInfo

        err := tls.Server(readOnlyConn{reader: reader}, &tls.Config{
                GetConfigForClient: func(argHello *tls.ClientHelloInfo) (*tls.Config, error) {
                        hello = new(tls.ClientHelloInfo)
                        *hello = *argHello
                        return nil, nil
                },
        }).Handshake()

        if hello == nil {
                return nil, err
        }

        return hello, nil
}

Note that Handshake always fails because the readOnlyConn is not a real connection. As long as the Client Hello is successfully read, the failure should only happen after GetConfigForClient is called, so we only care about the error if hello was never set.

Let's put everything together to write the full handleConnection function. I've added deadlines (thanks, Filippo!) and a check that the SNI value ends with .internal.example.com to prevent this from being used as an open proxy. When I deploy this, I will use the DNS suffix of my home network.

func handleConnection(clientConn net.Conn) {
	defer clientConn.Close()

	if err := clientConn.SetReadDeadline(time.Now().Add(5 * time.Second)); err != nil {
		log.Print(err)
		return
	}

	clientHello, clientReader, err := peekClientHello(clientConn)
	if err != nil {
		log.Print(err)
		return
	}

	if err := clientConn.SetReadDeadline(time.Time{}); err != nil {
		log.Print(err)
		return
	}

	if !strings.HasSuffix(clientHello.ServerName, ".internal.example.com") {
		log.Print("Blocking connection to unauthorized backend")
		return
	}

	backendConn, err := net.DialTimeout("tcp", net.JoinHostPort(clientHello.ServerName, "443"), 5*time.Second)
	if err != nil {
		log.Print(err)
		return
	}
	defer backendConn.Close()

        var wg sync.WaitGroup
        wg.Add(2)

        go func() {
                io.Copy(clientConn, backendConn)
                clientConn.(*net.TCPConn).CloseWrite()
                wg.Done()
        }()
        go func() {
                io.Copy(backendConn, clientReader)
                backendConn.(*net.TCPConn).CloseWrite()
                wg.Done()
        }()

        wg.Wait()
}

Here's the complete Go source code - just 115 lines! (Not counting copyright legalese)

Comments

Older Posts