Skip to Content [alt-c]

October 29, 2025

I'm Independently Verifying Go's Reproducible Builds

When you try to compile a Go module that requires a newer version of the Go toolchain than the one you have installed, the go command automatically downloads the newer toolchain and uses it for compiling the module. (And only that module; your system's go installation is not replaced.) This useful feature was introduced in Go 1.21 and has let me quickly adopt new Go features in my open source projects without inconveniencing people with older versions of Go.

However, the idea of downloading a binary and executing it on demand makes a lot of people uncomfortable. It feels like such an easy vector for a supply chain attack, where Google, or an attacker who has compromised Google or gotten a misissued SSL certificate, could deliver a malicious binary. Many developers are more comfortable getting Go from their Linux distribution, or compiling it from source themselves.

To address these concerns, the Go project did two things:

  1. They made it so every version of Go starting with 1.21 could be easily reproduced from its source code. Every time you compile a Go toolchain, it produces the exact same Zip archive, byte-for-byte, regardless of the current time, your operating system, your architecture, or other aspects of your environment (such as the directory from which you run the build).

  2. They started publishing the checksum of every toolchain Zip archive in a public transparency log called the Go Checksum Database. The go command verifies that the checksum of a downloaded toolchain is published in the Checksum Database for anyone to see.

These measures mean that:

  1. You can be confident that the binaries downloaded and executed by the go command are the exact same binaries you would have gotten had you built the toolchain from source yourself. If there's a backdoor, the backdoor has to be in the source code.

  2. You can be confident that the binaries downloaded and executed by the go command are the same binaries that everyone else is downloading. If there's a backdoor, it has to be served to the whole world, making it easier to detect.

But these measures mean nothing if no one is checking that the binaries are reproducible, or that the Checksum Database isn't presenting inconsistent information to different clients. Although Google checks reproducibility and publishes a report, this doesn't help if you think Google might try to slip in a backdoor themselves. There needs to be an independent third party doing the checks.

Why not me? I was involved in Debian's Reproducible Builds project back in the day and developed some of the core tooling used to make Debian packages reproducible (strip-nondeterminism and disorderfs). I also have extensive experience monitoring Certificate Transparency logs and have detected misbehavior by numerous logs since 2017. And I do not work for Google (though I have eaten their food).

In fact, I've been quietly operating an auditor for the Go Checksum Database since 2020 called Source Spotter (à la Cert Spotter). Source Spotter monitors the Checksum Database, making sure it doesn't present inconsistent information or publish more than one checksum for a given module and version. I decided to extend Source Spotter to also verify toolchain reproducibility.

The Checksum Database was originally intended for recording the checksums of Go modules. Essentially, it's a verifiable, append-only log of records which say that a particular version (e.g. v0.4.0) of a module (e.g. src.agwa.name/snid) has a particular SHA-256 hash. Go repurposed it for recording toolchain checksums. Toolchain records have the pseudo-module golang.org/toolchain and versions that look like v0.0.1-goVERSION.GOOS-GOARCH. For example, the Go1.24.2 toolchain for linux/amd64 has the module version v0.0.1-go1.24.2.linux-amd64.

When Source Spotter sees a new version of the golang.org/toolchain pseudo-module, it downloads the corresponding source code, builds it in an AWS Lambda function by running make.bash -distpack, and compares the checksum of the resulting Zip file to the checksum published in the Checksum Database. Any mismatches are published on a webpage and in an Atom feed which I monitor.

So far, Source Spotter has successfully reproduced every toolchain since Go 1.21.0, for every architecture and operating system. As of publication time, that's 2,672 toolchains!

Bootstrap Toolchains

Since the Go toolchain is written in Go, building it requires an earlier version of the Go toolchain to be installed already.

When reproducing Go 1.21, 1.22, and 1.23, Source Spotter uses a Go 1.20.14 toolchain that I built from source. I started by building Go 1.4.3 using a C compiler. I used Go 1.4.3 to build Go 1.17.13, which I used to build Go 1.20.14. To mitigate Trusting Trust attacks, I repeated this process on both Debian and Amazon Linux using both GCC and Clang for the Go 1.4 build. I got the exact same bytes every time, which I believe makes a compiler backdoor vanishingly unlikely. The scripts I used for this are open source.

When reproducing Go 1.24 or higher, Source Spotter uses a binary toolchain downloaded from the Go module proxy that it previously verified as being reproducible from source.

Problems Encountered

Compared to reproducing a typical Debian package, it was really easy to reproduce the same bytes when building the Go toolchains. Nevertheless, there were some bumps along the way:

First, the Darwin (macOS) toolchains published by Google contain signatures produced by Google's private key. Obviously, Source Spotter can't reproduce these. Instead, Source Spotter has to download the toolchain (making sure it matches the checksum published in the Checksum Database) and strip the signatures to produce a new checksum that is verified against the reproduced toolchain. I reused code written by Google to strip the signatures and I honestly have no clue what it's doing and whether it could potentially strip a backdoor. A review from someone versed in Darwin binaries would be very helpful!

Second, to reproduce the linux-arm toolchains, Source Spotter has to set GOARM=6 in the environment... except when reproducing Go 1.21.0, which Google accidentally built using GOARM=7. I don't understand why cmd/dist (the tool used to build the toolchain) doesn't set this environment variable along with the many other environment variables it sets.

Finally, the Checksum Database contains a toolchain for Go 1.9.2rc2, which is not a valid version number. It turns out this version was released by mistake. To avoid raising an error for an invalid version number, Source Spotter has to special case it. Not a huge deal, but I found it interesting because it demonstrates one of the downsides of transparency logs: you can't fix or remove entries that were added by mistake!

Source Code Transparency

Although the toolchain binaries are published in the Checksum Database, the source code is not. This means Google could serve Source Spotter, and only Source Spotter, source code which contains a backdoor. To mitigate this, Source Spotter publishes the checksums of every source tarball it builds.

Filippo suggested that Source Spotter build from Go's Git repository and publish the Git commit IDs instead, since lots of Go developers have the Go Git repository checked out and it would be relatively easy for them to compare the state of their repos against what Source Spotter has seen. Regrettably, Git commit IDs are SHA-1, but this is mitigated by Git's use of Marc Stevens' collision detection, so the benefits may be worth the risk. I think building from Git is a good idea, and to bootstrap it, Filippo used Magic Wormhole to send me the output of git show-ref --tags from his repo while we were both at the Transparency.dev Summit last week.

Ultimately, I would like to see the Go project publish source tarballs in the Checksum Database.

Conclusion

Thanks to Go's Checksum Database and reproducible toolchains, Go developers get the usability benefits of a centralized package repository and binary toolchains without sacrificing the security benefits of decentralized packages and building from source. The Go team deserves enormous credit for making this a reality, particularly for building a system that is not too hard for a third party to verify. They've raised the bar, and I hope other language and package ecosystems can learn from what they've done.

Comments

No comments yet.

Post a Comment

Your comment will be public. To contact me privately, email me. Please keep your comment polite, on-topic, and comprehensible. Your comment may be held for moderation before being published.

(Optional; will be published)

(Optional; will not be published)

(Optional; will be published)

  • Blank lines separate paragraphs.
  • Lines starting with > are indented as block quotes.
  • Lines starting with two spaces are reproduced verbatim (good for code).
  • Text surrounded by *asterisks* is italicized.
  • Text surrounded by `back ticks` is monospaced.
  • URLs are turned into links.
  • Use the Preview button to check your formatting.