LibreSSL's PRNG is Unsafe on Linux [Update: LibreSSL fork fix]

July 13, 2014

The first version of LibreSSL portable, 2.0.0, was released a few days ago (followed soon after by 2.0.1). Despite the 2.0.x version numbers, these are only preview releases and shouldn't be used in production yet, but have been released to solicit testing and feedback. After testing and examining the codebase, my feedback is that the LibreSSL PRNG is not robust on Linux and is less safe than the OpenSSL PRNG that it replaced.

Consider a test program, fork_rand. When linked with OpenSSL, two different calls to RAND_bytes return different data, as expected:

$ cc -o fork_rand fork_rand.c -lcrypto
$ ./fork_rand
Grandparent (PID = 2735) random bytes = f05a5e107f5ec880adaeead26cfff164e778bab8e5a44bdf521e1445a5758595
Grandchild (PID = 2735) random bytes = 03688e9834f1c020765c8c5ed2e7a50cdd324648ca36652523d1d71ec06199de

When the same program is linked with LibreSSL, two different calls to RAND_bytes return the same data, which is a catastrophic failure of the PRNG:

$ cc -o fork_rand fork_rand.c libressl-2.0.1/crypto/.libs/libcrypto.a -lrt
$ ./fork_rand
Grandparent (PID = 2728) random bytes = f5093dc49bc9527d6d8c3864be364368780ae1ed190ca0798bf2d39ced29b88c
Grandchild (PID = 2728) random bytes = f5093dc49bc9527d6d8c3864be364368780ae1ed190ca0798bf2d39ced29b88c

The problem is that LibreSSL provides no way to safely use the PRNG after a fork. Forking and PRNGs are a thorny issue - since fork() creates a nearly-identical clone of the parent process, a PRNG will generate identical output in the parent and child processes unless it is reseeded. LibreSSL attempts to detect when a fork occurs by checking the PID (see line 122). If it differs from the last PID seen by the PRNG, it knows that a fork has occurred and automatically reseeds.
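To make the fork hazard concrete, here is a minimal sketch that uses a toy, non-cryptographic generator (xorshift64) rather than LibreSSL itself. The names toy_next and child_repeats_parent are hypothetical, and folding the PID into the state is only a crude stand-in for a real reseed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy generator (xorshift64). NOT cryptographically secure; illustration only. */
uint64_t toy_next(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/*
 * Forks once and compares the child's next output with the parent's.
 * Returns 1 if they are identical, 0 if they differ, -1 on error.
 * If mix_pid_in_child is nonzero, the child folds its PID into the
 * state first (a crude stand-in for a real reseed).
 */
int child_repeats_parent(int mix_pid_in_child)
{
    uint64_t state = 0x9E3779B97F4A7C15ULL; /* seed shared by parent and child */
    uint64_t from_child = 0;
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: starts with an exact copy of the parent's generator state. */
        close(fds[0]);
        if (mix_pid_in_child)
            state ^= (uint64_t)getpid();
        uint64_t out = toy_next(&state);
        ssize_t w = write(fds[1], &out, sizeof out);
        _exit(w == (ssize_t)sizeof out ? 0 : 1);
    }

    /* Parent: collect the child's output and compare it with our own. */
    close(fds[1]);
    ssize_t r = read(fds[0], &from_child, sizeof from_child);
    close(fds[0]);
    int status;
    if (waitpid(pid, &status, 0) != pid || r != (ssize_t)sizeof from_child)
        return -1;

    return from_child == toy_next(&state);
}
```

Calling child_repeats_parent(0) shows the child repeating the parent's output, while child_repeats_parent(1) shows the two diverging once the child's state is perturbed.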

This works most of the time. Unfortunately, PIDs are typically only 16 bits long and thus wrap around fairly often. And while a process can never have the same PID as its parent, a process can have the same PID as its grandparent. So a program that forks from a fork risks generating the same random data as the grandparent process. This is what happens in the fork_rand program, which repeatedly forks from a fork until it gets the same PID as the grandparent.
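The weakness of a PID-only check can be modelled without forking at all. The sketch below is not LibreSSL's code; struct pid_guard and its two functions are hypothetical names for the bookkeeping such a check needs.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical bookkeeping for a PRNG that detects forks by PID alone. */
struct pid_guard {
    bool  seeded;
    pid_t seeded_pid; /* PID of the process that last (re)seeded */
};

/* Record that the process with PID `self` has just seeded the PRNG. */
void pid_guard_mark(struct pid_guard *g, pid_t self)
{
    g->seeded = true;
    g->seeded_pid = self;
}

/* PID-only check: reseed when unseeded or when the PID has changed. */
bool pid_guard_needs_reseed(const struct pid_guard *g, pid_t self)
{
    return !g->seeded || g->seeded_pid != self;
}
```

Suppose the grandparent seeds as PID 2728 and its child (say, PID 2729) forks again without ever touching the PRNG. If the PIDs have wrapped around and the grandchild is assigned 2728, pid_guard_needs_reseed returns false for it, so the grandchild continues from the grandparent's state.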

OpenSSL faces the same issue. It too attempts to be fork-safe, by mixing the PID into the PRNG's output, which works as long as PIDs don't wrap around. The difference is that OpenSSL provides a way to explicitly reseed the PRNG by calling RAND_poll. LibreSSL, unfortunately, has turned RAND_poll into a no-op (lines 77-81). fork_rand calls RAND_poll after forking, as do all my OpenSSL-using programs in production, which is why fork_rand is safe under OpenSSL but not LibreSSL.

You may think that fork_rand is a contrived example or that it's unlikely in practice for a process to end up with the same PID as its grandparent. You may be right, but for security-critical code this is not a strong enough guarantee. Attackers often find extremely creative ways to manufacture scenarios favorable for attacks, even when those scenarios are unlikely to occur under normal circumstances.

Bad chroot interaction

A separate but related problem is that LibreSSL provides no good way to use the PRNG from a process running inside a chroot jail. Under Linux, the PRNG is seeded by reading from /dev/urandom upon the first use of RAND_bytes. Unfortunately, /dev/urandom usually doesn't exist inside chroot jails. If LibreSSL fails to read entropy from /dev/urandom, it first tries to get random data using the deprecated sysctl syscall, and if that fails (which will start happening once sysctl is finally removed), it falls back to a truly scary-looking function (lines 306-517) that attempts to get entropy from sketchy sources such as the PID, time of day, memory addresses, and other properties of the running process.

OpenSSL is safer for two reasons:

1. If OpenSSL can't open /dev/urandom, RAND_bytes returns an error code. Of course the programmer has to check the return value, which many probably don't, but at least OpenSSL allows a competent programmer to use it securely, unlike LibreSSL which will silently return sketchy entropy to even the most meticulous programmer.
2. OpenSSL allows you to explicitly seed the PRNG by calling RAND_poll, which you can do before entering the chroot jail, avoiding the need to open /dev/urandom once in the jail. Indeed, this is how titus ensures it can use the PRNG from inside its highly-isolated chroot jail. Unfortunately, as discussed above, LibreSSL has turned RAND_poll into a no-op.
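The general idiom of acquiring entropy access before entering the jail can be sketched independently of either library. The helper names entropy_open and entropy_read are hypothetical; the sketch only assumes that /dev/urandom is readable before the chroot.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Open the kernel entropy device. Call this BEFORE chroot(). Returns fd or -1. */
int entropy_open(void)
{
    return open("/dev/urandom", O_RDONLY | O_CLOEXEC);
}

/* Fill buf with len bytes from fd. Returns 0 on success, -1 on failure. */
int entropy_read(int fd, unsigned char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error: report it, never guess */
        }
        if (n == 0)
            return -1;      /* unexpected end of file */
        got += (size_t)n;
    }
    return 0;
}
```

An already-open descriptor remains usable after chroot(), so a program can call entropy_open() at startup, enter an empty jail, and keep calling entropy_read() on the saved descriptor without needing a /dev directory inside the jail.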

What should LibreSSL do?

First, LibreSSL should raise an error if it can't get a good source of entropy. It can do better than OpenSSL by killing the process instead of returning an easily-ignored error code. In fact, there is already a disabled code path in LibreSSL (lines 154-156) that does this. It should be enabled.

Second, LibreSSL should make RAND_poll reseed the PRNG as it does under OpenSSL. This will allow the programmer to guarantee safe and reliable operation after a fork and inside a chroot jail. This is especially important as LibreSSL aims to be a drop-in replacement for OpenSSL. Many properly-written programs have come to rely on OpenSSL's RAND_poll behavior for safe operation, and these programs will become less safe when linked with LibreSSL.

Unfortunately, when I suggested the second change on Hacker News, a LibreSSL developer replied:

The presence or need for a [RAND_poll] function should be considered a serious design flaw.

I agree that in a perfect world, RAND_poll would not be necessary, and that its need is evidence of a design flaw. However, it is evidence of a design flaw not in the cryptographic library, but in the operating system. Unfortunately, Linux provides no reliable way to detect that a process has forked, and exposes entropy via a device file instead of a system call. LibreSSL has to work with what it's given, and on Linux that means RAND_poll is an unfortunate necessity.

Workaround

If the LibreSSL developers don't fix RAND_poll, and you want your code to work safely with both LibreSSL and OpenSSL, then I recommend putting the following code after you fork or before you chroot (i.e. anywhere you would currently need RAND_poll):

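unsigned char c;
/* Explicit reseed: effective under OpenSSL, a no-op under LibreSSL 2.0.x. */
if (RAND_poll() != 1) {
	/* handle error */
}
/* Request one byte so LibreSSL seeds itself and records the current PID. */
if (RAND_bytes(&c, 1) != 1) {
	/* handle error */
}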

In essence, always follow a call to RAND_poll with a request for one random byte. The RAND_bytes call will force LibreSSL to seed the PRNG if it's not already seeded, ~~making it unnecessary to later open /dev/urandom from inside the chroot jail~~. It will also force LibreSSL to update the last seen PID, fixing the grandchild PID issue. (Edit: the LibreSSL PRNG periodically re-opens and re-reads /dev/urandom to mix in additional entropy, so unfortunately this won't avoid the need to open /dev/urandom from inside the chroot jail. However, as long as you have a good initial source of entropy, mixing in the sketchy entropy later isn't terrible.)

I really hope it doesn't come to this. Programming with OpenSSL already requires dodging numerous traps and pitfalls, often by deploying obscure workarounds. The LibreSSL developers, through their well-intended effort to eliminate the pitfall of forgetting to call RAND_poll, have actually created a whole new pitfall with its own obscure workaround.

Update (2014-07-16 03:33 UTC): LibreSSL releases fix for fork issue

LibreSSL has released a fix for the fork issue! (Still no word on the chroot/sketchy entropy issue.) Their fix is to use pthread_atfork to register a callback that reseeds the PRNG when fork() is called. Thankfully, they've made this work without requiring the program to link with -lpthread.

I have mixed feelings about this solution, which was discussed in a sub-thread on Hacker News. The fix is a huge step in the right direction but is not perfect - a program that invokes the clone syscall directly will bypass the atfork handlers (Hacker News commenter colmmacc suggests some legitimate reasons a program might do this). I still wish that LibreSSL would, in addition to implementing this solution, just expose an explicit way for the programmer to reseed the PRNG when unusual circumstances require it. This is particularly important since OpenSSL provides this facility and LibreSSL is meant to be a drop-in OpenSSL replacement.

Finally, though I was critical in this blog post, I really appreciate the work the LibreSSL devs are doing, especially their willingness to solicit feedback from the community and act on it. (I also appreciate their willingness to make LibreSSL work on Linux, which, despite being a Linux user, I will readily admit is lacking in several ways that make a CSPRNG implementation difficult.) Ultimately their work will lead to better security for everyone.

Comments

Reader x on 2014-07-15 at 00:49:

They should just use pthread_atfork() to reseed the RNG at fork...

Or if they really want to check the PID, then they should also check the process creation time.

Andrew Ayer on 2014-07-15 at 01:15:

pthread_atfork() requires linking with libpthread, which a single-threaded program would not normally do. Otherwise, it's not a bad suggestion. Checking process creation time is a very interesting suggestion (not perfect due to clock changes but still pretty darn good). Can it be done without needing to read /proc, which wouldn't exist in a chroot jail?

Still, on top of everything LibreSSL does to automatically detect forks, it should still expose a way to explicitly reseed the PRNG in an OpenSSL-compatible way, since OpenSSL has made guarantees that certain functions will re-seed the PRNG, and there may be some scenarios where even the best automatic fork detection fails (imagine a program calling the clone syscall directly for whatever reason, in which case pthread_atfork handlers won't be called). Since LibreSSL is billed as a drop-in replacement for OpenSSL, you should not be able to write a valid program that's safe under OpenSSL's guarantees but not when linked with LibreSSL.

Reader The Great Forker on 2014-07-15 at 02:19:

proc(5) says that the 22nd field of /proc/pid/stat, starttime, is clock ticks since system boot, so this wouldn't be influenced by clock changes. Looks like the only way to get that value is to read /proc though.
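For illustration, pulling starttime out of the contents of /proc/pid/stat could look roughly like this. The function name parse_starttime is hypothetical, and the sketch parses a string that has already been read, so it does nothing to remove the dependency on /proc.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract field 22 (starttime, in clock ticks since boot) from the
 * contents of /proc/<pid>/stat. Returns 0 on success, -1 if malformed.
 */
int parse_starttime(const char *stat_line, unsigned long long *starttime)
{
    /* Field 2 (comm) is parenthesised and may itself contain spaces
     * or ')', so resume parsing after the LAST ')'. */
    const char *p = strrchr(stat_line, ')');
    if (p == NULL)
        return -1;
    p++;

    /* Skip fields 3 through 21, which are separated by spaces. */
    for (int field = 3; field < 22; field++) {
        p += strspn(p, " ");
        size_t n = strcspn(p, " ");
        if (n == 0)
            return -1;  /* line ended before field 22 */
        p += n;
    }

    /* Field 22 follows; %llu skips the leading space. */
    return sscanf(p, "%llu", starttime) == 1 ? 0 : -1;
}
```

A fork check could store this value next to the PID and treat a change in either as a reason to reseed.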

Anonymous on 2014-07-15 at 08:57:

LibreSSL also deprecated RAND_status() so you can't tell whether the PRNG was seeded properly from urandom or the "truly scary function"

Anonymous on 2014-07-15 at 12:07:

The creators of LibreSSL have stated that the target platform is going to be OpenBSD, which does things quite differently from Linux. Although I haven't had a chance to test it myself, I would hazard a guess that these issues don't exist on that platform.

They indicated that a lot of work would be required to port LibreSSL to Linux due to the inherent lack of certain secure functions within Linux.

Anonymous on 2014-07-15 at 12:15:

And you obviously haven't heard of LibreSSL Portable being released recently?

http://www.libressl.org/

Anonymous on 2014-07-15 at 12:21:

They haven't indicated that a lot of work is required to port LibreSSL to Linux. Most of the secure functions are actually pretty easy; they're entirely implemented in userspace and don't rely on any system specific or third party functionality. So it is a matter of bundling a copy of these functions with the portable distribution, and adding the appropriate checks in the configure system.

But you cannot get around the fact that Linux does not provide a reliable library-wrappable way to get entropy, except for the sysctl which is apparently deprecated. So they try to do what they can. Is it good enough? Hard to say, but it cannot really be worse than what OpenSSL did (if we for a moment ignore the bug with PID-based fork checking).

Reader Your Mom on 2014-07-15 at 09:16:

You are right in this analysis and the LibreSSL developers are wrong, if not outright frighteningly incompetent in these matters. Much history and exploits show the necessity for making RAND_poll forcibly reseed with a good source of entropy. Their "truly scary function" is cargo cult cryptography.

Reader no. 6 on 2014-07-15 at 10:43:

RAND_poll is rarely used at all, and correct use is very uncommon. Sure the API exists but looking in package source in Debian and OpenBSD I only found 5 programs using it safely: Net::SSLeay, libevent sample code (also present in the copy of libevent in mozilla trees), Tor, lldpd, dnscrypt-proxy.

A few more use it incorrectly i.e. without checking the return code:

virtuoso, libjingle (some correct checks, some not), uim, kopete, x11vnc/ssvnc, libofetion, libevhtp, libtango, Crypt::SMIME, stone

But then given that OpenSSL doesn't document it (surprise!) and itself uses it unchecked (in the stub for RAND_screen), who can blame them?

I think people are missing the point of the "truly scary function". Read the comments, especially lines 130-153. This is more a case of "if your kernel doesn't provide a reliable means to fetch entropy, we'll try what we can, but there really isn't much we can do".

In my opinion, what is needed is an intent to continue maintaining the sysctl interface until an alternative safe kernel interface is available. This is far simpler than retrofitting the thousands of other programs that use libssl/libcrypto with RAND_poll calls.

Andrew Ayer on 2014-07-15 at 14:17:

That's nice research about RAND_poll. It's possible other programs are using another way of reseeding the PRNG, such as RAND_add. If any program forks without reseeding the OpenSSL PRNG, they are unsafe under OpenSSL and are unsafe under LibreSSL. What I object to are security-conscious programs (such as Tor) doing it right under OpenSSL now being unsafe when linked with an allegedly drop in replacement for OpenSSL. Nothing excuses that even if only a handful of programs are currently safe.

Read the comments, especially lines 130-153. This is more a case of "if your kernel doesn't provide a reliable means to fetch entropy, we'll try what we can, but there really isn't much we can do".

Line 142 suggests a much better alternative to using sketchy entropy: "Could raise(SIGKILL) resulting in silent program termination." They don't justify why silently returning sketchy entropy is better than SIGKILLing the process, except with: "This code path exists to bring light to the issue that Linux does not provide a failsafe API for entropy collection." Well, I agree that's an issue, but I don't think that's a good reason to potentially compromise the entropy gathering of a security critical library.

Also note that even if Linux retains sysctl or provides a safe alternative, it only fixes the chroot issue - forking will still be a problem unless LibreSSL rips out their userspace PRNG entirely and just passes RAND_bytes calls straight through to the syscall.

Anonymous on 2014-07-15 at 14:55:

The rationale for not aborting is there: unsafe core files. Is it worse than sketchy entropy? I can't tell. It's a case by case thing really. Also, the sketchy looking code isn't necessarily all that bad. Keep this in mind people, rdrand is a rather recent addition and prior to that, most common systems weren't expected to have a hardware random generator. So we devised lots of ways to collect entropy from timing and supposedly unpredictable events. The in-kernel entropy generation has always seemed sketchy, just like the fallback function. Ugly it is, but not necessarily bad. Also, some of the kernel entropy is likely to be in the data the fallback uses; see AT_RANDOM, ASLR, PIE, etc. Hopefully though Linux will get a getentropy() call and we can get rid of the sketchy looking code. I think everyone should be happy with that.

Andrew Ayer on 2014-07-15 at 15:17:

Unsafe core files are only an issue if you use SIGABRT. SIGKILL leaves no core. While it's true that there's a certain degree of sketchiness inherent to any entropy gathering, the kernel has access to far better sources of entropy, like network traffic, disk latencies, and mouse movements, and bolsters it by mixing in a random seed that's carried over across reboots. getentropy_fallback does an impressive job considering the circumstances, but it's still not nearly as good as the kernel, and loses quite a bit of its effectiveness if a program is statically linked and addresses of functions are constant.

Reader henning on 2014-07-15 at 13:06:

We really want to see linux provide the getentropy() syscall, which fixes all the mentioned issues. Requiring the consumers to fiddle with the PRNG is not a sustainable "solution".

/dev/*random has way more issues than just chroot. For one - what do you do in the fd exhaustion case?

And yes, there are no such issues natively on OpenBSD, since we do have getentropy and our PIDs are random, plus they don't get recycled quickly.

Andrew Ayer on 2014-07-15 at 13:55:

Having Linux fix the PRNG situation would be the best solution. As I said in the post, this mess is evidence of a design flaw in Linux. However, LibreSSL has to do the best with what it's given.

Reader mar77i on 2014-07-15 at 13:35:

/dev/random isn't that much less safe than /dev/urandom.

http://www.2uo.de/myths-about-urandom/

just to reiterate this.

Andrew Ayer on 2014-07-15 at 14:09:

The /dev/random vs /dev/urandom distinction is completely unrelated to the issue at hand here.

Reader Richard Yao on 2014-07-15 at 19:16:

There is no such thing as a "chroot jail" on Linux. You mean a container using a mix of namespaces and device cgroups. I see no reason why /dev/urandom cannot be made available inside containers:

https://www.kernel.org/doc/Documentation/cgroups/devices.txt

If it is not there, the container is likely misconfigured.

Anonymous on 2014-07-15 at 21:15:

You are mistaken. Linux does have chroot(2) and therefore "chroot jails". The "chroot jail" concept dates back at least to SVr4 and is older than namespaces and cgroups. In fact, chroot is probably the original Unix "container" mechanism. The problems with chroot are well-known, in particular that it only affects filesystem access and that root can easily escape from a chroot jail. Other problems come from the isolation provided by chroot--if /dev isn't bind-mounted inside the chroot jail or a duplicate /dev/urandom created inside the jail, a jailed process does not have access to it. Similarly, data in /proc is inaccessible unless /proc is mounted inside the jail. Most programs that use chroot for security intentionally lock themselves into a very limited environment, thus the use case for a getentropy syscall.

Andrew Ayer on 2014-07-16 at 00:13:

As another commenter pointed out, Linux, as do many Unixes, has chroot(). If you're going to start a program's execution from inside a chroot, it makes sense to set up a basic directory structure. However, a very common privilege separation technique is to start execution outside the jail, open all the resources you need, and then chroot into a completely empty unwritable directory. OpenSSL has the API to make this work (as do other crypto libraries such as libsodium); LibreSSL does not.

Reader Jayson Vantuyl on 2014-07-15 at 22:09:

Memory locks are released on fork, so LibreSSL could lock some single page and watch for it being unlocked to trigger a reseed.

Alternately, what about using the combination of getpid(2) and times(2)? CPU times reset on fork. It's not 100% safe, but it would be pretty good.

Andrew Ayer on 2014-07-16 at 00:30:

Could you clarify what you mean by "memory locks" in this context? As for checking times(), you can always make the fork detection better, and that's a good thing, but if it's not 100% you still need to provide an explicit way for the programmer to reseed.

Reader Nix on 2014-07-16 at 00:50:

By 'memory locks', Jayson means mlock(). Unfortunately this is not really enough: you can mlock(), sure, but detecting that the page is still locked in is hard. Neither munlock() nor mlock() error if asked to act on an already-locked page, and the only way to see if it is still locked (or if anything is still locked) or even paged in is to consult things in /proc/$pid/ -- and if you had access to that, you could reseed more conventionally.

I don't see how you could make this work.

Andrew Ayer on 2014-07-16 at 01:23:

I had a feeling it was mlock(), but I couldn't think of a good way to check if a page is locked. OpenBSD provides a really nifty way to detect forks: you can use the minherit() syscall with the INHERIT_ZERO argument to specify that a page should be replaced with zeros when forking. This experience has been a real eye opener to how an operating system can make it easier to write a secure crypto library.

Reader David Johnston on 2014-07-16 at 22:48:

A library RNG should probably accept its place in the universe as a thing that will get forked and rather than trying to second guess the system it should instead either mix in new entropy on every call or should direct the programmer to use more effective resources (operating system RNG service, instruction set RNG, etc.).

I don't think a linkable library is really the right place for an RNG. It's fine for providing a conservative way to access system entropy, but not fine for operating in isolation as a CSPRNG.

Reader Kenny on 2014-07-15 at 23:00:

pid_t is not 16-bits, though often the default /proc/sys/kernel/pid_max is set to a value that fits in an unsigned 16-bit number (likely to keep ancient programs working).

The value can be raised to a little more than 4 million, and on most of the systems I influence, this number is raised as otherwise busy systems run out of id's. With the limit set much higher, the chance of pid reuse is lessened, but not eliminated.

Andrew Ayer on 2014-07-16 at 00:14:

You're correct - thanks for the clarification. Point still stands that PIDs are reused, and by default it's effectively 16 bits.

Reader Olivier Mengué on 2014-07-17 at 08:17:

My pid_max is 32768, so this is effectively 15 bits.

Reader Joe on 2014-07-15 at 23:43:

This is the response of one of the OpenBSD developers on twitter:

Miod in the Middle ‏@MiodVallat 4h

People complaining about #LibreSSL PRNG ought to get their OS fixed to provide a decent entropy source instead. `Must be that tall to ride'

Andrew Ayer on 2014-07-16 at 00:15:

As I said in my article, this mess is evidence of a design flaw in Linux, so I kind of agree with the LibreSSL devs. But if they want to provide a portable version of LibreSSL, they need to provide the facilities to make it safe. OpenSSL does.

Reader Antoine on 2014-07-16 at 00:18:

"""It can do better than OpenSSL by killing the process instead of returning an easily-ignored error code."""

Hello? Seriously?? Please, this is not the 1980s anymore. Killing a process on error is a major PITA to anyone writing library code, or bindings for higher-level languages (which usually have a proper exception propagation mechanism, meaning you can't ignore an error return by mistake, you know).

Frankly, anyone who thinks that "killing a process" is a legitimate response for non-catastrophic failures should be banned from writing anything else than throwaway scripts and enterprise frameworks.

Andrew Ayer on 2014-07-16 at 00:25:

From my own selfish perspective, I completely agree with you, since I'm a C++ programmer and I wrap calls to RAND_bytes in a function that checks the return value and throws an exception if it fails. I don't want my programs raising SIGKILL on error. However, I'm all too aware of how cryptographic libraries are used in practice, and since OpenSSL/LibreSSL is a C library, it's all too likely that programmers are going to ignore the return value of RAND_bytes. Fortunately, a missing /dev/urandom is a pretty exceptional error so raising SIGKILL is not too unreasonable.

Reader John Spencer on 2014-07-16 at 13:50:

it's not just a missing /dev/urandom, but also resource exhaustion (for example out-of-fd's) that can cause open("/dev/urandom", ...) to fail. an attacker may find ways to make your application run out of fds (for example by creating many connections).

aborting the program from a library however is very bad and precludes usage of said library in a robust application. they should rather just return an error code when getentropy() fails, so the library user can handle the error gracefully (and if he doesn't check the return value, it's neither the library's fault nor responsibility). doing an abort/kill is only acceptable if the API is misdesigned in a way that prevents checking for such an error.

Reader Jacques on 2014-07-16 at 21:05:

Some excellent catches here. The removal, or rather conversion to noop, for RAND_poll() is especially bizarre. The LibreSSL folks did a disservice to themselves and the integrity of the project by attempting to downplay the issues you've exposed.

Anonymous on 2014-07-17 at 04:40:

If an application knows it is going to chroot its child away from accessing /dev/urandom, why doesn't the parent take responsibility to provide a named pipe in the chroot environment?

Part of the problem also seems to be that to be a drop-in replacement to OpenSSL requires leaving the API the way it is. Hopefully if LibreSSL gains popularity, they will be able to revise the API to include such things as being able to specify the entropy quality where the calling app decides if a scary function is an acceptable source of entropy or not.

Andrew Ayer on 2014-07-17 at 18:28:

Yes, part of the problem is that on one hand, the LibreSSL developers are trying to make LibreSSL a drop-in replacement for OpenSSL, but on the other hand they want to ignore parts of the API that they don't like. This is a problem even if they are right about those API parts being bad.

Still, even if LibreSSL were being designed from scratch, I'd still want it to provide a way to open /dev/urandom in advance. A long-standing privilege separation idiom is to start execution outside of the chroot, open needed resources, and then chroot into a completely empty directory. We shouldn't change the way we do chroot jails just because LibreSSL refuses to provide an API to make it possible. Even libsodium, a modern crypto library that is frequently lauded for its good design, provides an API to open /dev/urandom in advance. Fortunately, LibreSSL's API deficiency is easily worked around by just asking for 1 byte of random data.

Reader Major Variola on 2014-07-17 at 16:42:

Nice find actually, I appreciated the explanation of the PID / grandparent trick. Thanks!

Reader Daurnimator on 2015-06-01 at 04:38:

http://daurnimator.com/post/120415954844/linux-fork-detection-using-thread-specific

I was reading Linux’s keys.txt and realised that this provided a place to store process or even thread local data that didn’t depend on libc.

I wrote up an example to show how you might accomplish pid detection on Linux utilising the kernel’s thread-specific keyring.

https://gist.github.com/daurnimator/dfdbaef3c255bdc11531

Reader zagam on 2016-06-15 at 04:34:

Linux is correct in that it uses a device file. This is the Unix way. To control access to resources everything should be a device file.

You can control who dumps entropy with group write access. However, it looks like anyone can in Debian:

crw-rw-rw- 1 root root 1, 9 Jun 15 12:17 /dev/urandom

The ioctl(2) is just a side band of those files.

Need to bind mount more for the chroot or use containers.

Andrew Ayer on 2016-06-19 at 15:26:

Not all information on Unix is retrieved using a device file. For example, getting the time of day is done with a system call. Getting entropy is more analogous to getting the time of day than accessing a device: it's a core function needed by many applications, and there is no need to ever restrict access to it (running out of entropy is a myth created by the Linux man page for random(4)). Combined with the security problems with using a device file for such a security-critical function, you have a compelling case for using a system call for entropy.

Andrew Ayer
