What Did I Learn from Running the “Reflections on Trusting Trust” Compiler?
Revisiting Ken Thompson’s sourceless backdoor
In October 1983, Dennis Ritchie and Ken Thompson received the Turing Award for their work on Unix. Thompson’s lecture, reprinted in Communications of the ACM under the title “Reflections on Trusting Trust,”5 explained in three steps how to modify a C compiler binary to insert a backdoor when compiling a target program, leaving no trace in any source code. This article revisits that backdoored compiler, presenting the original code Thompson wrote more than 50 years ago. First, a brief review of Thompson’s three steps.
Step 1: Write a Self-Reproducing Program
Step 1 is to write a program that prints its own source code. Although such programs were not widely known in 1975, they are now known in computing as quines and were popularized by Douglas Hofstadter in his book Gödel, Escher, Bach.4 Here is a quine in Python:6
s=' s=%r;print(s%%s) ';print(s%s)
And here is a slightly less cryptic quine in Go:
package main
func main() { print(q + "\x60" + q + "\x60") }
var q = `package main
func main() { print(q + "\x60" + q + "\x60") }
var q = `
The general idea of the solution is to put the text of the program into a string literal, with some kind of placeholder where the string itself should be repeated. Then the program prints the string literal, substituting that same literal for the placeholder. In the Python version, the placeholder is %r; in the Go version, the placeholder is implicit at the end of the string. Quines can be written in essentially any programming language and even in some file formats, such as zip files.3
Step 2: Compilers Learn
Step 2 is to notice that when a compiler compiles itself, important details can persist in the compiler binary without being present in the source code. Thompson gives the example of the numeric values of escape sequences in C strings. You can imagine a compiler containing code like this during the processing of escaped string literals:
c = next();
if(c == '\\') {
c = next();
if(c == 'n')
c = '\n';
}
That code is responsible for processing the two-character sequence \n in a string literal and turning it into a corresponding byte value, specifically '\n'. But that’s a circular definition, and the first time you write code like that, it won’t compile. So, instead, you write c = 10, you compile and install the compiler, and then you can change the code to c = '\n'. The compiler has “learned” the value of '\n', but that value appears only in the compiler binary, not in the source code.
Step 3: Learn a Backdoor
Step 3 is to put these together to help the compiler “learn” to miscompile a target program (login in the lecture). It is fairly straightforward to add nefarious code to a compiler to recognize and modify a particular input program, but that nefarious code would be easy to find if the compiler source were inspected. Instead, we can go deeper, making two changes to the compiler:
• Recognize login and insert the backdoor.
• Recognize the compiler itself and insert the code for these two changes.
The “insert the code for these two changes” step requires being able to write a self-reproducing program: The code must reproduce itself into the new compiler binary. At this point, the compiler binary has “learned” the miscompilation steps, and the clean source code can be restored.
Running the Code
Thompson saved the code for the backdoor in a file named nih.a, a cryptic name for a cryptic program. Today, .a files are normally archives containing compiler object files, but this one contains two source files. The code applies cleanly to the C compiler from the Research Unix Sixth Edition (V6). Let’s run the code and look at what it does. (If you want to type the commands yourself, you can follow along by running Sixth Edition Unix in an online PDP11 simulator.1)
Reflections on “Reflections” and Exercises
When Thompson sent me nih.a and I got it running, my immediate reaction was disbelief at the size of the change: 99 lines of code, plus a 20-line shell script. If you already know how to make a program print itself, the biggest surprise is that there are no surprises! It’s one thing to say, “I know how to do it in theory,” and quite another to see how small and straightforward the backdoor is in practice. Seeing it run and seeing how tiny it is really drives home how easy it would be to make a change like this and how important it is to build from trusted sources using trusted tools. I don’t say any of this to diminish Thompson’s doing it in the first place: It seems easy because he did it and explained it to us. But it’s still very little code for an extremely serious outcome.
Thompson reports that the backdoor never left AT&T, but a version did get installed on the Unix system, used by the Programmer’s Workbench group. That version was discovered because the compiler grew on repeated recompilation.
Exercise: Reintroduce the bug, making nihstr grow by one byte on each recompilation.
Even seeing the code run in the V6 simulator, it can be easy to mentally dismiss this kind of backdoor as an old problem.
Exercise: Add this backdoor to a modern compiler: Change the compiler to compile the usual “hello, world” program to print “backdoored!” instead, and then cover your tracks.
An important advancement of the past couple of decades is that there is a defense against this backdoor, which is to build the compiler source using two different compilers, as shown in figure 1.
Figure 1. Compiler source built using two different compilers
Specifically, suppose you have the suspect binary—compiler 1—and its source code. First, compile that source code with a trusted second compiler, compiler 2, producing compiler 2.1. If everything is on the up and up, compiler 1 and compiler 2.1 should be semantically equivalent, even though they will be very different at the binary level, since they were generated by different compilers. Also, compiler 2.1 cannot contain a binary-only backdoor inserted by compiler 1, since it wasn’t compiled with that compiler. Now compile the source code again with both compiler 1 and compiler 2.1. If they really are semantically equivalent, then the outputs, compilers 1.1 and 2.1.1, should be bit-for-bit identical. If that’s true, then you’ve established that compiler 1 does not insert any backdoors when compiling itself.
The great thing about this process is that you don’t even need to know which of compiler 1 and 2 might be backdoored. If compilers 1.1 and 2.1.1 are identical, then they’re both either clean or backdoored the same way. If compilers 1 and 2 are independent implementations from independent sources, the chance of both being backdoored the same way is far less likely than the chance of compiler 1 being backdoored. We’ve bootstrapped trust in compiler 1 by comparing it against compiler 2, and vice versa. This approach is called diverse double-compiling.7
Exercise: Use diverse double-compiling to detect the backdoor you added in the previous exercise.
This code can be dated to some time in the one-year period from June 1974 to June 1975, probably early 1975, making it at least 50 years old and almost certainly the oldest code ever published in Queue. Thompson kept it for all that time and then shared it with me in September 2023. (I asked for it after he mentioned during a public talk that he was waiting for someone to ask!) An earlier version of this article, which appeared on my personal blog, contains detailed references for the timeline and deployment of the backdoor as well as solutions to these exercises.2
Russ Cox is a distinguished engineer at Google, where he had the good fortune to work with Ken Thompson on the Go programming language and its compiler.
References
1. Cox, R. Research Unix Sixth Edition (WASM). research!rsc, 2023; https://research.swtch.com/v6.
2. Cox, R. Running the “Reflections on Trusting Trust” compiler. research!rsc, 2023; https://research.swtch.com/nih.
3. Cox, R. Zip files all the way down. research!rsc, 2010; https://research.swtch.com/zip.
4. Hofstadter, D. Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, New York, NY 1979.
5. Thompson, K. Reflections on trusting trust. Communications of the ACM 27, 8 1984, 761–763; https://tinyurl.com/y3h5qfz4.
6. Toal, R. Quine programs; https://cs.lmu.edu/~ray/notes/quineprograms/.
7. Wheeler, D. Fully countering trusting trust through diverse double-compiling. Ph.D. thesis, George Mason University, 2009; https://dwheeler.com/trusting-trust/.












