Trusting Software and Trusting Data
Establishing total information assurance for computer programs is difficult. Software certification & accreditation (C&A) is necessary and critically important, but it is also a costly and time-consuming process. The Department of the Navy spends immense amounts of labor, funds, and personnel time to certify and accredit software. Overhead includes significant “opportunity cost” of people who must live with tedious workarounds and reduced capabilities while waiting for new software programs to be approved.
For example, C&A prior to installing and running a new application on the Navy Marine Corps Intranet (NMCI) has typically cost sponsoring commands many tens of thousands of dollars - and many months - to accomplish. The actual work is highly specialized and often performed by contractors, adding further distance and overhead to the overall process. Once complete (if successful), adding future enhancements and correcting bugs becomes similarly onerous, since follow-on codebase changes must also be carefully examined and tested in order to ensure that new vulnerabilities (either malicious or unintended) have not been introduced.
History can be instructive – some lessons are timeless. Here is one important lesson about the limits of software assurance that often seems to be forgotten.
The Turing Award is considered the equivalent of the Nobel Prize for computer science. Since 1966 it has been awarded annually, with each recipient giving an eagerly anticipated talk describing their work. The Turing Award Lectures are essential reading and show the evolving foundations of computer science.
In 1983, Dennis Ritchie and Ken Thompson jointly received the Turing Award for their development of generic operating systems theory, and specifically for the implementation of the UNIX operating system. Ken Thompson’s lecture was Reflections on Trusting Trust, with the subtitle “To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.” This talk can still surprise: he describes source code that looks like it does one thing, but actually does something quite different. Here are key excerpts, quoted from the original.
- Stage I. In college, before video games, we would amuse ourselves by posing programming exercises. One of the favorites was to write the shortest self-reproducing program. [...]
- Stage II. The C compiler is written in C. What I am about to describe is one of many "chicken and egg" problems that arise when compilers are written in their own language. [...] shows a minimalist self-replicating code algorithm [...] This is a deep concept. It is as close to a "learning" program as I have seen. You simply tell it once, then you can use this self-referencing definition.
- Stage III. [...] Figure 6 shows a simple modification to the compiler that will deliberately miscompile source whenever a particular pattern is matched. If this were not deliberate, it would be called a compiler "bug." Since it is deliberate, it should be called a "Trojan horse." [...]
- The actual bug I planted in the compiler would match code in the UNIX "login" command. The replacement code would miscompile the login command so that it would accept either the intended encrypted password or a particular known password. Thus if this code were installed in binary and the binary were used to compile the login command, I could log into that system as any user.
- Moral. The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code. In demonstrating the possibility of this kind of attack, I picked on the C compiler. I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well-installed microcode bug will be almost impossible to detect.
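Stage I's exercise, the self-reproducing program, is worth seeing concretely, since self-replication is the mechanism that lets Thompson's Trojan horse persist in the compiler binary even after the malicious source is removed. Here is a minimal sketch in Python (a hypothetical illustration for this article; Thompson's original was in C):

```python
# A minimal self-reproducing program ("quine"): when run, it prints
# its own source code exactly. The trick is a template string that is
# formatted with its own representation.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running this program emits the two lines above verbatim. A compiler that carries Thompson's Trojan horse applies the same trick at a larger scale: the compiled compiler recognizes its own source being compiled and re-inserts the attack code, so no trace remains in the source anyone can read.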
So in effect, Ken Thompson chose his Turing Award moment to reveal that he had held superuser and user access to, in principle, every Unix system and server on the planet. Further he revealed that, even with a great many people scrutinizing and rebuilding the source code, and even despite users banging on Unix daily everywhere, his super password worked for each and every account. Meanwhile no one else knew that the super password existed, much less that it quietly re-propagated itself into each fresh new copy of Unix.
What an amazing reveal. I’ve always imagined that some people in the audience that day might not have waited for the end of the lecture, instead rushing out and calling back to their offices, sounding the alarm to shut down all computer access!
These fundamental principles and constraints about software testing remain unchanged. Therefore it is reasonable for anyone today to understand that even the most rigorous software certification and accreditation evaluation has limits. Strictly speaking, even the best evaluators can only conclude “we didn’t notice or detect anything bad happening when we tested the codebase.” Even more worrisome are accompanying disclaimers like “the accredited software is only considered secure when run in a secure operating environment, on secure hardware... at all times.”
Perhaps considering a data-centric point of view can help us. Dialog in the Data Dilemma MMOWGLI game clearly shows that the Navy has great dependence – and even greater potential benefit – deriving from data that might be shared broadly. Data sharing can occur both “outside” with public and partners, as well as “inside” among Navy stakeholder communities. Might that data-centric point of view help to improve our information assurance in ways that are beyond the expressive power of software to guarantee?
Data is simpler than software and a lot easier to check. Data that is frequently used also tends to be well defined. We ought to take advantage of those traits, in the large, across all of our information systems. It is time to consider how Data Security might complement Software Security.
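The claim that data is easier to check than code can be made concrete. Verifying that a program is free of Trojan horses is, as Thompson showed, effectively impossible; verifying that a data record matches a well-defined shape is a few lines. A hypothetical sketch (the field names and types here are illustrative, not from any Navy system):

```python
import json

# Expected shape of a record: field name -> required type.
# Checking data against a definition like this is decidable and cheap,
# unlike proving properties of arbitrary code.
SCHEMA = {"id": int, "name": str, "classification": str}

def is_valid(record: dict) -> bool:
    """Accept only records with exactly the expected fields and types."""
    return (set(record) == set(SCHEMA)
            and all(isinstance(record[k], t) for k, t in SCHEMA.items()))

rec = json.loads('{"id": 7, "name": "track-7", "classification": "UNCLASS"}')
print(is_valid(rec))            # well-formed record passes
print(is_valid({"id": "7"}))    # wrong type and missing fields fail
```

Richer standards such as XML Schema or JSON Schema do the same job at scale, which is one reason frequently used data tends to become well defined.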
- Can we create data that is valid, signed, trusted, certified, accredited and secure at birth?
- Can we use, re-use, adapt and “mash up” secure data throughout its lifetime and lifecycle?
- Can we reduce code complexity and attackable surface within our software applications, by focusing on the full information assurance (IA) of the data they are producing and consuming?
- Can the same security techniques be used for data in motion, data at rest, and data in use - across multiple applications and also within the cloud?
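The questions above all hinge on one mechanism: attaching a cryptographic check to data when it is created, so any consumer can verify integrity later, whether the bytes were at rest, in motion, or in use. A minimal sketch using a keyed hash (an assumption for illustration; in practice a public-key signature standard such as W3C XML Signature would let consumers verify without any shared secret, and the key, record contents, and field names here are all hypothetical):

```python
import hashlib
import hmac

# Assumption: a shared key provisioned out of band for this sketch only.
KEY = b"example-shared-secret"

def sign(data: bytes) -> bytes:
    """Tag data at creation ("secure at birth") with a keyed hash."""
    return hmac.new(KEY, data, hashlib.sha256).hexdigest().encode()

def verify(data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(data), tag)

record = b'{"track": 7, "position": "N36.6 W121.9"}'
tag = sign(record)
assert verify(record, tag)              # untampered data verifies
assert not verify(record + b"x", tag)   # any modification is detected
```

Because the tag travels with the data rather than living inside any one application, the same check works across multiple applications and within the cloud, which is exactly the property the questions above are asking for.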
A good check question for any broad concept is “assume success – then what?” Let’s apply that test to this potential approach. If data security can indeed be accomplished to properly complement software security, then here is one possible cybersecurity scenario:
- Incident: applications in a networked enclave are 100% penetrated by malevolent intruders, who are later detected and locked out.
- Impact: no unauthorized access to information occurs because all data sets remain secure.
Data-centric security presents worthy challenges… that are beginning to appear feasible. Open international standards provide major building blocks to work with. Pieces of this puzzle are getting pushed around right now, with contributions by many thoughtful players in the Data Dilemma game. Much more expertise is available to provide help on every question… if we can find the right paths forward. Simply perpetuating the status quo and maintaining an unchanging course down An Unsustainable Path doesn’t scale to meet our growing challenges.
Thirty-two years have passed since Ken Thompson's revelation... I wonder whether anyone is calling back to headquarters yet.
How does the Navy get beyond software barriers to reach the next level of capability: trust for shared data?