A High Dependability Computing Consortium

James H. Morris, Dean

Carnegie Mellon School of Computer Science

December 9, 2000 (Updated from April, 2000)

This essay suggests that universities, government, and industry should initiate a long-term research and education program to make computing and communication systems dependable enough for people to trust with their everyday lives and livelihoods.


Why is something needed?

We're about to reach the end of what might be known as the golden age of personal computer software. Like the automobiles of the 1950's, the software of the 1990's delighted and amused us despite its many flaws and shortcomings. In the 1950's what was good for the car industry was good for the US, an argument that in some ways has applied to the software and "dot com" industries in the 1990's. As with car quality in the 1950's, it is widely argued that it is a disservice to stockholders to make software more reliable than the market has demanded. Instead of solid engineering values, fancy features and horsepower are the two factors used to sell computing systems. While this euphoric era of desktop computing will be remembered fondly by many, its days are numbered.

The current era of desktop computing will pass soon, just as it did for automobiles when the combination of oil shortages and Japanese manufacturing prowess turned Detroit from the leading force in the economy into part of the rust belt. It will only be possible to pin down the triggering factors for the demise of the current "golden era" of computer software in retrospect. But it seems likely that the shift to a new era will involve factors such as the globalization of the software business, the adoption of desktop computers as an essential business tool rather than an occasional productivity enhancer, and the continuing proliferation of computers into embedded systems that form the new infrastructure of our society. The issue of what is eventually said to cause the transition to this new era of computing systems is, however, not as important as the fact that it is inevitable in our changing world.

A central theme of the new era of computing will be an absolute requirement for high dependability (but without the exorbitant price tag traditionally associated with critical systems). Public computing fiascoes such as probes crashing into Mars or on-line services of all sorts going belly up for hours at a time are only the tip of the iceberg of this need. Every one of us personally experiences computing system meltdowns on a regular basis, and it would be no surprise if our children develop a quick reflex for pressing control-alt-delete before they've memorized their multiplication tables. While stories of bad software killing people are still rare, they exist and may portend the future. The lessons of the Y2K experience are twofold: such problems can indeed be overcome by dint of extraordinary effort and expenditures, but just as importantly, we rely upon computers far more than we fully realize until we're forced to step back and take notice of the true situation.

The point is that enthusiasm for computers has progressed to the point that our society is already completely committed to using them, and is becoming utterly dependent on them working correctly and continuously. But, commercial computer systems, as we currently build them, simply aren't worthy of our unreserved trust and confidence.

A number of related, long-term trends draw attention to the need for HDCC, including:

There has not been widespread public outcry for better computer systems, but there have been a few persuasive journalistic essays recently:
http://www.salon.com/tech/feature/2000/12/06/bad_computers/index.html, http://www.latimes.com/business/cutting/20001127/t000113753.html.

There is a pendulum effect in which certain industries alternate between an emphasis on dependability and a quest for new features. In the auto industry, that pendulum reached its features apex sometime in the 1960's when tail fins counted as important features and Charles Wilson of General Motors suggested to Congress that "What is good for General Motors is good for the country." The pendulum began to swing back when Ralph Nader wrote "Unsafe at Any Speed." Perhaps the apex-defining quote for the computer industry has already been uttered by Guy Kawasaki: "Don't worry be crappy." http://www.brandera.com/features/00/02/14/readings.html. We're still waiting for the defining book to be written. Maybe its title should be "Unsafe at Any Clock Speed."

What is it?

We propose to create a consortium of universities, government agencies, and corporations, the High Dependability Computing and Communication Consortium, to undertake basic, empirical, and engineering research aimed at making the creation and maintenance of computer systems a true professional discipline comparable to civil engineering and medicine, disciplines people stake their lives on without question.

It will have a permanent research and education program that transforms computing practices over the next 50 years. The researchers and educators should number about 500 and be contributed by the partners.

It is envisioned to have a central base of operations in the San Francisco Bay Area, but incorporate activities around the country and, as appropriate, around the world in member organization locations.

Strategic Goals

The HDCC research agenda embodies four strategic goals.

Protect the Public. We must assure the nation's critical infrastructure services upon which individual citizens depend. To meet this strategic goal, we must identify and promote technologies that can increase confidence in the safety, reliability, trustworthiness, security, timeliness, and survivability of systems such as transportation systems and communications systems.

Protect the Consumer. We must find cost-effective means to gain assurance that enables commercial products to meet certain minimum quality standards. This includes expedited quality certification, validation, and verification; shortened times to market; simplicity of use; plug-and-play interconnection; lower lifecycle costs; and improved customer satisfaction. Confidence is needed in consumer products and services. Such products could include "smart" cars, medical devices, consumer electronics, business systems, smart houses, sensor technologies, Global Positioning System (GPS) receivers, smart cards, educational technologies, electronic commerce software packages, and digital libraries.

Preserve Competitiveness. Software production is the ultimate intellectual industry and there are few barriers to entry. Ten years ago we felt beleaguered because the Japanese engineering culture seemed to be dominating us in electronics and semiconductors. Wise men (Gordon Bell, for one) said we must change the game; and, indeed, we did by making it a software/network game. But now the game is clear to all and we can expect crushing competition, not only in price but also in deep ideas. Educating more hackers will not solve our problem; we must educate new generations of sophisticated software engineers backed by new science to stay ahead in the global economic race.

Promote National Security. Dependability is most crucial to military systems that are used to defend our national interests. National security will require defense-in-depth protection services and assurance that those services will perform as required. However, economic reality will dictate that these services be accomplished using largely commercial rather than specifically military technology.

Scope

The relentless pressure to keep up with "Internet Time" results in most organizations using ad hoc approaches to survive on a daily basis, with no time or energy left for long-term investments in surviving the coming months and years. While such an approach can be made to work in the short term, it is inherently inadequate at addressing trends over the span of years or decades. Instead, it is vital that a concerted effort be made to prepare for downstream problems in a number of key areas. The long-term scope will evolve as appropriate to address the hard, long-term problems facing us. Current areas include:

Activities

Six research and education activities will contribute to the HDCC strategic goals:

  1. Provide a sound theoretical, scientific and technological basis for assured construction of safe, secure systems. To meet this goal, the research agenda must:

These are still hot topics in universities despite the general acceptance of C (and perhaps, someday, Java) as do-everything programming languages. Ultimately the proper and reliable functioning of a system depends upon people describing their designs in a formal specification, namely a language. When the language is shaky, the entire edifice will be built on a soft foundation. Special areas of interest include applications of logic, techniques for designing and implementing programming languages, and formal specification and verification of hardware and software systems. It is important to apply these techniques to problems of realistic scale and complexity, for example: implementation of high-speed network communication software and application of type-theoretic principles in the construction of compilers for proof-carrying code. For Carnegie Mellon activities in principles of programming see http://www.cs.cmu.edu/Groups/pop/pop.html

  2. Develop hardware, software, and system engineering tools that incorporate ubiquitous, application-based, domain-based, and risk-based assurance. To meet this goal the HDCC research agenda must:

Software Engineering has grown into a field of Computer Science in its own right. Its aim is that systems constructed from software can attain the same reliability and predictability as bridges and other symbols of engineering excellence. At Carnegie Mellon much of the research and education in this field is conducted by the Institute for Software Research (http://spoke.compose.cs.cmu.edu/isri/) and the Software Engineering Institute (http://www.sei.cmu.edu/).

  3. Reduce the effort, time, and cost of assurance and quality certification processes. To meet this goal, the HDCC research agenda must:

The industrial use of system analysis and verification tools has been limited, but university researchers have made considerable progress in producing tools that find bugs in real hardware and software. So far, most of the success has been in hardware where complexity is lower and specifications cleaner; but there have been promising successes in software as well. For Carnegie Mellon activities in formal systems see http://www.cs.cmu.edu/Groups/formal-methods/formal-methods.html

  4. Understand the human problems in creating, maintaining, and using computer systems. This has become a vital area of research as computers have become ubiquitous. Seat-of-the-pants design might have been sufficient when the users of computers were engineers, scientists, and programmers; but now a deep understanding of human capabilities must be built into design because the users are often very different from the designers. "Pilot error" is the most frequently cited cause of airline mishaps, and "programmer error" is similarly often the purported cause of software defects, except in the frequent case in which problems are blamed on "user error". We need to understand and account for the capabilities of both the designers and end users of systems. For Carnegie Mellon activities in human-computer interaction see http://www.hcii.cmu.edu/.
  5. Provide measures of results. To meet this goal, the HDCC research agenda must:

One reason to do system fault discovery is to find a metric. Fault discovery is only somewhat helpful as a debugging technique; it is much more powerful as a quality assurance technique in support of building dependable systems. For some Carnegie Mellon research in this area see http://www.ices.cmu.edu/ballista
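To make the idea of fault discovery as a metric concrete, here is a minimal sketch in Python. It is not the method used by the research cited above; the function names, the list of exceptional inputs, and the choice of ValueError as the one "graceful" error are all assumptions made for illustration. The sketch feeds a fixed set of awkward inputs to a routine and reports the fraction that cause an unexpected crash, giving a single number that can be compared across versions or vendors.

```python
def robustness_failure_rate(func, exceptional_inputs, expected_errors=(ValueError,)):
    """Return the fraction of inputs on which func fails ungracefully.

    A call is treated as acceptable if it returns normally or raises one of
    the expected (documented) error types. Any other exception is a failure.
    """
    failures = 0
    for value in exceptional_inputs:
        try:
            func(value)
        except expected_errors:
            pass  # graceful, documented rejection of bad input
        except Exception:
            failures += 1  # unexpected crash: counts against the routine
    return failures / len(exceptional_inputs)


def fragile_reciprocal(x):
    # Hypothetical routine with no input checking at all.
    return 1 / x


def robust_reciprocal(x):
    # Hypothetical routine that rejects bad input with a documented error.
    if not isinstance(x, (int, float)) or x == 0:
        raise ValueError("x must be a nonzero number")
    return 1 / x


# Hypothetical set of exceptional inputs chosen for this illustration.
EXCEPTIONAL_INPUTS = [0, None, "abc", [], float("nan"), float("inf"), -1, 0.0]

fragile_rate = robustness_failure_rate(fragile_reciprocal, EXCEPTIONAL_INPUTS)
robust_rate = robustness_failure_rate(robust_reciprocal, EXCEPTIONAL_INPUTS)
print(f"fragile: {fragile_rate:.3f}  robust: {robust_rate:.3f}")
```

The value of such a number lies less in locating any single bug than in giving developers and customers a repeatable basis for comparing the dependability of two implementations of the same interface.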

  6. Promote software engineering education. Currently, de facto software engineers coming from universities are emerging from departments of computer science and engineering. Unfortunately the computer scientists are often too theoretical while the engineers are often too hardware-oriented. What is needed is professional education akin to what medical doctors receive, but nobody is doing it. Both software engineering research and education must have strong connections to practice: education needs a practical setting to develop skill, and research needs access to real problems that expose the deep issues involved in real-world development.

We should create an institution that serves software engineering as a teaching hospital serves medicine. Students would learn in the context of real cases. Clinical faculty would both practice and teach. Research would exploit access to real cases and data. We would provide a development laboratory in which real software developers produce real software for real clients. Developers would interact with researchers to infuse the research agenda with visibility into real problems, and developers could take advantage of research results. Students would learn through direct experience in a real (not just "realistic") setting. Clinical faculty would be skilled professional software developers and have significant responsibilities for both teaching and software production.

Who Should Participate

As shapers of the future, universities should address the software quality problem now, before the world at large sees a crisis. Just as Johns Hopkins led a reform in medical practice in the early 20th century, we can lead a reform in software practice now. Fortunately, this effort needn't begin from scratch because computer scientists and academic software engineers have always taken the issue of software quality seriously. Computer science's first gift to industry was the programming language, which has now been thoroughly digested and exploited. It's time to continue that tradition with a practical but comprehensive way to create and operate dependable systems.

The universities whose faculty have expressed interest so far are Carnegie Mellon, Georgia Institute of Technology, MIT, and the Universities of California and Washington. Collectively, these schools have a diverse group of researchers already attacking the problem and a strong commitment to engineering education.

For inspiration, look to a 15th-century character, Prince Henry the Navigator of Portugal. He was the first great program manager. Intent on finding a sea route to India, he founded schools for navigators and research into shipbuilding. Columbus et al. were the ultimate instruments of his foresighted plan. He died long before 1492. While the government's role should not really be to seek silver bullets to solve any one problem, it has a definite role to play in leading and creating a real movement.

The government agency members should include:

The major event in the last twenty years in the computer field is that the industry has taken the lead in the creation of real systems. The academically oriented ACM Software System Award has been going to industrial projects since 1982: UNIX, System R, and the Alto System, to name a few. Some of Software Engineering's academic leaders (Fred Brooks, Barry Boehm, Watts Humphrey, and David Garlan) developed their insights in industrial settings and then moved to continue work in academe. It is essential that experienced engineers from industry contribute their wisdom to subsequent generations.

The following companies have expressed an interest in the project and signed a memorandum of understanding committing to a planning process: Adobe Systems, Compaq, Hewlett-Packard, IBM, Ilog, Marimba, Microsoft, Novell, SGI, Siebel Systems, Sun Microsystems, and Sybase.