It’s the spec bugs that kill you

Most bugs which kill people (and projects) are spec / requirements bugs, not implementation bugs. This could be a missing top-level requirement, a missing implication of some requirement on a lower-level spec, or some emergent behavior which was never spec’d. This post will explore the problem of spec bugs, and discuss possible solutions.

Note that there are several disciplines which target spec bugs (e.g. requirements engineering). I’ll talk about them briefly below.

However, this blog is about creating the Holistic, Dynamic-Simulation-based, Constrained-Random, Big-System Verification Framework, alias SVF (I settled on this acronym after I got friendly hints that HDSCRBSVF, my original favorite, was somewhat less catchy than I assumed).

So I’ll mainly investigate what we can do about spec bugs in the context of SVF.

The story of a bug

Let me start with the story of a bug. I first heard it in a presentation by Michael Jackson, and it has stuck with me ever since (I think I have real psychological issues with so-called “friendly fire” situations). The following is from Michael Jackson:

“A US soldier in Afghanistan used a Precision Lightweight GPS Receiver—a “plugger”—to set coordinates for an air strike. He then saw that the “battery low” warning light was on. He changed the battery, then pressed “Fire.” The device was designed, on starting or resuming operation after a battery change, to initialize the coordinate variables to its own location. The resulting strike killed the user and three comrades”

The archived Washington Post writeup of the incident is here.

So this is a pretty bad way to die. This blog tries, in general, to keep a light tone, but this is serious stuff.

It is also, probably, a spec bug: I have no further information about this particular system, but a complex system like this (consisting of the “plugger”, the airborne system and a lot more) is usually designed by a fairly large group of people, using a hierarchy of design specs (and their related requirements docs). Finally, one reaches a low-level (and often quite short) spec, which is handed to somebody to implement. So the SW guy who wrote that piece of logic (which, upon reset, initializes the “location” register to the current GPS coordinates) probably implemented the spec he was given. And he then probably tested it according to that spec, and it worked fine.

It should have been caught somewhere higher in the spec tree.

Somebody simply did not consider one specific implication of some top-level level requirement (“Don’t harm the ground operator”) on some lower-level specs.

[Edited 11-Aug-2015 to add: Hacker News discussion is here. So far it mainly discusses the plugger incident itself, not the general question of “can simulation-based methods help prevent such spec bugs”.]

Why spec bugs are getting even more important

There are a lot of those spec bugs. Complex socio-technical systems are made of many pieces (each of which could be HW, SW, mechanics, a human procedure, etc.). Any of these pieces, or the interaction between them, could be mis-spec’d. As another example, consider the sad story of Avianca flight 52 , where the bug was in the plane-crew-to-air-traffic-controller human procedure: There were long delays for landing at JFK, the pilot had little English, the co-pilot did not challenge the authority of the air-traffic controller (and also said “priority” rather than “emergency”), and eventually the airplane ran out of fuel and crashed.

The plugger bug is extremely easy to fix, in any number of ways. And it will probably never happen again, because it is now in the collective memory of the people who design such things. The only problem was that plugger-like devices were new at the time, and nobody thought of this.

Similarly, civil aviation (which is a good source of interesting incidents) is actually extremely safe because we had such a long time to fix all the common bugs. The Avianca incident was indeed treated as a bug and then fixed: it “helped create uniform cockpit language that is used by all pilots and ATC around the world”.

However, it seems we are now in the middle of a whole bunch of technological revolutions-in-progress (autonomous vehicles, UAVs, robotics, CRISPR-based biotech and so on) which are going to be particularly hard to verify, because:

They are not “stand-alone boxes”, but rather are multi-component solutions grafted into an already-existing, complex, multi-component environment. Note that the systems where the plugger and Avianca bugs occurred had similar characteristics.
They interact with the open, un-spec’d, wild world out there
They don’t have long-term institutional memory of what-worked-and-what-did-not
Some of them may, as they emerge, change the world around them (as in the bridge-to-Malmo story), making verification even harder.

That last point is worth emphasizing. Some of the technologies we are talking about (e.g. CRISPR) are extremely powerful, and are bound to change the environment in which they operate. With great power, as they say, comes a great need for an undo button. Except there ain’t any: Some genies will just refuse to return to their respective bottles.

So those revolutions will march on. For instance, that CRISPR moratorium is not going to last forever: In fact, it may not last a year. Which means that any crystal ball machinery, even if imperfect, could come in handy.

To summarize: Finding spec bugs early in complex socio-technical systems (or cyber-physical systems, or systems-of-systems) is getting to be pretty important. Hence this post.

Note that all bugs are a result of somebody neglecting to think of something. But in this post, I am discussing spec bugs, i.e. bugs which could reasonably be attributed to a mistake or omission in some spec / requirement.

See also my previous post, which discussed various verification concepts and how they relate to spec bugs.

An example system

Just so we’ll have a concrete example when we talk about systems and spec trees, I picked for you the “system” of landing in airports, complete with air crew, cockpit equipment, air traffic controller and so on.

Of course, landing in airports does not really need to be designed nowadays (“We are now approaching San Francisco International – wait, it just occurred to me that we may have neglected to think of something fairly basic”). However, we’ll pretend it does, precisely because it is a multi-component system we are all familiar with (to some degree – my own familiarity is mainly based on movies, so don’t take the details too seriously).

Ideally, we would like to find spec bugs (like the one that caused the Avianca disaster) before they happen.

Here is a simplified diagram of our Device Under Test (DUT) and the environment around it:

In our example, the DUT is the landing-related system, which consists of the following subsystems (most of which can be further subdivided):

The cockpit: Equipment, pilot, co-pilot
The physical airplane: Body, engines, aerodynamic behavior etc.
The airport: Runways, landing-related equipment, air traffic controller etc.

The environment in which this DUT operates also consists of various subsystems / phenomena, as shown (note that it is actually somewhat hard to draw the line between the DUT and the environment).

In this examples one can think of a lot of scenarios, each with many parameters or knobs (in a CDV environment, those parameters are the obvious candidates for constrained-random generation). Here is a small subset of the knobs which one can set to effect the flow of a specific landing scenario (assume that the actual airport is fixed). Note that some parameters control inputs to the DUT, but some (perhaps surprisingly) control the DUT itself:

Airplane kind: Passenger, cargo, military plane, small hobby Piper?
Airplane configuration: Does the airplane have the Mark-A auto-pilot, or the new, dual-GPS, brushed-aluminum Mark-B? This is not relevant to that Piper.
Airport “operational modes”: Is it in an all-normal mode? Special night-time quiet approach mode? Very busy day with shorter inter-arrival spacing? Emergency landing mode?
Traffic controller to cockpit communication: Any communication disturbances? Are they both speaking in their native language?
Pilot behavior: Is he tired/distracted? Which of many possible decisions will he take?
Further disturbances: Other airplanes, UAVs near airport, landing lights failure, equipment on runway, rain on runway, birds, balloons, kites, and that crazy researcher trying new GPS-jamming techniques (only an issue with the Mark-A auto-pilot).

Here is the corresponding spec tree:

Each box logically contains:

The rough structure and theory-of-operation for that system (e.g. a Word doc)
The requirements from that system (e.g. a Word doc and a Doors file)
Eventually, where applicable, the implementation of that system (e.g. C / Verilog files)

Obviously, this is just one possibility. In particular, there is no reason that specs will mainly talk about physical objects. There could be a box for communications, etc..

The requirements for the top-level system are probably safety, landing-slot efficiency and passenger comfort, in some mix. Note that none of these are absolute: You can say “Safety overrides everything else” till you are blue in the face, but in reality safety is just a very, very high priority. There are always tradeoffs (thus we don’t go for 30-minute inter-landing gaps even though that would have been slightly safer).

This is a general truth about verifying those complex socio-technical systems: There are almost no absolute never-to-be-overridden assertions – this is more like a complex optimization problem. For instance in hospitals, it is not that nobody should ever die there – it is just that we want to avoid somebody dying for silly bugs like mistaken patient identity.

Current approaches for eliminating spec bugs

The fact that spec bugs are important has obviously not escaped system engineers and researchers. Before jumping to my favorite solution for all things (SVF), let me briefly mention some of the current solutions.

Note: In the tradition of this blog, I’ll also explain why some of these solutions, while certainly helpful, still leave some room for improvement. Also, I plan to over-simplify like crazy. And I feel OK about all this, following the noble path of that Douglas Adams character who took it upon himself to insult everybody in the known universe, in alphabetical order.

OK, so here is a partial list:

Requirements engineering is a whole discipline devoted to how to elicit, write, check and distribute requirements
There are various tools for requirement management (the most popular being IBM Doors). They don’t really “understand” the English-language requirements, but they let you enter, and then track, the sub-requirements, design artifacts and verification artifacts corresponding to a requirement.
There is a whole world out there of SW which tries to understand (restricted) variants of English, with a specific emphasis on reading specs. See, for example, here, here and here.
Various formal tools were proposed for checking requirements (written in suitably formal languages). This usually comes with various assume-guarantee frameworks which let you divide-and-conquer the verification job (an excellent idea not just in the formal domain).
People have been suggesting many notations as a “the” notation to help us reason about (and even simulate) specs: UML/SysML/LSC, Matlab/Simulink/Modelica, SystemC, various fault-tree notations, combinations of the above, etc. One could perhaps say that the raison d’être of model-based design is to eliminate spec bugs.
Nancy Leveson, perhaps the best thinker on “why complex systems go bad”, has a model for why they do (STAMP) and a corresponding hazard-analysis technique (STPA).
And so on

Enough already (I hear you say). You are not going to suggest yet another new solution, are you?

The suggested SVF way of dealing with spec bugs

Yes I am. Well, perhaps not completely new. In the context of SVF, I suggest borrowing liberally from the best ideas to create a multi-notation, multi-resolution, aspect-oriented, constraint-driven, simulation-based framework, which will let you actually run, explore and catalogue behaviors.

Ah, so this is yet another executable-spec format (you say). Well, let me talk, will ya? Of course I know that (a) this was done a million times before and (b) this can’t be done. Enough with those interruptions.

Now where was I? Ah yes. Here is why I think this is needed (again, not taking away from all the good stuff mentioned above):

The issue with requirement lists (as in requirements engineering and Doors) is that they are too long. You can’t really hold them in your head and understand their implications. Similarly, while (smallish) fault trees are a pretty intuitive concept, in the end somebody writes the big, official fault-tree document, and nobody can really look at that.
I actually think UML is good for you, in small doses. State charts, object diagrams etc. are pretty helpful. But when I look at hierarchical UML/SysML diagrams of big, complex systems, the eye glazes over. I know there are stronger men than me in this regard, and I admire them for it. But remember: If it were not for the lazy, the function call would never have been invented. One yearns for the ability to e.g. use textual tables to tame repetition. And whenever somebody reassures me that “this whole thing here simply means xxx”, I wish one could just type “xxx” instead.
Those restricted-English systems are still not ready. They are pretty readable, but still not quite writable (you are never sure whether a specific phrase will be understood, and how it will be understood). However, like the aforementioned Douglas Adams character, I am a patient guy: When these systems will be good enough, I’ll be happy to adopt them as another input format.
Formal systems are pretty powerful, but have their own issues (which I discussed here).

But my biggest reason for “why simulation” is the way most people deal with complexity.

Consider again our landing system (figure 1): Understanding such complex, diverse systems and their interrelations is pretty hard. And one of the best ways for people to do that (beyond looking at various views of the static structure) is to try out scenarios involving the various pieces. For these scenarios, users want to:

Animate them and understand them
Observe emergent behavior / properties of scenarios and scenario families
Mutate them in various ways while playing with all those parameters
Look for possible failures within them
Generalize them, and so on
Go back to the whiteboard, discuss what they saw, change the spec / requirements / model, and repeat

In such big, complex systems there is usually no other way to do this except for simulating all the various pieces together, while checking the various assertions, collecting coverage and statistics, and so on.

Note that before you can simulate this environment you first have to model it – a non-trivial task which is the main topic of the next chapter. However, this investment can be amortized over several uses:

Finding spec bugs (the original goal)
Verifying the actual DUT: For some boxes (e.g. your new cockpit radar equipment) you will later remove the model and put in “the real thing” (the real SW, some simulation of the real HW, or even the actual HW box via HIL), but much else will stay unchanged.
Training of various categories of personnel
Doing what-if analysis towards future modifications
And so on

I still did not answer directly “how this will find spec bugs”. I hope the next chapter will do a better job, but in general, I am hoping that SVF will be able to identify (with user help) various “hazard areas” at all spec levels, and then will use various goal-directed techniques to reach those areas (singly or in groups) at a relatively-high frequency, with the hope of triggering “real errors”.

Examples of such hazard areas are:

Code areas marked by the user as “special” or “dangerous”
Functional coverage marked as such, or just plain coverage holes
Assertions or warnings in the model
Instantiated “generic hazard patterns”

About that last point: Michael Jackson (you can tell I like this guy) talks (e.g. here) about generic hazard patterns (like initialization, mode changes, identity and so on) which may cause bugs in many systems. For instance, in hospitals, the identity generic hazard pattern can materialize as either “wrong patient identity” or “wrong drug identity”, both potentially nasty.

Developing a taxonomy of such generic hazard patterns, and applying it to the system-at-hand where applicable (with some user help) may be one way of tackling the “how do you even dream up new basic events” issue mentioned here.

So what should SVF do to support this?

Here is a first pass of the requirements for SVF, if it is to support the kind of simulation-based hunting for spec bugs mentioned above. Note that this is just a skeleton: each of the topics below merits at least another full post. And for some, I am not quite sure what that post will contain ;-).

Also, there are other requirements from SVF, not related to spec bugs, which are not mentioned here.

OK, so here is the list (slightly long – skip to the final paragraph if you get drowsy). SVF should:

Support multiple, convenient notations
- Any specific block within figure 1 may have a “natural” description: A system-dynamics diagram, a BDI human-behavior notation, an activity diagram, some agent-based notation, some tabular format, whatever. But they should all work together seamlessly and efficiently.
Support CDV
- Clearly, one needs constrained-random generation (for all those parameters / knobs). One also needs a uniform way to specify and record failure conditions, functional coverage etc.. That CDV support needs to somehow mesh well with “traditional” modeling languages and notations.
Support hierarchical verification
- Consider the auto-pilot system, which is a level-4 subsystem in the example: Clearly, it has its own (hierarchical) spec, which you will want to verify separately by constructing an auto-pilot-specific environment. But what should it be? It is not unreasonable to suggest that perhaps the full figure-1 environment should be used (as one of the environments) for verifying the auto-pilot. In any case, you will want to reuse as many verification artifacts as possible between the auto-pilot system and the bigger system.
Support multiple modeling resolutions
- You should be able to go wide (describe the whole system in a shallow way) then go deep
Support easy addition of different aspects
- There are many uses for incremental aspect additions in verification – see for instance this article by Morley, Noy and yours truly. For instance, it would be nice if we could graft that Jackson-style generic identity hazard on top of our already-modeled hospital.
Support easy creation of generic libraries
- Even with an ideal SVF, model-writing is going to be the main bottleneck for this to succeed. If we really hope for fast creation of modeling environments, it is imperative that we’ll enable the creation of a good (and growing) set of libraries for various things.
Support heterogeneous simulation
- While it is desirable to simulate all those notations in a single simulation engine, this is sometimes impossible (because one needs to support separate tools, HIL and so on). So this should be possible, ideally without sacrificing repeatability, efficiency etc.. In general, SVF should support many execution platform configurations.
Support goal-directed and scenario-directed simulation
- It should be possible for the simulation to automatically hunt for corner cases, unreached coverage, hazard areas and so on. There is a big (and constantly growing) arsenal of techniques to do that, including concolic simulation, genetic algorithms, formal-like high-level planning using various MBT notations (where applicable), and so on.
Support formal reasoning
- While SVF is mainly about simulation-based bug finding, and while I do think that FV gets better PR than CDV, formal techniques could be pretty helpful: they might help in goal-directed simulation, guide the user through missing spec requirements etc.. Note that most of the language requirements above are oblivious to the intended FV/CDV usage – once we have all those nicely-woven aspects written in all those nice notations, we should try to feed as much of it as possible into the formal side of things.
Support failure / concern management
- You should be able to collect failures from many runs, automatically cluster them into “areas of concerns”, debug / understand them, and discuss them across multiple revisions
Support extensibility:
- It should be possible to add new simulation notations, new goal-directed algorithms (and their corresponding user-supplied hints) and so on, without annoying any existing library.

That’s it for today. Some of the above requirements are pretty big. Others are huge – I am not even sure how to do them. But I think it is worth trying – I have an intuition that some variant of this will eventually work, and this is clearly a problem which needs to be solved.

Your comments are very welcome.

Notes

I’d like to thank Kerstin Eder, Ziv Binyamini, Yaron Kashai, Amiram Yehudai, Sandeep Desai, Shai Fuss, Yael Feldman and Shaul Ben-Haim for reviewing previous versions of this post.

Edited 9-Aug-2015: Added hyperlinks to terminology page.

7 thoughts on “It’s the spec bugs that kill you”

Silas McDermott says:

August 7, 2015 at 4:17 pm

Great post.

It’s worth noting that specification bugs have a higher frequency of occurrence at the boundary of the spec in question. It’s one of the main reasons SOC verification (ie: testing of the interactions between all the various specs in a system) is so important.

Avianca Flight 52 is a good example of this. The issue wasn’t with the cockpit specification or the airport specification. Instead, it was the interaction of the two that led to disaster.

From a modelling perspective, this could easily be represented as an interface between the two specs and either “good” or “bad” communication across that interface. This level of abstraction is one reason I believe modelling of these very large systems to be possible, but I’ll leave any additional comments on that for your next post.

Loading...

Moschops says:

August 11, 2015 at 3:13 pm

No no no, the one true way now is to think of a couple of features, implement them in the space of a week, and foist them onto your unlucky first customers while you start implementing some other features. You wanted to land? Circle for a few weeks, we might have that feature by the end of the month. 🙂

Loading...

Pingback: Tech news August 11th 2015 | devsoda
Pingback: 2 – It’s the spec bugs that kill you | Offer Your
Percy Wenzinger says:

September 2, 2015 at 4:56 pm

This is certainly an extreme example, but I can t tell you how many bugs I ve seen closed as Works As Intended where it may have been what the spec intended but certainly wasn t what the user expected is..

Loading...

Pingback: Musings on 5th Generation Warfare, #2 | thinkpatriot
Pingback: M-SDL, the autonomous vehicles verification language, is now open – The Foretellix Blog

	https://otomotif71.w… on Stuttgart impressions: Scenari…
	Daan van der Keur on About “The coming AI hackers”…
	Mariah Jackson on M-SDL, the autonomous vehicles…
	sakhokhar on Machine Learning for Coverage…
	hongseoklee on How to write AV scenarios (and…
	Erik Panu on GPT-3 and verification
	Yoav Hollander on Autonomy markets and their pot…
	Nakkeeran Kumaraswam… on Autonomy markets and their pot…
	Aman on DeepXplore and new ideas for v…

The Foretellix CTO Blog – AI safety

Now focusing on AI safety (autonomy-related posts go to the company blog)