A companion post talks about spec bugs and ways to avoid them. Spec / requirement bugs (as opposed to implementation bugs) occur when the spec for a subsystem does not take into account some higher-level, full-system requirements.
Spec bugs are pretty important, so here I’ll talk about some common verification-related concepts, and (my interpretation of) how they relate to spec bugs. A common theme is that all these concepts claim to be absolute, but are really subsystem-relative.
Verification vs. Validation
Everybody swears by the definition that verification is “Are you building the thing right?” and validation is “Are you building the right thing?”.
This is a valuable distinction for a specific subsystem, but (in any complex system) this is a subsystem-relative concept, not an absolute one (as in “one man’s ceiling is another man’s floor”).
Yes, checking that your shiny new radar display box shows the right, important things to the pilot is indeed validation for that box, but it is also part of the verification of the full airplane-flight-system, i.e. the system which includes all cockpit equipment, the (human) pilot’s procedures, and so on.
Edited 9-Aug-2015 to add: But see also this post.
Performance bugs vs. functional bugs
Performance bugs (or issues, but I call them “bugs” because presumably they were unexpected) are different from functional (“normal”) bugs. But this again is a subsystem-relative, not an absolute, concept:
- In highly-layered (e.g. communications) systems, a subsystem functional bug often manifests itself as a performance bug at some higher level (because some intermediate logic will simply retry until the information gets through).
- A performance bug in a subsystem may cause a malfunction of the bigger system in some scenario (e.g. your autonomous vehicle cannot muster enough acceleration just when you need it to get out of trouble).
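Here is a minimal sketch of the first bullet. Everything in it is made up for illustration: `buggy_send` stands in for a lower layer with a functional bug (it silently drops about half the packets), and `reliable_send` is the intermediate logic which retries until the packet gets through. The 50% drop rate and the names are my assumptions, not taken from any real protocol stack.

```python
import random


def buggy_send(packet: bytes, rng: random.Random) -> bool:
    """Hypothetical lower layer with a functional bug: it silently drops
    roughly half of all packets. A correct layer would always return True."""
    return rng.random() >= 0.5


def reliable_send(packet: bytes, rng: random.Random, max_attempts: int = 1000) -> int:
    """Intermediate layer: retry until the packet gets through.
    Returns the number of attempts that were needed."""
    for attempt in range(1, max_attempts + 1):
        if buggy_send(packet, rng):
            return attempt
    raise RuntimeError("gave up after max_attempts")


def average_attempts(n_packets: int = 1000, seed: int = 0) -> float:
    """What the higher level observes: every packet arrives, just slowly."""
    rng = random.Random(seed)
    total = sum(reliable_send(b"x", rng) for _ in range(n_packets))
    return total / n_packets
```

Seen from above `reliable_send`, nothing is functionally wrong: every packet arrives. Only the attempt count (around two per packet here, rather than one) hints at the bug, so at that level it looks like a performance issue.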
Conceptual bugs
When people talk of “a conceptual bug”, they often mean “a spec bug which was discovered late”.
Example: “OK Sue, this is a bad one: I found the bug but I can’t fix it – it is a conceptual bug. Somebody higher up the spec chain will have to re-spec, and this will cascade to a whole bunch of places. Or perhaps we’ll just never fix it.”
Correct by construction
As I have said before, when somebody says “X is correct by construction”, all they mean is “X was automatically generated from Y, which is hopefully shorter and easier to understand”. This is sometimes not the case – the Y notation may be shorter but no simpler (for the intended audience).
However, in general I am optimistic about creating design artifacts from higher-level notations. For instance, there are multiple projects (e.g. this one) for constructing a state-machine controller out of a bunch of temporal assertions (of the style “x should never happen after y” and so on). The language for specifying them is still cryptic, but it will get better, and one day we’ll have a practical translator which takes a bunch of reasonably-readable assertions and produces procedural code for our controller.
Does this mean there is no longer need for verification, because this is “correct by construction”? Hell no:
- You may not have to verify the translation (like you don’t usually check that GCC did the right thing with your C code), but often the translation is also based on various scripts and hint files, and thus the results still need to be checked. This is one reason why chip designers routinely verify (in simulation) that the netlist produced (automatically) from their higher-level design still does the right thing.
- You still need to verify that the assertions lead to the behavior you meant, and that you did not forget some just because you assumed the machine will implicitly understand them – a common mistake. For instance, one way for your robot to never do anything bad is to never do anything.
- But most of all, you still need to check that this automatically-produced subsystem works well within the bigger system.
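The “never do anything” trap can be sketched in a few lines. This is only an illustration under my own assumptions: traces are finite lists of event names, `never_after` is a hand-rolled checker for assertions of the style “x should never happen after y”, `eventually` is a bounded check on a finite trace (not a real temporal-logic operator), and the event names are hypothetical.

```python
from typing import Callable, List

Trace = List[str]


def never_after(bad: str, trigger: str) -> Callable[[Trace], bool]:
    """Safety assertion: `bad` must never occur after `trigger`."""
    def check(trace: Trace) -> bool:
        seen_trigger = False
        for event in trace:
            if event == trigger:
                seen_trigger = True
            elif event == bad and seen_trigger:
                return False
        return True
    return check


def eventually(good: str) -> Callable[[Trace], bool]:
    """The assertion people tend to forget: something useful must happen
    (checked here only within the finite trace we were given)."""
    def check(trace: Trace) -> bool:
        return good in trace
    return check


def idle_controller(steps: int) -> Trace:
    """A 'controller' which satisfies every safety assertion by doing nothing."""
    return ["idle"] * steps


safety = never_after("move_arm", "human_detected")
liveness = eventually("part_delivered")

trace = idle_controller(10)
print(safety(trace), liveness(trace))
```

The idle controller passes the safety assertion and fails the liveness one. If nobody wrote the liveness assertion down, a translator would be free to produce exactly this controller.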
Root cause
Once you understand a bug, a natural question to ask is “what is the root cause of this bug” (people also talk about the root cause of an incident – they usually mean “the root cause of the bug causing this incident”).
The “root cause” question can have important legal and managerial implications, but in the context of engineering one can usually give it any number of arbitrary answers.
“The root cause of most aviation accidents is gravity” says an old proverb (which I invented, but Google led me to similar profound thoughts here).
Just to show how arbitrary the “root cause” assignment can be (and why it is usually subsystem-relative), here is the algorithm often used to do this assignment:
- Find what the problem is
- Examine the and-or tree of “things which might cause this problem to go away”
- Fix the problem by implementing the least disruptive sub-tree (usually one fully contained within the sub-spec owned by you)
- Assign the term “root cause” to the fact that this fix was not there originally
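The steps above can be sketched as a small search over an and-or tree. This is a toy under stated assumptions: the `Node` type, the numeric “disruption cost” on each leaf, and the example tree are all hypothetical, and shared sub-trees are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    kind: str = "leaf"   # "leaf", "and", or "or"
    cost: float = 0.0    # disruption cost; only meaningful for leaves
    children: List["Node"] = field(default_factory=list)


def cheapest_fix(node: Node) -> Tuple[float, List[str]]:
    """Return (total cost, leaf fixes) of the least disruptive sub-tree."""
    if node.kind == "leaf":
        return node.cost, [node.name]
    results = [cheapest_fix(child) for child in node.children]
    if node.kind == "and":
        # All children are needed: costs add up, fixes are concatenated.
        total = sum(cost for cost, _ in results)
        names = [name for _, fixes in results for name in fixes]
        return total, names
    if node.kind == "or":
        # Any one child makes the problem go away: take the cheapest.
        return min(results, key=lambda r: r[0])
    raise ValueError(f"unknown node kind: {node.kind}")


# Hypothetical problem: either patch the part I own, or fix things upstream.
tree = Node("robot arm hit the fixture", "or", children=[
    Node("add a range check in my motion planner", cost=1.0),
    Node("fix it upstream", "and", children=[
        Node("re-spec the sensor interface", cost=5.0),
        Node("update every consumer of that interface", cost=4.0),
    ]),
])

cost, fix = cheapest_fix(tree)
root_cause = "missing: " + ", ".join(fix)
print(cost, root_cause)
```

Change the costs (say, because somebody else owns the motion planner) and the “root cause” moves to a different sub-tree, which is the sense in which the assignment is subsystem-relative.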
Ziv Binyamini sent me some good comments on this post, partially quoted below:
“I think you are making a fundamental progress here related to the realization that “everything is relative”.
Your terminology is not really a suggestion of terminology but rather a journey to show that ALL the terms we are using are kind of wrong or on very shaky grounds once you realize that everything is relative (subsystem vs system etc).
This leads me to believe you should start to suggest the terminology and conceptual framework for a hierarchical world.
And obviously supporting a hierarchical framework for system verification is a key requirement.
…
– “Spec / requirement bugs … occur when the spec for a subsystem does not take into account some higher-level, full-system requirements.” … I guess if you said “Spec / requirement bugs … occur when the spec for the collective set of subsystems does not guarantee some higher-level, full-system requirements.” The two changes I made:
o A single subsystem many times cannot meet the higher level requirement. Many times the only way to meet is a combination of subsystems specs and some “protocol” of how they together meet the requirement. This collective of all subsystems
o “does not take into account” – well, in most cases they did try to take it into account; it is just that they did not do it completely enough. So in the end the spec failed to guarantee the high level requirements.
– Verification vs validation – very good observation about the hierarchical nature but you ended up making observation and not suggesting a fixed terminology. So in your hierarchical subsystem framework what terminology are you suggesting?
– Perf vs functional. Again very good observation on both sides. What I would conclude is that in general bugs are bugs are bugs. Whether they are “functional” or “performance” they cause the system not to behave according to some higher level requirement. Same for power.
– Conceptual bugs: Right. Usually a certain subsystem was not designed with this requirement in mind (even though it was always there), and addressing this new requirement will cost a lot since it requires a change in the subsystem architecture itself.”