Summary: This post investigates the (slightly crazy) idea of creating “one universal verification framework for everything”, and claims that while the goal itself is probably impossible (for now), some interesting and worthwhile sub-goals are perhaps possible.
As many people (including myself) have discussed before, there is no single “verification community”: there are really many verification “tribes”, corresponding to verification areas (split mainly by industry, but also by other factors). They have different terminologies and different belief systems (about what’s important and how to do things).
So the current state of (multi-area) verification looks like this (details of the various areas will be given later):
This blog is trying, in a small way, to remove that cloud (e.g. via the terminology page). But why stop there?
So here is a funny question: Can a “Universal Verification Framework” (UVF) be constructed? I mean an actual, single tool / tool-suite which will work (with suitable “plugins” and options) for most of those areas/tribes?
In other words, is the following picture possible:
At first glance (and perhaps at all subsequent glances 😉 ) this looks fairly crazy. It may not even be possible to create a single-industry verification framework, let alone a multi-industry UVF.
Those industries are very different in their verification practices and those differences are not just arbitrary, historical whims. For instance, it makes sense for security verification to concentrate on “negative testing”, while everybody else mainly does “positive testing”. Similarly, Machine Learning verification has to be black-box, because Machine Learning systems are, almost by definition, opaque. And so on.
So this is a crazy idea. But I find it an interesting crazy idea: I think we can learn a lot from investigating it. And there are very good reasons for wanting it to succeed (e.g. the fact that complex systems are increasingly composed of multiple areas). And even if UVF itself is impossible, perhaps we can at least move along the arrow marked “progress” below:
BTW, IBM’s Haifa Research Lab sort-of went for option 3: They developed a team, a methodology and some SW modules (a constraint engine, a coverage engine, an analytics engine etc.), and used those to create multiple verification tools for different areas (in HW, SW and security). See for instance here.
The rest of this post is organized as follows:
- I’ll enumerate the various areas, highlighting their verification differences
- I’ll explain why a UVF is probably impossible (for now)
- I’ll explain why I really want it to be possible, at least in some limited sense
- Because hope cannot be the whole strategy, I’ll then discuss how a potential UVF could reconcile some of those verification differences
- I’ll finally attempt a summary
Notes:
- The UVF I am talking about is mainly about dynamic verification, because (1) I am less familiar with formal verification and (2) dynamic verification finds most of the bugs, and will probably continue to do so – see here. However, I would love for somebody to do something similar for formal.
- In case you were wondering, SVF (the framework which Foretellix is trying to build, and the main topic of this blog) is the subset of UVF which is hopefully implementable much sooner.
The various areas and how they are different
Here is a short, superficial list of some of the related industries, and how their verification practices differ:
- HW module
- Cost of failure is high, so 50% of the engineering effort goes into verification
- Well-developed methodology, mainly using CDV (Coverage Driven Verification) – see the small sketch right after this list
- Note: This is one area where there is a common per-area framework
- SoCs + low level SW
- Less developed than HW module verification
- Needs CDV + scenarios / use cases (because space is too big and undefined)
- Machine Learning (ML)
- Includes classification (e.g. via neural networks), reaction and so on.
- Hard to verify: black box, may change on-the-fly
- Nobody has a good scheme – see here
- Intelligent Autonomous Systems (IAS)
- Includes autonomous vehicles, domestic robots, civilian UAVs and so on
- Mil/Aero
- Includes airplanes, missile systems, …
- Many layers of specs, danger of spec bugs – see here.
- Simulation often on multiple, different machines.
- Simulation mainly for training (but also for verification)
- Plant
- Server
- This includes high-availability multi-server systems (e.g. database systems)
- Very hard to create controllable / repeatable verification environment
- But some do it – see last chapter here.
- Security
- Includes just the verification of security, e.g. vulnerability detection
- A hot topic currently
- Mainly “negative testing”
- Mainly checking for just crashes / memory leaks
- Lots of success with “fuzzing” (which is CDV-like) – see here.
- Success in auto-filling of code/implementation coverage
- Urban
- Includes various city-wide transportation / infrastructure systems: Stoplight systems, bridges, parking designs, new road designs, …
- Looking almost exclusively for performance / optimality bugs. See e.g. here.
- Others
- Includes everything not mentioned above – a huge range: OS, device drivers, compilers, medical equipment, communications equipment, non-autonomous cars, web pages, apps, banking SW, insurance SW, and so on.
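To make “CDV” (Coverage Driven Verification, mentioned in the HW-module bullet above) a bit more concrete, here is a minimal sketch of the constrained-random-generation-plus-functional-coverage loop at its heart. This is just an illustration in Python with made-up packet fields, constraints and coverage bins – real HVLs (e.g. SystemVerilog/UVM) express all of this declaratively and do much more.

```python
import random

# Toy DUT-input model: in a real HVL this would be a constrained-random
# class with declarative constraints, not hand-written Python.
PKT_KINDS = ["unicast", "multicast", "broadcast"]
LEN_BUCKETS = [(0, 63), (64, 1517), (1518, 9000)]  # runt / normal / jumbo

def gen_packet():
    """Constrained-random generation: pick a legal kind/length combination."""
    kind = random.choice(PKT_KINDS)
    # Made-up constraint: broadcast packets are never jumbo in this toy spec
    buckets = LEN_BUCKETS[:2] if kind == "broadcast" else LEN_BUCKETS
    lo, hi = random.choice(buckets)
    return {"kind": kind, "length": random.randint(lo, hi)}

# Functional coverage model: every legal (kind, length-bucket) combination
coverage_bins = {(k, b) for k in PKT_KINDS for b in range(len(LEN_BUCKETS))
                 if not (k == "broadcast" and b == 2)}  # constrained-out bin
hit_bins = set()

def sample_coverage(pkt):
    for b, (lo, hi) in enumerate(LEN_BUCKETS):
        if lo <= pkt["length"] <= hi:
            hit_bins.add((pkt["kind"], b))

def run_cdv(num_tests=1000):
    for _ in range(num_tests):
        pkt = gen_packet()
        # drive_and_check(pkt)   # drive the DUT and run checkers (omitted here)
        sample_coverage(pkt)
    holes = coverage_bins - hit_bins
    print(f"functional coverage: {len(hit_bins)}/{len(coverage_bins)} bins, "
          f"holes: {sorted(holes)}")

run_cdv()
```

The point of the loop is the last line: generation is random-but-legal, and the coverage report tells you which interesting cases you have not yet exercised.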
Why a UVF is probably impossible (for now)
As we saw above, the various areas are pretty different, verification-wise. So if we want to do something which will work for all of them, we’ll have to settle for the lowest common denominator. And the obvious common denominators (e.g. a C compiler) hardly count as pieces of a UVF.
Also, trying to force things into a framework can lead to lovely inventions like the mini-car / kitchen-blender combo (“Why would anybody want to buy two separate multi-speed engines?”).
Another reason why UVF may never happen is that there are already a bunch of alternatives to it:
- Per-area/industry verification frameworks
- Has not happened so far (with the exception of the HW modules area)
- Perhaps because these areas are split into many sub-areas
- Adding verification to a per-sub-area non-verification tool
- Many sub-areas have their own common/standard simulators / modeling tools / packages: Robotics has ROS and Gazebo, automotive has CarSim and TruckSim, etc. Note that simulators have many uses – verification is just a small part.
- This may actually be the strongest alternative: Most people writing tests for robotics may need to know about ROS/Gazebo anyway, so it makes sense for a robotics verification framework to emerge out of those.
- Universal modeling languages / simulators
- There are quite a few candidates for that, e.g. UML/SysML/LSC, Matlab/Simulink/Modelica, SystemC, AnyLogic, Ptolemy II, …
- BTW, a lot of effort has been expended on those, but none emerged as universal
- Various stand-alone verification-related point tools
- Quite a few exist: Continuous integration tools, address sanitizers, model checking tools, …
Why UVF may still happen (in some sense)
While there are excellent reasons to suspect UVF is impossible, there are also strong (and growing) reasons for UVF. The main ones are:
- As verification becomes more important, UVF becomes more compelling
- It succeeded in one area (HW modules)
- UVF can save a huge amount of repetitive effort
- Bigger systems need multiple areas
- Many of the differences are not absolute, but rather a matter of degree
And now for the details:
As verification becomes more important, UVF becomes more compelling: There is sort of a “Maslow’s hierarchy of verification needs” (like Maslow’s hierarchy of human needs). Consider the following figure:
Many of these areas are fast-moving fields. The leading lights there are busy inventing the design side of the area, with verification playing second fiddle. But if / when verification becomes more important, UVF will suddenly look a lot more attractive.
Take autonomous vehicles: As far as I know, AV verification is currently a lot less than 50% of the AV engineering effort. This is not because these guys don’t care about safety: In fact, they put a lot of effort into safety (better vision algorithms, better reaction logic and so on). It is just that safety verification still plays second fiddle.
This is probably the right trade-off for now. But it could change fairly quickly. Here is one way this could happen:
- Currently there are ~100 fatal accidents / day in the USA
- Suppose AVs bring this number down to, say, 20 / day – a very good result
- However, this will still leave a few cases / day where a lawyer could say to some judge/jury: “A bug in the autonomous car killed my client’s child because of insufficient verification, and we demand $10,000,000”
- Whatever the outcome, the judge/jury (and the press) will have to hear from verification experts and get immersed in this area.
- After a while, some regulatory body will be set up to draft explicit and very strict verification requirements
- That could bump verification to 50% of AV engineering
As this happens in more and more areas, regulators / lawmakers may also push for uniform ways to assess safety and quality-of-verification, and thus for uniform multi-area terminologies, concepts and rules: To avoid confusion, it makes sense to have similar rules and terminologies for e.g. AVs, civilian UAVs and domestic robots (where applicable).
It succeeded in one area (HW modules): In the HW modules area, there is a single, agreed-upon technique for verification (CDV), as represented by Hardware Verification Languages (HVLs). There is even (to a lesser extent) a single agreed-upon methodology (UVM).
This started about 20 years ago, and is now an acknowledged fact: most module verification is done that way, in spite of the fact that the verified modules / subsystems are extremely different from each other: Ethernet controllers, ALUs, cache subsystems, anything – even analog HW blocks are (slowly) joining the fold.
And if you say “wait, that’s just one area, not multi-area”, I’ll forgive you – you are simply too young to remember how these different HW module kinds seemed like different kingdoms at the time.
Note that a similar unification is starting to happen in the SoC / low-level SW area. Further, the unified result may end up having some similarity to CDV.
UVF can save a huge amount of repetitive effort: The CDV revolution mentioned above brought extreme savings, removing waste in training, documentation, tool creation, integration of multi-module verification environments, and so on. These savings point the way to what may be possible with UVF, if we can do it.
And the higher up an area climbs in that Maslow hierarchy, the bigger the savings: So now they also want fancy failure clustering? Check. Automatic selection of tests based on changes to the DUT (Design Under Test)? Check. The huge matrix of fancy-verification-function cross verification-sub-area, currently being randomly filled by research and industry, could be replaced by a single best-of-breed vector.
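As a toy illustration of one cell in that matrix – automatic test selection based on changes to the DUT – here is a hedged Python sketch. It assumes we already have a per-test “footprint” of which DUT source files each test exercised (e.g. collected from the code coverage of a previous regression run); the file and test names are, of course, made up.

```python
# Hypothetical per-test footprint: which DUT source files each test exercised,
# e.g. extracted from the code-coverage data of a previous regression run.
test_footprint = {
    "test_reset":       {"ctrl.v", "reset_gen.v"},
    "test_dma_burst":   {"dma.v", "bus_if.v"},
    "test_cache_evict": {"cache.v", "bus_if.v"},
}

def select_tests(changed_files, footprint):
    """Run only the tests whose footprint touches the changed DUT files."""
    changed = set(changed_files)
    return [test for test, files in footprint.items() if files & changed]

print(select_tests(["bus_if.v"], test_footprint))
# -> ['test_dma_burst', 'test_cache_evict']
```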
Bigger systems need multiple areas: Consider AVs again. While they belong to the IAS area, they actually need multiple other areas:
- The video/Radar interpretation may be done via Machine Learning
- Security verification is important (we don’t want somebody to take over the AV)
- The AV clearly contains a lot of HW
- Urban / traffic verification may also be relevant
If each of these has its own, very-different-and-incompatible verification framework, then:
- It is hard to reason about the combined verification results
- It is hard to move verification knowledge / artifacts between teams
- Often in verification, what is the DUT for you is the VE (Verification Environment) for my DUT. This is much harder to achieve without a common framework.
A UVF would take us a big step towards solving those problems.
Many of the differences are not absolute, but rather a matter of degree: What may look like a binary difference (some areas need it, others don’t) often turns out to be something other areas also need, just to a lesser degree.
For instance, security is about negative testing and everything else is mainly about positive testing, but negative testing (what the DUT does upon erroneous input) is still pretty important for other areas.
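As a tiny illustration of what “some negative testing” can look like outside security, here is a hedged Python sketch: it feeds both well-formed and deliberately corrupted inputs to a hypothetical parse_config routine, and for the corrupted ones it only checks that the error is handled cleanly rather than crashing.

```python
import random

def parse_config(text):
    """Stand-in for the DUT: a hypothetical 'key = value' config parser."""
    key, sep, value = text.partition("=")
    if not sep or not key.strip():
        raise ValueError("malformed config line")
    return {key.strip(): value.strip()}

def corrupt(line):
    """Negative testing: break a valid input in a random way."""
    choice = random.randrange(3)
    if choice == 0:
        return line.replace("=", "")   # drop the separator
    if choice == 1:
        return "=" + line              # empty key
    return line + "\x00\xff"           # append junk bytes

for line in ["speed = 7", "mode=auto", "retries = 3"]:
    assert parse_config(line)          # positive testing: spec'ed behavior works
    try:
        parse_config(corrupt(line))    # negative testing: we only care that the
    except ValueError:                 # error is reported cleanly - any other
        pass                           # exception (a "crash") would be a bug
```

The generate-and-corrupt idea is roughly what security fuzzers automate at a much larger scale.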
Thus, the next chapter will go through some of the differences and discuss how they could be reconciled in a tentative UVF.
Reconciling the verification differences between the areas
Here are some of the dimensions on which the verification tribes differ. For each of these, I suggest how a UVF should handle the conflict.
- Positive vs. negative testing
- Security is mainly about negative testing (i.e. seeing that erroneous input does not cause problems), others are mainly about positive testing
- But almost every area needs some negative testing. E.g. many safety bugs start from error handling going bad.
- UVF should support any mix of negative and positive testing.
- Gen/check trade-offs
- In security, most checking is about crashes and memory leaks, and is not related to actual DUT spec
- This is an extreme case of the gen/check trade-offs: For many complex systems (e.g. IAS), one either creates tall-and-narrow scenarios (very specific scenarios where a lot can be checked), or short-and-wide scenarios (very general, unrestricted behavior, but very limited checking, e.g. just for crash/reset). Security just goes for the short-and-wide end of the spectrum
- UVF should support the whole range, including mixes.
- Black-box / white-box testing:
- ML verification is almost completely black-box, while most areas allow white-box / black-box verification. Note that acceptance testing is almost completely black-box in all areas.
- UVF should support a mix.
- Coverage-hole filling strategy
- Automatic coverage filling (a.k.a. Coverage Driven Generation or CDG) is a hot research topic, and has been tried by many in industry, with limited success
- In HW module / SoC verification, there is only limited success, mainly using MBT for filling input coverage.
- Security fuzzers (e.g. AFL and SAGE) do have practical success in auto-filling various code / implementation coverage metrics (as a means for finding more vulnerabilities)
- UVF should support an open auto-filling architecture with various plugins
- Difficulty of writing / debugging specs
- In some areas (e.g. related to HW verification) the specs are fairly complete
- In other areas (e.g. IAS, Mil/Aero, plant) many of the bugs happen between layers of specs, and spec’ing itself is pretty hard
- UVF should contain facilities to help find spec bugs, as suggested towards the end of this post.
- Special modeling requirements
- Some areas need modeling of people’s behavior for either the DUT or the VE (e.g. via BDI).
- Some areas need various other special modeling notations, e.g. System Dynamics.
- UVF should allow many modeling notations, including the above
- Very different execution platforms
- Some areas (e.g. IAS, Mil/Aero) put a huge emphasis on HW-in-the-loop and even people-in-the-loop verification. Mil/Aero also puts a big emphasis on very expensive “real life” testing (e.g. actually firing a missile).
- When they do run simulations (“virtual verification”), this is often parallel simulation on heterogeneous machines, and thus often non-repeatable.
- Others (e.g. most HW verification) mainly use repeatable, simulation-based verification
- UVF should support all of these. In particular, it should contain facilities for minimizing/optimizing expensive tests, e.g. via Combinatorial Test Design (CTD) – see the sketch right after this list
- Performance verification
- Some areas (e.g. urban) are mostly about performance / optimality verification
- UVF should support multi-aspect verification: Functional, performance, reliability and so on.
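To make the Combinatorial Test Design mention above a bit more concrete (as promised in the “execution platforms” bullet), here is a small sketch of greedy pairwise CTD in Python: given a few test parameters, it picks a small set of concrete tests that still covers every pair of parameter values – one common way to squeeze more out of a limited budget of expensive (e.g. HW-in-the-loop) runs. The parameter names are made up, and real CTD tools are far more sophisticated.

```python
from itertools import combinations, product

# Hypothetical parameters of an expensive (say, HW-in-the-loop) AV test run
params = {
    "weather":  ["clear", "rain", "fog"],
    "speed":    ["low", "high"],
    "obstacle": ["none", "pedestrian", "vehicle"],
}

def pairs_of(test):
    """All (parameter, value) pairs covered by one concrete test."""
    return {frozenset(p) for p in combinations(sorted(test.items()), 2)}

def pairwise_suite(params):
    """Greedy covering: keep picking the candidate test that covers the most
    not-yet-covered pairs, until every pair of parameter values is covered."""
    names = sorted(params)
    candidates = [dict(zip(names, values))
                  for values in product(*(params[n] for n in names))]
    uncovered = set().union(*(pairs_of(t) for t in candidates))
    suite = []
    while uncovered:
        best = max(candidates, key=lambda t: len(pairs_of(t) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite

suite = pairwise_suite(params)
exhaustive = 1
for values in params.values():
    exhaustive *= len(values)
print(f"{len(suite)} tests cover all value pairs "
      f"(vs. {exhaustive} exhaustive combinations)")
for test in suite:
    print(test)
```

For this toy example the greedy loop ends up with roughly half of the 18 exhaustive combinations; the savings grow quickly as more parameters and values are added.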
Summary
UVF seems impossible (for now), and yet perhaps necessary (eventually). What should we do about it?
My own intuition is that the upside is too big to miss completely, so we should do something about it. Here are some ideas:
- Perhaps start working towards a unified conceptual framework (a “unified theory of verification”).
- Investigate the practicality of the “architecture + some modules” option
- See if a per-area solution is practical for some [sub]area, and learn from that. Does anybody know a good candidate?
Comments are very welcome.
Notes
I’d like to thank Ziv Binyamini, Sandeep Desai, Yaron Kashai, Amiram Yehudai and Matan Vax for reviewing previous drafts of this post.
17-Nov-2015: My HVC 2015 presentation about this is available here.