On that “verification range” issue

Posted on July 6, 2015May 29, 2017 by Yoav Hollander

I got interesting comments from some [1] loyal readers of this blog, about these “examples of systems to verify” mentioned in my initial post:

Some people claimed that these systems are really so different in scale that they are really incomparable. In particular, I heard the claim that hardware verification (where I spent much of my adult life) is easier because all the interfaces are well-defined.

Clearly, there is a lot of truth to that. And I know from personal experience that even the step from (1) to (2) is hard: For instance, in “hardware module” verification, the primary focus is to try to reach all corner cases. It took us a while to realize that in “SoC + low-level-SW” verification, the primary focus should be on reaching corner cases only within some specified “use cases”. Many such realizations probably await on the road to (3), (4) and (5).

On the other hand, there are clearly many things which apply to all examples in the range, e.g.:

They all have more combinations than you could really test in a lifetime
Interactions of basic inputs could result in unexpected consequences (and if you assume that this does not happen in hardware because the interfaces are well-defined, then I can only smile condescendingly).
So you need a test / coverage plan, and some tradeoff analysis
And as part of that you need to clearly define your Device Under Test (DUT) [2] and the Verification Environment (VE) to be modeled / simulated around it. And you may have several of those in a single verification project.
And experience from your initial testing is bound to make you modify your test / coverage plan, usually causing your current test / coverage score to actually go down

And so on.

So anyway, putting the differences and commonalities in this “verification range” in some reasonable conceptual framework is an interesting exercise. Here are some further thoughts on that, just to get your creative juices going.

A few important (and related) questions are:

What is the DUT here?
Is the interface between the DUT and VE well-defined?
To what “radius” should we extend the VE?

For instance, in (1) the interface to the module (inputs and outputs) is normally completely specified, and the expectation is that this interface is all you need to know.

In (2), the interface (often in C code) is well-specified for some areas and not for others (and part of the verification process is understanding the boundaries).

In (3) and above, the input is the whole crazy word, including “goat-on-the-highway”. The expected output is also sometimes not completely determined: Somebody has to thing about the goat-on-the-highway and then supply a generic answer for all animals. And then you need to ask him about highway-bridge-destroyed, and so on.

In (4) and especially (5), one first has to think hard about “what is the DUT here”. You may decide that that the DUT is the change you are planning to engineer (e.g. “build a bridge to Malmo”). And then, the rest of the world is the VE.

Notes

[1] When I say “some”, don’t assume I mean “in the high thousands”. Reminds me of when I asked my cousin Doron about the popularity of his local choir, and he said “we were inundated with a postcard”.

[2] I know, you call it SUT, or DUV, or some other name. And you are of course right. Let’s call it DUT for now and get back to the topic at hand. I owe you one.

8 thoughts on “On that “verification range” issue”

Ziv Binyamini says:

July 6, 2015 at 12:35 pm

I am intrigued by the question of these unexpected basic events. How many are there? How different are these numbers of “basic events” between the various systems you are trying to address. Lets take the “goat on the highway”. How about “white goat with blue dots on the highway?”. Well – we don’t care about the color of the goat. Actually we don’t care about which animal it is. Actually we don’t care which kind of object it is moving on the highway. So when you abstract the event sufficiently to a basic event of “a slow object moving on the highway”. How many such basic abstract events are there to test? If the answers is few orders of magnitude than we are in the same space as HW SoC. If the answer is millions of such abstract events than its a different scale all together. My intuition is if the answer is millions than somebody did not do a good enough job abstracting them.
Ziv

Loading...

Reply
yoavhollander says:

July 6, 2015 at 1:01 pm

Good questions, all.

One good discussion of this topic is here: https://www.cs.york.ac.uk/ftpdir/reports/2015/YCS/496/YCS-2015-496.pdf . BTW, these are pretty good dudes to follow on related topics (Rob Alexander et al. from University of York).

Yoav

Loading...

Reply
Yaron Kashai says:

July 8, 2015 at 8:37 pm

Thought provoking – it’s not often that hospitals and mass-transport systems are considered on the same continuum as hardware design. You point out some common characteristics of the 5 classes and some distinct properties, but somehow the last 3 seem to lump together. Can you characterize what makes these distinct? Is it just “huge, huger hugest”?

Loading...

Reply
Yoav Hollander says:

July 9, 2015 at 7:14 pm

Hi Yaron

Still thinking about that.

Obviously, while all are “systems of systems”, an AV/UAV/robot is more of a unified system, designed by one primary contractor. Yes, there are lots of other organizations in the loop, but at least there is one organization responsible for reviewing specs and testing so that everything coheres.

A city normally has no single designer (and in fact, it is not an engineered artifact in the normal sense). And every engineered artifact within it (e.g. a new bridge) is “a patch added to the current whole”.

Still, not sure what difference this all makes, verification-wise.

Yoav

Loading...

Reply
Alon Shacham says:

July 12, 2015 at 1:16 pm

Hi Yoav,

My 2c, especially regarding the AV industry situation as described by you:
A. The gap between formal verification and the current mainly DIL verification seems to be enormous, and there is plenty of room for improvement by human guided CDV (PIL? for Person, not Driver, In the Loop). Two main types:
1. Use Case scenarios: where the human describes some normal use cases and CDV generates variations on those. The main advantages of CDV compared to DIL are low cost, once the system is set up, to find first problems, and volume of testing made possible with CDV. Incremental advantage over time with an increased volume and variations of reusable scenarios.
2. Extreme scenarios: where the human describes some extreme cases that are costly or to risky to test with a human driver.
Finding extreme scenarios may be helped by some kind of formal verification. E.g. use formal verification to see what can cause the brakes to lock, and use CDV to check how the AV reacts or how it prevents these conditions.

B. Google seems to be well positioned to take advantage of CDV, and it reminds me of Intel/National-Semi. of long ago. They feel the need, control most of their environment, and understand software – what it can do and its advantages. The rest of the automobile industry needs somebody to show them.

C. You may be interested in MirabilisDesign (http://www.mirabilisdesign.com).

Alon

Loading...

Reply
Yoav Hollander says:

July 12, 2015 at 3:42 pm

Hi Alon

About your point A: I very much agree with both (1) and (2). BTW, as an example of a current product which may (in the longer term) be relevant for both, take a look at Perspec by Cadence (http://www.cadence.com/products/sd/perspec_system_verifier/ – I still do a bit of consulting to that product’s team).

Perspec is currently used mainly for “SoC + low level SW” verification, but for your point (1) it has the ability to define use cases and do constraint-based randomization within them, and for your point (2) it has the formal-like capability, given some state S, of creating “all” scenarios leading to S (it uses a SAT solver for that).

So that could be part of the “right” toolbox, but still a lot to conceptualize here – so further comments are very welcome.

Yoav

Loading...

Reply
Naftaly Minsky says:

July 12, 2015 at 7:25 pm

I have two, related comments to make on this very thought provoking blog.

First, I think that it is a mistake to attempt to define a system of levels 4
and 5 and beyond as having a specifiedspecific function to perform,
with some input and output speciication.
For example, what is the function of a hospital?
Yes, one can say “treating patients”,
but this is way under defined and too fuzzy fuzzy to be useful.

I suggest that such a system can be viewed as a kind of social system,
like a country or a corporation, which one defines
by: (a) its composition, (b) a broad set of goals, and (c) the laws that
circomscribes the interaction between its actors.

Second, I do not believe that it is possible to verify a complex system—such
as of type 3 and above, unless such a system is built to facilitate
verification. This can mean various things, like strong typing, and the use of
functional languages, in software systems.
And, in a complex distributed system, one need to
impose some constraints—laws, again—over the interaction between the
various system-actors. Among other things, such laws can render a system
modular, and thus easier to reason about. Note that the existence of such
laws is what enable society to function, and which makes it possible to reason
about the physical world.

I have been doing this for software systems for a while,
and am hoping to do so for socio-technical systems, as well.
But how exactly such laws would effect the verification of such system, I do
not know.

Loading...

Reply
Yoav Hollander says:

July 13, 2015 at 6:27 pm

Hi Naftaly

Thanks fore those comments – I think both open important discussions. I’ll talk
just about the second one now:

“Our” job as verification engineers is to verify the design, whatever it is.
But clearly, some architectures are much easier to verify.

Let’s look at hardware verification: This is an area where the cost of failure
is huge, which is why it was a good early market for verification
technologies. But one thing which made verification easier there is
that the design side has achieved an early, and agreed-upon, separation
of concerns: When I verify that my floating point algorithm is correct,
I don’t have to think about what happens if somebody pulls the plug
and my chip has no electricity: Floating point logic is part of digital
design, and voltage regulation is somebody else’s problem, and the
“contract” between these groups is fairly clear (ignoring analog
design for now). So each group can design (and verify) against a much
narrower interface.

In new systems like AVs, the interfaces are not always clear. Note that
it is not enough to just “declare” clean interfaces. Actual bugs will not
necessarily respect those interfaces, and you’ll have a spec bug on
your hand. And BTW, most of the bugs which kill people are spec bugs,
not implementation-of-the-spec bugs.

All of this is a long-winded way of saying “I agree, a design with clear laws
and interfaces will make verification much easier, and even then verification
can be a big challenge”.

Yoav

Loading...

Reply

	Coverage-driven alig… on It’s the spec bugs that kill y…
	https://otomotif71.w… on Stuttgart impressions: Scenari…
	Daan van der Keur on About “The coming AI hackers”…
	Mariah Jackson on M-SDL, the autonomous vehicles…
	sakhokhar on Machine Learning for Coverage…
	hongseoklee on How to write AV scenarios (and…
	Erik Panu on GPT-3 and verification
	Yoav Hollander on Autonomy markets and their pot…
	Nakkeeran Kumaraswam… on Autonomy markets and their pot…

The Foretellix CTO Blog – AI safety

Now focusing on AI safety (autonomy-related posts go to the company blog)