Terminology used in this blog

Summary: This page contains the terminology used in this blog, and tries to bridge between terminologies used by different groups dealing with verification.

One problem with a blog which tries to bridge multiple verification communities is that of terminology. A previous post discussed the various verification tribes (v-tribes), and the sometimes amusing results of them using similar (but not identical) language. I suggest you read it before scanning this long list – I’ll just wait here.

So I decided to create (and keep updating) this terminology page. Notes:

This is an opinionated terminology page, in the sense that I personally come from the HW v-tribe (though I have read a lot of what other v-tribes wrote, and even dared go to some of their meetings). So this is my slant on some of those terms. As Abe Lincoln used to say, “When in doubt, consult Wikipedia”.
It may also be shallow, since I am taking a whole area and viewing it from the perspective of “how it relates to the kind of verification I know”. This can be annoying – I know how I would feel if somebody said of my beloved CDV “Oh, so they randomize numbers for testing. Sure, I once wrote a 10-line Perl script to do that”.
I am not suggesting there is “one, true meaning” to terminology, or even that people commenting on this blog should adhere to specific rules. Just please be aware of this terminology when writing, if you want to be better understood. Also, there are other good sources for verification terminology – I’ll try to include some pointers in a subsequent post.
The list is alphabetical, but I have grouped similar / related terms under the corresponding “main” topic. So e.g. SUT is grouped under DUT, rather than having its own entry under the letter S. Please consider using “search”.
If you want to add / correct entries in this list, please add a reply to this page. As I add/update terms, I’ll keep a change log at the bottom.

AV (Autonomous Vehicle)

AVs are covered at length in this post (and its sequel), because they are an interesting target for SVF to verify. AVs are also called SDCs (Self Driving Cars) and Robocars.

BTW, the companies which actually sell you cars (Mercedes, Ford etc.) are called in this industry OEMs (Original Equipment Manufacturers), because they OEM the technology coming out of tier-1 companies (Bosch, Magna etc.). This chunk of terminology can be pretty confusing to people outside the automotive industry.

Also, ECU (Electronic Control Unit) is any independent piece of electronics in the car. Much of automotive verification has to do with verifying ECUs.

BDI (Belief Desire Intention)

BDI is a common framework for modeling “intelligent agents” – see Wikipedia. One of its main uses is to actually program SW agents, but in the context of verification we are more interested in BDI as a simple model of how humans (and human-like agents, like corporations and committees) make decisions and behave.

A BDI-style notation would be pretty useful for SVF: You would model each such agent (e.g. your simulated pilot, your simulated air traffic controller) as an object with a list of beliefs, desires and intentions.

Each agent then has a loop of “update beliefs, update plan to achieve desires, repeat”, and you can insert random errors (in new beliefs, in choosing the plan, etc.).
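The loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the notation of any real BDI package: the Agent class, the run_agent function and the error_rate parameter are all hypothetical names invented here, and "error injection" is reduced to occasionally misreading a boolean observation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical BDI-style agent: beliefs, desires and a current intention."""
    name: str
    desires: list                       # goals, in priority order
    beliefs: dict = field(default_factory=dict)
    intention: str = "idle"

    def update_beliefs(self, observation, rng, error_rate):
        # Error injection: with probability error_rate a boolean observation is misread.
        for key, value in observation.items():
            if isinstance(value, bool) and rng.random() < error_rate:
                value = not value
            self.beliefs[key] = value

    def update_plan(self):
        # Intend the first desire that is not yet believed to be satisfied.
        for goal in self.desires:
            if not self.beliefs.get(goal, False):
                self.intention = f"achieve:{goal}"
                return
        self.intention = "idle"


def run_agent(agent, observations, seed, error_rate=0.0):
    """Update beliefs, update plan, repeat. Returns the list of intentions over time."""
    rng = random.Random(seed)
    trace = []
    for obs in observations:
        agent.update_beliefs(obs, rng, error_rate)
        agent.update_plan()
        trace.append(agent.intention)
    return trace
```

With error_rate=0.0 the simulated pilot simply works through its desires in order. Raising error_rate makes it occasionally act on wrong beliefs, which is the kind of behavior a verification scenario may want to provoke.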

BDI agents can also be used for rough, initial modeling of things like AVs, UAVs or a plane’s auto-pilot system – it is sometimes useful to treat those as if they generate plans and execute them.

There are various BDI packages out there (e.g. Jason), but my view is that this should be just one of several notations. Specifically, I suggest resisting the temptation of treating everything as an intelligent agent. Plain old OO simulation is simpler, and is usually enough.

Bug

I use “bug” in the general sense of “an error in an engineered system which causes it to behave in an unintended way”. Verification is all about finding bugs.

A related term is area of concern, i.e. the behavior of an engineered system (in some subset of the scenario / parameter space) which is unexpected / surprising in some way.

A single bug can cause many failures in many runs during a regression. The (manual or automated) process of trying to cluster (organize) the failures such that each cluster corresponds to one bug is called failure clustering.
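As a rough illustration of automated failure clustering, here is a minimal sketch which groups failures by a normalized message "signature". It assumes failures arrive as (run_id, message) pairs and that run-specific details are mostly numbers; real clustering tools use far richer heuristics, and the function names are invented for this example.

```python
import re
from collections import defaultdict


def signature(message):
    """Normalize a failure message so that run-specific details do not matter."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)   # addresses, data values
    sig = re.sub(r"\d+", "<n>", sig)                     # times, counters
    return sig


def cluster_failures(failures):
    """failures: iterable of (run_id, message). Returns {signature: [run_id, ...]}."""
    clusters = defaultdict(list)
    for run_id, message in failures:
        clusters[signature(message)].append(run_id)
    return dict(clusters)
```

Each resulting cluster is only a candidate for "one bug": two different bugs can produce the same message, and one bug can produce several messages.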

CDV (Coverage Driven Verification)

This verification technique, very popular in HW verification, is loosely explained here.

It usually involves random generation of inputs (often using constraints), driving them into (some representation of) the DUT, collecting coverage and doing checking. For big systems, writing the checkers (as reference models / oracles, assertions etc.) is often the hardest part.
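To make the four ingredients concrete (constrained random generation, driving the DUT, coverage collection, checking), here is a toy sketch. Everything in it is an assumption made for illustration: the "DUT" is a Python function standing in for an 8-bit saturating adder with a deliberately planted bug, and the checker is a one-line reference model.

```python
import random


def dut_saturating_add(a, b):
    """Stand-in DUT: 8-bit saturating adder with a planted corner-case bug."""
    total = a + b
    if total > 255:
        return 255
    if a == 255 and b == 0:
        return 0          # planted bug
    return total


def reference_add(a, b):
    """Reference model (oracle) used by the checker."""
    return min(a + b, 255)


def generate(rng):
    """Constraint: a, b in [0..255], with extra weight on the edges."""
    def pick():
        return rng.choice([0, 255]) if rng.random() < 0.3 else rng.randint(0, 255)
    return pick(), pick()


def coverage_bin(a, b):
    """Functional coverage: cross of (min / mid / max) for both operands."""
    def bucket(x):
        return "min" if x == 0 else "max" if x == 255 else "mid"
    return bucket(a), bucket(b)


def run_test(seed, n=1000):
    rng = random.Random(seed)
    covered, failures = set(), []
    for _ in range(n):
        a, b = generate(rng)
        covered.add(coverage_bin(a, b))
        if dut_saturating_add(a, b) != reference_add(a, b):
            failures.append((a, b))
    return covered, failures
```

Here the checker is trivial. As noted above, for big systems it is usually the hardest part.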

CRV (Constrained Random Verification) is the name some people use for the variant of CDV which insists on using constraints for producing random values.

MDV (Metric Driven Verification) is similar, but with an emphasis on collecting metrics like coverage and bugs-per-week.

Correct by construction

See here.

Coverage

Coverage is a measure of how much of your DVE has been covered (reached) via the testing done so far.

The main two kinds are:

Implementation coverage (also called code coverage), i.e. which source constructs (e.g. code blocks) have been exercised so far
Functional coverage, i.e. which user-defined DVE-related attributes (and crosses of attributes) have been reached so far

There are many specific measures of coverage. Also, it is well known that implementation coverage is not enough – you can reach 100% implementation coverage and still have a whole major piece of the functionality missing.
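A tiny sketch of that last point, under invented assumptions: the clamp function below is supposed (per a made-up spec) to clamp its input into [0..100], but the lower-bound handling was never written. The two tests execute every line of it, yet the user-defined functional coverage bins still show a hole.

```python
def clamp(x):
    """Assumed spec: clamp x into [0..100]. The lower-bound handling is missing."""
    return min(x, 100)


# User-defined functional coverage bins for the input (hypothetical).
BINS = {
    "below": lambda x: x < 0,
    "inside": lambda x: 0 <= x <= 100,
    "above": lambda x: x > 100,
}


def functional_coverage(inputs):
    """Return (bins hit, bins never hit) for the given test inputs."""
    hit = {name for x in inputs for name, pred in BINS.items() if pred(x)}
    return hit, set(BINS) - hit


tests = [50, 150]   # executes every line of clamp(): 100% implementation coverage
hit, holes = functional_coverage(tests)
```

The "below" bin stays in holes, pointing at exactly the input region where the missing functionality would have shown up.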

Distribution

Often when you simulate some model, that model will have variables (parameters, knobs) which can be set to many possible values. In the simplest case, this would be specified by a range (x must be in [0..20]), but in CDV (and certainly in CRV) it will be specified via a set of constraints (x in [0..20]; x < y).

Assigning (randomizing) a value for such a variable is done using some (explicit or implicit) distribution. You can use the expected distribution, e.g. according to the bell curve of the values you think this variable will have in real life (or in one typical subset of real life). This is what typically gets done in Monte-Carlo simulations.

Alternatively, you can use various bug-finding distributions, i.e. distributions which you (or some algorithm) expect will make bugs appear more often (e.g. by going to corner cases). This is what is usually done in CDV.

For instance, when testing a calculator, the expected distribution is probably mostly small numbers, then some bigger, etc. A bug-finding distribution will put a much higher emphasis on MIN_INT, MAX_INT, MAX_INT – 1, adding two numbers so that they overflow or almost-overflow and so on.
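The calculator example can be sketched as follows. The numbers are assumptions chosen for illustration only: a 32-bit integer range, a bell curve with standard deviation 1000 as the "expected" distribution, and a 50% weight on a hand-picked list of corner values as the "bug-finding" one.

```python
import random

MIN_INT, MAX_INT = -2**31, 2**31 - 1
CORNERS = [MIN_INT, MIN_INT + 1, -1, 0, 1, MAX_INT - 1, MAX_INT]


def expected_value(rng):
    """Expected distribution: bell curve around 0, i.e. mostly small numbers."""
    x = int(rng.gauss(0, 1000))
    return max(MIN_INT, min(MAX_INT, x))


def bug_finding_value(rng, corner_weight=0.5):
    """Bug-finding distribution: heavy emphasis on corner values."""
    if rng.random() < corner_weight:
        return rng.choice(CORNERS)
    return rng.randint(MIN_INT, MAX_INT)


def overflows(a, b):
    return not (MIN_INT <= a + b <= MAX_INT)


def count_overflows(draw, seed, n=10_000):
    """How many of n random additions overflow, using the given distribution?"""
    rng = random.Random(seed)
    return sum(overflows(draw(rng), draw(rng)) for _ in range(n))
```

Comparing count_overflows(expected_value, seed) with count_overflows(bug_finding_value, seed) shows the difference: the first practically never reaches the overflow case, while the second reaches it all the time.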

DUT (Device Under Test)

This is the term used for “the thing we test” by most of the HW v-tribe. But some people use UUT (Unit Under Test), EUT (Equipment Under Test), SUT (System/Software Under Test) and so on.

Then there are the people who prefer “verification” to “test” (correctly, in fact), so they say DUV, SUV etc. But they really mean the same thing.

I’ll be using DUT (mainly out of habit). Here is an example diagram of a complex DUT+VE.

DUTs are usually nested (i.e. most verification is done on subsystems), but there are also top-level DUTs swimming in the soup of systems – see this post.

In AV-land:

Here the DUT is often called the VUT (for Vehicle Under Test), but many people call it the “Ego” or “Ego vehicle”. I think this started out as a mainly-European thing, but is now spreading. I got raised eyebrows once in the US when I said “and in this scenario the ego has to decide whether to turn left or right” (as in “I thought you were talking technology, not psychology”). It is also simply called “the AV” or “the SDC” (or “host” or even “hero”).

There is no fixed terminology to describe the agents around the VUT (which we control so as to cause a scenario to happen). Some call the other cars “targets” (this may be a residue of simpler times when most testing was “don’t hit that single car in your lane”, or may be military-speak for “the target we are sensing with our Radar”). Other people call them “NPCs” (Non-Player Characters – the computer-gaming term for the characters in the game which are not controlled by a human player). We use “NPCs” or “agents” internally.

Execution platform

An execution platform is where your DVE (DUT + VE) will actually run. A typical example is a simulator.

For instance, say you are designing a new anti-skid box for a car. The DUT is the anti-skid box (HW+SW), and the VE is the rest of the car, the road on which we travel, the behavior of the driver, weather and so on.

Each component of this DVE could be executed in a different way, thus leading to multiple execution platform configurations.

(1) We could start, perhaps, with all components being models.
(2) Replace the model of the anti-skid SW with the “real thing”, i.e. the actual compiled C code (running on top of a model of the anti-skid HW). This is called SW-In-the-Loop by some people (the big-system v-tribe seems to like the “*-In-the-Loop” terminology).
(3) Replace the model of the anti-skid HW with a simulation of the actual HW source code (e.g. in Verilog). This is called HW simulation.
(4) Replace the model of the road by a recording of a drive over a real road. You can still have simulated weather and simulated disturbances on top of that.
(5) Like (3), but use emulation rather than simulation. Emulation (for the HW v-tribe) means using a special, expensive box which simulates the HW 1000x (or more) faster than SW simulation.
(6) Replace the whole anti-skid box (SW included) with the actual, physical box. This is Hardware-In-the-Loop (HIL).
(7) Replace the whole simulated car with a real car, driving inside some driving contraption.
(8) Replace the simulated driver by a real human driver. This is Human-In-the-Loop or Driver-In-the-Loop or Person-In-the-Loop.
(9) Finally, go for a real test with a real driver on a real road – no simulation at all.

Here are some comments on the above list:

Automotive people call anything except (8) and (9) virtual testing. CDV, then, is mainly about virtual testing (but see comments under virtual testing).
Pure-SW simulations like (1) to (4) are the most flexible: They can be run any number of times on many machines, can run at any speed (faster or slower than real life, as needed), and are normally completely repeatable.
(5) is usually repeatable. For CDV, you may need a whole “emulator farm”.
(6) is often not completely repeatable. For CDV, you may need to buy a whole “HIL farm”. Also, this often needs to run at “real life” speed, and thus the other components may need to be faked in various ways so they run “fast enough”.
(7) and above are very expensive, and not repeatable.

Fuzzing

Fuzzing, or fuzz testing, is a technique for finding bugs (mainly security vulnerabilities) in SW (or HW). It involves bombarding the SW with random, almost-legal inputs, with the hope of causing it to crash in various ways, or otherwise do something which exposes a vulnerability which the bad guys can use.

As the Wikipedia page explains, there are two main kinds: mutation-based fuzzing (where you take legal inputs and mutate them in various interesting ways), and generation-based fuzzing (where you build almost-legal inputs from some declarative description, e.g. a grammar). For more information, see here.
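Both kinds can be sketched in a few lines. This is a toy illustration, not how any particular fuzzer works: the mutation operators, the small arithmetic-expression GRAMMAR, the break_rate knob (which drops symbols to make the output only almost-legal) and the fuzz harness are all assumptions made up for this example.

```python
import random


def mutate(data, rng, n_mutations=3):
    """Mutation-based: flip bits, insert or delete bytes in a legal input."""
    buf = bytearray(data)
    for _ in range(n_mutations):
        op = rng.choice(["flip", "insert", "delete"])
        if op == "flip" and buf:
            buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
        elif op == "insert":
            buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
        elif op == "delete" and buf:
            del buf[rng.randrange(len(buf))]
    return bytes(buf)


# Hypothetical declarative description: a grammar for arithmetic expressions.
GRAMMAR = {
    "expr": [["term"], ["term", "op", "expr"]],
    "term": [["digit"], ["(", "expr", ")"]],
    "op": [["+"], ["-"], ["*"], ["/"]],
    "digit": [[d] for d in "0123456789"],
}


def generate(symbol, rng, depth=0, max_depth=6, break_rate=0.0):
    """Generation-based: expand the grammar; break_rate > 0 drops symbols at random."""
    if symbol not in GRAMMAR:
        return symbol                                    # terminal
    options = GRAMMAR[symbol]
    # Past max_depth, always take the first (non-recursive) alternative.
    choice = options[0] if depth >= max_depth else rng.choice(options)
    parts = []
    for s in choice:
        if rng.random() < break_rate:
            continue                                     # makes the input almost-legal
        parts.append(generate(s, rng, depth + 1, max_depth, break_rate))
    return "".join(parts)


def fuzz(target, legal_inputs, rng, runs=1000):
    """Bombard target with mutated inputs; collect the inputs which made it raise."""
    crashes = []
    for _ in range(runs):
        data = mutate(rng.choice(legal_inputs), rng)
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, repr(exc)))
    return crashes
```

In real fuzzing the interesting outcomes are crashes, hangs and memory errors in the SW under test; a raised Python exception merely stands in for those here.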

HIL (Hardware-In-the-Loop)

HIL refers to running the DUT (or part of it) as an actual, physical HW box, while the rest of the DVE still runs in simulation. See, for instance, configuration (6) under execution platform.

Actually interfacing a piece of real hardware to a simulation engine is somewhat tricky, but (because this is so useful) it has been done many times.

In general, big-system verification people love this *-in-the-loop notation, so beware that HIL can also be Human-In-the-Loop. Also, when reading about HIL (e.g. in Wikipedia) you will often see them referring to the HIL component connected to a plant. Don’t worry – this has nothing to do with the planting season – “plant” is a control-theory term, probably used for excellent historical reasons. In our skid-control example, the “plant” is the actual brake mechanics controlled by the skid-control unit.

ICE (In Circuit Emulation / Emulator)

This is another term well-calculated to confuse you. It has two different (but slightly-related) meanings.

For embedded SW verification folks, ICE means a device which helps debugging SW. It uses a special version of the actual, HW CPU, which has some special way (e.g. JTAG) to get debug information to the ICE debug SW. The Wikipedia entry summarizes this well.

For HW verification folks, ICE simply means using a HW emulator (i.e. an expensive machine which can emulate your chip much faster than a simulator) in its full-speed mode (because it also has another mode, called acceleration). See e.g. here.

Model

A model is “an abstract description of something, which is good enough for the current purpose”. Models are useful for dynamic verification, formal verification and (obviously) for non-verification purposes.

In the context of dynamic verification, a model is some abstract description of some component of the DVE, to be simulated over time. This could be a model of the weather, a BDI-style model of human behavior, a state-machine model of some of the SW, and so on.

A specific execution platform configuration often consists of models for some of the components, and some variant of “the real thing” for other components.

Some models are declarative and some procedural, and there is a huge set of proposed notations for modeling.

Models also come at different resolutions: It is very useful to be able to run a specific component at low resolution (e.g. as a very abstract system dynamics model), and later swap it for a higher resolution, more detailed model (e.g. an object-oriented model).

Obviously, verifying a model of a DUT does not guarantee that the actual DUT is also verified (unless that model is automatically extracted from the DUT or compiled to produce the DUT, but see discussion of correct by construction).

SVF should support an extensible set of modeling notations. It should also support swapping a component’s model by a model with a different resolution, or by “the real thing”.

And one last potential confusion: models are obviously used in verification (since much of the DVE consists of models). However, model verification is something else: This term means “verifying that a model indeed represents well the relevant aspects of the thing being modeled” – an important activity which is distinct from “using models for verification”. Though of course if you are verifying your AV using some model of how tires behave on snow, you would hope that model has been verified.

MBT (Model Based Testing)

You would think that MBT simply means “doing testing via simulation, where some of the simulation is generated from models”.

You would be wrong, though. When people say MBT, they really mean it in a narrower sense, of guiding the test generation according to some (usually formal) model.

For instance, suppose some of your robot is modeled as a state machine. Just simulating your robot (including that state machine) is not considered MBT. But using that state machine description to automatically create a sequence of inputs which will take the robot to state S3 is considered MBT, and so is the act of extracting all possible state transitions into some coverage definition (to be later checked against actually-achieved coverage).
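Both MBT activities mentioned above (deriving inputs that reach S3, and extracting transitions as coverage goals) can be sketched directly from a state-machine description. The FSM table and its input names are hypothetical, and breadth-first search is just one simple way to derive the input sequence.

```python
from collections import deque

# Hypothetical robot state machine: {state: {input: next_state}}
FSM = {
    "S0": {"power_on": "S1"},
    "S1": {"calibrate": "S2", "power_off": "S0"},
    "S2": {"start": "S3", "power_off": "S0"},
    "S3": {"stop": "S2"},
}


def inputs_to_reach(fsm, start, goal):
    """Breadth-first search for a shortest input sequence from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for inp, nxt in fsm[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [inp]))
    return None


def transition_goals(fsm):
    """Every (state, input, next_state) transition becomes a coverage goal."""
    return {(s, inp, nxt) for s, edges in fsm.items() for inp, nxt in edges.items()}


def covered_transitions(fsm, start, inputs):
    """Which transitions does a given input sequence actually exercise?"""
    state, hit = start, set()
    for inp in inputs:
        nxt = fsm[state][inp]
        hit.add((state, inp, nxt))
        state = nxt
    return hit
```

For this table, inputs_to_reach(FSM, "S0", "S3") yields power_on, calibrate, start. That sequence exercises three of the six transitions, so the difference between transition_goals and covered_transitions tells you what is still left to test.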

MBT is actually pretty important for big-system testing (and hence for SVF): When testing big systems, one needs (in addition to “plain” CDV) a way to define interesting use-cases / scenarios, and mix them in various ways. Many MBT notations have been invented to do just that.

Monte-Carlo simulation

For the big-system v-tribe this means roughly “simulation with a random component”. “OK, so they mean something like CDV” – says a HW v-tribe member. Well, not quite: for the big-system v-tribe “Monte-Carlo” has the connotation of investigating what happens typically, i.e. when you randomize using expected frequencies. CDV is more about finding bugs by going to corner cases. Note that the Wikipedia entry will not help here: it is correct, but too general.

NCAP (New Car Assessment Program)

There are several NCAP programmes for assessing cars, the most famous of which is Euro-NCAP. Among other things, it publishes safety reports on new cars, and awards ‘star ratings’ based on the performance of the vehicles in a variety of crash tests. It thus influences how cars are verified.

OO (Object-Oriented) simulation

This is probably what you assume when you think about normal, general simulation: There are a bunch of objects which get created over time, live, then die. Each has a behavior and some interaction with other objects. There is some scheduling mechanism (typically an event queue) to see what happens next within each existing object.
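A minimal sketch of such a simulation, with an event queue as the scheduling mechanism. The Simulator and Car classes are invented for this example, and a real simulator would have much more (object interaction, stopping conditions, tracing).

```python
import heapq
import itertools


class Simulator:
    """Tiny event-queue scheduler."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()   # tie-breaker: keeps event order deterministic
        self.log = []

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), action))

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, action = heapq.heappop(self._queue)
            action()


class Car:
    """Hypothetical object: gets created, moves once per time unit, then leaves."""

    def __init__(self, sim, name, trips):
        self.sim, self.name, self.trips = sim, name, trips
        sim.schedule(0, self.step)

    def step(self):
        if self.trips == 0:
            self.sim.log.append((self.sim.now, self.name, "leaves"))
            return
        self.trips -= 1
        self.sim.log.append((self.sim.now, self.name, "moves"))
        self.sim.schedule(1.0, self.step)
```

Objects can be created while the simulation is running, e.g. sim.schedule(0.5, lambda: Car(sim, "b", 1)) creates a second car half a time unit in.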

Discrete event simulation can be viewed as just an optimized variant of this, where all objects (and their connections) are fixed.

When people say “Agent-based simulation”, they mean either object-oriented simulation as above, or “Intelligent-agent simulation”, where each agent has a BDI loop or some such.

Run

In CDV (and more generally) the concepts of a DVE, test, run, seed, regression and repeatability are all related:

A DVE (DUT+VE) is the thing we are simulating / investigating.
A test (or “test specification”) is a specification of the DVE we want to run, plus some further test directives (e.g. which of all named scenarios of that DVE we would like to run today, which execution platform we want to run it on etc.).
A run is a single execution of a test on an execution platform (e.g. simulator) using a specific runtime seed. The results of the run include some logs, some collected coverage, and (often) some failures / issues.
The set of all runs executed for a particular DVE during a specific period (night, weekend) is often called (somewhat perversely) a regression. There are usually reporting tools to see the aggregate coverage / failures of that regression.
Ideally, a run is completely repeatable given the test and the runtime seed.
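The relationship between these concepts can be sketched as data structures. All names here (SimTest, RunResult, execute, regression) are hypothetical, and the "DVE" being executed is a toy stand-in which just draws random speed bins and occasionally reports a failure.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimTest:
    """A test specification: which scenario of the DVE to run, and on which platform."""
    name: str
    scenario: str
    platform: str = "simulator"


@dataclass
class RunResult:
    test: SimTest
    seed: int
    coverage: set
    failures: list


def execute(test, seed):
    """A run: a single execution of a test with a specific runtime seed."""
    rng = random.Random(f"{test.name}:{seed}")
    coverage, failures = set(), []
    for step in range(20):
        speed_bin = rng.choice(["slow", "medium", "fast"])
        coverage.add((test.scenario, speed_bin))
        if speed_bin == "fast" and rng.random() < 0.1:
            failures.append(f"step {step}: skid not corrected")
    return RunResult(test, seed, coverage, failures)


def regression(tests, seeds):
    """A regression: all runs for a period, plus aggregate coverage and failures."""
    runs = [execute(t, s) for t in tests for s in seeds]
    total_coverage = set().union(*(r.coverage for r in runs))
    failing = [(r.test.name, r.seed) for r in runs if r.failures]
    return runs, total_coverage, failing
```

Because each run is fully determined by (test, seed), any entry in failing can be rerun in isolation for debugging.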

Repeatability

Given the same test and runtime seed, ideally a run is completely repeatable, i.e. everything will happen exactly the same, and at the same (simulated) HW-clock tick. This is obviously very important when you have a lot of failures to debug: Both human debugging and automatic debugging techniques are so much easier when the failure always happens (and at exactly the same point) in a specific run.

Any “reasonable” single-process simulator should be repeatable (and most are able to emulate tasks/processes in a repeatable way). However, once we go to true multi-process, multi-machine execution platforms, this is much harder.

A related (and less well known) concept is random stability. While repeatability means “same seed with same DVE+test should run the same”, random stability means “same seed with almost identical DVE+test should generate almost identical input streams”.

Specifically, adding a new feature X to the simulation or modifying some attribute of Y should not change the stream of random values generated for Z, assuming that Z is orthogonal to both X and Y. For instance, say you change your anti-skid test slightly, modifying the stream of user behaviors. This should not change the stream of weather behaviors (but it would, unless some precaution is taken, because they all advance the “current random number” of the same random number generator).
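One common precaution is to give each orthogonal component its own random stream, derived from the run seed plus the component name. The sketch below contrasts that with the naive shared generator; the class names and the "driver" / "weather" components are assumptions made for this example.

```python
import random


class SharedRng:
    """Naive approach: every component draws from the same generator."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def stream(self, name):
        return self._rng


class NamedStreams:
    """One independent generator per component, derived from (seed, name)."""

    def __init__(self, seed):
        self._seed = seed
        self._streams = {}

    def stream(self, name):
        if name not in self._streams:
            self._streams[name] = random.Random(f"{self._seed}/{name}")
        return self._streams[name]


def simulate(rngs, driver_draws_per_step, steps=5):
    """Return the weather values; the driver consumes a varying number of draws."""
    weather = []
    for _ in range(steps):
        for _ in range(driver_draws_per_step):
            rngs.stream("driver").random()          # user-behavior stream
        weather.append(rngs.stream("weather").randint(0, 100))
    return weather
```

With NamedStreams, changing driver_draws_per_step (i.e. modifying the user behavior) leaves the weather values untouched. With SharedRng the same change shifts every later weather value.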

Simulator

For HW people (and many others) a simulator is typically a SW tool used to run your DVE. In fact, it is probably the most popular execution platform for CDV.

There is a bewildering array of simulator-related terminology. For instance, a simulator which can combine analog (continuous) behavior and discrete behavior is called a hybrid simulator by most people, but a mixed-signal simulator by the HW community.

Most people (but not all) seem to be OK with calling something a simulator even if it contains pieces of the “real thing”, as in HW-in-The-Loop, so this blog will also allow it.

For the automotive/aerospace industry, a simulator (also) means something very different: an expensive HW + SW setup used for training people, as in “flight simulator”.

Note that sometimes one could use significant pieces of e.g. a flight simulator as an execution platform for CDV, but this is rarely done.

Socio-technical systems

Socio-technical systems are big systems consisting of HW, SW, human procedures, and so on. Examples are airports, hospitals and so on.

The term cyber-physical systems seems to mean something similar (but with less emphasis on the human procedures part).

The term system of systems (SoS) seems to be used to describe any system which has mostly-independent subsystems. Thus, every socio-technical (or cyber-physical) system is also a system-of-systems, but not vice versa.

This blog uses the term big systems for all of the above, including AVs, UAVs, robots and so on. SVF is meant to be a tool for verifying all of the above.

Spec (specification)

Engineered systems normally have specs (and requirements – the line between the two is somewhat fuzzy). Verification is normally done with respect to specs. Specs can also have bugs in them – this whole post deals with that topic, and talks about spec-related tools.

System Dynamics simulation

System Dynamics is a way to model various feedback loops via continuous simulation – see Wikipedia.

System Dynamics is actually a pretty neat way to model some verification-related things. See, for instance, this paper by Nancy Leveson et al., where they explain the organizational processes which contributed to the Columbia shuttle loss: E.g. performance pressure leads to higher launch rate, which leads (for a while) to higher success rate, which eventually leads to complacency, which eventually leads to budget cuts in safety.
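To give a feel for the style (not for the actual model in that paper), here is a toy two-stock sketch of such a feedback loop, integrated with simple Euler steps. The stocks, the equations and every coefficient are invented for illustration.

```python
def simulate(steps=300, dt=0.1, pressure=1.0, k_grow=0.5, k_decay=0.1, k_cut=0.2):
    """Toy feedback loop: success breeds complacency, complacency erodes safety effort."""
    complacency, safety = 0.0, 1.0          # the two "stocks"
    history = []
    for _ in range(steps):
        success = pressure * safety          # success depends on the remaining safety effort
        d_complacency = k_grow * success - k_decay * complacency
        d_safety = -k_cut * complacency * safety
        complacency += dt * d_complacency    # Euler integration step
        safety += dt * d_safety
        history.append((complacency, safety))
    return history
```

Even this crude version shows the qualitative shape of the story: complacency builds up while things go well, and the safety stock is then steadily eroded.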

Because this is such a useful modeling abstraction, and because it can be supported via a fairly simple notation, I think it should be one of the notations supported by SVF.

For finer resolution and a more detailed view, you may sometimes want to switch from a System Dynamics simulation to the corresponding object-oriented simulation. For instance, you may want to take the above Columbia shuttle simulation and model it in more detail as a community of interacting NASA employees with various attributes, launch events, etc.

SVF (System Verification Framework)

This is the name I gave to the framework I would like to see (and that this blog is about): the Holistic, Dynamic-Simulation-based, Constrained-Random, Big-System Verification Framework, alias SVF (I settled on this acronym after I got friendly hints that HDSCRBSVF, my original favorite, was somewhat less catchy than I assumed).

By “big systems” I mean items 3..5 in the figure in this post. This is a big range, including AVs, UAVs, robots, airports, hospitals, transportation systems and so on. See also Socio-technical systems.

For some of the requirements from SVF see the last chapter of this post.

UAV (Unmanned Aerial Vehicle)

Also called drone. An interesting target for SVF to verify.

VE (Verification Environment)

To do verification, you usually have to construct a Verification Environment around your DUT (Device Under Test). That VE consists of various verification artifacts, such as constraints, coverage definitions, monitors and so on.

A DVE (Design and Verification Environment) is simply the DUT + VE. It is executed on some execution platform.

The term testbench usually means “the VE” but sometimes means “the execution platform”.

Here is an example diagram of a complex DUT+VE.

Finally, people (including myself) use the term “environment” somewhat loosely to mean “the context in which the DUT operates”. A specific DUT may have several VEs (and thus several DVEs), each extending to a different radius around the DUT, and thus including a different chunk of the “total” environment. For instance, when testing your anti-skid HW, the VE can be just the neighboring components, or the whole car, or the car + road model + driver model, and so on.

Verification

By “verification” I mean here roughly “the process of finding bugs (or unintended consequences) in engineered systems before they annoy a customer / destroy a project / kill somebody”. This includes both formal verification and dynamic verification (i.e. actually running a test).

This is the common interpretation in the HW v-tribe, but some other people assume “verification” means only “formal verification”, and call dynamic verification “testing” – see this post. The Wikipedia article for SW verification more-or-less uses the terminology I like.

CDV is one common (and pretty successful, in the HW domain) variant of dynamic verification.

To do verification, you usually have to construct a VE (Verification Environment) around your DUT (Device Under Test).

Verification (“Are you building the thing right”) is often contrasted with validation (“Are you building the right thing”), but see here and here.

Virtual testing

Virtual testing is the name given by various big-system v-tribes to any kind of verification not involving “the real thing”. For instance, automotive people seem to use this term for anything not involving Human-in-The-Loop, i.e. all my execution platform configuration examples except for (8) and (9).

Note that human-in-the-loop is great for testing human reactions and “feel”, but is not strictly part of CDV (because you cannot rerun it a huge number of times). However, because it is the most realistic, it can be used as a “reality check” for CDV runs. For instance, CDV lets you decide which critical tests you also want to run with Human-in-the-loop.

v-tribe (verification tribe)

V-tribes is the name I gave to the various groups (tribes) involved in verification, e.g. the HW v-tribe, the embedded v-tribe and so on. See here.

Notes

16-Nov-2015: Added ICE, Fuzzing

5-June-2016: Added NCAP

One thought on “Terminology used in this blog”

Andy Piziali says:

August 25, 2015 at 2:37 pm

Regarding your definition of “bug,” you write: I use “bug” in the general sense of “an error in an engineered system which causes it to behave in an unintended way.” This appears to be a near circular definition because if a “bug” is this kind of “error,” i.e. one that causes unintended behavior, what other kinds of errors are there? If the definition was changed to “An unintended behavior in an engineered system,” it would discriminate one class of behaviors–intended behaviors–from this class of interest–unintended behaviors.

While verifying (using the HW meaning of the word) a mainframe RAS system several decades ago, we distinguished undetected, unintended behaviors from detected, unintended behaviors by referring to the former as “errors” and the latter as “faults.” This discrimination may prove useful in these blogs.

The Foretellix CTO Blog – AI safety

Now focusing on AI safety (autonomy-related posts go to the company blog)