What’s new in AV verification: Stuttgart report part two

Summary: This is part two of my report about what I saw at the Stuttgart 2017 Autonomous Vehicles test & development symposium. It covers frameworks, simulators, scenario definitions and extracting scenarios from recordings.

As I promised in part one, here is the rest of my trip report from that yearly symposium. It will cover the main changes I saw relative to previous years:

  • Simulation is now the accepted mode for finding most bugs, by just about everyone
  • Everybody talks about scenarios, scenario libraries, and running lots of random combinations
  • Very initial “frameworks” for handling simulations, scenarios and execution platforms are starting to appear
  • Sensor simulators and sensor modeling are improving and getting a lot of attention
  • There is more work on (semi-) automated analysis / labeling of recorded traffic, for both ML training and interesting-scenario extraction

As background, please take a look at the post Dynamic verification in one picture: It explains concepts I’ll use below, like verification frameworks, scenario libraries, execution platforms, simulator configurations and so on.

Back already? OK, here we go. What follows are my unofficial impressions – comments are very welcome. Much of this is based on hallway conversations, but (because I’d rather not quote people without asking them first), I’ll resort to “some people said …”.


There was quite a bit of discussion about the need for verification frameworks, with all the various components as described in that post. Here is that picture again for reference:


The first keynoter, Reiner Friedrich of BMW, essentially said they did not find any, so they had to build their own. This seems true of other AV OEMs (manufacturers) I talked to.

There are also some attempts at doing multi-stakeholder frameworks. The Pegasus project is a German consortium for “securing automated driving effectively”, and they seem to do a lot of interesting stuff. There is also the related, Pan-European Enable-S3 project, which has an automotive domain and which is coordinated with Pegasus.

Andrea Leitner of AVL gave an interesting talk about what Enable-S3 is up to, and why they are looking at a multi-execution-platform, scenario-based, extensive verification framework. Here are some of her bullets about the challenges:

  • No or not enough units under tests available for real life testing
  • Real life test can be too dangerous for humans
  • Realistic tests in pure simulation environments are often not possible
  • Certification of automated vehicles is unclear

And here are her bullets about what they hope to achieve in the project (to counter those challenges):

  • Scenario-based V&V in virtual, semi-virtual, and real testing environments
  • Environment models, sensor models, and sensor stimuli
  • Extraction of test scenarios (e.g. vehicle road data, accident data, etc.)
  • Coverage-oriented test selection methods
  • Integrated safety and security analysis approaches
  • Draft-standards for test scenario descriptions and interfaces
  • Reproducibility of scenarios (repeated execution as well as reproducibility of real world scenario in simulation)

These are early days, but the frameworks are coming.


Simulations are now pretty much the accepted way to find most AV bugs (though clearly they are not enough). There are quite a few simulator providers, and most also have tools for scenario composition (see below): For instance there are Tass PreScan, rFpro, dSPACE, Vires, Webots, IPG and so on. There are also free / cheap simulators based on Unity (e.g. Synthia).

As I explained in that “one picture” post, simulators vary greatly: Even assuming just Software-in-the-Loop (SIL) simulations, there may be multiple configurations of models (and related simulators), representing various points on the fidelity-vs-performance curve. Also, some running configurations may include detailed simulation of the various sensors, while others may choose to skip that altogether (see this post on the options for AV sensor simulations, and how they relate to various execution platforms).

The Siemens angle: An interesting presentation by Enguerrand Prioux of Siemens made the point that you often need a mix of simulators (even in a single run), and that the mix depends on the verification task at hand. As he said:

  • Since the main validation will be virtual, more fidelity and physics will be required in the simulation models.
  • The Highly and Full Automated Driving (SAE levels 4 and 5) agents validation will need to be done through massive simulation.
  • A covering and minimal scenarios set, necessary for virtual Validation & Verification (V&V) needs to be built.

It seems like Siemens PLM is trying to position itself as the one-stop-shop for many AV verification needs, including being the reliable source for choosing and integrating (mostly other vendors’) simulators.

Will slow, repeatable SIL find most bugs? My intuition is that while you need all those execution platforms, the platform for finding most bugs will be “slow, repeatable SIL”. It had better be repeatable, otherwise debug / regression / etc. will be extremely inefficient (see e.g. this post). And it will often be slow because (1) the easiest way to achieve repeatability is to fake parallelism inside a single thread, (2) physics and sensor simulations are often slow, and (3) unlike HIL simulations, it does not have to run in real time.

One guy (with a lot of experience in system integration) disagreed with this contention: His experience is that most bugs are found in real-time HIL setups (even though they are either non-repeatable or extremely expensive to make almost-repeatable). He gave the example where SIL did not find a timing-related CPU cache bug, which was only found in HIL / real life (where the actual CPU and cache were used).

But I remain (tentatively) unconvinced: I mean, we all know the tradeoffs: Some bugs will only be caught in “the real HW”, and those will clearly be much more expensive to debug and fix. What is contentious is just how much you can achieve in slow, repeatable SIL, and how. For instance, when I hear people say “Yes, we wanted repeatability, but it was too hard to achieve”, I immediately suspect that they don’t know how much value they could have gotten by using that execution platform fully: If they did, they might have tried harder. Sometimes repeatability is too hard to achieve, but that’s a major (verification-system) design decision.
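To make the “fake parallelism inside a single thread” idea a bit more concrete, here is a minimal Python sketch. It is my own toy illustration, not anybody’s actual simulator: the actor names, the `run_simulation` function and the delay range are all made up. The point is just that when every “parallel” actor is stepped from one event queue in simulated time, and all randomness comes from a single seeded generator, the same seed always gives the same trace.

```python
import heapq
import random


def run_simulation(seed, steps=50):
    """Run a toy single-threaded simulation and return its event trace."""
    rng = random.Random(seed)  # the only source of randomness in the run
    queue = []                 # entries are (simulated_time, sequence_number, actor)
    seq = 0                    # tie-breaker: equal times are ordered by insertion

    # Hypothetical actors; a real setup would have the DUT, traffic, sensors etc.
    for actor in ("dut", "car_1", "pedestrian"):
        heapq.heappush(queue, (0.0, seq, actor))
        seq += 1

    trace = []
    for _ in range(steps):
        now, _seq, actor = heapq.heappop(queue)
        trace.append((round(now, 6), actor))
        # The actor "runs" one step, then is rescheduled in simulated time.
        # Wall-clock time is never consulted, so the run need not be real time.
        delay = rng.uniform(0.01, 0.1)
        heapq.heappush(queue, (now + delay, seq, actor))
        seq += 1
    return trace


if __name__ == "__main__":
    first = run_simulation(seed=42)
    second = run_simulation(seed=42)
    print("repeatable:", first == second)
```

Rerunning a failing seed reproduces the exact same interleaving, which is what makes debug and regression practical. The price is that nothing actually runs in parallel, so this kind of setup tends to be slow.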

Sensor simulation: Simulating the various sensors (cameras, Radar, LiDAR etc.), and creating realistic input streams for them, are both extremely hard, as I explained here. But this is clearly important stuff, so lots of people are hard at work on it, and progress is being made (though there seems to be no simple, clean solution).

For instance, Andreas Höfer of IPG gave a presentation on their new Radar simulation solution: Radars are really tough (for instance, you have to deal with all those reflections), and non-generic (your own Radar design may be subtly different). So they give you three levels of solutions (for both the sensors and the stream generation):

  • To see if the algorithms work, they give you something like the “object shortcut” I described in the above post (completely bypassing the sensors)
  • To see if the function generally works, they give you a technology-specific function interface (e.g. with the physical phenomena related to Radar)
  • Finally, to see if the sensor component works, they give you the raw signal interface

At that last level, there are lots of options (depending on the level of realism you need) and a lot of hooks for you to do your own modeling on top of theirs.

Scenario definitions

How should one describe AV-related scenarios? I discussed before why it may be a good idea to create a flexible scenario definition language which is portable across vendors and execution platforms.

Many of the above-mentioned simulator vendors have their own way to describe scenarios. Those are often GUI-based, but may have some hooks so you can script it e.g. in Python.

For instance, dSPACE (the king of HIL simulations) have a nice sequence diagram editor for composing scenarios: You drag in a “lifeline” (per-object vertical line) for each of the participants (the tested AV, other vehicles, people etc.). You then define a route for each, and synchronize them (e.g. “wait until the DUT turns, then perform this step”).

OpenScenario: There are also some initial attempts at vendor-agnostic ways to describe scenarios. OpenScenario is one: It lets you define scenarios with quite a few parameters (see the example scenarios on that page).

OpenScenario is at an early stage, and some people I talked to thought it might not make it. One OEM representative told me “We are also looking at OpenScenario: We hope it will not be too complex”. But the need is clearly there, and this is a step in the right direction.

Representing scenarios: There is still no clear view about how to represent scenarios. One of the open questions is the level of abstraction and generality at which scenarios should be written: Some people talk about “a database of scenarios” in a way that implies a big list of very specific, almost directed scenarios.

Others (like myself) see a scenario description language which lets you describe very generic scenarios, which you can then combine and instantiate in multiple ways (say using constraints). Hopefully, this description will also guide the extraction of scenarios from recordings (see below).
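As a rough illustration of what I mean by “a generic scenario which you instantiate using constraints”, here is a small Python sketch. Everything in it is hypothetical (the `Overtake` fields, the value ranges and the constraints themselves), and it uses plain rejection sampling rather than a real constraint solver, which a serious scenario language would presumably need.

```python
import random
from dataclasses import dataclass

WEATHER = ("clear", "rain", "fog")  # assumed set of weather conditions


@dataclass(frozen=True)
class Overtake:
    """One concrete instance of a generic overtake scenario (made-up fields)."""
    dut_speed_kph: float
    other_speed_kph: float
    gap_m: float
    weather: str


def constraints_hold(s):
    """Illustrative constraints tying the parameters together."""
    return (
        s.other_speed_kph > s.dut_speed_kph + 5                 # overtaker is faster
        and s.gap_m >= 5                                        # no starting inside the DUT
        and (s.weather != "fog" or s.other_speed_kph <= 90)     # slower in fog
    )


def instantiate(rng, max_tries=1000):
    """Draw random parameter values until all constraints are satisfied."""
    for _ in range(max_tries):
        candidate = Overtake(
            dut_speed_kph=rng.uniform(30, 120),
            other_speed_kph=rng.uniform(30, 150),
            gap_m=rng.uniform(2, 50),
            weather=rng.choice(WEATHER),
        )
        if constraints_hold(candidate):
            return candidate
    raise RuntimeError("could not satisfy the constraints")


if __name__ == "__main__":
    rng = random.Random(2017)  # seeded, so the same variants come back every run
    for _ in range(3):
        print(instantiate(rng))
```

The same generic description yields as many concrete variants as you like, in contrast to a “database of scenarios” where each entry is one fixed, almost directed case.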

And then there is the issue of very complex scenarios, and how to parameterize those: Everybody starts with simple examples (like lane change and overtaking): These already have a huge number of variants (if you take into account the weather, speeds and so on). But now consider “going through a packed city junction when the traffic lights are down and people are also trying to cross”: This is clearly at a different level, and yet one still needs to parameterize it and simulate the various variants.

Extracting scenarios from recordings

The idea of extracting interesting scenarios from e.g. video recordings of traffic came up several times. This seemingly low-level and technical topic turned out to be full of variety and passion:

Using recordings to build a scenario library: Sytze Kalisvaart of TNO gave a presentation titled “StreetWise scenario mining for virtual testing”. Here is his slide describing StreetWise in the context of related projects:


He explained the how and why of scenario mining: They basically take many recordings (video, etc.) of actual traffic, do sensor fusion to turn them into “objects + movements”, and then convert that into parameterized scenarios (this is the hard part). As you might expect, this involves lots of Machine Learning – see this post about the possible roles of ML in AV verification.

While doing that, they compute the joint distribution of everything within the scenario. This is good for checking, for any new recording, whether it is “new and different”. It is also good for efficient finding of what I called “expected bugs” here.

They are optimistic about the role of these techniques for mining the kind of complex scenarios mentioned above.
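I don’t know how they actually represent that joint distribution, but the “is this recording new and different?” check can be sketched with something as crude as a joint histogram. The Python below is only my guess at the general shape: the three parameters, the bin sizes and the novelty threshold are all assumptions for illustration.

```python
from collections import Counter


def to_bin(scenario, speed_step=10.0, gap_step=5.0):
    """Map (speed_kph, gap_m, weather) to a coarse joint bin."""
    speed_kph, gap_m, weather = scenario
    return (int(speed_kph // speed_step), int(gap_m // gap_step), weather)


class JointHistogram:
    """Empirical joint distribution over binned scenario parameters."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def add(self, scenario):
        """Record one mined scenario instance."""
        self.counts[to_bin(scenario)] += 1
        self.total += 1

    def probability(self, scenario):
        """Fraction of recorded instances that fall in the same joint bin."""
        if self.total == 0:
            return 0.0
        return self.counts[to_bin(scenario)] / self.total

    def is_novel(self, scenario, threshold=0.01):
        """Treat rarely-seen (or never-seen) parameter combinations as novel."""
        return self.probability(scenario) < threshold


if __name__ == "__main__":
    hist = JointHistogram()
    for _ in range(200):
        hist.add((52.0, 12.0, "clear"))        # lots of ordinary recordings
    print(hist.is_novel((55.0, 14.0, "clear")))  # falls in the common bin
    print(hist.is_novel((130.0, 3.0, "fog")))    # combination never seen before
```

A real system would need something much smarter than fixed bins (there are far too many dimensions), but the idea is the same: low probability under the learned joint distribution means the recording is worth a closer look.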

Semi-automated annotation of recorded data: Matthias Zobel of CMORE IDS talked about the related problem of annotating recorded data (mainly for training of ML-based systems, but also for verification). Annotating here means “adding the ground truth” (identifying objects, adding bounding boxes, marking which pixels belong to which object etc.). This is a somewhat-different use case, but they also use ML heavily.

You may feel uncomfortable about using one ML system to label things, and then using those labels directly to train another ML system. But they have a good answer to that: For one thing, they have people going over the results of the offline ML system, fixing them and feeding that back to that offline system. Also, the offline system can do a better job because it can be more expensive (and slower) than the one that goes into an actual car.

Using the internal AV recordings: OEMs will probably need a streamlined flow for turning the internal recordings (logs) of an AV into a scenario: After every accident (or even a tiny incident like an internal assertion firing), they will want to simulate the corresponding scenario, and ideally to generalize it. Brad Templeton (who also blogged about the symposium) reminds me that:

In the USA, current regulations (which may be revoked) require all teams to publish all incidents.  As such, there will be a demand to convert them all into a common scenario so every team can test against every incident, and variations of it.

Making the logs portable (across releases, and perhaps between companies) is an important goal by itself. Consider Apollo (Baidu’s attempt at creating the “Android of the autonomous driving industry”): They are currently using ROS as the infrastructure (thus inheriting ROS’ repeatability problems), but they did add support for log portability, as they explain here.

Portable logs vs. scenarios: Note that making logs portable is distinct from turning them into scenarios. Let me explain this using the analogy of recording text in a portable way, vs. matching that text to regular expressions: A log (also called a trace) is a stream of events like “The AV saw these pixels”, “The AV decided this was a car”, “The AV turned left” and so on. Encoding these events in a standard way is already hard (because everybody’s HW, SW and engineering decisions are different).

Once you have this stream of “characters” (encoded events), you can match it against your library of scenario definitions, much like you would match a text string against a set of regular expressions: It may match one, several, or none. Did that stream of events correspond to an instance of “N_car_overtake” with N == 3, or to a sequence of three instances of “simple_overtake”? Well, the answer depends on the specific “regular expressions” (really temporal expressions or some other notation) in your scenario library.
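The regular-expression analogy can be taken quite literally in a toy example. In the Python sketch below, the event names, their one-letter codes and the two patterns are all invented for illustration (real scenario definitions would be temporal expressions over much richer events), but it shows how the same library gives different answers for different logs.

```python
import re

# Hypothetical one-letter encoding of high-level log events.
EVENT_CODES = {
    "follow": "F",
    "pull_out": "L",
    "pass_car": "P",
    "merge_back": "R",
}

# A tiny "scenario library": each scenario is a pattern over encoded events.
SCENARIO_LIBRARY = {
    "simple_overtake": re.compile(r"LPR"),        # pull out, pass one car, merge back
    "N_car_overtake": re.compile(r"L(P{2,})R"),   # pull out, pass N >= 2 cars, merge back
}


def encode(events):
    """Turn a list of event names into a string of 'characters'."""
    return "".join(EVENT_CODES[e] for e in events)


def match_scenarios(events):
    """Return (scenario_name, start_index, parameters) for every match."""
    stream = encode(events)
    found = []
    for name, pattern in SCENARIO_LIBRARY.items():
        for m in pattern.finditer(stream):
            # Only the N-car pattern has a capture group; its length is N.
            params = {"N": len(m.group(1))} if m.groups() else {}
            found.append((name, m.start(), params))
    return found


if __name__ == "__main__":
    one_long_pass = ["follow", "pull_out", "pass_car", "pass_car",
                     "pass_car", "merge_back", "follow"]
    three_short_passes = ["pull_out", "pass_car", "merge_back"] * 3
    print(match_scenarios(one_long_pass))
    print(match_scenarios(three_short_passes))
```

With these particular definitions, the first log is one instance of “N_car_overtake” with N == 3, and the second is three instances of “simple_overtake”. Change the patterns and the answer changes, which is exactly why the scenario library matters.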

BTW, if we had a portable log format, and a standard scenario description language, and an agreed-upon “canonical scenario library” encoded in that language, that could be quite useful. Among other things, we could quickly convert any recording / log into its representation as an instantiated canonical scenario.


Not sure how to summarize it all, other than to say that AV verification is maturing (e.g. compare this to my previous Stuttgart reports). It seems to be going in the right direction (frameworks, scenario definitions, lots of simulations etc.), but still has a long way to go.

I’d like to thank Gil Amid, Brad Templeton and Amiram Yehudai for reviewing previous versions of this post.


