Summary: This is a short collection of news and papers related to Autonomous Vehicle verification.
Some info about how Waymo does verification: The Atlantic published a long article about how Waymo verifies its AVs.
I did not see much that was new or unexpected here (though I assume they did not divulge their most innovative work). The article does, however, show the significant scale of the verification effort. Here are some specific points I found interesting:
- They explain how simulation is now the main tool for verification (they use an internal simulator called Carcraft). It is also the main tool for new engineering (“The vast majority of work done—new feature work—is motivated by stuff seen in simulation”). This is consistent with my observations (e.g. here) that simulations are now the accepted mode for finding most bugs.
- Most of their simulations skip the whole sensor-modeling part, and just use what I called the “object shortcut” in figure 1 of this post.
- There is an interesting description of their physical test site (“Castle”), including the kinds of props they collect and use, and how scenarios are shared between physical testing and simulations.
- They talk about parametrizing scenarios, and then going through all combinations of the parameters. Interestingly, they call this “fuzzing” (I mentioned fuzzing and how it relates to Coverage Driven Verification in several posts). There is a chart demonstrating fuzzing of a four-way-stop scenario (search for “fuzzing” in the article) – presumably this shows the cross-coverage of the eight coverage items they chose for that scenario.
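The parametrize-then-cross idea described above can be sketched in a few lines. This is purely illustrative, not Waymo's actual tooling: the parameter names and values below are hypothetical coverage items for a four-way-stop scenario, and "fuzzing" here just means enumerating the cross product of their values.

```python
from itertools import product

# Hypothetical coverage items for a four-way-stop scenario (illustrative only).
parameters = {
    "ego_speed_mph":      [15, 25, 35],
    "other_car_arrival":  ["before_ego", "with_ego", "after_ego"],
    "pedestrian_present": [False, True],
}

def fuzz_scenarios(params):
    """Yield one scenario dict per combination of parameter values."""
    names = list(params)
    for values in product(*(params[n] for n in names)):
        yield dict(zip(names, values))

scenarios = list(fuzz_scenarios(parameters))
# 3 * 3 * 2 = 18 scenario variants to run in simulation.
```

Cross-coverage of the chosen items then amounts to checking which of these combinations were actually exercised, which is exactly the Coverage Driven Verification view of this kind of fuzzing.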
Apple discussed a new way to use GANs: I talked previously about the potential of using GANs (Generative Adversarial Networks) for stimuli generation, but always had a vague feeling that they should be used “on top of” something else – otherwise there will not be enough “controllable newness” in the generated stimuli.
Well, Apple came out with a paper titled Improving the Realism of Synthetic Images which may have added that missing piece: rather than using, say, a conditional GAN to come up with novel stimuli, they create completely synthetic images and then just “refine” them using GANs so that they look more realistic.
For context, I talked here about the various ways one can use ML for verification, but below is a more specific picture, showing where this usage of GANs fits in:
Note that for Apple, the purpose of the whole exercise was to use those refined images for ML training, but I assume that images which are good enough for that are also good enough to serve as verification stimuli.
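To make the refinement idea concrete, here is a minimal sketch of the loss structure it implies: the refiner tries to fool a discriminator into scoring its output as realistic, while a per-pixel self-regularization term keeps the refined image close to the original synthetic one, so the original annotations stay valid. The function names and the lambda weighting are my own illustration, not code from the paper.

```python
def self_regularization_loss(synthetic, refined):
    """L1 distance between a synthetic image and its refined version,
    both given here as flat lists of pixel values (illustrative)."""
    return sum(abs(s - r) for s, r in zip(synthetic, refined))

def refiner_objective(synthetic, refined, realism_score, lam=0.1):
    """Total refiner loss (sketch): an adversarial term (lower
    realism_score = the discriminator thinks the image is real)
    plus lambda times the self-regularization term."""
    return realism_score + lam * self_regularization_loss(synthetic, refined)
```

The key design point for verification purposes is that the self-regularization term is what preserves the “controllable newness”: the scenario content comes from the synthetic generator, and the GAN only adjusts appearance.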
In related developments:
- There are now several examples (e.g. Lyrebird) of fitting somebody’s voice / video to an arbitrary message. This makes cheating easier (bad), but it also makes realistic simulations easier (good).
- DeepMind came out with a paper (pdf) about using “adversarial imitation” to learn human behavior. The related 3-minute video shows how, once the system learned a movement style, it can apply it while executing new tasks, such as climbing stairs and navigating obstacles.
- Another trick for getting over the “reality gap” of synthetic images is domain randomization (pdf).
All of this could be fruitfully applied to making synthetic inputs more realistic (as in Fig. 1 above), so I am now fairly optimistic about this direction.
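Domain randomization deserves a small sketch of its own, since it attacks the reality gap from the opposite direction: instead of making synthetic images more realistic, it randomizes the non-essential scene parameters so aggressively that the trained model treats the real world as just one more variation. The parameter names and ranges below are hypothetical, for illustration only.

```python
import random

def randomize_scene(seed=None):
    """Return one randomized scene configuration (illustrative parameters)."""
    rng = random.Random(seed)
    return {
        "road_texture":    rng.choice(["asphalt", "concrete", "gravel"]),
        "light_azimuth":   rng.uniform(0.0, 360.0),      # degrees
        "camera_height_m": rng.uniform(1.2, 1.8),        # plausible mount heights
        "car_color_rgb":   tuple(rng.randint(0, 255) for _ in range(3)),
    }

# Each synthetic training image would be rendered from a freshly
# randomized scene; seeding makes any given scene reproducible.
training_scenes = [randomize_scene(seed=i) for i in range(100)]
```

Note that seeding each scene keeps the randomization reproducible, which matters for verification: a failing scene can be re-rendered exactly for debug.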
Ben Evans on AVs: On a slightly less technical note, I just read two interesting AV-related posts by Ben Evans (of Andreessen Horowitz).
The first, Cars and second order consequences, tries to look at possible effects of the move to autonomy (and to electric cars). He assumes that accidents will essentially go away in a fully-autonomous world (which I think is an exaggeration), but the whole post is pretty interesting.
The second, Winner-takes all effects in autonomous cars, asks how many AV companies will remain when the dust settles:
Are there network effects that would allow the top one or two companies to squeeze the rest out, as happened in smartphone or PC operating systems? Or might there be room for five or ten companies to compete indefinitely?
He claims hardware and sensors do not have a network effect, but mapping and data (for ML training) do. Simulations also fit into this latter category:
There are also clear scale advantages to simulation, in how much computing resource you can afford to devote to this, how many people you have working on it, and how much institutional expertise you have in large computing projects.
I tend to agree with him on that last part (though I would add “institutional expertise in verification” as a big factor), and I think the Waymo piece above demonstrates that. BTW, from my own discussions with AV people, I hear the number “five” mentioned quite a bit for the number of global AV winners, but I also hear a lot about geographical (and special-usage) niches.
That’s all for today. This space keeps being interesting.
I’d like to thank Sandeep Desai, Gil Amid and Amiram Yehudai for reviewing previous versions of this post.