Summary: This is another “What’s new in verification land” post. It describes a video and a paper from Mobileye, and takes that opportunity to revisit four topics: How Autonomous Vehicles should handle unstructured human interaction, how to balance Reinforcement Learning and safety, why simulation is the main way to validate safety in these unstructured environments, and why nevertheless there seems to be a big emphasis on testing-via-test-tracks.
The world according to Mobileye: There is a new, interesting, 75-minute video titled The Convergence of Machine Learning and Artificial Intelligence Towards Enabling Autonomous Driving. The speaker is Amnon Shashua, Mobileye CTO and chairman (he starts talking at the 6:30 mark).
There’s a lot in this video, including a description of the map-heavy vs. the map-light approaches to AV perception (he is in the map-light camp), and a discussion of the relative difficulty of AI tasks (he claims driving policy is much harder than perception). The video also mentions the paper Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving by Amnon and others, which I found interesting in various ways.
So here are my takeaways / thoughts regarding all this (mainly about the safety and verification aspects):
The problem with humans: AVs have to interact with humans (other drivers, pedestrians etc.), which greatly complicates AV design and verification. I have discussed this before, giving several examples (interacting with a construction worker who uses hand signals, interacting with rude pedestrians etc.).
The paper above gives another interesting example, and then suggests a solution. The example is as follows: Suppose a one-way, two-lane highway merges with another one, and then splits. Some drivers want to stay on the original highway, and some (including our AV) want to cross over to the other one. Our AV has to carefully “negotiate” with the other cars and “understand” their intentions: Ideally, it should cross (and avoid undue hard braking). If that’s impossible (e.g. the traffic is too dense) it should at least not cause any accidents.
This seems like a natural thing to do via Reinforcement Learning, and this is indeed how they do it (with a twist, though, since they claim that “the Markov Decision Process model often used in robotics is problematic in our case because of unpredictable behavior of other agents in this multi-agent scenario”).
Another important twist is how they combine RL and safety:
RL and safety: RL uses a “reward signal” to know what’s good and what’s bad. The simplistic solution for handling safety (give some small reward whenever you achieve the “goal of the AV”, and a big negative reward whenever an accident occurs) does not work well, for several reasons: for instance, accidents are so rare that their expected penalty barely influences learning, while the penalty’s huge magnitude inflates the variance of the learning signal.
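Here is a toy sketch of that rare-event problem. All the numbers and names (`GOAL_REWARD`, `ACCIDENT_PENALTY`, the accident probability) are illustrative, not from the paper:

```python
# Toy illustration of the "simplistic" safety reward described above.
# All constants here are made-up, illustrative values.

GOAL_REWARD = 1.0           # small reward for achieving the AV's goal
ACCIDENT_PENALTY = -1000.0  # large negative reward for an accident

def naive_reward(reached_goal: bool, had_accident: bool) -> float:
    if had_accident:
        return ACCIDENT_PENALTY
    return GOAL_REWARD if reached_goal else 0.0

# If accidents occur with probability p ~ 1e-6 per episode, the penalty's
# expected contribution is p * ACCIDENT_PENALTY = -0.001: negligible next
# to the goal reward, even though a single accident is catastrophic.
p = 1e-6
expected_penalty = p * ACCIDENT_PENALTY  # -0.001
```

The mismatch between the penalty’s tiny expected value and its huge one-shot magnitude is what makes naive reward shaping a poor fit for safety.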
This problem (“Safe RL”) has been studied by many. I previously described how some people suggest solving it by having a non-RL top-level decision-making agent (see “Verifying just the top-level rational agent” here), while others suggest a way to “weave” safety constraints into RL using shield synthesis (see “Shield synthesis and ML safety” here).
Well, Amnon et al. have yet another solution: Have the RL produce just high-level “desires” (e.g. “next, try to get into the left lane, while getting in front of car A and behind car B”). The low-level planning to accomplish these desires is done using a non-learning dynamic planner, which is built to satisfy safety constraints (and which may sometimes fail to satisfy the desire).
I doubt this solves all safety issues, but I do like this solution: It may have higher utility than the previous ones (in the sense that it can afford to plan for riskier “desires”, as long as the dynamic planner avoids the rare actual dangers).
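To make the two-layer idea concrete, here is a minimal, hypothetical sketch: a learned policy proposes high-level “desires”, and a non-learning dynamic planner executes them only when hard safety constraints hold. All names and the 10-meter gap threshold are illustrative assumptions, not Mobileye’s actual design:

```python
# Hypothetical sketch of the "desires + dynamic planner" split.
# Desire, rl_policy, dynamic_planner and the gap threshold are all
# illustrative stand-ins, not the paper's actual interfaces.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Desire:
    """High-level intent produced by the learned policy."""
    target_lane: int
    min_gap_m: float = 10.0  # gap required for a safe lane change

def rl_policy(observation: dict) -> Desire:
    # Stand-in for the learned component: it is free to propose
    # ambitious (riskier) desires, since safety is not its job.
    return Desire(target_lane=1)

def dynamic_planner(desire: Desire, gap_in_target_lane_m: float) -> Optional[str]:
    # Non-learning layer: it may fail to satisfy the desire,
    # but it never violates the safety constraint.
    if gap_in_target_lane_m >= desire.min_gap_m:
        return f"change to lane {desire.target_lane}"
    return None  # desire abandoned; the AV stays safely in its lane

desire = rl_policy({"lane": 0})
action = dynamic_planner(desire, gap_in_target_lane_m=4.0)
# gap too small: the planner returns None, and the desire fails safely
```

Note the division of labor: the learned layer optimizes utility, while the hand-written layer is the only one allowed to emit actions, so safety arguments only need to cover the planner.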
Simulation is the main way to validate safety in unstructured environments: After the talk, somebody asked how one should validate this merge/split algorithm. I liked Amnon’s answer (starting at the 59-minute mark), which mostly corresponds to my own views. He suggests it can only be validated (e.g. for the purpose of regulation) in simulation, using a generative model of how humans, including reckless humans, drive (he also suggests using GAN-like techniques, as I discussed here). He further clarified that one cannot validate such things using test tracks.
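A toy sketch of that simulation-side idea: sample other-driver behavior from a generative model whose tail includes reckless drivers, then exercise the merge scenario against them. The distribution and the 0.5 threshold below are made-up, illustrative choices:

```python
# Toy generative model of other drivers: mostly cooperative, with a
# tail of aggressive/reckless behavior. Illustrative only.

import random

def sample_driver(rng: random.Random) -> dict:
    aggressiveness = min(1.0, rng.expovariate(5.0))  # skewed toward gentle
    return {"aggressiveness": aggressiveness,
            "yields_on_merge": aggressiveness < 0.5}

rng = random.Random(0)
drivers = [sample_driver(rng) for _ in range(10_000)]
reckless = [d for d in drivers if not d["yields_on_merge"]]
# In simulation we can re-run the merge against every reckless driver
# (and deliberately over-sample them); a test track cannot safely do that.
```

The point is not this particular distribution, but that simulation lets you dial up the frequency of rare, dangerous behaviors at will.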
As a reminder, here are the main execution platforms used in AV verification, taken from this post:
1. Model in the loop: Uses a high-level model of the VE+DUT; can find conceptual bugs before the SW is written
2. SW in the loop (SIL): Uses the actual AV SW in a simulated framework; most bugs are found here, though sensor modeling is a challenge
3. HW in the loop (HIL): Uses some of the actual HW boxes in a simulated framework; can use the actual sensors if needed
4. Stationary vehicle: Uses a real vehicle in a setup where it “sees” projected inputs and the wheels spin in place
5. Automated test track: Uses other cars and human puppets moved on command; can be controlled by scenario, but with limitations
6. Street driving: The real thing, driving on actual city streets; cannot control scenarios, but can collect scenario coverage
In general, higher-numbered execution platforms are more accurate. On the other hand, they are also less controllable, harder to debug, more expensive (and thus there are fewer copies), and tend to appear later in the development cycle.
Which leads me to:
The push for test tracks: While AV companies have a lot of resources, they still need to make tradeoffs in how much they invest in each execution platform. The above discussion may have convinced you that they should be leaning more towards scenarios-in-simulation, and they are certainly doing some of that, but I detect a strong tendency towards testing-via-test-tracks, even though most of the bugs are not found there.
Why is that? A friend of mine who is familiar with these companies claims much of the reason is “political”: Building a (physical) test track is a good source of budget and prestige, and (unlike simulation / verification infrastructure) it is visible. So lots of states / universities / etc. want to build one. All that test-track-related attention (and expected regulation) pushes AV companies to spend more time on that.
Test tracks are a good idea – all I am questioning here is the balance (in attention, budget and manpower) between test tracks and other execution platforms. My friend expects this somewhat-lopsided-balance to right itself “in a few months or after a few deaths”.
Seriously: Simulation / HIL, combined with good CDV-style verification infrastructure, lets you try vastly more scenarios, try really-dangerous stuff, start verification much earlier, debug much faster and so on. If, further, your verification infrastructure works for all execution platforms (as it should), then it really becomes the cornerstone of the whole effort, and should be designed first. Finally, I hope regulators will also realize how they can use simulations and CDV.
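The scenario-coverage part of CDV can be sketched in a few lines: bin each simulated run by scenario attributes, so coverage holes become visible and can be targeted by new scenarios. The bins chosen here are illustrative assumptions, not a real coverage model:

```python
# Hypothetical sketch of CDV-style scenario coverage collection.
# The bins (traffic density, cut-in occurrence) are illustrative.

from collections import Counter

def coverage_bin(run: dict) -> tuple:
    density = "dense" if run["num_cars"] > 20 else "sparse"
    return (density, "cut_in" if run["cut_in"] else "no_cut_in")

runs = [
    {"num_cars": 30, "cut_in": True},
    {"num_cars": 5,  "cut_in": False},
    {"num_cars": 25, "cut_in": True},
]
coverage = Counter(coverage_bin(r) for r in runs)
# coverage shows which (density, cut-in) combinations were exercised;
# e.g. ("sparse", "cut_in") is a hole no run has hit yet.
```

Because the bins are defined over scenario attributes rather than over any one execution platform, the same coverage model can aggregate runs from simulation, HIL, test tracks and street driving, which is exactly why the infrastructure should be designed to span all of them.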
I’d like to thank Kerstin Eder, Gil Amid, Amiram Yehudai, Yael Feldman and Sandeep Desai for commenting on previous versions of this post.