On Mobileye’s formal model of AV safety

Summary: This short post talks about Mobileye’s new paper (regarding a formal approach to Autonomous Vehicles safety). It claims that the paper has several issues, but is nevertheless an important start.

Mobileye came out with a paper titled “On a Formal Model of Safe and Scalable Self-driving Cars” (Bloomberg coverage, summary paper, full pdf). Their main contribution is a thoughtful, detailed, formal description of AV safety – do take a look.

I disagree with some of what they say (see below), but I think the paper is important. I especially like their audacity – trying to formally define safety once and for all. Even if it can’t quite be done, I think this is the start of something good.

I was originally going to write a longer post, summarizing the paper and discussing it. However, that got delayed (I do have excellent excuses), and in the meantime the ever-alert Junko Yoshida of EETimes wrote an initial summary / critique of the paper, and then interviewed Phil Koopman (whom I mentioned before) and Missy Cummings. These experts said most of what I wanted to say (and some things I did not think about), so I decided to just quickly summarize that below, and add what I thought was missing.

I agree with the following views: Here are some of the things Koopman and Cummings said (paraphrased, and perhaps mangled, by me):

They start by expressing similar views to what I said above: The paper oversimplifies somewhat, but is an important start.
They emphasize that SW bugs (mentioned but not further covered in the paper) could be a big issue (I’d add that HW logic bugs should also be considered).
They mention that there are many papers about provable robot safety, which need to be reconciled / considered together with the Mobileye paper
They discuss all the complexity created by the need for mutual understanding between AVs and human drivers (see my post about that).
They discuss all kinds of possible surprises and unintended consequences. Koopman mentioned that “It’s hard to believe that lidar and radar failure independence will work out as well as the discussion assumes”.
Koopman also mentioned the possibility of ML systems learning to game the system: “…what if there’s is a loophole due to measurement uncertainty or pessimistic assumptions that need to be made in practice? It’s likely that machine learning systems will find any such loophole and exploit it. And probably we won’t think of them all in advance”. (To me, this is just one more case where it’s the spec bugs that kill you).

Just one more thing: The one point I’d like to emphasize (beyond all of the above) is that I think the Mobileye authors downplay too much the role of verification-via-simulation. For instance, they say:

… Assume we desire an accident probability of 10⁻⁹ per hour, and a certain AV system provides only 10⁻⁸probability. Even if we obtain 10⁸ hours of driving, there is constant probability that our validation process will not be able to tell us that the system is dangerous.

Thus, they say (in corollary 1 in the paper, which indeed flows from their lemma 1), that “Any deterministic validation procedure” with a limited number of samples will not be enough to absolutely ensure safety.

Yes, they are right in saying that (and also in saying that “validating that the simulator faithfully represents reality is as hard as validating the policy itself”). Nevertheless, I think there is no way to do the job without edge-case-based, end-to-end, CDV-style verification-via-simulation. This kind of verification is probably the main way to find SW bugs, spec bugs and so on (e.g. see this post about Coverage Driven Verification).

Consider also that without this kind of verification, chip design would still be stuck at the gate (sorry for the bad pun). It does not come with any formal guarantee to make systems safe, but (in the complex, amorphous world as discussed above) it is perhaps the main tool for making it safe.

The paper probably assumes that simulations will be run using the expected distribution, rather than bug-finding distributions. This is probably not enough (see also the discussion of expected vs. unexpected bugs here).

To put it even more bluntly:

I assume that (unfortunately) there will be no way to fully, formally verify AV safety within the next 15 years (more on that here)
Yet not deploying AVs in that time frame would probably be wrong – morally (we can prevent so many accidents) and economically
So we should verify AVs as best we can (using a mix of the best tools), explain all this to regulators and the public (not easy), and agree on the right threshold and rate for AV introduction.

I still like the Mobileye approach: While this post concentrated on what’s missing in the Mobileye paper, I (and apparently the experts interviewed above) tend to agree with much of it, and to consider it an important start. Note also that the authors of the paper know much more than me about AV safety, vision, ML, and in fact just about any relevant topic (with the possible exception of verification).

Finally, I simply liked the paper (see also my review of Mobileye’s previous, related paper). I think some of the notations / concepts they introduced will also be helpful for creating edge-case-based scenarios for verification-via-simulation.

Notes

I’d like to thank Amiram Yehudai, Sankalpo Ghose, Gil Amid, Sandeep Desai and Yael Feldman for commenting on earlier drafts of this post.

[30-Oct-2017 – hopefully fixed EETimes links]

6 thoughts on “On Mobileye’s formal model of AV safety”

Joe Hupcey says:

November 4, 2017 at 5:58 am

Focusing on just the aspect of creating standards for safe following distance, vehicle speed and distance from roadside when passing parked cars from which people could emerge, etc., strikes me as a very positive, “correct by construction” approach. Granted, as functional verification people like us know, this is far from exhaustive in the universe of scenarios. But as long as we understand this and know that much more will need to be done, solidly covering many of the common base cases that we can think of can’t be a bad thing. Right? Or am I falling into the “hundreds of finely crafted directed tests are worth the investment” trap?

Loading...

1. Yoav Hollander says:
  
  November 4, 2017 at 6:27 am
  
  Hi Joe
  
  I think I agree with you: Having a generalized scenario for this case, and covering it well with the various possible permutations, is a good idea. Taking the original paper at face value, you don’t even need to do that (because everything is indeed correct by construction), but I am pretty sure they don’t really suggest that.
  
  The complexity on top of that then comes from possible SW bugs, more complex-and-hard-to-model situations, and so on.
  
  Also, please take a look at my comments regarding “correct by construction” here: https://blog.foretellix.com/2015/07/26/terminology-related-to-spec-bugs/
  
  Loading...
  
Sam says:

November 11, 2017 at 9:35 am

Curious whether you actually read the paper in detail. I’m someone with a math/ML background but kind of new to the AV field. I find this paper poorly written, very hard to chew, and still confused about many parts after many reads, but that could just be me.

Loading...

1. Yoav Hollander says:
  
  November 11, 2017 at 1:21 pm
  
  I am probably the wrong guy to give a well-considered answer to that question. I did read it, but both my math and ML knowledge are not enough to evaluate those aspects well. Same goes for Mobileye’s previous paper (mentioned above), which I liked even more.
  
  Loading...
  
Khan Safety says:

December 16, 2017 at 1:41 am

I Agree with You: It is really important to keep our mind wide and apply the possibilities. The author has made a good effort for analyzing the various applications.

Loading...

Arne Boralv says:

March 28, 2018 at 7:56 am

The Mobileye approach to define a set of safety rules, aiming to be a complete set, is obviously a good thing to do as this enables to achieve complete coverage in verification of software (using formal verification). Ensuring the safety rules are *complete* is hard, and the ultimate set of rules (rulebook) will evolve and likely be written in blood, as has been the case in other industries. Nonetheless, such safety rules are needed irrespective of how you are going to verify your system (testing, formal verification, driving a billion miles, …).

Loading...

	Daan van der Keur on About “The coming AI hackers”…
	Mariah Jackson on M-SDL, the autonomous vehicles…
	sakhokhar on Machine Learning for Coverage…
	hongseoklee on How to write AV scenarios (and…
	Erik Panu on GPT-3 and verification
	Yoav Hollander on Autonomy markets and their pot…
	Nakkeeran Kumaraswam… on Autonomy markets and their pot…
	Aman on DeepXplore and new ideas for v…
	Angela on Verifying how AVs behave durin…

The Foretellix CTO Blog