On Mobileye’s formal model of AV safety

Summary: This short post talks about Mobileye’s new paper (regarding a formal approach to Autonomous Vehicles safety). It claims that the paper has several issues, but is nevertheless an important start.

Mobileye came out with a paper titled “On a Formal Model of Safe and Scalable Self-driving Cars” (Bloomberg coverage, summary paper, full pdf). Their main contribution is a thoughtful, detailed, formal description of AV safety – do take a look.

I disagree with some of what they say (see below), but I think the paper is important. I especially like their audacity – trying to formally define safety once and for all. Even if it can’t quite be done, I think this is the start of something good.

I was originally going to write a longer post, summarizing the paper and discussing it. However, that got delayed (I do have excellent excuses), and in the meantime the ever-alert Junko Yoshida of EETimes wrote an initial summary / critique of the paper, and then interviewed Phil Koopman (whom I mentioned before) and Missy Cummings. These experts said most of what I wanted to say (and some things I did not think about), so I decided to just quickly summarize that below, and add what I thought was missing.

I agree with the following views: Here are some of the things Koopman and Cummings said (paraphrased, and perhaps mangled, by me):

  • They start by expressing similar views to what I said above: The paper oversimplifies somewhat, but is an important start.
  • They emphasize that SW bugs (mentioned but not further covered in the paper) could be a big issue (I’d add that HW logic bugs should also be considered).
  • They mention that there are many papers about provable robot safety, which need to be reconciled / considered together with the Mobileye paper
  • They discuss all the complexity created by the need for mutual understanding between AVs and human drivers (see my post about that).
  • They discuss all kinds of possible surprises and unintended consequences. Koopman mentioned that “It’s hard to believe that lidar and radar failure independence will work out as well as the discussion assumes”.
  • Koopman also mentioned the possibility of ML systems learning to game the system: “…what if there’s is a loophole due to measurement uncertainty or pessimistic assumptions that need to be made in practice? It’s likely that machine learning systems will find any such loophole and exploit it.  And probably we won’t think of them all in advance”. (To me, this is just one more case where it’s the spec bugs that kill you).

Just one more thing: The one point I’d like to emphasize (beyond all of the above) is that I think the Mobileye authors downplay too much the role of verification-via-simulation. For instance, they say:

… Assume we desire an accident probability of 10−9 per hour, and a certain AV system provides only 10−8 probability. Even if we obtain 108 hours of driving, there is constant probability that our validation process will not be able to tell us that the system is dangerous.

Thus, they say (in corollary 1 in the paper, which indeed flows from their lemma 1), that “Any deterministic validation procedure” with a limited number of samples will not be enough to absolutely ensure safety.

Yes, they are right in saying that (and also in saying that “validating that the simulator faithfully represents reality is as hard as validating the policy itself”). Nevertheless, I think there is no way to do the job without edge-case-based, end-to-end, CDV-style verification-via-simulation. This kind of verification is probably the main way to find SW bugs, spec bugs and so on (e.g. see this post about Coverage Driven Verification).

Consider also that without this kind of verification, chip design would still be stuck at the gate (sorry for the bad pun). It does not come with any formal guarantee to make systems safe, but (in the complex, amorphous world as discussed above) it is perhaps the main tool for making it safe.

The paper probably assumes that simulations will be run using the expected distribution, rather than bug-finding distributions. This is probably not enough (see also the discussion of expected vs. unexpected bugs here).

To put it even more bluntly:

  • I assume that (unfortunately) there will be no way to fully, formally verify AV safety within the next 15 years (more on that here)
  • Yet not deploying AVs in that time frame would probably be wrong – morally (we can prevent so many accidents) and economically
  • So we should verify AVs as best we can (using a mix of the best tools), explain all this to regulators and the public (not easy), and agree on the right threshold and rate for AV introduction.

I still like the Mobileye approach: While this post concentrated on what’s missing in the Mobileye paper, I (and apparently the experts interviewed above) tend to agree with much of it, and to consider it an important start. Note also that the authors of the paper know much more than me about AV safety, vision, ML, and in fact just about any relevant topic (with the possible exception of verification).

Finally, I simply liked the paper (see also my review of Mobileye’s previous, related paper). I think some of the notations / concepts they introduced will also be helpful for creating edge-case-based scenarios for verification-via-simulation.

Notes

I’d like to thank Amiram Yehudai, Sankalpo Ghose, Gil Amid, Sandeep Desai and Yael Feldman for commenting on earlier drafts of this post.

[30-Oct-2017 – hopefully fixed EETimes links]

 

 

 


4 thoughts on “On Mobileye’s formal model of AV safety

  1. Focusing on just the aspect of creating standards for safe following distance, vehicle speed and distance from roadside when passing parked cars from which people could emerge, etc., strikes me as a very positive, “correct by construction” approach. Granted, as functional verification people like us know, this is far from exhaustive in the universe of scenarios. But as long as we understand this and know that much more will need to be done, solidly covering many of the common base cases that we can think of can’t be a bad thing. Right? Or am I falling into the “hundreds of finely crafted directed tests are worth the investment” trap?

    Like

    1. Hi Joe

      I think I agree with you: Having a generalized scenario for this case, and covering it well with the various possible permutations, is a good idea. Taking the original paper at face value, you don’t even need to do that (because everything is indeed correct by construction), but I am pretty sure they don’t really suggest that.

      The complexity on top of that then comes from possible SW bugs, more complex-and-hard-to-model situations, and so on.

      Also, please take a look at my comments regarding “correct by construction” here: https://blog.foretellix.com/2015/07/26/terminology-related-to-spec-bugs/

      Like

  2. Curious whether you actually read the paper in detail. I’m someone with a math/ML background but kind of new to the AV field. I find this paper poorly written, very hard to chew, and still confused about many parts after many reads, but that could just be me.

    Like

    1. I am probably the wrong guy to give a well-considered answer to that question. I did read it, but both my math and ML knowledge are not enough to evaluate those aspects well. Same goes for Mobileye’s previous paper (mentioned above), which I liked even more.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s