The Uber accident and the bigger picture

Summary: This post discusses the influence of the Uber accident on Autonomous Vehicle (AV) deployment. It claims that AVs should eventually be deployed, and yet that we should expect many fatal AV accidents. It then suggests that a comprehensive, transparent verification system could help solve this inevitable tension.

That tragic Uber accident has brought AV safety into sharp focus. Brad Templeton wrote an excellent summary of what is known so far, and others have weighed in on how people are more afraid of things they can’t control, on the need for third-party testing, on the insurance implications of all this and so on.

As I write this, there are still open questions. What is clear is that the victim, a homeless woman pushing a bicycle, crossed several lanes in an unlit, no-crossing section of the road before being fatally hit by the AV. There was probably enough time for a human driver to stop. Further, the AV's Lidar (a laser-based ranging sensor), and probably its other sensors, should have given enough advance warning for the AV to stop in time. Why did that not happen?

One possibility is some weird, unverified corner case (e.g. a person wearing dark clothes walking a bike with lots of bags) – I mentioned before that Machine Learning (ML) based systems (often used for scene understanding) are especially hard to verify comprehensively. But perhaps this was not a bug at all: one rumor says the Lidar was turned off for test purposes, and perhaps it was something else altogether.

Regardless of the specifics of this incident, this post will look at the bigger picture of AV deployment, safety and verification, and expand on the following claims:

  • Beyond a certain safety threshold, AVs should be deployed
  • While safer than human drivers, AVs will continue to have many fatal accidents
  • AV manufacturers and regulators should employ a well-thought-out, comprehensive, continuously-improving, multi-execution-platform, transparent verification system

Let me elaborate on all that.

Beyond a certain safety threshold, AVs should be deployed

Consider figure 1 below: It assumes (oversimplifying somewhat) that human-driver safety stays constant, while AV safety keeps improving. It also splits AV fatalities into “unavoidable accidents” and “AV bugs” (note that as AVs improve, some previously-unavoidable accidents will become avoidable).

[Figure 1: fatalities per mile over time – the human-driver rate (black, roughly constant) vs. the AV rate (yellow, improving), with AV fatalities split into "unavoidable accidents" and "AV bugs"]

Where is AV safety right now on that yellow curve? That’s hard to say. In the US, human drivers have about one fatality every 100M miles, so Uber (with one fatality after just ~2M miles) looks pretty bad. However, the best AVs (say Waymo’s) are probably much safer, and constantly improving.

The accident has certainly lowered people’s perception of AV safety. Uber, in particular, may have taken the phrase “Move fast and break things” a bit too literally. In the above post, Brad says:

I suspect it may be a long time — perhaps years — before Uber can restart giving rides to the public in their self-driving cars. It may also slow down the plans of Waymo, Cruise and others to do that this year and next.

But I think many will agree that beyond a certain level of AV safety (assuming it can be demonstrated convincingly), society has a moral duty to deploy AVs (so as to prevent all those deaths between the black and the yellow curves). That “should deploy” date will be different for different domains (e.g. limited areas of Phoenix in good weather, all of Boston in any weather, all of Bangalore in any weather etc.), and I am not going to guess the dates here, but they will come.

There are also, of course, huge commercial incentives for deployment, but let’s stick for now with “what’s the right thing to do”. And it seems that the right thing to do is:

  • Find a reasonable way to determine the correct deployment date
  • Do whatever we can to improve AV safety as quickly as possible (both before and after that date)

That first bullet is tricky, though, for various reasons. Here is one:

There will probably be many fatal AV accidents

Consider a future date (say in the US) when AVs are clearly 10 times safer than human drivers (just how we determine that will be discussed below). At that point, we are probably well into "should deploy" territory. Assume further that by that date a full 10% of driven miles are driven by AVs: this is clearly good, since each of these miles is 10 times safer than a human-driven mile.

But even then, we should expect about one fatal AV accident per day, right? There are about 100 fatal car accidents per day in the US, and 100 * 10% * 10% is 1.
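
To make that arithmetic explicit, here is a minimal back-of-the-envelope sketch in Python (all numbers are the rough assumptions above, not real statistics):

```python
# Rough estimate of daily fatal AV accidents under the assumptions in the text.
daily_fatal_accidents = 100   # approximate fatal car accidents per day in the US
av_mile_share = 0.10          # assumed fraction of miles driven by AVs
av_safety_factor = 10         # AVs assumed 10x safer per mile than human drivers

expected_av_fatal_per_day = daily_fatal_accidents * av_mile_share / av_safety_factor
print(expected_av_fatal_per_day)  # -> 1.0, i.e. about one fatal AV accident per day
```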

Each of these daily accidents will (usually) be less newsworthy than that Uber accident, but will still be scrutinized much more than a human-caused accident. Was it unavoidable? Was it a bug? How well was that scenario verified by the AV manufacturer?

So there is a need for the various stakeholders (the public, lawmakers, regulators, AV manufacturers etc.) to agree on some general framework for handling these accidents (and the whole deployment process). That framework should ensure, among other things, that:

  • Not every accident results in a lengthy, billion-dollar lawsuit
  • Negligent AV manufacturers do get punished
  • Everybody (the public, the press, judges, lawmakers, regulators etc.) has an understandable way to scrutinize the safety of various AVs, both in general and as it relates to a specific accident scenario

This is going to be a non-trivial framework: It will surely have legal and regulatory components. It will probably include ISO-style "process" standards, such as ISO 26262, its SOTIF follow-on, and the expected "SOTIF-for-AVs" follow-on to that. It may contain a formal component and more.

But I think the central component (tying all others together) is going to be a verification system. Let me try to convince you of that:

The need for a well-thought-out verification system

I think this framework should be based on a verification system, which lets you:

  • Define a comprehensive, continuously-updated library of parameterized scenarios
  • Run variations of each scenario many times against the AV-in-question, using a proper mix of execution platforms (such as simulation, test tracks etc.)
  • Evaluate the aggregate of all these scenarios / runs (and any requested subset of it), to transparently understand what was verified (this is called “coverage”) and what “grade” it got

Such a coverage-driven verification system (enhanced by ML-based techniques) is probably our best bet. It will also be a crucial component for improving safety as quickly as possible (see this post for more details about coverage driven verification).
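
To make the "parameterized scenario" idea a bit more concrete, here is a minimal sketch in Python; the scenario name, parameters and ranges are illustrative assumptions, not part of any real scenario library:

```python
from dataclasses import dataclass
import random

# One hypothetical entry in a scenario library: a parameterized scenario
# with ranges the verification system can sample from.
@dataclass
class UnprotectedLeftTurn:
    oncoming_speed_kph: float   # speed of the oncoming vehicle
    visibility_m: float         # visibility distance (fog, rain, night)
    pedestrian_present: bool    # is a pedestrian near the crossing?

def sample_variation(rng: random.Random) -> UnprotectedLeftTurn:
    """Draw one concrete variation of the scenario from its parameter ranges."""
    return UnprotectedLeftTurn(
        oncoming_speed_kph=rng.uniform(20, 90),
        visibility_m=rng.uniform(30, 300),
        pedestrian_present=rng.random() < 0.3,
    )

rng = random.Random(42)
variations = [sample_variation(rng) for _ in range(1000)]  # each is run against the AV
```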

Here are some of the main attributes of such a verification system:

It should be comprehensive: The scenario library should be comprehensive along many dimensions – scenario types and their parameters, road topologies, weather conditions and so on.

A big challenge is making the library comprehensive without making it unwieldy. For instance, a huge (and exponentially growing) spreadsheet / database of test cases will simply not scale: Beyond a certain size it will be neither transparent nor maintainable.

Much of the heavy lifting will have to be done by the Scenario Description Language (SDL) in which the library is written (and by the related tools). That SDL should be constraint-and-coverage-based, and extensible.

In particular, the system should constantly try to mix various scenarios, parameter values, topologies, weather conditions etc. (so as to find new bugs), without the user having to specify, or even think of, every combination. For instance, if that Uber accident was indeed caused, say, by mis-classification of a person wearing dark clothes walking a bike with lots of bags, then we would expect such a system to find this bug (with good probability) without anybody having to think of this exact combination in advance.
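
As a small illustration of that kind of mixing (the attribute lists below are made up for the example), independently sampling a few attributes is enough to surface rare combinations like the one above without anyone enumerating them:

```python
import itertools
import random

# Hypothetical attribute lists; the system mixes them freely, so a combination
# like "dark clothes + walking a bicycle + many bags + unlit street" can come
# up without anyone listing that exact case in advance.
clothing = ["bright", "dark", "reflective"]
activity = ["walking", "walking a bicycle", "standing", "running"]
carrying = ["nothing", "one bag", "many bags"]
lighting = ["daylight", "dusk", "unlit night street"]

rng = random.Random(0)
def random_pedestrian():
    return tuple(rng.choice(values) for values in (clothing, activity, carrying, lighting))

# Over many randomized runs, even rare combinations get exercised with good probability.
seen = {random_pedestrian() for _ in range(5000)}
total = len(list(itertools.product(clothing, activity, carrying, lighting)))
print(f"covered {len(seen)} of {total} combinations")
```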

On the other hand, a user should be able to define any such combination C (however specific or general) using functional coverage definitions. The user should then be able to ask the system: “Show me all instances of C in last week’s runs”, “Show me the instances of C which got the worst grade”, “Show me a graph of how we did on C over the last few releases”, or even “During tomorrow’s runs, tweak the various parameters so as to get many more instances of C”.
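
Here is a rough sketch of what such functional-coverage queries could look like; the run format, field names and grading scale are assumptions made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Run:
    scenario: str
    weather: str
    grade: float   # how well the AV handled this run (higher is better)
    week: int

def is_C(run: Run) -> bool:
    """C = unprotected left turn in any bad weather (an example coverage definition)."""
    return run.scenario == "unprotected_left_turn" and run.weather in {"fog", "rain", "snow"}

runs = [
    Run("unprotected_left_turn", "fog", 0.72, week=12),
    Run("unprotected_left_turn", "clear", 0.95, week=12),
    Run("highway_merge", "rain", 0.88, week=11),
]

# "Show me all instances of C in last week's runs"
last_week_C = [r for r in runs if is_C(r) and r.week == 12]

# "Show me the instances of C which got the worst grade"
worst_C = sorted((r for r in runs if is_C(r)), key=lambda r: r.grade)[:5]
```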

It should be continuously-improving: Even with the best verification system, it is impossible to consider everything ahead of time. As I described in The Tesla crash, Tsunamis and spec errors, some scenarios (like what an AV should do when it encounters a Tsunami) may only occur to people at a later stage.

We may realize (e.g. following a specific accident) that we are missing some scenarios, or that our coverage “mesh” is not tight enough, or that our grading function is incomplete. We may also want to automatically extract scenarios from recordings (of dashcams, static cameras, AV recordings etc.) – see “Extracting scenarios from recordings” in this post for some of the considerations.

Thus there will be a need to continuously update / enhance that scenario library, in a collaborative and safe way. If and when some form of an open, standard scenario library emerges (complete with coverage and grading definitions), there will be, e.g., the 2021 standard, the more comprehensive 2023 standard, and so on.

It should run on many execution platforms: It was already becoming clear (as I reported here) that most verification should be done in simulation, because simulation lets you explore many more scenarios (including ones nobody thought to consider) in a scalable way. The Uber accident will probably accelerate this trend, because simulations let you do many dangerous things without the risk of causing damage or actually killing people.

However, verification should also be performed using other execution platforms (such as vehicle-in-the-loop, automated test tracks, street driving and so on). These execution platforms, and their sub-configurations (e.g. see this post), have various tradeoffs (e.g. realism vs. cost vs. speed), and trustworthy verification should involve judicious usage of several of them. Ideally, you should be able to use the same scenario definitions for driving / measuring scenarios on all these platforms, as sketched below.
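
One way to picture "same scenario definitions on all platforms" is a thin common interface that each execution platform implements; the interface and platform classes below are a hypothetical sketch, not an existing API:

```python
from abc import ABC, abstractmethod

class ExecutionPlatform(ABC):
    @abstractmethod
    def run(self, scenario: dict) -> dict:
        """Execute one scenario variation and return its measurements (coverage, grade, ...)."""

class Simulator(ExecutionPlatform):
    def run(self, scenario: dict) -> dict:
        # Fast and cheap, lower physical realism; placeholder result for the sketch.
        return {"platform": "simulation", "grade": 0.90}

class TestTrack(ExecutionPlatform):
    def run(self, scenario: dict) -> dict:
        # Slower and costlier, higher realism; placeholder result for the sketch.
        return {"platform": "test_track", "grade": 0.85}

scenario = {"name": "unprotected_left_turn", "visibility_m": 60}
results = [platform.run(scenario) for platform in (Simulator(), TestTrack())]
```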

It should be transparent: Transparency is key here – many stakeholders need to see how well various scenarios were verified, without having to go into deep technical discussions (of SW, ML etc.).

For instance, the public would like to check that a specific accident scenario (say an unprotected left turn during fog) was indeed verified for that AV. People may also want to see how related areas were verified (e.g. unprotected left turns in any bad weather, or other challenging driving scenarios during fog).

The ability to scrutinize this using commonly-understood terms will be especially welcome for accidents which do go into litigation.

Regulators will need a common, clear way for specifying and tracking verification compliance, using terminology they can relate to (and perhaps dictate). Note that the verification system should also be able to give a rough estimate of the projected fatalities-per-mile under various conditions, though this is pretty hard to do.
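
One reason such projections are hard: with so few observed events, any fatalities-per-mile estimate comes with a huge uncertainty band. A toy sketch (the counts are hypothetical, and the interval bounds are the standard approximate 95% Poisson limits for a single observed event):

```python
# Toy illustration: estimating a fatality rate from very few events.
fatalities = 1
miles_driven = 2_000_000

point_estimate = fatalities / miles_driven
# Approximate 95% confidence limits on the expected count when 1 event is observed.
count_low, count_high = 0.025, 5.57

print(f"point estimate: {point_estimate:.2e} fatalities/mile")
print(f"95% interval:   {count_low / miles_driven:.2e} to {count_high / miles_driven:.2e}")
```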

Note that regulators and lawmakers (at least in the US) stress the need to evaluate safety (rather than technology), saying things like:

We’re looking at ways to evaluate outcomes. Instead of a regulation that says, ‘Machine must have A, B and C in a vehicle’, we hope to look at how safe a vehicle is at the other end.

Finally, the AV providers themselves will probably welcome a clear, transparent set of verification requirements: As somebody who is familiar with several OEMs recently told me, once they get clear requirements, they’ll make sure to exceed them by 20% – they just don’t have them yet.

To summarize: I hope I have been able to suggest why AV deployment should continue despite future accidents, and why a well-thought-out verification system could really help. Note that Foretellix is working on such a system, but this is something that no single company or organization can do all by itself.

Notes

I’d like to thank Ziv Binyamini (who BTW just joined Foretellix as CEO), Gil Amid, Sandy Hefftz, Moshe Gavrielov, Brad Templeton, Thomas (Blake) French, Yaron Kashai, Amiram Yehudai, Ohad Schwarzberg, Sankalpo Ghose and Kerstin Eder for commenting on earlier drafts of this post.

A note about the Foretellix blog: This will remain an ideas / technical blog. For updates about my company, Foretellix (team, advisory board etc.) please see the Foretellix web site.

This post leaves a lot of open questions, some of which will be discussed in future posts. Comments are very welcome.


6 thoughts on "The Uber accident and the bigger picture"

  1. Great article! I agree that maintaining a huge library of specific test cases might be impractical; however, the coverage metric should probably be described by specific test cases, to allow consistent testing during the release cycle of mature systems.

    1. Thanks.

      I think that usually, a coverage “group” should be associated with a specific scenario (and its parameters, possible outcomes etc.). A specific test may end up hitting many of these groups (though certainly there will be tests whose main purpose in life will be hitting specific coverage groups).

  2. There's still one thing which I miss in the discussion about AVs, and that's the motorbike. I once wrote, somewhere around 2008, that the motorbike would be forbidden by the authorities in the future. Am I right?

  3. Last week I read in a Dutch newspaper that they need to roll out a 5G network in The Netherlands because of the AV. So in the near future you'll have small antennas everywhere along the street, at bus stops, on street lights etc. So besides radiation everywhere (I also work in the medical scientific field and I know of some very critical articles about EMR that were forbidden to be published), there's also a need for permanently functioning 5G antennas – because what if the antennas fail for whatever reason? Do we get a very big traffic gridlock then? And what if I want to go on holiday in Kazakhstan, like I'm doing this summer holiday? OK, I'll go on the motorbike, but what if I would go with an AV in the future and there's no AV 5G system across the border in Ukraine and Russia etc.? What also bothers me is the constant internet connection of an AV. That means that anyone who's an enemy of the government can be killed in his AV if they want to. I've for instance just published a very critical article about automotive software bugs on a Belgian blog called Doorbraak.be, because not one single newspaper/radio/tv outlet in The Netherlands wanted, or should I say dared, to publish it, because The Netherlands is a model country for AVs – at least that is what our government wants! I know they hate me for this, and my second article will be even more critical. With all people driving in AVs a government is able to control everything – or am I talking bullshit now…

  4. Well written article Yoav, but in my opinion the foundations are flawed, not just for this but for all AV-think. Firstly, the "moral obligation" to implement something just because it makes humans safer: there is no such obligation, and human engineers and law-makers are typically very quick to implement things that have immediate and measurable negative effects on human safety. For example making weapons, starting wars, cutting healthcare and social welfare, failing to develop (for economic reasons) vaccines for third-world diseases like Ebola. Engineers and scientists are often complicit in these things, despite clear moral obligations to the contrary. If this moral obligation really existed, universally, then we'd be forced to stop any human from taking any preventable risks. Take it to its logical conclusion and you're into the realms of so many sci-fi stories where humans are kept safe from the world by robots doing everything for them, and ultimately the point of being human, mortal, is lost.
    Secondly, AV proponents claim that all that’s needed is a societal change supported by improved laws and technology. It makes me laugh to see how much energy is being put into automotive safety even just for ADAS let alone full-AV. All because apparently a technological “fix” is easier to apply than a societal one where we get people to stop taking stupid risks like using the phone while driving, speeding, driving tired, etc. In my view, the change needed to make humans into less-risky drivers is small, yet we can’t manage to make it happen, anywhere in the world. How then can we expect to make even bigger societal changes to support AV dependency?
    One other point: during the transition phase where humans are expected to take over from the autonomous vehicle in difficult conditions, the less time humans spend actually driving, the more difficult it'll be for them to take over from a machine when needed, either because of the lack of context (reading the news, then looking up to see the car has asked you to take over due to a sudden rain storm) or lack of practice (perhaps reversing a big trailer down a crowded street and round a corner through a tight gateway). Thus we could actually see a spike in accidents during the transition phase, which would only tail off once AVs can be fully, as opposed to mostly, autonomous.
    Personally I hope I’m still permitted to do my own driving for the next 30-odd years…

    1. Clear reaction Steve. I especially liked "Engineers and scientists are often complicit in these things, despite clear moral obligations to the contrary." I've been working in genetics for over 20 years, since they started sequencing the complete human genome. During this process they really thought that people wanted to know if they had a certain genetic disease. But most people don't want to know this, because they want to enjoy their life instead of knowing that they will certainly die. Things are often not as simple as engineers/scientists and also politicians want us to think. I also hope to drive my Renault Twizy and all the test cars – I'm testing cars besides my job, http://www.autotesten.nl – for the next 20 years (not 30, because I'm a little bit older than you, I think).
