Whatever you think of my Stuttgart symposium trip report, at least I made the effort to attend it and report back. In this post I sink even lower, reporting from a conference I did not attend – the ICRA workshop on verification of autonomous systems.
I did, however, watch the full 4.5-hour video (warning: voice quality varies) and went through the summaries and several of the presentations (links are on the workshop page).
What I found most interesting about this workshop were the discussions: they were mostly unscripted, and gave a good overview of people’s thoughts and concerns. So here are my impressions:
These are early days
One thing that was fairly clear is how early in the process we are: even the best experts are struggling to describe the problems, and none seem to have a full, coherent picture of the solutions. These are early days.
Also, they all assumed there will always be a lot of uncertainty about everything: about whether the robot will be able to observe its surroundings correctly, about whether it will actually be able to carry out its own higher-level decisions, about whether we can even think of and encode all the functional / moral rules, and so on.
I was glad to see both formal and dynamic verification discussed, though formal verification got a somewhat bigger stage. This is often the case when a lot of the people involved come from academia (in fact, this workshop was better than most in this regard). I always worry that listeners will be left with the feeling that dynamic verification is also good, but clearly inferior and unlikely to solve the real problems. In my opinion (as I described here), the opposite is probably closer to the truth.
I suspect some of the participants still had wrong assumptions about dynamic verification. One hint was that (towards the end) somebody talked about formal verification vs. “statistical testing”. While it is true that dynamic verification uses randomness and some statistical methods, people who say “statistical testing” usually mean something like “Monte-Carlo testing”, which (as I explained here) is not what we normally do in CDV (coverage-driven verification), the most successful methodology for dynamic verification.
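To make that distinction concrete, here is a minimal Python sketch of the difference, as I see it, between pure Monte-Carlo sampling and coverage-driven generation. The scenario parameters and coverage bins are invented for illustration, and no real CDV tool works exactly like this:

```python
# Minimal sketch (not any particular tool): pure Monte-Carlo sampling vs.
# coverage-driven generation of test scenarios. The parameters and bins
# below are made up for illustration.
import random

WEATHER = ["dry", "wet", "icy"]
SPEED_BINS = ["slow", "medium", "fast"]

def random_scenario():
    # Pure Monte-Carlo: sample uniformly and hope the interesting cases show up.
    return (random.choice(WEATHER), random.choice(SPEED_BINS))

def cdv_scenario(coverage_hits):
    # Coverage-driven: bias generation toward (weather, speed) bins
    # that have not been exercised yet.
    unhit = [(w, s) for w in WEATHER for s in SPEED_BINS
             if (w, s) not in coverage_hits]
    return random.choice(unhit) if unhit else random_scenario()

hits = set()
for _ in range(9):
    scenario = cdv_scenario(hits)
    hits.add(scenario)      # in a real flow, coverage is collected from the run
print(sorted(hits))         # all 9 bins covered after 9 runs
```

The point of the sketch is only that coverage feedback steers generation towards the cases you have not yet seen, which is quite different from blind random sampling.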
The licensing model will probably not work
There was an interesting discussion towards the end of the first session about the “licensing model”, in which (say) an autonomous vehicle would have to pass something like a (human) driving test in order to get a license. A comment was made that this is clearly not enough because humans have all kinds of other background knowledge and norms (e.g. “don’t run over a baby on the road”) which a machine may not have, and which are not checked during a driving test.
This discussion (and similar ones) just reinforced my feeling that at least part of the solution should be a huge set of test scenarios which the AV (or some simulation thereof) has to pass before it is considered “safe enough”. I have discussed this here.
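As a rough illustration of what such a scenario set might look like, here is a toy Python sketch. The scenario names, checks and the simulate() stand-in are all made up; a real catalog would contain many thousands of parameterized scenarios:

```python
# Toy sketch of the "huge set of test scenarios" idea: a catalog of named
# scenarios, each with its own pass criterion, which a candidate AV (here a
# stand-in simulate() function) must clear. Names and thresholds are invented.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    check: Callable[[dict], bool]   # pass criterion on the simulation result

def simulate(av, scenario_name):
    # Placeholder for running the AV (or a model of it) through the scenario.
    return {"collision": False, "min_gap_m": 2.4}

CATALOG = [
    Scenario("pedestrian_steps_out", lambda r: not r["collision"]),
    Scenario("cut_in_on_wet_road",   lambda r: r["min_gap_m"] > 1.0),
]

def license_test(av):
    results = {s.name: s.check(simulate(av, s.name)) for s in CATALOG}
    return all(results.values()), results

ok, detail = license_test(av=None)   # the AV itself is just a stand-in here
print(ok, detail)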
Verifying just the top-level “rational agent”
Michael Fisher of Liverpool U made the point that it is best to split the autonomous system’s logic into a lower-level control system (implemented e.g. using Machine Learning), controlled by a higher-level rational agent (implemented in some readable / rule-based language, e.g. BDI). I previously described this approach in the chapter “Using ML for just the lower level” of this post.
He expanded on this in the workshop. One point he made which was new to me (and which makes sense) is that the higher level replaces the human decision-making, which is the part usually held accountable in a trial. Thus, this part should remain rule-based and transparent (and ideally formally verifiable). This explains his insistence on verifying the intent of the driving agent, even if that intent is not always realized: the real world is non-deterministic and full of failures (e.g. you may not be able to brake on a wet road), but at least we need to verify that the agent intends to do the right thing (e.g. brake).
I think this intent verification is a reasonable thing to do, but of course you also need end-to-end verification (whether formal or dynamic). Also, even if we assume that the higher level is always encoded in some “logical” language (e.g. BDI), it may become too complex to formally verify. And even if we could prove various assertions about such a big, complex system, we would also need to go through many scenarios and check them using scenario-specific checks (though that, perhaps, could be done formally, at least in theory).
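Here is a toy Python sketch of that two-layer split and of what “verifying the intent” could mean. The rules, perception fields and controller behavior are invented; a real BDI agent and ML controller would of course be far richer:

```python
# Minimal sketch of the two-layer split: a small rule-based "rational agent"
# decides intent, and a lower-level controller (stand-in for ML / control code)
# tries to realize it. The point is that the intent layer is simple enough to
# check exhaustively, even though execution may still fail.

def rational_agent(perception):
    # Transparent, rule-based intent selection (the part held accountable).
    if perception["obstacle_ahead"]:
        return "brake"
    if perception["lane_blocked"]:
        return "change_lane"
    return "keep_lane"

def low_level_controller(intent, road):
    # Stand-in for the ML / control layer: may fail to realize the intent.
    if intent == "brake" and road == "icy":
        return "skid"            # the real world is non-deterministic
    return intent

# "Intent verification": enumerate all perceptions and check the agent always
# intends the safe action, regardless of whether execution later succeeds.
for obstacle in (True, False):
    for blocked in (True, False):
        intent = rational_agent({"obstacle_ahead": obstacle,
                                 "lane_blocked": blocked})
        assert not obstacle or intent == "brake"

print(low_level_controller("brake", road="icy"))  # "skid": right intent, failed execution
```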
Interestingly, Constance Heitmeyer of the Naval Research Labs commented that they usually employ formal verification for the controllers, i.e. precisely the part which Michael left “below the line”. This is (probably) because those controllers are constructed from low-level C code (and not from ML; ML is really hard to formally verify).
This, of course, does not contradict Michael’s position: one could formally verify both, and in fact one could imagine a world where a design consists of multiple pieces, each of which is formally verified, and the whole thing is then further verified hierarchically using some assume-guarantee framework. I am just dubious that this will work in practice (especially for systems which also contain ML, mechanics, humans and so on).
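For what it is worth, here is a toy Python sketch of the assume-guarantee idea. The contracts are invented, and a real framework would handle circular dependencies and temporal properties much more carefully:

```python
# Toy sketch of assume-guarantee composition: each piece is verified against
# its own (assumption, guarantee) contract, and the composition is checked by
# making sure every assumption is discharged by another piece's guarantee or
# by the environment. Contracts below are invented; real frameworks treat
# circular reasoning with much more care.
from dataclasses import dataclass

@dataclass
class Contract:
    name: str
    assumes: set        # properties this component relies on
    guarantees: set     # properties it provides (once its assumptions hold)

agent      = Contract("rational_agent", {"sensor_data_fresh"}, {"intent_is_safe"})
controller = Contract("controller",     {"intent_is_safe"},    {"actuation_bounded"})
environment_provides = {"sensor_data_fresh"}

def compose_ok(components, env):
    provided = env | {g for c in components for g in c.guarantees}
    missing = {(c.name, a) for c in components for a in c.assumes
               if a not in provided}
    return not missing, missing

print(compose_ok([agent, controller], environment_provides))   # (True, set())
```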
Notes
I’d like to thank Ziv Binyamini, Amiram Yehudai, Kerstin Eder and Sandeep Desai for reading a previous version of this post.