Note that while HVC is about “all things verification”, much of the audience comes from the chip verification world. Also, the conference historically leans towards the formal. This was still the case this year (with some exceptions, such as the special session on Intelligent Autonomous Systems – see below). Machine Learning – the new “hot tech” – also had a strong presence.
My presentation: Verifying Intelligent Autonomous Systems
The special session consisted of my presentation, followed by Shahank Pathak’s presentation about Probabilistic Model Checking (PMC – more on this below). As I said in a previous post
I consider the conference to be slightly too formal-heavy, so to balance that I plan to provide mainly light entertainment, gossip and sweeping generalizations about the current state of this increasingly-important field.
- Related posts in my blog: AV testing, Tesla crash, mostly-autonomous systems, spec bugs, Checking probabilistic components, fault trees, ML verification, explainable AI
- Other related blogs: Brad Templeton’s blog, Philip Koopman’s blog
Regular readers of this blog will probably not find much new in the presentation, except for that PMC angle:
Probabilistic Model Checking (PMC) and all that
I talked before (see links above) about how hard it is to compute the probability of failures of heterogeneous systems, and why fault trees, while important, have lots of issues.
Well, for the special case where we can assume that the system behaves like a Markov chain, there is a pretty reasonable solution: PMC. This special case is important enough to warrant a few more words:
A Markov chain, as you guys may remember, is simply a state machine with probabilities assigned to transitions. This does not sound all that exciting, but it turns out that many systems behave like that, for instance systems based on reinforcement learning (RL).
Most RL systems assume that the rest-of-the-world (the thing being controlled by the RL system) can be modeled as a Markov Decision Process (MDP) or even a Partially Observable Markov Decision Process (POMDP). POMDPs are not for the faint of heart: Not only are you unsure about the result of executing a specific action in a specific state, but you are never quite sure which state you are in. Still, this anxiety-inducing world turns out to be a good approximation for the world we live in, so RL systems are increasingly popular.
Training an RL system amounts to deciding on a “policy” (i.e. which action to choose at any point, so as to best push the MDP to “rewarding” states). Combining the resulting policy with the MDP gets us a Markov chain, and thus makes PMC possible.
This is fortunate, because RL-based systems are doing more and more things (from playing Go to controlling robots). Essentially, PMC does for Markov chains what normal model checking does for state machines. Consider the following slide from my HVC presentation:
If the light-blue box can indeed be modeled as a Markov chain, one can use PMC to get the exact probability that some “bad sequence” will happen.
Shashank’s presentation explained all that (see related paper), and described another important area where PMC is helpful (belief space planning).
Shashank and myself agree, however, that even in the above example PMC cannot be the whole answer, because:
- This is just one module (the full system may have, say, 100M code lines, some of which may indirectly influence the ML-based system)
- Training set may be deficient (e.g. missing rare, high-impact things like black ice)
- Other issues
Thus one still needs something like CDV for full-system verification. Still, PMC seems to be pretty useful, and it would be interesting to see how it could be better combined with fault trees and with CDV.
Shield synthesis and ML safety
How do we make sure that an ML-based system (trained, say, via RL) behaves safely, in the sense that it does not do dangerous things? I am talking here about the design question, not about the related verification question.
One way, as hinted above, is to make sure the training set includes all the rare-but-dangerous things we are trying to guard against. But this has problems: In your effort to go through all the things that could go wrong, you could skew the distribution of the training set. Also, defining the “right” reward signal (which also penalizes dangerous stuff) is tricky.
I mentioned here an interview with a Tesla director who said about ML-based systems (paraphrased):
Because there is such a long tail of corner cases, you can’t just “turn the crank” on the ML algorithm to fix them. Rather, those corner cases require “specific tuning of specific algorithms”.
So Tesla, apparently, adds that final “safety” touch via hand-tuning some SW – perhaps by “fixing” dangerous outputs of the ML system.
Well, there is at least one more way, which Bettina Könighofer from Graz University of Technology (Austria) presented (an older related paper is here). What they do is called “shield synthesis”, and while it is currently not ML-specific, here is how it would work for ML:
- Write the safety assertions in some suitable temporal language
- Synthesize those assertions into a SW module (the “shield”) to which you feed both the inputs and the outputs of the ML-based system
- If the shield observes that the output is safe, it lets it through, otherwise it blocks it and supplies its own, safe output
- Alternatively, you can put the shield before the ML system, allowing the ML system to choose only from safe outputs
- In any case, the resulting output is fed back into the training process of the ML-based system: This is important for better training-in-the-face-of-safety, and for making sure that subsequent, safe outputs will “flow smoothly”
This is hopefully somewhat better than the Tesla approach (as I imagined it above), since the SW module is automatically generated from the safety assertions. Note, however, that the fact that the module is “correct by construction” does not mean that you do not have to verify it, hopefully via CDV (see a discussion of that topic here).
Here are a bunch of other things which caught my attention:
Is formal growing, relatively? Formal Verification people always say “FV is finally taking over”, so I learned to tune that out (see also this post). However, in this HVC I heard this more, so maybe it is growing (in the HW module verification domain).
Creating a formal testbench: One group who said “FV is growing” was some (normally trustworthy) folks from Mellanox. They also explained how it is very easy to create formal-ready module-level testbenches: You simply write the testbench in RTL (the format used for describing the HW itself), and then you express any non-determinism via simple a simple “if” statement. For instance, if both behavior X and Y are legal in some situation, you write “if my_signal then X else Y”, and (crucially) you do not assign anything to my_signal – this lets the formal engine set it to any value so as to try and contradict the various assertions.
In hindsight, this is obviously a good way to write such testbenches – it just never occurred to me before.
Using ML to discover specifications: People often don’t write specs, and yet it is hard to test functions without specs. One way around that, discussed by Swarat Chaudhuri of Rice U in a talk called “Guiding formal methods with discovered knowledge” (which contained much else) uses an ML-based approach to discover the spec.
This is not as crazy as it sounds. Essentially, they mine lots of open-source repositories for functions, then create (symbolic) traces from them, and finally try to extract specs from those traces (e.g. “One should always open a file before writing to it”). This is all done probabilistically using ML: Essentially, they learn the “joint distribution” of functions, traces and specs. Then, given a new function, they see if its traces match the (probable) spec.
In addition, there were several presentations about automatic debugging, and some of them talked about combining logical (e.g. SAT-based) techniques with ML-based techniques (used for instance to cluster similar failures or to help debug via anomaly detection).
It seems that a common theme of this HVC (and several of my posts lately, come to think of it) is that important-but-uneasy interface between the hard, logical stuff and the softer, probabilistic stuff.
I’d like to thank Shashank Pathak, Bettina Könighofer and Amiram Yehudai for commenting on earlier drafts of this post.