Formal verification of really-complex systems

Summary: This post describes some new attempts at formal verification of complex systems (mainly AVs and ML-based systems)

In general, I think Formal Verification (FV) will only have a limited role in verifying Intelligent Autonomous Systems (IAS) and ML-based systems, because those systems are so complex and heterogenous.

In FV has better PR than CDV I said that if you only looked at academic papers, you could easily be misled into thinking that most “serious” verification is done formally. In reality, while FV can find high-quality bugs that are very hard to find using other techniques, it has lots of limitations, and thus most bugs (including most serious bugs) are found in other ways.

Nevertheless, I am constantly on the lookout for new places where FV might add real value in complex-systems-verification. And lately I ran into three new (to me) candidates. All three are not “typical” formal verification, and all three have fairly severe limitations, but I think they are still worth a look:

Formal verification for ML: Verifying ML-based systems is really hard for various reasons, as I discussed here and here.

So I never considered FV seriously for that job. I still hold that view, but somewhat less strongly, because the following has come to my attention:

Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks discusses how you take a whole Deep Neural Network (DNN), including the weights of the connections, the ReLU nodes (that thing which sets the output to max(0, input)) and all that, and use an SMT solver to prove assertions about what the whole DNN does.

There are lots of restrictions there, but nevertheless I think this is fairly impressive. Let me start with the example they give:

We evaluated our technique on a prototype deep neural network implementation of the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods.

Simplifying somewhat, the DNN in question takes 7 inputs, and produces an “advisory”: Everything OK, turn left, turn left hard, turn right or turn right hard. The inputs are: Distance to intruder (other aircraft), relative angle of intruder, our speed, their speed, estimated time to “getting close”, and previous advisory.

Given a specific, already-trained DNN, they can then prove assertions like “If the intruder is sufficiently far away, the advisory will always be ‘everything OK’”, and “Even if the previous advisory was ‘turn right’, if there is a nearby intruder the new advisory will be ‘turn left hard’”.

Technically, they combine a SAT solver, a Simplex solver, and special tricks for the ReLU part – hence the name Reluplex (ReLU with Simplex).

I don’t see how this will generalize to verifying something like a sensor fusion module (input: pixels and point clouds, output: objects, directions etc.): This does not seem to match either conceptually or scaling-wise. However, until I read that article, I assumed even the ACAS example could not be formally verified, so I guess I should keep an eye on these techniques.

Hybrid reachability analysis: This is another non-mainstream formal verification technique, which seems to find bugs in the very multi-dimensional, “analog” space of multi-object movement.

For instance, consider the paper APEX: Autonomous Vehicle Plan Verification and Execution (pdf). The authors (from U of Pennsylvania and elsewhere) bring up a lane-change example, and explain that simulations will probably not find the dangerous cases, because there are so many combinations (relative angles and speeds at start, speed changes by the car ahead etc.).

They then describe their solution (note: “ego vehicle” means the vehicle-under-test):

In APEX, the verification engineer can

  • Specify the low-level dynamics of the vehicle, including the trajectory tracker. …
  • Provide a motion planner that takes in a starting position and end position and returns a trajectory that links the two points. The motion planner can be any piece of software: there are no restrictions on it. …
  • Specify a sequence of goal positions (or waypoints) that the vehicle must visit, or a behavioral planner that computes these waypoints in a reactive manner. …
  • Specify the uncertainty sets for the ego vehicle and the other agents in the scenario.
  • Specify the unsafe conditions to be avoided by the vehicle. APEX supports a rich specification language, Metric Interval Temporal Logic (MITL) for the description of unsafe behaviors

APEX will then verify, in an exhaustive fashion, that the ego vehicle can complete the scenario under the specified uncertainty, or return a specific case where it fails. The engineers can then use this counter-example in order to debug the controllers, and better understand how to avoid this failure at design-time.

This is limited in many ways, but is still pretty impressive. I especially liked the fact that it lets you use your own planner (at the lower level, for each trajectory selected by the planner they call the dReach reachability analysis tool).

Probabilistic Model Checking: Under some circumstances (i.e. when the world can be modeled as a Markov chain), this is a pretty-good way of assessing the probability of problems, as I discussed here.

General comments: Note that the above FV techniques use “unconventional” approaches, so as to deal with the continuous, probabilistic nature of these complex systems.

There are also attempts at using “conventional” formal verification for verifying such systems: For instance, as I described here, Michael Fisher of Liverpool U (and others) suggest that the “top level” of an IAS should be a rational agent written as rules. If this suggestion is indeed followed in practice (not sure about that), then that top level can, perhaps, be formally verified (in the “traditional” sense of the word).

Of course, this just verifies the top level: The full system (which includes e.g. an ML-based lower-level) still needs to be verified somehow. Note that the three techniques mentioned above also verify just some aspects of the whole system.

Finally, while still assume that verifying these complex systems will mostly be done non-formally (e.g. via CDV), I do see a growing role for formal techniques in producing interesting test cases.


I’d like to thank Kerstin Eder, Amiram Yehudai, Yaron Kashai, Gil Amid, Benjamin Maytal and Sandeep Desai for commenting on a previous version of this post.

Also, special thanks to Gil Amid and Alon Amid for pointing me at the Hybrid Reachability Analysis and the FV-for-ML papers.

[Edit 13-Mar-2017: Made the “Oscars bug” part into a separate post]

2 thoughts on “Formal verification of really-complex systems

  1. When hardware/alghorithmics verification does all this cool stuff of FV and CDV and software testing still stuck in manual (pun intended) or GUI level automation, where should systems combining both (plus human element) go?
    FAA requires complete separation of flight and mission control systems, with pilot advisory and pilot behavior link both and become liability.
    Is it possible to separate FV/CDV verified fortresses of safety from much lower quality countryside, without compromises?

    1. You bring up a pretty important question. I personally don’t believe in this “fortresses of safety” approach as a good-enough solution. It _does_ make sense to e.g. define protocols, and then to ruthlessly verify them. But this is, in a sense, “just” module-level verification, and clearly one can have terrible bugs in a system built from fully-verified modules (because the system is more than its parts).

      So, I think one must also do full-system (probably dynamic) verification, which includes running all those soft, human-related scenarios and seeing whether (and how often) problems occur.

Leave a Reply