One-shot imitation learning and verification

Summary: This post will talk about “One-shot imitation learning” (a new and exciting direction in Machine Learning), and how that direction could help coverage maximization (which is important for verification). It will then speculate about the general role of ML in Intelligent Autonomous Systems verification.

Note: You may have heard already about one-shot imitation learning (described briefly below), but “how does this relate to verification” was probably not your first thought. However, followers of this blog are probably not surprised: By now they are expecting “X and verification” for every X (coming soon: “One-track-minds and verification”).

One-shot imitation learning

This was fairly big news in ML lately: People at OpenAI were able to create a system which, after being given a single demonstration of a task, was then able to repeat that task even with different initial conditions. Specifically, this was done for the “task family” of building towers from blocks: The demonstration showed how to pick blocks from a tray and build some configuration of towers, and then the (robotic) system was able to build the same configuration of towers, regardless of the initial arrangement of the blocks in the tray. Here is their blog post and paper.

But wait (you say) – they must have done a lot of training to just get the system to that “single demonstration” phase, right? Right. But just how they trained it is really interesting. Here is the simple version (I’ll explain later where I lied):

Consider good old supervised learning: You train it with many input-output pairs (e.g. the input is an image which may or may not contain a cat, and the output is the corresponding cat/no-cat label), and eventually it gets good at producing the right cat/no-cat output for any image.

The block-stacking system was (more-or-less) trained in a similar way, except that the input was “The demonstration + the image of the current state of the tray”, and the output was “the action to do next”.

Specifically, they trained the system using 140 tower-building tasks, each with about 1000 different “trajectories”, but they chopped the trajectories such that the system learned to answer the question “If I was given demonstration X, and the current tray arrangement (including the blocks I already stacked) is Y, what is the next action Z that I should perform?”. During “deployment”, the robotic system asked that question after every action (e.g. moving the hand), and behaved according to the answer. Note that this works even if the robotic system makes some small mistakes (e.g. sometimes drops a block), because the next answer will take into account the new (unplanned-for) image of the tray.
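To make the "chop the trajectories, then ask after every action" idea concrete, here is a minimal sketch. It is not the paper's implementation: every name in it (`chop`, `train`, `deploy`, `make_flaky_step`) is hypothetical, the demonstration is reduced to a string label, the tray image is reduced to a block count, and a lookup table stands in for the trained network.

```python
from typing import Callable, Dict, List, Tuple

# All names below are hypothetical; the lookup table stands in for the
# trained network, which in the paper is a neural net, not a dict.
Demo = str      # stand-in for a full demonstration
State = int     # stand-in for the tray image: number of blocks stacked
Action = str
Sample = Tuple[Tuple[Demo, State], Action]


def chop(demo: Demo, trajectory: List[Tuple[State, Action]]) -> List[Sample]:
    """Chop one trajectory into ((demonstration, state) -> next action) samples."""
    return [((demo, state), action) for state, action in trajectory]


def train(samples: List[Sample]) -> Dict[Tuple[Demo, State], Action]:
    """'Training' here is just memorising the samples."""
    return dict(samples)


def deploy(policy, demo, state, step, max_steps=20) -> List[Action]:
    """Ask 'what next?' after every action, using the state actually observed."""
    taken = []
    for _ in range(max_steps):
        action = policy[(demo, state)]
        taken.append(action)
        if action == "stop":
            break
        state = step(state, action)
    return taken


def make_flaky_step() -> Callable[[State, Action], State]:
    """A toy world where the first attempt to stack the second block fails."""
    dropped = False

    def step(state, action):
        nonlocal dropped
        if state == 1 and not dropped:
            dropped = True
            return state  # block dropped: the tray looks the same as before
        return state + 1

    return step


policy = train(chop("tower_of_2", [(0, "stack"), (1, "stack"), (2, "stop")]))
actions = deploy(policy, "tower_of_2", 0, make_flaky_step())
print(actions)  # ['stack', 'stack', 'stack', 'stop']
```

The extra "stack" in the output is the recovery from the dropped block: because the policy is queried on the observed state rather than on a plan, the mistake just costs one more step.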

Note: If you read the full paper, you’ll realize that I over-simplified / lied:

  • They actually trained three separate, interacting networks
  • They used techniques like LSTM (because demonstrations are long and can vary in size)
  • They used “soft attention” (to help the system learn what’s important at the current stage)
  • They were able to do simulation-only training of the vision system – this was described in a previous paper about domain randomization

But let’s ignore those fiddly details for now, and just call all this “advanced supervised learning”.

Why do I care

This whole thing made me think about the question “how far can you push supervised learning”, and specifically “what kinds of input-to-output mappings are possible”.

Consider how this is different from Reinforcement Learning, or even Inverse Reinforcement Learning (which I described at the end of this post, and which is also a kind of “learn by demonstration”). In RL/IRL, we train a system on one task, so that it can map from observation (e.g. image) to action. In one-shot imitation learning, we train a system on a (hopefully large) task family, so it can map from (task demonstration + observation) to action. So we have gone somewhat “meta”.

One of the keys to the success of one-shot imitation learning is soft attention: The system learns “what to pay attention to” in the current observation. For instance, in the above paper (where an observation is the image from the robot’s camera), we can sort-of see in fig. 13a that the system learned to mainly look at the previous (already-placed) block and the current (to-be-placed) block.
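For readers who have not met soft attention before, here is a minimal sketch of the generic dot-product flavor. This is an assumption for illustration only: the paper's attention mechanism is more elaborate, and the names `softmax` and `soft_attention` are mine. The point is just that the system computes a weight per item (the weights sum to 1) and then works with the weighted average, so "what to look at" is a smooth, learnable quantity.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(
    query: List[float], keys: List[List[float]], values: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Weight each value by how well its key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Three "blocks"; the query matches the first one best.
weights, context = soft_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
    values=[[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]],
)
print(weights)  # largest weight on the first block, smallest on the third
```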

Now consider one of my favorite (mostly-unsolved) problems – Coverage Maximization via Machine Learning (CMML), which I described here (the bigger picture of “why coverage and maximization” is here). Note that we are talking dynamic verification here, not formal stuff.

CMML means using ML for automatically maximizing (filling) coverage in some Design Under Test (DUT). Ideally, you would first perform many non-CMML random runs of your verification environment + DUT, and then see what “coverage points” (events, field-values-sampled-on-events etc.) are still missing / rare, and let CMML hunt for them.
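The "see what is still missing / rare" step can be sketched as follows. This is a toy under stated assumptions: each run is represented simply as the list of coverage-point names it hit, and `coverage_holes`, `rare_threshold` and the point names are all made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def coverage_holes(
    runs: Iterable[List[str]], all_points: Set[str], rare_threshold: int = 2
) -> Tuple[List[str], List[str]]:
    """Return (missing, rare) coverage points after a batch of random runs.

    A point is 'missing' if no run hit it, and 'rare' if it was hit by
    fewer than rare_threshold runs.
    """
    # Count each point at most once per run.
    hits = Counter(point for run in runs for point in set(run))
    missing = sorted(p for p in all_points if hits[p] == 0)
    rare = sorted(p for p in all_points if 0 < hits[p] < rare_threshold)
    return missing, rare


runs = [
    ["pkt_small", "pkt_large"],
    ["pkt_small"],
    ["pkt_small", "buf_half_full"],
]
all_points = {"pkt_small", "pkt_large", "buf_half_full", "buf_overflow"}

missing, rare = coverage_holes(runs, all_points)
print(missing)  # ['buf_overflow']
print(rare)     # ['buf_half_full', 'pkt_large']
```

The `missing` and `rare` lists are what a CMML engine would then be asked to hunt for.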

CMML is really hard to do in a general way (see the section “Consider yourself warned” in that bigger-picture post). One issue is that you want CMML to discover good solutions by itself (as in RL), but you also want to learn from those initial, non-CMML runs (because maximization will usually be invoked after engineers already worked hard to bring the system to interesting places).

One-shot imitation learning by itself cannot do CMML, but perhaps we can build CMML by extending some of the ideas behind it. Consider the following (somewhat strained) analogy:

  • The “task family” here is “reaching all the various coverage points in this specific DUT”
  • We train the system on the dynamics of a DUT using many traces of actual runs of that DUT
  • We would like the trained system to map from (desired coverage point + current state) to “next action to perform” (the action may be just “determine the value for the next randomized field”)

We have many traces leading to various coverage points, so we can use them for training. Note that, unlike in the block-stacking example, we usually do not know ahead of time which coverage points will be hit by which trace. But we do know it after the run, and we can always pretend this is what we intended in the first place. Hopefully, during training the system will also use attention mechanisms to learn “what’s important” (e.g. “getting to packet-buffer-overflow is mainly influenced by the rate of incoming packets and their sizes”).
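The "pretend this is what we intended in the first place" trick can be sketched as a relabeling pass over the traces. Again, this is only an illustration under assumptions: a trace is taken to be a list of (state before the action, action, coverage points hit as a result), and `relabel` and the state/action names are hypothetical.

```python
from typing import List, Tuple

# (state before the action, action taken, coverage points hit as a result)
Step = Tuple[str, str, List[str]]
Sample = Tuple[Tuple[str, str], str]  # ((goal point, state), action)


def relabel(trace: List[Step]) -> List[Sample]:
    """Turn one trace into ((coverage point, state) -> action) samples.

    For every coverage point the trace happened to hit, each step up to
    and including the hit is treated as if it had been aimed at that point.
    """
    samples = []
    for t, (_, _, points_hit) in enumerate(trace):
        for point in points_hit:
            for state, action, _ in trace[: t + 1]:
                samples.append(((point, state), action))
    return samples


trace = [
    ("buf_empty", "send_large_pkt", []),
    ("buf_half_full", "send_large_pkt", ["buf_overflow"]),
]
samples = relabel(trace)
for sample in samples:
    print(sample)
```

The resulting samples have the same shape as the (demonstration + observation) to action pairs in the block-stacking case, with the coverage point playing the role of the demonstration.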

This does not look like an easy project, and many questions remain:

  • How general can a “task family” be? Clearly, CMML is no good if it can only reach previously-reached coverage points. Perhaps (at least initially) it will only be able to generalize over reaching similar/related coverage points
  • Can we come up with a general architecture which can be trained on any DUT? A related question is: What if you took the architecture in the paper, which was able to learn about stacking-blocks-into-towers, and trained it instead on drawing-little-pictures-with-blocks? Will it work? What will it take to make it work?
  • How do we add the other needed features? E.g. we’d like the system to tell us that some coverage points cannot easily be reached from the current observation.

So this won’t happen tomorrow. Nevertheless, one-shot imitation learning made me more bullish on CMML.

Which brings me to:

How important will ML be in autonomous systems verification?

The more I look into the verification of AVs and other Intelligent Autonomous Systems (and the more I hear about new ML techniques), the more I see opportunities for using ML in IAS verification. Perhaps ML-based techniques will one day be as essential for advanced IAS verification as constraint solvers are today for HW verification.

This is not a sure thing. Let me first acknowledge that constrained-random Coverage Driven Verification (CDV) is not all that common yet in IAS verification (though it is starting). Still, I can see a path where it grows hand-in-hand with ML (unlike what happened in HW verification).

Here is what I mean: In HW verification, CDV got invented first, and only lately did people start thinking about improving coverage by slapping CMML on top of that. The basic mechanism for “executing” the constrained-random stuff had nothing to do with ML: It was either manually-written “sequences” (as in UVM), or some sort of “planning” (as in some newer, use-case-based tools).

But IAS verification may play out differently, and ML may get into the picture much earlier, because:

  • Much of the behavior is not completely predictable (due to the huge complexity of HW, SW, physics, human behavior, the environment etc.), so you need a mechanism which “executes” your constrained-random abstract scenarios using some sort of probabilistic planning, and ML techniques excel in that
  • Deep Neural Networks are based on “differentiable” computations, so it is much easier to “nudge” them in any direction you want. More on this in a subsequent post
  • Lower-level fine movement control (“accelerate this car such that it almost, but not quite, bumps into our AV”) is something ML should be good at
  • Even monitoring (for checking and coverage collection) may need some ML. For instance, in a test-track setting, you probably need an ML-based sensor fusion module to tell the verification environment what happened (based on test track cameras/sensors)
  • Finally, AV/IAS engineers are already immersed in ML, so they are liable to grab that hammer

Thus, I can (vaguely, hesitantly) see a future where “execution/planning” and “ML-based maximization”, so clearly separate in HW verification, will merge into one in IAS verification.

This does not mean that IAS verification can skip CDV and somehow “just do it with ML”: Regulatory bodies (and common sense) will probably demand that verifiers present what they did as an organized list of “scenarios” at various levels, each with its agreed-upon, extensive set of filled coverage points and executed checks.

Doing this at scale calls for constrained-random CDV. If you want to dig deeper, see here (about using coverage), here (about probabilistic checking) and here (about why even verifying ML-based systems needs CDV).

All I am suggesting here is that ML may end up being an increasingly important tool in the CDV toolkit – perhaps viewed as the next logical step, complementing constrained-random generation. And if that happens, we’ll start seeing people adapting ML techniques to CDV’s special needs. For instance, we needed to “fix” constraint / SAT solvers so that they’ll give us many random answers rather than just the “best” answer: We’ll probably need to “fix” ML planners in the same way.
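To illustrate "many random answers rather than just the best answer", here is a deliberately naive sketch that uses rejection sampling: draw random candidates and keep the ones that satisfy the constraint. Real constraint solvers do not work this way (they are far smarter about it); `random_solutions`, the field names and the constraint are all made up for the example.

```python
import random
from typing import Callable, Dict, List, Sequence

Candidate = Dict[str, int]


def random_solutions(
    constraint: Callable[[Candidate], bool],
    domains: Dict[str, Sequence[int]],
    n: int,
    seed: int = 0,
    max_tries: int = 10_000,
) -> List[Candidate]:
    """Return up to n random assignments that satisfy the constraint."""
    rng = random.Random(seed)  # seeded, so a failing test can be reproduced
    found: List[Candidate] = []
    for _ in range(max_tries):
        if len(found) == n:
            break
        candidate = {name: rng.choice(values) for name, values in domains.items()}
        if constraint(candidate):
            found.append(candidate)
    return found


def heavy_traffic(c: Candidate) -> bool:
    """Hypothetical constraint: packet size times rate must exceed 300."""
    return c["size"] * c["rate"] > 300


domains = {"size": range(1, 65), "rate": range(1, 11)}
solutions = random_solutions(heavy_traffic, domains, n=5)
for s in solutions:
    print(s)
```

An optimizer asked the same question would return a single assignment; for verification we want the spread, because each legal-but-different answer is another test.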


I’d like to thank Yael Feldman, Shlomi Uziel, Sandeep Desai, Kerstin Eder and Gil Amid for commenting on earlier drafts of this post.

Unrelated to all this, I plan to attend the Stuttgart autonomous vehicles test & development symposium on 20..22 of June (see my posts about the 2015 and 2016 installments of that event). Should be interesting – there is even somebody (from Five AI) talking about constrained-random verification of AVs. I promise to post a summary, and if you are there and want to chat, drop me a line to yoav.hollander at

2 thoughts on “One-shot imitation learning and verification”

  1. Hi Yoav, have you seen DeepXplore?
    It basically does what you call CDV by defining neuron coverage, i.e. that neurons need to be activated above a certain threshold. It also leverages what you describe in your text as “differentiable” computations. Instead of optimizing weights as in training, it uses the same mechanism to find more interesting inputs.
    It is based on differential testing, i.e. does not have some test oracle, but rather uses different versions of the same network to find inconsistencies in the decision boundaries. The results are quite interesting and include the end2end network for autonomous driving by Nvidia.
