Misc stuff: The verification gap, ML training and more

This post covers recent updates in machine learning, autonomous systems and verification. It has four sections:

  • Automation / ML keep accelerating, but verification of automation / ML seems to lag behind
  • HVC is coming, and I plan to attend (and even present)
  • The idea of training an ML-based system using synthetic inputs (which I like) has just been tried by some folks at U of Michigan
  • Inverse Reinforcement Learning may seem to imply that we can’t do “normal” verification, but I think we still need it

The speed of it all

I know it is a cliché to say that everything is speeding up: You could argue that this has been going on since life started (3.8 billion years ago next Wednesday) – just plot it on a logarithmic scale and you’ll see.

Still, the last few weeks somehow made me realize that all those things that I am interested in verifying (AVs, autonomous systems in general, ML etc.) are moving faster than even most believers of that cliché assumed just a year ago. Here are some examples:

  • The US seems to be officially behind AVs as a mission. Almost all car companies (and many others) now have concrete AV plans, and some are actually deploying experimentally
  • President Obama just gave a thoughtful interview (with MIT Media Lab’s Joi Ito) about AI, ML, the future of employment and all that
  • When the topic of the dangers of AGI comes up (see this post), people often still say “oh, it is still X years away”, but the X keeps shrinking. I hear 20-to-30 years more and more, and that’s pretty soon.
  • The rate of inventions in ML seems to be accelerating. For instance, an article about “Deep Symbolic Reinforcement Learning” just came out (long summary here), which may help bring together the (previously very separate) symbolic-AI and ML fields.

The above (somewhat arbitrary) snapshot just reminded me that those technologies are speeding up, but the verification side of things is not speeding up at the same rate (though everybody says it is important). This should change, and I assume it will.

HVC is coming

On a related note, the Haifa Verification Conference is happening on November 15 to 17 (in Haifa, Israel). It is a pretty decent conference devoted to all-things-verification (I blogged about last year’s conference here).

I plan to be there on the first two days. Towards the end of the first day, I’ll give a presentation about the current state of autonomous systems verification (including verification of ML-based systems). I consider the conference to be slightly too heavy on formal verification, so to balance that I plan to provide mainly light entertainment, gossip and sweeping generalizations about the current state of this increasingly-important field.

I promise to blog about my impressions. And if you are there by any chance, I’d love to chat.

Using a verification environment to train an ML-based system

I mentioned this idea before, e.g. here:

This (somewhat-oddball) direction essentially says: If we already have a verification environment capable of producing rare corner cases, why not use it to actually train the ML-based system (since otherwise rare events will not appear in enough training data).

This assumes a CDV-based VE, capable of creating many constrained-random variants of diverse scenarios, as I described e.g. here. Such an environment can produce an essentially-infinite training set.
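
Here is a minimal sketch of what such a constrained-random scenario stream could look like. The Scenario fields, ranges and function names are invented for illustration; a real CDV-based VE would of course have a much richer scenario model.

```python
import random
from dataclasses import dataclass

@dataclass
class Scenario:
    weather: str
    time_of_day: float        # hours, 0..24
    pedestrian_count: int
    lead_car_speed_kmh: float

def generate_scenario(rng: random.Random) -> Scenario:
    """One constrained-random scenario: every field is random, but only
    within the ranges the verification plan allows."""
    return Scenario(
        weather=rng.choice(["clear", "rain", "fog", "snow"]),
        time_of_day=rng.uniform(0.0, 24.0),
        pedestrian_count=rng.randint(0, 10),
        lead_car_speed_kmh=rng.uniform(0.0, 120.0),
    )

def training_stream(seed: int = 0):
    """Yield scenarios forever -- the 'essentially-infinite' training set."""
    rng = random.Random(seed)
    while True:
        yield generate_scenario(rng)

stream = training_stream(seed=42)
print([next(stream) for _ in range(3)])   # a small batch for one training step
```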

There are some potential problems with this idea, though. As I said here:

It is pretty hard to create realistic, completely-synthetic inputs. This is indeed a big problem: For instance, creating a synthetic, believable LIDAR stream with the matching video stream is pretty hard. So (at least currently) you may need to use pre-recorded streams, and e.g. superimpose people-and-animals-jumping-in. … This is what I think Google is doing. They talk here about doing “three million miles of testing in our simulators every single day”, which (assuming 1:1 simulation speed) translates to a few thousand cores constantly running simulations.

That was said in the context of building a VE for verification, but the same problem (let’s call it “synthetic-multi-streams”) obviously applies to training as well.

Another problem, unique to training-via-synthetic-inputs, is that it may cause overfitting to artifacts of the scenario-generation algorithm or the display engine. For instance, suppose we use this train-via-VE technique just to train the system on extreme and dangerous cases. If the display engine makes the sky too uniformly blue, the system could learn to be extra-careful only when the sky looks like that.
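
One possible mitigation (what the ML literature calls “domain randomization”) is to deliberately randomize the cosmetic parameters of the display engine, so that no single rendering artifact correlates with the dangerous scenarios. A minimal sketch, with parameter names invented for illustration:

```python
import random

def randomize_rendering(rng: random.Random) -> dict:
    """Per-scenario rendering knobs, drawn at random so the trained system
    cannot latch onto, say, a perfectly uniform blue sky."""
    return {
        "sky_color_jitter": rng.uniform(0.0, 0.3),   # perturb sky hue / saturation
        "texture_noise": rng.uniform(0.0, 0.1),      # pixel-level noise
        "lens_blur": rng.choice([0.0, 0.5, 1.0]),    # simulated optics
        "exposure": rng.uniform(0.8, 1.2),
    }

rng = random.Random(7)
print(randomize_rendering(rng))   # different knobs for every generated scenario
```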

Finally, using the exact same VE for both training and verifying a system may carry some risks: for instance, blind spots shared by the training and the verification flows could go undetected.

Still, I think this is a pretty good direction, and the problems will eventually be solved. So I was happy to see a new article by some folks from the U of Michigan talking about a variant of this: They describe training a vehicle-detection ML-system (which maps from video inputs to vehicle-bounding-boxes) using synthetic inputs created by capturing scenes from runs of the Grand Theft Auto computer game.

This is not the full thing I want (e.g. it is not a reuse of a verification environment, they don’t seem to have the ability to control everything, and they explicitly ignore non-visual stuff). Nevertheless, the results seem pretty encouraging.

In general, I am fairly bullish on this idea (and hope to see more research directed at it). Here’s why:

  • As the article says, this gives you a potentially-infinite, very diverse set of inputs to train on. In the context of vehicle detection (assuming a good VE) this would give you cars-in-snow, cars-in-India, cars-on-hilly-road, cars-in-all-directions and so on (and any combination of the above).
  • You can create rare, potentially-dangerous situations, which may be completely absent from the non-synthetic training set. Note that it is probably a good idea to train your system on both non-synthetic and synthetic inputs.
  • The stimulus-generation part of the VE will supply inputs for training. The checker part of the VE may supply “answers” (for supervised learning) or a reward signal (for Reinforcement Learning, see next chapter). The coverage part will help us track that we went through all needed scenarios and their parameters (see the sketch after this list).
  • As CDV-based VEs for autonomous systems improve (as I assume they must), VE-based-training capabilities will also automatically improve. For instance, when that thorny synthetic-multi-streams problem is eventually solved for verification, it can be used for training as well.
  • In the meantime, one can carefully work around some of the problems. For instance, to solve the synthetic-multi-stream problem, one can use a mix of recording and synthetic inputs (as many are doing now), perhaps enhanced with the ability to choose and stitch together recorded segments in interesting ways.
  • Note that the VE in question does not have to be directed specifically at an ML-component: It could be a full-AV VE, and yet you can still use it for training an ML component within the VE.
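
To make the stimulus / checker / coverage bullet above a bit more concrete, here is a minimal structural sketch of how the three VE parts could be reused in a training loop. Everything in it (run_simulation, the Checker and CoverageCollector classes, the model interface) is a hypothetical placeholder, not any specific tool.

```python
import random

class Checker:
    """Stands in for the VE's checker: given the simulated outcome, it
    supplies the "right answer" (a label) for supervised training,
    or a scalar reward for Reinforcement Learning."""
    def label(self, outcome):
        return outcome["ground_truth_boxes"]      # e.g. vehicle bounding boxes

    def reward(self, outcome):
        return -1.0 if outcome["collision"] else 1.0

class CoverageCollector:
    """Stands in for the VE's coverage part: records which scenario
    parameters were actually exercised during training."""
    def __init__(self):
        self.seen = set()

    def sample(self, scenario):
        self.seen.add((scenario["weather"], scenario["road_type"]))

    def report(self):
        return sorted(self.seen)

def train_with_ve(model, generate_scenario, run_simulation, steps=1000):
    """Stimulus generation supplies inputs, the checker supplies labels,
    and coverage tracks that all needed scenarios were visited."""
    checker, coverage = Checker(), CoverageCollector()
    rng = random.Random(0)
    for _ in range(steps):
        scenario = generate_scenario(rng)     # constrained-random stimulus
        outcome = run_simulation(scenario)    # synthetic sensor streams etc.
        coverage.sample(scenario)
        model.fit_one(outcome["sensors"], checker.label(outcome))
    return coverage.report()
```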

But perhaps the biggest plus of using VE-based-training is that it lets you impose some structure and modularity on an ML-based system which otherwise lacks those qualities.

For instance, there may be no way to separate out (in the ML-based system) the “weather aspect”, i.e. the handling of different weather conditions. But in the VE you absolutely can do that: have a “weather aspect” module, and as you do the training, methodically take all scenarios and add weather conditions to them, making sure the system gets trained on them all.

This goes for many other aspects, such as the “location aspect”: Suppose your ML-based AV should behave differently in different countries. With VE-based training, the team responsible for “driving in India” can modularize the India-related part of the training (driving on the left, much more chaotic traffic, …) into a separate, inspectable module of the VE. Then, they can make sure to do “enough training” with this aspect superimposed on the “normal training scenarios”. This, of course, assumes your VE is good at aspect-merging.
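
A small sketch of what such aspect-merging could look like, with invented weather and India “aspect” modules superimposed on a couple of base scenarios:

```python
import itertools

def weather_aspect(scenario, condition):
    merged = dict(scenario)
    merged["weather"] = condition
    return merged

def india_aspect(scenario):
    merged = dict(scenario)
    merged.update({"drive_on": "left", "traffic_chaos": "high"})
    return merged

base_scenarios = [{"name": "highway_merge"}, {"name": "pedestrian_crossing"}]
weather_conditions = ["clear", "rain", "fog", "snow"]

# Methodically superimpose every weather condition (and the India aspect)
# on every base scenario, so no combination is skipped during training.
training_scenarios = [
    india_aspect(weather_aspect(s, w))
    for s, w in itertools.product(base_scenarios, weather_conditions)
]
print(len(training_scenarios))   # 2 base scenarios x 4 weather conditions = 8
```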

[Added 17-Oct-2016: A new paper from DeepMind talks about new ways to apply simulation-based training to the real world, and mentions another paper which says that training on synthetic data often fails because of “the discrepancy in visual cues between simulation and reality”]

Inverse Reinforcement Learning and verification

In a previous post I said ML-based systems should (mostly) be verified using “traditional” rule-based techniques (like CDV):

While I will be talking at length about using-ML-to-verify-ML, I urge you again, oh ML practitioners and researchers, to not be guided by the beauty of your weapons (as Leonard Cohen used to say). For serious, safety-critical systems, one needs serious, systematic verification.

You would have thought that me asserting that (on the Internet!) would be enough to end all discussion ;-). Well, no. Some people did question that assertion, and that got me thinking again: Can we really expect rule-based verification to work for everything?

To understand the limits of rule-based verification, consider Inverse Reinforcement Learning:

Reinforcement Learning (RL) is a fairly-popular ML technique: Unlike supervised learning, in RL (as you probably know better than me) we don’t train the system by giving it examples with the “right” answers. Rather, we give it a “reward signal” from time to time (e.g. when we teach it to play Go, we give it a +1 reward when it wins a game and a -1 reward when it loses).

We then let the algorithm derive the “value function” for intermediate actions by itself. For instance, if experience shows that doing action X when in state Y has a high chance of getting the system to the final “win the game” state, then the algorithm will assign a high value to doing X in state Y. And so on. Once the value function is established, it is a simple matter to choose, in any state, the action with the highest value.
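
To make the value-function idea concrete, here is a toy, tabular Q-learning sketch on an invented 5-state “chain” environment: the only reward is +1 at the terminal state, the algorithm backs values up to intermediate (state, action) pairs, and at the end we act greedily on them. This is purely illustrative, not how one would train a real system.

```python
import random

N_STATES = 5
ACTIONS = ["left", "right"]          # reaching state 4 "wins the game" (+1)

def step(state, action):
    nxt = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
rng, alpha, gamma = random.Random(0), 0.5, 0.9

for _ in range(200):                 # episodes; explore with a random policy
    s, done = 0, False
    while not done:
        a = rng.choice(ACTIONS)
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # value update
        s = s2

# Once the value function is established, choose the highest-value action.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
# Should print ['right', 'right', 'right', 'right'] -- always head for the win.
```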

But what if the problem is so complex that we can’t even specify the right reward signal? Well, one idea is to use “Inverse Reinforcement Learning” (IRL). In IRL, the algorithm tracks, say, a (human) expert solving the same problems, and tries to derive the value function the expert is using when making decisions.

Deriving a value function from actions is of course tricky (“did the expert exit through the green door because green doors are better or because the green door is closer to the garage and getting to the garage is better?”) but is apparently doable.
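
To illustrate how tricky this is, here is a toy sketch of the “green door” ambiguity: two completely different reward hypotheses are both consistent with the single observed expert choice. All features and numbers are invented for the example.

```python
# Two doors, each described by features the expert might care about.
doors = {
    "green": {"is_green": 1.0, "distance_to_garage": 2.0},
    "red":   {"is_green": 0.0, "distance_to_garage": 5.0},
}
observed_expert_choice = "green"

# Two competing reward hypotheses, expressed as weights on the features.
hypotheses = {
    "likes_green_doors": {"is_green": 1.0, "distance_to_garage": 0.0},
    "likes_short_walks": {"is_green": 0.0, "distance_to_garage": -1.0},
}

def best_door(weights):
    score = lambda d: sum(weights[f] * v for f, v in doors[d].items())
    return max(doors, key=score)

for name, w in hypotheses.items():
    print(f"{name}: consistent with the expert = {best_door(w) == observed_expert_choice}")
# Both hypotheses print True -- one observation cannot tell them apart,
# which is why IRL needs many, diverse demonstrations.
```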

As problems get more complex, IRL (or something like it) will probably keep getting more important: To take an extreme example, I have talked before about Artificial General Intelligence (AGI) and its dangers. One possible, partial way to deal with that danger is for the AGI to keep asking people about what to do in novel / unusual cases, which sounds a bit like IRL.

OK, back to verification: IRL-based systems are probably really hard to verify in a rule-based way: There is no spec, no rules, not even a user-defined reward signal: The system simply has to “do its best to find what the human expert would have done, given enough time and information”.

But (putting aside the extreme, full-AGI case) I think regulatory bodies, the public (and common sense) will still demand rule-based verification: Just saying “Oh, I verified the learning algorithm, and it works fine” will not cut it. People will want to know what scenarios / parameters were executed during verification, and what requirements have been tracked. Even if some / most behaviors are learned from demonstrations only, the system should still follow some basic requirements. And yes, checking may be hard (see my post about probabilistic checking), but people will still want to understand it.
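
As a minimal sketch of the kind of accounting I mean (which scenarios and parameters were exercised, and which written requirements each run traces to), here is a toy report generator. The requirement IDs, run records and field names are all invented for illustration.

```python
from collections import defaultdict

requirements = {
    "REQ-001": "Must yield to pedestrians at crossings",
    "REQ-002": "Must keep a safe following distance in rain",
}

# Each verification run records the scenario, its parameters, the
# requirements it traces to, and whether the checks passed.
runs = [
    {"scenario": "pedestrian_crossing", "weather": "clear", "requirements": ["REQ-001"], "passed": True},
    {"scenario": "pedestrian_crossing", "weather": "rain",  "requirements": ["REQ-001"], "passed": True},
    {"scenario": "highway_follow",      "weather": "rain",  "requirements": ["REQ-002"], "passed": False},
]

coverage = defaultdict(list)
for run in runs:
    for req in run["requirements"]:
        coverage[req].append(run["passed"])

for req, text in requirements.items():
    results = coverage.get(req, [])
    status = "NOT COVERED" if not results else ("PASS" if all(results) else "FAIL")
    print(f"{req} ({text}): {len(results)} runs, {status}")
```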

So, at least for the foreseeable future, even for IRL-based systems we still need something like CDV for verification.

[30-Nov-2016: Changed RRL to the (much more common) IRL]

Notes

I’d like to thank Ziv Binyamini, Amiram Yehudai and Sandeep Desai for commenting on previous versions of this post.


Comments

  1. Hi Yoav,
    I am sure we can continue this chat at the HVC site (I plan to be there). However, are you suggesting (or have you considered) using gaming software as a stimuli generator? It seems game engines have enough randomness and enough scenario options to cover a wide set of inputs and extreme conditions. It may also be that the game engine can help support checking of the output, based on your score in the game. How deeply have you looked at this?

    1. Hi Gil

      Did not look very deeply, but I do consider game engines to be a promising direction (assuming you look mainly at visual sensors). My current favorite is Unity [1]: It is portable and fairly advanced.

      I don’t know enough about available hooks to assess how hard it will be to add monitoring / coverage / checking.

      Because there is so much money and excitement in game design, there are other things we can learn from that field, e.g. from game design languages. I talked about that a bit in [2].

      See you at HVC.

      [1] https://unity3d.com/
      [2] https://blog.foretellix.com/2015/10/16/misc-stuff-hvc-game-design-languages-and-more/
