Summary: This post describes in general terms the problem of “using ML for coverage maximization”, explains why it is important for CDV and for fuzzing, and gives some references.
My first post in the “ML and verification” series talked about verifying ML-based systems (lots more to say about that). This post talks about the other side – using ML to help verification (of anything).
There are several ways in which ML can improve the verification process. Some examples are:
- Failure clustering: A single bug can cause many failures in many test runs. ML can be used to help cluster together the failures such that each cluster (hopefully) corresponds to a single bug.
- Automatic debugging: This slight misnomer refers (among other things) to techniques which help you understand the bug by suggesting what is unique in the runs leading to it. This is a bit like anomaly detection, and ML algorithms can be pretty helpful there.
- Coverage maximization: This is the automatic process of “skewing” runs such that they reach coverage holes.
There are lots of other ideas (many related to the “big data” aspect of large verification projects), but I think coverage maximization may be the most promising, so the rest of this post is devoted to it.
Notes that this post is written as a stand-alone document and does not require any previous verification knowledge: It was originally written to hook ML people into applying their magic for coverage maximization. We’ll see how that goes.
Also, as is often the case in multi-disciplinary areas, terms like CDV and fuzzing do not have a totally-agreed-upon meaning. So whenever possible, I have added links to the terminology page (and to other posts) to serve as a dictionary.
I’ll try to explain below why Coverage Maximization via Machine Learning (let’s call it CMML) would be an excellent thing, if it worked. So this (very informal) post will go through the following:
- What are CDV and fuzzing, and why they are similar
- What is the coverage maximization problem
- Some tentative ideas about how to do CMML
- Some references
[23-Dec-2016: Added the post Verification, coverage and maximization: The big picture, which indeed gives the bigger picture]
What are CDV and fuzzing
CDV (Coverage Driven Verification) is the leading technique for finding bugs in HW designs (though there are other techniques, like formal verification).
The goal of CDV is verification, i.e. finding as many important bugs as possible in the shortest time, while keeping track of where we have been using coverage – usually functional coverage (see below), which is mapped to a verification plan.
In CDV, we construct a Verification Environment (VE) whose job is to control and check the Device Under Test (DUT), which is running in some (usually simulated) execution platform. The DUT and VE together are called the DVE.
Fuzzing is a similar technique used by security people to find vulnerabilities. The goal is to find as many vulnerabilities as possible in the shortest time, while keeping track of where we have been using various measures of coverage.
So, even though there are a lot of differences, fuzzing and CDV have a lot of similarities, and figure 1 can also represent fuzzing.
What is coverage maximization
Coverage maximization refers to the automated process of coverage filling, i.e. of reaching higher and higher coverage. This is also called coverage auto-filling, or Coverage Driven Generation (CDG, not to be confused with CDV). There is also the term Model Based Testing (MBT), which many people use to refer to the kind of coverage maximization which is based on formal model-based techniques.
Note that the process of just executing more and more runs will continue to (asymptotically) improve coverage, and may (perhaps, in a million years) fill all coverage. But coverage maximization is about a smart, automated, much faster way. Coverage maximization usually implies some feedback loop (see the back arrow in figure 1), where you maximize some, then collect current coverage and see what’s missing, then maximize some more.
It is important to understand that “coverage” is used here in the broadest sense, including both bucket-based “normal” coverage, and ever-increasing “pseudo-coverage”. Let me clarify that:
- Normal, bucket-based coverage is the kind of coverage for which it is meaningful to say “I have reached 100%”. This is what most people call coverage, and it comes in two main variants, as explained here:
- Implementation coverage (e.g. code coverage), i.e. which source constructs (e.g. source lines) have been exercised so far
- Functional coverage, i.e. which user-defined DVE-related attributes (and crosses of attributes) have been reached so far
- Pseudo-coverage is the kind of coverage for which “more is better” but does not sum up to 100%. Some example:
- Novelty coverage, which measures how far (using some metric) each new “sample” is from all previous ones
- Criticality coverage, which measures how close we are to some “danger state”. I mentioned some companies maximizing criticality coverage in the last chapter of this post.
- Known bug proximity coverage, which measures how close we are to triggering an already-found bug
Filling coverage “manually” (by writing new tests) is very time-consuming, so a generic coverage maximization system would be very helpful.
Using ML for coverage maximization
Using ML for coverage maximization has been tried before (see references below), but with limited success (often there was too much work to set it up per DVE). My hope is that the new advances in ML, plus some innovative thinking, will finally make this practical.
Here is one possible way to do it:
- Train the system per-DVE (hopefully, small changes to the DVE will need shorter incremental training)
- Train the system by observing randomization decisions (what value was given to which object field) and the resulting events (including coverage events)
- Then have the system control those randomization decisions to “maximize” subsequent coverage
There are lots of ways to model this. My favorite is:
- After training, the ML component is connected to the CDV system and activated on the fly
- It gets as input each “abstract event” and each randomization request (complete with “path”, i.e. which randomization request it is)
- For each randomization request, it returns a set of (id, value) pairs – one per field in the current object-to-be-randomized – which will hopefully maximize the coverage
- During training, it gets the same inputs, but with the randomization request it also gets the (id, value) pairs as randomized by the system
There are other possibilities, but I think this should be pretty good, and should generalize to many CDV and fuzzing systems. Some further comments:
- We are talking here about learning event sequences, and there are ML tools for learning state machines (as in the Usenix reference below)
- However, state machines may not be the right abstraction – the DUT can be an arbitrary Turing-complete machine
- How can we use supervised learning to learn how to reach a state which (almost by definition) never happened before?
- Note, however, that much of coverage maximization is about reaching hard-to-reach places which are near places already reached, or about crossing two states, each of which was already reached
- Also, the ML algorithm does not have to always succeed. It just has to make coverage improve faster than normal constrained-random
- The values generated at each point have to satisfy constraints which also depend on the current state
- Thus, the values suggested by the ML algorithm have to be considered as “soft” suggestions: If they are not legal at the current point, they will be overridden by the “normal” randomization
Here are some CDV-related coverage maximization efforts:
- The IBM HRL CDG effort (lots of other papers too)
- Kerstin Eder’s review of ML for coverage maximization – comprehensive, from 2012
- (non-ML) Kerstin Eder et al. have several publications about CDV-based robotic verification, with maximization using formal / MBT techniques. For instance, here and here.
Here are some fuzzing-related maximization efforts:
- A Usenix security paper about an ML system creating a state machine describing the observed system behavior, and using it to detect subsequent “abnormal” behavior and aim for it. This could be considered maximizing “Novelty coverage”.
- (non-ML): AFL uses its own fast discovery algorithms to increase (its own definition of) coverage. See in my HVC’15 report.
- (non-ML) Sage uses symbolic execution (also called concolic execution) to increase coverage. See also my HVC’15 report.
I really hope some practical, general-purpose CMML tools will spring up. If you know of any, or interested in implementing one, feel free to drop me a line.