Verifying friendly AI: our final, most important verification project

[Added 7-July-2026: This post was written in July 2015, for verification engineers who had mostly never heard of the alignment problem – hence the introductory tone, and the period vocabulary (“Friendly AI” is what alignment was called back then). The part most relevant to my current work is the last chapter, where I argued that verification people should stick their nose into this, but that I personally would contribute only indirectly (via V&V of robots and AVs). A decade of autonomy verification later, I’m trying to contribute directly: See my series “Alignment as a verification problem” (starting with When is misalignment just a bug?), the earlier Coverage-driven alignment post, and The V&V method for the overall approach (and how it relates to other approaches like formal verification).]

OK folks – this is somewhat outside the kind of big-system verification that Foretellix (and this blog) are about, so I am not making it an “official” post. However, it is somewhat related (as we’ll see), and is consistent with the general spirit of this blog [1].

So here we go: If anybody here was feeling that the verification range in this post was not wide enough [2], here is an even bigger challenge. In this post I’ll discuss:

What is this “Friendly-Artificial-Intelligence (FAI) design and verification challenge” and why it could get pretty grim for humanity if we don’t solve it [3]
What I plan to do about it (answer: not much – I don’t have the required skill-set)
Why doing CDV-style big-system verification will, nevertheless, contribute something towards avoiding that gloom-and-doom scenario.

So what’s this problem about AI and why should I care?

This topic has been in the news lately (because Gates, Musk, Hawking and others were talking about it). If you already know all about it, just skip to the next chapter, which talks about its verification implications.

Still here? OK. The problem is not that AI will take all our jobs, nor any of the other social or moral issues associated with AI. We are talking here about the very specific problem that it is very hard to cross the threshold to AGI (Artificial General Intelligence, roughly human-level AI and beyond) without us all ending up dead or in some other very nasty state. And while nobody knows if / when it will happen, most AI experts think it will happen in 10 to 60 years.

Other people have done an excellent job of explaining this issue. Here are some pointers:

Perhaps start with Stuart Russell’s gentle call to arms. Russell is of course one of the fathers of AI, but this two-hour video is almost jargon-free, and captivating. He is essentially taking a hard, unflinching look at the question “what if we succeed with AI”.

Other good places to start are the Wikipedia entry on FAI, this very readable description from Wait But Why, and Bostrom’s book.

Lots of people would just like this problem to go away. Here are some typical questions they ask, and my short, simplified answers (for longer, better answers see the sources above):

This is never going to happen, right? Look at all these pitiful AI projects.
- Nope, these AI projects are actually getting better and better lately.
But this is really, really far away, right?
- Probably in 10 to 60 years.
It’s too late to do anything, right?
- Nope, we still have 10 to 60 years.
Can’t we just put the AI in a hermetic box, or add an “off” switch?
- Nope, it will be too smart to be hampered by that.[4]
But machines will never be really like us: they’ll never have a soul / conciousness, right?
- Perhaps, but it does not matter. What matters is general engineering ability.
What’s this jump from AGI to terminator-like behavior? Why would it want to harm us? I did not even like these movies.
- It would probably not want to hurt you. The problem is the unintended consequences of it doing what you request of it. Example: “Eliminate cancer. Wait, obviously not by killing everybody. Wait, not by putting everybody in suspended animation either. Wait, …”.
If it misbehave we’ll just fix it, right?
- Nope, beyond a certain point it will be much more capable than us and resistant to change. The issue is that the best way to accelerate intelligence is to engineer an AGI which will engineer a more capable version of itself, and so on. So this is how it will probably be built. So once we cross that threshold, things will move very quickly.
Can’t we just stop this mad research?
- Nope, because AI confers huge advantages to companies / armies possessing it
But some last-minute solution will come up, right?
- Not unless we find a way to engineer and verify it. This ain’t Hollywood,

Verifying friendly AI, and what it has to do with this blog

Ok, now that I have done a good job making you all uneasy, let’s talk about what can be done.

The solution is probably the creation of a Friendly AI (FAI), i.e. an AI which somehow does what we would have wanted it to do. There are some organizations (probably too few) devoted to thinking about that, e.g. MIRI.

Designing an FAI is a huge challenge. Verifying it is also a huge challenge, made even bigger by the fact that we’ll be verifying the design of a machine, FAI(1), which will then create FAI(2) and so on. Verifying that some property P holds in any future FAI(n) is, well, non-trivial.

How does one even approach that verification challenge? I don’t know, but clearly this is something we should think about: Here is the world’s most important engineering project. It has lots of beneficial outcomes, but it may also have terrible outcomes due to unexpected consequences (i.e. bugs). Surely this is something that we verification people should stick our nose into. Perhaps we have a moral imperative to do that.

Eliezer Yudkowsy, one of the guiding lights on the topic (and the guy who coined the term FAI) suggests in Artificial Intelligence as a Positive and Negative Factor in Global Risk that one should use formal verification for that purpose. He also implies that this is how people verify complex chips.

He is obviously wrong on that last point: Most verification / bug-finding of hardware is done via CDV, not formal verification (where in “hardware” here I lump together block, module, subsystem and chip verification). Nevertheless, I assume he is right in his original point – that FAI should mainly be verified in some formal / mathematical way.

I personally don’t have the mathematical background to really help there (in fact, I find it hard to even read some of the papers on MIRI’s research page). So on the whole, I don’t plan to do much specifically about FAI.

However, I think that the kind of CDV-style verification of e.g. AVs and robots which is the main topic of this blog will have some early contribution towards verifying FAI.

Here is why: Look, for instance, at the Russell video I mentioned above. At around the 1:03 mark he mentions that some of his reasons for optimism (that we shall be able to solve this FAI problem) is that long before we encounter it, some of the same problems will need to be solved in areas of like robotics.

For instance, work on the value system of (much dumber) domestic robots, i.e. what they should do upon conflict between two bad things one of which they must do, and how to even come up with the idea of that conflict, will help future FAI research. So working “merely” on the kind of socio-technical systems that this blog is about will also have (some) contribution towards that FAI goal.

Notes

I’d like to thank Ziv Binyamini, Yaron Kashai and Kerstin Eder for reviewing an earlier version of this post.

[1] “Entertain them or scare them, but do something”.

[2] That was a joke.

[3] This is not a joke.

[4] Even The Economist, my favorite newspaper, got this one wrong. In their 9-May-2015 issue they had a review of AI which said “Because system designers cannot foresee every set of circumstances, there must also be an off-switch”, ignoring the fact that Bostrom (which they do mention) has a whole section in his book about why an off-switch is probably impossible.

	When is misalignment… on Verifying friendly AI: our fin…
	Coverage-driven alig… on It’s the spec bugs that kill y…
	https://otomotif71.w… on Stuttgart impressions: Scenari…
	Daan van der Keur on About “The coming AI hackers”…
	Mariah Jackson on M-SDL, the autonomous vehicles…
	sakhokhar on Machine Learning for Coverage…
	hongseoklee on How to write AV scenarios (and…
	Erik Panu on GPT-3 and verification
	Yoav Hollander on Autonomy markets and their pot…

The Foretellix CTO Blog – AI safety

Now focusing on AI safety (autonomy-related posts go to the company blog)