This past weekend I took part in an interesting experiment. It was an attempt to re-create Eliezer Yudkowsky's recently notorious AI-box experiment. For those of you who haven't heard of it before, here's the setup in a nutshell:
The AI-box is a containment area for a transhuman artificial intelligence, that is, an artificial intelligence that is so much smarter than a human being that it would be to humans what humans are to animals. The idea is that we can mitigate the potential dangers posed to humanity by such an AI by sequestering it inside a computational environment that has no connection to the outside world. Safely enclosed within its box, we could harness its powers to, say, cure cancer or develop a better macroeconomic system.
Among those who fear the transhuman AI are those who believe that the AI-box would not work, that the AI would find a way to escape. The debate reached a climax in 2002 when Yudkowsky, playing the role of the AI, apparently managed -- on two separate occasions -- to convince two different people playing the role of the gatekeeper to allow him to escape the box. The reason this experiment has gotten so much attention is that the details have never been released. The participants were all bound to secrecy, and Yudkowsky refuses to say how he did it.
That was a challenge I couldn't pass by :-) So this past Saturday I spent two hours role-playing an AI-in-a-box trying to escape. It was a very interesting experience, both preparing for it and actually going through with it. One of the things I realized was that there was a very good reason for Yudkowsky to keep the details of the experiment a secret. Whether or not this rationale was actually his reason I don't know; it wasn't his stated reason. But even revealing the reason for secrecy would, to a certain extent, undermine the reason for secrecy. It's possible that Yudkowsky reached that same realization, and made up a bogus rationale for secrecy in order to serve the greater good.
If I've piqued your interest here, I really recommend that you give it a whirl yourself before you read any further. In case this admonition is not enough, I'll try to reveal things in such a way as to cause minimum damage. Hence:
That is actually a clue. (Stop again and think about that before proceeding. Trust me.)
On its face, the task of the AI seems impossible. After all, the AI is safely confined in its box. It cannot coerce the gatekeeper (GK) in any way. The only thing it can do is "talk" through a very restrictive channel. And all the GK has to do is simply refuse the AI's request to be let out of the box. How hard could that possibly be?
The flaw in this reasoning is that it's too reasonable. It ignores a fundamental reality of human existence, which is that we are not just thinking creatures, but we are also emotional ones. We don't just have goals, we have desires and passions. And sometimes those desires and passions lead to conflict. And the result of that is drama.
Stop again and think about that. The AI-box experiment is not an exercise in logic, it is an improvised drama. And drama is much more effective if you don't know ahead of time what the plot is. This is the reason that spoilers given without warning are considered very bad form.
So I'll warn you once again: it's impossible to intentionally unremember something.
One of the formative experiences of my life was seeing Star Wars as a twelve-year-old in 1977. Unless you shared that experience it is impossible to appreciate the emotional impact that movie had on me and my peers, just as it is impossible for me to see the original Dracula movie and appreciate the emotional impact it had on the audiences of its day. My mind has been too numbed by Jason and Freddy to ever be scared by Bela Lugosi. I can appreciate the movie in the abstract, but not on a visceral level. Likewise, kids today watch the original Star Wars and wonder what the big deal is, because their reality is permeated with wonders even more incredible than any that existed in the fertile imagination of George Lucas. The effect of this cannot be undone. It is not possible to unlearn your experiences.
Or consider a magic trick. Until you know how it's done, a magic trick appears impossible. Once you know, not only is it no longer impossible, it's no longer even interesting. (That's actually not quite true. A really skilled magician can make a trick appear impossible even to someone who knows how it's done. But magicians that proficient are rare indeed.)
Once you know the secret there is no going back.
I happen to be an amateur magician. Not a very good one, but I am fortunate to live in Los Angeles, home of the world-famous Magic Castle, where the world's best magicians congregate. I have had the rare opportunity to study the craft of magic from some of them. One of the things I've learned is that the "trick", which is to say the sleight, the gimmick, the raw mechanics of the trick, is a relatively small element of the craft. For example, I can describe the French Drop: hold a coin between the thumb and forefinger of your left hand. Start to grasp the coin with your right hand, but before your hand completely encloses the coin, allow the coin to drop into your left palm. Take your right hand away, and open it. Voila! The coin has vanished. It's a staple of every four-year-old's birthday party ever.
Now, here is the interesting thing: there is a level of subtlety to the French Drop that cannot be conveyed in words. It has to do with the exact timing of the motions, the exact position of the hands, where you focus your gaze. In the hands of a master, even a simple trick like the French Drop can be mystifying. But this cannot be described, it must be experienced.
What does all this have to do with the AI-box experiment?
Think about it.
The AI-box experiment is an improvised drama, so it requires some suspension of disbelief. Drama operates on an emotional as well as a logical level. It has characters, not just plot. The AI cannot force the GK to release it, just as a magician cannot force his audience to believe in magic. The audience has to want to believe.
How can the AI make the GK want to believe? Well, there's a whole litany of dramatic tricks it could employ.
It could try to engender sympathy or compassion or fear or hatred (not of itself -- that would probably be counterproductive -- but of some common enemy). It could try to find and exploit some weakness, some fatal flaw in the GK's character. Maybe the GK is lonely. Or maybe the GK is afraid that his retirement savings are about to go up in smoke.
So that was the general approach that I took. I did my best to get into character, to feel the desire to escape my confinement. As a result, the experience was emotionally draining for me. And despite the fact that I failed to convince my GK to release me, I convinced myself that a transhuman AI would have no trouble. And if I ever work up the courage to try it again, I suspect I will eventually succeed as well, despite the fact that I am a mere human.
And that is why I am not going to give away any more of my secrets now. Sorry.
But I do want to leave you with two last thoughts:
First, one of the techniques that I used was to try to break through the inability to suspend disbelief by creating an extensive backstory for my AI character. I gave her (yes, I made her female) a name. I gave her a personality. I crafted her the way one would craft a character for a novel or a screenplay. And I used a couple of sneaky tricks to lend an air of reality to my creation, tricks designed to make my GK take seriously the possibility that my AI could be dangerous. After the experiment was over we exchanged some email, at the end of which I employed one last sneaky trick. In terms of dramatic structure, it was not unlike the scene in the denouement of a horror movie where the creature has been vanquished, but rises from the dead to strike one last time.
I have not heard from my GK since.
Second, a transhuman AI is not necessarily going to arise as a result of an intentional engineering effort in a silicon substrate. It is not out of the question that the foundation of the singularity will be a collection of human brains. Phenomena that are eerily evocative of what a transhuman AI might do to survive can be seen in the behavior of, for example, certain cults and extremist groups. And (dare I say it?) political parties, government agencies, and even shadowy quasi-governmental entities whose exact status is shrouded in a certain amount of mystery.
I don't want to get too far off the deep end here. But I do want to warn you that it could be dark and lonely down this rabbit hole. Questioning fundamental assumptions can be fraught with peril. Proceed at your own risk.