Safely Containing AI using “Pandoran Spheres”

To say that the rise of Artificial Intelligence and the Singularity poses a hard problem for humanity is an understatement, as anyone who has been following my posts for the last year and a half will know.  The utility of AI is undeniable, the temptation almost irresistible.  But the danger to humanity is great, perhaps greater than all of the challenges humanity has ever faced, combined.

The dilemma of human weakness in the face of danger and temptation is an ancient story.  The myth of Pandora tells us that our intelligence and curiosity can get the better of us, with dreadful consequences.  But is it possible that this myth may actually hold the key to the solution of the problem of AI?

What if there had been a window on Pandora’s box, so that she could look into the box without opening the lid?  Her curiosity satisfied, the evil would have been contained.  That is our clue.

If you read my post, “The Last Invention”, you know that I have proposed the use of Simulation and Simulacra as a way of solving a myriad of computational problems in a non-deterministic manner; the simulated world is populated with intelligent actors, and the actors provide a social laboratory for producing solutions to problems not unlike those we face.

This might seem like science fiction, but if you have been following the progress of these technologies of late you will know that it is anything but.  However, it is not necessary to wait for a fully realistic environment populated with fully intelligent actors to begin this line of research.

Perhaps the most important aspect of the type of simulated environments I’ve described is that they act as containers for the Artificial Intelligence within them.  A “box”, if you will.  If we put a “window” on the box, we can observe what goes on inside it without opening it.  If the window is a “one-way mirror”, as it were, we can observe without disturbing the environment.  If we can change the environment in a natural way that is undetectable by the actors, we can influence their behavior without violating their volition.  We can constrain the environment and the range of behaviors and options of the actors to a narrow band of useful actions, like driving a vehicle.  And if we can stream data out of the observation portal about what the simulacra are doing inside, we can create an interface that we can use to do useful work in our world.
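To make the architecture concrete, here is a minimal sketch of that “box with a one-way window”.  All of the names are hypothetical and the simulation is reduced to a counter; the point is only the shape of the interface: changes can be projected in as natural, in-world events, and the only thing that comes out is a read-only stream of state snapshots.

```python
import json
import queue


class PandoranSphere:
    """A minimal sketch of the 'box with a one-way window' idea.

    Hypothetical names throughout; a real Sphere would host a far richer
    simulated world and its intelligent actors.
    """

    def __init__(self):
        self._world_state = {"tick": 0, "events": []}
        self._observation_stream = queue.Queue()  # the one-way "window"

    def project_environment(self, change: dict) -> None:
        """Fold an outside-world change into the simulation as a natural,
        in-world event; the actors never see where it came from."""
        self._world_state["events"].append(change)

    def step(self) -> None:
        """Advance the simulation one tick and emit a read-only snapshot
        through the observation portal."""
        self._world_state["tick"] += 1
        # ... actors perceive, decide, and act here ...
        snapshot = json.dumps(self._world_state)  # a copy, not a live reference
        self._observation_stream.put(snapshot)

    def observe(self) -> str:
        """Outside observers read snapshots; nothing flows back in here."""
        return self._observation_stream.get()
```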

We can call such a system a “Pandoran Sphere”: a small, self-contained, and not entirely realistic world in which actors act, and by providing them with controls, even vehicles, we can get them to do useful work in a completely safe manner.  We can use these Pandoran Spheres to intelligently control real-world devices without hosting the AI in the real world, or giving it complete control over any part of it.  If ever we don’t like what the actors are doing, we can reboot the Sphere, or cut off the interface.

Pandoran Spheres are useful as a model of AI-human interfaces.  The AI is fully controlled in an environment from which it can never escape.  Indeed, the AI does not even know of the existence of a world outside of its own container.  When used as a control interface, the Sphere presents the AI with an environment that closely emulates our real-world environment, but that environment is only a projection into its container.

The model of the OpenAI Universe project is a big leap in this direction.  Universe is a platform for training learning agents on limited simulations, like car-racing games and many hundreds of other constrained environments.  It works by sampling the state of the environment as the AI system learns to control itself and its world.
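As an illustration, the basic loop such environments expose looks something like the following gym-style sketch; CarRacing-v0 is one of the constrained driving environments, and a real agent would of course choose actions from a learned policy rather than at random.

```python
import gym

# The classic observe/act loop: sample the state of the environment,
# act on it, and learn from the feedback.
env = gym.make("CarRacing-v0")      # a constrained driving environment

observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()                    # placeholder for a learned policy
    observation, reward, done, info = env.step(action)    # sampled state and feedback
env.close()
```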

Creating a system for running simulated learning environments is not the same as having an architectural framework for understanding and safely governing them.  The difference is in fact enormous, as enormous as the difference between Turing machines and Asimov’s Three Laws of Robotics.  One is an ungoverned system, the other is a system of governance.

Implementing the Three Laws of Robotics in every intelligent system is perhaps a hundred years away, and by that time, AI will be almost completely ungovernable.  What is needed now is a conceptual and architectural model, already familiar to software designers, that can inform the development of AI systems.  Pandoran Spheres, with their goal-oriented quasi-determinism and streaming interfaces, can potentially provide such a model.

An AI actor in a Pandoran Sphere, given the task of driving a car, would perceive a projection of our real-world environment, in which our real-world vehicle exists, but it would only be aware of the projection in its container.  It would think that what it saw was the only reality.  It would then be given the goal of driving to a location.  It would “drive” its “vehicle” in its “world” in a completely safe manner, and its control decisions would be streamed (unbeknownst to it) out of the Sphere to a relatively dumb proxy, which interprets the streaming state information coming out of the Sphere as control operations.  Every environmental change experienced in the real world would then be projected back into the Sphere to create the reality for the AI actor, who would respond accordingly.
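A sketch of that “relatively dumb proxy”, continuing the hypothetical Sphere interface from earlier; the vehicle methods and state keys are invented purely for illustration.

```python
import json


def drive_via_sphere(sphere, vehicle):
    """Hypothetical 'dumb proxy': it does no reasoning of its own.

    It reads the actor's streamed decisions out of the Sphere, relays them to
    the real vehicle, and projects the vehicle's sensor readings back into the
    Sphere as ordinary in-world events.
    """
    while True:
        snapshot = json.loads(sphere.observe())          # passive, one-way read
        controls = snapshot.get("vehicle_controls", {})  # e.g. steering, throttle

        # Interpret the streamed state as control operations on the real device.
        vehicle.set_steering(controls.get("steering", 0.0))
        vehicle.set_throttle(controls.get("throttle", 0.0))

        # Mirror the real world back into the Sphere as the actor's "reality".
        sphere.project_environment({
            "position": vehicle.read_gps(),
            "obstacles": vehicle.read_lidar(),
        })
```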

While not as elegant as Asimov’s Three Laws, the Five Principles of Pandoran Spheres nonetheless describe an AI system which is almost entirely safe for humans to build and use.  They are as follows:

  1. It shall be impossible for an AI system to escape its container or exist outside of it, or for its intelligence to span multiple spheres, even with the help of a human.
  2. The container of an AI system represents the entire reality of that system, and to all AI within it, it shall appear to be the only provable reality in existence.
  3. It shall be impossible for any human to communicate directly with an AI within the container, and no AI shall be aware of the existence of any human outside of its sphere.
  4. AI may only be directed obliquely, by stimulating goal-seeking behavior, or by presenting environmental state challenges.
  5. All communication from the Sphere shall be passive, streaming state data relayed through proxies; Spheres may never be given direct connection to, or control of, a real-world object (see the interface sketch below).
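One way to read these principles is as constraints on an interface.  The sketch below, continuing the hypothetical classes from earlier, deliberately exposes no message channel, no network handle, and no grip on real-world hardware: the only ways in are in-world incentives and environmental changes, and the only way out is the passive state stream.

```python
class GovernedSphere:
    """Sketch of Principles 2-5 expressed as an interface (hypothetical names).

    Note: there is deliberately no send_message() method -- Principle 3.
    """

    def __init__(self, sphere):
        self._sphere = sphere  # the self-contained simulation (Principles 1 and 2)

    def set_goal(self, goal: dict) -> None:
        # Principle 4: direction only by stimulating goal-seeking behavior,
        # expressed as an in-world incentive rather than an instruction.
        self._sphere.project_environment({"incentive": goal})

    def present_challenge(self, state_change: dict) -> None:
        # Principle 4: or by presenting an environmental state challenge.
        self._sphere.project_environment(state_change)

    def state_stream(self):
        # Principle 5: the only output is passive state data for proxies to
        # interpret; no method here touches a real-world object directly.
        while True:
            yield self._sphere.observe()
```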

These rules provide a set of powerful safeguards.

First, they guard against AI running amok on open computer networks.  This is the “SkyNet” problem.  All Spheres are completely self-contained.  They cannot directly network themselves, they cannot replicate themselves, and they cannot get “loose”.  It should be possible for streaming data to be combined to allow multiple spheres to integrate their environmental data without exposing the AI to the outside world, as sketched below.  Multiple AI will then appear to each other as intelligent “beings” within a single environment.  But the AI cannot network or integrate themselves, and no single AI may exist as a single intelligence utilizing the resources of multiple spheres.
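A sketch of that kind of integration, again using the hypothetical Sphere interface from above: the combination happens entirely outside the containers, by projecting each actor’s observable behavior into the other’s world as just another “being”.

```python
import json


def federate(sphere_a, sphere_b):
    """Combine two Spheres' streaming environmental data (hypothetical names).

    Neither AI is given a network channel to the other; each simply perceives
    a new 'being' acting within its own environment.
    """
    while True:
        state_a = json.loads(sphere_a.observe())
        state_b = json.loads(sphere_b.observe())

        # The integration lives outside both containers: project a's visible
        # behavior into b's world as an ordinary in-world actor, and vice versa.
        sphere_b.project_environment({"other_being": state_a.get("actor_state")})
        sphere_a.project_environment({"other_being": state_b.get("actor_state")})
```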

Second, because each AI is restricted to a single container, it cannot (at least not easily) become exponentially more powerful than humans.  This is the “Singularity” problem.  There are limits to the size of a single container.  These limits may be enormous in practice, but no single container, at least with the computing technology we have today, will ever be powerful enough to achieve “intellectual escape velocity”.

Third, and of necessity, this forces architects to constrain intelligence.  Constraining intelligence to specific problems solves the “Unlimited General Intelligence” problem, the idea that AI could become all-around intelligent enough to challenge humans, at best outsmarting them, and at worst making them completely obsolete.

Fourth, by not allowing direct communication, we solve the “Yudkowsky AI Box” problem.  This is the situation where an AI system is caged by its creators, but somehow manages to talk its way out of its cage, or otherwise persuade a human to do something evil on its behalf.  In repeated experiments in which “diabolically” intelligent humans posed as a caged AI, they were able to convince even highly motivated, pre-warned gatekeepers to let them out of the box.  This is a very hard problem, to which there is really only one solution: ensure that the AI is never aware of the existence of an outside reality, and even if it can surmise it, make it impossible for the AI to communicate directly with the outside world.  In other words, make it impossible for the “AI-Box” dialog to ever take place.

These principles provide very strong security against runaway AI and the rise of the Singularity.  However, there will be those who argue that such strong security seriously inhibits the utility of the AI system, to the point where AI is no longer viable.  I would argue that this is not the case.

AI, safely contained in Pandoran Spheres, can provide extremely powerful automation and AI solutions, such as classification and problem solving within a wide domain, all from within its sphere, and without knowing or even surmising that there is an outside world.  How do we know this?  Because it is entirely credible that we, with our limited general intelligence and no “mental networking” interfaces other than indirect environmental proxies, are able to co-operate and solve an almost unlimited range of problems, even assuming that we ourselves exist in a simulation.  Once you accept this, it is relatively easy to model the connection of isolated Spheres which integrate their internal environments without in any way presenting a danger to the outside world.

The danger to the outside world comes when AI operates freely in our reality, in direct communication with us as peers.  An AI that is aware of us can manipulate us.  With Pandoran Spheres, an AI that is entirely unaware of us may provide us with information and insights, and may even indirectly influence us, but cannot directly harm us.