Thursday, July 20, 2017

Detecting Intelligence with an Unknown Objective

This is the second post in a series on theory for adaptive systems. The previous post argued that the lack of good adaptive systems theory is the main bottleneck to scientific progress today. The main goal for the next few posts is to lay out questions and problems, and to suggest possible approaches toward quantitative solutions.

Today’s question: How can we recognize adaptive systems in the wild?

To be more concrete: suppose I run a ridiculously huge Game of Life simulation with random initial conditions. What function can I run on the output in order to detect adaptive system behavior within the simulation? Specifically, we’re looking for subsystems of the Game of Life which:
  • Learn from their environment
  • Use what they learn to optimize for some objective
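To pin the setup down, here’s a minimal sketch of the simulation itself - just the Game of Life rules on a random grid, with arbitrary choices of grid size, density, and step count (numpy-based). The open question is what function to compute on the resulting history:

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One step of Conway's Game of Life on a toroidal grid."""
    # Count live neighbors by summing the eight shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    birth = (grid == 0) & (neighbors == 3)
    survive = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    return (birth | survive).astype(np.uint8)

rng = np.random.default_rng(0)
grid = (rng.random((256, 256)) < 0.5).astype(np.uint8)  # random initial conditions
history = [grid]
for _ in range(500):
    grid = life_step(grid)
    history.append(grid)
# Open question: what function of `history` detects adaptive subsystems?
```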
I see two major difficulties to this problem:
  1. We don’t know the system’s objective.
  2. We don’t know what defines the “system”.
I’ll focus on the first part for now; defining the “system” will be a running theme which I will revisit toward the end of these posts.

Example 1: Street Map
Imagine cars driving around on a tree-shaped street map: roads branch outward from a central hub, and each branch splits further as it goes, so there is exactly one simple route between any two points.
Suppose two types of cars drive around this map. The first type wanders about, picking a random direction at each intersection until it reaches its destination. The second type knows what the map looks like, and takes the shortest path from its starting point to its destination. Looking at their paths as they drive, how could we tell the two apart? In particular, how could we tell the two apart without knowing the destination?
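To play with this concretely, here’s a hypothetical sketch of the two car types (using networkx; the balanced tree is a stand-in for the map above, with node 0 as the center, and the start/destination nodes are arbitrary choices):

```python
import random
import networkx as nx

def random_walk_route(G, start, dest, rng):
    """Car type 1: picks a random direction at each intersection."""
    route = [start]
    while route[-1] != dest:
        route.append(rng.choice(sorted(G.neighbors(route[-1]))))
    return route

def shortest_route(G, start, dest):
    """Car type 2: knows the map and takes the shortest path."""
    return nx.shortest_path(G, start, dest)

G = nx.balanced_tree(r=2, h=5)   # tree-shaped "street map", center = node 0
rng = random.Random(0)
start, dest = 31, 62             # two leaves on opposite branches
print(len(random_walk_route(G, start, dest, rng)))  # long and meandering
print(len(shortest_route(G, start, dest)))          # 11 nodes: up, across, down
```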

It would be tedious but straightforward to build an elaborate statistical test for this particular problem, but it wouldn’t generalize. Instead, I’ll point out a heuristic: the intelligent cars - the cars which take the shortest route - will almost always start by driving toward the center, and almost always finish by driving away from it.

Why? Pick two points at random on the map, and look at the shortest path between them. A majority of the time, it will pass through the center point. Even when it doesn’t, it almost always goes first toward the center, then away - it never gets closer to the center, then farther, then closer again.

(For the mathematically inclined: you can prove this by looking at the map as a tree, picking a root, and viewing “distance to center” as the depth within the tree.)
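That observation suggests a concrete test: compute each node’s distance to the center, and flag a route as “intelligent” if that distance falls and then rises, with no second dip. A minimal sketch, continuing the graph setup above:

```python
import networkx as nx

def looks_intelligent(G, route, center=0) -> bool:
    """Heuristic test: distance-to-center along the route should be
    unimodal - first decreasing, then increasing, never dipping again."""
    depth = nx.shortest_path_length(G, source=center)  # node -> distance
    trace = [depth[node] for node in route]
    rising = False
    for prev, cur in zip(trace, trace[1:]):
        if cur > prev:
            rising = True
        elif cur < prev and rising:
            return False  # headed back toward the center: random-walk-like
    return True
```

On a tree, every shortest path passes this test, while a long random walk almost surely fails it - and at no point did we need to know the destination.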

Example 2: Shortest Paths
In the street map example, we can detect “intelligent” behavior by looking for cars which first go towards the center of the map. This behavior is statistical evidence that the car is following a relatively short path to some destination.

Can we generalize this? “Intelligent” cars only start by going toward the center because that’s the shortest path. Even on a more general map, we could look for statistical patterns among shortest paths. On a real-world road map, “shortest paths” over significant distances usually hop onto a highway for most of the drive. Even locally, there are more central and less central roads. Without diving into any statistics, it seems like we could take a typical road map and develop a statistical test to tell whether a car is following a short path between two points, without needing to know the car’s destination.
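One destination-free statistic along these lines: every prefix of a shortest path is itself a shortest path between its own endpoints. So we can track the “stretch” of the route so far - steps actually driven divided by the shortest-path distance back to the start - and flag cars whose stretch stays near 1. A sketch of that idea (the threshold here is an arbitrary assumption):

```python
import networkx as nx

def stretch(G, route) -> float:
    """Steps actually driven divided by the shortest-path distance
    between the route's endpoints. Shortest paths have stretch 1.0."""
    steps = len(route) - 1
    dist = nx.shortest_path_length(G, route[0], route[-1])
    return float("inf") if dist == 0 else steps / dist

def looks_goal_directed(G, route, threshold=1.2) -> bool:
    # A car on a shortest path keeps stretch exactly 1 the whole way;
    # a wandering car's stretch grows without bound.
    return all(stretch(G, route[:k]) <= threshold
               for k in range(2, len(route) + 1))
```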

But what makes the short path “intelligent” at all? Why do we intuitively associate short paths with intelligent behavior, as opposed to wandering randomly around the map?

Example 3: Resource Acquisition
Let’s look at the problem from a different angle. One characteristic behavior of living creatures, from animals to bacteria, is a tendency to acquire resources.

In biology, the main types of resources acquired are energy and certain standard biochemicals. Each of these resources is stored - e.g. energy is stored as starch, fat, ATP, an electric potential difference, etc.

Why would adaptive systems in general want to acquire and store resources? Because it gives the system more options. A human who accumulates lots of currency has more options available than a human without any currency. A bacterium with a store of energy has more options than a bacterium without. Ultimately, those resources could be used in a variety of different ways in order to achieve the system’s objective.

Whether it’s a human taking a vacation or buying a car, or a bacterium reproducing or growing, a pool of resources offers options suited to many different situations. Intuitively, we expect adaptive systems to accumulate resources, because those resources will give the system many more options in the future.

Example 4: Time as a Resource
One universal resource is time. In this view, saving time is a special case of accumulating resources: time saved can be spent in a wide variety of ways, offering more options in the future.

This ties back to the shortest path example. We expect “intelligent” systems to take short paths in order to save time. They save time because time is a universal resource - time saved can almost always be “spent” on something else useful to the system’s goal.

In the street map example, we run into a more unusual resource: “centrality” in the road map. (Mathematically: height in the tree.) A more central location is closer to most points. By moving toward the center of the map, a car accumulates centrality. It can then cash in that centrality for time savings, converting one resource (centrality) into another (time).

A Little Formalization
We now have a handful of examples of intuitively “intelligent” behavior - short paths, energy and currency accumulation, saving time. These examples all amount to the same thing: accumulating some useful resource. Can we formalize this intuition somewhat? Can we generalize it further?

In AI theory, there’s a duality between constraint relaxation and heuristics. A constraint relaxation would be something like “what could the system do if it had more of resource X?”. The amount of X is constrained, and we “relax” that constraint to see if more X would be useful. That constraint relaxation has a corresponding heuristic: “accumulate X”. That heuristic is useful exactly when relaxing the constraint on X is useful.

All of our resource accumulation examples can be viewed as heuristics of that same form: “accumulate X”. Each of them has a corresponding constraint relaxation: “what could the system do if it had more of resource X?”.
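A toy illustration of that duality (purely schematic - the `solve` function, the state format, and the numbers are all invented for the example):

```python
def solve(state) -> float:
    """Toy planner: achievable payoff from a state. Here payoff is capped
    both by opportunities and by how much of resource X is on hand."""
    return min(state["X"], state["opportunities"])

def relaxed_value(state) -> float:
    """Constraint relaxation: what could the system do with unlimited X?"""
    return solve(dict(state, X=float("inf")))

def accumulate_x(state) -> float:
    """The dual heuristic: prefer states with more X. It helps exactly
    when relaxing the constraint on X raises the achievable payoff."""
    return state["X"]

state = {"X": 2.0, "opportunities": 10.0}
print(solve(state))          # 2.0  - X is the binding constraint
print(relaxed_value(state))  # 10.0 - relaxing X would help a lot...
print(accumulate_x(state))   # ...so "accumulate X" is a useful heuristic here
```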

In principle, any formal heuristic can be viewed as a resource. But the examples above seem more specific than heuristics in general; they share some common features which generic formal heuristics need not have:
  • Each resource is highly fungible. Energy, currency and time are easy to trade for a very wide range of other things, and other things are easy to trade back into energy, currency, and/or time.
  • Each resource can be stored efficiently. Cream is not a good resource for humans to accumulate; it spoils quickly.
  • Each resource is scarce. Bacteria need water, and they could accumulate water, but they’re usually surrounded by unlimited amounts of water anyway. No point storing it up.
In some ways, these are just criteria for what makes a good formal heuristic. In order for a formal heuristic to accelerate planning significantly, it needs to be scarce and storable and fungible. In order for something to be a good resource to accumulate, it should be a useful heuristic for planning problems.

Problems. Plural.
Remember where we started this post: we want to detect adaptive systems without necessarily knowing the systems’ objectives in advance. All the resources listed above make good heuristics not just for one problem, but for a wide variety of different problems. Why? What do they have in common, beyond generic formal heuristics?

The Ultimate Resource
Let’s go back to where formal heuristics come from: constraint relaxation. Intuitively, by accumulating resources, by following a heuristic, by relaxing a constraint, a system gives itself more options. That’s why it’s useful to have more energy, more currency, more time: the system can choose from among a wider variety of possible actions. The action space is larger.

This is the ultimate resource: accessible action space. The more possible actions available to an adaptive system, the better. A good resource to accumulate is, in general, one which dramatically expands the accessible action space. Fungibility, storability, and scarcity are all key criteria for something to significantly expand the action space.
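The point can be made quantitative: count how many states a system can reach within a few steps, with and without a stored resource. A minimal sketch, using a toy fuel world invented for illustration:

```python
def reachable_states(start, steps, moves):
    """Breadth-first enumeration of states reachable within `steps` actions."""
    frontier, seen = {start}, {start}
    for _ in range(steps):
        frontier = {nxt for s in frontier for nxt in moves(s)} - seen
        seen |= frontier
    return seen

def moves(state):
    """Toy world: a state is (position, fuel). Moving costs one unit
    of fuel; waiting is free."""
    pos, fuel = state
    options = [(pos, fuel)]                                    # wait
    if fuel > 0:
        options += [(pos - 1, fuel - 1), (pos + 1, fuel - 1)]  # move
    return options

print(len(reachable_states((0, 1), 5, moves)))  # low fuel:  3 reachable states
print(len(reachable_states((0, 5), 5, moves)))  # more fuel: 21 reachable states
```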

Redux
Time to go back to the opening question: suppose I run a ridiculously huge Game of Life simulation with random initial conditions. What function can I run on the output in order to detect adaptive system behavior within the simulation?

This post has only addressed one tiny piece of that problem: an unknown objective. Later posts will focus more on information processing, learning, and defining the system. But already, we have a starting point.

We expect optimizing systems to accumulate resources. These resources will be fungible, storable, and scarce in the environment. The system will accumulate these resources in order to expand its action space.

So what might we look for in the Game of Life? Very different kinds of resources could be useful, depending on the scale and nature of the system. But we would certainly look for statistical anomalies - resources are scarce. Those anomalies should be persistent - resources can be stored. Finally, the extent of those anomalies should grow and shrink over time - resources are acquired and spent.
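As a crude first pass, here’s a sketch of that kind of detector, assuming the `history` list from the opening snippet (scipy-based; the window size and thresholds are arbitrary assumptions):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def anomaly_map(grid, window=16):
    """Local live-cell density, in standard deviations from the grid-wide
    mean - a crude stand-in for 'statistical anomaly' (scarcity)."""
    local = uniform_filter(grid.astype(float), size=window)
    return (local - local.mean()) / (local.std() + 1e-12)

def persistent_anomalies(history, window=16, z=3.0, min_frames=100):
    """Boolean mask of cells whose density anomaly persists across many
    frames - persistence standing in for 'resources can be stored'."""
    counts = sum((np.abs(anomaly_map(g, window)) > z).astype(int)
                 for g in history)
    return counts > min_frames

# A further step, not shown: track the *extent* of each persistent region
# over time and look for growth and shrinkage - resources acquired and spent.
```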

It’s not much, but it’s a starting point. Hopefully it gives a flavor of the sort of things research on the subject could involve. Or better yet - hopefully it gives you ideas for better approaches to the problem.

Next post will talk about how to extract an adaptive system’s internal probabilistic model.
