Foreword

by Herbert A. Simon

Contemporary artificial intelligence systems depend heavily upon their programmers for their knowledge about the world in which they must act. Robotics is only beginning to provide the sensory and motor capabilities that they require when they must interact with the physical environment. And even when such capabilities are in place, or when the interaction with the task environment is symbolic instead of physical, AI systems still need intelligent strategies for exploring their environments to acquire information about them, and to build their internal representations of them.

Dr. Shen's admirable book addresses these fundamental problems of how learning from and about the environment can be automated. It provides both a basic framework, within which Dr. Shen examines our present understanding of these matters, and an important example of a program, LIVE, that possesses novel and important capabilities for learning about its environment autonomously.

Tasks calling for intelligence fall into two broad categories. In the one case, the intelligent system knows the task situation exactly, hence need not distinguish between the real world that its actions will affect and the mental world in which it plans its actions. In the other case, the intelligent system knows the actual situation only in part, hence must be concerned with incompleteness and inaccuracies of its picture of reality; for its plans will frequently fail to reach the intended goals or have undesired side effects, and it must have means for recognizing these failures, remedying them as far as possible, and re-establishing its contact with the external reality.

Proving mathematical theorems belongs to the first category of AI tasks: the theorem prover is provided with the axioms and inference rules of its mathematical "world," never has to stray beyond that world and can always be sure that its inferences will be valid in it. The domain of robotics belongs to the second category. There are always important differences between the world as perceived by the robot planner, and the world upon which the robot acts. In this situation, an intelligent system must have means for gathering information about the world and correcting its picture of it. Of course, the category of systems that must handle uncertainty extends far beyond robotics, encompassing all intelligent systems that deal, physically or mentally, with the conditions of the world we live in. This book is concerned with this second, very broad and fundamental category of artificial intelligence tasks, and with systems capable of coping with the uncertainties and incomplete information that are intrinsic to such tasks.

There is another way of dividing up AI tasks that is based upon a distinction between information the intelligent system is given and information it must acquire by some kind of learning process. A performance system is designed to work in a defined task domain, accepting particular goals and seeking to reach them by some kind of highly selective search. The system must be told what goal is to be reached and must be given a description of the structure and characteristics of the task domain in which it is to operate: its problem space. In real life, the problem space can only be a highly simplified and approximate description of the actual outside world.

In contrast, a learning system is capable of acquiring a problem space, in whole or part, by interacting with the external environment and without being instructed about it directly. Learning, in turn, can be passive or active (or both). Passive learning uses sensory organs to gather information about the environment, but without acting on the environment. Active learning moves about in the environment in order to gather information, or actually operates on the environment to change it (performs experiments).

Since no intelligent system can grasp the whole of reality in its problem representation, it must build highly simplified and special problem spaces to deal with the special classes of situations it faces at given times. It must be able to alter its problem spaces frequently, and sometimes radically. Clearly, systems that must distinguish between the real world and the world of thought, taking account of the differences between the actual and the expected, have to detect these differences and respond to them in adaptive ways. This book is concerned specifically with intelligent systems that have learning capabilities enabling them to correct their pictures of their environments and consequently to behave instrumentally in the world -- the real world rather than the imagined one.

To build theory about complex systems, whether these systems be natural or artificial, we need to pursue both empirical and formal approaches -- neither by itself is sufficient. Up to the present time, most of what we have learned about artificial intelligence (and about human intelligence, for that matter) has been learned by building systems capable of exhibiting intelligent behavior. "The moment of truth," it has been said, "is a running system." At the same time, merely exhibiting such a system is not enough: we must discover how, why, and how well it works. That calls for a conceptual structure within which intelligent systems can be analyzed and compared.

This book indeed provides such a conceptual framework for addressing the general problem of learning from the environment. It emphasizes that if a system is to learn from its environment it must have capabilities for induction -- it must be able to assemble its observations into a coherent problem space that can serve as its model of the environment. As part of its modeling activity, it must be able to induce concepts and regularities and laws to describe the world it is living in. It must be able to make predictions and to modify its model when the predictions are falsified. It must be able to detect when the variables it can observe provide only an incomplete description of the external reality, and must be able to respond to this incompleteness by postulating new hidden variables (theoretical terms).

A framework that encompasses all of these requirements also encompasses many central topics in artificial intelligence: concept learning, heuristic search, law discovery, prediction, and others. The first half of the book draws upon the research on all of these topics, bringing this work into focus and exploring its implications for the design of systems that learn. It shows how a wide range of techniques -- some drawn from AI, some from operations research, others from statistical decision theory and elsewhere -- can be used for model construction, for making predictions, for exploring the environment actively, for correcting and elaborating the model. One major value of the book for me has been to put together, in a principled way and as a single continent, a large body of literature that had hitherto formed rather isolated islands.

Dr. Shen's book never loses sight of the fact that theory in AI is centrally concerned with what is computable and how it can be computed. "Computability," from an AI standpoint, has almost nothing to do with Gödel's theorems, which show that some valid propositions must always lie beyond the capabilities of any single formal system, or with the equivalence or inequivalence of certain languages or architectures with Turing machines. It has only a little more to do with those theorems on computational complexity that deal with worst cases as problem size increases without limit, or with theorems that show that an algorithm will ultimately converge (after a computation of unspecified duration) to a desired result.

Computability in AI has to do with what can be computed over reasonable time intervals using the kinds of computing devices that are actually available, or are likely to be available within the foreseeable future. This is an imprecise criterion, but the only one of real interest and the only one that distinguishes AI from pure mathematics. Dr. Shen, while he does not ignore formal results, keeps this criterion clearly in mind in both his empirical and his theoretical work.

One way to study the computational capabilities of systems that learn from the environment is to construct robots to operate in a physical world. Another way is to construct two "worlds" within the computer -- in one of them actions are taken and consequences follow; the other represents the problem solver's representation of that real world, its problem space. A theory of autonomous learning from the environment can be developed and tested initially in either of these contexts, and the theory developed in this book applies to both. Ultimately, of course, the autonomous system must be tested in the real world, but simulation can provide a very effective and economical initial test platform.

The particular AI system, LIVE, that Dr. Shen has built and observed, and that occupies much of the second half of the book, represents the second strategy. It is a conceptual rather than a physical robot; the world that LIVE learns about is stored in computer memory alongside (but separated from) its own problem space. The two worlds communicate only through the sensory capabilities with which LIVE is endowed and its capabilities for action.

This research strategy for studying learning from the environment avoids the arduous tasks of building physical visual and auditory sensors and mechanical arms and legs, but does not lose sight of the fundamental issue: the actual and possible discrepancies between the mental picture and the scene of action.

Research on human discovery processes has shown that surprise -- the accident that, as Pasteur put it, happens to the prepared mind -- plays a frequent and important role in scientific discovery. Fleming's discovery of penicillin, Hans Krebs's discovery of the role of ornithine in the synthesis of urea, and the Curies' discovery of radium are well-known examples. Autonomous Learning from the Environment shows clearly why surprise (the departure of expectations from observations) is useful for guiding the modification of the learner's problem space, and in particular how the formation of expectations enhances the knowledge obtainable from new observations.

Most readers, I think, will experience more than one surprise as they explore the pages of this book, and if they do not already know how to exploit surprise for purposes of discovery, they will find help on that topic too. On this and many other topics, Weimin Shen has provided us with an indispensable vade mecum for our explorations of systems that learn from their environments.