In this experiment, a real hardware robot control system was evolved for
wall-avoidance behaviour in an empty 2.9m
4.2m rectangular arena, using
sonar time-of-flight sensing. The two-wheeled robot (``Mr Chips,''
Figure 4(a)) has a diameter of 46cm, and a height of 63cm. For
this scenario, its only sensors were a pair of fixed sonar heads pointing
left and right.
Figure 4: (a) The ``Mr Chips'' Robot. (b) The evolvable Dynamic State Machine.
``G.L.'' stands for ``Genetic Latch'': which of the bits are latched according
to the clock, and which are passed straight through is under genetic control.
Consider how this control problem might traditionally have been solved using finite-state machines (FSMs). Firstly, a pair of FSMs would be used to measure the time of flight of the sonar ``pings.'' When the sonar echoes returned, each timer would deliver a binary code representing the time of flight to a central ``control'' FSM, which would compute a motor response on the basis of its current internal state and these inputs. The control FSM would then deliver a binary code representing desired motor speed to each of a pair of pulse-width-modulation FSMs, which would drive the d.c. motors with the appropriate waveforms. Notice that there is a very strict sensory/control/motor functional decomposition inherent in this architecture, and the system is a computational one, in which the control FSM deals in binary codes and is totally divorced from the dynamics of the sensorimotor systems and the environment. It would be possible to evolve the control FSM in hardware, but there would be no benefit: exactly the same behaviour would be obtained from an implementation of the FSM in software. A hardware implementation would use a clocked register to hold the current state, and a random-access memory (RAM) chip with evolvable contents would hold the next-state and output variables corresponding to each present-state and input combination (this is the well known ``direct-addressed ROM'' implementation of an FSM [3]).
The hardware control system used in this experiment is superficially similar to an FSM, but fundamentally different. Call it a Dynamic State Machine (DSM). As its input, it directly takes the echo signals from the sonars. There is one wire from each sonar, on which pulses arrive: the lengths of these pulses are equal to the time of flight for that sonar. (The sonars fire, and the pulses begin, simultaneously, with a pulse repetition rate of 5Hz. The sonar firing cycle is completely asynchronous to the DSM.) The output of the DSM goes directly to the motor drivers; if the motors are to go at an intermediate speed, then the DSM must pulse them itself. Like the FSM, the DSM implementation is centred around an RAM chip with evolvable contents, but which of the input, state and output variables are clocked, and which are free running (asynchronous) is genetically determined. For those that are clocked, the clock frequency is also genetically determined. The full arrangement is shown in Figure 4(b).
The sensory/control/motor functional decomposition has now been removed, and the control system is intimately linked to the dynamics of the sensorimotor signals and the environment, with time now able to play an important role throughout the system. The possibility of mixing asynchronous state variables with state variables being clocked at an evolved frequency endows the system with a rich range of possible dynamical behaviour: its actions can immediately be influenced by the input signals, but at the same time it is able to keep a trace of previous stimuli and actions over a time-scale that is under evolutionary control. It is able to exploit special-purpose tight sensorimotor couplings because the temporal signals can quickly flow through the system, being influenced by, and in turn perturbing, the DSM on their way.
The presence of asynchronous state variables means that this is not a finite-state machine (their continuous-valued temporal relationships need to be included in a description of the machine's state). It would not be possible to simulate this machine in software, because the effects of the asynchronous variables and their interaction with the clocked ones depend upon the characteristics of the hardware: meta-stability and glitches will be rife, and the behaviour will depend upon physical properties of the implementation, such as propagation delays and meta-stability constants. Similarly, a designer would only be able to work within a small subset of the possible DSM configurations -- the more predictable ones.
For the simple wall-avoidance behaviour, only the two state variables that
also go to the motors were used -- the others were disabled, and can be
introduced incrementally as the difficulty of the task is increased. The
genetic algorithm was the same as that described in the previous section, with
the contents of the RAM (only 32 bits required for the machine with two state
variables), the period of the clock (16 bits, giving a clock frequency from
around 2Hz to several kHz) and the clocked/unclocked condition of each
variable all being directly encoded onto the linear bit-string genotype. The
population size was 30, probability of crossover 0.7, and the mutation rate
was set to be approximately 1 bit per string. (It can be shown that this small
DSM is statistically likely to visit all of its possible states: DSMs with
more state variables are likely to visit a smaller fraction of their possible
states, and the mutation rate needs to be higher, because many mutations will
have no immediate phenotypic effect.) If the distance of the robot from the
centre of the room in the x and y directions at time t was
and
, then after an evaluation for T seconds, the robot's fitness
was a discrete approximation to the integral:
and
were chosen such that their respective Gaussian terms fell
from their maximum values of 1.0 (when the robot was at the centre of the
room) to a minimum of 0.1 when the robot was actually touching a wall in their
respective directions. The s(t) term is to encourage the robot always to
keep moving. Each individual was evaluated for four trials of 30 seconds each,
starting with different positions and orientations, the worst of the four
scores being taken as its fitness [6]. For the final few
generations, the evaluations were extended to 90 seconds, to find controllers
that were not only good at moving away from walls, but also staying away
from them.
For convenience, evolution took place with the robot in a kind of ``virtual reality.'' The real evolving hardware controlled the real motors, but the wheels were just spinning in the air. The wheels' angular velocities were measured, and used by a real time simulation of the motor characteristics and robot dynamics to calculate how the robot would move. The sonar echo signals were then artificially synthesised and supplied in real time to the hardware DSM. Realistic levels of noise were included in the sensor and motor models, both of which were constructed by fitting curves to experimental measurements, including a probabilistic model for specular sonar reflections.
Figure 5 shows the excellent performance which was attained after 35 generations, with a good transfer from the virtual environment to the real world. The robot is drawn to scale at its starting position, with its initial heading indicated by the arrow; thereafter only the trajectory of the centre of the robot is drawn. The bottom-right picture is a photograph of behaviour in the real world, taken by double-exposing a picture of the robot at its starting position, with a long exposure of a light fixed on top of the robot, moving in the darkened arena. If started repeatedly from the same position in the real world, the robot follows a different trajectory each time (occasionally very different), because of real-world noise. The robot displays the same qualitative range of behaviours in the virtual world, and the bottom pictures of Figure 5 were deliberately chosen to illustrate this.
Figure 5: Wall avoidance in virtual reality and (bottom right) in the real
world, after 35 generations. The top pictures are of 90 seconds of behaviour,
the bottom ones of 60.
When it is remembered that the DSM receives the raw echo signals from the sonars and directly drives the motors (one of which happens to be more powerful than the other), with only two internal state variables, then this performance in surprisingly good. It is not possible for the DSM directly to drive the motors from the sonar inputs (in the manner of Braitenberg's ``Vehicle 2'' [1]), because the sonar pulses are too short to provide enough torque. Additionally, such naive strategies would fail in the symmetrical situations seen at the top of Figure 5. One of the evolved wall-avoiding DSMs was analysed (see below), and was found to be going from sonar echo signals to motor pulses using only 32 bits of RAM and 3 flip-flops (excluding clock generation): highly efficient use of hardware resources, made possible by the absence of design constraints.
Figure 6 illustrates the state-transition behaviour of one of the wall avoiders. This particular individual used a clock frequency of 9Hz (about twice the sonar pulse repetition rate). Both sonar inputs were asynchronous, and both motor outputs were clocked, but the internal state variable that was clocked to become the left motor output was free-running (asynchronous), whereas that which became the right output was clocked. In the diagram, the dotted state transitions occur as soon as their input combination is present, but the solid transitions only happen when their input combinations are present at the same time as a rising clock edge. Since both motor outputs are synchronous, the state can be thought of as being sampled by the clock to become the motor outputs. This state-transition representation is misleadingly simple in appearance, because when this DSM is coupled to the input waveforms from the sonars and its environment, its dynamics are subtle, and the strategy being used is not at all obvious. It is possible to convince oneself that the diagram is consistent with the behaviour, but it would have been very difficult to predict the behaviour from the diagram, because of the rich feedback through the environment and sensorimotor systems on which this machine seems to rely. The behaviour even involves a stochastic component, arising from the probabilities of the asynchronous echo inputs being present in certain combinations at the clocking instant, and the probability of the machine being in a certain state at that same instant (remember that one of the state variables is free-running).
Figure 6: A representation of one of the wall-avoiding
DSMs. There is more to
its behaviour than is seen immediately in this state-transition diagram,
because it is not entirely a discrete-time system, and its dynamics are
tightly coupled to those of the sonars and the rest of the environment.
Even this small system is non-trivial, and performs a difficult task with minimal resources, by means of its rich dynamics and exploitation of the real hardware. Only time will tell whether the DSM architecture will be capable of more sophisticated behaviour, using more state variables. The success of this particular architecture in the long term is less important than the powerful demonstration of the principles which inspired it, given here.