1. The basic findings 2. Imitation 3. Understanding 4. Language evolution
1. The basic findings
The neurophysiological findings of the Sakata group on parietal cortex and the Rizzolatti group on premotor cortex indicate that parietal area AIP (the anterior intra-parietal sulcus) and ventral premotor area F5 in the monkey form key elements in a cortical circuit which transforms visual information on intrinsic properties of objects into hand movements that allow the animal to grasp the objects appropriately. Further study revealed a class of F5 neurons that discharged not only when the monkey grasped objects in a certain way, but also when the monkey observed the experimenter make a similar action (Gallese et al. 1996). Neurons with this property are called 'mirror neurons'. We distinguish mirror neurons, which are active both when the monkey performs certain actions and when the monkey observes them performed by others, from canonical neurons, which are active when the monkey performs certain actions but not when the monkey observes actions performed by others. In summary, area F5 is endowed with an observation/execution matching system.
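The distinction between mirror and canonical neurons can be written as a small decision rule. The following Python sketch is illustrative only: the `F5Response` record and the function `classify_f5_neuron` are hypothetical names introduced here, and reducing a neuron's behaviour to two yes/no flags is a simplifying assumption, not the procedure used in the recordings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class F5Response:
    """Hypothetical summary of one F5 neuron (not a data format from the studies)."""
    fires_on_execution: bool    # discharges when the monkey performs the action
    fires_on_observation: bool  # discharges when the monkey watches another perform it


def classify_f5_neuron(response: F5Response) -> str:
    """Apply the distinction made in the text between mirror and canonical neurons."""
    if response.fires_on_execution and response.fires_on_observation:
        return "mirror"
    if response.fires_on_execution:
        return "canonical"
    # Any other pattern falls outside the two classes discussed here.
    return "other"
```

For instance, `classify_f5_neuron(F5Response(True, False))` yields `"canonical"`, since such a neuron is active during the monkey's own action but silent during observation.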
Positron emission tomography (PET) experiments were then designed to seek 'mirror systems' in humans (Rizzolatti et al. 1996). There were three conditions: subjects grasped a three-dimensional object; subjects observed the experimenter grasping the object; and (control) subjects simply observed the object. A mirror region was then defined as one that was active for grasping and observation of grasping, but not for object observation alone. Intriguingly, the only mirror region found in the part of the human brain corresponding to monkey premotor cortex was Broca's area, a major component of the human brain's language mechanisms. More about language later.
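The definition of a mirror region amounts to a conjunction over the three PET conditions. A minimal sketch in Python follows, assuming each region's response has already been reduced to a yes/no 'active' judgement per condition. That reduction is an assumption made here for illustration, and the function name and the example region labels are hypothetical, not data from the study.

```python
def is_mirror_region(active_when_grasping: bool,
                     active_when_observing_grasp: bool,
                     active_when_observing_object: bool) -> bool:
    """Return True if a region meets the mirror criterion described in the text."""
    return (active_when_grasping
            and active_when_observing_grasp
            and not active_when_observing_object)


# Hypothetical activation patterns, invented purely to exercise the rule.
# Tuple order: (grasping, observing a grasp, observing the object alone).
example_regions = {
    "region_A": (True, True, False),   # satisfies the criterion
    "region_B": (True, True, True),    # also active in the control condition
    "region_C": (True, False, False),  # active for execution only
}

mirror_regions = [name for name, flags in example_regions.items()
                  if is_mirror_region(*flags)]
print(mirror_regions)
```

The control condition matters: a region that also responds to the object alone (like the hypothetical `region_B`) is excluded, because its activity could reflect seeing the object, not the grasp.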
2. Imitation
Imitation seems a natural extension of mirror system capability — not just recognizing an observed action, but playing out the mirror system's neural code to yield a replica of that action. Surprisingly, however, monkeys seem not to have this 'playback' feature. The key point seems to be that monkeys and chimps can recognize how actions bring hand, object, and body in relation to each other, rather than being able to tease out the specific motions involved in an instance of that action.
The monkey's F5 mirror neurons will fire when the experimenter grasps a block even if it is hidden from view, so long as the monkey has recently seen that the block is there. The monkey has to know about the goal to recognize the action. By contrast, humans can recognize an action from pantomime, without needing the prompt of seeing the object toward which the hand is directed. Indeed, human imitation can be more abstract still — not only can we freely imitate hand movements, but we can use hand movements to signal actions of a very different kind, such as the flapping of wings of a flying bird. Thus, there is much interest in understanding not only what the mirror systems of monkeys and humans have in common, but also the evolutionary changes that must have occurred. There is little imitation exhibited by monkeys, while chimpanzees exhibit 'simple' imitation, a process which is long and laborious compared to the rapidity with which humans can acquire novel sequences. When the chimpanzee imitates a human behaviour, the focus is on moving objects in relation to other objects or the chimpanzee's body. By contrast, we may say that humans have 'complex' imitation: they can acquire (longer) novel sequences in a single trial if the sequences are not too long and the components are relatively familiar. Moreover, the actions need no longer be directed at specific actors or objects. Indeed, brain imaging directed at linking neural correlates of imitation to the mirror system in humans is now an active area of research.

Fig. 1. Upper row: behavioural situations. Lower rows: the firing pattern of the neuron on each of a series of consecutive trials is shown above the histogram, which sums the responses from all trials. Left: the experimenter grasps a piece of food, then moves it towards the monkey, who then grasps it. The neuron discharges during observation of the experimenter's grasp, ceases to fire when the food is given to the monkey, and discharges again when the monkey grasps it.
3. Understanding
Much attention has been drawn to the idea that the mirror system may provide the basis for one monkey to 'understand' the activity, and even the intentions, of another, since the mirror system's neural activity is similar whether the monkey is performing an action or observing a related action. However, if one probes the nature of 'understanding', this seems to be only part of the story. New findings on the monkey begin to fill this in. Rizzolatti's group extended its search beyond F5, and found mirror neurons in some other areas, including region PF of parietal cortex. In one study of PF, 61 cells were responsive when the monkey observed biological actions, and two-thirds of these were also active during the monkey's own actions. However, about a quarter of these 'PF mirror neurons' do not match observed actions to congruent executed actions. For example, a cell active for observation of downward motion of the hand when grasping an object may also be active during execution of grasping by mouth. At first this may seem counter to the notion of a mirror neuron, but for us it sets the stage for exploring the notion that understanding will in general involve more than the recognition of an action in isolation, and may also involve some notion of 'meaning', e.g. the context in which the action is appropriate and the expectations that such a behaviour evokes. This opens the door to extending the study of mirror neurons to include the recognition of context and expectations. For example, recognition of one action may be seen as a preliminary for either doing something or predicting what the observed primate will do next (e.g. bringing food to the mouth to eat). The context and expectations set the stage for action recognition, action recognition modifies the context and expectations, and so on. This will let us explore the notion that mirror neurons can act as the basis for 'understanding' if a given action can be placed in the context of its observed (in self and/or others) consequences.
4. Language evolution
Rizzolatti and Arbib (1998) argued that the homology between the monkey F5 mirror system and Broca's area provides a neurobiological 'missing link' for the long-argued hypothesis that primitive forms of communication based on manual gesture preceded speech in the evolution of language (Stokoe 2001). Their 'Mirror System Hypothesis' states that the matching of neural code for execution and observation of hand movements in the monkey is present in the common ancestor of monkey and human, and is the precursor of the crucial language property of parity, namely that an utterance usually carries similar meaning for speaker and hearer.
Developing this theme, Arbib (2002) hypothesized seven stages in the evolution of human language:
(1) grasping,
(2) a mirror system for grasping,
(3) a 'simple' imitation system,
(4) a 'complex' imitation system,
(5) a manual-based communication system,
(6) speech, and
(7) language.
Turning to stage (5), our hypothetical sequence for the evolution of manual-based communication involves:
(i) observation of pragmatic action directed towards a goal object;
(ii) pantomime in which similar actions are produced away from the goal object; and
(iii) abstract gestures divorced from their pragmatic origins (if such existed) and available as elements for the formation of compounds which can be paired with meanings in more or less arbitrary fashion.
Imitation is the generic attempt to reproduce movements performed by another, whether to master a skill or simply as part of a social interaction. By contrast, pantomime is performed with the intention of getting the observer to think of a specific action or event. It is essentially communicative in its nature. The imitator observes; the pantomimic intends to be observed. Note that there are two roles for imitation in the evolution of manual-based communication. The first extends imitation to pantomime to provide ad hoc gestures that may convey a situation to the observer. The second extends the mirror system from the grasping repertoire to mediate imitation of gestures to support the transition from ad hoc gestures to conventional signs which can reduce ambiguity and extend the semantic range.
On this view, the 'speech' area of early hominids, i.e. the area somewhat homologous to monkey F5 and human Broca's area, is not yet even a proto-speech area. Instead, it mediated orofacial and manuobrachial communication. The 'generativity' which some see as the hallmark of language is present in manual behaviour. Combinatorial properties are inherent in the manuobrachial system. This provided the evolutionary opportunity for stage (6): the manual–orofacial symbolic system 'recruits' vocalization as association of vocalization with manual gestures allowed them to assume a more open referential character. This explains why F5, rather than the primate call area, provides the evolutionary substrate for speech. Locating phonology in a speech–manual–orofacial gesture complex we see that language acquisition takes various forms: a hearing person shifts the major information load of language — but by no means all of it — into the speech domain, whereas for a deaf person the major information load is removed from speech and taken over by hand and orofacial gestures. Even blind humans accompany speech with hand movements.
Finally, we note that some authors (e.g. Chomsky 1980) have postulated that the framework for the rich variety of human languages is encoded in the genome, a sort of universal grammar. By contrast, I would argue that the six biological stages described above bring us to a human brain that is 'language ready' in that it has the following properties supporting pre-language communication:
(i) Symbolization: the ability to associate an arbitrary symbol with a class of episodes, objects, or actions. (At first, these symbols may not have been words in the modern sense, nor need they have been vocalized.)
(ii) Intentionality: extension of communication to be intended by the utterer to have a particular effect on the recipient.
(iii) Parity (the mirror property): what counts for the speaker must count for the listener.
In addition, it has other properties not specific to communication, including:
(iv) hierarchical structuring: perception and action involving components with sub-parts;
(v) temporal ordering: coding hierarchical structures 'of the mind';
(vi) the ability to recall past events or imagine future ones; and
(vii) paedomorphy and sociality: conditions for complex social learning.
This language readiness involved major evolution of the mirror system to include a rich ability for imitation. I would then claim that, once we have groups of people with language-ready brains in this sense, biology need do no more. Rich processes of cultural evolution and diffusion can, I claim, then bring us to the current range of human languages without any recourse to a genetically inscribed universal grammar. The proof or disproof of this claim is one of the most exciting challenges for our study of the mind.
Michael A. Arbib (Published 2004)
Bibliography
- Arbib, M. A. (2002). 'The mirror system, imitation, and the evolution of language'. In Nehaniv, C., and Dautenhahn, K. (eds.), Imitation in Animals and Artifacts.
- Arbib, M. A. (2004). 'From monkey-like action recognition to human language: an evolutionary framework for neurolinguistics'. Behavioral and Brain Sciences (in press).
- Chomsky, N. (1980). Rules and Representations.
- Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). 'Action recognition in the premotor cortex'. Brain, 119.
- Rizzolatti, G., and Arbib, M. A. (1998). 'Language within our grasp'. Trends in Neurosciences, 21/5.
- Rizzolatti, G., Fadiga, L., Matelli, M., Bettinardi, V., Perani, D., and Fazio, F. (1996). 'Localization of grasp representations in humans by positron emission tomography: 1. Observation versus execution'. Experimental Brain Research, 111.
- Stokoe, W. C. (2001). Language in Hand: Why Sign Came before Speech.