Vision is the task of understanding the world through our eyes. It is probably the most difficult thing that we do with our brains, yet we do it every waking moment, and it is virtually effortless. Just open your eyes and the universe is there, in all the richness of its shapes and colours, its brightness, distance and movement. But the analysis that underlies seeing involves about one third of the entire human cerebral cortex — more than a billion nerve cells. That is one indication of the magnitude of the task of vision.
Using their eyes, most people can thread a needle, recognize thousands of faces, read a newspaper, drive a car, see an orange as orange whatever the colour of the illuminating light. Some people can fly a jet plane at three times the speed of sound, return a tennis ball served at 200 km an hour, distinguish a thrush from a female blackbird at 100 m, or an early Cubist still life by Picasso from one by Braque. Each of these is an accomplishment of staggering complexity. Even the most sophisticated of computer vision systems, which interpret signals from cameras mounted on robots, seem like idiots compared with the genius of normal human vision. This is another indication of the scale of the task of vision.
Vision involves the detection of light — electromagnetic, non-ionizing radiation, ranging in wavelength from about 400 to about 750 nanometres. The main natural source of light is stars, especially our own sun. Full sunlight appears white, but light consisting of a limited range of wavelengths appears coloured. Short wavelengths look blue, long wavelengths red. Most of the light that enters our eyes does not come directly from the sun but is reflected from the surfaces of objects. Most surfaces (except mirrors and pure white objects) absorb part of the spectrum of light, changing the wavelength composition of the reflected light, thus making the surfaces appear coloured.
Vision has humble origins. In its very simplest form, it probably appeared near the start of life on Earth, with single-celled organisms that produced photopigments — molecules that change shape when they absorb light, and trigger chemical reactions in the cell. The mere detection of light can be useful to organisms, enabling them to regulate their activity according to the time of day or the seasons of the year, and even allowing them to orientate themselves towards or away from the source of light. Eyes — organs for collecting light — exploit the fact that light travels in straight lines. They use a lens, a mirror, or even just a pinhole, to cast an image on to receptor cells containing photopigment (photoreceptors). The crucial feature of an image is that it contains information about individual objects in the scene and their relative positions, thus affording the animal an opportunity to recognize and respond to those objects, as long as it has the apparatus in its head to analyze the information. The other huge value of vision is that it works at a distance, and hence serves to predict the future:
For I dipt into the future, far as human eye could see,
Saw the Vision of the world, and all the wonder that would be.
Alfred, Lord Tennyson, Locksley Hall
All vertebrate eyes are built to a common plan. Rather like cameras, they have a lens system that forms an inverted image on a layer of photoreceptor cells at the back of the retina, which lines the eyeball. In front of the receptors are alternating layers of nerve fibres and cells, forming a complex network through which signals from the receptors are passed. Each photoreceptor absorbs light over a particular band of wavelengths, and between them the receptors provide a pattern of activity that can be used to retrieve the brightness and colour of light. Essentially, the photoreceptors pixellate the information in the image, reducing it to a point-by-point description of intensity and wavelength, rather like that on a computer screen. The grain of photographic emulsion in camera film does much the same. But cameras do not see. Vision depends on the interpretation of the patterns of activity from the photoreceptors, across space and time.
Part of the process of interpretation occurs within the retina itself. The essential function of all vertebrate retinas is to reduce the overwhelming flood of information that pours into the eye. In the human eye there are about 120 million rod photoreceptors, which work only in dim light, and 6 million cones, which respond under brighter conditions and are of three types, sensitive to light in the blue, green or red part of the spectrum. Each photoreceptor produces a signal, dependent on the intensity and wavelength composition of the light that it catches. In computer terminology, this translates into many megabytes of information every second.
Evolution, ever the master of tricks and short-cuts to efficiency, has discovered ways of removing unneeded information during processing in the retina, so that only the essential skeleton of the message is transmitted to the brain.
First, the overall number of ‘pixels’ is dramatically reduced. The signals are passed from the photoreceptors, through several connections, to the last retinal cells in the chain, the ganglion cells, which cover the inner surface of the retina and whose axons stream out through a hole in the eyeball to form the optic nerve. Each ganglion cell in the fovea (the central part of the retina, which we point towards objects when we look directly at them) receives its main input from just one cone photoreceptor, perfectly conserving the fine-grain detail of that part of the image. But, compared with the roughly 125 million photoreceptors, there are a mere 1.5 million or so ganglion cells, so those in the peripheral parts of the retina must pool signals from very large numbers of receptors. In effect, the output of the retina is like very coarse-grain film for the peripheral parts, and very fine-grain just in the middle. The constant jerky movements of the eye, which occur about three times every second, deliver one part of the image after another to the high-resolution fovea.
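The convergence described above can be made concrete with a little arithmetic and a toy model. The Python sketch below divides the round figures quoted in the text to get the average number of receptors per ganglion cell, then pools a hypothetical row of receptor signals either one-to-one (as in the fovea) or in blocks of 20 (standing in for the periphery). The function names, the block size and the grating stimulus are illustrative assumptions, not anatomical measurements; real peripheral pooling changes gradually with distance from the fovea.

```python
def average_convergence(n_receptors: float, n_ganglion: float) -> float:
    """Mean number of photoreceptors feeding each ganglion cell."""
    return n_receptors / n_ganglion


def pool(signals: list, block: int) -> list:
    """Average consecutive groups of `block` receptor signals into one output.

    block=1 mimics the one-cone-per-ganglion-cell wiring of the fovea;
    larger blocks mimic peripheral pooling (coarser 'grain').
    """
    outputs = []
    for start in range(0, len(signals), block):
        group = signals[start:start + block]
        outputs.append(sum(group) / len(group))
    return outputs


# Round numbers from the text: ~125 million receptors, ~1.5 million ganglion cells.
ratio = average_convergence(125e6, 1.5e6)
print(f"about {ratio:.0f} receptors per ganglion cell on average")  # about 83 ...

# Hypothetical row of 40 receptors viewing a fine grating (alternating dark/bright).
row = [i % 2 for i in range(40)]
fovea_like = pool(row, 1)       # every alternation survives
periphery_like = pool(row, 20)  # the grating averages out to uniform grey
print(len(fovea_like), len(periphery_like), periphery_like)  # 40 2 [0.5, 0.5]
```

With one-to-one wiring the fine grating survives intact; with 20-to-1 pooling it averages away to a uniform grey, which is the sense in which the periphery behaves like coarse-grain film.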
The second function of the retina is to ‘filter’ the image in space and in time, through procedures somewhat similar to those used to ‘compress’ the information of an entire movie on to a DVD. Everyone is familiar with a phenomenon called dark adaptation: if you go from a bright environment into a dark room it is initially very hard to see anything, but vision gradually improves, over the course of fully half an hour. In other words, the eye changes its sensitivity over time to suit the average brightness of the scene — rather like having camera film that can constantly change its speed to match the light conditions. On a shorter time-scale, the eye transmits signals only when the image has just changed, for example, after an eye movement. Indeed, if the image is held absolutely stationary on a person's retina (by means of optical or electronic techniques), perception fades out completely within a few seconds.
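One simple way to picture ‘signalling only when the image has just changed’ is a cell that reports the difference between its current input and a slowly adapting baseline. The sketch below is a toy first-order adaptation model, assumed for illustration rather than taken from retinal physiology; `adapting_response` and its `rate` parameter are hypothetical.

```python
def adapting_response(intensities: list, rate: float = 0.2) -> list:
    """Toy retinal 'cell': output = current input minus a baseline that creeps
    towards the input (a running average of past input).

    rate is a hypothetical adaptation speed per time step (0 < rate <= 1).
    """
    baseline = intensities[0]  # assume the cell starts adapted to the first input
    outputs = []
    for x in intensities:
        outputs.append(x - baseline)       # signal only the change
        baseline += rate * (x - baseline)  # baseline adapts towards the input
    return outputs


# Light steps from 1 to 5 at time step 3 and is then held perfectly steady.
stimulus = [1, 1, 1, 5, 5, 5, 5, 5, 5, 5]
response = adapting_response(stimulus)
print([round(r, 2) for r in response])
```

After the step the output decays geometrically towards zero (here as 4 × 0.8^k), so a stimulus held perfectly still eventually produces no signal, a cartoon of the fading of stabilized images. The moving baseline also plays the part of the self-adjusting ‘film speed’, although real dark adaptation unfolds over many minutes rather than a few time steps.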
Our detailed knowledge of the visual system has come largely from the study of animals, and especially from the use of tiny microelectrodes to record impulses from individual nerve cells or fibres. Retinal ganglion cells have been much studied in this way in totally anaesthetized animals (in which the retina, indeed much of the visual pathway, continues, surprisingly, to respond to visual stimulation). Each ganglion cell responds to changes of light intensity over a limited area of the retina, called the cell's receptive field, corresponding to the group of photoreceptors that influence the cell, via the network of connections in the retina. Roughly half the retinal ganglion cells respond with a burst of impulses when the centre of the receptive field is illuminated (ON cells). The other half respond to a decrease in illumination (OFF cells). Thus the output of the retina signals the relative brightness and darkness of each point or patch in the visual field.
Horace Barlow (in the frog) and Stephen Kuffler (in the cat) discovered that ganglion cells also ‘filter’ the image in space (as well as in time), to achieve further information-compression. Essentially, the signals from each group of photoreceptors that feed the central part of the receptive field are inhibited by signals from surrounding photoreceptors, a process called lateral inhibition. This means that each ganglion cell signals the difference of illumination, or contrast, between the central and the surrounding part of its receptive field. Any cell whose receptive field happens to view a part of the image with uniform brightness (e.g. the sky on a cloudless day) will be fairly inactive, while those whose receptive fields lie at the boundary of a change of intensity in the image will send strong signals to the brain.
It is almost as if the retina reduces the image to a line drawing of the visual scene. Perhaps this accounts for the fact that simple outlines are so powerful in their ability to evoke rich perception: just think of how much can be seen in a line drawing or etching by Rembrandt or Matisse.
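Lateral inhibition and the ON/OFF split can be illustrated in one dimension. In the sketch below, a deliberately simplified model with hypothetical names, each ‘ganglion cell’ takes its centre receptor minus the mean of its neighbours, and the signed result is divided into an ON channel (centre brighter than surround) and an OFF channel (centre darker), since a firing rate cannot be negative. Real receptive fields are two-dimensional and are often modelled with a smooth difference-of-Gaussians weighting rather than this flat surround.

```python
def centre_surround(image: list, i: int, surround_radius: int = 2) -> float:
    """Toy receptive field at position i of a 1-D image: the centre receptor
    minus the mean of its neighbours (lateral inhibition)."""
    left = max(0, i - surround_radius)
    right = min(len(image), i + surround_radius + 1)
    surround = [image[j] for j in range(left, right) if j != i]
    return image[i] - sum(surround) / len(surround)


def on_off(image: list, i: int) -> tuple:
    """Split the signed contrast into (ON, OFF) channels, both non-negative."""
    c = centre_surround(image, i)
    return max(0.0, c), max(0.0, -c)


# A uniform dark region (1) meeting a uniform bright region (9).
scene = [1] * 6 + [9] * 6
for i, value in enumerate(scene):
    on, off = on_off(scene, i)
    print(i, value, round(on, 2), round(off, 2))
```

Cells viewing either uniform region report nothing; only the cells near the boundary respond, OFF cells on the dark side and ON cells on the bright side, which is the ‘line drawing’ described above.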

Fig. 1 The power of outline to evoke visual perception. A bison drawn between 10 000 and 15 000 years ago on a cave wall in France
In the retina of Old World monkeys (e.g. rhesus monkeys), assumed to be very similar to the human retina, the ON and OFF classes of ganglion cell can be further subdivided into two main groups, called P cells and M cells (read on to discover the origin of these terms). P ganglion cells receive input to the central part of their receptive fields from one, or sometimes two (but not all three), types of colour-selective cone photoreceptor, and thus are colour-selective in their responses. M cells, which generally have larger receptive fields, receive input from all cone classes: they are not colour-selective but are exquisitely sensitive to contrast and hence to movement of images on the retina. To some extent, this division of function between P and M cells is maintained through the visual pathway, and into the domain of visual perception.
The real business of vision is in the brain. Each optic nerve (the second cranial nerve) passes through a hole at the back of the bony orbit (the cavity in the skull that contains the eyeball), and the two nerves meet to form a distinctive cross-shaped structure, the optic chiasma, directly underneath the hypothalamus. (Actually, a small number of fibres branch off at this point to provide information about ambient light level to nerve cells of the suprachiasmatic nucleus, the heart of the body clock mechanism in the brain.)
In the optic chiasma, roughly half the nerve fibres cross over to the opposite side, and the rest continue on to the same side. It was Isaac Newton who first described this anatomical curiosity, and recognized its functional importance:
Are not the Species of Objects seen with both Eyes united where the optick Nerves meet before they come into the Brain, the Fibres on the right side of both Nerves uniting there, and after union going thence into the Brain in the Nerve which is on the right side of the Head, and the Fibres on the left side going into the Brain in the Nerve which is on the left side of the Head. (Opticks, Book 3, Part 1, 4th edition, 1730)
Thus, the arms of the optic chiasma that point towards the brain, called the optic tracts, contain a mixture of fibres from geometrically corresponding halves of the two retinas, which, because of optical inversion of the image, view the opposite half of the visual world. Essentially this arrangement splits the representation of the visual field neatly into two. The right side of the field is viewed by the left cerebral hemisphere, the left side by the right. This fits with a general rule, that the left hemisphere is concerned with everything to the right of the body — the skin of the right side, control of the muscles of the right side, even sounds coming from the right — while the right hemisphere is devoted to the left side of the body.
This means that damage to the visual pathway on one side of the brain causes blindness or partial blindness in both eyes, on the opposite side of the visual field. Interruption of one optic tract causes total blindness in the opposite half of the visual field — hemianopia. Nothing at all is visible to one side of a precise vertical line through the middle of whatever the patient is looking at. Remarkably, patients with this condition are sometimes unaware that they are half-blind: they complain of not being able to read normally, or not being able to drive as well as they used to! This points up a sensible but surprising property of vision — that it is concerned with what we can see, and not with what we cannot see. Think of how indifferent we are to the fact that we cannot see behind our heads. Equally, we are normally unaware that most of the visual field (except that part falling on the fovea) is represented in the brain with very poor detail and colour.
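The wiring rule can be traced step by step. The sketch below assumes the simplified geometry in Newton's account: each retina is split exactly at the vertical midline, the image is inverted, nasal fibres cross at the chiasma and temporal fibres do not. The function names are hypothetical. Cutting one optic tract then removes the same half of the visual field from both eyes.

```python
def hemiretina(eye: str, field_side: str) -> str:
    """Half of `eye`'s retina ('nasal' or 'temporal') that receives light from
    the `field_side` ('left' or 'right') half of the visual field."""
    # The optics invert the image, so the left field falls on the right half
    # of each retina and vice versa.
    retina_side = "right" if field_side == "left" else "left"
    # The nose lies to the right of the left eye and to the left of the right eye.
    nasal_side = "right" if eye == "left" else "left"
    return "nasal" if retina_side == nasal_side else "temporal"


def optic_tract(eye: str, field_side: str) -> str:
    """Optic tract (and hence hemisphere) carrying that signal: nasal fibres
    cross at the chiasma, temporal fibres stay on their own side."""
    if hemiretina(eye, field_side) == "temporal":
        return eye
    return "right" if eye == "left" else "left"


def lost_after_cutting(tract: str) -> list:
    """Every (eye, field side) combination silenced by cutting one optic tract."""
    return [(eye, side)
            for eye in ("left", "right")
            for side in ("left", "right")
            if optic_tract(eye, side) == tract]


# Each half of the visual field ends up in the opposite hemisphere, from both eyes.
for side in ("left", "right"):
    print(side, "field ->", {optic_tract(eye, side) for eye in ("left", "right")})

print(lost_after_cutting("left"))  # [('left', 'right'), ('right', 'right')]
```

Both silenced combinations lie in the right half of the field, one from each eye: a left-sided lesion of the tract produces a right hemianopia.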

Fig. 2 The base of the human brain (drawn by Christopher Wren) from Cerebri anatome (1664) by Thomas Willis showing the optic nerves (E) meeting in the optic chiasm and the optic tracts continuing into the hemispheres of the brain. Wellcome Institute Library, London
In Dickens' Pickwick Papers, Sam Weller says: Yes I have a pair of eyes … and that's just it. If they was a pair o' patent double million magnifyin' gas microscopes of hextra power, p'raps I might be able to see through a flight o' stairs and a deal door; but bein' only eyes, you see, my wision's limited.
Indeed, our ‘wision’ is limited — by the resolution of the optics of our eyes and the structure of the retina, by the range of wavelengths to which our photoreceptors are sensitive, and by the capacity of our brains to fathom, from the mere shadows that flit across the retina, what is there in the outside world. But mercifully we are normally blissfully unaware of those limitations of sight.
This leads to a more general conclusion. Visual experiences are externalized, i.e. they happen outside the body, not inside the head. The visual properties of objects appear to belong to them, not to be the products of the brain. We are hardly even aware of our eye movements, which cause the image to jerk and slew continuously across the retina. The task of vision is to inform about the outside world, not about the nature of vision.
The nerve fibres in the optic tract (the axons of retinal ganglion cells) terminate in two main areas of the brain. A minority project to a structure called the superior colliculus (the upper little hill), which can be seen as a bump, one on each side, on the roof of the midbrain, as well as to nearby tiny clusters of nerve cells (in the pretectum). This general region, the mammalian vestige of the principal visual centre in amphibia, reptiles, birds and fish, is concerned mainly with visual reflexes. It contains regions that regulate the size of the pupil of the eye in bright and dim conditions, and that make the eye involuntarily follow large moving objects. The main function of the two superior colliculi is to control the automatic tendency of the eyes, the head and the body, to turn towards objects of interest — so-called orienting responses. They are, in fact, centres for sensory integration, since they receive input from the ears and the skin as well as the eyes, all helping to guide such reactions.
The bulk of the fibres of the optic nerve reach the lateral geniculate (meaning knee-shaped) nucleus (the LGN) in the thalamus (an egg-shaped mass of grey matter through which virtually all information passes on its way to the cerebral cortex). In monkeys, the LGN has six layers. The information from the two eyes remains separate, each eye sending its fibres to three of the layers. The lower two layers are called magnocellular, because the nerve cells in them are relatively large. The neurons of the magnocellular layers receive input from the fibres of the M class of ganglion cells (that is why they are called M cells), and hence they are also sensitive to contrast and motion, but not colour. The upper four, parvocellular layers (two for each eye) contain relatively small nerve cells, and receive input (one-to-one connections in some cases) from the axons of P ganglion cells. Hence the parvocellular layers transmit information about colour and fine detail.
The fibres of the roughly 1.5 million cells in the LGN fan backwards and upwards in a bundle of white matter called the optic radiation, which passes to the back of the hemisphere to reach the region of cerebral cortex, called the primary visual cortex (or striate cortex, or area 17, or V1). During the First World War, the British neurologist Gordon Holmes examined the visual deficits of soldiers who had suffered shrapnel injuries to this region. If a tiny fragment had entered the back of the skull on one side, there was a corresponding blind patch, a scotoma, in the opposite side of the visual field. This implies that there is a kind of ‘map’ of the retinal image across the surface of the primary visual cortex. Indeed, each nerve cell in the grey matter receives input, directly or via the network of connections in the cortex, from a limited group of cells in the LGN. Thus each cortical cell also has its own receptive field — a patch of retina, and hence visual field, through which it responds to appropriate visual stimuli.
Nerve cells in the middle layers of the cortex, where the incoming fibres mainly terminate, respond to brightening or darkening of a particular spot in the visual field, very much like neurons in the LGN. Indeed there are separate sub-layers receiving input from P-type and M-type cells. Input from the two eyes is still kept separate at this point, with axons from the right- and left-eye layers of the LGN terminating in a remarkable alternating pattern. Each eye's input occupies regions that form curving, branching ocular dominance stripes, each about 0.3 mm wide, running across the middle layers of the cortex. Alternate stripes are dominated by the right eye, then the left, forming a pattern similar to a fingerprint impressed on the visual cortex. Neighbouring stripes have input from roughly the same point in the visual field, seen through the two eyes.

Fig. 3 The visual pathway, depicted by the great Spanish neuroanatomist Ramón y Cajal (1899). Optic nerve fibres from the nasal half of each retina (the half closer to the nose) cross over in the optic chiasma. So, the optic tract contains fibres from corresponding half-retinas of the opposite eye (c) and the eye on the same side (d). The fibres contact nerve cells in the lateral geniculate nucleus (g), whose fibres run up to the primary visual cortex at the back of the hemispheres. The right visual cortex (Rv) views the left side of the visual field, the left views the right side
Extraordinary things happen as the information is passed up and down within the grey matter, to the many other neurons in the cortex. David Hubel and Torsten Wiesel won the Nobel Prize in 1981 for their pioneering work on the physiology of the visual cortex. They discovered, first in cats and later in monkeys, that these neurons respond not just to light or dark spots, like the neurons that drive them, but selectively to lines or edges, falling on, or moving over, the receptive field. Each cell prefers a line stimulus, at a particular orientation, and the preferred orientation varies from cell to cell. Somehow, the property of orientation selectivity is created by the combination of all the nerve fibres that converge on each cell. These orientation-selective cells are arranged into a beautiful system of columns, presumably created by the fact that most connections within the cerebral cortex run up and down radially within the grey matter. The selective neurons within each column, perhaps 0.1 mm across, running from the surface down to the white matter, all prefer the same orientation. And the preferred orientation shifts progressively from column to column, across the cortex.
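How convergence might create orientation selectivity can be sketched with one classic hypothesis, associated with Hubel and Wiesel, that a cortical cell sums inputs from LGN cells whose receptive fields lie along a line. In the toy model below each grid point stands for one such input; the cell is excited by inputs on a line at its preferred orientation and inhibited by the rest, balanced so that uniform light has no effect. The grid size, bar width, weights and names are illustrative assumptions, and the model is far simpler than real cortical circuitry.

```python
import math

SIZE = 9  # the receptive field is a SIZE x SIZE grid of inputs (illustrative)
CENTRE = SIZE // 2


def bar(angle_deg: float, half_width: float = 0.6) -> list:
    """Image of a thin bright bar (1.0) on a dark ground (0.0), passing through
    the grid centre at the given orientation."""
    t = math.radians(angle_deg)
    dx, dy = math.cos(t), math.sin(t)
    image = [[0.0] * SIZE for _ in range(SIZE)]
    for y in range(SIZE):
        for x in range(SIZE):
            # Perpendicular distance of this input from the bar's axis.
            if abs((x - CENTRE) * dy - (y - CENTRE) * dx) <= half_width:
                image[y][x] = 1.0
    return image


def simple_cell(preferred_deg: float) -> list:
    """Weights: excitatory inputs lined up at the preferred orientation,
    inhibitory inputs elsewhere, balanced so that uniform light gives zero."""
    line = bar(preferred_deg)
    n_on = sum(map(sum, line))
    n_off = SIZE * SIZE - n_on
    return [[1 / n_on if v else -1 / n_off for v in row] for row in line]


def response(weights: list, image: list) -> float:
    """Weighted sum of the inputs, rectified because firing rates are never negative."""
    drive = sum(w * p
                for w_row, p_row in zip(weights, image)
                for w, p in zip(w_row, p_row))
    return max(0.0, drive)


cell = simple_cell(0)  # prefers horizontal bars
for angle in (0, 30, 60, 90):
    print(angle, round(response(cell, bar(angle)), 3))
```

The response is largest for a bar at the preferred orientation, smaller 30 degrees away and essentially zero at 60 and 90 degrees: a crude orientation tuning curve.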

Fig. 4 Vision depends on inference. The bright, white triangle in this illusory figure of Kanizsa is ‘invented’ by the brain on the basis of the evidence from the other features in the image
Orientation-selective neurons remain perhaps the best example of feature detection — the notion that sensory neurons are ‘programmed’ (partly through innate control of the ‘wiring’ of the pathway, partly through the effects of sensory experience early in life) to respond to particular information-rich features of the sensory world. The primary visual cortex starts the process of ‘dissecting’ the retinal image, so as to encode its essential structure. In normal conditions, these cells respond to the boundaries of objects in space, or to elements of the texture of surfaces, presumably describing these features to the rest of the brain. This is the beginning of a process that has been called inverse optics — inferring from the flat retinal image the true shapes and distribution of the objects that generated the image, ‘reversing’ the optical process that made the image.
Hubel and Wiesel also discovered that the vast majority of these orientation-selective neurons are also ‘binocularly driven’: they have receptive fields in roughly corresponding positions on both retinas, and are remarkably similar in their preferences for visual stimuli, whichever eye is open. Thus, in normal viewing conditions, these cells will be stimulated simultaneously through both eyes, by the two images of individual objects in space. This presumably accounts for the fact that we see only one, fused visual world, despite the fact that two eyes are viewing it.
Because our two eyes are horizontally separated in the head, when we view a three-dimensional scene, their retinal images are not absolutely identical. Binocular parallax, as it is called, creates tiny differences in the relative positions on the two retinas of the images of individual objects that lie at different distances from the eyes. Sir Charles Wheatstone first described, in 1838, the fact that we can interpret these minute differences between the two retinal images to perceive the solidity of objects and their relative distances in space. This skill, called stereopsis or stereoscopic vision, is a wonderful example of inverse optics. The brain has evolved mechanisms for analysing not just the individual retinal images, but also the differences between them, so as to understand the world.
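The size of these differences follows from simple geometry. The sketch below computes the angle between the two lines of sight (the vergence angle) for a point straight ahead, and takes the difference between two such angles as the disparity of a second point relative to the fixated one. It assumes an interocular separation of 6.5 cm, a typical adult value chosen for illustration, and ignores points off the midline. For small angles the result is close to the approximation η ≈ I·Δd/d², where I is the separation of the eyes, d the viewing distance and Δd the depth difference.

```python
import math


def vergence_angle(distance_m: float, interocular_m: float = 0.065) -> float:
    """Angle (radians) between the two lines of sight when both eyes look at a
    point straight ahead at `distance_m`. The 6.5 cm separation is an assumed value."""
    return 2 * math.atan(interocular_m / (2 * distance_m))


def relative_disparity_arcmin(fixated_m: float, other_m: float,
                              interocular_m: float = 0.065) -> float:
    """Binocular disparity, in minutes of arc, of a second point straight ahead
    relative to the fixated point (positive if the second point is farther)."""
    delta = vergence_angle(fixated_m, interocular_m) - vergence_angle(other_m, interocular_m)
    return math.degrees(delta) * 60


def small_angle_arcmin(distance_m: float, depth_m: float,
                       interocular_m: float = 0.065) -> float:
    """The approximation eta = I * depth / distance**2, converted to arcmin."""
    return math.degrees(interocular_m * depth_m / distance_m ** 2) * 60


for d in (1.0, 10.0):
    exact = relative_disparity_arcmin(d, d + 0.01)  # 1 cm difference in depth
    approx = small_angle_arcmin(d, 0.01)
    print(f"{d:>4} m: {exact:.4f} arcmin (approximation {approx:.4f})")
```

At 1 m, two points 1 cm apart in depth differ by roughly two minutes of arc; at 10 m the same 1 cm gives about a hundredth of that, because disparity falls off with the square of the viewing distance.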
Now, it turns out that, although the two receptive fields of individual visual cortical cells, on average, lie on geometrically corresponding points in the two retinas, there is a little variation in their relative positions. This, combined with the fact that the responses of neurons are often strongly enhanced when both receptive fields are stimulated simultaneously, means that individual cells respond best to the boundaries of objects at particular distances, behind or in front of whatever the eyes are fixating. Thus, the processing that underlies stereopsis appears to start with the binocular neurons of the primary visual cortex.
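The idea that a small offset between a cell's two receptive fields makes it prefer a particular disparity can be sketched as follows. Each hypothetical cell multiplies Gaussian responses from its left- and right-eye receptive fields, so it responds strongly only when both are stimulated together, and its right-eye field is shifted by the cell's `offset`. The units, Gaussian width, multiplicative combination and the arbitrary sign convention for near versus far are all assumptions for illustration, not a description of measured cortical neurons.

```python
import math


def monocular(x: float, centre: float, width: float = 1.0) -> float:
    """Gaussian sensitivity profile of one eye's receptive field (arbitrary units)."""
    return math.exp(-((x - centre) ** 2) / (2 * width ** 2))


def binocular_cell(offset: float):
    """Toy binocular neuron: right-eye field shifted by `offset` from the left-eye
    field. The two monocular responses are multiplied, so the cell responds
    strongly only when both eyes are stimulated in register."""
    def respond(x_left: float, x_right: float) -> float:
        return monocular(x_left, 0.0) * monocular(x_right, offset)
    return respond


# A hypothetical population whose preferred disparities lie around fixation.
population = {offset: binocular_cell(offset) for offset in (-2, -1, 0, 1, 2)}


def most_active(disparity: float) -> int:
    """Preferred offset of the most active cell when an edge falls at x = 0 in the
    left eye and at x = disparity in the right eye."""
    return max(population, key=lambda off: population[off](0.0, disparity))


print(most_active(1.0), most_active(-2.0), most_active(0.2))  # 1 -2 0
```

Which cell wins signals how far the edge lies in front of or behind the fixation point, which is the sense in which these binocular cells begin the work of stereopsis.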

Fig. 5 The brain infers a three-dimensional world from the flat retinal image. But in the case of the Necker cube, it is unable to decide between two equally likely interpretations. The cube spontaneously alternates in depth, depending on which face appears closer
The existence of a visual area in the back of the cerebral hemispheres was known in the nineteenth century. But at that time, the vast continent of uncharted cortex in between the major sensory and motor regions was thought simply to combine information, in some ill-defined way. It was called association cortex. Work on monkeys, starting in the 1960s, has shown that the entire association cortex of the rear part of the hemispheres is in fact devoted exclusively to the analysis of vision. It is divided into a huge patchwork of individual areas, each containing a representation of all or part of the visual field. These are known as extrastriate visual areas, to distinguish them from the striate cortex — the primary visual cortex. Virtually all the fibres from the LGN, carrying information from the eyes, reach only the striate cortex, and these other visual areas receive their input mainly from cortico-cortical connections, forming a complex network, with fibres running back and forth linking the striate cortex to the other areas.
While damage to the primary visual cortex leads to blindness in the corresponding area of the visual field, injury in extrastriate areas generally leads to more subtle deficits in perception. It must, however, be said that people rendered clinically blind by damage to the striate cortex can nevertheless sometimes respond unconsciously to a visual stimulus, by moving their eyes towards it, particularly if it moves rapidly or is of very high contrast. Indeed, some can ‘guess’ reliably the direction of movement of the stimulus and whether a flashed line is vertical or horizontal, even though they deny actually seeing it. This curious residual visual capacity, called ‘blindsight’, may be mediated by surviving connections from the eyes to other parts of the brain, perhaps via the superior colliculus.
Broadly speaking, the extrastriate areas of the cortex form two processing ‘streams’, both originating in the striate cortex. The ‘ventral stream’, which runs downwards into the lower parts of the temporal lobe, is dominated by the P-cell system, and thus contains information about colour and fine detail, while the ‘dorsal stream’, monopolized by M-cell input, runs up into the parietal lobe, and is concerned with the analysis of movement, and the detection of the position of objects in space. The ventral and dorsal streams have been dubbed ‘what’ and ‘where’ systems, although this is an over-simplification.

Fig. 6 In the brain of a person imagining the movement of an elephant, certain (light grey) areas are active. When imagining the colour of an elephant, separate (dark grey) areas are active. The areas at the back of the brain are those that become active when one actually sees real movement or colour. The activity in the front of the brain is associated with the acts of imagining
The ventral stream does seem to be mainly concerned with the recognition of objects, and it feeds signals to parts of the brain, especially the hippocampus, thought to be responsible for conscious visual memory. Neurons in some areas within the ventral stream have remarkable properties. In an area called V4, for instance, some cells respond selectively to surfaces of a particular colour, regardless of the spectral composition of the illuminating light. This correlates with the fact that we see the colours of objects as more or less constant, whatever the illumination — a phenomenon called colour constancy. To achieve this property, these neurons must somehow take account of the wavelength composition of light reflected from surrounding surfaces, a ‘computation’ that cannot be done in the primary visual cortex. Further south, in parts of the temporal lobe, are populations of nerve cells that respond selectively to the appearance of monkey or human faces, somehow detecting the combination of features that define a face. Even deeper into the ventral stream cells can ‘learn’ to respond specifically to one stimulus out of a series of objects or abstract shapes that the animal is shown as part of a memory task. This all suggests that the ventral stream is concerned with identifying and remembering objects.
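Why information from surrounding surfaces helps can be shown with a classic normalization scheme, offered only as an analogy for whatever V4 actually computes: each cone signal from a target surface is divided by the average signal of the same cone class across its surroundings (von Kries-style scaling with a ‘grey-world’ estimate of the illuminant). The three-band spectra, reflectances, illuminants and function names below are invented for illustration.

```python
def cone_signals(reflectance, illuminant):
    """Light caught by (L, M, S) cones from one surface: reflectance times
    illuminant in each of three coarse wavebands (a caricature of the spectrum)."""
    return tuple(r * e for r, e in zip(reflectance, illuminant))


def relative_to_surround(target, surround):
    """Divide each cone signal of the target by the mean signal of the same cone
    class over the surrounding surfaces (von Kries-style, grey-world estimate)."""
    means = [sum(s[k] for s in surround) / len(surround) for k in range(3)]
    return tuple(t / m for t, m in zip(target, means))


# Invented (L, M, S) reflectances: an orange and three background surfaces.
orange = (0.9, 0.5, 0.1)
surroundings = [(0.3, 0.4, 0.5), (0.6, 0.6, 0.2), (0.2, 0.3, 0.6)]

daylight = (1.0, 1.0, 1.0)
lamplight = (1.6, 1.0, 0.4)  # reddish lamp: more long-wave, less short-wave light

for light in (daylight, lamplight):
    raw = cone_signals(orange, light)
    context = [cone_signals(s, light) for s in surroundings]
    corrected = relative_to_surround(raw, context)
    print([round(v, 3) for v in raw], [round(v, 3) for v in corrected])
```

The raw cone signals from the orange change with the lamp, but the surround-relative values come out the same under both, because the illuminant multiplies target and surround alike and cancels in the ratio.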
This work on monkeys has underpinned the recent study of visual areas in the human brain, making use of the new imaging techniques of Positron Emission Tomography (PET) and functional Magnetic Resonance Imaging (fMRI), which essentially detect the small local changes of blood flow associated with activity in neurons. There may be as many as 50 different extrastriate visual areas in humans, and those occupying the lower part of the occipital and temporal lobes also seem to be concerned with the analysis of colour, faces and the identification of objects. Damage in these regions, caused, for instance, by stroke, causes various, selective deficits in visual understanding, such as central achromatopsia, a form of colour blindness, or prosopagnosia, the inability to recognize individual faces. In extreme cases, damage of the ventral stream leads to the frightening condition of visual agnosia, in which patients simply cannot recognize familiar objects, despite all their basic visual functions being normal.
The dorsal stream in monkeys also has areas with distinctive physiological properties. One, called the middle temporal area (MT) or V5, seems deeply involved in the analysis of motion. Neurons here almost all respond to movement in a particular direction, and they probably also play a part in stereoscopic vision. Neighbouring areas are concerned with analysing the flow of patterns across the retina produced by movements of the head or the whole body through space. Even higher up, in the parietal lobe, cells respond to the positions and movements of objects in ways that imply that they are concerned with guiding hand and eye movements. Again, similar functional areas have been found in the upper parts of the human occipital lobe and the parietal lobe. Damage in these regions can produce such conditions as akinetopsia (deficiency in the perception of motion) and visual neglect (failure to attend to objects on the opposite side of visual space).
It has been argued that the dorsal stream is more concerned with unconscious visually-guided reactions, such as manipulating objects with the hands, while the ventral stream underlies the conscious perception of objects. Evidence for this view comes from the fact that some individuals with ventral stream damage, while unaware of the differences between particular objects, can nevertheless shape their hands correctly when asked to pick them up. Equally, some patients with dorsal stream damage make clumsy hand movements when they try to pick up objects that they can recognize perfectly well.
The huge adaptive value of vision has driven its explosive evolution. Its machinery dominates our brains; its impressions dominate our subjective lives. Indeed, for the sighted, it is hard to imagine life without it. Language is full of visual metaphors that bear testimony to the fact that vision is the main route to the mind: ‘I see what you mean’; ‘A person of vision’; ‘My point of view’; ‘A picture is worth a thousand words’. Moreover, vision not only underpins our understanding of the world around us but also sets the scale of beauty and ugliness. The view from a mountaintop, the skyline of New York, sunset in the south of France, Botticelli's Birth of Venus (see Venus). It is seeing that makes those things breathtaking. Vision rules our aesthetic lives.
Vision has been a favourite topic of some of the most eminent individuals in the history of science, including such physicists as Isaac Newton, James Clerk Maxwell, Thomas Young, Hermann von Helmholtz and Ernst Mach. Arguably, we know more about vision than any other high-level function of the brain. Yet much remains mysterious. How does the brain arrive at reliable interpretations of objects? How is the identity of every object we can distinguish represented in the brain? How is the subjective experience of seeing related to, and generated from, the activity of neurons? Indeed, what, if anything, does conscious experience add to the purely computational process of vision?
— Colin Blakemore
See also blindness; blindness, recovery from; colour blindness; consciousness; eye movements; eyes; illusions; sensory receptors.