Systems that attempt to reconstruct some or all of the audible dimensions of an acoustic event that occurred elsewhere. A sound-reproducing system includes the functions of capturing sounds with microphones, manipulating those sounds using elaborate electronic mixing consoles and signal processors, and then storing the sounds for reproduction at later times and different places. See also Microphone.
Certain technical variables are relevant to the reliable reconstruction of audio events. These include frequency range, dynamic range, and linear and nonlinear distortions. Traditionally the audible frequency range has been considered to be 20 Hz to 20 kHz. However, lower frequencies can generate interesting tactile impressions, and it is now argued by some that higher frequencies (even 50 kHz and beyond) add other perceptual nuances. See also Hearing (human).
In humans, dynamic range is the range of sound level from the quietest audible sound to the loudest sound that can be tolerated. In devices, it is the range from background noise to unacceptable distortion. From the threshold of audibility to the onset of discomfort is about 120 decibels at middle and high frequencies. Microphones are not a limiting factor since the best have dynamic ranges in excess of 130 dB. In the recording studio, dynamics are manipulated by electronic gain compensation, so the ultimate limitation is the storage medium. For example, a 16-bit compact disk (CD) can exceed 100 dB signal-to-noise ratio, while 24-bit digital media can exceed any reasonable need for dynamic range. Background acoustical noise in studios and concert halls sets the lower limit for recordings, just as background noise in homes and cars sets a lower limit for playback. Quiet concert halls and homes are around 25 dB sound pressure level (SPL) at middle and high frequencies. Crescendos in music and movies reach approximately 105 dB. See also Acoustic noise; Compact disk; Decibel; Distortion (electronic circuits); Gain; Loudness; Signal-to-noise ratio; Sound; Sound pressure.
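The dynamic-range figures in this paragraph can be checked with a little arithmetic. The sketch below uses the common textbook approximation for an ideal quantizer driven by a full-scale sine wave, roughly 6.02 dB per bit plus 1.76 dB. That formula is an assumption brought in for illustration and is not stated in the article. Practical figures also depend on dither, noise shaping, and measurement weighting, which may explain why quoted values for 16-bit media differ somewhat from the plain formula. The function names are hypothetical.

```python
import math

def ideal_quantizer_snr_db(bits: int) -> float:
    """Textbook SNR of an ideal N-bit quantizer with a full-scale sine input.

    Equivalent to 6.02 * N + 1.76 dB (an assumed idealization).
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def acoustic_range_db(peak_spl_db: float, noise_floor_spl_db: float) -> float:
    """Span between the loudest programme peaks and the background noise."""
    return peak_spl_db - noise_floor_spl_db

for bits in (16, 24):
    print(f"{bits}-bit ideal quantizer: {ideal_quantizer_snr_db(bits):.1f} dB")

# Levels quoted in the text: 105 dB crescendos over a 25 dB SPL noise floor.
print(f"Acoustic range to be carried: {acoustic_range_db(105.0, 25.0):.0f} dB")
```

On these assumptions the acoustic span between a quiet room and a loud crescendo is about 80 dB, comfortably inside what either word length offers.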
Variations in amplitude and phase as functions of frequency are known as linear distortions, since (ideally) they do not vary with signal level. In practice they often do, and in the process generate nonlinear distortions. In terms of their importance in preserving the timbre of voices and musical instruments, the amplitude versus frequency characteristic, known commonly as the frequency response, is the dominant factor. Humans are very sensitive to variations in frequency response, the amount depending on the bandwidth of the variation. It is convenient to discuss this in terms of the quality factor (Q) of resonances, since most such variations are the result of resonances in loudspeaker transducers or enclosures. See also Distortion (electronic circuits); Q (electricity); Resonance (acoustics and mechanics); Response; Reverberation.
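As a rough illustration of how Q relates to the bandwidth of a frequency-response variation, the sketch below assumes a simple second-order resonance, for which Q equals the center frequency divided by the half-power (-3 dB) bandwidth. The function names and the example values are hypothetical and are not taken from any measurement.

```python
import math

def half_power_edges(f0_hz: float, q: float) -> tuple[float, float]:
    """Lower and upper -3 dB frequencies of a second-order resonance.

    The edges are geometrically symmetric about f0 and differ by f0 / Q.
    """
    half = 1.0 / (2.0 * q)
    root = math.sqrt(1.0 + half * half)
    return f0_hz * (root - half), f0_hz * (root + half)

def bandwidth_octaves(f0_hz: float, q: float) -> float:
    """Half-power bandwidth expressed in octaves."""
    lo, hi = half_power_edges(f0_hz, q)
    return math.log2(hi / lo)

for q in (1.0, 10.0, 50.0):
    lo, hi = half_power_edges(1000.0, q)
    print(f"Q={q:>4}: {lo:7.1f} Hz to {hi:7.1f} Hz "
          f"({bandwidth_octaves(1000.0, q):.3f} octave)")
```

A high-Q resonance therefore affects only a narrow band of frequencies, while a low-Q resonance spreads over a substantial part of the spectrum.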
Nonlinear distortions occur when a device behaves differently at different signal levels. The waveform coming out of a distorting device will be different from the one entering it. The difference in shape, translated into the frequency domain, is revealed as new spectral components. If the input waveform is a single frequency (pure tone), the additional spectral components will be seen to have a harmonic relationship to the input signal; that is, they fall at integer multiples of the input frequency. Hence they are called harmonic distortion. If two or more tones are applied to the device, nonlinearities will create harmonics of all of the tones (harmonic distortion) and, in addition, new spectral components at sums and differences of integer multiples of the tone frequencies. These additional components are called intermodulation distortion.
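The appearance of harmonic and intermodulation products can be demonstrated numerically. The sketch below passes test tones through a hypothetical memoryless nonlinearity (a polynomial with assumed coefficients, not a model of any real device) and measures the amplitude at chosen frequencies with a single-bin discrete Fourier transform. The sample rate, tone frequencies, and function names are all illustrative assumptions.

```python
import math

FS = 48_000   # sample rate in Hz (assumed)
N = 4_800     # 0.1 s of signal, so every multiple of 10 Hz completes whole cycles

def nonlinear_device(x: float, a2: float = 0.05, a3: float = 0.02) -> float:
    """Hypothetical distorting device: y = x + a2*x^2 + a3*x^3."""
    return x + a2 * x * x + a3 * x ** 3

def tone(freq_hz: float, amp: float = 1.0) -> list[float]:
    """N samples of a sine wave at freq_hz."""
    return [amp * math.sin(2 * math.pi * freq_hz * n / FS) for n in range(N)]

def amplitude_at(signal: list[float], freq_hz: float) -> float:
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)

# Single tone: new components appear at integer multiples of 1000 Hz.
single = [nonlinear_device(x) for x in tone(1000.0)]
for f in (1000.0, 2000.0, 3000.0):
    print(f"single tone, {f:6.0f} Hz: {amplitude_at(single, f):.4f}")

# Two tones: sum and difference products appear as well.
pair = [nonlinear_device(a + b) for a, b in zip(tone(1000.0, 0.5), tone(1300.0, 0.5))]
for f in (300.0, 700.0, 2300.0):   # f2 - f1, 2*f1 - f2, f1 + f2
    print(f"two tones,   {f:6.0f} Hz: {amplitude_at(pair, f):.4f}")
```

None of the listed intermodulation frequencies is a harmonic of either input tone, which is part of why such products tend to be regarded as more objectionable than simple harmonics.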
Perceptually, listeners are aided by a phenomenon called masking, in which loud sounds prevent some quieter sounds from being heard. This means that the music that causes the distortion also inhibits the ability to hear it. Rough guidelines suggest that, in music, listeners are much of the time unaware of distortions measuring whole percentages, but that occasionally distortions of small fractions of a percent can be heard. See also Masking of sound.
Loudspeakers radiate sound in all directions, so that measurements made at a single point represent only a tiny fraction of the total sound output. In rooms, most of the sound we hear from loudspeakers reaches our ears after reflections from room boundaries and furnishings, meaning that our perceptions may be more influenced by measures of reflected and total sound than by a single measurement, say, on the principal axis. See also Architectural acoustics; Sound field enhancement.
To be useful, technical measurements must allow us to anticipate how loudspeakers might sound in rooms. Consequently, it is necessary to measure sounds radiated in many different directions at points distributed on the surface of an imaginary sphere surrounding the loudspeaker. From these data, it is possible to calculate the direct sound from the loudspeaker to the listener, estimates of strong early reflected sounds from room boundaries, and estimates of later reflected or reverberant sounds. It is also possible to calculate measures of total sound output, regardless of the direction of radiation (sound power) and of the uniformity of directivity as a function of frequency. All of these measured quantities are needed in order to fully evaluate the potential for good sound in rooms. See also Directivity.
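One way such spherical data might be reduced to a sound-power estimate and a directivity figure is sketched below, for a single frequency. It assumes levels sampled on a regular grid of polar angle (measured from the principal axis) and azimuth, with each sample weighted by the area of the sphere it represents. The grid layout, the function names, and the cardioid-like test pattern are illustrative assumptions, not a description of any standardized measurement procedure.

```python
import math

def sphere_grid(n_theta: int = 18, n_phi: int = 36) -> list[tuple[float, float]]:
    """Mid-point grid of (theta, phi) directions in radians."""
    return [((i + 0.5) * math.pi / n_theta, j * 2 * math.pi / n_phi)
            for i in range(n_theta) for j in range(n_phi)]

def power_average_db(samples: list[tuple[float, float, float]]) -> float:
    """Area-weighted energy average of (theta, phi, spl_db) samples.

    On this grid each sample represents a patch of area proportional
    to sin(theta). The result is proportional to radiated sound power.
    """
    num = sum(math.sin(t) * 10 ** (spl / 10) for t, _, spl in samples)
    den = sum(math.sin(t) for t, _, _ in samples)
    return 10 * math.log10(num / den)

def directivity_index_db(on_axis_spl_db: float,
                         samples: list[tuple[float, float, float]]) -> float:
    """On-axis level relative to the power-averaged level."""
    return on_axis_spl_db - power_average_db(samples)

# Synthetic example: a cardioid-like source that is 90 dB on axis.
cardioid = [(t, p, 90.0 + 20 * math.log10((1 + math.cos(t)) / 2))
            for t, p in sphere_grid()]
print(f"Power-averaged level: {power_average_db(cardioid):.2f} dB")
print(f"Directivity index:    {directivity_index_db(90.0, cardioid):.2f} dB")
```

Repeating the calculation at each frequency shows how uniform the directivity is across the audio band.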
Binaural system configurations are sound-reproduction techniques in which a dummy head, equipped with microphones in the ear locations, captures the original performance. Listeners then audition the reproduced performance through headphones, with the left and right ears hearing, ideally, what the dummy head “heard.” The system is good, but flawed in that sounds that should be out in front (usually the most important sounds) tend to be perceived to be inside, or close to, the head. Addressing this limitation, systems have been developed that use two loudspeakers, combined with signal processing to cancel the acoustical crosstalk from the right loudspeaker to the left ear, and vice versa. The geometry of the loudspeakers and listener is fixed. In this mode of listening, sounds that should be behind are sometimes displaced forward, but the front hemisphere can be very satisfactorily reproduced. See also Binaural sound system; Earphones; Virtual acoustics.
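The idea behind crosstalk cancellation can be sketched at a single frequency. If the paths from the two loudspeakers to the two ears are described by a 2-by-2 matrix of complex transfer functions, then feeding the loudspeakers with the binaural signals multiplied by the inverse of that matrix delivers each signal to its intended ear only. The path values below (a unit direct path and an attenuated, delayed crosstalk path) are assumed for illustration, and the function names are hypothetical. A practical system would need filters covering all frequencies.

```python
import cmath
import math

Matrix = tuple[tuple[complex, complex], tuple[complex, complex]]

def invert_2x2(h: Matrix) -> Matrix:
    """Inverse of a 2x2 complex matrix; fails if it is (nearly) singular."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("acoustic path matrix is not invertible at this frequency")
    return ((d / det, -b / det), (-c / det, a / det))

def apply(m: Matrix, v: tuple[complex, complex]) -> tuple[complex, complex]:
    """Matrix-vector product for a 2x2 matrix."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

# Assumed symmetric geometry at 1 kHz: the crosstalk path is 0.7 times
# the direct path and arrives 0.25 ms later.
freq_hz, delay_s = 1000.0, 0.25e-3
cross = 0.7 * cmath.exp(-2j * math.pi * freq_hz * delay_s)
paths: Matrix = ((1 + 0j, cross), (cross, 1 + 0j))   # rows: ears, columns: loudspeakers

canceller = invert_2x2(paths)
binaural = (1 + 0j, 0j)                  # a sound intended for the left ear only
speaker_feeds = apply(canceller, binaural)
at_ears = apply(paths, speaker_feeds)
print("left ear:", abs(at_ears[0]), "right ear:", abs(at_ears[1]))
```

Because the inverse is computed for one particular set of acoustic paths, any movement of the listener or loudspeakers changes the paths and degrades the cancellation, which is consistent with the fixed geometry noted above.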
Multichannel audio systems began with stereophonic systems using two channels, because in the 1950s the technology was limited to one modulation for each groove wall of a record. Good stereo imaging is possible only for listeners on the axis of symmetry, equidistant from both loudspeakers. See also Modulation; Stereophonic sound.
Quadraphonic systems appeared in the 1970s, and added two more loudspeakers behind the listeners, mirroring the ones in front. The most common systems exhibited considerable interchannel crosstalk, or leakage. To hear the proper spatial perspective, the listener had to be positioned symmetrically front to back as well as left to right. The failure to agree on a single standard resulted in the system's demise.
For film sound applications, Dolby Surround modified the matrix technology underlying one of the quadraphonic systems, rearranging the four matrixed channels into a left, center, right, and surround configuration. In cinemas, the single, limited-bandwidth, surround channel information is sent to several loudspeakers arranged along the side walls and across the back of the audience area. For home reproduction, the surround channel is split between two loudspeakers placed above and to the sides of the listeners.
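A rough numerical sketch can show how four channels might be folded into a two-channel carrier and recovered, and why the recovered channels are not independent. The version below is a deliberately simplified matrix: it assumes gains of 1/sqrt(2) for the center and surround contributions and omits the phase shifts, band limiting, and steering logic that real encoders and decoders use. It should not be read as the actual Dolby Surround specification, and the function names are hypothetical.

```python
import math

G = 1 / math.sqrt(2)   # assumed -3 dB mixing gain

def encode(left: float, center: float, right: float, surround: float) -> tuple[float, float]:
    """Fold four channels into two (simplified: no phase shifts)."""
    lt = left + G * (center + surround)
    rt = right + G * (center - surround)
    return lt, rt

def passive_decode(lt: float, rt: float) -> tuple[float, float, float, float]:
    """Recover left, center, right, surround by sums and differences."""
    return lt, G * (lt + rt), rt, G * (lt - rt)

labels = ("L", "C", "R", "S")
for i, name in enumerate(labels):
    source = [0.0, 0.0, 0.0, 0.0]
    source[i] = 1.0                      # signal in one input channel only
    decoded = passive_decode(*encode(*source))
    row = "  ".join(f"{lab}={val:+.3f}" for lab, val in zip(labels, decoded))
    print(f"input {name}: {row}")
```

In this simplified form, a center-only signal is recovered at full level in the center output but also appears in the left and right outputs about 3 dB lower. Center and surround remain separated from each other, as do left and right.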
Even with the best active (electronically steered) matrix systems, the channels are not truly discrete. Sounds leak into unintended channels where they dilute and distort directional and spatial impressions. Digital recording systems now provide five discrete full-bandwidth channels, plus a low-frequency special effects channel for movies with truly big bass sounds. For homes, 5.1 channels refers to five satellite loudspeakers, operating with a common low-frequency subwoofer. The subwoofer channel is driven through a bass-management system that combines the low frequencies of all channels, usually crossing over at 80–100 Hz so that, with reasonable care in placement, it cannot be localized.
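The bass-management step can be sketched as a crossover followed by a sum. The example below uses a one-pole low-pass filter and its complement purely for brevity. Real systems use steeper crossover filters and apply channel level offsets, both of which are omitted here. The 80 Hz default comes from the range quoted above, while the sample rate and function names are assumptions.

```python
import math

def one_pole_lowpass(x: list[float], cutoff_hz: float, fs_hz: float) -> list[float]:
    """First-order recursive low-pass filter (a stand-in for a real crossover)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y

def bass_manage(mains: list[list[float]], lfe: list[float],
                cutoff_hz: float = 80.0, fs_hz: float = 48_000.0):
    """Split each main channel at the crossover and build the subwoofer feed.

    Returns (satellite_feeds, subwoofer_feed). Each satellite keeps what is
    left after its low-passed part is removed; the subwoofer receives the sum
    of all low-passed parts plus the low-frequency effects channel.
    """
    lows = [one_pole_lowpass(ch, cutoff_hz, fs_hz) for ch in mains]
    satellites = [[s - low for s, low in zip(ch, low_ch)]
                  for ch, low_ch in zip(mains, lows)]
    subwoofer = [sum(vals) + e for vals, e in zip(zip(*lows), lfe)]
    return satellites, subwoofer

# Example: a 40 Hz tone in all five main channels ends up mostly in the subwoofer.
fs, n = 48_000.0, 4_800
bass_tone = [math.sin(2 * math.pi * 40.0 * i / fs) for i in range(n)]
sats, sub = bass_manage([bass_tone] * 5, [0.0] * n, fs_hz=fs)
print("peak in one satellite:", round(max(abs(v) for v in sats[0][n // 2:]), 3))
print("peak in subwoofer:    ", round(max(abs(v) for v in sub[n // 2:]), 3))
```

Because each satellite feed and its low-passed part add back to the original channel, nothing is discarded: the bass is simply redirected to the loudspeaker best suited to reproduce it.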