Speech perception, the process by which we employ cognitive, motor, and sensory processes to hear and understand speech, is a product of innate preparation (“nature”) and sensitivity to experience (“nurture”) as demonstrated in infants’ abilities to perceive speech. Studies of infants from birth have shown that they respond to speech signals in a special way, suggesting a strong innate component to language.
Other research has shown the strong effect of environment on language acquisition by demonstrating that the language an infant hears during the first year of life leads the child to begin producing a distinct set of sounds (babbling) specific to the language spoken by its parents.
Since the 1950s, great strides have been made in research on the acoustics of speech (i.e., how sound is produced by the human vocal tract). It has been demonstrated how certain physiologic gestures used during speech produce specific sounds and which speech features are sufficient for the listener to determine the phonetic identity of these sound units. Speech prosody (the pitch, rhythm, tempo, stress, and intonation of speech) also plays a critical role in infants’ ability to perceive language.
Two other distinct aspects of perception—segmentation (the ability to break the spoken language signal into the parts that make up words) and normalization (the ability to perceive words spoken by different speakers, at different rates, and in different phonetic contexts as the same)—are also essential components of speech perception demonstrated at an early age by infants.
In addition to this acoustic analysis of the incoming speech signal, two sources of information are used to understand speech: “bottom-up” and “top-down.” In bottom-up processing, we receive auditory information, convert it into a neural signal, and extract phonetic feature information.
In top-down processing, we use stored knowledge about language and the world to make sense of the speech. Perception occurs when the two sources of information interact to make only one alternative plausible to the listener, who then perceives a specific message.
To understand how bottom-up processing works in the absence of a knowledge base providing top-down information, researchers have studied infant speech perception using two techniques: high-amplitude sucking (HAS) and head-turn (HT).
In HAS, infants from 1 to 4 months of age suck on a pacifier connected to a pressure transducer, which measures the pressure changes produced by sucking responses when a speech sound is presented. Head-turn conditioning is used to test infants between 6 months and one year of age.
With this technique, a child is trained to turn his or her head when a speech sound, repeated once every second as a background stimulus, is changed to a comparison speech sound. When the head is turned during the presentation of the comparison stimulus, the child is rewarded with a visual stimulus of a toy which makes a sound.
As a result of studies using these techniques, it has been shown that infants at the earliest ages have the ability to discriminate phonetic contrasts (e.g., /bat/ versus /pat/) and prosodic changes such as intonation contours in speech.
However, to understand speech, more than the ability to discriminate between sounds is needed; speech must be perceptually organized into phonetic categories, ignoring some differences and attending to others.
To measure categorical perception, adults were asked to discriminate between a series of sounds varying in equal acoustic steps along a continuum from /ra/ to /la/. As predicted by the categorical perception phenomenon, their discrimination improved at the boundary between the two phonetic categories.
However, adult listeners could do this only for sounds in their native language. The discovery that categorical perception was language-specific suggested that it might be a learned behavior, prompting researchers to ask whether it was the result of experience with language. If so, young infants would not be expected to show it, while older infants, who had experienced language, might be expected to do so.
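The boundary effect described above can be captured in a toy model: if listeners label each stimulus along the continuum probabilistically, and can discriminate two stimuli only to the extent that they would label them differently, then discrimination peaks at the category boundary. The sketch below is illustrative only; the boundary location and slope are arbitrary numbers, not values from the studies cited.

```python
import math

def p_category_A(x, boundary=5.0, slope=1.5):
    """Logistic identification function: probability that a stimulus at
    position x on the acoustic continuum is labeled as category A."""
    return 1.0 / (1.0 + math.exp(-slope * (x - boundary)))

def predicted_discrimination(x1, x2):
    """Discrimination predicted from labeling alone: chance (0.5) plus
    half the probability the two stimuli receive different labels."""
    pa1, pa2 = p_category_A(x1), p_category_A(x2)
    p_diff_label = pa1 * (1 - pa2) + pa2 * (1 - pa1)
    return 0.5 + 0.5 * p_diff_label

# Equal physical steps along the continuum are discriminated best
# where they straddle the category boundary:
for x in range(1, 9):
    print(x, round(predicted_discrimination(x, x + 1), 3))
```

Running this shows near-chance discrimination within a category and a sharp peak at the boundary, which is the signature pattern of categorical perception.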
Using the sucking technique, researchers found that at birth, infants’ discrimination of /pa/ and /ba/ was categorical not only for sounds in their native language but also for sounds from foreign languages, as if the infants could hear all the phonetic distinctions used in all languages.
But if this “language-general” speech perception ability of infants later became “language-specific” speech perception in adults, when and by what process did this change occur? To answer this question, researchers began to study the perception of phonetic prototypes (i.e., the “best” members of a phonetic category).
Under the assumption that sound prototypes exist in speech categories, adults were asked to judge the category “goodness” of a sampling of one hundred instances of the vowel /i/ using a scale from 1 to 7. Results indicated evidence of a vowel prototype for /i/, but also showed that phonetic prototypes or “best” vowels differed for speakers of different languages.
Further perceptual testing revealed an even more striking phenomenon: sounds that were close to a prototype could not be distinguished from the prototype, even though they were physically different. The prototype appeared to perceptually assimilate nearby sounds like a magnet, attracting the other sounds in its category.
This phenomenon, dubbed the perceptual magnet effect, offered a possible explanation of why adult speakers of a given language can no longer hear certain phonetic distinctions. Japanese speakers, for example, have difficulty discriminating between /r/ and /l/: the Japanese prototype is acoustically similar to both sounds, and both are therefore assimilated by it.
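The magnet metaphor can be made concrete with a small sketch in which the perceived distance between two sounds shrinks as the pair approaches a prototype. The prototype location, the strength of the pull, and the Gaussian width below are hypothetical parameters chosen only for illustration, not measured values.

```python
import math

PROTOTYPE = 0.0  # hypothetical location of a vowel prototype in a 1-D acoustic space

def perceived_distance(a, b, pull=0.8, width=1.0):
    """Toy magnet model: the physical distance between two sounds is
    shrunk in proportion to how close the pair lies to the prototype."""
    midpoint = (a + b) / 2.0
    proximity = math.exp(-(midpoint - PROTOTYPE) ** 2 / (2 * width ** 2))
    return abs(a - b) * (1 - pull * proximity)

# The same 0.4-unit physical step is perceived as much smaller
# near the magnet than far from it:
print(perceived_distance(0.0, 0.4))  # near the prototype
print(perceived_distance(3.0, 3.4))  # far from the prototype
```

Under this sketch, a pair of sounds sitting next to the prototype becomes nearly indistinguishable, while an identical physical difference far from the prototype remains easy to hear.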
To discover whether infants are born with all the prototypes of all languages and whether language experience then eliminates those prototypes which are not reinforced, an experiment in which 6-month-old American infants listened to English was performed (Kuhl, 1991).
It confirmed the perceptual magnet effect but left the question of the role of language experience unresolved. When a study was conducted (Kuhl, Williams, Lacerda, Stevens & Lindblom, 1992) with listeners from two different languages (English and Swedish) on the same vowel prototypes, it was demonstrated that the perceptual magnet effect is strongly affected by exposure to a specific language.
The Native Language Magnet (NLM) theory grew out of the research on the development of speech perception. Simply stated, it explains how infants at birth can hear all of the phonetic distinctions used in the world’s languages.
However, during the first year of life, prior to the acquisition of word meaning and contrastive phonology, infants begin to perceive speech by forming mental representations or perceptual maps of the speech they hear in their environment.
These representations, stored in the brain, constitute the beginnings of language-specific speech perception and serve as a blueprint which guides infants’ attempts to produce speech. The native language magnet effect works to partition the infant’s perceptual space in a way that conforms to phonetic categories in the language that is heard.
Sounds in the spoken language that are close to a given magnet or prototype are perceptually pulled into the magnet and thus assimilated, not discriminated, by the listener. As the perceptual space surrounding a category prototype or magnet shrinks, it takes a very large acoustic difference for the listener to hear a change in that region.
However, a very small acoustic difference in the region of a nonprototype can be heard easily. Thus the developing magnet pulls sounds that were once discriminable toward itself, making them no longer discriminable and changing the infant’s perception of speech.
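One way to picture how magnets partition the perceptual space, as NLM proposes, is nearest-prototype categorization: every sound is assigned to the category of the closest magnet. The one-dimensional positions below for English /r/ and /l/, and for a single intermediate Japanese prototype, are hypothetical values used only to illustrate the partitioning idea.

```python
# Toy nearest-prototype partition of a 1-D perceptual space.
# Prototype positions are hypothetical, chosen only for illustration.
ENGLISH_PROTOTYPES = {"/r/": 2.0, "/l/": 6.0}
JAPANESE_PROTOTYPES = {"/r-l/": 4.0}  # a single prototype between the two sounds

def categorize(sound, prototypes):
    """Assign a sound to the category of its nearest prototype (magnet)."""
    return min(prototypes, key=lambda label: abs(prototypes[label] - sound))

# Two sounds that an English listener's prototypes keep apart fall
# into one and the same Japanese category:
for x in (2.5, 5.5):
    print(x, categorize(x, ENGLISH_PROTOTYPES), categorize(x, JAPANESE_PROTOTYPES))
```

With two magnets, the space is split and the pair stays discriminable; with one, both sounds are drawn to the same prototype, mirroring the /r/-/l/ example above.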