
Word Recognition, Auditory

Language gives humans the remarkable capacity to express their thoughts through a physical medium and share them with others. To do so, we combine elements (words) whose form has been conventionalized within a particular language community. A critical step in retrieving a talker's message is therefore identifying these elements in his or her speech. This entry discusses how our knowledge of the auditory forms that words take may be represented in memory, and how listeners decide, based on the auditory stimulus, which words they heard, out of all possible word combinations the talker may have spoken.

What Does Our Knowledge of Words Look Like?

When we listen to someone talk, words seem to pop out of his or her speech effortlessly. This impression is misleading, however. Words are not neatly segregated from one another in speech as they are in print. How many words the utterance contains, and where they begin and end in the speech stream, are properties that the listener must establish. Moreover, the way spoken words sound varies considerably across contexts—for example, when produced by a man or a woman, in the clear speech used in lecture halls, or in the casual speech characteristic of informal conversation. Our knowledge of the form of words must accommodate this variability. Two approaches to this issue can be contrasted.

First, listeners may represent the form of a word as a compilation of the memory traces that correspond to all past exposures to the word. Each instance retains the acoustic properties resulting from the context in which the word was uttered. Such a representation is sometimes described as a cluster of observations in a multidimensional space. A more compact representation may also be postulated, such as one that stores only the central tendency derived from past instances of a word (its prototype). Both views assume ever-changing word representations because new instances of words are constantly added to the cluster or to the set of instances that contribute to the central tendency.
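
The contrast between the two instance-based views can be made concrete with a small sketch. It is only an illustration under strong simplifying assumptions, not a model proposed in the literature: each word token is reduced to a point in a hypothetical two-dimensional acoustic space, the words and numeric values are invented, and the function names (`prototype`, `recognize_by_exemplars`, `recognize_by_prototype`, `store`) are illustrative choices.

```python
from math import dist
from statistics import fmean

# Hypothetical two-dimensional acoustic measurements (arbitrary units)
# for past tokens of two words. All values are invented for illustration.
EXEMPLARS = {
    "bead": [(2.0, 8.0), (2.5, 8.5), (1.5, 7.5)],
    "bad": [(7.0, 3.0), (6.5, 3.5), (7.5, 2.5)],
}

def prototype(tokens):
    """Central tendency of a cluster: the mean along each dimension."""
    return tuple(fmean(dim) for dim in zip(*tokens))

def recognize_by_prototype(token, lexicon):
    """Pick the word whose prototype lies closest to the incoming token."""
    return min(lexicon, key=lambda w: dist(token, prototype(lexicon[w])))

def recognize_by_exemplars(token, lexicon):
    """Pick the word that owns the single closest stored trace."""
    return min(lexicon, key=lambda w: min(dist(token, t) for t in lexicon[w]))

def store(token, word, lexicon):
    """Add a new trace; the cluster, and hence its prototype, shifts."""
    lexicon.setdefault(word, []).append(token)
```

In this sketch, calling `store` after every recognized token is what makes the representation ever-changing: the exemplar cluster grows, and the prototype computed from it moves toward the new instance.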

These exemplar and prototype views are rooted in cognitive theories of categorization. They contrast with a second, linguistically grounded approach in which words are represented by the features that distinguish them from other words. Other acoustic properties of a spoken word, such as the voice quality of the talker who utters it, are considered irrelevant to this distinction and consequently not part of the representation of the word's form. This approach assumes abstract, context-independent, and immutable representations. Normalization algorithms transform information extracted from the speech either to neutralize the influence of contextual variability, in effect treating it as noise, or to model the variation and factor out its influence.
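
As a rough illustration of the abstractionist approach, the sketch below pairs fixed word templates with a normalization step. It assumes one simple form of normalization, rescaling each acoustic dimension relative to a talker's own mean and spread; the entry does not specify any particular algorithm, and the templates, values, and function names are hypothetical.

```python
from math import dist
from statistics import fmean, pstdev

# Hypothetical fixed, talker-independent templates in normalized units.
# Unlike exemplar clusters, these never change with new input.
TEMPLATES = {"bead": (-1.0, 1.0), "bad": (1.0, -1.0)}

def talker_stats(tokens):
    """Per-dimension mean and spread estimated from one talker's speech."""
    dims = list(zip(*tokens))
    return [fmean(d) for d in dims], [pstdev(d) for d in dims]

def normalize(token, stats):
    """Express a token relative to the talker's own range.

    Assumes every dimension has nonzero spread for this talker.
    """
    means, spreads = stats
    return tuple((x - m) / s for x, m, s in zip(token, means, spreads))

def recognize(token, stats, templates=TEMPLATES):
    """Match the normalized token against the fixed templates."""
    z = normalize(token, stats)
    return min(templates, key=lambda w: dist(z, templates[w]))
```

Under these assumptions, two talkers whose raw measurements differ can map onto the same template once each token is expressed relative to that talker's statistics. Estimating `talker_stats` from more speech by a given talker is one way normalization could come to reflect experience with that talker.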

Distinguishing between the two approaches has proven difficult. For instance, some have taken the fact that people recognize words uttered by familiar talkers more readily than the same words from unfamiliar talkers as evidence supporting the instance-based approach because it demonstrates that nondistinctive properties of spoken stimuli are maintained in memory and contribute to recognition. However, the finding is also compatible with the abstractionist approach if one assumes that the normalization algorithms can be optimized to reflect past experience with a given talker.

...
