
Computational Approaches

Perception is the analysis of sensory input in the context of our prior perceptual experience of the world. The goal of such analysis in visual perception is to infer the identities, forms, and spatial arrangement of objects in the three-dimensional (3-D) scene based on our two-dimensional (2-D) retinal images. Computational approaches to perception seek to elucidate the theoretical principles and to model the mechanisms underlying these analyses and inferential processes.

The theoretician David Marr suggested that computational accounts of perception should provide explanations at three different levels: (1) computational theory, (2) representation and algorithms, and (3) implementation. Accounts at the computational theory level clarify the purposes or goals of the computations underlying a perceptual phenomenon and explain the logic of the proposed strategies for achieving those goals.

Next, representation and algorithm level accounts describe how information is represented and how it is transformed from the input into the desired output. The implementation level details the physical implementation of the transformation in neural circuits or in computers: notably, each implementation medium is associated with a set of unique constraints—for example, computers can manipulate numbers with high precision but in a step-by-step serial fashion, whereas individual neurons represent information in low resolution and slowly but process information in a massively parallel fashion. These three levels of computational account are coupled only loosely. There is wide choice available at each level, and the explication of each level involves issues that are rather independent of the other two. This entry describes stages and modules of perceptual computation, varieties of computational approaches, and contributions of computational approaches.

Stages and Modules of Perceptual Computation

Perceptual computation in our visual system can be roughly divided into three stages: early vision, mid-level vision, and high-level vision. Early vision involves the extraction of elementary visual features such as edges, color, and optical flow, among others, as well as the grouping of these features into useful aggregates. Mid-level vision deals with the inference of visible surfaces or Marr's so-called 2.5-D sketch. The objective here is to infer the geometrical shapes of the visible surfaces and the occlusion relationships between them in a visual scene. High-level vision concerns reasoning about objects, their identities, structures, and locations. It also concerns global scene information such as 3-D spatial layout and illumination direction. Although the three-stage division is motivated mainly by functional considerations, these computational stages correspond roughly to the different visual areas along the ventral visual pathway in the hierarchical visual cortex. Marr suggested that visual computation proceeds in a bottom-up, feedforward fashion, employing a series of loosely coupled, decomposable computational modules, each of which can be studied mathematically and computationally in isolation. Recent research, however, has begun to emphasize the functional roles of recurrent interaction among the different computational processes during perceptual inference.

Varieties of Computational Approaches

There are three major computational approaches in the study of perception: the inverse optics approach, the dynamical system/neural network approach, and the statistical inference approach. All three approaches consider perception fundamentally as a problem of inference, as proposed by Hermann von Helmholtz, for filling in the missing logical gap between the retinal images and the perceptual knowledge to be derived from them. Current research formulates perceptual inference in the Bayesian framework, which emphasizes the integration of prior knowledge in the inference process. To illustrate Bayesian inference, let us consider the following example. Suppose I saw a woman wandering around in my yard wearing a hat one evening when I got home. Normally, I would have concluded that it was my wife because she was the only woman in the house. On the other hand, Grandma had told me she would be visiting either that day or the next day. Thus, it could also be Granny. In Bayesian terms, the prior probability of the woman being my wife was p(wife) = 2/3, and that of Granny was p(granny) = 1/3. Now, from experience, I also knew Granny loved wearing hats but my wife was not fond of them. Thus, the likelihood of observing Granny in a hat was p(hat|granny) = 0.2, but that of observing my wife in a hat was much lower at p(hat|wife) = 0.05. The probability of a given interpretation after combining the likelihood of the observation with the prior probability of that interpretation, called the posterior probability (i.e., p(granny|hat), p(wife|hat)), can be obtained by Bayes' rule.
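The arithmetic of this example can be sketched in a few lines of code (not part of the original entry, just an illustration using the priors and likelihoods stated above):

```python
# Bayesian inference for the woman-in-a-hat example.
# Hypotheses: the woman is my wife or my grandmother ("granny").

priors = {"wife": 2 / 3, "granny": 1 / 3}        # p(hypothesis)
likelihoods = {"wife": 0.05, "granny": 0.2}      # p(hat | hypothesis)

# Bayes' rule: p(h | hat) = p(hat | h) * p(h) / p(hat),
# where the evidence p(hat) = sum_h p(hat | h) * p(h).
evidence = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}

print(posteriors)  # granny: 2/3, wife: 1/3
```

Although my wife was the more probable woman a priori, the hat is much more likely under the Granny hypothesis, so the posterior favors Granny (2/3) over my wife (1/3).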

...
