Scene Perception
Oliva, A. Chapter in The New Visual Neurosciences. Eds. John S. Werner and Leo M. Chalupa


Visual scene perception is the gateway to many of our most valued behaviors, such as navigation, recognition, and reasoning about the world around us. What is a "visual scene"? What are its properties? Is scene perception different from object perception? Operationally, a visual scene can be defined as a view in which objects and surfaces are arranged in a meaningful way: for example, a kitchen, a street, or a forest. Scenes contain elements arranged in a spatial layout and can be viewed at a variety of spatial scales (e.g., the up-close view of an office desk or the view of the entire office). As a rough distinction, one generally acts upon an object, whereas one acts within a scene.

One paradoxical feature of visual scene analysis is that the complex arrangement of objects and surfaces in the world creates the impression that there is too much to see at once. How can so much visual information be processed and understood in a timely manner? Remarkably, we are able to interpret the meaning of complex, multifaceted scene images (a wedding, a birthday party, or a stadium crowd) in a fraction of a second (Potter, 1975)! This is about the same time it takes a person to identify a single object as a face, a dog, or a car (Grill-Spector & Kanwisher, 2005; Intraub, 1981; Thorpe, Fize, & Marlot, 1996). An unmistakable demonstration of the brain's prowess in visual scene understanding can be experienced (and enjoyed!) at the movies: when a few rapid scene cuts are assembled into a trailer, it seems as if we have perceived and understood much more of the story in a few seconds than we could later describe in the same amount of time. Perceiving scenes at a glance is like looking at an abstract painting of a landscape and recognizing that a "forest" is depicted before seeing the "trees" that create it (Navon, 1977).

This chapter reviews behavioral, computational, and cognitive neuroscience research on how the human visual system analyzes real-world scenes. Although we typically experience scenes in a three-dimensional physical world, most studies are conducted using two-dimensional pictures. There are likely important differences between perceiving the world directly and perceiving scenes via pictures (for a review, see Cutting, 2003), but this chapter describes principles that are likely to apply to both media.