Natural Scene Perception

Natural scene perception refers to the process by which an agent (such as a human being) visually takes in and interprets scenes that it typically encounters in natural modes of operation (e.g. busy streets, meadows, living rooms). This process has been modeled in several different ways that are guided by different concepts.


Debate over role of attention

One major dividing line between theories that explain natural scene perception is the role of attention. Some theories maintain that focused attention is needed, while others claim that it is not involved. Focused attention played a partial role in early models of natural scene perception, which involved two stages of visual processing. According to these models, the first stage is attention-free and registers low-level features such as brightness gradients, motion and orientation in parallel. The second stage requires focused attention: it registers high-level object descriptions, has limited capacity and operates serially. These models were empirically informed by studies demonstrating change blindness, inattentional blindness and attentional blink. Such studies show that when visual attention is engaged by a task, significant changes in the environment that are not directly pertinent to the task can escape awareness. It was generally thought that natural scene perception was similarly susceptible to change blindness, inattentional blindness and attentional blink. On this view, these phenomena occurred because engaging in a task diverts attentional resources that would otherwise be used for natural scene perception.
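The two-stage architecture described above can be illustrated with a toy sketch. It assumes that a scene is just a list of items. The class `Item`, the functions `stage_one` and `stage_two`, and the capacity of one attended item are hypothetical choices for illustration, not a model from the literature. The sketch shows how, under such a view, a change to an unattended item's identity never reaches a high-level description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Item:
    identity: str       # high-level description (available only in stage 2)
    brightness: float   # low-level features (registered in stage 1)
    orientation: float  # degrees


def stage_one(scene):
    # Attention-free stage: low-level features of every item are registered.
    # The list comprehension stands in for parallel processing.
    return [(item.brightness, item.orientation) for item in scene]


def stage_two(scene, attended, capacity=1):
    # Attention-demanding stage: identities are built serially, one attended
    # item at a time, and only up to a fixed (hypothetical) capacity.
    descriptions = []
    for index in attended[:capacity]:
        descriptions.append(scene[index].identity)
    return descriptions


before = [Item("cup", 0.8, 90.0), Item("lamp", 0.5, 0.0), Item("book", 0.3, 45.0)]
after = list(before)
after[2] = replace(before[2], identity="phone")  # change an unattended item

task_focus = [0]  # the task engages attention on item 0 only
print(len(stage_one(after)))  # 3: low-level features registered for all items
print(stage_two(before, task_focus) == stage_two(after, task_focus))  # True
```

Because only attended items ever pass through `stage_two`, the swap from "book" to "phone" leaves the high-level output unchanged. This mirrors how change blindness is explained under these early models.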


Evidence against the need for focused attention

The attention-free hypothesis soon emerged to challenge early models. Its initial basis was the finding that in visual search, basic visual features of objects immediately and automatically pop out to the searcher. Further experiments seemed to support this. Potter (as cited by Evans & Treisman, 2005) showed that high-order representations can be accessed rapidly from natural scenes presented at rates of up to 10 per second. Additionally, Thorpe, Fize & Marlot (as cited by Evans & Treisman) found that humans and primates can categorize natural images (i.e. images of animals in everyday indoor and outdoor scenes) rapidly and accurately even after brief exposures. The idea behind these studies is that exposure to each individual scene is too brief for attentional processes to occur, yet people are still able to interpret and categorize the scenes.

Weaker versions of the attention-free hypothesis target specific components of natural scene perception rather than the process as a whole. Kihara & Takeda (2012) limit their claim to the integration of spatial frequency-based information in natural scenes (a sub-process of natural scene perception), which they argue is attention-free. This claim is based on one of their studies. In it, they used attention-demanding tasks to examine participants' ability to accurately categorize images filtered to have a wide range of spatial frequencies. The logic was as follows. If integration of visual information across spatial frequencies (measured by the categorization task) is preattentive, then attention-demanding tasks should not affect performance on the categorization task. This is what they found.
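The text does not specify which filters Kihara & Takeda applied to their images. As a minimal illustration of what separating an image into spatial-frequency components means, the sketch below uses a box blur as a crude low-pass filter and takes the residual as the high-frequency part. The box filter, the `radius` parameter and the function names are assumptions, not the study's method.

```python
def box_blur(image, radius=1):
    """Crude low-pass filter: replace each pixel with the mean of its
    (2*radius+1)^2 neighbourhood; at the edges only existing pixels count."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def split_spatial_frequencies(image, radius=1):
    # Low frequencies = blurred image; high frequencies = what the blur removed.
    low = box_blur(image, radius)
    high = [[p - l for p, l in zip(row, lrow)] for row, lrow in zip(image, low)]
    return low, high


uniform = [[0.5] * 4 for _ in range(4)]
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
_, high_uniform = split_spatial_frequencies(uniform)
_, high_checker = split_spatial_frequencies(checker)
print(max(abs(v) for row in high_uniform for v in row))       # 0.0: no fine detail
print(max(abs(v) for row in high_checker for v in row) > 0.4)  # True
```

A flat image has no high-frequency content, while a fine checkerboard is mostly high-frequency. Adding the two components back together recovers the original image.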


More recent evidence reasserting the need for focused attention

A study by Cohen, Alvarez & Nakayama (2011) calls into question the validity of the evidence supporting the attention-free hypothesis. They found that participants did display inattentional blindness while performing certain kinds of multiple-object tracking (MOT) and rapid serial visual presentation (RSVP) tasks. Furthermore, participants' natural scene perception was impaired under dual-task conditions, but only when the primary task was sufficiently demanding. The authors concluded that previous studies showing no need for focused attention had not used tasks demanding enough to fully engage attention.

In the MOT task, participants viewed eight black moving discs against a changing background of randomly colored checkerboard masks. Four of the discs were singled out, and participants were instructed to track them. In the RSVP task, participants viewed a stream of letters and digits presented against a series of changing checkerboards and counted how many times a digit appeared. In both experiments, on the critical trial a natural scene suddenly replaced the second-to-last checkerboard. Immediately afterwards, participants were asked whether they had noticed anything different. They were also given six questions to determine whether they had categorized the scene. In the dual-task condition, participants performed the MOT task and a scene-classification task simultaneously. The authors varied how demanding the MOT task was by increasing or decreasing the speed of the moving discs.
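The structure of the RSVP critical trial described above can be sketched as a simple trial generator. The number of frames, the probability of a digit, the placeholder strings `"checkerboard"` and `"natural_scene"`, and the function names are all hypothetical. Cohen et al.'s actual parameters are not given here.

```python
import random
import string


def make_rsvp_trial(n_frames=12, digit_prob=0.2, critical=False, seed=None):
    """Build one hypothetical RSVP trial as a list of (symbol, background) frames.
    Backgrounds are checkerboard masks, except that on a critical trial the
    second-to-last one is replaced by a natural scene."""
    rng = random.Random(seed)
    frames = []
    for i in range(n_frames):
        if rng.random() < digit_prob:
            symbol = rng.choice(string.digits)
        else:
            symbol = rng.choice(string.ascii_uppercase)
        background = "checkerboard"
        if critical and i == n_frames - 2:
            background = "natural_scene"
        frames.append((symbol, background))
    return frames


def correct_digit_count(frames):
    # The primary (attention-demanding) task: count the digits in the stream.
    return sum(symbol.isdigit() for symbol, _ in frames)


trial = make_rsvp_trial(critical=True, seed=1)
print(trial[-2][1])  # natural_scene
print(correct_digit_count(trial))
```

Keeping the scene in the background of an otherwise ordinary frame means the participant's task stays the same on the critical trial. Only the unexpected background differs.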


Models

The following are some of the models that have been proposed to explain natural scene perception.


Evans & Treisman's hypothesis

Evans & Treisman (2005) proposed that humans rapidly detect disjunctive sets of unbound features of target categories in a parallel manner. They then use these features to discriminate between scenes that do or do not contain the target, without necessarily fully identifying it. An example of such a feature would be outstretched wings. These can be used to tell whether a bird is in a picture even before the system has identified any object as a bird. Evans & Treisman propose that natural scene perception involves a first pass through the visual processing hierarchy up to the nodes in a visual identification network, followed by optional revisiting of earlier levels for more detailed analysis. During the 'first pass' stage, the system forms a global representation of the natural scene that includes the layout of global boundaries and potential objects. During the 'revisiting' stage, focused attention is employed to select local objects of interest serially and then bind their features to their representations.

This hypothesis is consistent with the results of their study, in which participants were instructed to detect animal targets in RSVP sequences and then report their identities and locations. Although participants detected the targets in most trials, they were often subsequently unable to identify or localize them. Furthermore, when two targets were presented in quick succession, participants who were required to identify the targets displayed a significant attentional blink. By contrast, the attentional blink was mostly eliminated among participants who were only required to detect them. Evans & Treisman explain these results with the hypothesis that the attentional blink occurs because the identification stage requires attentional resources, while the detection stage does not.
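The contrast between disjunctive detection and attention-dependent identification can be sketched as follows. The feature vocabulary `ANIMAL_FEATURES`, the `SceneObject` class and the `attention_available` flag are hypothetical simplifications. In particular, the flag stands in for whatever makes attentional resources unavailable, such as an attentional blink.

```python
from dataclasses import dataclass

# Hypothetical feature vocabulary for the target category "animal".
ANIMAL_FEATURES = frozenset({"outstretched wings", "fur", "legs", "eyes"})


@dataclass(frozen=True)
class SceneObject:
    identity: str
    location: tuple
    features: frozenset


def detect(scene, category=ANIMAL_FEATURES):
    # First pass: pool features from the whole scene *without* recording which
    # object or location they came from, then test them disjunctively
    # (any one category feature is enough).
    pooled = set()
    for obj in scene:
        pooled |= obj.features
    return not pooled.isdisjoint(category)


def identify(scene, attention_available=True, category=ANIMAL_FEATURES):
    # Revisiting stage: needs attention to visit objects one at a time and
    # bind features to an identity and a location.
    if not attention_available:
        return None  # e.g. during an attentional blink
    for obj in scene:
        if not obj.features.isdisjoint(category):
            return obj.identity, obj.location
    return None


scene = [
    SceneObject("tree", (0, 0), frozenset({"branches", "leaves"})),
    SceneObject("bird", (2, 3), frozenset({"outstretched wings", "beak"})),
]
print(detect(scene))                               # True
print(identify(scene, attention_available=False))  # None
print(identify(scene))                             # ('bird', (2, 3))
```

Detection succeeds from the pooled, unbound features alone. Identity and location are only recovered when the attention-dependent stage can run.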


Ultra-rapid visual categorization

Ultra-rapid visual categorization is a model proposing an automatic feedforward mechanism that forms high-level object representations in parallel without focused attention. In this model, the mechanism cannot be sped up by training. Evidence for a feedforward mechanism comes from studies showing that many neurons are already highly selective at the beginning of a visual response. This suggests that feedback mechanisms are not required for response selectivity to increase. Furthermore, recent fMRI and ERP studies have shown that masked visual stimuli that participants do not consciously perceive can significantly modulate activity in the motor system. This suggests that such stimuli undergo fairly sophisticated visual processing. VanRullen (2006) ran simulations showing that the feedforward propagation of a single wave of spikes through high-level neurons, generated in response to a stimulus, could be enough for crude recognition and categorization within 150 ms or less.
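VanRullen's (2006) simulations are not described in detail here. The sketch below swaps in a deliberately tiny scheme. It assumes latency coding, in which stronger inputs fire first, and a rank-order readout, in which earlier spikes count more. It is meant only to show what "one feedforward wave with no feedback" means computationally. The three inputs, the weights, `decay` and `max_latency_ms` are arbitrary assumptions, and nothing here models the 150 ms figure.

```python
def spike_latencies(intensities, max_latency_ms=50.0):
    # Latency code: stronger inputs (0..1) fire earlier; each fires once.
    return [max_latency_ms * (1.0 - x) for x in intensities]


def feedforward_wave(intensities, weights, decay=0.8):
    """Propagate a single wave of spikes to category units, with no feedback.

    Each category unit sums its input weights in spike-arrival order, with the
    k-th arriving spike attenuated by decay**k (a rank-order code)."""
    latencies = spike_latencies(intensities)
    order = sorted(range(len(intensities)), key=lambda i: latencies[i])
    activations = {
        label: sum(w[i] * decay ** rank for rank, i in enumerate(order))
        for label, w in weights.items()
    }
    winner = max(activations, key=activations.get)
    return winner, activations


weights = {"animal": [1.0, 0.0, 0.5], "non-animal": [0.0, 1.0, 0.5]}
winner, _ = feedforward_wave([0.9, 0.1, 0.5], weights)
print(winner)  # animal
```

The category decision is read out after one pass from inputs to outputs. No later signal travels back to revise earlier stages.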


Neural object-file theory

Xu & Chun (2009) propose the neural object-file theory. It posits that the human visual system initially selects a fixed number of roughly four objects from a crowded scene based on their spatial information (object individuation), before encoding their details (object identification). Under this framework, object individuation is generally controlled by the inferior intraparietal sulcus (IPS), while object identification involves the superior IPS and higher-level visual areas. At the object individuation stage, object representations are coarse and contain minimal feature information. However, once these representations (object files, in the theory's terms) have been 'set up' during individuation, they can be elaborated over time during the object identification stage as additional featural and identity information is received.

The neural object-file theory deals with the issue of attention by proposing two different processing systems. One tracks the overall hierarchical structure of the visual display and is attention-free, while the other processes the current objects of attentional selection. The current hypothesis is that the parahippocampal place area (PPA) plays a role in shifting visual attention to different parts of a scene. It is also thought to incorporate information from multiple frames to form an integrated representation of the scene. The separation between object individuation and identification in the theory is supported by evidence such as Xu and Chun's fMRI study (as cited in Xu & Chun, 2009), which examined the posterior brain mechanisms supporting visual short-term memory (VSTM). Representations in the inferior IPS were fixed at roughly four objects regardless of object complexity. By contrast, representations in the superior IPS and lateral occipital complex (LOC) varied with complexity.
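The two stages of the theory can be sketched as follows, assuming objects are given as 2-D locations. The class `ObjectFile`, the function names, and the rule of picking the objects nearest to a fixation point are illustrative assumptions; the theory says only that individuation uses spatial information. The capacity of four follows the text.

```python
import math
from dataclasses import dataclass, field

CAPACITY = 4  # "roughly four objects" in the theory


@dataclass
class ObjectFile:
    location: tuple
    features: dict = field(default_factory=dict)  # coarse: starts empty


def individuate(locations, fixation=(0.0, 0.0), capacity=CAPACITY):
    # Object individuation: open object files from spatial information only.
    # Assumed selection rule: the objects nearest to fixation are selected.
    nearest = sorted(locations, key=lambda loc: math.dist(loc, fixation))
    return [ObjectFile(location=loc) for loc in nearest[:capacity]]


def identify(object_file, new_features):
    # Object identification: elaborate an existing file over time.
    object_file.features.update(new_features)
    return object_file


scene = [(1, 0), (5, 5), (0, 2), (3, 1), (-1, -1), (8, 0)]
files = individuate(scene)
print(len(files))  # 4, however many objects the scene contains
identify(files[0], {"shape": "mug", "color": "red"})
print(files[0].features["shape"])  # mug
```

Individuation depends only on locations, so the number of files stays fixed regardless of how complex the objects are. Feature detail accumulates afterwards, during identification.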


Natural scene statistics

