Computing Intelligence

Saturday, November 8, 2008

Top-down Processing in Visual Perception Part I: Introduction and Some Examples

One of the subjects I have written about before is machine vision and the incredible difficulty of developing a visual processing system that can match the robustness of our own. It shouldn't be entirely surprising, though, that our visual system is as powerful as it is, since a huge proportion of our brain is devoted primarily to visual processing. One of the interesting debates in perception psychology and neuroscience is whether the brain performs bottom-up or top-down processing. As with most things (especially in psychology), neither account is entirely correct on its own, and your brain uses a combination of the two. Optical illusions and trick images are one relatively simple way to explore how our brain processes visual information, and they are also fairly fun to look at.

Bottom-up processing basically means your brain reads in the raw visual information captured by the retina and gradually figures out what it means as the signal moves farther along the processing chain that is your cerebral cortex. Top-down processing means you start with an idea of what you ought to be seeing (most likely determined by recent sensory information, cues from your other senses, and your past experience) and use that expectation to interpret what arrives. Your brain clearly does some bottom-up processing, since you react to raw changes in the visual stimuli even if there was no reason to expect that change. What is fairly surprising, though, is that top-down processing is also clearly involved in visual processing. Effectively introducing top-down processing into artificial visual systems, however, is quite difficult, and it would seem that the top-down algorithms implemented by our brains (and their handy parallel architecture) are what currently keep us so far ahead of computers.

One example of top-down processing that is fairly easy to demonstrate is the blind spot. In your retina you have a small area devoid of receptors where nerves and blood vessels enter and leave your eye. This is normally not a problem since the blind spot of each eye falls on a different area of your visual field, so the sensory perceptions of one eye can compensate for the other. Also, your eyes are almost constantly performing saccades (small jumps around to focus on different regions of the visual field). However, if you close one eye and keep your other eye locked on a specific target, your blind spot becomes anchored in place. You do not realise this, though, because your brain manages to fill in that area of your visual field with its best guess as to what is there. A quick way to demonstrate this is to take a piece of scrap paper and put two X's on it about eight centimeters apart. Then close one of your eyes and, with your open eye, stare at the mark on the same side as the closed eye (for example, if you closed your left eye, look at the left X with your right eye). Hold the paper about half an arm's length in front of you and gradually move it closer. At a certain point, the X on the periphery of your vision should disappear. When it does, it is sitting in your blind spot, and your brain fills in that area with its best guess (in this case, blank white paper).
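As a rough check on the geometry of this demonstration, here is a small Python sketch that estimates how far from your eye the paper should be when the peripheral X vanishes. It rests on assumptions that are not part of the demonstration itself: that the blind spot is centered about 15 degrees to the temporal side of fixation and is about 5 degrees wide (commonly quoted approximate figures that vary from person to person), and that the paper is held flat with the fixated X directly in front of the open eye. The function names are my own and purely illustrative.

```python
import math


def viewing_distance_cm(separation_cm, eccentricity_deg):
    """Distance from the eye at which a mark `separation_cm` away from the
    fixated mark sits at the given visual angle from the line of sight.

    Assumes the paper is perpendicular to the line of sight.
    """
    return separation_cm / math.tan(math.radians(eccentricity_deg))


def disappearance_range_cm(separation_cm, center_deg=15.0, width_deg=5.0):
    """Approximate (nearest, farthest) paper distances at which the
    peripheral mark falls inside the blind spot.

    center_deg and width_deg are ASSUMED typical values, not measurements.
    """
    nearest = viewing_distance_cm(separation_cm, center_deg + width_deg / 2)
    farthest = viewing_distance_cm(separation_cm, center_deg - width_deg / 2)
    return nearest, farthest


if __name__ == "__main__":
    separation = 8.0  # centimeters between the two X's, as in the text
    center = viewing_distance_cm(separation, 15.0)
    nearest, farthest = disappearance_range_cm(separation)
    print(f"Mark is centered in the blind spot at about {center:.1f} cm")
    print(f"It should stay hidden from about {nearest:.1f} to {farthest:.1f} cm")
```

Under these assumptions the X should vanish at roughly 30 cm, which fits with starting at about half an arm's length and moving the paper slowly closer. If it disappears at a noticeably different distance for you, that most likely reflects individual variation or the paper being tilted, not an error in the demonstration.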

Another example that occurs slightly higher up in your visual processing is the Necker Cube, shown here.

This simple drawing depicts a transparent three-dimensional cube. It is ambiguous, though, which of two possible orientations is intended: are you looking slightly down onto the cube, or slightly up at it (in other words, are the bottom two corners on the front or back face of the cube)? For most people, there is a default orientation when they first see it. However, after staring at the cube for a few moments, they can cause it to 'flip' into the other orientation. At no point, though, can both orientations be held in one's head at once (at least, I cannot manage to do that). It would seem that your brain takes the visual information provided about the cube's edges and then tries to fit an interpretation to it. Since more than one interpretation is possible, your brain alternates between them. However, whenever one particular interpretation is selected, the others are suppressed to avoid conflicting interpretations of a visual scene.
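To make the ambiguity concrete, here is a minimal Python sketch showing that two different three-dimensional cubes can produce exactly the same flat line drawing. It assumes an orthographic projection (no perspective, which is how the Necker Cube is usually drawn) and arbitrary viewing angles of my own choosing; the function names are hypothetical. The second cube is the first one with its depth values reversed, so the corner that was nearest the viewer becomes the farthest.

```python
import math
from itertools import product


def rotate(point, yaw_deg, pitch_deg):
    """Rotate a 3D point about the vertical (y) axis, then about the x axis."""
    x, y, z = point
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    y, z = (y * math.cos(pitch) - z * math.sin(pitch),
            y * math.sin(pitch) + z * math.cos(pitch))
    return (x, y, z)


def necker_interpretations(yaw_deg=30.0, pitch_deg=20.0):
    """Return two cubes (lists of 8 corners) that share one 2D image.

    The viewing angles are arbitrary illustrative choices.
    """
    corners = list(product((-1.0, 1.0), repeat=3))
    cube_a = [rotate(c, yaw_deg, pitch_deg) for c in corners]
    # Reversing depth mirrors the cube along the line of sight.
    cube_b = [(x, y, -z) for x, y, z in cube_a]
    return cube_a, cube_b


def project(points):
    """Orthographic projection: discard depth, keep the image coordinates."""
    return [(x, y) for x, y, _ in points]


def nearest_corner(points):
    """Index of the corner closest to a viewer looking along the -z axis."""
    return max(range(len(points)), key=lambda i: points[i][2])


if __name__ == "__main__":
    cube_a, cube_b = necker_interpretations()
    print("Same drawing:", project(cube_a) == project(cube_b))
    print("Nearest corner, interpretation A:", nearest_corner(cube_a))
    print("Nearest corner, interpretation B:", nearest_corner(cube_b))
```

The drawing alone cannot distinguish the two cubes, so whichever orientation you see has to come from somewhere other than the image itself. This sketch only illustrates the geometric ambiguity; it says nothing about how the brain chooses or switches between interpretations.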

Continue reading in Part II: Faces.


R said...

Neat post. Working in machine vision must be fascinating.

I'd argue with your characterization of the blind spot as an example of top-down processing, though. I'm not sure the brain "fills in" anything. Does it need to actually fill this area in, or is the blind spot simply outside of our perception? Similarly, we're not aware of the periods of blindness that we technically experience with every saccadic movement, but I'm not sure that it's because the brain is filling in those periods of blankness. We're simply unaware of them.

Mozglubov said...

Thanks for the comment, R. The reason I interpret the blind spot as being filled in is the experiment I described in the post, in which you move a piece of paper until the 'X' disappears. At that point, you perceive a cohesive blank white sheet of paper, not one with a hole of unknown content. Likewise, if the blind spot were simply not perceived but not actively filled in, when you fixate your eyes on a part of the visual field and then shut one of them, the part of your periphery corresponding to the blind spot of your open eye should 'wink out' as it loses the visual information provided by the eye you just shut.

Of course, my interpretation could be wrong. My understanding is that people suffering from hemianopia (loss of half of the visual field, usually due to stroke or other neurological damage) are simply unaware of the lost field of vision without actively perceiving it as lost (I've never worked with or studied hemianopia in depth, however, so I could be wrong about this). Thus, one could make the argument that the blind spot functions similarly, just on a smaller scale.

As for the non-awareness of saccadic blindness, that is true, but it also gets a lot more complicated since it adds a temporal dimension to the problem.