1. Introduction to Computer Vision · 2014-12-29

E-mail: [email protected]
web.yonsei.ac.kr/hgjung

1. Introduction to Computer Vision


1.1. Human Vision


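Objectives

1. Humans do not "see" with their eyes but with their brains. From this we learn that seeing is not a simple sensation but complex information processing, and we learn why a model-based approach is needed.

2. Learn the feasibility of computer vision that imitates the human visual process.

3. Learn that concepts are defined by the characteristics of the human visual process. What is the basis for assuming that computer vision imitating the human visual process will perform well? How were the targets that we try to detect/recognize with computer vision defined? Are they not simply the cases that humans can detect/recognize effectively?
e.g.) Object recognition based mainly on color and edges

4. Learn that concepts merged by the human system of thought can undermine the uniformity of the visual process. Learn that effort is needed to decompose a single object into several visual objects.
e.g.) Airplanes and cars of varied appearance, pedestrians in varied poses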


David H. Hubel [1] [3]

David Hunter Hubel (born February 27, 1926) was co-recipient with Torsten Wiesel of the 1981 Nobel Prize in Physiology or Medicine, for their discoveries concerning information processing in the visual system.

The understanding of sensory processing in animals served as inspiration for the SIFT descriptor (Lowe, 1999), a local feature used in computer vision for tasks such as object recognition and wide-baseline matching. The SIFT descriptor is arguably the most widely used feature type for these tasks [3].


The Brain [1]

• The brain contains 10^12 (one million million) cells.
• A typical nerve cell in the brain receives information from hundreds or thousands of other nerve cells and in turn transmits information to hundreds or thousands of other cells.
• The total number of interconnections in the brain should therefore be somewhere around 10^14 to 10^15.
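The estimate above is a simple product of the cell count and the connections per cell. A minimal sketch of that arithmetic, taking "hundreds or thousands" as 100 and 1000 (these two round numbers are an assumption made only for illustration):

```python
# Order-of-magnitude estimate of interconnections in the brain.
CELLS = 10**12  # nerve cells in the brain, as stated on the slide


def total_connections(per_cell):
    """Cell count times connections per cell."""
    return CELLS * per_cell


low = total_connections(100)    # "hundreds" taken as 100 (assumed)
high = total_connections(1000)  # "thousands" taken as 1000 (assumed)
print(f"{low:.0e} to {high:.0e}")
```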


The Brain [1]

This view of a human brain seen from the left and slightly behind shows the cerebral cortex and cerebellum. A small part of the brainstem can be seen just in front of the cerebellum.

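Figure labels: parietal lobe, occipital lobe, temporal lobe, frontal lobe, brainstem, striate cortex, spinal cord, cerebellum.

Terms: cerebrum (the main part of the brain); cerebral (of the brain); cortex (outer layer, especially of the cerebrum); spine (backbone); spinal (of the spine).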


Visual Pathway [1]

Figure labels: retina, lateral geniculate body, striate cortex.


Visual Pathway [1]

1. Retina
• A plate having three layers of cells, one of which contains the light-sensitive receptor cells, or rods and cones.
• Each eye contains over 125 million receptors.

2. LGB (Lateral Geniculate Bodies) or LGN (Lateral Geniculate Nucleus)
• The two retinas send their output to two peanut-size nests of cells deep within the brain, the lateral geniculate bodies.

3. Primary Visual Cortex
• The LGBs send their fibers to the visual part of the cerebral cortex. More specifically, they go to the striate cortex, or primary visual cortex.

4. From there, after being passed from layer to layer through several sets of synaptically connected cells, the information is sent to several neighboring higher visual areas; each of these sends its output to several others.


Questions [1]

1. Why do all these chains of neuronal structures exist?

2. How do they work, and what do they do?

3. What kind of visual information travels along a trunk of fibers?

4. How is the information modified in each region – retina, lateral geniculate body, and the various levels of cortex?


Investigation Method [1]

An experimental plan for recording from the visual pathway. The animal, usually a macaque monkey, faces a screen onto which we project a stimulus. We record by inserting a microelectrode into some part of the pathway, in this case, the primary visual cortex. (The brain in this diagram is from a human, but a monkey brain is very similar.)


The Eye [1]

• The eyeball and the muscles that control its position. The cornea and the lens focus the light rays onto the back of the eye. The lens regulates the focusing for near and far objects by becoming more or less globular.
• About two-thirds of the bending of light necessary for focusing takes place at the air-cornea interface, where the light enters the eye.
• The lens of the eye supplies the remaining third of the focusing power, but its main job is to make the necessary adjustments to focus on objects at various distances.

Figure labels: iris, cornea, ciliary muscle, sclera, extraocular muscle, lens, optic nerve.


The Retina [1]

• The retina translates light into nerve signals, allows us to see under conditions that range from starlight to sunlight, discriminates wavelength so that we can see colors, and provides a precision sufficient for us to detect a human hair or speck of dust a few yards away.

• The retina is part of the brain, having been sequestered from it early in development but having kept its connections with the brain proper through a bundle of fibers – the optic nerve.



• The enlarged retina at the right shows the relative positions of the three retinal layers. Surprisingly, the light has to pass through the ganglion-cell and bipolar-cell layers before it gets to the rods and cones.
• The tier of cells at the back of the retina contains the light receptors, the rods and cones.
• Rods: vision in dim light. Cones: fine and color vision. Rods greatly outnumber cones.
• Cones are most densely packed in the fovea (about half a millimeter in diameter).
• In the middle layer, there are three types of nerve cells: bipolar cells, horizontal cells, and amacrine cells.

Term: ganglion (a cluster of nerve cell bodies).



• Each eye contains about 125 million rods and cones but only 1 million ganglion cells. How can detailed visual information be preserved?
• The total area occupied by the receptors in the back layer that feed one ganglion cell in the front layer, directly and indirectly, is only about one millimeter. That area is the receptive field of the ganglion cell, the region of retina over which we can influence the ganglion cell’s firing by light stimulation.



Erez N. Ribak and Amichai M. Labin, “Light propagation explains our inverted retina,” 18 Oct. 2010, SPIE Newsroom, http://spie.org/x42206.xml?pf=true&ArticleID=x42206

We found very limited coupling into neighbor glial cells for small incidence angles. In other words, only light that came through the center of the pupil was captured by the glial cells, concentrated, and guided directly to the cones (see Figure 2). Evidently, the retina developed into a natural optical waveguide array, tailored to almost perfectly preserve images obtained through a narrower pupil.

Figure 2. (left) Light concentration in the glial-cell array, rotated by two degrees. Green light follows the central cell C, and very little leaks to the neighbor N. (right) The glial cells are rotated by eight degrees and cannot hold on to blue light. The decoupled light mostly arrives at rods between the two cells, and only a small fraction is captured in the neighbor cell. This high angle corresponds to light arriving from the outskirts of the pupil, dilated under darkness conditions.





We also found that it made little difference if this light was blue, green, or red. This explains why we are not sensitive to the difference in focus between colors. The eye is affected by significant chromatic aberration, where blue light is focused approximately 0.25mm in front of red light. This aberration forces ophthalmologists to use a single color or correct for the color aberration to see a sharp focus. It was thought that processing by the neural network takes care of this focus error, but it has now become clear that guiding by the glial cells removes this color ambiguity.



The Function of Retina [1]

On-center cell and off-center cell

• Narrowly defined, the term receptive field refers simply to the specific receptors that feed into a given cell in the nervous system, with one or more synapses intervening.
• The receptive fields of retinal ganglion cells have a substructure.
• Ganglion cells are of two types: on-center cells and off-center cells.
• The center-surround structures are mathematically equivalent to the edge detection algorithms used by computer programmers to extract or enhance the edges in a digital photograph. Thus the retina performs operations on the image to enhance the edges of objects within its visual field.
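The link between center-surround receptive fields and edge enhancement can be illustrated with a difference-of-Gaussians (DoG) filter, a common computational stand-in for an on-center cell. The sketch below is one-dimensional and uses assumed widths (`sigma_center`, `sigma_surround`) and an assumed kernel radius; none of these values come from the slides.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled Gaussian, normalized so that its weights sum to 1."""
    w = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def dog_kernel(sigma_center=1.0, sigma_surround=3.0, radius=9):
    """On-center kernel: narrow excitatory center minus broad inhibitory surround."""
    center = gaussian_kernel(sigma_center, radius)
    surround = gaussian_kernel(sigma_surround, radius)
    return [c - s for c, s in zip(center, surround)]


def filter_valid(signal, kernel):
    """Slide the kernel over the signal, keeping only fully overlapping positions."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


# A step edge: dark region followed by a bright region.
step = [0.0] * 30 + [1.0] * 30
response = filter_valid(step, dog_kernel())

# Uniform regions give almost no response; the response is concentrated at the edge.
print(round(response[0], 6), round(max(response), 3), round(min(response), 3))
```

Because center and surround weights cancel over a uniform patch, only positions near the luminance step produce a sizable output, which is the sense in which such a filter enhances edges.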


The Overlapping of Ganglions [1]

Two neighboring retinal ganglion cells receive input over the direct path from two overlapping groups of receptors. The areas of retina occupied by these receptors make up their receptive-field centers, shown face on by the large overlapping circles.

• How does a particular retinal stimulus affect the entire population of ganglion cells?
• Neighboring retinal ganglion cells in fact receive their inputs from richly overlapping and usually only slightly different arrays of receptors.
• The fineness of our world image is best measured not by the overall size of receptive fields, but by the size of the field centers.


Optic Cabling [4] [5]

Simplified Signal Flow: Photoreceptors → Bipolar → Ganglion → Chiasm (crossing) → LGN → V1 cortex

http://en.wikipedia.org/wiki/File:ERP_-_optic_cabling.jpg


Optic Cabling [9]

Simplified Signal Flow: Photoreceptors → Bipolar → Ganglion → Chiasm → LGN → V1 cortex

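Figure labels: optic nerve and eyeball, optic chiasm, lateral geniculate body, optic radiation, striate cortex, superior colliculus, pupillary reflex.

Terms: thalamus; hypothalamus; circadian rhythms (cycles of physiological function).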


Optic Chiasm [6]

Visual pathway with optic chiasm (X shape outlined, red) (1543 image from Andreas Vesalius' Fabrica) [5]

The optic chiasm or optic chiasma is the part of the brain where the optic nerves (CN II) partially cross. This allows the parts of both eyes that attend to the right visual field to be processed in the left visual system in the brain, and vice versa.
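The partial crossing can be written as a small routing rule. The sketch below relies on a standard anatomical fact that the slide does not spell out: fibers from the nasal half of each retina cross at the chiasm, while fibers from the temporal half stay on the same side. The function names are hypothetical and chosen only for this illustration.

```python
OTHER_SIDE = {"left": "right", "right": "left"}


def retinal_half_for(eye, visual_field):
    """Half of the retina that images a given visual half-field.

    The optics invert the image, so the right visual field falls on the left
    half of each retina: the nasal half in the right eye, the temporal half
    in the left eye.
    """
    return "nasal" if eye == visual_field else "temporal"


def hemisphere_for(eye, retinal_half):
    """Hemisphere that receives fibers from one half of one retina."""
    if eye not in OTHER_SIDE or retinal_half not in ("nasal", "temporal"):
        raise ValueError("unknown eye or retinal half")
    # Nasal fibers cross at the chiasm; temporal fibers do not.
    return OTHER_SIDE[eye] if retinal_half == "nasal" else eye


for field in ("left", "right"):
    for eye in ("left", "right"):
        half = retinal_half_for(eye, field)
        print(field, "field,", eye, "eye ->", hemisphere_for(eye, half), "hemisphere")
```

Under these assumptions both eyes deliver the right visual field to the left hemisphere and the left visual field to the right hemisphere, which matches the statement above.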

Figure label: corpus callosum.

http://en.wikipedia.org/wiki/File:1543,Visalius%27OpticChiasma.jpg

http://upload.wikimedia.org/wikipedia/commons/2/22/Gray720.png


Thalamus

The thalamus is a midline paired symmetrical structure within the brains of vertebrates, including humans. It is situated between the cerebral cortex and midbrain, both in terms of location and neurological connections. Its function includes relaying sensation, spatial sense, and motor signals to the cerebral cortex, along with the regulation of consciousness, sleep, and alertness. http://en.wikipedia.org/wiki/Thalamus

MRI cross-section of human brain, with thalamus marked.

http://en.wikipedia.org/wiki/File:Brain_chrischan_thalamus.jpg


Hypothalamus

The hypothalamus is a portion of the brain that contains a number of small nuclei with a variety of functions. One of the most important functions of the hypothalamus is to link the nervous system to the endocrine system via the pituitary gland (hypophysis). http://en.wikipedia.org/wiki/Hypothalamus

Figure labels: pituitary gland, endocrine system.

http://en.wikipedia.org/wiki/File:Illu_diencephalon_.jpg


Lateral Geniculate Nucleus [7]

The lateral geniculate nucleus (LGN) is the primary processing center for visual information received from the retina of the eye. The LGN is found inside the thalamus of the brain, and is thus part of the central nervous system.

The LGN receives information directly from the ascending retinal ganglion cells via the optic tract and from the reticular activating system. Neurons of the LGN send their axons through the optic radiation, a pathway directly to the primary visual cortex, also known as the striate cortex.

In addition, the LGN receives many strong feedback connections from the primary visual cortex.

http://upload.wikimedia.org/wikipedia/commons/0/0b/Gray719.png




In humans and macaques the LGN is normally described as having six distinctive layers. The inner two layers, 1 and 2, are called the magnocellular layers, while the outer four layers, 3, 4, 5, and 6, are called parvocellular layers.

In terms of visual information, then, the lateral geniculate bodies do not seem to be exerting any profound transformation, and we simply don’t yet know what to make of the nonvisual inputs and the local synaptic interconnection. The receptive fields of lateral geniculate cells have the same center-surround organization as the retinal ganglion cells that feed into them.

http://upload.wikimedia.org/wikipedia/en/f/f8/Lateral_geniculate_nucleus.png




The layers are stacked in such a way that the eyes alternate. In the left lateral geniculate body, the sequence in going from layer to layer, from above downwards, is right, left, right, left, left, right. It is not at all clear why the sequence reverses between the fourth and fifth layers.




The stacked-plate organization is preserved in going from retina to geniculate, except that the fibers from the retinas are bundled into a cable and splayed out again, in an orderly way, at their geniculate destination.

Each hemisphere of the brain is dealing with the opposite half of the environment, not with the opposite side of the body. In horses and mice the eyes tend to point outward rather than straight ahead, so that most of the retina of the right eye gets its information from the right visual field.

Fixation is the cause.

Much of the rest of the brain is arranged in an analogous way: for example, information about touch and pain coming from the right half of the body goes to the left hemisphere; motor control to the right side of the body comes from the left hemisphere.


Optic Radiation: LGN → V1 [8]

The optic radiation is a collection of axons from relay neurons in the LGN of the thalamus carrying visual information to the visual cortex along the calcarine fissure.


Primary visual cortex (V1) [10]

The primary visual cortex is divided into six functionally distinct layers, labelled 1 through 6. Layer 4, which receives most visual input from the lateral geniculate nucleus (LGN), is further divided into 4 layers, labelled 4A, 4B, 4Cα, and 4Cβ. Sublamina 4Cα receives most magnocellular input from the LGN, while layer 4Cβ receives input from parvocellular pathways.

http://en.wikipedia.org/wiki/File:Visualcortex.gif


Visual Cortex [10]

The term visual cortex refers to the primary visual cortex (also known as striate cortex or V1) and extrastriate visual cortical areas such as V2, V3, V4, V5.

(Two Stream Hypothesis) V1 transmits information to two primary pathways:

The dorsal stream (green) and ventral stream (purple) are shown. They originate from primary visual cortex.

1) The dorsal stream: V1 → V2 → dorsomedial area and V5 (visual area MT), posterior parietal cortex. The dorsal stream, sometimes called the "Where Pathway" or "How Pathway", is associated with motion, representation of object locations, and control of the eyes and arms, especially when visual information is used to guide saccades or reaching.

2) The ventral stream: V1 → V2 → V4 and inferior temporal cortex. The ventral stream, sometimes called the "What Pathway", is associated with form recognition and object representation. It is also associated with storage of long-term memory.

http://en.wikipedia.org/wiki/File:Brodmann_areas_17_18_19.png

http://en.wikipedia.org/wiki/File:Ventral-dorsal_streams.svg


The Function of Visual Cortex [1]

The flow of information in the cortex takes place over several loosely defined stages. At the first stage, most cells respond like geniculate cells. From the next stage, cells get their input from the center-surround cortical cells in the first stage [1]. Neurons in visual association cortex may respond selectively to human faces, or to a particular object [4].

1) A simple cell in the primary visual cortex is a cell that responds primarily to oriented edges and gratings (bars of particular orientations). [11]

2) Complex cells can be found in the primary visual cortex (V1), the secondary visual cortex (V2), and Brodmann area 19 (V3). Like a simple cell, a complex cell will respond primarily to oriented edges and gratings; however, it has a degree of spatial invariance. This means that its receptive field cannot be mapped into fixed excitatory and inhibitory zones. Rather, it will respond to patterns of light in a certain orientation within a large receptive field, regardless of the exact location. Some complex cells respond optimally only to movement in a certain direction. [12]


Simple Cell [11]

Gabor filter-type receptive field typical for a simple cell. Blue regions indicate inhibition, red facilitation.

Current consensus seems to be that early responses of V1 neurons consist of tiled sets of selective spatiotemporal filters. In the spatial domain, the functioning of V1 can be thought of as similar to many spatially local, complex Fourier transforms, or more accurately, Gabor transforms. Theoretically, these filters together can carry out neuronal processing of spatial frequency, orientation, motion, direction, speed (thus temporal frequency), and many other spatiotemporal features. [10]
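As a rough illustration of a Gabor-type receptive field, the sketch below builds a small 2D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier) and compares its response to a grating of matching orientation with its response to an orthogonal grating. The kernel size, wavelength, and envelope width are assumed values chosen for the example, not parameters given in the slides.

```python
import math


def gabor_kernel(size=21, wavelength=8.0, theta=0.0, sigma=4.0):
    """Gaussian envelope times a cosine carrier that varies along direction theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            u = x * math.cos(theta) + y * math.sin(theta)  # coordinate along the carrier
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * u / wavelength))
        kernel.append(row)
    return kernel


def grating(size=21, wavelength=8.0, theta=0.0):
    """Cosine grating patch whose luminance varies along direction theta."""
    half = size // 2
    return [[math.cos(2 * math.pi * (x * math.cos(theta) + y * math.sin(theta)) / wavelength)
             for x in range(-half, half + 1)]
            for y in range(-half, half + 1)]


def response(kernel, patch):
    """Linear response: sum of kernel weights times pixel values."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))


kernel = gabor_kernel(theta=0.0)
matched = response(kernel, grating(theta=0.0))
orthogonal = response(kernel, grating(theta=math.pi / 2))
print(round(matched, 2), round(orthogonal, 2))
```

The matched grating drives the filter strongly while the orthogonal one yields almost nothing, which is the orientation selectivity attributed to simple cells.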


Simple Cell [11]

This type of wiring could produce a simple-cell receptive field. On the right, four cells are shown making excitatory synaptic connections with a cell of higher order. Each of the lower-order cells has a radially symmetric receptive field with on-center and off-surround, illustrated by the left side of the diagram. The centers of these fields lie along a line. If we suppose that many more than four center-surround cells are connected with the simple cell, all with their field centers overlapped along this line, the receptive field of the simple cell will consist of a long, narrow excitatory region with inhibitory flanks.

Avoiding receptive-field terminology, we can say that stimulating with a small spot anywhere in this long, narrow rectangle will strongly activate one or a few of the center-surround cells and in turn excite the simple cell, although only weakly. Stimulating with a long, narrow slit will activate all the center-surround cells, producing a strong response in the simple cell.
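The wiring idea above can be tried numerically. In this sketch each lower-order cell is modeled as a rectified difference of Gaussians, and the simple cell simply adds the outputs of five such cells whose centers lie on a vertical line. The image size, cell spacing, Gaussian widths, and the use of five cells are all assumptions made for the illustration.

```python
import math

SIZE = 21  # the image is SIZE x SIZE pixels


def gauss2d(dx, dy, sigma):
    """Unit-volume 2D Gaussian evaluated at offset (dx, dy)."""
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def center_surround(image, cx, cy, sigma_c=1.0, sigma_s=2.0):
    """Rectified response of one on-center, off-surround cell centered at (cx, cy)."""
    total = 0.0
    for y in range(SIZE):
        for x in range(SIZE):
            weight = gauss2d(x - cx, y - cy, sigma_c) - gauss2d(x - cx, y - cy, sigma_s)
            total += weight * image[y][x]
    return max(0.0, total)  # a cell cannot fire at a negative rate


def simple_cell(image, centers):
    """Simple cell as the sum of excitatory inputs from center-surround cells."""
    return sum(center_surround(image, cx, cy) for cx, cy in centers)


def make_image(lit):
    """Binary image: 1.0 where lit(x, y) is true, 0.0 elsewhere."""
    return [[1.0 if lit(x, y) else 0.0 for x in range(SIZE)] for y in range(SIZE)]


# Five field centers along the vertical line x = 10 (assumed layout).
CENTERS = [(10, cy) for cy in (4, 7, 10, 13, 16)]

spot = make_image(lambda x, y: abs(x - 10) <= 1 and abs(y - 10) <= 1)
slit_along = make_image(lambda x, y: abs(x - 10) <= 1)   # slit on the line of centers
slit_across = make_image(lambda x, y: abs(y - 10) <= 1)  # same slit rotated by 90 degrees

r_spot = simple_cell(spot, CENTERS)
r_along = simple_cell(slit_along, CENTERS)
r_across = simple_cell(slit_across, CENTERS)
print(round(r_spot, 3), round(r_along, 3), round(r_across, 3))
```

With these settings the slit aligned with the row of centers excites every input cell and gives the largest output, while the spot and the rotated slit each drive only the middle input.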


Complex Cell [12]

Complex cells represent the next step or steps in the analysis. They are the commonest cells in the striate cortex - a guess would be that they make up three-quarters of the population [1].

Like simple cells, they respond over a limited region of the visual field; unlike simple cells, their behavior cannot be explained by a neat subdivision of the receptive field into excitatory and inhibitory regions.

The activation of the complex cell by a moving stimulus requires successive activation of many simple cells.



Directional Selectivity

Many complex cells respond better to one direction of movement than to the diametrically opposite direction.

Horace Barlow and William Levick proposed this circuit to explain directional selectivity. Synapses from purple to green are excitatory, and from green to white, inhibitory. We suppose the three white cells at the bottom converge on a single master cell.
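One common way to read a circuit of this kind is as a delay-and-veto scheme: each subunit is excited by its own receptor and inhibited, one time step later, by the neighboring receptor on one side. The sketch below is a simplified discrete-time interpretation under that assumption; it is not a transcription of the figure, and the one-step delay and unit weights are illustrative choices.

```python
def direction_selective_response(frames):
    """Summed output of delay-and-veto subunits.

    frames[t][i] is 1 if receptor i is lit at time step t, else 0.
    Subunit i is excited by receptor i and vetoed by the signal that
    receptor i + 1 produced one time step earlier.
    """
    n = len(frames[0])
    total = 0
    for t in range(len(frames)):
        for i in range(n - 1):  # the last receptor has no neighbor to veto it
            excitation = frames[t][i]
            inhibition = frames[t - 1][i + 1] if t > 0 else 0
            total += max(0, excitation - inhibition)
    return total


def moving_spot(n, rightward):
    """A single lit receptor stepping across n receptors, one position per time step."""
    positions = range(n) if rightward else range(n - 1, -1, -1)
    return [[1 if i == p else 0 for i in range(n)] for p in positions]


preferred = direction_selective_response(moving_spot(5, rightward=True))
null = direction_selective_response(moving_spot(5, rightward=False))
print(preferred, null)
```

For motion toward higher receptor indices the delayed inhibition always arrives too late, so each subunit fires; for the opposite direction the inhibition coincides with the excitation and cancels it.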



Movement-Sensitive Cells: How we see [1]

What our two eyes in fact do is fixate on an object: we first adjust the positions of our eyes so that the images of the object fall on the two foveas; then we hold that position for a brief period, say, half a second; then our eyes suddenly jump to a new position by fixating on a new target whose presence somewhere out in the visual field has asserted itself, either by moving slightly, by contrasting with the background, or by presenting an interesting shape.

During the jump, or saccade, which is French for "jolt", or "jerk" (the verb), the eyes move so rapidly that our visual system does not even respond to the resulting movement of the scene across the retina; we are altogether unaware of the violent change. (Vision may also in some sense be turned off during saccades by a complex circuit linking eye-movement centers with the visual path.)

When we look at a stationary scene by fixating on some point of interest, our eyes lock onto that point, as just described, but the locking is not absolute. Despite any efforts we may make, the eyes do not hold perfectly still but make constant tiny movements called microsaccades.



Movement-Sensitive Cells: How we see [1]

In 1952 Lorrin Riggs and Floyd Ratliff, at Brown University, and R. W. Ditchburn and B. L. Ginsborg, at Reading University, simultaneously and independently found that if an image is optically artificially stabilized on the retina, eliminating any movement relative to the retina, vision fades away after about a second and the scene becomes quite blank! (The simplest way of stabilizing is to attach a tiny spotlight to a contact lens; as the eye moves, the spot moves too, and quickly fades.) Artificially moving the image on the retina, even by a tiny amount, causes the spot to reappear at once. Evidently, microsaccades are necessary for us to continue to see stationary objects.

It is as if the visual system, after going to the trouble to make movement a powerful stimulus - wiring up cells so as to be insensitive to stationary objects - had then to invent microsaccades to make stationary objects visible.



End-stopped cell [1]

For an end-stopped cell, lengthening the line improves the response up to some limit, but exceeding that limit in one or both directions results in a weaker response.

Top: An ordinary complex cell responds to various lengths of a slit of light. The duration of each record is 2 seconds. As indicated by the graph of response versus slit length, for this cell the response increases with length up to about 2 degrees, after which there is no change.

Bottom: For this end-stopped cell, responses improve up to 2 degrees but then decline, so that a line 6 degrees or longer gives no response.



End-stopped cell [1]

One scheme for explaining the behavior of a complex end-stopped cell. Three ordinary complex cells converge on the end-stopped cell: one, whose receptive field is congruent with the end-stopped cell's activating region (a), makes excitatory contacts; the other two, having fields in the outlying regions (b and c), make inhibitory contacts.

In an alternative scheme, one cell does the inhibiting, a cell whose receptive field covers the entire area (b). For this to work, we have to assume that the inhibiting cell responds only weakly to a short slit when (a) is stimulated, but responds strongly to a long slit.

This end-stopped simple cell is assumed to result from convergent input from three ordinary simple cells. (One cell, with the middle on-center field, could excite the cell in question; the two others could be off center and also excite or be on center and inhibit.)
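The first scheme (excitation from the activating region, inhibition from the outlying regions) can be turned into a toy length-tuning curve. The sketch below assumes a centered slit, a 2-degree activating region, and an inhibition strength chosen so that the response reaches zero at 6 degrees, to match the recordings described earlier; the linear form and these weights are assumptions, not values from the source.

```python
def ordinary_complex_response(slit_length_deg, saturation_deg=2.0):
    """Ordinary complex cell: response grows with slit length, then stays flat."""
    return min(slit_length_deg, saturation_deg)


def end_stopped_response(slit_length_deg, activating_deg=2.0, total_deg=6.0):
    """End-stopped cell: excitation from region (a), inhibition from flanks (b) and (c)."""
    excitation = min(slit_length_deg, activating_deg)
    # Length of slit that spills into the inhibitory flanks.
    overshoot = max(0.0, min(slit_length_deg, total_deg) - activating_deg)
    # Gain chosen so that a slit covering the full flanks cancels the excitation.
    inhibition_gain = activating_deg / (total_deg - activating_deg)
    return max(0.0, excitation - inhibition_gain * overshoot)


for length in (1.0, 2.0, 4.0, 6.0, 8.0):
    print(length, ordinary_complex_response(length), end_stopped_response(length))
```

The ordinary cell saturates at 2 degrees, whereas the end-stopped cell peaks there and falls to zero for slits of 6 degrees or longer.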


Architecture of the Cortex [1]

A cross section of the striate cortex taken at higher magnification shows cells arranged in layers. Layers 2 and 3 are indistinguishable; layer 4A is very thin. The thick, light layer at the bottom is white matter.



Ocular dominance remains constant in vertical microelectrode penetrations through the striate cortex. Penetrations parallel to the surface show alternation from left eye to right eye and back, roughly one cycle every millimeter.

Ocular-Dominance Columns



In still another experiment where we graph orientation against track distance, three reversals separated long, straight progressions.

Orientation Columns



A tilted line segment shining in the visual field of the left eye

(shown to the right) may cause this hypothetical pattern of activation of a small area of striate cortex (shown to the left). The activation is confined to a small cortical area, which is long and narrow to reflect the shape of the line; within this area, it is confined to left ocular-dominance columns and to orientation columns representing a two o'clock-eight o'clock tilt.

Cortical representation is not simple, especially when we consider that the orientation domains are not neat parallel lines, as suggested here for simplicity, but far more complex.

Maps of the Cortex



We call this our "ice cube model" of the cortex. It illustrates how the cortex is divided, at one and the same time, into two kinds of slabs, one set for ocular dominance (left and right) and one set for orientation. The model should not be taken literally: neither set is as regular as this, and the orientation slabs especially are far from parallel or straight. Moreover, they do not seem to intersect at any particular angle - certainly they are not orthogonal, as shown here.

Maps of the Cortex


The Corpus Callosum and Stereopsis [1]

The corpus callosum, a huge band of myelinated fibers, connects the two cerebral hemispheres. Stereopsis is one mechanism for seeing depth and judging distance. Although these two features of the brain and vision are not closely related, a small minority of corpus-callosum fibers do play a small role in stereopsis. One function of the corpus callosum is to connect cells so that their fields can span the midline.

Here the brain is seen from above. On the right side an inch or so of the top has been lopped off. We can see the band of the corpus callosum fanning out after crossing, and joining every part of the two hemispheres. (The front of the brain is at the top of the picture.)




[1]

Left: When an observer looks at a point P, the two images of P fall on the foveas. Adjusting the directions of the eyes, by toeing in or toeing out, will bring the two images of an object together over a narrow range of convergence or divergence. Q is a point that is judged by the observer to be the same distance away as P. The two images of Q (QL and QR) are then said to fall on corresponding points. (The surface made up of all points Q, the same apparent distance away as P, is the horopter through P.)

Right: If Q' appears closer to the observer than Q, then the images of Q' (QL' and QR') will be farther apart on the retina in a horizontal direction than they would be if they were corresponding points. If Q' appears farther away, QL' and QR' will be horizontally displaced toward each other.




[1]

To sum up, our ability to see depth depends on five principles:

1. We have many cues to depth, such as occlusion, parallax, rotation of objects, relative size, shadow casting, and perspective. Probably the most important cue is stereopsis.

2. If we fixate on, or look at, a point in space, the images of the point on our two retinas fall on the two foveas. Any point judged to be the same distance away as the point fixated casts its two images on corresponding retinal points.

3. Stereopsis depends on the simple geometric fact that as an object gets closer to us, the two images it casts on the two retinas become outwardly displaced, compared with corresponding points.

4. The central fact of stereopsis - a biological fact learned from testing people - is this: an object whose images fall on corresponding points in the two retinas is perceived as being the same distance away as the point fixated. When the images are outwardly displaced relative to corresponding points, the object is seen as nearer than the fixated point, and when the displacement is inward, the object is seen as farther away.

5. Horizontal displacements greater than about 2 degrees or vertical displacements of over a few minutes of arc lead to double vision.
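Principles 4 and 5 amount to a small decision rule on the displacement of the two retinal images. The sketch below encodes that rule; the sign convention (positive means outward displacement) and the 3-arcminute vertical limit standing in for "a few minutes of arc" are assumptions introduced for the example.

```python
def perceive_depth(horizontal_disparity_deg, vertical_disparity_arcmin=0.0,
                   horizontal_limit_deg=2.0, vertical_limit_arcmin=3.0):
    """Classify perceived depth relative to the fixated point.

    horizontal_disparity_deg: displacement from corresponding points,
        positive when outward, negative when inward (assumed convention).
    vertical_limit_arcmin: assumed stand-in for "a few minutes of arc".
    """
    if (abs(horizontal_disparity_deg) > horizontal_limit_deg
            or abs(vertical_disparity_arcmin) > vertical_limit_arcmin):
        return "double vision"
    if horizontal_disparity_deg > 0:
        return "nearer"
    if horizontal_disparity_deg < 0:
        return "farther"
    return "same distance"


print(perceive_depth(0.0))    # images on corresponding points
print(perceive_depth(0.5))    # outward displacement
print(perceive_depth(-0.5))   # inward displacement
print(perceive_depth(3.0))    # beyond the fusion limit
```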


Do we see with our eyes?

How do people recognize a friend who is smiling brightly?

http://www.ehow.com/info_8089919_international-greeting-types.html


Do we see with our eyes?

Visual Pathway

Scheme of the optic tract with image being decomposed on the way, up to simple cortical cells (simplified).

http://en.wikipedia.org/wiki/Visual_system


Do we see with our eyes?

Zonglei Zhen, et al., “The Hierarchical Brain Network for Face Recognition,” PLoS ONE, vol. 8, no. 3, Mar. 2013, e59886, pp. 1-9.

Brain Network for Face Recognition


Can we believe our eyes?

Copyright A.Kitaoka 2003


http://www.coolopticalillusions.com/backgrounds/movings-spots-desktop-background.htm



http://www.coolopticalillusions.com/backgrounds/spots_appear_to_move_1280_1024.jpg


http://www.coolopticalillusions.com


http://www.coolopticalillusions.com/backgrounds/moving-backgrounds.htm


http://www.coolopticalillusions.com/blog/wp-content/uploads/2011/01/statue_optical_illusion.jpg




The brain constructs a 3D interpretation consistent with the 2D projection of the scene on your retina [1]

Can we believe our eyes?


Where is a dog?


Constructing shape and depth [2]

How can our perceptions of such simple shapes be so dramatically incorrect?

The answer is that human vision has rules by which it constructs three-dimensional shapes and depths.



If you stare at the cube for a while, you might notice that it flips, so that a corner that was in front goes behind, and vice versa. Think of the black discs in the illustration as holes in a white sheet of paper, and through the holes you see a cube behind the sheet of paper.

Line Cube: One fabrication built on the foundation of another fabrication.



This movie is composed of 12 frames, in which dots change positions slightly from one frame to the next. Your visual system creates the 3D cylinder and its motion. What is remarkable about this visual capacity is that human vision solves an ill-posed problem whenever it computes a three-dimensional shape and motion for an object from just the two-dimensional motions of its points.



How does human vision pick one z-coordinate out of the infinite options that are possible?

It employs certain built-in constraints.

In the case of depth from motion, one constraint that roughly models the performance of human vision is rigidity. That is, human vision tries to find a rigid interpretation for the two-dimensional motions of the dots. If it can find such an interpretation, it adopts that interpretation.
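The ill-posedness and the role of rigidity can be shown with a toy version of the rotating-cylinder movie. The sketch below assumes orthographic projection, 20 random dots on a unit cylinder, and a 5-degree rotation per frame (all illustrative choices). It compares two 3D interpretations that produce exactly the same 12 images: the true rotating cylinder, and one with arbitrary depth values. Only the first keeps all inter-point distances constant.

```python
import math
import random


def cylinder_points(n, seed=0):
    """Random dots on a unit cylinder whose axis is the vertical (y) axis."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        angle = rng.uniform(0, 2 * math.pi)
        points.append((math.cos(angle), rng.uniform(-1, 1), math.sin(angle)))
    return points


def rotate_about_y(point, angle):
    """Rotate a 3D point about the vertical axis."""
    x, y, z = point
    return (x * math.cos(angle) + z * math.sin(angle), y,
            -x * math.sin(angle) + z * math.cos(angle))


def rigidity_error(frames):
    """Largest change of any pairwise 3D distance across frames (about 0 if rigid)."""
    first = frames[0]
    worst = 0.0
    for frame in frames[1:]:
        for i in range(len(first)):
            for j in range(i + 1, len(first)):
                change = abs(math.dist(frame[i], frame[j]) - math.dist(first[i], first[j]))
                worst = max(worst, change)
    return worst


points = cylinder_points(20)
true_frames = [[rotate_about_y(p, k * math.radians(5)) for p in points] for k in range(12)]

# Orthographic projection drops z: this is all the observer receives.
images = [[(x, y) for x, y, _ in frame] for frame in true_frames]

# A second interpretation of the same images, with an arbitrary depth for every dot.
rng = random.Random(1)
arbitrary_frames = [[(x, y, rng.uniform(-1, 1)) for x, y in image] for image in images]

print(rigidity_error(true_frames), rigidity_error(arbitrary_frames))
```

Both interpretations are consistent with the images, so the images alone cannot decide between them; a preference for the interpretation with near-zero rigidity error is one way to express the rigidity constraint described above.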



There is certainly something about the Kanizsa illustrations that we do not fabricate, namely the black lines and discs. So do we not fabricate everything we see?

Even the black lines and discs that appear in the Kanizsa illustration are fabricated by your visual system, and are not there on the page until your visual system creates them (and the page), and puts them on the page.


Constructing shading and color [2]

Colours and shades of grey are the end products of a process of construction, a process so sophisticated that it is still not fully understood by vision researchers. The grey bar on the right can look as if it lies behind horizontal white bars in front of a black background. When the grey bar is seen in front of the black bars, the grey bar has a ghostly transparent appearance. When the grey bar is seen behind the white bars, it appears opaque and no longer ghostly.



This can be explained by sophisticated probabilistic inferences employed by the human visual system.



In this display you might see a glowing pink ring that moves back and forth. In fact, nothing in this display is moving and there is no pink ring. Only the dots are colored, and the colors of the dots change. However, the dots themselves never move. This shows that human vision creates colors, motions, shapes, and contours in a coordinated fashion.


Constructing Objects and Their Parts [2]

No matter how long you look, you are unlikely to be able to see the lines of the illustration simply as two dimensional curves on a page. This shows that although the visual system is sophisticated in its constructions, it can also persist in an error even when presented with clear evidence of it. The visual system operates according to built-in rules, and these rules grant it its constructive powers.




Do We See with Our Eyes?

Burkard Wordenweber · Jorg Wallaschek · Peter Boyce · Donald Hoffman, “2. How Vision Constructs Reality,” Automotive Lighting and Human Vision, Springer-Verlag Berlin Heidelberg 2007, pp. 9-94.

Model-based Approach


Do We See with Our Eyes?

Humans do not see with their eyes; they see with their brains.

Seeing is not a simple sensation but a complex form of information processing.

- Specialized neural tissue

- Logical reasoning

- Diverse experience

- Large-capacity memory

A priori knowledge-based

Situation awareness

Context-based


Do We See with Our Eyes?

Camera, Image Processing, and Computer Vision

Camera

Image processing (with ASIC, SoC)

Embedded PC (with large-capacity hard disk)

Communication


Links for Further Study

David Hubel’s Eye, Brain, and Vision, available at

http://hubel.med.harvard.edu/index.html

Webvision: The organization of the Retina and Visual System, available at

http://webvision.med.utah.edu/

Neuroscience for Kids, available at http://faculty.washington.edu/chudler/introb.html

Donald D. Hoffman’s Visual Illusion, available at

http://www.cogsci.uci.edu/~ddhoff/illusions.html



References

1. David Hubel, “Eye, Brain, and Vision,” available at http://hubel.med.harvard.edu/bcontex.htm

2. Burkard Wordenweber · Jorg Wallaschek · Peter Boyce · Donald Hoffman, “2. How Vision Constructs Reality,” Automotive Lighting and Human Vision, Springer-Verlag Berlin Heidelberg 2007, pp. 9-94.

3. Wikipedia, “David H. Hubel,” available at http://www.wikipedia.org

4. Wikipedia, “Visual System,” available at http://www.wikipedia.org

5. Wikipedia, “Retina,” available at http://www.wikipedia.org

6. Wikipedia, “Optic chiasm,” available at http://www.wikipedia.org

7. Wikipedia, “Lateral Geniculate Nucleus,” available at http://www.wikipedia.org

8. Wikipedia, “Optic Radiation,” available at http://www.wikipedia.org

9. Lucient T. Thompson, “Sensory systems IV: Vision II II,” available at http://www.utdallas.edu/~tres/integ/sen4/display8_01.html

10. Wikipedia, “Visual Cortex,” available at http://www.wikipedia.org

11. Wikipedia, “Simple Cell,” available at http://www.wikipedia.org

12. Wikipedia, “Complex Cell,” available at http://www.wikipedia.org











1.2. Computer Vision


What is vision? [0, 3]

• What does it mean to see?

• How to discover from images what is present in the world, where things are, and what actions are taking place.

In computer vision, we are trying to do the inverse, i.e., to describe the world that we see in one or more images and to reconstruct its properties, such as shape, illumination, and color distribution.


What’s the name of the Palace? [3]


What is computer vision used for? [3]


Why is vision so difficult? [0]

In part, it is because vision is an inverse problem, in which we seek to recover some unknowns given insufficient information to fully specify the solution.

We must therefore resort to physics-based and probabilistic models to disambiguate between potential solutions.
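
A small numerical illustration of "insufficient information": under an idealized pinhole camera (an assumption of this sketch; the function name `project` and the sample coordinates are hypothetical), a scene and a copy of it scaled three times larger and three times farther away produce exactly the same image, so the image alone cannot decide between them.

```python
import numpy as np

def project(points_3d, focal_length=1.0):
    """Ideal pinhole projection: (X, Y, Z) -> (f * X / Z, f * Y / Z)."""
    points_3d = np.asarray(points_3d, dtype=float)
    return focal_length * points_3d[:, :2] / points_3d[:, 2:3]

# Three illustrative scene points in front of the camera (Z > 0).
scene = np.array([[0.5, 0.2, 2.0],
                  [-0.3, 0.4, 3.0],
                  [0.1, -0.6, 4.0]])

# A different scene: every point is three times farther from the camera centre.
scaled_scene = 3.0 * scene

same_image = bool(np.allclose(project(scene), project(scaled_scene)))
print("identical images:", same_image)
```

Resolving this kind of ambiguity requires extra knowledge, such as a known object size or a prior over plausible scenes.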


Three High-Level Approaches [0]

In formulating and solving computer vision problems, I have often found it useful to draw inspiration from three high-level approaches:

• Scientific: build detailed models of the image formation process and develop mathematical techniques to invert these in order to recover the quantities of interest (where necessary, making simplifying assumptions to make the mathematics more tractable).

• Statistical: use probabilistic models to quantify the prior likelihood of your unknowns and the noisy measurement processes that produce the input images, then infer the best possible estimates of your desired quantities and analyze their resulting uncertainties. The inference algorithms used are often closely related to the optimization techniques used to invert the (scientific) image formation processes.

• Engineering: develop techniques that are simple to describe and implement but that are also known to work well in practice. Test these techniques to understand their limitations and failure modes, as well as their expected computational costs (run-time performance).
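
As a minimal sketch of the statistical approach, assume a single scalar unknown with a Gaussian prior and independent Gaussian measurement noise (both are simplifying assumptions, and `gaussian_posterior` is a hypothetical helper, not something from the cited text). The posterior mean is the best estimate and the posterior variance quantifies the remaining uncertainty.

```python
import numpy as np

def gaussian_posterior(measurements, noise_var, prior_mean, prior_var):
    """Posterior mean and variance of a scalar x given z_i = x + Gaussian noise."""
    z = np.asarray(measurements, dtype=float)
    # Precisions (inverse variances) of the prior and of each measurement add up.
    precision = 1.0 / prior_var + z.size / noise_var
    mean = (prior_mean / prior_var + z.sum() / noise_var) / precision
    return float(mean), float(1.0 / precision)

# Prior belief: x is near 0 with variance 1. Two noisy measurements both read 2.0.
post_mean, post_var = gaussian_posterior([2.0, 2.0], noise_var=1.0,
                                         prior_mean=0.0, prior_var=1.0)
print(post_mean, post_var)
```

The estimate lands between the prior mean and the measurements, and the variance shrinks as more measurements are added.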


The best way to validate your algorithms [0]

Three-part strategy:

1. Test your algorithm on clean synthetic data, for which the exact results are known.

2. Add noise to the data and evaluate how the performance degrades as a function of noise level.

3. Test the algorithm on real-world data, preferably drawn from a wide variety of sources, such as photos found on the Web.
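
Steps 1 and 2 can be sketched for a deliberately simple stand-in algorithm, least-squares line fitting. The ground-truth line, the noise levels, and the helper names `fit_line` and `mean_slope_error` are assumptions chosen for illustration; step 3 needs real data and is not shown.

```python
import numpy as np

TRUE_SLOPE, TRUE_INTERCEPT = 2.0, -1.0   # assumed ground truth for the synthetic data

def fit_line(x, y):
    """Algorithm under test: least-squares fit of y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

def mean_slope_error(noise_std, n_trials=200, seed=0):
    """Average absolute slope error over repeated noisy trials."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, 50)
    clean_y = TRUE_SLOPE * x + TRUE_INTERCEPT          # step 1: exact synthetic data
    errors = []
    for _ in range(n_trials):
        noisy_y = clean_y + rng.normal(0.0, noise_std, size=x.shape)  # step 2: add noise
        slope, _intercept = fit_line(x, noisy_y)
        errors.append(abs(slope - TRUE_SLOPE))
    return float(np.mean(errors))

noise_levels = [0.0, 0.05, 0.1, 0.2, 0.4]
slope_errors = [mean_slope_error(s) for s in noise_levels]
for s, e in zip(noise_levels, slope_errors):
    print(f"noise std {s:.2f}: mean slope error {e:.4f}")
```

At zero noise the error should be at the level of floating-point round-off; the remaining rows show how quickly the estimate degrades as noise grows.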


References

0. Richard Szeliski, “Computer Vision: Algorithms and Applications,” Springer Verlag, 2010.

1. Steve Seitz, “Introduction to Computer Vision,” University of Washington lecture material of computer vision (CSE455), 2008.

2. Linda Shapiro, “Introduction to Computer Vision,” University of Washington lecture material of computer vision (CSE455), 2007.

3. Fei-Fei Li, “Introduction to Computer Vision,” Princeton University lecture material of computer vision (EE598), 2005.
