Analisi di Immagini e Video (Computer Vision) · 2021. 3. 11.

Giuseppe Manco

Outline

• Neural networks
• CNN

Credits

• Slides adapted from various courses and books
• Deep Learning (Ettore Ritacco)
• Deep Learning (Bengio, Courville, Goodfellow, 2017)
• Andrej Karpathy
• Computer Vision (I. Gkioulekas), CS CMU Edu
• Computational Visual Recognition (V. Ordonez), CS Virginia Edu

Beyond linear models

• A biological neuron is a cell connected to other neurons that acts as a hub for electrical impulses. A neuron has a roughly spherical cell body called the soma, which processes incoming signals and converts them into output signals. Input signals are collected by extensions of the cell body called dendrites. Output signals are transmitted to other neurons through another extension called the axon, which extends from the cell body and terminates in several branches. The branches end in junctions called synapses, which transmit signals from one neuron to another.
• The behavior of a neuron is essentially electrochemical. An electrical potential difference is maintained between the inside and the outside of the soma, due to different concentrations of sodium (Na) and potassium (K) ions. When a neuron receives inputs from a large number of neurons via its synaptic connections, the soma potential changes. If this change exceeds a given threshold, an electric current flows through the axon to other cells. The potential then drops below the resting potential, and the neuron cannot fire again until the resting potential is restored.

Deep Learning

• Part of machine learning
• Learns representations of the data
• Uses a hierarchy of layers that imitate the behavior of neurons in the brain

Without feature engineering

• Traditional: input data → feature engineering → learning algorithm
• Deep learning: input data → learning algorithm

Representation learning

[Figure: input pixels → features → concatenation → linear classifier (SVM) → answer]

Representation learning

[Figure: input pixels → network (GoogLeNet) → answer]

• The layers of the network learn the features automatically

The Mammalian Visual Cortex is Hierarchical (Y. LeCun, M.A. Ranzato)

• The ventral (recognition) pathway in the visual cortex has multiple stages
• Retina - LGN - V1 - V2 - V4 - PIT - AIT ....
• Lots of intermediate representations

[picture from Simon Thorpe] [Gallant & Van Essen]

Perceptron Learning

• Basic function:
  $y = \sigma(a)$, $a = \sum_{i=1}^{n} w_i x_i + b$, $\sigma(a) = \frac{1}{1 + e^{-a}}$
• Binary classification
• Linear separability
• Two extensions:
  • K classes
  • Nonlinear relations

Extensions

• K classes
  $y_k = f(a_k)$, $a_k = \sum_{i=1}^{n} w_{ki} x_i + b_k$
  In matrix form: $\boldsymbol{a} = \boldsymbol{W}\boldsymbol{x} + \boldsymbol{b}$, $\boldsymbol{y} = f(\boldsymbol{a})$
• Nonlinear relations
  $a_k = \sum_{i=1}^{n} w_{ki}\,\phi(x_i) + b_k$
  In matrix form: $\boldsymbol{a} = \boldsymbol{W}\phi(\boldsymbol{x}) + \boldsymbol{b}$
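The two extensions can be sketched in a few lines of plain Python. This is only an illustration: the function name `linear_units`, the example weights and the choice of a squaring feature map for $\phi$ are assumptions, not part of the slides.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def linear_units(W, b, x, phi=lambda v: v, f=sigmoid):
    """Compute y_k = f(sum_i W[k][i] * phi(x[i]) + b[k]) for every unit k."""
    features = [phi(v) for v in x]
    activations = [sum(w * v for w, v in zip(row, features)) + b_k
                   for row, b_k in zip(W, b)]
    return [f(a) for a in activations]

# Hypothetical weights for K = 2 units and 2 inputs
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -1.0]
x = [2.0, 1.0]

y_linear = linear_units(W, b, x)                        # a = W x + b
y_squared = linear_units(W, b, x, phi=lambda v: v * v)  # a = W phi(x) + b
```

With the identity as `phi` the units are linear in the input; any other elementwise map gives the nonlinear variant without changing the rest of the computation.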

General format

• A tuple:
  $net = \langle g, l, o, i, fpp \rangle$

$g$: the graph… network topology and operator

$g$: the graph

• $g = (N, E)$ is a weighted directed graph
• Each node $i \in N$ is a perceptron
• It is characterized by two elements
  • A value $a_i$ (the activation)
  • An activation function $f_i$
    • Applied to the activation, it produces the output $z_i$
• An edge $e = (j \to i) \in E$, with $i, j \in N$, is associated with a weight $w_{ji}$
• Each node $i$ is associated with a special edge to a phantom node, whose weight $b_i$ is called the bias

$g$: the graph

• Each neuron is a unit of computation

$z_i = f_i(a_i)$

$a_i = b_i + \sum_{j: j \to i \in E} w_{ji} z_j$

$g$: the graph

• Three categories of nodes
  • Input
    • Their values are "overwritten" from outside
  • Hidden
    • Units of computation
  • Output
    • They provide values to the outside

$g$: the graph

• A combination of connected neurons → complex computation → operation
• Nodes that share the same input are organized in layers

[Figure: input nodes 1 and 2 ($x_1$, $x_2$), hidden nodes 3, 4 and 5, output node 6]

$z_1 = x_1$, $z_2 = x_2$

$z_3 = f_3\left(b_3 + \sum_{j: j \to 3 \in E} w_{j3} z_j\right)$

$z_4 = f_4\left(b_4 + \sum_{j: j \to 4 \in E} w_{j4} z_j\right)$

$z_5 = f_5\left(b_5 + \sum_{j: j \to 5 \in E} w_{j5} z_j\right)$

$y = z_6 = f_6\left(b_6 + \sum_{j: j \to 6 \in E} w_{j6} z_j\right)$

$y = f_6\big(b_6 + w_{3,6} f_3(b_3 + w_{1,3} x_1 + w_{2,3} x_2) + w_{4,6} f_4(b_4 + w_{1,4} x_1 + w_{2,4} x_2) + w_{5,6} f_5(b_5 + w_{1,5} x_1 + w_{2,5} x_2)\big)$
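The node-by-node computation can be sketched as a small graph evaluator. The node numbering (inputs 1 and 2, hidden 3 to 5, output 6), the weights and the use of a sigmoid for every $f_i$ are assumptions made for the example.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(inputs, edges, bias, order, f=sigmoid):
    """Evaluate z_i = f(b_i + sum_j w_ji * z_j), visiting nodes in topological order."""
    z = dict(inputs)  # input nodes: values are written from outside
    for i in order:
        a = bias[i] + sum(w * z[j] for (j, k), w in edges.items() if k == i)
        z[i] = f(a)
    return z

# Hypothetical weights: key (j, i) is the edge j -> i
edges = {(1, 3): 0.5, (2, 3): -0.5, (1, 4): 1.0, (2, 4): 1.0,
         (1, 5): -1.0, (2, 5): 0.5, (3, 6): 1.0, (4, 6): -1.0, (5, 6): 2.0}
bias = {3: 0.0, 4: 0.1, 5: -0.1, 6: 0.0}

z = forward({1: 1.0, 2: 2.0}, edges, bias, order=[3, 4, 5, 6])
y = z[6]
```

Storing the graph as an edge dictionary keeps the code close to the definition of $g = (N, E)$; a layered network would normally use one weight matrix per layer instead.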

$g$: the graph

• Compact notation:
  • Given two consecutive layers $h$ and $k$:
  • $\boldsymbol{z}_k = f(\boldsymbol{b}_h + \boldsymbol{W}\boldsymbol{z}_h)$
• Note: all the nodes share the same activation function $f$
• $\boldsymbol{W}$ is the matrix of the weights associated with the edges from $h$ to $k$

Feed-Forward Networks Components

[Figure: feed-forward network. Input $x$ → first layer → hidden variables → output layer → $y$]

Input

• Represented as a vector
• Sometimes requires some preprocessing, e.g.,
  • Subtract mean
  • Normalize to [-1,1]
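A minimal sketch of the two preprocessing steps, assuming the input is a flat list of numbers and that "normalize" means dividing by the largest absolute value after centering (other conventions exist). The function name and the sample values are illustrative.

```python
def preprocess(x):
    """Subtract the mean, then scale so that all values lie in [-1, 1]."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    scale = max(abs(v) for v in centered)
    if scale == 0.0:  # constant input: nothing to scale
        return centered
    return [v / scale for v in centered]

pixels = [0.0, 64.0, 128.0, 255.0]  # hypothetical raw intensities
x = preprocess(pixels)
```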

Output layers

• Regression: $y = \boldsymbol{w}^T\boldsymbol{z} + b$
• Linear units: no nonlinearity

Output layers

• Multi-dimensional regression: $\boldsymbol{y} = \boldsymbol{W}^T\boldsymbol{z} + \boldsymbol{b}$
• Linear units: no nonlinearity

Output layers

• Binary classification: $y = \sigma(\boldsymbol{w}^T\boldsymbol{z} + b)$
• Corresponds to using logistic regression on $\boldsymbol{z}$

Output layers

• Multi-class classification: $\boldsymbol{y} = \mathrm{softmax}(\boldsymbol{W}^T\boldsymbol{z} + \boldsymbol{b})$
• Corresponds to using multi-class logistic regression on $\boldsymbol{z}$

$\mathrm{softmax}_i(\boldsymbol{a}) = \frac{e^{a_i}}{\sum_{j=1}^{K} e^{a_j}} = \frac{e^{a_i - a_{max}}}{\sum_{j=1}^{K} e^{a_j - a_{max}}}$
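The second form of the softmax, with the maximum subtracted from every component, is the one usually implemented because it avoids overflow in the exponential. A minimal sketch (the function name and the test vector are illustrative):

```python
import math

def softmax(a):
    """Softmax with the maximum subtracted first, so exp never overflows."""
    a_max = max(a)
    exps = [math.exp(v - a_max) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

# math.exp(1000.0) alone would raise OverflowError; the shifted form is safe
probs = softmax([1000.0, 1001.0, 1002.0])
```

Because the shift cancels between numerator and denominator, `softmax([1000.0, 1001.0, 1002.0])` and `softmax([0.0, 1.0, 2.0])` give the same probabilities.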

Hidden layers

• Each neuron takes a weighted linear combination of the previous layer
• So it can be thought of as outputting one value for the next layer

$\boldsymbol{z}_{i+1} = f_i(\boldsymbol{W}_i^T\boldsymbol{z}_i + \boldsymbol{b}_i)$

Activation functions

• The form of $f$ has a large influence on the results
• Historically, $f(a)$ has taken two forms

$\sigma(a) = \frac{1}{1 + e^{-a}}$, $\tanh(a) = \frac{1 - e^{-2a}}{1 + e^{-2a}}$
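Both functions are one-liners. The sketch below writes the hyperbolic tangent exactly in the form $(1 - e^{-2a})/(1 + e^{-2a})$ and can be compared against `math.tanh`; it also makes the relation $\tanh(a) = 2\sigma(2a) - 1$ easy to check numerically.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def tanh(a):
    """Hyperbolic tangent written as (1 - e^{-2a}) / (1 + e^{-2a})."""
    e = math.exp(-2.0 * a)
    return (1.0 - e) / (1.0 + e)

values = [(a, sigmoid(a), tanh(a)) for a in (-2.0, -0.5, 0.0, 1.0, 3.0)]
```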

Gradient Computing

Computational graph with inputs $a$ and $b$:

$c = a + b/2$, $d = 1/b$, $e = c \cdot d$

Forward pass

• $a = 1$, $b = 2$
• $c = 2$, $d = 0.5$
• $e = 1$

Backward pass

• Local derivatives:
  $\frac{\partial e}{\partial c} = d = 0.5$, $\frac{\partial e}{\partial d} = c = 2$, $\frac{\partial c}{\partial a} = 1$, $\frac{\partial c}{\partial b} = 0.5$, $\frac{\partial d}{\partial b} = -\frac{1}{b^2} = -0.25$
• $\frac{\partial e}{\partial a} = ?$ Chain rule!
  $\frac{\partial e}{\partial a} = \frac{\partial e}{\partial c} \cdot \frac{\partial c}{\partial a} = 0.5$
• $\frac{\partial e}{\partial b} = ?$ Distribution rule! $b$ reaches $e$ through both $c$ and $d$, so the contributions of the two paths are summed:
  $\frac{\partial e}{\partial b} = \frac{\partial e}{\partial c} \cdot \frac{\partial c}{\partial b} + \frac{\partial e}{\partial d} \cdot \frac{\partial d}{\partial b} = 0.5 \cdot 0.5 - 2 \cdot 0.25 = -0.25$
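The forward and backward passes of this small graph can be reproduced directly. The sketch below hard-codes the graph $c = a + b/2$, $d = 1/b$, $e = c \cdot d$; the function name is illustrative.

```python
def forward_backward(a, b):
    # Forward pass
    c = a + b / 2.0
    d = 1.0 / b
    e = c * d
    # Local derivatives of each node with respect to its inputs
    de_dc, de_dd = d, c
    dc_da, dc_db = 1.0, 0.5
    dd_db = -1.0 / b ** 2
    # Backward pass: chain rule along each path, sum over the paths
    de_da = de_dc * dc_da
    de_db = de_dc * dc_db + de_dd * dd_db
    return e, de_da, de_db

e, de_da, de_db = forward_backward(1.0, 2.0)
```

As a sanity check, the graph simplifies to $e = a/b + 1/2$, so $\partial e/\partial a = 1/b$ and $\partial e/\partial b = -a/b^2$, which can be compared with the values returned above.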

Patterns in gradient flow

• add gate: gradient distributor
• mul gate: "swap multiplier"
• max gate: gradient router
• copy gate: gradient adder

(Credit: Fei-Fei Li, Justin Johnson, Serena Yeung)
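The four patterns correspond to very short backward rules. The sketch below states them for two-input gates; the function names are illustrative, and sending the gradient to the first input when the two inputs of a max gate are equal is an assumption.

```python
def add_backward(upstream):
    """add gate: distributes the upstream gradient unchanged to both inputs."""
    return upstream, upstream

def mul_backward(x, y, upstream):
    """mul gate: each input receives the other input times the upstream gradient."""
    return y * upstream, x * upstream

def max_backward(x, y, upstream):
    """max gate: routes the upstream gradient to the larger input only."""
    return (upstream, 0.0) if x >= y else (0.0, upstream)

def copy_backward(upstream_1, upstream_2):
    """copy gate: a value used twice receives the sum of the two gradients."""
    return upstream_1 + upstream_2
```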

Vector derivatives at the nodes

• Scalar input, scalar output: derivative
  $x \in \mathbb{R}$, $y \in \mathbb{R}$, $\frac{\partial y}{\partial x} \in \mathbb{R}$
• Vector input, scalar output: gradient
  $x \in \mathbb{R}^N$, $y \in \mathbb{R}$, $\frac{\partial y}{\partial x} \in \mathbb{R}^N$, $\left(\frac{\partial y}{\partial x}\right)_n = \frac{\partial y}{\partial x_n}$
• Vector input, vector output: Jacobian
  $x \in \mathbb{R}^N$, $y \in \mathbb{R}^M$, $\frac{\partial y}{\partial x} \in \mathbb{R}^{N \times M}$, $\left(\frac{\partial y}{\partial x}\right)_{n,m} = \frac{\partial y_m}{\partial x_n}$

Vector derivatives at the nodes

A node with inputs $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$ and output $z \in \mathbb{R}^K$:

• Gradient arriving from the output $o$: $\frac{\partial o}{\partial z} \in \mathbb{R}^{K \times D}$
• Local Jacobians: $\frac{\partial z}{\partial x} \in \mathbb{R}^{N \times K}$, $\frac{\partial z}{\partial y} \in \mathbb{R}^{M \times K}$
• Gradients passed on to the inputs:
  $\frac{\partial o}{\partial x} = \frac{\partial z}{\partial x} \cdot \frac{\partial o}{\partial z} \in \mathbb{R}^{N \times D}$
  $\frac{\partial o}{\partial y} = \frac{\partial z}{\partial y} \cdot \frac{\partial o}{\partial z} \in \mathbb{R}^{M \times D}$
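A small numerical sketch of this product of Jacobians, using the convention that rows index the input and columns the output. The node $z = Ax + By$, the scalar output $o = \sum_k z_k^2$, the matrices and the dimensions (3, 2 and 2) are all assumptions chosen to keep the example short.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical node z = A x + B y, with x of size 3, y of size 2, z of size 2
A = [[1.0, 2.0, 0.0], [0.0, -1.0, 1.0]]
B = [[0.5, 0.0], [1.0, 1.0]]

def z_of(x, y):
    return [p + q for p, q in zip(matvec(A, x), matvec(B, y))]

def o_of(x, y):
    """Scalar output o = sum_k z_k**2."""
    return sum(v * v for v in z_of(x, y))

x, y = [1.0, 0.0, -1.0], [2.0, 1.0]

do_dz = [2.0 * v for v in z_of(x, y)]   # gradient arriving at z
# With rows indexing the input, the local Jacobians are dz/dx = A^T and dz/dy = B^T
do_dx = matvec(transpose(A), do_dz)     # (3 x 2) times (2) -> 3
do_dy = matvec(transpose(B), do_dz)     # (2 x 2) times (2) -> 2
```

The results can be checked against finite differences of `o_of`, which is a useful habit whenever a backward rule is written by hand.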

Exercise

• Draw the graph
• Compute the gradients

Activation functions and gradients

• Problem: saturation

Zero gradient!

Vanishing gradient

A chain of sigmoid units:

$z_1 = \sigma(x \cdot w_1)$, $z_2 = \sigma(z_1 \cdot w_2)$, $z_3 = \sigma(z_2 \cdot w_3)$, $z_4 = \sigma(z_3 \cdot w_4)$, $y = \sigma(z_4 \cdot w_5)$

• Input: $x = 2$
• Weights: $w_1 = 1$, $w_2 = 0.1$, $w_3 = 1.2$, $w_4 = -0.5$, $w_5 = 1$
• Forward pass: $z_1 = 0.88$, $z_2 = 0.52$, $z_3 = 0.65$, $z_4 = 0.42$, $y = 0.60$
• $\frac{\partial y}{\partial w_1} = ?$

$\frac{\partial y}{\partial w_1} = \sigma'(z_4 \cdot w_5)\, w_5 \cdot \sigma'(z_3 \cdot w_4)\, w_4 \cdot \sigma'(z_2 \cdot w_3)\, w_3 \cdot \sigma'(z_1 \cdot w_2)\, w_2 \cdot \sigma'(x \cdot w_1)\, x$
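The product can be evaluated numerically to see how small it becomes. The sketch below reads the figure as $x = 2$ with weights $1, 0.1, 1.2, -0.5, 1$ (an interpretation of the slide) and uses $\sigma'(a) = \sigma(a)(1 - \sigma(a))$; the function names are illustrative.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def chain_forward(x, weights):
    """Return the outputs [z1, ..., y] of a chain of sigmoid units."""
    outputs, z = [], x
    for w in weights:
        z = sigmoid(z * w)
        outputs.append(z)
    return outputs

def grad_first_weight(x, weights):
    """dy/dw1 as the product of the local derivatives along the chain."""
    outputs = chain_forward(x, weights)
    # sigma'(a) = sigma(a) * (1 - sigma(a)); the first unit contributes sigma'(x * w1) * x
    grad = outputs[0] * (1.0 - outputs[0]) * x
    # every later unit contributes sigma'(z_prev * w) * w
    for z, w in zip(outputs[1:], weights[1:]):
        grad *= z * (1.0 - z) * w
    return grad

weights = [1.0, 0.1, 1.2, -0.5, 1.0]
grad = grad_first_weight(2.0, weights)
```

Since $\sigma'(a) \le 0.25$ everywhere, each layer multiplies the gradient by at most $0.25\,|w|$, which is why five layers are already enough to make `grad` several orders of magnitude smaller than the output.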

Vanishing Gradient

• Forward pass

$a^{(h+1)} = W^{(h)} z^{(h)}$

$z^{(h+1)} = \sigma\left(a^{(h+1)}\right)$

$z^{(0)} = x$

• Backward pass

$\frac{\partial \ell}{\partial z^{(h)}} = \left(W^{(h)}\right)^T \frac{\partial \ell}{\partial a^{(h+1)}}$

$\frac{\partial \ell}{\partial a^{(h)}} = \frac{\partial \ell}{\partial z^{(h)}} \odot \sigma'\left(a^{(h)}\right)$

Vanishing gradient

• Consequence:

$\frac{\partial \ell}{\partial z^{(h)}} = \left(W^{(h)}\right)^T \left(\sigma'\left(a^{(h+1)}\right) \odot \frac{\partial \ell}{\partial z^{(h+1)}}\right)$

• The gradient "vanishes" exponentially with the depth of the network if the weights are ill-conditioned or the activations lie in the saturation region of $\sigma$.

Hidden layers

• ReLU

$l$: the loss function… defining the network goal

$l$: the loss function

• $g$ is a nonlinear algebraic operator
• The operator is parametric with respect to the weights:
  • The matrix $W$ and the bias $b$
• The learning phase aims to find the best values of $W$ and $b$

$l$: the loss function

• Optimization problem
• What is the desired output?
• When does it differ from the output produced?

$l$: the loss function

• The loss measures the discrepancy between the predicted output and the desired one
• The objective function:

$\underset{W,B}{\operatorname{argmin}}\ \frac{1}{n}\sum_{i=1}^{n} loss\big(y_i, g(x_i|W,B)\big)$

$l$: the loss function

• If the output is a class:
  • Binary Cross Entropy (BCE): $y_i \in \{0, 1\}$, $g(x_i|W,B) \in [0, 1]$
    $BCE = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \ln g(x_i|W,B) + (1 - y_i)\ln\big(1 - g(x_i|W,B)\big)\Big]$
  • Categorical Cross Entropy (CCE): $K$ classes, $y_{i,k} \in \{0, 1\}$, $g(x_i|W,B)_k \in [0, 1]$
    $CCE = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} y_{i,k} \ln g(x_i|W,B)_k$
  • Hinge: $y_i \in \{-1, 1\}$
    $Hinge = \frac{1}{n}\sum_{i=1}^{n} \max\big(0, 1 - y_i \cdot g(x_i|W,B)\big)$
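The three losses translate almost literally into code. In this sketch the network outputs $g(x_i|W,B)$ are passed in as plain lists; the function names and the sample values are illustrative, and the cross entropies assume that predicted probabilities are strictly between 0 and 1 wherever a logarithm is taken.

```python
import math

def bce(y_true, y_pred):
    """Binary cross entropy; labels in {0, 1}, predictions in (0, 1)."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / n

def cce(y_true, y_pred):
    """Categorical cross entropy; y_true rows are one-hot, y_pred rows are probabilities."""
    n = len(y_true)
    return -sum(y_k * math.log(p_k)
                for y, p in zip(y_true, y_pred)
                for y_k, p_k in zip(y, p) if y_k > 0) / n

def hinge(y_true, scores):
    """Hinge loss; labels in {-1, 1}, real-valued scores."""
    n = len(y_true)
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores)) / n

loss_bce = bce([1, 0], [0.9, 0.2])
loss_cce = cce([[1, 0, 0], [0, 1, 0]], [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]])
loss_hinge = hinge([1, -1], [2.0, 0.5])
```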

$l$: the loss function

• BCE:
  • Binary classes
  • Weighs all errors equally
• CCE:
  • Multiple classes
  • Weighs all errors equally
• Hinge:
  • Binary classes
  • Weighs all errors equally
  • Not differentiable
  • Penalizes low-confidence predictions
    • Close to 0 when the signs agree and the magnitude of the prediction is close to 1

$o$: the optimizer… finding optimal solutions

$o$: the optimizer

• Optimization problem

$\underset{W,B}{\operatorname{argmin}}\ \frac{1}{n}\sum_{i=1}^{n} loss\big(y_i, g(x_i|W,B)\big)$

• Stochastic Gradient Descent
  • And its variants
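A minimal sketch of stochastic gradient descent, assuming a one-dimensional linear model $g(x|w,b) = w \cdot x + b$ and a squared loss so that the gradient can be written by hand. The data, the learning rate `eta` and the number of epochs are hypothetical.

```python
import random

def sgd(data, eta=0.05, epochs=200, seed=0):
    """Fit g(x | w, b) = w * x + b with squared loss, one example per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)                 # visit the examples in random order
        for x, y in data:
            err = (w * x + b) - y         # derivative of 0.5 * err**2 w.r.t. the prediction
            w -= eta * err * x
            b -= eta * err
    return w, b

# Noise-free samples of y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in [-1.0, -0.5, 0.0, 0.5, 1.0]]
w, b = sgd(list(data))
```

Each update uses the gradient of the loss on a single example instead of the full average, which is what makes the method stochastic.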

$o$: the optimizer

• More control over the updates
• Momentum update
  $W^*_{t+1} = m \cdot W^*_t - \eta \nabla l_W$
  $W_{t+1} = W_t + W^*_{t+1}$
• Annealing
  • Adaptive learning rate
  • E.g. exponential decay: $\lambda_t = \lambda_0 \cdot e^{-kt}$
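Momentum and annealing can be combined in one loop. The sketch below assumes a scalar parameter, reads the momentum update as a velocity that accumulates past gradients, and decays the learning rate exponentially; the names `eta0`, `m`, `k` and the quadratic test function are illustrative.

```python
import math

def momentum_descent(grad, w0, eta0=0.1, m=0.9, k=0.01, steps=300):
    """Gradient descent with momentum and an exponentially decayed learning rate."""
    w, velocity = w0, 0.0
    for t in range(steps):
        eta_t = eta0 * math.exp(-k * t)            # annealing: exponential decay
        velocity = m * velocity - eta_t * grad(w)  # momentum update
        w = w + velocity
    return w

# Minimize (w - 3)^2, whose gradient is 2 * (w - 3)
w_min = momentum_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```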

$o$: the optimizer

• Variants of SGD
• They take into account
  • Reachability
  • Convergence speed
  • Overfitting

$o$: the optimizer

• Variants of SGD
  • AdaGrad

$W^*_{t+1} = W^*_t - \frac{\eta}{\sqrt{\epsilon \cdot I + \mathrm{diag}\left(\nabla l_W \cdot \nabla l_W^T\right)}}\, \nabla l_W$

• Weights with a large gradient get a reduced learning rate
• Weights with a small gradient get an amplified learning rate
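A per-coordinate sketch of AdaGrad. It assumes the usual formulation, in which the squared gradients are accumulated over all past steps; the function name, hyperparameters and test problem are illustrative.

```python
import math

def adagrad(grad, w0, eta=0.5, eps=1e-8, steps=500):
    """Per-coordinate AdaGrad: the step shrinks as squared gradients accumulate."""
    w = list(w0)
    accum = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        for i, g_i in enumerate(g):
            accum[i] += g_i * g_i                          # running sum of squared gradients
            w[i] -= eta / math.sqrt(eps + accum[i]) * g_i  # large gradients get smaller steps
    return w

# Quadratic bowl w0^2 + 100 * w1^2: very different curvature per coordinate
grad = lambda w: [2.0 * w[0], 200.0 * w[1]]
w = adagrad(grad, [1.0, 1.0])
```

Dividing by the accumulated gradient magnitude makes the two coordinates progress at a similar pace even though their raw gradients differ by a factor of 100.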

$o$: the optimizer

• Variants of SGD
  • RMSprop

$\zeta_{t+1} = \alpha \cdot \zeta_t + (1 - \alpha) \cdot (\nabla l_W)^2$

$W^*_{t+1} = W^*_t - \frac{\eta}{\sqrt{\epsilon \cdot I + \zeta_{t+1}}}\, \nabla l_W$

• Softens AdaGrad's aggressive reduction of the learning rate
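A per-coordinate sketch of RMSprop: the running sum of AdaGrad is replaced by an exponential moving average $\zeta$ of the squared gradients. Hyperparameter values and the test problem are illustrative assumptions.

```python
import math

def rmsprop(grad, w0, eta=0.01, alpha=0.9, eps=1e-8, steps=2000):
    """Per-coordinate RMSprop with a moving average of squared gradients."""
    w = list(w0)
    zeta = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        for i, g_i in enumerate(g):
            zeta[i] = alpha * zeta[i] + (1.0 - alpha) * g_i * g_i
            w[i] -= eta / math.sqrt(eps + zeta[i]) * g_i
    return w

# Quadratic bowl w0^2 + 100 * w1^2
grad = lambda w: [2.0 * w[0], 200.0 * w[1]]
w = rmsprop(grad, [1.0, 1.0])
```

Because old gradients are forgotten, the effective learning rate does not shrink monotonically as it does in AdaGrad; with a constant `eta` the iterates end up hovering in a small neighbourhood of the minimum rather than converging exactly.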

$o$: the optimizer

• Variants of SGD
  • Adam

$\zeta_{t+1} = \alpha \cdot \zeta_t + (1 - \alpha) \cdot (\nabla l_W)^2 \;\rightarrow\; \zeta^*_{t+1} = \frac{\zeta_{t+1}}{1 - \alpha^{t+1}}$

$m_{t+1} = \beta \cdot m_t + (1 - \beta) \cdot \nabla l_W \;\rightarrow\; m^*_{t+1} = \frac{m_{t+1}}{1 - \beta^{t+1}}$

$W^*_{t+1} = W^*_t - \frac{\eta \cdot m^*_{t+1}}{\sqrt{\epsilon \cdot I + \zeta^*_{t+1}}}$

• RMSprop with smoothing of the gradient
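A per-coordinate sketch of Adam, assuming that the bias-corrected estimates are the ones used in the weight update, as in the standard formulation of the method. Hyperparameter values and the test problem are illustrative.

```python
import math

def adam(grad, w0, eta=0.05, alpha=0.999, beta=0.9, eps=1e-8, steps=2000):
    """Per-coordinate Adam with bias-corrected first and second moments."""
    w = list(w0)
    zeta = [0.0] * len(w)   # moving average of squared gradients
    m = [0.0] * len(w)      # moving average of gradients
    for t in range(1, steps + 1):
        g = grad(w)
        for i, g_i in enumerate(g):
            zeta[i] = alpha * zeta[i] + (1.0 - alpha) * g_i * g_i
            m[i] = beta * m[i] + (1.0 - beta) * g_i
            zeta_hat = zeta[i] / (1.0 - alpha ** t)   # bias correction
            m_hat = m[i] / (1.0 - beta ** t)
            w[i] -= eta * m_hat / math.sqrt(eps + zeta_hat)
    return w

# Quadratic bowl w0^2 + 100 * w1^2
grad = lambda w: [2.0 * w[0], 200.0 * w[1]]
w = adam(grad, [1.0, 1.0])
```

The bias correction matters mostly in the first steps, when both moving averages are still close to their zero initial value.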

$o$: the optimizer

• Comparison

(Source: Stanford class CS231n, MIT License, Image credit: Alec Radford)

$i$: the initialization… well begun is half done

$i$: the initialization

• The weights need an initial value
• The initialization has a significant effect on the final result

$i$: the initialization

• Zero initialization
  • Bad
  • All the nodes have the same gradient
  • There is no diversification
  • Symmetry
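The symmetry problem can be observed on a tiny network. The sketch below assumes one hidden layer of three sigmoid units without biases, a linear output and a squared loss; all names and values are illustrative.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gradients(W, v, x, target):
    """Gradients of 0.5 * (y - target)**2 for the network
    z_k = sigmoid(sum_i W[k][i] * x[i]),  y = sum_k v[k] * z_k."""
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]
    y = sum(v_k * z_k for v_k, z_k in zip(v, z))
    delta = y - target
    grad_W = [[delta * v_k * z_k * (1.0 - z_k) * xi for xi in x]
              for v_k, z_k in zip(v, z)]
    grad_v = [delta * z_k for z_k in z]
    return grad_W, grad_v

W0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # three hidden units, all weights zero
v0 = [0.0, 0.0, 0.0]
grad_W, grad_v = gradients(W0, v0, x=[1.0, 2.0], target=1.0)
```

Every hidden unit receives exactly the same gradient, so after any number of updates the three units still compute the same function; breaking this symmetry is the reason for initializing the weights randomly.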

$i$: the initialization

• Random initialization