Advanced Machine Learning
24: Convolutional Neural Networks
Outline for the lecture
- History of CNNs
- Building Blocks
- Skip Connections
- Fully Convolutional Neural Nets
- Semantic Segmentation with Twists
- (even more) Advanced Uses of CNN
Cat's brain 1962 (Hubel and Wiesel)
Fukushima's Neocognitron 1979
Time Delay Neural Network 1989
CNN 1989
CNN 1998
CNN+GPU+MaxPooling 2011
AlexNet 2012
Convolving a kernel with an image
\[
\left(
\begin{array}{ccc}
0 & 1 & 2 \\
2 & 2 & 0 \\
0 & 1 & 2 \\
\end{array}
\right)
\]
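To make the sliding-window operation concrete, here is a minimal sketch in plain Python that applies the 3×3 kernel above to a small image. The 5×5 image values are made up for illustration, and the sketch assumes stride 1, no padding, and no kernel flip (the cross-correlation convention most deep learning libraries call "convolution").

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the kernel with the window at (i, j), summed.
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
    return out


# Kernel from the slide.
kernel = [[0, 1, 2],
          [2, 2, 0],
          [0, 1, 2]]

# Hypothetical 5x5 input image (illustrative values only).
image = [[3, 3, 2, 1, 0],
         [0, 0, 1, 3, 1],
         [3, 1, 2, 2, 3],
         [2, 0, 0, 2, 2],
         [2, 0, 0, 0, 1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[12, 12, 17], [10, 17, 19], [9, 6, 14]]
```

A 5×5 input and a 3×3 kernel give a 3×3 feature map: the output shrinks by kernel size minus one along each axis.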
Padding and symmetries
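A short sketch of zero padding and its effect on output size. The helper names `zero_pad` and `conv_output_size` are hypothetical, and the size formula assumes a square input of side `n`, a square kernel of side `k`, padding `p` on every border, and stride `s`.

```python
def zero_pad(image, p):
    """Surround `image` (a list of rows) with a border of `p` zeros."""
    width = len(image[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom


def conv_output_size(n, k, p=0, s=1):
    """Side length of the output: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


padded = zero_pad([[1, 2], [3, 4]], 1)
print(padded)                         # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
print(conv_output_size(5, 3))         # 3: no padding shrinks the map
print(conv_output_size(5, 3, p=1))    # 5: padding of 1 keeps the size for a 3x3 kernel
```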
How do the channels look?
Pooling: maxpooling
Pooling: average
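Max and average pooling differ only in how each window is reduced. The sketch below assumes non-overlapping square windows (stride equal to window size); the 4×4 input is made up for illustration.

```python
def pool2d(image, size, reduce_fn):
    """Reduce each non-overlapping size x size window with `reduce_fn`."""
    out = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map.
x = [[1, 3, 2, 4],
     [5, 6, 1, 2],
     [7, 2, 9, 0],
     [1, 4, 3, 5]]

max_pooled = pool2d(x, 2, max)
avg_pooled = pool2d(x, 2, mean)
print(max_pooled)  # [[6, 4], [7, 9]]
print(avg_pooled)  # [[3.75, 2.25], [3.5, 4.25]]
```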
How do we produce a class prediction?
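One common route, sketched here as an assumption rather than as the exact architecture on the slide: flatten the final feature maps, apply a fully connected layer to get one score per class, and turn the scores into probabilities with a softmax. All weights and feature values below are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_maps, weights, biases):
    """Flatten [C][H][W] feature maps, apply a dense layer, then softmax."""
    flat = [v for fmap in feature_maps for row in fmap for v in row]
    logits = [sum(w * v for w, v in zip(w_row, flat)) + b
              for w_row, b in zip(weights, biases)]
    return softmax(logits)


# Hypothetical final feature maps: 2 channels of size 2x2 (8 values once flattened).
feature_maps = [[[1.0, 0.0], [0.0, 1.0]],
                [[0.5, 0.5], [0.5, 0.5]]]
# Hypothetical dense layer for 3 classes.
weights = [[0.1] * 8, [-0.1] * 8, [0.0] * 8]
biases = [0.0, 0.0, 0.0]

probs = classify(feature_maps, weights, biases)
predicted_class = probs.index(max(probs))
print(predicted_class)  # 0
```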
1×1 convolution
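Assuming this slide refers to a convolution with a 1×1 kernel, the operation mixes channels at each pixel independently and leaves the spatial size unchanged. A minimal sketch with made-up values:

```python
def conv1x1(feature_maps, weights):
    """feature_maps: [C_in][H][W]; weights: [C_out][C_in]. Returns [C_out][H][W]."""
    c_in = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [[sum(w_out[c] * feature_maps[c][i][j] for c in range(c_in))
          for j in range(w)]
         for i in range(h)]
        for w_out in weights
    ]


# Hypothetical input: 3 channels of size 2x2.
x = [[[1, 2], [3, 4]],
     [[10, 20], [30, 40]],
     [[100, 200], [300, 400]]]

# One output channel that simply sums the three input channels.
mixed = conv1x1(x, [[1, 1, 1]])
print(mixed)  # [[[111, 222], [333, 444]]]
```

Going from 3 channels to 1 here shows the typical use: changing the number of channels cheaply without touching height or width.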
Upconvolution
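Upconvolution (also called transposed convolution) increases spatial resolution: each input value stamps a scaled copy of the kernel into the output. The sketch below assumes no padding; the input, kernel, and stride are illustrative choices.

```python
def transposed_conv2d(image, kernel, stride=2):
    """Each input pixel adds `value * kernel` into the output at a strided offset."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - 1) * stride + kh
    out_w = (len(image[0]) - 1) * stride + kw
    out = [[0] * out_w for _ in range(out_h)]
    for i, row in enumerate(image):
        for j, value in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += value * kernel[a][b]
    return out


# With a 2x2 kernel of ones and stride 2, every pixel becomes a 2x2 block.
up = transposed_conv2d([[1, 2], [3, 4]], [[1, 1], [1, 1]], stride=2)
for row in up:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

When the kernel is larger than the stride, neighbouring stamps overlap and their contributions add up.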
Dilated convolution
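A dilated convolution spreads the kernel taps apart by a dilation factor, so the same number of weights covers a larger window. A minimal sketch, assuming stride 1 and no padding; the input values are made up.

```python
def dilated_conv2d(image, kernel, dilation=2):
    """Convolution whose kernel taps are `dilation` pixels apart (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    # Extent covered by the dilated kernel: dilation * (k - 1) + 1.
    span_h = dilation * (kh - 1) + 1
    span_w = dilation * (kw - 1) + 1
    out_h = len(image) - span_h + 1
    out_w = len(image[0]) - span_w + 1
    return [
        [sum(image[i + a * dilation][j + b * dilation] * kernel[a][b]
             for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


# Hypothetical 5x5 image holding the values 0..24 row by row.
image = [[5 * r + c for c in range(5)] for r in range(5)]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

# A 3x3 kernel with dilation 2 spans a 5x5 window, so the output is 1x1.
out = dilated_conv2d(image, ones, dilation=2)
print(out)  # [[108]]
```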
Basic building blocks
- Convolution with a filter
- Zero Padding
- Channels and channel-kernel relationship
- Pooling (max and average)
- Moving from convolution layers to predictions
- 1×1 convolution
- Upconvolution
- Dilated convolution
Skip connections
Dark knowledge
Highway networks (May 2015 on arxiv)
-
$$
\vec{y} = H(\vec{x}, \bm{W}_H)
$$
-
$$
\vec{y} = H(\vec{x}, \bm{W}_H) \odot T(\vec{x}, \bm{W}_T) + \vec{x} \odot C(\vec{x}, \bm{W}_C)
$$
-
$$
\vec{y} = H(\vec{x}, \bm{W}_H) \odot T(\vec{x}, \bm{W}_T) + \vec{x} \odot (1 - T(\vec{x}, \bm{W}_T))
$$
-
$$
\vec{y} =
\left\{
\begin{array}{ll}
\vec{x} & \mbox{if }\;\;T(\vec{x}, \bm{W}_T)=0,\\
H(\vec{x}, \bm{W}_H) & \mbox{if }\;\;T(\vec{x}, \bm{W}_T)=1
\end{array}
\right.
$$
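- What if the untrained transform gate $T$ starts out open ($T \approx 1$)? Every layer then acts like a plain layer $H$, and gradients struggle to flow through a deep stack.
- Initialize the transform gate biases to large negative values, so that $T \approx 0$ and every layer starts out carrying its input through unchanged!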
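The gated update $\vec{y} = H \odot T + \vec{x} \odot (1 - T)$ can be sketched for a single fully connected layer as follows. This is an illustration under stated assumptions: $H$ uses a tanh nonlinearity, $T$ uses a sigmoid, and all weights are made up. The example shows why a negative gate bias makes the layer start out as a near-identity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)), all elementwise."""
    h = [math.tanh(z) for z in affine(W_H, b_H, x)]   # transform H (tanh assumed)
    t = [sigmoid(z) for z in affine(W_T, b_T, x)]     # transform gate T
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# Large negative gate bias: T is close to 0, so the layer carries x through.
carried = highway_layer(x, identity, [0.0, 0.0], zeros, [-10.0, -10.0])
# Large positive gate bias: T is close to 1, so the layer outputs H(x).
transformed = highway_layer(x, identity, [0.0, 0.0], zeros, [10.0, 10.0])
print(carried, transformed)
```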
Train models with hundreds of layers instead of just tens as before
Residual Networks (block)
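A residual block adds its input back to the output of a small stack of layers, $\vec{y} = F(\vec{x}) + \vec{x}$. The sketch below is a simplification: it uses two fully connected layers in place of convolutions, omits batch normalization, and the weights are made up.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def residual_block(x, W1, W2):
    """y = relu(F(x) + x), with F(x) = W2 * relu(W1 * x)."""
    f = linear(W2, relu(linear(W1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, 2.0]

# With all-zero weights F(x) = 0, so the block passes x through the skip path.
zeros = [[0.0, 0.0], [0.0, 0.0]]
skipped = residual_block(x, zeros, zeros)
print(skipped)  # [1.0, 2.0]

# With non-zero weights the block learns a correction on top of x.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.5, 0.0], [0.0, -0.5]]
corrected = residual_block(x, W1, W2)
print(corrected)  # [1.5, 1.0]
```

Unlike the highway layer, the skip path here has no gate: the input is always added in full.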
Residual Networks (full)
Residual Networks (performance)
Error surface effect of skip connection
Dense Networks (architecture)
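In a dense block every layer receives the concatenation of all earlier feature maps, and its own output is appended for the layers that follow. The sketch below shows only this wiring; flat lists stand in for channels, and `sum_layer` is a hypothetical toy layer.

```python
def dense_block(x, layers):
    """Each layer sees every earlier feature; its output is concatenated on."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)  # concatenation, not addition
    return features


def sum_layer(features):
    """Toy layer producing one new feature from everything seen so far."""
    return [sum(features)]


out = dense_block([1, 2], [sum_layer, sum_layer])
print(out)  # [1, 2, 3, 6]
```

The feature count grows with every layer (2, then 3, then 4 here), which is the contrast with a residual block, where the skip path is added and the size stays fixed.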
Dense Networks (effect)
Take Away Concepts
- Skip connections
- Gates
Fully convolutional networks
The task of Semantic segmentation
Semantic segmentation task
Replacing feed forward with convolutional
Fully Convolutional Model (2014)
Examples
Take Away Point
- When the target and the input have the same spatial dimensions, it may be better to use convolutions everywhere.
(even more) "Advanced" uses of CNN
Wavenet: $\ge$16kHz audio
Wavenet: sample by sample
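Sample-by-sample generation requires that the output at time $t$ depends only on samples up to $t$. WaveNet achieves this with causal dilated convolutions; the sketch below is a simplified 1D version (single channel, no gating or residual connections, made-up values), with a helper for the receptive field of a stack of such layers.

```python
def causal_dilated_conv1d(signal, kernel, dilation):
    """out[t] = sum_k kernel[k] * signal[t - k * dilation]; the past is zero-padded."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:          # never look at future or out-of-range samples
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past samples (including the current one) one output can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
response = causal_dilated_conv1d(impulse, [1.0, 1.0], dilation=2)
print(response)                            # [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(receptive_field(2, [1, 2, 4, 8]))    # 16
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is what makes raw audio at 16 kHz or more tractable.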
Wavenet: conditioned on text
[Audio samples: Parametric, Concatenative, and Wavenet speech for the prompts "The blue lagoon..." and "English poetry and ..."]
Deformable Convolutions
Take Away Points
- Masked convolution
- Pixel-based generation
- Deformable convolution (can be rotation invariant)