Advanced Machine Learning

24: Convolutional Neural Networks

Outline for the lecture

  • History of CNNs
  • Building Blocks
  • Skip Connections
  • Fully Convolutional Neural Nets
  • Semantic Segmentation with Twists
  • (even more) Advanced Uses of CNNs

Convolutions: what?

History of CNNs

Cat's brain 1962 (Hubel and Wiesel)

[Figures: recordings from the cat's visual cortex; portrait of Kunihiko Fukushima]

Fukushima's Neocognitron 1979

[Figures: Neocognitron architecture; portrait of Alex Waibel]

Time Delay Neural Network 1989

[Figures: TDNN architecture; portrait of Yann LeCun]

CNN 1989

[Figures: 1989 CNN architecture; portrait of Yann LeCun]

CNN 1998

[Figures: LeNet architecture; portrait of Dan Ciresan]

CNN+GPU+MaxPooling 2011

[Figures: GPU CNN of 2011; portrait of Alex Krizhevsky]

AlexNet 2012

[Figure: AlexNet architecture]

CNN: building blocks

Convolving a kernel with an image

[Animation: sliding a $3\times3$ kernel over an image]

\[ \left( \begin{array}{ccc} 0 & 1 & 2 \\ 2 & 2 & 0 \\ 0 & 1 & 2 \\ \end{array} \right) \]

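A minimal NumPy sketch of what the animation shows: sliding the kernel over the image and taking a dot product at each position. All array names here are illustrative. Note that, like most deep learning libraries, this computes cross-correlation; a mathematical convolution would first flip the kernel.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product between the kernel and the window it currently covers.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])
image = np.arange(25).reshape(5, 5)
print(conv2d(image, kernel).shape)  # (3, 3): a 3x3 kernel shrinks a 5x5 image
```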

Padding and symmetries

[Animations: 'same' and 'full' zero-padding]

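The padding variants map directly onto the `mode` argument of `scipy.signal.convolve2d`; a small sketch of the resulting output shapes (the image and kernel contents are arbitrary):

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(5, 5)
kernel = np.random.rand(3, 3)

for mode in ("valid", "same", "full"):
    print(mode, convolve2d(image, kernel, mode=mode).shape)
# valid (3, 3) -- no padding: the output shrinks
# same  (5, 5) -- zero-pad so the output matches the input size
# full  (7, 7) -- pad until the kernel barely overlaps the image
```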

How do the channels look?

Pooling: max pooling

[Animation: max pooling]

Pooling: average

[Animation: average pooling]
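
Both pooling variants are the same reshape-and-reduce trick with different reduction functions; a sketch for non-overlapping $2\times2$ windows (assumes the input dimensions are divisible by 2):

```python
import numpy as np

def pool2x2(x, reduce_fn):
    """Non-overlapping 2x2 pooling: reshape into 2x2 blocks, then reduce each block."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return reduce_fn(blocks, axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]], dtype=float)
print(pool2x2(x, np.max))   # [[7. 8.] [9. 6.]]
print(pool2x2(x, np.mean))  # [[4.  5. ] [4.5 3. ]]
```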

How do we produce a class prediction?
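
One common answer, sketched here as an assumption rather than necessarily what the slide shows: flatten the final feature maps and feed them through a fully connected layer that outputs one score per class.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 64 feature maps of size 7x7 going into 10 classes.
head = nn.Sequential(
    nn.Flatten(),              # (N, 64, 7, 7) -> (N, 64*7*7)
    nn.Linear(64 * 7 * 7, 10)  # one score per class; softmax is applied inside the loss
)
features = torch.randn(8, 64, 7, 7)
print(head(features).shape)  # torch.Size([8, 10])
```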

One-by-one ($1\times1$) convolution

[Animation: $1\times1$ convolution]
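
A $1\times1$ convolution mixes channels at every spatial position without looking at neighbours; a minimal PyTorch sketch (all shapes are illustrative):

```python
import torch
import torch.nn as nn

mix = nn.Conv2d(in_channels=256, out_channels=64, kernel_size=1)  # channel reduction
x = torch.randn(1, 256, 32, 32)
print(mix(x).shape)  # torch.Size([1, 64, 32, 32]): spatial size untouched
```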

Upconvolution

[Animation: transposed ('up') convolution]
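
Upconvolution (transposed convolution) goes the other way, trading channels for spatial resolution; a sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(in_channels=16, out_channels=8, kernel_size=2, stride=2)
x = torch.randn(1, 16, 14, 14)
print(up(x).shape)  # torch.Size([1, 8, 28, 28]): spatial size doubled
```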

Dilated convolution

[Animation: dilated convolution]
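
A dilated convolution inserts holes between kernel taps, growing the receptive field without adding weights; a sketch (sizes are illustrative):

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation 2 covers a 5x5 window using only 9 weights.
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2, padding=2)  # padding keeps size fixed
x = torch.randn(1, 1, 32, 32)
print(dilated(x).shape)  # torch.Size([1, 1, 32, 32])
```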

Play with a simulator

  • Video
  • Demo
  • GitHub

Basic building blocks

  1. Convolution with a filter
  2. Zero Padding
  3. Channels and channel-kernel relationship
  4. Pooling (max and average)
  5. Moving from convolution layers to predictions
  6. One-by-one ($1\times1$) convolution
  7. Upconvolution
  8. Dilated convolution

Skip connections

Dark knowledge


Highway networks (May 2015 on arXiv)

  • $$ \vec{y} = H(\vec{x}, \bm{W}_H) $$
  • $$ \vec{y} = H(\vec{x}, \bm{W}_H) \odot T(\vec{x}, \bm{W}_T) + \vec{x} \odot C(\vec{x}, \bm{W}_C) $$
  • $$ \vec{y} = H(\vec{x}, \bm{W}_H) \odot T(\vec{x}, \bm{W}_T) + \vec{x} \odot (1 - T(\vec{x}, \bm{W}_T)) $$
  • $$ \vec{y} = \left\{ \begin{array}{ll} \vec{x} & \mbox{if }\;\;T(\vec{x}, \bm{W}_T)=0,\\ H(\vec{x}, \bm{W}_H) & \mbox{if }\;\;T(\vec{x}, \bm{W}_T)=1 \end{array} \right. $$
  • What if an untrained gate is always open and does not let gradients flow?
  • Initialize the gate biases to large negative values!

Result: highway networks train models with hundreds of layers, instead of the ~10 feasible before.
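
A minimal sketch of a highway layer with the negative gate-bias initialization; the layer width, the choice of ReLU for $H$, and the bias value $-3$ are illustrative assumptions:

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    def __init__(self, dim, gate_bias=-3.0):
        super().__init__()
        self.H = nn.Linear(dim, dim)  # transform
        self.T = nn.Linear(dim, dim)  # transform gate
        # Negative bias => T(x) ~ 0 at init, so the layer starts close to identity.
        nn.init.constant_(self.T.bias, gate_bias)

    def forward(self, x):
        t = torch.sigmoid(self.T(x))
        return torch.relu(self.H(x)) * t + x * (1 - t)  # carry gate C = 1 - T

layer = HighwayLayer(64)
x = torch.randn(8, 64)
print(layer(x).shape)  # torch.Size([8, 64])
```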

Residual Networks (block)

[Figure: residual block]
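
A sketch of the basic two-convolution residual block; keeping the channel count fixed (so no projection on the shortcut) is a simplifying assumption:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)  # skip connection: y = F(x) + x

block = ResidualBlock(32)
print(block(torch.randn(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])
```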

Residual Networks (full)

[Figure: full ResNet architecture]

Residual Networks (performance)

[Figure: ResNet performance]

Error surface effect of skip connection

[Figure: loss landscape with and without skip connections]

Dense Networks (architecture)

[Figure: DenseNet architecture]

Dense Networks (effect)

[Figure: effect of dense connections]
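
In a dense block every layer receives the concatenation of all earlier feature maps instead of a sum; a sketch where the growth rate and depth are illustrative:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth=12, layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_channels + i * growth, growth, 3, padding=1)
            for i in range(layers)
        )

    def forward(self, x):
        for conv in self.layers:
            x = torch.cat([x, torch.relu(conv(x))], dim=1)  # concatenate, don't add
        return x

block = DenseBlock(16)
print(block(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 64, 8, 8]): 16 + 4*12
```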

Take Away Concepts

  1. Skip connections
  2. Gates

Fully convolutional networks

The task of semantic segmentation

[Figure: semantic segmentation example]

Replacing feed-forward layers with convolutional ones

[Animations: converting fully connected layers into convolutions]
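
The conversion is mechanical: a fully connected layer over a $7\times7$ feature map is exactly a $7\times7$ convolution, and every later FC layer becomes a $1\times1$ convolution. The result slides over inputs of any size and emits a score map instead of a single vector. A sketch; channel counts are shrunk for readability and the 21 output classes are an assumption:

```python
import torch
import torch.nn as nn

# FC head: only accepts a fixed 7x7 input.
fc_head = nn.Sequential(nn.Flatten(),
                        nn.Linear(128 * 7 * 7, 256), nn.ReLU(),
                        nn.Linear(256, 21))

# Same computation rewritten with convolutions: works on any input size.
conv_head = nn.Sequential(nn.Conv2d(128, 256, kernel_size=7), nn.ReLU(),
                          nn.Conv2d(256, 21, kernel_size=1))

x = torch.randn(1, 128, 14, 14)  # larger than the 7x7 the FC head expects
print(conv_head(x).shape)        # torch.Size([1, 21, 8, 8]): a coarse map of class scores
```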

Fully Convolutional Model (2014)

[Figure: fully convolutional model]

Examples

[Figure: example segmentations]

Take Away Point

  1. When the target and the input have the same spatial dimensions, it may be better to use convolutions everywhere.

Semantic segmentation with twists

Deep learning standard: U-Net

[Figure: U-Net architecture]
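
U-Net's defining trick is concatenating encoder features onto the matching decoder stage. A heavily reduced sketch with a single down/up level; all sizes, and the single-level depth, are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(1, 16, 3, padding=1)
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Conv2d(16, 16, 3, padding=1)
        self.up = nn.ConvTranspose2d(16, 16, 2, stride=2)
        self.dec = nn.Conv2d(32, 2, 3, padding=1)  # 32 = 16 skip + 16 upsampled

    def forward(self, x):
        e = torch.relu(self.enc(x))                # encoder features
        m = torch.relu(self.mid(self.down(e)))     # bottleneck at half resolution
        u = self.up(m)                             # back to full resolution
        return self.dec(torch.cat([u, e], dim=1))  # the skip connection

net = TinyUNet()
print(net(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```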

Comparison on the brain segmentation task

[Figure: segmentation comparison]

State of the art: FreeSurfer

[Figure: FreeSurfer segmentation]

Meshnet

[Demos: in-browser segmentation over WebSocket]

  • 72,516 vs. 23,523,355 parameters
  • 600 KB vs. 2 GB model size
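
MeshNet's tiny parameter count comes from replacing the encoder-decoder with a flat stack of dilated 3D convolutions that never change the volume size. A MeshNet-style sketch; the exact published layer widths and dilation schedule are not reproduced here:

```python
import torch
import torch.nn as nn

def meshnet_like(in_ch=1, n_classes=3, width=21, dilations=(1, 1, 1, 2, 4, 8, 1)):
    """Flat stack of dilated 3D convs; 'same'-style padding keeps the volume size fixed."""
    layers, ch = [], in_ch
    for d in dilations:
        layers += [nn.Conv3d(ch, width, 3, padding=d, dilation=d), nn.ReLU(inplace=True)]
        ch = width
    layers.append(nn.Conv3d(ch, n_classes, 1))  # per-voxel class scores
    return nn.Sequential(*layers)

net = meshnet_like()
x = torch.randn(1, 1, 38, 38, 38)                # e.g. a subvolume of a brain MRI
print(net(x).shape)                              # torch.Size([1, 3, 38, 38, 38])
print(sum(p.numel() for p in net.parameters()))  # tens of thousands, not millions
```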

(often) better than the teacher


Multimodal input is straightforward


better than the human (sometimes)


better than U-net


(even more) "Advanced" uses of CNN

Masked Convolutions

PixelCNNs
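
The PixelCNN mask zeroes kernel weights at and after the current pixel (in raster order), so a prediction can never see the pixel it is generating. A sketch of a type-'A' spatial mask; the full PixelCNN additionally masks across channels, which is omitted here:

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv whose kernel is zeroed at the centre and everywhere after it (mask 'A')."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2:] = 0   # centre pixel and everything to its right
        mask[kh // 2 + 1:, :] = 0     # all rows below the centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.conv2d(x, self.weight * self.mask, self.bias,
                                    self.stride, self.padding)

conv = MaskedConv2d(1, 8, kernel_size=5, padding=2)
print(conv(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 8, 28, 28])
```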

Wavenet: $\ge$16kHz audio


Wavenet: sample by sample

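Sample-by-sample generation relies on causal dilated convolutions: output $t$ may only depend on inputs $\le t$, and stacking dilations 1, 2, 4, ... doubles the receptive field per layer. A sketch without the gating and residual connections the real WaveNet adds; channel count and depth are illustrative:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D conv that pads only on the left, so output t sees inputs <= t."""
    def __init__(self, ch, dilation):
        super().__init__()
        self.pad = dilation  # (kernel_size - 1) * dilation, with kernel_size = 2
        self.conv = nn.Conv1d(ch, ch, kernel_size=2, dilation=dilation)

    def forward(self, x):
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

stack = nn.Sequential(*[CausalConv1d(16, d) for d in (1, 2, 4, 8, 16)])
x = torch.randn(1, 16, 1000)  # 16 channels over 1000 audio samples
print(stack(x).shape)          # torch.Size([1, 16, 1000]); receptive field = 32 samples
```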

Wavenet: conditioned on text

Model "The blue lagoon..."
Parametric
Concatenative
Wavenet
Model "English poetry and ..."
Parametric
Concatenative
Wavenet

Deformable Convolutions

[Figures: deformable convolution sampling grids]
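
torchvision ships an implementation; the usual pattern is to predict per-position sampling offsets with a regular convolution and feed them to the deformable one. A sketch with illustrative sizes (the offset tensor needs $2 \cdot k \cdot k$ channels: an x and y shift per kernel tap):

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

k = 3
offset_pred = nn.Conv2d(16, 2 * k * k, kernel_size=k, padding=1)  # learned (dx, dy) per tap
deform = DeformConv2d(16, 32, kernel_size=k, padding=1)

x = torch.randn(1, 16, 24, 24)
offsets = offset_pred(x)         # zero-initialized in the paper; random here
print(deform(x, offsets).shape)  # torch.Size([1, 32, 24, 24])
```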

Take Away Points

  1. Masked convolution
  2. Pixel-based generation
  3. Deformable convolution (can be rotation invariant)