This is a summary of the Coursera course Convolutional Neural Networks.
How convolution works
For horizontal edge detection, we can use the filter
$\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}$
How the edge detector works is clear from the following figure:
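In addition to the figure, here is a minimal numpy sketch of a valid convolution applying the horizontal edge filter to a toy image (the helper name and the test image are my own, not from the course):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid convolution (really cross-correlation, as in most DL frameworks)."""
    n, _ = image.shape
    f, _ = kernel.shape
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# Bright top half, dark bottom half -> strong response along the horizontal edge.
image = np.vstack([np.full((3, 6), 10.0), np.zeros((3, 6))])
horizontal_edge = np.array([[ 1,  1,  1],
                            [ 0,  0,  0],
                            [-1, -1, -1]], dtype=float)
print(conv2d_valid(image, horizontal_edge))
```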
Equations of Size
- Image: $n \times n$
- Filter: $f \times f$
Then:
- Output: $(n - f + 1) \times (n - f + 1)$
With Padding
If the image has a padding of $p$ on all sides, then:
- Output: $(n + 2p - f + 1) \times (n + 2p - f + 1)$
With Stride
Instead of sliding over every pixel, hop $s$ pixels at a time.
- Output: $\left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor$
Padding Types
- Valid: No padding
- Same: the idea is to make the output size equal to the input size. Using the equation above with $s = 1$, the required padding is $p = \frac{f - 1}{2}$ (see the sketch below).
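A small Python sketch (my own helper, not from the course) that applies the output-size formula and the 'same' padding rule:

```python
from math import floor

def conv_output_size(n, f, p=0, s=1):
    """Output side length for an n x n input, f x f filter, padding p, stride s."""
    return floor((n + 2 * p - f) / s + 1)

def same_padding(f):
    """Padding that keeps the output size equal to the input size (odd f, s = 1)."""
    return (f - 1) // 2

print(conv_output_size(6, 3))            # valid: 4
print(conv_output_size(6, 3, p=1))       # padded: 6
print(conv_output_size(7, 3, p=0, s=2))  # stride 2: 3
print(same_padding(3))                   # 1
```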
Convolution over Volume
- Note: the number of channels in the image must match the number of channels in the filter.
- At each position, we sum over the filter's height, width, and all channels, so one filter produces a single-channel output (see the sketch below).
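A minimal numpy sketch of convolution over volume; the function and variable names are illustrative, not from the course:

```python
import numpy as np

def conv_volume(image, kernel):
    """One multi-channel filter over a multi-channel image -> one output channel."""
    n_h, n_w, n_c = image.shape
    f, _, kc = kernel.shape
    assert n_c == kc, "filter channels must match image channels"
    out = np.zeros((n_h - f + 1, n_w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sum over height, width, and channels of the filter window.
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))   # e.g. an RGB patch
kernel = rng.standard_normal((3, 3, 3))  # 3x3 filter with 3 channels
print(conv_volume(image, kernel).shape)  # (4, 4)
```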
Multiple Filters
- Applying $n_C'$ different filters and stacking their outputs gives a result of size $(n - f + 1) \times (n - f + 1) \times n_C'$.
Example of a CNN
- Note: each convolutional layer also contains a bias term and a non-linear activation (e.g. ReLU).
Below is an example of a simple CNN:
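Alongside the figure, here is a hedged Keras sketch of a small CNN with the typical CONV → POOL → CONV → POOL → FC → softmax pattern (the layer sizes are illustrative, not the exact network from the figure):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, kernel_size=5, strides=1, padding="valid", activation="relu"),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(16, kernel_size=5, strides=1, padding="valid", activation="relu"),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```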
Layers in a CNN
- Convolutional Layer: convolution with a filter (the number of channels of the filter and of the input must match); the input might be padded and the stride might be greater than 1. As in a typical NN, the convolutional layer also adds a bias and applies a non-linear activation.
- Pooling Layer
- Fully Connected Layer
- 1x1 Convolution
Pooling Layer
- Has two hyperparameters: stride $s$ and filter size $f$
- The pooling layer has no parameters to learn
- Works well in CNNs, although why it works so well is not fully understood
- Because there is nothing to learn, it is very cheap
Size
Input: $n_H \times n_W \times n_C$
Hyperparameters: $f$, $s$
Output: $\left\lfloor \frac{n_H - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n_W - f}{s} + 1 \right\rfloor \times n_C$
Note: the number of input channels and the number of output channels are the same, because pooling is applied to each channel independently.
Example: Max Pool
Example: Average Pool
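To complement the figures, a small numpy sketch of 2x2 max and average pooling with stride 2 on a single channel (the toy input and helper name are my own):

```python
import numpy as np

def pool2d(x, f=2, s=2, mode="max"):
    """Pool one channel with an f x f window and stride s."""
    n_h, n_w = x.shape
    out_h, out_w = (n_h - f) // s + 1, (n_w - f) // s + 1
    out = np.zeros((out_h, out_w))
    reduce_fn = np.max if mode == "max" else np.mean
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = reduce_fn(x[i * s:i * s + f, j * s:j * s + f])
    return out

x = np.array([[1, 3, 2, 1],
              [1, 9, 1, 1],
              [5, 6, 1, 2],
              [7, 8, 2, 3]], dtype=float)
print(pool2d(x, mode="max"))  # [[9. 2.] [8. 3.]]
print(pool2d(x, mode="avg"))  # [[3.5  1.25] [6.5  2.  ]]
```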
Different CNN Architectures
- Classic
  - LeNet-5 (LeCun, 1998)
  - AlexNet (2012)
  - VGG-16 (2015), VGG-19
- Recent
  - ResNet (very deep, 152 layers) (2015)
  - Inception (uses 1x1 convolutions) (2014)
1x1 Convolution
(Network in Network, Lin et al., 2013)
- Adds non-linearity to the network
- Helps reduce the number of channels when there are too many
An example of a 1x1 convolution on a 1-channel image is as follows:
It looks as if it just multiplies the image by a constant. However, with multiple channels, we can think of it as a fully connected layer applied across the channels, i.e. each output value is a weighted combination of the $n_C$ channel values at that position, as shown in the figure below:
Example use case for reducing the number of channels:
Here, we have 32 1x1 conv filters, each of dimension 1x1x192 (192 because the number of channels of the input and of the filter must match).
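A hedged Keras sketch of this channel-reduction use case, shrinking a 28x28x192 volume to 28x28x32 with 32 filters of size 1x1x192:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 28, 28, 192))  # a batch of one 28x28x192 volume
conv_1x1 = layers.Conv2D(filters=32, kernel_size=1, activation="relu")
print(conv_1x1(x).shape)                # (1, 28, 28, 32)
```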
LeNet
AlexNet
- Similar to LeNet, but much bigger (~60M parameters compared to ~60K)
VGG-16
- Simplified architecture that uses the same kinds of operations over and over (see the sketch after this list).
- Main downside: it is a pretty big network, with ~138M parameters.
- The two operations are:
  - Convolution: 3x3 filters, $s = 1$, 'same' padding
  - Pooling: 2x2, $s = 2$
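A hedged Keras sketch of this pattern; the filter counts below follow the first two VGG-16 stages (64, then 128), and the remaining stages and fully connected layers are omitted:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

vgg_start = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
    layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2, strides=2),   # 224 -> 112
    layers.Conv2D(128, 3, strides=1, padding="same", activation="relu"),
    layers.Conv2D(128, 3, strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2, strides=2),   # 112 -> 56
    # ... further CONV/POOL stages and the fully connected layers follow
])
vgg_start.summary()
```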
ResNet
- The issue with very deep networks is the exploding/vanishing gradient problem. ResNet uses 'skip connections' (or 'shortcuts') to help with it: the activation from an earlier layer is added to the output of a later layer before its activation, i.e. $a^{[l+2]} = g(z^{[l+2]} + a^{[l]})$, as sketched below.
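A hedged Keras sketch of a residual block (filter count and input shape are illustrative); 'same' padding keeps the shapes compatible for the addition:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                               # a[l]
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)           # z[l+2]
    y = layers.Add()([y, shortcut])                            # z[l+2] + a[l]
    return layers.Activation("relu")(y)                        # a[l+2]

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
```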
Inception Net
Two key ideas:
- Inception block: apply several operations (e.g. 1x1, 3x3, and 5x5 convolutions and max pooling) in parallel and concatenate their outputs ('try out everything you want')
- Bottleneck layer: reduces computational cost. E.g. convolving a 28x28x192 volume directly with 32 5x5 filters ('same' padding, giving a 28x28x32 output)
needs 120M multiplications. In contrast, using a bottleneck layer (a 1x1 convolution down to 28x28x16 followed by the 5x5 convolution up to 28x28x32), as shown below:
only needs 12.5M multiplications.
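A quick back-of-the-envelope check of these counts in Python, counting one multiplication per filter weight per output value:

```python
def conv_cost(out_h, out_w, out_c, f, in_c):
    """Multiplications = (number of output values) x (f * f * in_c per value)."""
    return out_h * out_w * out_c * f * f * in_c

# Direct: 28x28x192 -> 5x5 conv, 32 filters -> 28x28x32
direct = conv_cost(28, 28, 32, 5, 192)
# Bottleneck: 1x1 conv down to 16 channels, then 5x5 conv up to 32 channels
bottleneck = conv_cost(28, 28, 16, 1, 192) + conv_cost(28, 28, 32, 5, 16)

print(f"{direct / 1e6:.1f}M")      # ~120.4M
print(f"{bottleneck / 1e6:.1f}M")  # ~12.4M
```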