convolutional neural network

What is a CNN (Convolutional Neural Network)?

Convolution Formula

It is a Neural Network built on the convolution operation.

A convolution combines two functions through integration: it shows you how one function modifies the shape of the other.
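
For reference, here is the standard textbook definition of a convolution, followed by the discrete form that applies to images. The symbols f, g, t and τ, and I for the image and K for the kernel, are notation chosen for this sketch and are not taken from the original figure.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

(I * K)(x, y) = \sum_{i} \sum_{j} I(x - i,\, y - j)\, K(i, j)
```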

Image to Classify

First the Machine translates the image into numbers: for a black and white image, each pixel becomes a 0 or a 1.


Like humans, the Machine looks for specific features to recognize the image.

We apply a Feature Detector (also called a Filter or Kernel) to our Input Image and get a Feature Map (also called a Convolved Feature or Activation Map), which records where, and how strongly, that feature appears in the Input Image.
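
As a minimal sketch of that step, the snippet below slides a 3x3 Feature Detector over a 5x5 black and white image. The function name `convolve2d`, the image and the diagonal detector are all made up for illustration. Like most CNN libraries, it multiplies and sums without flipping the kernel, and it uses no padding.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over the image (no padding) and return the feature map."""
    k = len(kernel)
    out_rows = (len(image) - k) // stride + 1
    out_cols = (len(image[0]) - k) // stride + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the kernel with the patch underneath it and sum the result.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 input image (an "X" shape made of 1s).
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

# Hypothetical feature detector that looks for a top-left to bottom-right diagonal.
kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [2, 0, 1]
# [0, 3, 0]
# [1, 0, 2]
```

The highest value (3) sits in the centre, where the diagonal matches the detector completely.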

Multiple Feature Maps create the Convolutional Layer.

To recap: the image gets translated into numbers (0s and 1s), we detect the features we need, and that gives us the Feature Maps.

We apply a Rectifier Function ('relu') because we want to increase non-linearity in our CNN: images themselves are highly non-linear (different objects in the image, backgrounds, transitions between neighbouring pixels).

When we apply our Feature Detector we risk creating something linear, which is why we break this linearity by applying the 'relu' function.

For example, if there's a linear progression from white to black in a Feature Map, 'relu' breaks it by setting every negative value (the black) to 0.
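
A minimal sketch of the rectifier on a single row of values. The `relu` helper and the sample row are assumptions made for illustration. Every negative value becomes 0 and everything else passes through unchanged.

```python
def relu(feature_map):
    """Replace every negative value with 0 and keep the rest as is."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical feature map row: a linear progression from -2 to 2.
gradient = [[-2, -1, 0, 1, 2]]

rectified = relu(gradient)
print(rectified)  # [[0, 0, 0, 1, 2]]
```

After 'relu' the row is no longer a straight line: it is flat at 0 and then rises.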

Max Pooling, Pooled Feature Maps, Flattening & Dense/Full Connection

Max Pooling (Down Sampling) gives us:

  • Spatial Invariance: the Neural Network doesn't care in which part of the image it finds and learns the Features, or whether they are slightly distorted. This way it will still recognize the feature in new, similar images. We have some level of flexibility
  • Feature preservation: if the image is slightly rotated or shifted, the Pooled Feature Map will mostly keep the feature's values
  • Size reduction (75% with a 2x2 pool and a stride of 2): we reduce the number of parameters going into our Neural Network, which helps prevent overfitting

We apply Pooling on the Feature Maps to keep only the significant information and to speed up computation.

We apply Max Pooling (2x2 with a stride of 2) on our Feature Map:

  • Place a 2x2 Max Pooling window over the Feature Map
  • Take the biggest number inside the 2x2 window and record it in the Pooled Feature Map
  • Move the window by the stride and repeat
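
The steps above can be sketched as follows. The `max_pool` helper and the 4x4 Feature Map are hypothetical. This version also assumes the window always fits inside the map, so any partial window at the edge would be dropped.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value of every size x size window, moving by `stride`."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values currently covered by the pooling window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [0, 1, 0, 0],
    [1, 4, 2, 1],
    [0, 2, 1, 0],
    [1, 0, 3, 1],
]

pooled = max_pool(feature_map)
print(pooled)  # [[4, 2], [2, 3]]
```

The 16 input values shrink to 4, which is the 75% reduction, while the strongest response in each region survives.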

What is a stride?

A stride is the number of columns or rows the Pooling window (or the Feature Detector) jumps by at each step.
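
To make the effect of the stride concrete, this small sketch lists the starting columns of a window as it moves along one row. The `window_positions` helper and the row length of 6 are made up for illustration.

```python
def window_positions(length, size, stride):
    """Starting indices of a window of `size` moving along `length` cells by `stride`."""
    return list(range(0, length - size + 1, stride))


# A 2-wide window moving along a row of 6 pixels.
print(window_positions(6, 2, 1))  # [0, 1, 2, 3, 4]
print(window_positions(6, 2, 2))  # [0, 2, 4]
```

A larger stride means fewer positions, and therefore a smaller output.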

Example of Strides

Flattening the Pooled Feature Map

We take the Pooled Feature Map and Flatten() it into a one-dimensional array (a vector), so we can use it as the input for our Neural Network.
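
A minimal sketch of flattening in plain Python. The `flatten` helper is written here for illustration and is not the library's Flatten(). It reads a hypothetical 2x2 Pooled Feature Map row by row.

```python
def flatten(pooled_feature_map):
    """Read the map row by row and return all values as one flat list."""
    return [value for row in pooled_feature_map for value in row]


# Hypothetical 2x2 pooled feature map.
pooled_feature_map = [
    [4, 2],
    [2, 3],
]

vector = flatten(pooled_feature_map)
print(vector)  # [4, 2, 2, 3]
```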

Vector as input to Neural Network

In brief
