What is CNN (ConvNet)?
What is CNN (Convolutional Neural Network)?
If you ever once had an interest in Computer Vision, especially in Artificial Intelligence, you may hear about Convolutional Neural Network (CNN) or AlexNet.
A CNN is one of the famous artificial neural networks in the computer vision and image recognition fields. It is very powerful when we are processing pixel data like images, and it can be used in both generative and descriptive tasks.
Then what makes CNN so special?
What is CNN?
A Convolutional Neural Network (CNN) is a neural network that has learnable parameters (weights and bias), and it takes out important features from an input image that can be used to differentiate one from the other.
One important idea of a CNN is that the pre-processing required is much lower than other techniques used in the past.
This is because a CNN learns filters and parameters by itself from images while primitive methods filters are hand-engineered.
Why CNN?
Before CNN was found, the input image was flattened into a vector before it went into a model. This might work for a simple binary image but not for a complex image.
However, CNN takes in images and takes out the Spatial and Temporal dependencies from the image with its filters in the CNN network.
Since it uses filters and captures important features in images, the number of parameters gets reduced since it reuses those filters and the size of the vector before the fully connected layer gets smaller.
How CNN Works?
Let's think of an RGB image.
Computers understand this image as a matrix of pixel values having three planes.
Like the image above, CNN takes a kernel and applies it to the input image, and takes out convolved features which will be passed on to the next layer.
For RGB Image, there will be different kernels or filters for each of the channels.
Those features contain various information about an input image, and that information will be mapped into higher dimensions or channels as it passes through the network.
This means that CNN will extract higher-level features as it moves through more layers.
These features can be edges, gradient orientation, or even the contour of an input image.
Therefore CNN reduces the images into a form that is easier to process, without losing features that are critical for processing images.
So sometimes we need to add paddings around the input in order to keep the size the same.
Or, if you want to reduce the size of the image, you may you a pooling layer.
What is a Pooling Layer?
A pooling layer is used when you want to reduce the spatial size of the Convolved Feature.
This can make your model lighter since it decreases the computational power required by reducing the dimensions of the input.
First is Max Pooling which only takes the highest value from the portion of the image covered by the Kernel, while Average Pooling returns the average of all the values from the portion of the image covered by the Kernel.
Like this, CNN is a powerful network used in computer vision and image processing; however, it has a limitation such as understanding the content of images.
Still, CNN is widely used and makes a great performance.
Later, I will talk about different networks using CNN.
References:
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/
https://cs231n.github.io/convolutional-networks/
Comments
Post a Comment