Recently there has been a lot of buzz around Deep Learning. Almost all of the tech giants (Facebook, Google, Microsoft, Apple) are investing heavily in this technology, and on social media there has been a lot of hype around it. Some claim it is the 'ultimate solution' to the long-standing AI problem; others claim it finally explains how the brain works. Because of all this hype, the technology seems like voodoo magic, some futuristic sorcery, to many people. I thought I'd demystify Deep Learning by explaining it in layman's terms.

If you are familiar with Neural Networks, then Deep Learning is nothing but big, fat Neural Networks with lots of non-linear layers in them. If you are not, I'll explain what these things are.

Let's go back a few years, say 50. The ultimate goal of AI is to create a machine that can replace a human: one that can talk like a human, 'listen' like a human, 'perceive' like a human and reason like a human. Scientists first tried rule-based engines. You ask the machine a question, it consults its 'rule book' and answers accordingly. But this is not a good solution: what if the question asked is out of the 'book'? This puzzled scientists, so they asked a question: why look for novel ways to design an artificial intelligence system when we already have such a system present, our own brain? It is the perfect machine to study and try to emulate to achieve an AI system.

They started studying the architecture of the brain and discovered that it is a big 'neural network', with billions and billions of neurons connected to each other and interacting in complex ways to make us 'intelligent'. They started simulating the same architecture in hardware and named it the 'Perceptron model'. It gave 'good' empirical results on some tasks, and they declared they had found the perfect solution to the AI problem.

Mathematically, a perceptron is a 'linear' system: it separates points belonging to two 'classes' with a straight line (a hyperplane, in higher dimensions). For example, in a face recognition problem the two classes are 'Face' and 'Non-Face', and the points are images.
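To make this concrete, here is a minimal perceptron sketch (my own illustrative code, not anything from a particular library): it learns a line w·x + b = 0 that separates two classes labelled +1 and -1, assuming the data really is linearly separable.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    # Classic perceptron learning rule: nudge the line toward
    # every point it currently misclassifies.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified?
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data: class +1 lies above the line y = x,
# class -1 lies below it.
X = np.array([[0, 1], [1, 2], [2, 3], [1, 0], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_perceptron(X, y)
```

After training, `np.sign(X @ w + b)` reproduces the labels. The key limitation, which we come back to below, is that this only ever works when one straight line can do the job.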

Back to the past. Scientists were happy at having created a perfect AI solution. But soon it was pointed out that the system is flawed: problems in which the two classes cannot be separated by a straight line (the XOR problem is the classic example) cannot be solved by it. So scientists decided to stack multiple such units in layers. This is how the Neural Network was created.

This is a typical Neural Network with three layers, where each layer transforms its input using some 'non-linear' function. To train the network we have to tune the weight that each edge between two nodes carries, since the mapping depends on these incoming weights. I won't go into the mathematical details; they tend to be boring. I'll rather give an intuition of how it works.
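The 'transforming' part can be sketched in a few lines. Below is a forward pass through a small three-layer network; the weights are random stand-ins for the values that training would actually tune, and the layer sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # The 'non-linear' squashing function applied at each layer.
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(4, 3))  # weights: 3 inputs -> 4 hidden units
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))  # weights: 4 hidden units -> 1 output
b2 = np.zeros(1)

def forward(x):
    h = sigmoid(W1 @ x + b1)      # hidden layer: weighted sum + non-linearity
    return sigmoid(W2 @ h + b2)   # output layer: same recipe again

out = forward(np.array([0.5, -1.0, 2.0]))
```

'Training' means adjusting `W1`, `b1`, `W2`, `b2` so that `forward` maps inputs to the desired outputs; the non-linearity between layers is what lets the stack represent curved boundaries a single perceptron cannot.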

In my previous post I mentioned that to fit a model to some points, we assume some parametric form, like a line or a circle, and then use the data to find the parameters of that model. A big flaw in this approach is that we fix the underlying structure to be fit to the data and only tune its parameters. Such a fixed form is not a good approximation of an arbitrary function.

If you remember the Fourier series expansion, a function can be approximated by an infinite sum of 'sine' (and cosine) functions of different frequencies. So the approximation gets better as you include more terms from the basis. But taking infinitely many basis functions is not possible, so to work around that we can make the basis functions themselves adaptable. That means the basis functions themselves depend on the data.
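You can see the 'more terms, better fit' effect numerically. This sketch (my own example, using the standard Fourier series of a square wave) compares a 3-term and a 30-term partial sum:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
square = np.sign(np.sin(x))  # target function: a square wave

def fourier_approx(x, n_terms):
    # Square wave = (4/pi) * sum over odd k of sin(k*x) / k
    total = np.zeros_like(x)
    for k in range(1, 2 * n_terms, 2):
        total += (4 / np.pi) * np.sin(k * x) / k
    return total

# Mean squared error shrinks as we add basis terms.
err_3 = np.mean((square - fourier_approx(x, 3)) ** 2)
err_30 = np.mean((square - fourier_approx(x, 30)) ** 2)
```

Here the sine basis is fixed in advance; a neural network instead lets the data shape the basis functions themselves, which is the point of the next paragraph.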

The 'non-linear' functions I mentioned are in fact the basis functions used for 'function approximation' in Neural Networks. They depend on the data through the weights of the edges coming into them.

It was around this time that Geoffrey Hinton (with David Rumelhart and Ronald Williams) popularized the backpropagation algorithm for training Neural Networks. We have now reached the era of the 80's. After a lot of hype, scientists suddenly backed away from Neural Networks due to their heavy computation and data requirements.

Now let's jump to the mid-2000s. Thanks to the internet we have loads of data available, and due to advances in hardware we have very powerful machines too. Geoffrey Hinton showed at this time that big Neural Networks can be trained efficiently using lots of data and computation. Scientists started experimenting with this, and that is when they came up with "Deep Learning". As I explained earlier, Neural Networks are multiple perceptrons stacked together.

When you stack multiple layers like this, instead of only one 'hidden' layer, the result is called a deep network, and the learning performed by such a machine is called Deep Learning. So you can see this is no fancy stuff; it is just a plain Neural Network with a big architecture.

Of course, there are many complications associated with it. It is difficult to train such big networks even with high computation power. This is where all the engineering comes in, and the popular architectures are workarounds to this problem.

One more reason for the popularity of Neural Networks is that they can even learn 'feature detection' from the data. The best example of this is the Convolutional Neural Network, a big breakthrough in the field of Computer Vision. Computer Vision depends entirely on 'good' feature detectors for images, and computer scientists spent decades finding good feature detectors, which were essentially hand-crafted. Deep networks, on the other hand, can detect good features for the task automatically from the data. This is a big plus for Machine Learning researchers: they don't have to design features by hand; they can just use a Neural Network for the same.
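To show what a 'feature detector' means here, this sketch applies a classic hand-crafted vertical-edge filter to a tiny synthetic image. A CNN would learn kernel weights like these from data instead of having them written down by a human:

```python
import numpy as np

# Tiny image: left half dark (0), right half bright (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-crafted vertical-edge kernel; in a CNN these nine numbers
# would be learned automatically during training.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

def conv2d(img, k):
    # Naive 'valid' convolution: slide the kernel over the image
    # and take the weighted sum at each position.
    h, w = img.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

response = conv2d(image, kernel)
```

The response is zero over the flat regions and large exactly where the dark-to-bright edge sits, which is what 'detecting a feature' means at the pixel level.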

This is Deep Learning in a nutshell. Please comment with your observations and/or any discrepancy you find in this article.

Cheers!