The Role of Probability in Machine Learning!

Greetings to all!

This post is an extension of the one I wrote last time, where I talked a bit about linear algebra. In this post I’ll talk about probability and its importance in Machine Learning. With the recent advances in statistical machine learning theory, probability has become the most powerful tool for analyzing Machine Learning models.

Now, what is probability? Technically, probability is a mathematical framework for dealing with uncertainty: a way to quantify how uncertain we are about an event. In Machine Learning settings, uncertainty is inherent, and the reasons are obvious: real-world data sets have lots of noise in them, and the data generation process itself can be biased. So we need a framework to analyse this uncertainty and then make decisions accordingly. I’ll explain how this is used in Machine Learning to quantify uncertainty.

But before digging into probability for Machine Learning, I’ll stress the importance of one particular distribution used very frequently in Machine Learning: the Gaussian Distribution. For anyone studying Machine Learning, this is the most common term they’ll encounter. The Gaussian is probably (with very high probability :D) the most important distribution in probability theory, and it is very commonly used to model noise in data. The reason is the Central Limit Theorem from statistics, which states that the sum of a large number of independent random variables (with finite variance) is approximately Gaussian, regardless of how each individual variable is distributed. Noise is generally the sum of many small random effects, like human error, errors in the recording device, and so on. If we treat each of these effects as a random variable, their sum will be approximately Gaussian. This is why the Gaussian is so commonly used to model noise.
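To see the Central Limit Theorem in action, here’s a minimal simulation sketch in Python (using NumPy; the number of noise sources and samples are arbitrary choices of mine):

```python
import numpy as np

# A minimal sketch of the Central Limit Theorem: sum many independent
# non-Gaussian random variables and the result looks approximately Gaussian.
rng = np.random.default_rng(0)

n_sources = 50       # independent noise sources per sample (arbitrary)
n_samples = 100_000  # number of simulated noise values (arbitrary)

# Each "noise source" is uniform on [-1, 1] -- clearly not Gaussian on its own.
sources = rng.uniform(-1.0, 1.0, size=(n_samples, n_sources))
noise = sources.sum(axis=1)

# The sum should have mean ~0 and variance ~ n_sources * Var(U[-1,1]) = 50/3.
print(f"mean     = {noise.mean():.3f}  (theory: 0)")
print(f"variance = {noise.var():.3f} (theory: {n_sources / 3:.3f})")
```

A histogram of `noise` would show the familiar bell curve, even though each individual source is flat (uniform).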

Coming back to probability, I’ll illustrate its role with an example. Consider the problem of predicting the employability of candidates based on their credentials. The input to the model is the candidate’s credentials (grades, experience, relevant projects, referrals, etc.) and the target is an ordinal variable (the values have a natural order, with the lowest number being the least employable and the highest the most).

Here is a toy example. On the x-axis is the input variable (take it to be the grades of candidates, which for some weird reason vary sinusoidally) and on the y-axis is the output (employability). Consider the thin blue line in the middle to be the ideal trend, but due to errors the y values oscillate about that line and the readings are corrupted. Probability is used to model exactly this: for any point x0 on the x-axis, we define a Gaussian distribution on y conditioned on x to accommodate the variation in the y values.
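Here is a minimal sketch of that toy setup (my own construction; the noise level `sigma` is an arbitrary choice), where the conditional distribution of y given x is a Gaussian centred on the sinusoidal trend:

```python
import numpy as np

# Toy data: the "true" trend is sinusoidal, and for each input x the
# observed y is drawn from a Gaussian centred on the trend.
rng = np.random.default_rng(42)
sigma = 0.2  # noise standard deviation (arbitrary assumption)

x = np.linspace(0.0, 2.0 * np.pi, 200)
trend = np.sin(x)                             # the thin blue line: the ideal trend
y = trend + rng.normal(0.0, sigma, x.shape)   # p(y | x) = N(sin(x), sigma^2)

# For any single point x0, the conditional distribution of y is Gaussian:
x0 = np.pi / 4
print(f"at x0 = {x0:.2f}: y | x0 ~ N({np.sin(x0):.3f}, {sigma**2:.3f})")
```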

Consider another problem, where we are given the X-ray of a patient and our goal is to predict whether there is a fracture or not. We can collect training examples that cover many of the possible variations of X-rays, some with fractures and some without. For obvious reasons we cannot capture every variation possible in an X-ray, so we train a Machine Learning model to take an X-ray as input and output the probability of a fracture given that input. This is the aim of Machine Learning, and it is important to realize: given “enough” training examples, learn the pattern they follow and then make predictions accordingly (the pattern here is what a fractured X-ray looks like). Of course there are more technical details to learn in probability, which I’ll talk about in the next post. This was just to get a feel for the role of probability in Machine Learning.
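To make the idea of a probabilistic output concrete, here’s a minimal sketch (my own construction, not a real radiology model): real systems use deep networks on images, but a logistic regression on made-up features stands in for the idea that the model outputs P(fracture | input) rather than a hard yes/no. All data below is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Hypothetical training data: each row stands in for features extracted from
# an X-ray; each label is 1 (fracture) or 0 (no fracture).
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1]
           + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)

# For a new input, the model outputs a probability, not a hard decision.
x_new = rng.normal(size=(1, 5))
p_fracture = model.predict_proba(x_new)[0, 1]
print(f"P(fracture | input) = {p_fracture:.3f}")
```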

Cheers!!

