# Machine Learning Interview Questions – Q6 – What is Bayes’ Theorem?

Machine learning interview questions is a series I will periodically post on.  The idea was inspired by the post 41 Essential Machine Learning Interview Questions at Springboard.  I will take each question posted there and provide an answer in my own words.  Whether that expands upon their solution or is simply another perspective on how to phrase the solution, I hope you will come away with a better understanding of the topic at hand.

To see other posts in this series visit the Machine Learning Interview Questions category.

### Q6 – What is Bayes’ Theorem?

Bayes’ Theorem helps us determine the actual probability of an event or class by combining knowledge of the prior probability with the information from a test result.  It essentially converts a test result into the real probability that the event occurred.

In a way, Bayes’ Theorem can be thought of as the ‘evidence’ theorem.  It describes how much you can trust the information you are getting from your test.  For example, a car alarm has so many false positives that you begin to lose trust that a car alarm is signaling car theft.  Bayes’ Theorem is the mathematical rule that combines the test’s true and false positive rates with the underlying probability of the event into the probability that a positive test (car alarm) is really signaling the event (car theft).

Prior and Posterior Probability

Tests can be flawed – we can often get false positives and false negatives.  To get a more accurate picture of the actual probability of an event occurring we need to take into account the real world probability as well.  This prior probability is the probability that the event happens at all, given no information from your test.  Combining it with the information our test gives is how Bayes’ Theorem produces an accurate probability of the event happening.

Prior probability can be thought of as the probability estimate before you even look at your test results.  What is the probability of this event in the general population without having any evidence or test results?  What is your initial belief that this event will occur?

Posterior probability is the probability of the given event after you see the test result and combine it with your prior probability.  Bayes’ theorem tells you how to combine the prior probability with your test result to get an accurate posterior probability.

Bayes’ Theorem Formula

Informally, the formula is:

Real Probability = (True Positive Rate * Prior Probability) / Probability of Any Positive Test

The denominator covers every way the test can come back positive: true positives plus false positives.  Dividing by it is how you account for the false positive rate.  Think of it as:

Chance of evidence being real = Chance of a true positive / Chance of any positive test

The chance of a true positive is the true positive rate multiplied by the prior probability.  That multiplication is how you combine the test’s accuracy with the prior probability to get your real world probability of the event happening.

Formal Definition

The formal statement of Bayes’ Theorem is:

P(A|B) = P(B|A) * P(A) / P(B)

In this equation the terms mean:

• P(A|B) = Chance of event A given the positive test B
• P(B|A) = Chance of true positive – chance of positive test given event A being true
• P(A) = Chance of A happening in general (prior probability)
• P(B) = Chance of getting a positive test (chance of true positive or false positive)
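The terms above can be turned into a small helper.  This is only a minimal sketch: the function name `bayes_posterior` and its parameter names are my own, and it assumes a test with just two outcomes, so that P(B) is the sum of the true positive and false positive contributions.

```python
def bayes_posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(A|B): the probability of event A given a positive test B."""
    # P(B|A) * P(A): chance of a positive test that is a true positive
    true_positive = true_positive_rate * prior
    # P(B|not A) * P(not A): chance of a positive test that is a false positive
    false_positive = false_positive_rate * (1.0 - prior)
    # P(B): chance of getting a positive test at all
    evidence = true_positive + false_positive
    if evidence == 0:
        raise ValueError("P(B) is zero: the test can never be positive")
    return true_positive / evidence


# Hypothetical numbers: a 50% prior, a 90% true positive rate, a 10% false positive rate
print(bayes_posterior(prior=0.5, true_positive_rate=0.9, false_positive_rate=0.1))
```

With a 50% prior the posterior simply matches the test’s reliability.  Lowering the prior while keeping the test the same pulls the posterior down, which is the effect the car alarm example describes.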

Example

A common classification problem is whether an e-mail is spam or not spam.  Let’s say you have a test that correctly classifies a spam e-mail as spam 80% of the time (true positive rate), and classifies a non-spam e-mail as non-spam 98% of the time (true negative rate).  We will also assume that the probability of any e-mail being spam is 5%.

The table below summarizes these metrics.  If an e-mail is spam in reality, which it has a 5% chance of being, then it will test positive 80% of the time and test negative incorrectly 20% of the time.  If in reality, an e-mail is not spam, which occurs 95% of the time, it will test positive only on 2% of the tests and test as not spam 98% of the time.

| | Spam (5%) | Not Spam (95%) |
|---|---|---|
| Test Result – Is Spam | 80% | 2% |
| Test Result – Not Spam | 20% | 98% |
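One way to read the table is as counts rather than percentages.  The sketch below assumes a hypothetical inbox of 10,000 e-mails (a number I picked only so every cell comes out whole) and applies the rates from the table to it.

```python
total_emails = 10_000  # hypothetical inbox size, chosen only to give whole numbers

# Split the inbox using the 5% prior
spam = total_emails * 5 // 100                # 500 spam e-mails
not_spam = total_emails - spam                # 9,500 non-spam e-mails

# Apply the test rates from the table to each column
true_positives = spam * 80 // 100             # spam that tests as spam
false_negatives = spam - true_positives       # spam that tests as not spam
false_positives = not_spam * 2 // 100         # non-spam that tests as spam
true_negatives = not_spam - false_positives   # non-spam that tests as not spam

print(true_positives, false_negatives, false_positives, true_negatives)
# prints: 400 100 190 9310
```

Even though the false positive rate is only 2%, it applies to a much larger group, so the false positives are a sizeable share of everything the test flags.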

Given our prior probability and the evidence from our test, we can predict whether or not an e-mail actually is spam.  In the case of a positive test result here is how it works:

P(Spam | Positive Test) = P(Positive Test | Spam)*P(Spam) / P(Positive Test)

The chance of getting a positive test result is simply:

P(Positive Test) = P(Positive Test | Spam) * P(Spam) + P(Positive Test | Not Spam) * P(Not Spam)

P(Positive Test) = (80% * 5%) + (2% * 95%)

P(Positive Test) = (0.04) + (.019) = .059

Therefore, using Bayes’ Theorem we get:

P(Spam | Positive Test) = 80% * 5% / .059

P(Spam | Positive Test) = .04 / .059 = .678

This means that with the current accuracy of this test, an e-mail that tests positive for spam is 67.8% likely to be spam.  Because spam is rare (5% of e-mail) and the test produces some false positives (non-spam e-mails that test positive as spam), we can’t simply take the 80% true positive rate as the chance that a flagged e-mail really is spam.
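The arithmetic in this example can be checked with a few lines of Python.  This is just a sketch using the three numbers given above; the variable names are mine, and the last calculation (the chance of spam after a negative test) is an extra step I added using the same formula.

```python
# Numbers from the example
p_spam = 0.05                 # prior: P(Spam)
p_pos_given_spam = 0.80       # true positive rate
p_neg_given_not_spam = 0.98   # true negative rate

# Complements implied by those numbers
p_not_spam = 1 - p_spam
p_pos_given_not_spam = 1 - p_neg_given_not_spam   # false positive rate (2%)
p_neg_given_spam = 1 - p_pos_given_spam           # false negative rate (20%)

# P(Positive Test): every way the test can come back positive
p_pos = p_pos_given_spam * p_spam + p_pos_given_not_spam * p_not_spam

# Bayes' Theorem for a positive test
p_spam_given_pos = p_pos_given_spam * p_spam / p_pos

# Same formula applied to a negative test
p_neg = 1 - p_pos
p_spam_given_neg = p_neg_given_spam * p_spam / p_neg

print(f"P(Positive Test)        = {p_pos:.3f}")
print(f"P(Spam | Positive Test) = {p_spam_given_pos:.3f}")
print(f"P(Spam | Negative Test) = {p_spam_given_neg:.3f}")
```

The last line comes out at roughly 1%, so a negative result from this test is far more conclusive than a positive one.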

Conclusion

Bayes’ Theorem helps us combine the test result with the prior probability of the event occurring.  This gives us a real probability of the event actually happening now given a test result.

Naive Bayes classifiers are an application of Bayes’ Theorem to machine learning.  They assume independence among the features and tend to work well with categorical variables.  They are well known for multi-class prediction and text classification.
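To make the independence assumption concrete, here is a very small word-count Naive Bayes classifier written from scratch.  Treat it as an illustration rather than a reference implementation: the training messages are invented, the function names `train` and `predict` are my own, and I assume add-one (Laplace) smoothing so that unseen word and class combinations do not produce a zero probability.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented for illustration only
training = [
    ("win free prize now", "spam"),
    ("free money win", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday", "not spam"),
]


def train(examples):
    """Count how often each class and each word-per-class appears."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(text, class_counts, word_counts, vocabulary):
    """Return the class with the highest (log) posterior score."""
    total_examples = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        # Start from the log prior, log P(class)
        score = math.log(count / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocabulary:
                continue  # skip words never seen in training
            # Independence assumption: each word contributes its own
            # P(word | class), smoothed by adding one to every count
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(predict("free prize", *model))      # prints: spam
print(predict("monday meeting", *model))  # prints: not spam
```

The scores are compared without dividing by P(B), since that denominator is the same for every class and does not change which class wins.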