Machine Learning Interview Questions – Q7 – Why is Naive Bayes naive?

Machine Learning Interview Questions is a series I will post on periodically.  The idea was inspired by the post 41 Essential Machine Learning Interview Questions at Springboard.  I will take each question posted there and provide an answer in my own words, whether that expands upon their solution or simply offers another perspective on how to phrase it.  Either way, I hope you will come away with a better understanding of the topic at hand.

To see other posts in this series visit the Machine Learning Interview Questions category.

Q7 – Why is Naive Bayes naive?

Naive Bayes is a machine learning implementation of Bayes' Theorem.  It is a classification algorithm that estimates the probability of each data point belonging to each class and then assigns the point to the class with the highest probability.

It is naive because while it uses conditional probability to make classifications, the algorithm simply assumes that all features are independent of one another given the class.  This is considered naive because, in reality, that is rarely the case.  The upside is that the math is simpler, the classifier runs faster, and the results are often quite good for certain problems.  The sketch below shows how the independence assumption reduces classification to multiplying per-feature probabilities.
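
To see how that plays out, here is a minimal from-scratch sketch, assuming a toy spam filter with a two-word vocabulary; every probability value is made up purely for illustration:

```python
# The naive assumption: the joint likelihood of all features is
# approximated as a product of per-feature likelihoods.
# All probability values here are invented for illustration.

# P(word appears | class) for a tiny two-word vocabulary
likelihoods = {
    "spam":     {"free": 0.80, "meeting": 0.10},
    "not_spam": {"free": 0.05, "meeting": 0.60},
}
priors = {"spam": 0.30, "not_spam": 0.70}  # P(class)

def naive_score(cls, words):
    """Unnormalized posterior: P(class) * product of P(word | class)."""
    score = priors[cls]
    for word in words:
        score *= likelihoods[cls][word]  # the "naive" independence step
    return score

email = ["free", "meeting"]
scores = {c: naive_score(c, email) for c in priors}
total = sum(scores.values())
for c, s in scores.items():
    print(f"P({c} | email) = {s / total:.3f}")  # spam wins, ≈ 0.533
```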


Bayes’ Theorem Review

Bayes’ Theorem gives us the probability of an event by combining the conditional probability of some observed evidence given that event with the prior knowledge of how likely the event is in the first place.

Conditional probability is the probability that something will happen, given that something else has occurred.  In other words, the conditional probability is the probability of X given a test result, or P(X|Test).  For example, what is the probability an email is spam given that my spam filter classified it as spam?

The prior probability is based on previous experience or the percentage of previous samples.  For example, what is the probability that any email is spam?

Formally, Bayes’ Theorem states:

    P(A|B) = [P(B|A) × P(A)] / P(B)

where:

    • P(A|B) = Posterior probability = Probability of A given B happened
    • P(B|A) = Conditional probability (the likelihood) = Probability of B happening if A is true
    • P(A) = Prior probability = Probability of A happening in general
    • P(B) = Evidence probability = Probability of B happening in general (e.g., of getting a positive test)
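
To make this concrete, here is the spam-filter example worked through in code.  The three input probabilities are hypothetical values chosen purely for illustration:

```python
# Worked Bayes' Theorem example for the spam-filter scenario.
p_spam = 0.20             # P(A): prior probability that any email is spam
p_flag_given_spam = 0.95  # P(B|A): filter flags an email given it is spam
p_flag_given_ham = 0.10   # P(B|not A): filter flags a non-spam email

# P(B): total probability that the filter flags an email (the evidence)
p_flag = p_flag_given_spam * p_spam + p_flag_given_ham * (1 - p_spam)

# P(A|B): probability an email really is spam given the filter flagged it
p_spam_given_flag = p_flag_given_spam * p_spam / p_flag
print(f"P(spam | flagged) = {p_spam_given_flag:.3f}")  # ≈ 0.704
```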


Naive Bayes Classifier

There are several types of Naive Bayes classifiers.  Which one you use will depend on the features you are working with.  The different types are:

  • Gaussian NB – use when you have continuous feature values.  This classifier assumes the feature values for each class are normally distributed.
  • Multinomial NB – good for text classification.  This classifier treats each occurrence of a word as an event, so the features are typically word counts.
  • Bernoulli NB – use when your features are assumed to be binary.  This classifier can also be used for text classification, but the features must be binary, e.g., whether a word appears in the document or not.  All three variants are sketched in code after this list.
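
Here is a quick sketch of all three variants, assuming scikit-learn is available.  The data is randomly generated stand-in data, not a real benchmark:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)  # two classes

# Gaussian NB: continuous features, modeled as normal within each class
X_cont = rng.normal(loc=y[:, None], scale=1.0, size=(200, 3))
print("Gaussian NB:   ", GaussianNB().fit(X_cont, y).score(X_cont, y))

# Multinomial NB: count features, e.g. word counts in a document
X_count = rng.poisson(lam=y[:, None] + 1, size=(200, 3))
print("Multinomial NB:", MultinomialNB().fit(X_count, y).score(X_count, y))

# Bernoulli NB: binary features, e.g. word present / absent
X_bin = (X_count > 1).astype(int)
print("Bernoulli NB:  ", BernoulliNB().fit(X_bin, y).score(X_bin, y))
```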

Advantages

  • Can successfully train on a small data set
  • Good for text classification and multiclass classification
  • Quick and simple to compute, thanks to the naive independence assumption

Disadvantages

  • Can’t learn relationships among the features because it assumes feature independence (the sketch below shows this failure mode)
  • Continuous feature data is assumed to be normally distributed (by Gaussian NB)
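
To illustrate the first point, consider a hypothetical XOR-style data set, where the label depends entirely on the interaction between two binary features.  Because each feature alone carries no information about the class, Naive Bayes scores near chance even on its own training data:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 2))  # two independent binary features
y = X[:, 0] ^ X[:, 1]                   # label = XOR of the features

# Each feature is individually uninformative about y, so the naive
# independence model cannot separate the classes.
clf = BernoulliNB().fit(X, y)
print(f"Training accuracy on XOR: {clf.score(X, y):.2f}")  # ~0.50
```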


Summary

Naive Bayes is a classification algorithm based on Bayes’ Theorem.  It uses conditional probability to make class predictions.  Naive Bayes is naive because it assumes every feature is independent of the others when predicting the class.

While this is most likely not true in reality, it provides the benefit of quick and simple calculations that allow a Naive Bayes classifier to work well on problems such as text classification.

To see other posts in this series visit the Machine Learning Interview Questions category.