Machine Learning Interview Questions – Q4 – Explain how a ROC curve works

Machine learning interview questions is a series I will periodically post on.  The idea was inspired by the post 41 Essential Machine Learning Interview Questions at Springboard.  I will take each question posted there and provide an answer in my own words.  Whether that expands upon their solution or is simply another perspective on how to phrase the solution, I hope you will come away with a better understanding of the topic at hand.

To see other posts in this series visit the Machine Learning Interview Questions category.

Q4 – Explain how a ROC curve works.

The ROC curve, also known as the receiver operating characteristic curve or relative operating characteristic curve, is used in binary classification problems to help visualize model performance.

Binary classification means that our model predicts each data point as belonging to one of two possible classes.  When doing binary classification, we can often set a probability threshold to determine which class the model picks.  This threshold is useful for problems where each class carries a different cost/benefit.

For example, if we are predicting whether a tumor is cancerous or benign, we would probably want to err on the side of having the model predict cancer even if a patient is healthy, rather than predict benign when the tumor is actually cancerous.   The ROC curve plots the true positive rate (TPR) and false positive rate (FPR) at various class probability thresholds.
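To make the threshold idea concrete, here is a minimal sketch with made-up probabilities (the values and the 0.3 threshold are illustrative assumptions, not from any real model):

```python
import numpy as np

# Hypothetical predicted probabilities of the "cancerous" class for five tumors.
probs = np.array([0.15, 0.40, 0.55, 0.70, 0.92])

# A default threshold of 0.5 treats anything at or above it as cancerous (1).
default_preds = (probs >= 0.5).astype(int)

# Lowering the threshold errs toward predicting cancer, trading more
# false positives for fewer missed cancers.
cautious_preds = (probs >= 0.3).astype(int)

print(default_preds)   # [0 0 1 1 1]
print(cautious_preds)  # [0 1 1 1 1]
```

Each choice of threshold produces one (FPR, TPR) pair, which is exactly what the ROC curve plots.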

Confusion Matrix

The true positive rate, also referred to as sensitivity, recall, or probability of detection, defines how many correct positive predictions occur among all positive samples.  A true positive is an instance that your model predicts to be positive and the actual result of that data point is positive.

The false positive rate, also referred to as fall-out, 1-specificity, or probability of false alarm, defines how many incorrect positive predictions occur among negative samples.  A false positive is an instance that your model predicts to be positive but the actual result of that data point is negative.  The following table, known as the confusion matrix, summarizes the various prediction errors we can have in a binary classification problem.

                    Prediction Positive               Prediction Negative
Reality Positive    True Positive                     False Negative (Type 2 error)
Reality Negative    False Positive (Type 1 error)     True Negative

In a confusion matrix, also known as a contingency table, the true/false part of the label is whether or not the prediction and the reality match.  The positive/negative label is always referring to the prediction class.

For example, a true negative means we predicted negative and the reality matched our prediction: the data point was also negative.  A false positive means we predicted positive but the reality did not match: the data point was actually negative.

ROC Space

The above image from Wikipedia illustrates the space of a ROC plot.  The ROC curve is not plotted yet; this is the grid we will plot on, with the TPR (true positive rate) on the y-axis and the FPR (false positive rate) on the x-axis.

That dashed red line at a 45-degree angle is called the line of no discrimination; it represents random guessing when classifying data points.  Points above the line mean classification is better than random, and points below the line mean you are predicting worse than random.

The top-left point at (0, 1) is known as perfect classification, where the FPR is 0 and the TPR is 100 percent.

Curves in ROC Space

To plot the ROC curve, you compute the TPR and the FPR at various classification probability thresholds.  The ROC curve lets you visualize all possible classification thresholds at once.  This allows you to choose a threshold based on business logic, for instance maximizing the TPR when the cost of false positives is low.  Fraud detection is an example: simply asking the user to verify a potential purchase is a low-cost action, so choosing a threshold that allows more false positives is worthwhile.
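The sweep described above can be sketched directly: for each threshold, classify, count, and record one (FPR, TPR) point.  The labels and scores below are made-up toy data:

```python
import numpy as np

# Hypothetical true labels (1 = positive) and model scores for eight samples.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])

# Sweep thresholds from high to low; each threshold yields one (FPR, TPR) point.
points = []
for t in np.linspace(1.0, 0.0, 11):
    preds = scores >= t
    tp = np.sum(preds & (y_true == 1))
    fp = np.sum(preds & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    points.append((fpr, tpr))

# The curve always runs from (0, 0) at the strictest threshold
# to (1, 1) at the loosest.
print(points[0], points[-1])
```

In practice a library routine such as scikit-learn's `roc_curve` does this sweep for you, but the logic is the same.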

The ROC curve helps you visually understand the impact of your choice of a classification probability threshold.

The above image from Wikipedia illustrates a point on the ROC Curve.  The distribution plot in the top left shows how both classes of data are distributed, while the black line is the probability threshold for classifying each class and is shown on the ROC curve as the black dot.  As you slide the probability threshold you will slide along the ROC curve.

To further understand the relationship between the classification probability threshold and the ROC curve, I suggest you play with this ROC Curve applet.  In this applet, the gray slider at the top changes the distribution of your two classes, and the black bar on the bottom is the classification threshold, which moves you along the ROC curve.  The less the two class distributions overlap, the closer the best threshold gets to perfect classification (the top left at (0, 1)).
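One common heuristic for the "best" point is the one closest to perfect classification at (0, 1).  A small sketch, using assumed (FPR, TPR, threshold) triples as if read off a ROC sweep:

```python
import numpy as np

# Hypothetical ROC sweep results (illustrative values, not real model output).
fpr = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
tpr = np.array([0.0, 0.7, 0.85, 0.95, 1.0])
thresholds = np.array([1.0, 0.8, 0.5, 0.3, 0.0])

# Euclidean distance of each point from perfect classification at (0, 1).
dist = np.sqrt(fpr**2 + (1 - tpr)**2)
best = np.argmin(dist)

print(thresholds[best])  # 0.8
```

Other criteria (e.g. weighting false positives and false negatives by their business costs) would pick a different point on the same curve.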

Summary

The ROC curve is a plot of the true positive rate vs. the false positive rate at different classification thresholds.  It allows you to find the optimal classification probability threshold for your model, depending on the costs and benefits of the various errors summarized by the confusion matrix.

The closer to the top left, the more accurate the classifier, with the point (0, 1) being perfect classification.  The 45-degree line of no discrimination represents random guessing; being above that line means performing better than random.
