This article was published as a part of the Data Science Blogathon.
In today’s post, we will discuss the merits of ROC curves vs. accuracy estimation.

Photo by @austindistel on Unsplash
Several questions may trouble you while evaluating your machine learning models, such as:
- Is the accuracy score a more trustworthy evaluation metric than ROC?
- What is the Area Under the ROC Curve, and how do I use it?
- If my data is heavily imbalanced, should I use AROC rather than accuracy, or vice versa?
Here is a quick summary of our discussion.
- The “Receiver Operating Characteristic” (ROC) curve is an alternative to accuracy for evaluating learning algorithms on natural datasets.
- The ROC curve is a curve, not a single-number statistic.
- In particular, this means that comparing two algorithms on a dataset does not always produce an obvious ordering.
- Accuracy (= 1 − error rate) is the standard approach used to evaluate learning algorithms. It is a single-number summary of performance.
- AROC is the area under the ROC curve. It is also a single-number summary of performance.
As always, it depends, but understanding the trade-offs between the various metrics is crucial for making the right decision.
- Accuracy: It measures how many observations, both positive and negative, were correctly classified. You shouldn’t rely on accuracy for imbalanced problems, because it is easy to get a very high accuracy score by simply labeling every observation as the majority class. Since accuracy is computed on the predicted classes (not on prediction scores), we must apply a particular threshold before measuring it. The obvious choice is a threshold of 0.5, but it can be suboptimal.
- ROC / AROC: For a classification problem, we can compute an AROC. A ROC curve (receiver operating characteristic curve) is a plot of the performance of a classification model at all classification thresholds. It is one of the most fundamental evaluation metrics for monitoring any classification model’s performance. (A brief comparison of the two metrics is sketched in the code after this list.)
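To make the contrast concrete, here is a minimal sketch (my own illustration, not code from this article) that trains a logistic regression on a hypothetical, heavily imbalanced dataset and reports accuracy at the default 0.5 threshold alongside ROC AUC. The dataset, model, and parameters are assumptions chosen purely for illustration.

```python
# A minimal sketch (assumed data and model): accuracy vs. ROC AUC on imbalanced data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Toy dataset with roughly a 95/5 class imbalance (illustrative parameters).
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]   # prediction scores in [0, 1]

# Accuracy needs a threshold (0.5 by default); ROC AUC does not.
acc = accuracy_score(y_test, (scores >= 0.5).astype(int))
auc = roc_auc_score(y_test, scores)

# A "classifier" that always predicts the majority class still looks good
# on accuracy, but is useless by ROC AUC (0.5 = random ranking).
baseline_acc = accuracy_score(y_test, np.zeros_like(y_test))

print(f"accuracy @0.5:           {acc:.3f}")
print(f"ROC AUC:                 {auc:.3f}")
print(f"majority-class accuracy: {baseline_acc:.3f}")
```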
Comparing these metrics is a subtle matter because, in machine learning, each behaves differently on different natural datasets.
The comparison only makes sense if we accept the hypothesis that performance on past learning problems (roughly) predicts performance on future learning problems.
The ROC vs. accuracy discussion is often entangled with the question “is the goal classification or ranking?” because constructing a ROC curve requires producing a ranking.
Here, we assume the goal is classification rather than ranking. (There are several natural problems where we prefer a ranking of instances to a classification. Likewise, there are many natural problems where classification is the goal.)
How To Compute The ROC Curve:
The ROC curve is generated by computing and plotting the true positive rate against the false positive rate for a given classifier across a family of thresholds.
True Positive Rate = True Positives / (True Positives + False Negatives)
False Positive Rate = False Positives / (False Positives + True Negatives)
- The true positive rate is also known as sensitivity (or recall).
- The false positive rate is equal to 1 − specificity.
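As a quick, hand-rolled sketch (with assumed labels and scores, not data from this article), the two rates can be computed at a few thresholds and checked against scikit-learn's roc_curve, which sweeps every distinct score as a threshold:

```python
# A minimal sketch (hypothetical labels and scores): TPR and FPR at several thresholds.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

def tpr_fpr(y_true, y_score, threshold):
    """Compute (TPR, FPR) for predictions thresholded at `threshold`."""
    y_pred = y_score >= threshold
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

for t in [0.2, 0.5, 0.8]:
    tpr, fpr = tpr_fpr(y_true, y_score, t)
    print(f"threshold={t:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")

# scikit-learn returns the full curve over all distinct-score thresholds.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```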
How To Compute The Accuracy Score:
Accuracy is calculated as the fraction of correct predictions on the test data. It can be determined easily by dividing the number of correct predictions by the total number of predictions.
Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
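For instance, with hypothetical confusion-matrix counts (assumed numbers, purely for illustration), the formula works out as follows:

```python
# A minimal sketch (assumed counts): accuracy from the four confusion-matrix cells.
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy = {accuracy:.2f}")   # (40 + 45) / 100 = 0.85

# Equivalent via scikit-learn, given label vectors:
# from sklearn.metrics import accuracy_score
# accuracy_score(y_true, y_pred)
```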
Arguments for ROC
Specification: The costs of different choices are not well specified. The training examples are often not drawn from the same marginal distribution as the test examples. ROC curves allow a sensible comparison over a range of different choice costs and marginal distributions.
Dominance: Standard classification algorithms do not have a dominance structure as the costs vary. We should not say “algorithm A is better than algorithm B” when we do not know the choice costs well enough to be sure.
Just-in-Time use: Any system with a good ROC curve can easily be equipped with a ‘knob’ that controls the trade-off between false positives and false negatives.
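As a rough sketch of that ‘knob’ (an assumed helper function of my own, not part of any library), one can sweep the ROC curve of a validation set and pick the threshold with the highest true positive rate that stays within a false-positive budget:

```python
# A minimal sketch (assumed setup): using the decision threshold as a "knob".
import numpy as np
from sklearn.metrics import roc_curve

def threshold_for_fpr(y_true, y_score, max_fpr=0.05):
    """Return the threshold achieving the highest TPR subject to FPR <= max_fpr."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    ok = fpr <= max_fpr                # points within the false-positive budget
    best = np.argmax(tpr[ok])          # highest TPR among those points
    return thresholds[ok][best]

# Usage (hypothetical validation and test arrays):
# t = threshold_for_fpr(y_val, val_scores, max_fpr=0.01)
# y_pred = test_scores >= t
```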
AROC inherits the arguments for ROC, except Dominance, and adds:
Summarization: Humans do not have the time to understand the complexities of a conditional comparison, so having a single number instead of a curve is valuable.
Robustness: Algorithms with a large AROC are robust against a variation in costs.
Arguments for Accuracy
Tradition: Accuracy is the traditional approach.
Summarization: As for AROC.
Intuitiveness: People understand what accuracy means almost immediately. Unlike (A)ROC, it is obvious what happens when one additional example is classified incorrectly.
Statistical Stability: The basic test set bound shows that accuracy is stable subject only to the IID assumption. This holds for AROC (and ROC) only when the number of examples in each class is not close to zero.
Minimality: In the end, a classifier makes classification decisions. Accuracy directly measures this, while (A)ROC compromises this measure with hypothetical alternative choice costs. For the same reason, computing (A)ROC may require significantly more work than solving the classification problem itself.
Generality: Accuracy generalizes immediately to multiclass accuracy, importance-weighted accuracy, and general (per-example) cost-sensitive classification. ROC curves become problematic when there are just three classes.
Although the area under the ROC curve (AROC) is not an intuitive quantity in itself, its interpretation as a Wilcoxon-Mann-Whitney statistic, which effectively measures the fraction of positive-negative instance pairs ranked correctly, makes the quantity easier to understand. This interpretation also has other benefits: while generalizing ROC curves to more than two classes is not straightforward, it allows graceful generalizations of the AROC statistic to multi-category ranking.
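Here is a small sketch of that pairwise interpretation (with made-up labels and scores): counting the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half, reproduces scikit-learn’s roc_auc_score.

```python
# A minimal sketch (hypothetical data): AROC as the Wilcoxon-Mann-Whitney statistic,
# i.e. the fraction of (positive, negative) pairs the scores rank correctly.
import numpy as np
from itertools import product
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 0])
y_score = np.array([0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# 1 point for each correctly ranked pair, half a point for each tie.
correct = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p, n in product(pos, neg)
)
auc_pairs = correct / (len(pos) * len(neg))

print(auc_pairs, roc_auc_score(y_true, y_score))  # the two values agree
```

Note that this definition only exists for a set of examples, which is exactly the point made in item (a) below.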
Some additional points, more or less relevant to the discussion:
a) A subtle and interesting difference between AROC and the most common “standard” loss functions (0/1 loss, squared error, cost-sensitive classification, etc.) is that the standard losses can be evaluated for each example independently of the others, whereas AROC is defined only over a set of examples.
b) One neat use of AROC is as a base-rate-independent analogue of the Bayes rate. Specifically, data sets cannot be compared directly via their Bayes rates when their base rates differ (by base rate, I mean the usual notion of the marginal/unconditional probability of the most probable class). However, their “optimal” AROCs can be compared directly as estimates of how separable the classes are.
Summing-up
- When your dataset is balanced and all classes are equally important to you, accuracy is usually a good starting point. A further advantage is that it is straightforward to explain to non-technical stakeholders in your project.
- AROC is scale-invariant because it measures how well predictions are ranked rather than their absolute values. It is also classification-threshold-invariant: it measures the quality of the model’s predictions irrespective of which classification threshold is chosen. (Both properties are demonstrated in the sketch below.)
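A minimal illustration of both invariances (with hypothetical labels and scores): dividing every score by 10 is a strictly monotone rescaling, so ROC AUC is unchanged, while accuracy at the fixed 0.5 threshold collapses.

```python
# A minimal sketch (hypothetical scores): ROC AUC depends only on the ranking of
# the scores, so a strictly monotone rescaling leaves it unchanged, while
# accuracy at a fixed 0.5 threshold can change drastically.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.2, 0.4, 0.6, 0.9, 0.7, 0.3])

rescaled = scores / 10.0   # strictly monotone transform: ranking preserved

print(roc_auc_score(y_true, scores), roc_auc_score(y_true, rescaled))   # identical
print(accuracy_score(y_true, (scores >= 0.5).astype(int)),
      accuracy_score(y_true, (rescaled >= 0.5).astype(int)))            # not identical
```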
What To Do Next
One crucial piece of work not yet mentioned in this discussion is the elegant work by Provost and Fawcett on the ROC Convex Hull (ROCCH) as an alternative to both “vanilla” ROC curves and the area-under-curve summary. Within the ROCCH framework, the classifiers with the highest expected utility are those whose curves sit on the convex hull of all the candidate classifiers’ curves. Expected-cost-optimal regions of the hull’s upper boundary (parameterized by slope) correspond to the practitioner’s beliefs about utilities and class priors.
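Here is a rough sketch of the idea (my own simplified implementation, not Provost and Fawcett’s code): given the (FPR, TPR) operating points of several candidate classifiers, keep only the upper convex hull; points below it are suboptimal for every combination of class priors and misclassification costs.

```python
# A minimal sketch (assumed operating points): upper convex hull of ROC points.
import numpy as np

def roc_convex_hull(points):
    """Upper convex hull of (FPR, TPR) operating points, from (0, 0) to (1, 1)."""
    # Always include the trivial "reject all" and "accept all" classifiers.
    pts = np.unique(np.vstack([points, [[0.0, 0.0], [1.0, 1.0]]]), axis=0)  # sorted by FPR, then TPR
    hull = []
    for x, y in pts:
        # Pop the last hull point while it lies on or below the segment joining
        # its neighbours (i.e. while the turn is not a strict right turn).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return np.array(hull)

# Hypothetical operating points from a few candidate classifiers:
points = np.array([[0.1, 0.4], [0.2, 0.7], [0.35, 0.75], [0.5, 0.9], [0.6, 0.85]])
print(roc_convex_hull(points))
# Hull: (0,0) -> (0.1,0.4) -> (0.2,0.7) -> (0.5,0.9) -> (1,1)
```

Each segment of the returned hull corresponds (via its slope) to a range of cost/prior ratios for which the classifier at its endpoints is the optimal operating point.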
Here are a few study materials I suggest to readers for further understanding of the topic:
- Convex hull-based multi-objective evolutionary computation for maximizing receiver operating characteristics performance
- Maximizing receiver operating characteristics convex hull via dynamic reference point-based multi-objective evolutionary algorithm
- Robust classification systems for imprecise environments
- Convex Hull-Based Multi-objective Genetic Programming for Maximizing ROC Performance
Thanks for reading my article. Kindly comment, and do not forget to share this blog, as it will motivate me to deliver more quality posts on ML and DL-related topics. Thank you so much for your help, cooperation, and support!
About Author
Mrinal Walia is a professional Python developer with a computer science background, specializing in Machine Learning, Artificial Intelligence, and Computer Vision. In addition, Mrinal is an interactive blogger, author, and geek with over four years of experience in his work. Having worked across most areas of computer science, Mrinal currently works as a Testing and Automation Engineer at Versa Networks, India. He aims to reach his creative goals one step at a time and believes in doing everything with a smile.
Medium | LinkedIn | ModularML | DevCommunity | Github
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
FAQs
What is accuracy vs. ROC vs. AUC?
When measuring a predictive model's performance, there are two essential metrics: ROC AUC and accuracy. ROC AUC captures the relationship between the true positive rate and the false positive rate, while accuracy is simply the percentage of correct predictions.
What is the difference between ROC and AUC?
AUC stands for "Area Under the ROC Curve." That is, AUC measures the entire two-dimensional area underneath the ROC curve (think integral calculus) from (0,0) to (1,1).
What is the relationship between the ROC curve and accuracy?
AUC equals 0.5 when the ROC curve corresponds to random chance and 1.0 for perfect accuracy. On rare occasions, the estimated AUC is below 0.5, indicating that the test does worse than chance.
What is the difference between ROC AUC and AUC?
The ROC AUC score tells us how efficient the model is. The higher the AUC, the better the model's performance at distinguishing between the positive and negative classes. An AUC score of 1 means the classifier can perfectly distinguish between all the positive and negative class points.
What is the difference between AUC and accuracy?
AUC is the go-to metric in such scenarios, as it captures the trade-off between sensitivity and specificity across thresholds. Further, accuracy measures how well a single model is doing at one threshold, whereas AUC can compare models and evaluates the same model's performance across different thresholds.
What do ROC and AUC show?
ROC is a probability curve, and AUC represents the degree or measure of separability. It tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0 classes as 0 and 1 classes as 1.
Can we use ROC and AUC in regression?
Neither of these measures exists in the context of regression, so there is no such thing as a ROC curve for regression.
Why is AUC the best?
The AUC is an estimate of the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance. For this reason, the AUC is widely thought to be a better measure than a classification error rate based upon a single prior probability or KS statistic threshold.
What is a good AUC for a ROC curve?
In general, an AUC of 0.5 suggests no discrimination (i.e., no ability to diagnose patients with and without the disease or condition based on the test), 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 is considered excellent, and more than 0.9 is considered outstanding.
How is AUC calculated?
Area under the curve (AUC) is also known as the c-statistic. Some statisticians also call it AUROC, which stands for area under the receiver operating characteristic curve. It can be calculated by adding the concordance percent and 0.5 times the tied percent.
What is the ROC curve used for?
The ROC curve is used to assess the overall diagnostic performance of a test and to compare the performance of two or more diagnostic tests. It is also used to select an optimal cut-off value for determining the presence or absence of a disease.
How is AUC calculated from the ROC curve?
ROC AUC is the area under the ROC curve and is often used to evaluate how well an algorithm orders the two classes of objects. The value always lies in the [0, 1] interval. For example, if the measured area is 9.5 out of a maximum of 12, then ROC AUC = 9.5/12 ≈ 0.79.
Why is ROC AUC bad for imbalanced data?
For imbalanced classification with a severe skew and few examples of the minority class, the ROC AUC can be misleading. This is because a small number of correct or incorrect predictions can result in a large change in the ROC curve or ROC AUC score.
What are the disadvantages of ROC AUC?
The area under the receiver operating characteristic curve (ROC AUC) measures diagnostic accuracy. Confidence scores used to build ROC curves may be difficult to assign. False-positive and false-negative diagnoses have different misclassification costs. Excessive ROC curve extrapolation is undesirable.
How do AUC and accuracy relate?
The AUC combines P(predict positive | actual positive) and P(predict negative | actual negative), while overall accuracy is P(predict positive | actual positive) x P(actual positive) + P(predict negative | actual negative) x P(actual negative). Accuracy therefore depends heavily on the class proportions in your dataset.
Is AUC a measure of diagnostic accuracy?
AUC is a global measure of diagnostic accuracy. It tells us nothing about individual parameters, such as sensitivity and specificity. Of two tests with identical or similar AUC, one can have significantly higher sensitivity, whereas the other has significantly higher specificity.
What is the difference between AUC and AIC?
Roughly speaking, AIC tells you how well your model fits for a specific misclassification cost, while AUC tells you how well your model would work, on average, across all misclassification costs.
How do you interpret ROC curve results?
A ROC curve lying on the diagonal line reflects the performance of a diagnostic test that is no better than chance level, i.e., a test which yields positive or negative results unrelated to the true disease status.
How do you interpret the ROC area?
Classifiers that give curves closer to the top-left corner indicate better performance. As a baseline, a random classifier is expected to give points lying along the diagonal (FPR = TPR). The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
Is AUC a good metric for imbalanced data?
There is a similar metric, the AUC-ROC curve, which works like AUC-PR but is based on the TPR and FPR. However, the AUC-ROC curve is not preferred under severe imbalance, as it produces over-optimistic results, especially if the number of rare-class examples is very small.
When is AUC more useful than accuracy?
Accuracy is a very commonly used metric, even in everyday life. In contrast, the AUC is used only for classification problems with probabilities, in order to analyze the predictions more deeply. Because of that, accuracy is understandable and intuitive even to a non-technical person.
Does a high AUC mean overfitting?
If the training AUC and the test/validation AUC differ significantly, it is probably overfitting. If they do not differ significantly but both have fairly low values (say around 0.5-0.6), it is probably underfitting.
When should AUC be used?
You should use it when you care equally about the positive and negative classes. This naturally extends the imbalanced-data discussion above: if we care about true negatives as much as we care about true positives, then it makes sense to use ROC AUC.
What does a good AUC look like?
The value of AUC ranges from 0 to 1. A model with an AUC of 1 can perfectly classify observations into classes, while a model with an AUC of 0.5 does no better than random guessing.
How do you find the optimal cut-off value on a ROC curve?
One method provides an "optimal" cut-point that has maximum sensitivity and specificity values at the same time. In order to find the highest sensitivity and specificity values simultaneously, the AUC value is taken as their starting value; for example, let the AUC value be 0.8.
Is a ROC AUC of 0.75 good?
0.5 < ROC < 0.7 is considered poor discrimination; 0.7 ≤ ROC < 0.8 is considered acceptable discrimination; 0.8 ≤ ROC < 0.9 is considered excellent discrimination; ROC ≥ 0.9 is considered outstanding discrimination.
What does an AUC of 0.5 mean?
A perfect predictor gives an AUC-ROC score of 1; a predictor that makes random guesses has an AUC-ROC score of 0.5. A score of 0 means the classifier is perfectly incorrect: it predicts the wrong class 100% of the time.
What does an AUC of 0.6 mean?
A general rule of thumb for interpreting AUC values: AUC = 0.5 means no discrimination (e.g., randomly flipping a coin), and 0.5 < AUC ≤ 0.6 means poor discrimination.
Why do we calculate AUC?
AUC is useful when models output scores, providing a high-level, single-number heuristic of how well a model's prediction scores can differentiate data points with true positive labels from those with true negative labels.
Why is accuracy bad for imbalanced data?
In the framework of imbalanced datasets, accuracy is no longer a proper measure, since it does not distinguish between the numbers of correctly classified examples of different classes. Hence, it may lead to erroneous conclusions.
Is AUC the same as balanced accuracy?
ROC AUC is similar to balanced accuracy, but there are key differences: balanced accuracy is calculated on predicted classes, while ROC AUC is calculated on predicted scores, which cannot be obtained from the confusion matrix alone.
Why does accuracy perform poorly on imbalanced data?
When working with imbalanced data, the minority class is usually our main interest, as when detecting "spam" emails, which are few compared with "not spam" emails. Machine learning algorithms favor the larger class and may even ignore the smaller class if the data is highly imbalanced.
What is the weakness of AUC?
A limitation of the AUC-ROC technique is that the ROC curve depends only on the order of the predicted probabilities and ignores the probability values themselves.
Which ROC curve is most accurate?
The AUC is widely used to measure the accuracy of diagnostic tests. The closer the ROC curve is to the upper left corner of the graph, the higher the accuracy of the test, because at the upper left corner the sensitivity is 1 and the false positive rate is 0 (specificity = 1). The ideal ROC curve thus has an AUC of 1.0.
Can you use AUC for imbalanced data?
ROC AUC and precision-recall AUC provide scores that summarize the curves and can be used to compare classifiers. However, ROC curves and ROC AUC can be optimistic on severely imbalanced classification problems with few samples of the minority class.
What is the disadvantage of AUC?
We do not recommend using AUC for five reasons: (1) it ignores the predicted probability values and the goodness-of-fit of the model; (2) it summarizes test performance over regions of the ROC space in which one would rarely operate; (3) it weights omission and commission errors equally; (4) it does not give ...
How can I improve my ROC AUC score?
- Add more features to your dataset which provide some signal for the target.
- Tweak your model by adjusting parameters or the type of model used.
- Change the probability threshold at which the classes are chosen.
What is roc_auc_score?
roc_auc_score is defined as the area under the ROC curve, which is the curve with the false positive rate on the x-axis and the true positive rate on the y-axis, computed at all classification thresholds.
When should we use a ROC curve?
ROC curves are frequently used to show, in a graphical way, the connection/trade-off between clinical sensitivity and specificity for every possible cut-off of a test or a combination of tests. In addition, the area under the ROC curve gives an idea of the benefit of using the test(s) in question.
When should you not use a ROC curve?
The ROC curve is not a good visual illustration for highly imbalanced data, because the false positive rate (false positives / total real negatives) does not drop drastically when the total number of real negatives is huge.