A good machine learning model metric should reflect the model’s true and constant predictive ability. This means that if you change the test dataset, it should not give a different result. The best performance metric for artificial intelligence systems is the AUC ROC curve.
ROC Curve Theory
Receiver Operating Characteristic was first created for the use of radar signal detection during World War II. The US used this system to improve the accuracy of radar detection of Japanese aircraft. Therefore, it is called the operating characteristic of the receiver. AUC is an area under a receiver operating characteristic.
Error Matrix
It is a combination of your forecast (1 or 0) and the actual value (1 or 0). Depending on the prediction result and whether the classification was correct, the matrix is divided into several parts:
- true positive is the number of times you correctly classify the sample as positive;
- false positive is the number of times you misclassify a sample as positive.
The error matrix contains only absolute numbers. However, using them, we can get many other metrics based on percentages. True Positive Rate (TPR) and False Positive Rate (FPR) are two of them.
It is difficult to determine the optimal point because one has to choose the most appropriate threshold value given the scope of the system. However, the general rule is to maximize the difference (TPR-FPR).
Different thresholds create different TPRs and FPRs. They represent the very points that form the ROC curve. You can select “Increase” as the valuation prediction if the probability of change based on historical data is greater than 50%.
Why Is the Area Under the ROC Curve a Good Metric to Evaluate a Classification Model?
The ROC curve takes into account not only the received results but also the probability of predicting all classes. For example, if a result is correctly classified based on a 51% chance, then it is likely to be classified incorrectly if you use a different test dataset. In addition, the ROC curve also takes into account the performance of the estimation at various thresholds. It is a comprehensive metric for assessing how well cases separate across groups.
What AUC Value Is Acceptable for a Classification Model?
For a binary classification task, when determining classes randomly, you can get 0.5 AUC. Therefore, if you are solving a binary classification problem, a reasonable AUC should be >0.5. A good classification model has an AUC >0.9, but this value is highly dependent on its application.
When you are building a classification model for AI systems, it is crucial to provide input data. The model will first calculate the probability of an increase or decrease using the historical data you provide. After that, based on the threshold value, it will decide whether the result will increase or decrease.