How to evaluate a machine learning model
[Source: [1]]
Different types of prediction
How can we quantify how good a prediction is?
\[ AE = |y_i - y_p| \]
\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - y_p| \]
\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_p)^2 \]
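The three error measures can be sketched in a few lines of plain Python. The function names and the example numbers are made up for illustration; `y_true` plays the role of \(y_i\) and `y_pred` the role of \(y_p\).

```python
def absolute_error(y_true, y_pred):
    """Absolute error (AE) for a single observation."""
    return abs(y_true - y_pred)


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute errors (MAE) over all observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of the squared errors (MSE) over all observations."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed values and model predictions (illustrative only).
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(observed, predicted)
mse = mean_squared_error(observed, predicted)
print(mae, mse)
```

In this toy data the signed errors are 1, 0, -2 and -1, so a plain average of them would partially cancel. Taking the absolute value (MAE) or the square (MSE) prevents that, and squaring additionally penalises the larger miss more heavily.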
Moving to classification
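For a yes/no outcome, a common score for predicted probabilities is log-loss (binary cross-entropy). The sketch below uses the standard textbook definition, which is assumed here rather than taken from these slides; the function name, the clipping constant `eps` and the example values are illustrative.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log-loss: average negative log-likelihood of the true labels.

    y_true holds 0/1 labels, p_pred holds predicted probabilities of class 1.
    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical labels and two sets of predicted probabilities.
labels = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.9, 0.1]
uninformative = [0.5, 0.5, 0.5, 0.5]

print(log_loss(labels, confident_and_right))
print(log_loss(labels, uninformative))
```

Lower is better: predictions that are confident and correct score close to 0, while always predicting 0.5 scores ln(2).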
Going beyond log-loss
| | Model Predicted as Yes | Model Predicted as No |
|---|---|---|
| True value is Yes | A | B |
| True value is No | C | D |
True Positives = A
True Negatives = D
False Positives = C
False Negatives = B
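As a minimal sketch, the four cells can be counted directly from paired true labels and model predictions. The function name and the example labels are hypothetical, and labels are assumed to be coded 1 for Yes and 0 for No.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary 0/1 labels."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # cell A: true Yes, predicted Yes
        elif truth == 0 and pred == 0:
            tn += 1  # cell D: true No, predicted No
        elif truth == 0 and pred == 1:
            fp += 1  # cell C: true No, predicted Yes
        else:
            fn += 1  # cell B: true Yes, predicted No (assumes labels are only 0 or 1)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical true values and model predictions (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```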
| | Model Predicted as Yes | Model Predicted as No |
|---|---|---|
| True value is Yes | A | B |
| True value is No | C | D |
Total sample size = A + B + C + D
Total cases = A + B
Total not-cases = C + D
Prevalence = total cases / total sample size
| | Model = 1 | Model = 0 |
|---|---|---|
| True = 1 | 80 | 30 |
| True = 0 | 20 | 40 |
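Plugging the numbers from this table into the definitions above gives a short worked example. Prevalence follows the formula already stated; accuracy, sensitivity and specificity are the usual textbook definitions and are an addition here, not something the slides spell out.

```python
# Cells of the example table: rows are the true value, columns the model output.
tp, fn = 80, 30  # True = 1 row: predicted 1, predicted 0
fp, tn = 20, 40  # True = 0 row: predicted 1, predicted 0

total = tp + fn + fp + tn  # total sample size = A + B + C + D
cases = tp + fn            # total cases = A + B
non_cases = fp + tn        # total not-cases = C + D

prevalence = cases / total

# Standard definitions (assumed, not stated in the slides):
accuracy = (tp + tn) / total  # share of all predictions that are correct
sensitivity = tp / cases      # share of true cases the model catches
specificity = tn / non_cases  # share of true non-cases the model clears

for name, value in [
    ("prevalence", prevalence),
    ("accuracy", accuracy),
    ("sensitivity", sensitivity),
    ("specificity", specificity),
]:
    print(f"{name}: {value:.3f}")
```

With 110 cases out of 170 samples, prevalence is about 0.647. The model is correct on 120 of the 170 samples (accuracy about 0.706), finds 80 of the 110 cases (sensitivity about 0.727) and correctly rules out 40 of the 60 non-cases (specificity about 0.667).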
An ROC curve showing the classification performance of different blood infection markers for predicting serious bacterial infection (SBI) in febrile infants. Source: Milcent K, Faesch S, Gras-Le Guen C, et al. Use of Procalcitonin Assays to Predict Serious Bacterial Infection in Young Febrile Infants. JAMA Pediatr. 2016;170(1):62–69. doi:10.1001/jamapediatrics.2015.3210
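A rough sketch of how the points on an ROC curve can be produced: sweep a decision threshold over the model's scores and, at each threshold, record the false positive rate and the true positive rate. The function names and the toy scores are hypothetical and unrelated to the study cited above; the area under the curve is approximated with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A sample is predicted positive when its score is >= the threshold.
    Assumes binary 0/1 labels with at least one positive and one negative.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical true labels and model scores (illustrative only).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(area_under_curve(points))
```

A curve that hugs the top-left corner (high true positive rate at low false positive rate) gives an area close to 1, while a model that ranks cases and non-cases at random gives an area near 0.5.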
Putting it all together
Practical and Ethical Considerations
Figure 1
NHS app and deprivation: See this study
Different operating systems parse data differently. Reference: Bhandari Neupane et al., “Characterization of Leptazolines a–d” (2019).
Revisiting the scenarios
Conclusion
Thank you to Teddy Hla for writing these slides.
[1] Hermansson, Anton (2020). Standardization and optimization of index for 28 day strength for cement made from standard clinker. doi:10.13140/RG.2.2.18178.91849