How to evaluate a machine learning model
[Source: [1]]
Different types of prediction
[Source: [2]]
[Source: [3]]
How can we quantify how good a prediction is?
\[ AE = |y_i - y_p| \]
\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - y_p| \]
\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - y_p)^2 \]
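As a minimal sketch of the error measures above, the following computes AE, MAE and MSE in plain Python. It assumes that MAE averages the absolute errors (so that positive and negative errors cannot cancel), and the function names and the toy values are invented for illustration only.

```python
def absolute_error(y_true, y_pred):
    """Absolute error for a single observation: |y_i - y_p|."""
    return abs(y_true - y_pred)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors over all n observations."""
    n = len(y_true)
    return sum(absolute_error(t, p) for t, p in zip(y_true, y_pred)) / n


def mean_squared_error(y_true, y_pred):
    """Average of the squared errors; large errors are penalised more."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


# Toy values, made up purely for illustration
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(observed, predicted)
mse = mean_squared_error(observed, predicted)
print(mae, mse)
```

Because MSE squares each error, the single error of size 2 contributes four times as much as an error of size 1, which is why MSE is more sensitive to outliers than MAE.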
Moving to classification
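For classifiers that output a probability, the log-loss named in the next heading is a common counterpart to MSE. The sketch below assumes the usual binary cross-entropy definition, labels coded as 0 or 1, and a small clipping constant `eps` to avoid taking the log of zero. The function name and toy values are illustrative, not taken from the slides.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log-loss: mean of -[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from exactly 0 and 1
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Toy values, made up purely for illustration
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # probabilities close to the true labels
uninformed = [0.5, 0.5, 0.5, 0.5]  # always predicts 50%

print(log_loss(labels, confident), log_loss(labels, uninformed))
```

Lower is better: the confident, correct predictions score below the uninformed 50% predictions, whose log-loss equals log(2).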
Going beyond log-loss
| | Model Predicted as Yes | Model Predicted as No |
|---|---|---|
| True value is Yes | A | B |
| True value is No | C | D |
True Positives = A
True Negatives = D
False Positives = C
False Negatives = B
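The four cells can be counted directly from paired true and predicted labels. This is a minimal sketch, assuming labels are coded 1 for "Yes" and 0 for "No". The function name and the toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary 0/1 labels."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # cell A: true Yes, predicted Yes
        elif t == 1 and p == 0:
            fn += 1  # cell B: true Yes, predicted No
        elif t == 0 and p == 1:
            fp += 1  # cell C: true No, predicted Yes
        elif t == 0 and p == 0:
            tn += 1  # cell D: true No, predicted No
        else:
            raise ValueError("labels must be 0 or 1")
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Toy labels, made up purely for illustration
truth = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(truth, predicted)
print(counts)
```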
| | Model Predicted as Yes | Model Predicted as No |
|---|---|---|
| True value is Yes | A | B |
| True value is No | C | D |
Total sample size = A + B + C + D
Total cases = A + B
Total non-cases = C + D
Prevalence = total cases / total sample size
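The totals above translate into a few lines of Python. This is a sketch only: the function name is hypothetical and the counts are invented to illustrate the arithmetic.

```python
def prevalence(a, b, c, d):
    """Prevalence = total cases / total sample size.

    a, b, c, d are the confusion-matrix cells:
    a + b are the true "Yes" rows (cases), c + d the true "No" rows.
    """
    total_sample_size = a + b + c + d
    total_cases = a + b
    return total_cases / total_sample_size


# Hypothetical counts: 15 cases in a sample of 100
prev = prevalence(a=10, b=5, c=15, d=70)
print(prev)
```

Note that prevalence depends only on the true values (the row totals), not on what the model predicted.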
| | Model = 1 | Model = 0 |
|---|---|---|
| True = 1 | 80 | 30 |
| True = 0 | 20 | 40 |
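As a worked example, the counts in this table can be turned into summary metrics. The sketch assumes the same layout as the A/B/C/D table (rows are true values, columns are predictions) and uses the standard definitions of accuracy, sensitivity, specificity, PPV and NPV, which are not spelled out on this slide.

```python
# Cells read from the table: rows are true values, columns are predictions
tp, fn = 80, 30   # True = 1 row
fp, tn = 20, 40   # True = 0 row

total = tp + fn + fp + tn

metrics = {
    "prevalence": (tp + fn) / total,       # share of true cases
    "accuracy": (tp + tn) / total,         # share of correct predictions
    "sensitivity": tp / (tp + fn),         # true cases correctly flagged
    "specificity": tn / (tn + fp),         # true non-cases correctly cleared
    "ppv": tp / (tp + fp),                 # predicted 1 that are truly 1
    "npv": tn / (tn + fn),                 # predicted 0 that are truly 0
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the model is right about 71% of the time, but it misses 30 of the 110 true cases, which accuracy alone does not reveal.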
An ROC curve showing the classification performance of different blood infection markers for predicting serious bacterial infection (SBI) in febrile infants. Source: Milcent K, Faesch S, Gras-Le Guen C, et al. Use of Procalcitonin Assays to Predict Serious Bacterial Infection in Young Febrile Infants. JAMA Pediatr. 2016;170(1):62–69. doi:10.1001/jamapediatrics.2015.3210
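An ROC curve like the one in this figure can be traced by sweeping a decision threshold over a continuous score (for example, a marker level) and recording the false positive rate and true positive rate at each threshold. The sketch below is one simple way to do this, with the area under the curve computed by the trapezoid rule. The function names and the toy scores are invented for illustration and are not data from the cited study.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs.

    A case is predicted positive when its score is >= the threshold.
    Thresholds are the distinct scores, from highest to lowest.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy labels and scores, made up purely for illustration
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc = area_under_curve(points)
print(points)
print(auc)
```

An area of 0.5 corresponds to a marker that ranks cases no better than chance, and 1.0 to a marker that ranks every case above every non-case.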
Putting it all together
Practical and Ethical Considerations
Figure 1
NHS app and deprivation: See this study
Kitchen and Recipe
Different operating systems parse data differently. Reference: Bhandari Neupane et al., “Characterization of Leptazolines A–D” (2019).
[Source: Kitchen image]
Revisiting the scenarios
Conclusion
Thank you to Teddy Hla for writing these slides.
[1] - UNDERFIT and OVERFIT Explained. Aarthi Kasirajan, Medium. URL: https://medium.com/@minions.k/underfit-and-overfit-explained-8161559b37db
[2] - How to interpret residuals in a linear regression model. Crystal X, Medium. URL: https://tracyrenee61.medium.com/how-to-interpret-residuals-in-a-linear-regression-model-7cfd8141d456
[3] - Saturated Models, Deviance and the Derivation of Sum of Squares. Egor Howell, Medium. URL: https://towardsdatascience.com/saturated-models-deviance-and-the-derivation-of-sum-of-squares-ee6fa040f52/