Background: Clinical prediction models for a health condition are commonly evaluated on their performance for a population, although decisions are made for individuals. The classic view relates uncertainty in risk estimates for individuals to sample size (estimation uncertainty), but uncertainty can also arise from model uncertainty (variability in modeling choices) and applicability uncertainty (variability in measurement procedures and between populations). Methods: We used real and synthetic data for ovarian cancer diagnosis to train 59,400 models with variations in estimation, model, and applicability uncertainty. We then used these models to estimate the probability of ovarian cancer in a fixed test set of 100 patients and evaluated the variability in individual estimates. Findings: We show empirically that estimation uncertainty can be strongly dominated by model uncertainty and applicability uncertainty, even for models that perform well at the population level. Estimation uncertainty decreased considerably with increasing training sample size, whereas model and applicability uncertainty remained large. Interpretation: Individual risk estimates are far more uncertain than often assumed. Model uncertainty and applicability uncertainty usually remain invisible when prediction models or algorithms are based on a single study. Predictive algorithms should inform, not dictate, care and support personalization through clinician-patient interaction rather than through inherently uncertain model outputs.
The paper addresses the challenges of risk prediction in health AI, particularly for individual patient outcomes, using ovarian cancer diagnosis as a case study. It distinguishes between three types of uncertainty (estimation, model, and applicability uncertainty) and shows that individual risk estimates often carry much higher uncertainty than typically acknowledged. The authors present a structured framework for understanding this uncertainty and conduct an empirical analysis, training 59,400 models on real and synthetic data to demonstrate the variability in individual risk estimates. They conclude that although prediction models can perform well at the population level, individual predictions can be highly unreliable, emphasizing the need for careful interpretation in clinical settings.
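To make the variability analysis concrete, the following is a minimal sketch of the general idea, not the authors' pipeline: refit a model on perturbed (bootstrap-resampled) training data and track how the risk estimate for each patient in a fixed test set moves across refits. The synthetic data, the 200 refits, and all settings are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): vary the training data, refit,
# and record the risk estimate for each fixed test patient across refits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data standing in for the real and synthetic ovarian cancer cohorts.
X, y = make_classification(n_samples=2_000, n_features=8, weights=[0.7, 0.3], random_state=0)
X_train, y_train, X_test = X[:1_900], y[:1_900], X[1_900:]  # last 100 rows act as the fixed test patients

estimates = []  # rows: model refits, columns: test patients
for _ in range(200):  # the paper trains far more variants (59,400)
    idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap resample
    model = LogisticRegression(max_iter=1_000).fit(X_train[idx], y_train[idx])
    estimates.append(model.predict_proba(X_test)[:, 1])
estimates = np.array(estimates)

# 95% range of the risk estimates for the first test patient.
lo, hi = np.percentile(estimates[:, 0], [2.5, 97.5])
print(f"patient 1: estimated risk ranges from {lo:.2f} to {hi:.2f} across refits")
```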
This paper employs the following methods (a fitting sketch follows the list):
- Logistic Regression
- Random Forest
- XGBoost
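As referenced above, here is a hedged sketch of fitting the three model families on the same training data so that disagreement between their individual predictions exposes model uncertainty. All hyperparameters are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch: the three model families named above, fit on identical data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

def individual_estimates(X_train, y_train, X_test):
    """Fit each model family and return per-patient risk estimates."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1_000),
        "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
        "xgboost": XGBClassifier(n_estimators=300, learning_rate=0.05, eval_metric="logloss"),
    }
    estimates = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        estimates[name] = model.predict_proba(X_test)[:, 1]  # probability of the event
    # Disagreement between the three sets of estimates reflects model uncertainty.
    return estimates
```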
The following datasets were used in this research:
- University Hospitals Leuven
- Synthetic Data
The following evaluation measures were used (a computational sketch follows the list):
- AUROC (area under the receiver operating characteristic curve)
- Estimated Calibration Index (ECI)
- Relative Utility (RU)
- Decision Uncertainty (DU)
- 95% range of individual estimates
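As referenced above, here is a hedged sketch of the simpler summaries over a matrix of individual estimates: AUROC per model variant, the 95% range of each patient's estimates, and a decision-uncertainty proxy at an assumed 10% threshold. The Estimated Calibration Index and Relative Utility calculations used in the paper are not reproduced here.

```python
# Hedged sketch of summary measures over many model variants (illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score

def summarize_uncertainty(estimates, y_test, threshold=0.10):
    """estimates: array of shape (n_model_variants, n_test_patients) with predicted risks."""
    # Population-level discrimination of each model variant.
    auroc = np.array([roc_auc_score(y_test, p) for p in estimates])
    # 95% range of the risk estimates for each individual test patient.
    lower, upper = np.percentile(estimates, [2.5, 97.5], axis=0)
    # Decision-uncertainty proxy: share of variants whose decision at the
    # threshold disagrees with the majority decision for that patient.
    decisions = estimates >= threshold
    majority = decisions.mean(axis=0) >= 0.5
    disagreement = (decisions != majority).mean(axis=0)
    return auroc, lower, upper, disagreement
```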
The key findings are:
- Individual risk estimates often exhibit high uncertainty, particularly with small training sample sizes.
- Decision uncertainty persists even with larger training samples, indicating that model and applicability uncertainty remain substantial even as estimation uncertainty decreases with sample size.
The authors identified the following limitations:
- The study is limited by the available data, so applicability uncertainty is likely underestimated.
- The event of interest was relatively common, potentially skewing the observed range of risk estimates.
The reported compute resources are:
- Number of GPUs: None specified
- GPU Type: None specified
- Compute Requirements: None specified