USE-Evaluator: Performance metrics for medical image segmentation models supervised by uncertain, small or empty reference annotations in neuroimaging
Abstract
Medical image segmentation models are increasingly used in clinical applications, but evaluating their performance remains challenging when reference annotations are uncertain, small, or missing entirely. Traditional evaluation metrics assume a perfect ground-truth annotation, which is often unrealistic in medical imaging, where expert annotators frequently disagree.
We introduce USE-Evaluator (Uncertain, Small, Empty Evaluator), a comprehensive framework for evaluating medical image segmentation models when reference annotations are imperfect. Our approach addresses three key challenges: (1) uncertain annotations where expert disagreement exists, (2) small lesions or structures that are difficult to annotate consistently, and (3) cases where no clear pathology is visible (empty annotations).
The USE-Evaluator provides robust performance metrics that account for annotation uncertainty and enables fair comparison of segmentation models across different clinical scenarios. We demonstrate the effectiveness of our framework on neuroimaging datasets, showing that traditional metrics can be misleading when reference annotations are uncertain, small, or empty, while our proposed metrics provide more reliable performance estimates.
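To make the empty-annotation problem concrete, the following sketch contrasts the standard Dice coefficient with a hypothetical empty-aware variant; the function names and the exact scoring convention for empty masks are illustrative assumptions, not the paper's actual metric definitions.

```python
import numpy as np

def dice(pred, ref, eps=1e-8):
    # Standard Dice: when both masks are empty the score is driven to 0 by eps,
    # so a correct "no lesion" prediction is scored as a complete failure.
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum() + eps)

def empty_aware_dice(pred, ref):
    # Hypothetical variant: score 1.0 when prediction and reference are both
    # empty (a correct negative), 0.0 when exactly one of them is empty.
    if ref.sum() == 0 and pred.sum() == 0:
        return 1.0
    if ref.sum() == 0 or pred.sum() == 0:
        return 0.0
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum())

# Example: a scan with no visible pathology, correctly predicted as empty.
ref = np.zeros((4, 4), dtype=bool)
pred = np.zeros((4, 4), dtype=bool)
print(dice(pred, ref))              # ≈ 0.0: empty case penalized
print(empty_aware_dice(pred, ref))  # 1.0: correct negative rewarded
```

Averaging the standard Dice over a cohort that includes lesion-free scans therefore understates model performance, which is one of the failure modes the framework is designed to address.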