Assessing the Performance of Artificial Intelligence Models: Insights from the American Society of Functional Neuroradiology Artificial Intelligence Competition
BACKGROUND AND PURPOSE: Artificial intelligence (AI) models in radiology are frequently developed and validated using datasets from a single institution and are rarely tested on independent, external datasets, raising questions about their generalizability and applicability in clinical practice. The American Society of Functional Neuroradiology (ASFNR) organized a multi-center AI competition to evaluate the proficiency of developed models in identifying various pathologies on noncontrast CT (NCCT), assessing age-based normality, and estimating medical urgency.
MATERIALS AND METHODS: In total, 1201 anonymized, full-head NCCT clinical scans from five institutions were pooled to form the dataset. The dataset encompassed normal studies as well as pathologies including acute ischemic stroke, intracranial hemorrhage, traumatic brain injury, and mass effect (task 1: detection of these pathologies). NCCTs were also assessed to determine if findings were consistent with expected brain changes for the patient’s age (task 2: a