Potential of GPT-4 for Detecting Errors in Radiology Reports: Implications for Reporting Accuracy
Background Errors in radiology reports may occur because of resident-to-attending discrepancies, speech recognition inaccuracies, and large workload. Large language models, such as GPT-4 (ChatGPT; OpenAI), may assist in generating reports. Purpose To assess effectiveness of GPT-4 in identifying common errors in radiology reports, focusing on performance, time, and cost-efficiency. Materials and Methods In this retrospective study, 200 radiology reports (radiography and cross-sectional imaging [CT and MRI]) were compiled between June 2023 and December 2023 at one institution. There were 150 errors from five common error categories (omission, insertion, spelling, side confusion, and other) intentionally inserted into 100 of the reports and used as the reference standard. Six radiologists (two senior radiologists, two attending physicians, and two residents) and GPT-4 were tasked with detecting these errors. Overall error detection performance, error detection in the five error categories