AI accurately flags follow-up imaging recommendations

An AI agent based on a large language model (LLM) can significantly improve the detection of follow-up imaging recommendations in radiology reports, according to a study published February 18 in NEJM Catalyst.

On a test set of 10,000 radiology reports, the AI agent detected 6.18 times more imaging studies requiring follow-up than an existing manual system, achieving an accuracy of 98.7%, noted lead author Alex Treacher, PhD, of the Parkland Health and Hospital System in Dallas, and colleagues.

“Implementation of an AI agent as an additional safety net significantly improved the identification of missed diagnostic opportunities in radiologist notes and accurately extracted key details that aid in patient outreach and scheduling,” the group wrote.

In 2018, Parkland Health developed a system to capture and manage follow-up imaging cases wherein radiologists insert a template phrase -- a "macro" -- into their reports indicating the need for follow-up. The macros tag the reports in the electronic health record so that they are automatically routed to a queue for patient outreach.
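Conceptually, the macro system is a keyword gate on report text: if the template phrase is present, the report is routed for outreach. The study doesn't detail Parkland's implementation, so the macro wording, function name, and queue structure in this Python sketch are invented for illustration:

```python
import re

# Hypothetical macro phrase -- the article does not give Parkland's
# actual template wording.
FOLLOWUP_MACRO = re.compile(r"RECOMMEND FOLLOW-UP IMAGING", re.IGNORECASE)

def route_report(report_text: str, outreach_queue: list) -> bool:
    """Route a report to the patient-outreach queue if it contains the follow-up macro."""
    if FOLLOWUP_MACRO.search(report_text):
        outreach_queue.append(report_text)
        return True
    return False

queue: list = []
route_report("Chest CT: 6 mm nodule. RECOMMEND FOLLOW-UP IMAGING in 6 months.", queue)
print(len(queue))  # 1 -- the tagged report is queued for outreach
```

The weakness is visible in the sketch itself: any recommendation phrased without the exact template slips through, which is precisely the gap the AI agent is meant to close.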

Despite early success, however, reliance on manual macros has proven insufficient over time: many recommendations go undetected, and not all follow-up findings are captured by the system, the authors noted. They hypothesized that an AI agent could fill the gap.

The researchers used Meta's Llama-3 70B as a foundation and optimized it to detect imaging follow-up recommendations on a development set of 1,000 randomly sampled x-ray, CT, ultrasound, and MRI reports. The optimization relied on prompt engineering rather than fine-tuning, the authors noted.
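Prompt engineering means the model's weights stay frozen; all task-specific behavior comes from the instructions in the prompt. The study doesn't publish its prompts or serving stack, so the endpoint, model identifier, and prompt text below are assumptions: a minimal sketch of how a prompt-only classifier over an open-weight Llama-3 70B might look, assuming an OpenAI-compatible server such as vLLM.

```python
from openai import OpenAI

# Assumed local OpenAI-compatible endpoint (e.g., vLLM serving Llama-3 70B);
# not the study's actual setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM_PROMPT = (
    "You are reviewing a radiology report. Answer YES if the radiologist "
    "recommends follow-up imaging, otherwise answer NO."
)

def flags_followup(report_text: str) -> bool:
    """Classify one report via prompt engineering alone -- no fine-tuning."""
    resp = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-70B-Instruct",  # assumed identifier
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": report_text},
        ],
        temperature=0,  # deterministic output for a yes/no decision
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```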

Next, they evaluated the model's performance against the macro-based system on a sample of 10,000 randomly selected radiologist notes. They then ran the model in silent production mode for three months, during which it processed more than 120,000 studies in real time without affecting clinical workflows.
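Silent (or "shadow") deployment typically means scoring live studies and logging the predictions without triggering any downstream action, so the comparison against the macro system can be made retrospectively. A minimal sketch of that pattern, reusing the hypothetical flags_followup classifier from above:

```python
import logging

# The log file name is an assumption; in practice predictions would likely
# land in a database for later audit.
logging.basicConfig(filename="shadow_predictions.log", level=logging.INFO)

def process_study_silently(study_id: str, report_text: str) -> None:
    """Score a live report and record the prediction without acting on it."""
    flagged = flags_followup(report_text)  # classifier from the earlier sketch
    logging.info("study=%s flagged=%s", study_id, flagged)
    # Deliberately no routing to the outreach queue: silent mode only
    # accumulates evidence for comparison with the macro-based system.
```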

According to the results, the AI agent achieved a balanced accuracy exceeding 97% for identifying radiologist notes requiring follow-up, correctly flagging 6.18 times more cases than the macro-based system (513 vs. 83 based on a sample of 10,000 studies).
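Balanced accuracy is the average of sensitivity and specificity, which matters here because follow-up cases are a small minority of studies: a model that never flagged anything would still score well on plain accuracy. A quick illustration with scikit-learn, using toy labels rather than the study's data:

```python
from sklearn.metrics import balanced_accuracy_score

# Toy labels for illustration only: 1 = follow-up recommended, 0 = not.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]

# (sensitivity + specificity) / 2 -- here (3/3 + 6/7) / 2, about 0.93.
print(balanced_accuracy_score(y_true, y_pred))
```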

In silent mode, the agent flagged 9,600 studies versus 1,145 by the macro-based system across the 120,000 studies. It also extracted key details such as follow-up timing, the recommended procedure, and the clinical rationale with an accuracy of 94%, the researchers reported.
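Extracting those details is a structured-output task layered on the same model. The study doesn't describe its extraction prompt or schema, so the JSON keys below are hypothetical: a minimal sketch reusing the assumed client from the classification example.

```python
import json

EXTRACT_PROMPT = (
    "From the radiology report, return only JSON with keys "
    '"timeframe", "recommended_procedure", and "rationale". '
    "Use null for any detail not stated."
)

def extract_followup_details(report_text: str) -> dict:
    """Pull scheduling details from a flagged report (hypothetical schema)."""
    resp = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-70B-Instruct",  # assumed identifier
        messages=[
            {"role": "system", "content": EXTRACT_PROMPT},
            {"role": "user", "content": report_text},
        ],
        temperature=0,
    )
    # A production system would validate and retry on malformed JSON;
    # omitted here for brevity.
    return json.loads(resp.choices[0].message.content)
```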

"A custom AI agent built using a pretrained LLM can achieve high predictive performance of follow-up prediction from radiologist notes," the group wrote.

The researchers believe the approach can be adopted at scale, but further research is needed to determine whether AI-flagged follow-ups actually lead to completed imaging or improved diagnoses.

“This work illustrates how generative AI can be thoughtfully applied to address critical gaps in diagnostic safety in high-volume health care environments,” the group concluded.
