Collecting and analyzing data for a systematic review is one of the most laborious processes in medicine. Evaluating studies, extracting data, rating bias, and summarizing the statistics frequently produces a valuable contribution to the literature. However, many of these steps are time-consuming manual tasks.
Let’s have an AI agent do it!
This is otto-SR, an agentic architecture for systematic review workflows. It looks like this:
Using this process, the authors report that they performed 12 systematic reviews in the Cochrane style in two days, rather than the weeks to months each would have taken in Real Life (the authors state it represented “12 work years”). The agentic workflow tended to outperform the humans in retrieval, identifying additional studies that changed the statistical outcomes of some of the reviews.
This all sounds wonderful and promising – so far. Only one amusing limitation caught my eye:
“Sixth, the GPT-4.1 knowledge cut-off was updated to June 2024. While our screening and data extraction benchmarks primarily involved closed-source data, it’s possible that the April 2024 Cochrane reviews were included in the model’s pretraining corpus.”
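If those reviews were in the pretraining corpus, the agent may have been partly recalling answers rather than deriving them. Probably would have been reasonable to check on that before running the study ….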
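For what it's worth, the first-pass version of that check is just a date comparison. Here is a minimal sketch, not anything from the paper: the function name and the dictionary are hypothetical, and since the quoted limitation gives only months and years, pinning both dates to the first of the month is my assumption.

```python
from datetime import date

# Knowledge cut-off as stated in the quoted limitation (June 2024).
# Assumption: only month and year are given, so the day is pinned to the 1st.
MODEL_CUTOFF = date(2024, 6, 1)


def may_be_in_pretraining(published: date, cutoff: date = MODEL_CUTOFF) -> bool:
    """Return True if a document was published on or before the cut-off.

    This only flags the *possibility* of contamination; it cannot show
    that the document was actually in the training corpus.
    """
    return published <= cutoff


# Hypothetical benchmark listing; the April 2024 date comes from the quote.
benchmark_reviews = {
    "April 2024 Cochrane reviews": date(2024, 4, 1),
}

flagged = [
    name
    for name, published in benchmark_reviews.items()
    if may_be_in_pretraining(published)
]
print(flagged)  # ['April 2024 Cochrane reviews']
```

A flag here does not prove the model saw the reviews, only that it could have. Ruling it in or out would need something stronger, such as probing the model for verbatim recall.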