Under The Watchful Eye Of "AI Consult"

Always lurking in the background to double-check your treatment plan.

Jul 26, 2025

I would be remiss if I failed to highlight the Penda Health preprint describing the effects of AI Consult, despite its messy and convoluted write-up. Even given the many limitations and difficulties in generalization, it’s too close to a “future vision” for AI in medicine to leave unmentioned.

Penda Health is a network of clinics in Nairobi, Kenya, and they’ve been using decision support within their lightweight documentation EHR for a couple of years to improve guideline-concordant care. About a year ago, they decided to transition from their rule-based system to an LLM-based one: specialized prompts and RAG layered on top of good ol’ off-the-shelf GPT-4o.

How does it work? This figure shows how the shadow process pulls in the written documentation, identifies a potential issue with a “red flag” and an explanation, and then the envisioned final result once the clinician has reviewed the flag and changed the plan:

The study design basically rolled out the active AI consult decision-support to half the “clinical officers” working in their clinics, while it ran as a silent process for the other half of officers. The AI consult only functioned for a subset of visit types, and the authors had “structured outcome data” for only 40% of the eligible visits – about 7,000 in each group. The study also involved holding out 5,000 visits for manual review by a collection of 100 clinicians recruited locally and from around the world.

There are 125 pages of methods, results, data, and underlying prompts to examine – far too much to cover in a brief post – but the “headline” result was improvement in “clinical error rates”:

Everything is worse without AI – and, if anything leaps out of these data, it’s that baseline errors could be as high as 70% in certain categories! I would call this a fertile environment for AI coaching, to put it mildly.

The flip side to these small, but consistent, improvements was increased time burden – an average patient encounter required 16.5 minutes with AI, as compared to 13 without. This largely involved additional time clicking through the red/yellow/green flags from the AI Consult.
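To put that time cost in perspective, here is a quick back-of-the-envelope sketch using only the two averages quoted above. The function names and the 1,000-visit figure are my own illustrative assumptions, not anything taken from the preprint.

```python
# Average encounter lengths as quoted in this post (minutes).
MINUTES_WITH_AI = 16.5
MINUTES_WITHOUT_AI = 13.0


def extra_minutes_per_visit(with_ai: float = MINUTES_WITH_AI,
                            without_ai: float = MINUTES_WITHOUT_AI) -> float:
    """Absolute added time per encounter, in minutes."""
    return with_ai - without_ai


def relative_increase(with_ai: float = MINUTES_WITH_AI,
                      without_ai: float = MINUTES_WITHOUT_AI) -> float:
    """Added time as a fraction of the encounter length without AI."""
    return (with_ai - without_ai) / without_ai


def extra_hours(n_visits: int) -> float:
    """Total added clinician time, in hours, over a hypothetical number of visits."""
    return n_visits * extra_minutes_per_visit() / 60


print(f"{extra_minutes_per_visit():.1f} extra minutes per visit")  # 3.5
print(f"{relative_increase() * 100:.1f}% longer encounters")       # 26.9
# 1,000 visits is an arbitrary illustrative volume, not a study figure.
print(f"{extra_hours(1000):.1f} extra hours per 1,000 visits")     # 58.3
```

In other words, the quoted averages work out to roughly a quarter more time per encounter, which is worth keeping in mind when weighing the error-rate improvements.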

Again, a lot of interesting data presented – and, at the minimum, an ambitious step forward.

Evidence Triage
