Discussion about this post

Zac Robinson

Maybe I'm missing something, but if these diagnostic models are just calling general LLMs, isn't NEJM exactly the kind of input the LLMs would be trained on, potentially including the diagnostic cases they are being tested against? I've seen some papers of this kind where the authors intentionally created test scenarios that couldn't have been included in the training material, but that doesn't seem to be the case here.

Even if they are protecting against that particular source of confounding, it seems like the more esoteric or rare the disease, the more likely both your training and testing materials are to consist of a small number of case reports and derivatives of those case reports. As a result, I'd expect the model to perform well against those testing materials, but I see no reason to think it would perform nearly as well on an actual patient presentation.

