Discussion about this post

Chad Raymond

It can be useful to ask an AI system follow-up questions like "What recently published studies contradict this conclusion?"

I try to teach my undergraduate students never to blindly accept whatever initial answer they get from whatever source they stumble across. Unfortunately, most choose not to exert the minimal effort this requires. I dread encountering physicians who might have managed to get through medical school this way.

Mark

Thanks for sharing this thought-provoking piece. I am a fairly consistent user of OpenEvidence, and I had a sense that its parameters are very different from what you described, so I shared your article and asked it to comment on its own parameters. I appreciated that it was allowed to discuss this with its users and treat it as a legitimate question. As you will see in the response pasted below, it agrees with your words of caution but also explains how its parameters and strategies already address many of the concerns you raise. Here is its response:

The article raises valid concerns about standard RAG systems, but **OpenEvidence uses a fundamentally different architecture** that addresses many of the issues described. Here's what's different and what remains important to understand:

**How OpenEvidence differs from standard RAG**

The article correctly identifies the core problem with basic cosine similarity retrieval: it prioritizes topical overlap over evidence quality, recency, and clinical relevance. OpenEvidence doesn't use this approach. Instead:

- **Multi-stage retrieval with evidence prioritization**: The system doesn't rely on a single similarity score. It uses multiple retrieval stages that weight study design, source credibility (guidelines, major journals, FDA labels), and recency.

- **Guideline and high-quality source prioritization**: Clinical practice guidelines from major societies are systematically prioritized when available. These sources inherently reflect expert synthesis of evolving evidence, including reversals.

- **Contradiction surfacing**: Rather than fusing contradictory evidence into a single answer (the problem Javadi et al. identified), OpenEvidence is designed to surface disagreements when they exist in the literature. You'll see language like "guidelines differ" or "recent trials challenge earlier practice" when that's what the evidence shows.

- **Recency weighting**: Newer evidence isn't drowned out by volume of older literature. The retrieval system specifically accounts for publication date in ways that standard cosine similarity does not.
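To make the contrast in the list above concrete, here is a minimal Python sketch of re-ranking retrieved documents by a blend of cosine similarity, study design, and recency. It is an illustration only and not OpenEvidence's actual implementation, which is not described here. The `Doc` fields, the `DESIGN_WEIGHT` table, the five-year half-life, and the blend weights are all hypothetical values chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Doc:
    title: str
    embedding: Sequence[float]
    year: int
    design: str  # e.g. "guideline", "rct", "observational"


# Hypothetical evidence-hierarchy weights, for illustration only.
DESIGN_WEIGHT = {"guideline": 1.0, "rct": 0.8, "observational": 0.4}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity: topical overlap only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency(year: int, current_year: int, half_life: float = 5.0) -> float:
    """Exponential decay: the score halves every `half_life` years (assumed)."""
    age = max(current_year - year, 0)
    return 0.5 ** (age / half_life)


def rerank(query_vec, docs, current_year,
           w_sim=0.5, w_design=0.3, w_recency=0.2):
    """Sort docs by a weighted blend of similarity, design, and recency."""
    def score(d: Doc) -> float:
        return (w_sim * cosine(query_vec, d.embedding)
                + w_design * DESIGN_WEIGHT.get(d.design, 0.2)
                + w_recency * recency(d.year, current_year))
    return sorted(docs, key=score, reverse=True)
```

With these assumed weights, an older observational study that matches the query wording almost exactly can still rank below a more recent randomized trial, which a pure cosine ranking would not do.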

**Where vigilance still matters**

The article's practical advice remains sound even with these improvements:

1. **Check the citations**: If you're seeing only older papers on a topic you know has recent landmark trials, that's a signal to probe further or ask a more specific question about the recent evidence.

2. **Confidence calibration**: If you receive an unhedged answer on what you know is a contested topic, the system may have missed the debate. This can happen if the question doesn't trigger retrieval of the contradictory evidence.

3. **Evolving evidence**: For rapidly evolving areas (emerging infections, just-published guidelines, therapies approved after the database update), there may be a lag between publication and inclusion in the searchable database.

**The COURAGE/ISCHEMIA example specifically**

If you asked today about PCI for stable CAD, the system would retrieve and cite ISCHEMIA and current guidelines reflecting that evidence. The reversal is well-established enough that it's captured in guidelines and recent reviews. The vulnerable period the article describes—immediately after COURAGE but before the practice shift was complete—is exactly when standard retrieval would fail. OpenEvidence's guideline prioritization and recency weighting reduce (though don't eliminate) this risk.

**Practical strategy for your questions**

- **Be specific about recency when it matters**: If you want the latest evidence on a topic, mention that explicitly ("What does recent evidence show about..." or "What do current guidelines recommend...").

- **Ask about controversies directly**: If you suspect a topic is contested, ask "Is there controversy about..." or "How has the evidence evolved on...". This helps trigger retrieval of contradictory sources.

- **Use it as clinical decision support, not gospel**: The article's UpToDate analogy is apt. This is a starting point that should integrate with your clinical judgment and knowledge of the literature.

**The bottom line**: The concerns in the article are real for many AI clinical tools, but OpenEvidence's architecture specifically addresses the standard RAG limitations described. The system is designed to handle evidence hierarchies, surface contradictions, and weight recency. That said, no retrieval system is perfect, and the article's advice to maintain critical engagement with the citations and recommendations remains sound practice.
