What Surgeons Should Ask Before Trusting an AI Imaging Tool
A practical checklist for orthopaedic surgeons evaluating an AI imaging or planning tool. It covers validation, generalisability, explainability, regulatory status, and data governance, with the questions that separate clinical tools from demos.
TL;DR
Most AI imaging tools fail the same way: impressive on the dataset they were built on, unproven on yours. Before trusting one, ask seven questions: what data was it validated on, does that match my population, is the output editable, can I see why it decided, what is its regulatory status, where does my patient data go, and what happens when it is wrong? A tool that answers these clearly is a clinical instrument; one that deflects is a demo. Roughly 91% of orthopaedic surgeons already treat AI as complementary rather than a replacement; the questions below keep it that way.
Why a Checklist Matters
AI imaging tools are easy to demo and hard to trust. A model can post a striking accuracy number and still fail in your clinic, because it was validated on a different scanner, a different population, or a curated dataset that does not look like your Tuesday list. The gap between "works in the paper" and "works on my patient" is where clinical risk lives.
These are the questions that close that gap.
The Seven Questions
| # | Question | What a good answer looks like |
|---|---|---|
| 1 | What data was it validated on? | A named, external test set, not just the training data |
| 2 | Does that population match mine? | Scanner, demographics, and pathology mix comparable to your clinic |
| 3 | Is the output editable? | You can correct segmentation/measurements before use |
| 4 | Can I see why? | Heatmaps, landmarks, and confidence scores rather than a black box |
| 5 | What is its regulatory status? | CE-marked for your procedure, or clearly Research Use Only |
| 6 | Where does patient data go? | On-device or a documented, compliant data pathway |
| 7 | What happens when it is wrong? | A failure mode you can catch, not a silent error |
1. What data was it validated on?
This is the single most important question. "95% accuracy" means little without knowing the test set. Internal validation (testing on held-out data from the same source) overstates real-world performance. Ask for external validation: performance on data from institutions and scanners the model never saw in training.
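One way to see why a headline accuracy figure needs its test set attached is to put a confidence interval around it. The sketch below uses the Wilson score interval for a proportion; the function name and the case counts are made up for illustration and do not describe any particular tool.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: both test sets report "95% accuracy".
small_set = wilson_interval(correct=38, total=40)
large_set = wilson_interval(correct=950, total=1000)

print(f"40 cases:   {small_set[0]:.3f} to {small_set[1]:.3f}")
print(f"1000 cases: {large_set[0]:.3f} to {large_set[1]:.3f}")
```

With these invented counts, the 40-case interval runs from about 0.835 to 0.986, while the 1000-case interval runs from about 0.935 to 0.962. The interval only reflects sample size; it says nothing about whether the test cases resemble your patients, which is the next question.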
2. Does the validation population match mine?
A model validated on one country's adult knee CT may degrade on paediatric anatomy, a different scanner vendor, or a different disease severity mix. Generalisability is not a given; it is a property you check.
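Checking generalisability usually means breaking a pooled score down by subgroup. The following is a minimal sketch, assuming you have a per-case quality score (here a Dice overlap score for segmentation) and some metadata for each case; the field names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def score_by_subgroup(cases: list[dict], key: str) -> dict[str, tuple[float, int]]:
    """Return {subgroup: (mean score, number of cases)} for the given metadata key."""
    groups: dict[str, list[float]] = defaultdict(list)
    for case in cases:
        groups[case[key]].append(case["dice"])
    return {name: (mean(scores), len(scores)) for name, scores in groups.items()}


# Hypothetical local evaluation cases, for illustration only.
cases = [
    {"scanner": "vendor_A", "age_group": "adult", "dice": 0.94},
    {"scanner": "vendor_A", "age_group": "adult", "dice": 0.92},
    {"scanner": "vendor_A", "age_group": "paediatric", "dice": 0.84},
    {"scanner": "vendor_B", "age_group": "adult", "dice": 0.81},
    {"scanner": "vendor_B", "age_group": "paediatric", "dice": 0.79},
]

for key in ("scanner", "age_group"):
    for name, (avg, n) in score_by_subgroup(cases, key).items():
        print(f"{key}={name}: mean Dice {avg:.2f} over {n} cases")
```

A pooled average can hide a subgroup that performs much worse than the rest, so it is worth asking a vendor for this kind of breakdown alongside the case count in each subgroup.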
3. Is the output editable?
Automated segmentation and measurement will occasionally be wrong. A clinical-grade tool lets you correct it; a demo hands you a fixed result you must accept or discard. Editability is what combines AI speed with human verification.
4. Can you see why it decided?
Explainability (heatmaps, landmark overlays, confidence scores) lets you sanity-check the AI against your own read. A black-box score you cannot interrogate is hard to trust and harder to defend.
5. What is the regulatory status, for your procedure?
CE marking is tied to a device's stated intended purpose: a tool marked for arthroplasty planning is not thereby covered for osteotomy planning. Research Use Only (RUO) tools are legitimate for evaluation and research but should not drive clinical decisions without appropriate local validation. Ask exactly what is covered, and for what.
6. Where does your patient data go?
Cloud tools upload imaging to external servers, which creates data-processing obligations under laws such as KVKK, GDPR, or HIPAA, depending on where you practise. Client-side tools keep data on your device. For an independent surgeon without a governance team, this is often the deciding factor.
7. What happens when it is wrong?
Every model fails sometimes. The question is whether the failure is catchable: a visibly wrong segmentation you can spot, rather than a plausible-but-wrong number you trust. Prefer tools whose errors are obvious over tools whose errors are silent.
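Some silent errors can be made louder with a simple triage rule that routes suspicious outputs to manual review. The sketch below assumes the tool exposes a per-measurement confidence value between 0 and 1; the class, the function, the threshold, and the range passed in are all hypothetical placeholders, not clinical reference values.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    name: str
    value: float
    confidence: float  # model-reported, 0..1 (assumed to be available)


def needs_review(
    m: Measurement,
    plausible_range: tuple[float, float],
    min_confidence: float = 0.8,  # illustrative threshold, not a validated cut-off
) -> list[str]:
    """Return the reasons a measurement should be checked by hand (empty if none)."""
    reasons = []
    low, high = plausible_range
    if not (low <= m.value <= high):
        reasons.append("outside plausible range")
    if m.confidence < min_confidence:
        reasons.append("low confidence")
    return reasons


# Placeholder range chosen by the reviewing surgeon, for illustration only.
example = Measurement(name="example_angle_deg", value=47.0, confidence=0.95)
print(needs_review(example, plausible_range=(0.0, 20.0)))
```

A rule like this only catches gross errors. A value that is wrong but still inside the plausible range passes straight through, which is why the surgeon's own read remains the final check.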
The Underlying Principle
AI in orthopaedics works best as the surgeon's second pair of eyes, not the decision-maker. The tools worth trusting are transparent about what they were validated on, honest about their regulatory status, and designed so a surgeon stays in the loop. The ones that oversell (autonomous claims, no external validation, no editability) are the ones to walk past.
FAQ
What is the most important question to ask about an AI tool? What external data it was validated on. Internal-only validation overstates real-world accuracy.
Is a Research Use Only (RUO) tool safe to use? For evaluation and research, yes. For clinical decisions, only with appropriate local validation; RUO means it is not a cleared medical device.
Why does it matter where patient data is processed? Cloud upload creates KVKK/GDPR/HIPAA data-processing obligations. On-device processing avoids the upload, and with it the obligations that come from sending data to a third party.
Can AI replace the radiologist or surgeon? No. Current evidence and surgeon consensus treat AI as complementary: it accelerates and standardises, while the clinician decides.
The Takeaway
Trust is earned with answers, not accuracy numbers. Run any AI imaging tool through these seven questions before it touches a clinical decision. The good ones welcome the scrutiny.
Disclaimer: This article is for educational and research purposes only. Salnus tools are designated for Research Use Only (RUO) and are not cleared medical devices. Clinical decisions should be made by qualified physicians.
References:
- AI fails to outperform orthopaedic surgeons: a systematic review. J Exp Orthop, 2025. https://esskajournals.onlinelibrary.wiley.com/doi/10.1002/jeo2.70548
- AI and multimodal imaging in orthopaedics: from technological advances to clinical translation. Frontiers in Medicine, 2025. https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1728248/full
Reviewed by the Salnus biomedical engineering team.