AI detection in GP training – A hidden burden on supervisors

Pedro Elston, Isla Jones, Sam Miles, Jenny Blythe and Safiya Virji are primary care educators specialising in digital education at QMUL and UCL medical schools. Safiya Virji and Jenny Blythe are practising GPs and GP trainers.

GPs work by synthesis. Test results, risk scores, guidelines, patient narratives – each may seem convincing in isolation, but none are enough on their own. Our expertise lies in integrating these signals, weighing them against context, and choosing the most appropriate course of action. In doing so, we model to learners that general practice is about judgement, not certainty.

General practice placements are core learning environments for undergraduate and postgraduate healthcare professionals. GP registrars, in particular, consult, reflect, debrief, and document experiences in their ePortfolios as part of workplace-based assessments, demonstrating professional growth and safe reasoning to their supervisors [1].

But as generative AI tools become more powerful and accessible, concerns about the authenticity of reflective writing are growing. Supervisors are left wondering: Can we recognise AI‑generated text? How do we know whether a reflection is truly a learner’s own thinking?

National training bodies advise supervisors to discuss concerns directly with learners in the first instance. The practical responsibility for doing so does not sit with the RCGP or ARCP panels, but with the local GP supervisor. When a reflection is signed off, it is not a central body that absorbs the uncertainty about its integrity. It is the GP trainer in a busy surgery – running a full clinic, juggling teaching responsibilities, tackling a hundred Docman tasks, and trying to fit in yesterday’s missed home visit. The responsibility to interpret, question, and conclude falls locally.

And that responsibility can feel heavy.

Some institutions are experimenting with AI detection software to flag potentially AI-generated work, while acknowledging that no detection tool is wholly reliable [2]. In reviewing reflective submissions through such systems, educators have found that the very software learners use to draft and format their work, such as Microsoft Word, can generate markers that appear incriminating.

For example, a reflective entry might include a line beneath a figure stating: “AI generated content may be incorrect”. To a supervisor reviewing flagged work, this might suggest clear evidence of undisclosed AI use.

This phrasing may not originate from the learner at all.

Modern word-processing platforms and accessibility tools frequently embed AI-enabled features: automatic summarisation, predictive editing, image recognition, and language refinement. As documents move between platforms, formatting changes or embedded metadata (e.g. content origin) can become visible. In some cases, automatically generated warnings can surface in submitted work, suggesting deliberate AI use where none was intended.

When detection software flags work, it may trigger an investigation – an experience that can be professionally and emotionally distressing for learners. At the same time, supervisors are left navigating difficult terrain: How do you probe without implying accusation? How much faith should be placed in the detection tool?

Even where technical review later clarifies that automated software functions (rather than deliberate AI use) produced the flagged content, the supervisory dilemma remains real.

Automation bias – the tendency to over-trust algorithmic outputs – is not limited to prescribing prompts or risk calculators [3]. It can also shape how we respond to warnings about learners. Act too quickly and we risk damaging trust. Act too cautiously and we may compromise training standards or patient safety.

There are wider lessons here.

AI is woven into everyday software in ways that are often invisible. Supervisors need a basic understanding of how these systems work – not to become experts, but to avoid misinterpretation and misplaced concern.

Educational institutions must also recognise that investigating AI‑related issues requires time, emotional labour, and professional skill, alongside an acceptance that certainty is not always possible [4]. Expecting local supervisors to navigate this evolving space without clear frameworks risks placing disproportionate stress on the very clinicians responsible for training the next generation.

General practice has always been about synthesising multiple forms of information under uncertainty. AI does not remove that responsibility; it adds another layer – one that is often invisible. Used openly, AI tools can support learning. Used without clarity, they can mask a learner’s true understanding and reflective ability.

So, the question is no longer simply whether learners are using AI. It is whether we are equipping supervisors to interpret AI‑related signals fairly and proportionately. Because if we allow AI to remain invisible, the burden of interpretation doesn’t disappear.

It lands in the consulting room – and on the GP supervisor’s desk.

Deputy editor’s note – see also: https://bjgplife.com/if-ai-can-reflect-what-are-we-assessing/

References

  1. RCGP – Workplace Based Assessment (WPBA). Available at: https://www.rcgp.org.uk/mrcgp-exams/wpba [accessed 14/5/26]
  2. V. Bellini, F. Semeraro, J. Montomoli, M. Cascella, and E. Bignami, “Between human and AI: assessing the reliability of AI text detection tools,” Curr. Med. Res. Opin., vol. 40, no. 3, pp. 353–358, Mar. 2024, doi: 10.1080/03007995.2024.231008.
  3. C. Lewis, P. Funnell, N. Fisher, and P. Elston, “Student Perspectives on the Impact of GenAI,” in Institutional guide to using AI for research, X. Zhou and H. Al-Samarraie, Eds. Cham: Springer Nature Switzerland, 2025, pp. 23–41, doi: 10.1007/978-3-031-94809-1_2.
  4. J. Q. J. Liu et al., “The great detectives: humans versus AI detectors in catching large language model-generated medical writing,” Int. J. Educ. Integr., vol. 20, no. 1, p. 8, May 2024, doi: 10.1007/s40979-024-00155-6.

Featured Photo by Nikita Kozlov on Unsplash
