AI detection in GP training – A hidden burden on supervisors

Pedro Elston, Isla Jones, Sam Miles, Jenny Blythe and Safiya Virji are primary care educators specialising in digital education at QMUL and UCL medical schools. Safiya Virji and Jenny Blythe are practising GPs and GP trainers.

GPs work by synthesis. Test results, risk scores, guidelines, patient narratives – each may seem convincing in isolation, but none is enough on its own. Our expertise lies in integrating these signals, weighing them against context, and choosing the most appropriate course of action. In doing so, we model to learners that general practice is about judgement, not certainty.

General practice placements are core learning environments for undergraduate and postgraduate healthcare professionals. GP registrars, in particular, consult, reflect, debrief, and document experiences in their ePortfolios as part of workplace-based assessments, demonstrating professional growth and safe reasoning to their supervisors.¹

But as generative AI tools become more powerful and accessible, concerns about the authenticity of reflective writing are growing. Supervisors are left wondering: Can we recognise AI‑generated text? How do we know whether a reflection is truly a learner’s own thinking?

National training bodies advise supervisors to discuss concerns directly with learners in the first instance. The practical responsibility for doing so does not sit with the RCGP or ARCP panels, but with the local GP supervisor. When a reflection is signed off, it is not a central body that absorbs the uncertainty about its integrity. It is the GP trainer in a busy surgery – running a full clinic, juggling teaching responsibilities, tackling a hundred Docman tasks, and trying to fit in yesterday’s missed home visit. The responsibility to interpret, question, and conclude falls locally.

And that responsibility can feel heavy.

Some institutions are experimenting with AI detection software to flag potentially AI-generated work, while acknowledging that no detection tool is wholly reliable.² In reviewing reflective submissions through such systems, educators have found that the very software learners use to draft and format their work, such as Microsoft Word, can generate markers that appear incriminating.

For example, a reflective entry might include a line beneath a figure stating: “AI generated content may be incorrect”. To a supervisor reviewing flagged work, this might suggest clear evidence of undisclosed AI use.

This phrasing may not originate from the learner at all.

Modern word-processing platforms and accessibility tools frequently embed AI-enabled features – automatic summarisation, predictive editing, image recognition, and language refinement. As documents move between platforms, formatting changes or embedded metadata (e.g. content origin) can become visible. In some cases, automatically generated warnings can appear in submitted work, suggesting deliberate AI use where none was intended.

When detection software flags work, it may trigger an investigation – an experience that can be professionally and emotionally distressing for learners. At the same time, supervisors are left navigating difficult terrain: How do you probe without implying accusation? How much faith should be placed in the detection tool?

Even where technical review later clarifies that automated software functions (rather than deliberate AI use) produced the flagged content, the supervisory dilemma remains real.

Automation bias – the tendency to over-trust algorithmic outputs – is not limited to prescribing prompts or risk calculators.³ It can also shape how we respond to warnings about learners. Act too quickly and we risk damaging trust. Act too cautiously and we may compromise training standards or patient safety.

There are wider lessons here.

AI is woven into everyday software in ways that are often invisible. Supervisors need a basic understanding of how these systems work – not to become experts, but to avoid misinterpretation and misplaced concern.

Educational institutions must also recognise that investigating AI‑related issues requires time, emotional labour, and professional skill, alongside an acceptance that certainty is not always possible.⁴ Expecting local supervisors to navigate this evolving space without clear frameworks risks placing disproportionate stress on the very clinicians responsible for training the next generation.

General practice has always been about synthesising multiple forms of information under uncertainty. AI does not remove that responsibility; it adds another layer – one that is often invisible. Used openly, AI tools can support learning. Used without clarity, they can mask a learner’s true understanding and reflective ability.

So, the question is no longer simply whether learners are using AI. It is whether we are equipping supervisors to interpret AI‑related signals fairly and proportionately. Because if we allow AI to remain invisible, the burden of interpretation doesn’t disappear.

It lands in the consulting room – and on the GP supervisor’s desk.

Deputy editor’s note – see also: https://bjgplife.com/if-ai-can-reflect-what-are-we-assessing/

References

1. RCGP – Workplace Based Assessment (WPBA). Available at: https://www.rcgp.org.uk/mrcgp-exams/wpba [accessed 14/5/26]
2. V. Bellini, F. Semeraro, J. Montomoli, M. Cascella, and E. Bignami, Between human and AI: assessing the reliability of AI text detection tools, Curr. Med. Res. Opin., vol. 40, no. 3, pp. 353–358, Mar. 2024, DOI: 10.1080/03007995.2024.231008
3. C. Lewis, P. Funnell, N. Fisher, and P. Elston, Student Perspectives on the Impact of GenAI, in Institutional guide to using AI for research, X. Zhou and H. Al-Samarraie, Eds, Cham: Springer Nature Switzerland, 2025, pp. 23–41, DOI: 10.1007/978-3-031-94809-1_2
4. J. Q. J. Liu et al., The great detectives: humans versus AI detectors in catching large language model-generated medical writing, Int. J. Educ. Integr., vol. 20, no. 1, p. 8, May 2024, DOI: 10.1007/s40979-024-00155-6

Featured Photo by Nikita Kozlov on Unsplash
