Richard Armitage is a GP and Public Health Specialty Registrar, and Honorary Assistant Professor at the University of Nottingham’s Academic Unit of Population and Lifespan Sciences. He is on Twitter: @drricharmitage
GPT-3 is a large language model developed by OpenAI that uses deep learning techniques to generate human-like text. It was created by training a neural network of some 175 billion parameters on a vast corpus of textual data, within which it identified patterns and structures of human language.1 GPT-3.5 is an extension of GPT-3, and ChatGPT is a model based on the GPT-3.5 architecture with a specific focus on conversational language. It was trained on a large dataset of conversational text, including digital chat logs and social media dialogue, which allowed it to learn the nuances of natural language conversation including slang, colloquialisms, and other informal language.2 Its ability to ‘understand’ and generate human-like language has made it a valuable tool for businesses and organisations intending to enhance customer service through communication via chatbots and virtual assistants. ChatGPT – which harnesses the power of GPT-3.5 – is available to the public free of charge. The next iteration of the model – GPT-4 – is available on ChatGPT Plus (at a cost of $20/month), and is substantially more powerful than its predecessor.3
Neither ChatGPT (nor ChatGPT Plus) itself, nor any application using the tool, is currently licensed for use within the medical or wider healthcare realm in the UK or US. Nevertheless, the unrestricted access to the model allows students – including students of medicine and its adjacent subjects (such as medical ethics and healthcare law) – to use the tool to generate text that is relevant to their subject of study. By interacting with the model via textual prompts akin to natural conversation between two humans, students can perform literature searches (for example, “Summarise the latest evidence pertaining to the efficacy of hypnotherapy in smoking cessation”), outline the structure of their assignments (for example, “Provide a framework for an assignment that discusses bioethical issues that are commonly encountered in contemporary general practice”), and sketch the shape of potential research articles (for example, “Outline a research paper that investigates the effects of video consultations in a rural general practice setting”). As such, ChatGPT could empower students to present their ideas in a clear and organised fashion, and thereby allow the focus of the educational process to centre on developing critical thinking skills and addressing pertinent ‘big questions’.4 However, the tool simultaneously poses substantial threats to the educational attainment of contemporary students and, in the case of medical students, consequently poses a significant threat to quality of care, patient safety, and optimal health outcomes.
It is possible for the textual outputs generated by ChatGPT to be lifted (via copy and paste commands) out of the user interface that presents them, and subsequently embedded into a live word processing document currently in use by a student for the purpose of, for example, writing an assignment. Since the output is both devoid of a digital ‘watermark’ pertaining to its origin, and determined by the specific content of the prompt used to generate it, all such output is unique and, therefore, its source is inherently undetectable to third parties (it is also able to evade recognition by the anti-plagiarism software widely used by educational institutions). Students are therefore able to submit work for assessment that partly or wholly consists of ChatGPT output without its origin being exposed, meaning work submitted by students can no longer be reliably considered to reflect the extent to which they understand its content.
While the model empowers students to outline the structure of assignments, ChatGPT can also be prompted to bear a larger share of, or even the entire, student workload, including writing an entire first draft of an assignment (for example, “Write a 1500-word essay arguing that fibromyalgia ought to be managed entirely in general practice”), altering its style (for example, “Use shorter sentences,” “Make the text more formal and ‘academic,’” or even “Write it in the style of [a specified author]”), or incorporating a variety of relevant citations (for example, “Only include high quality, peer reviewed, systematic reviews and meta-analyses”). Using the tool in such a manner denies students the vital learning opportunities of performing literature searches, formulating arguments, and expressing ideas through the written word. Alarmingly, the tendency for ChatGPT to ‘hallucinate’ – the phenomenon in which the tool presents factually incorrect statements (including fabricated academic citations) with unfaltering confidence5 – puts the uncritical student at risk of forming untrue beliefs (and of submitting them for academic assessment). Finally, the efficiency with which the model can be used to complete work on students’ behalf risks students being denied the opportunity for academic exploration, the cultivation of their intellectual curiosity, and the recognition and development of scholarly interests that can only be curated by time dedicated to study.
Accordingly, ChatGPT threatens to significantly harm the educational attainment, as well as the intellectual life, of students of medicine and the subjects that complement it. This poses a serious threat to the ability of such students to deliver safe and effective care once they graduate to clinical practice. As such, institutions that provide medical education must react to the existence, availability, and rapidly increasing competency of GPT models. A suggested intervention is for such institutions to reintroduce in-person written examinations – ideally handwritten, or at least sat without access to ChatGPT – in order to more accurately assess the extent to which students understand the work they submit.
1. TB Brown, B Mann, N Ryder, et al. Language models are few-shot learners. NIPS’20: Proceedings of the 34th International Conference on Neural Information Processing Systems. December 2020; 159: 1877–1901. Available at https://dl.acm.org/doi/abs/10.5555/3495724.3495883
2. OpenAI. GPT-3.5. 2021. Available at https://openai.com/blog/gpt-3-5/ [accessed 01 May 2023]
3. OpenAI. GPT-4. 2023. Available at https://openai.com/product/gpt-4 [accessed 01 May 2023]
4. University of Cambridge. ChatGPT (We need to talk). 05 April 2023. Available at https://www.cam.ac.uk/stories/ChatGPT-and-education# [accessed 01 May 2023]
5. H Alkaissi and SI McFarlane. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus 19 February 2023; 15(2): e35179. DOI: 10.7759/cureus.35179.