Richard Armitage is a GP and Public Health Specialty Registrar, and Honorary Assistant Professor at the University of Nottingham’s Academic Unit of Population and Lifespan Sciences. He is on Twitter: @drricharmitage
Recently I have written a series of articles showcasing the rapid progress made in large language models (LLMs), specifically the latest iterations of the GPT tools developed by OpenAI. The original article – prior to an interview with Hippocrates,1 and an outline of the threats posed to medical education,2 – written in January 2023, detailed the capabilities of the then-current model in the context of general practice. In what now seems like a bygone technological era – a mere four months ago – I concluded that ChatGPT was deeply impressive yet “certainly not ready for any form of use in general practice, the wider health system, or medical ethical enquiry.”3 Today, in May 2023, it appears that the shelf life of that summary has already expired.
At a cost of $20/month, the latest iteration of OpenAI’s LLM – GPT-4 – is available as ChatGPT Plus. This model is substantially more powerful than its predecessor, which was based on the GPT-3.5 architecture with a specific focus on conversational language, and was the model upon which my outdated conclusion was founded.4 Currently, neither ChatGPT (or ChatGPT Plus) itself, nor any application using these tools, is licensed for use within the medical or wider healthcare realm. Accordingly, while the model can generate text on medical information,3 neither clinicians nor their patients should rely on its outputs when making consequential decisions.
Nevertheless, over the last few weeks the ChatGPT Plus interface has been open on my web browser while consulting with patients, and I have experimented with incorporating it into my clinical practice. While at no point did I rely upon it to make patient care decisions, the tool added value to my consultations in various ways. In this article I will use these experiences to describe The Present – how ChatGPT Plus can be used in the GP consultation – and speculate on how incorporating this technology into clinical practice is likely to evolve in The Future.
The Present
Currently, ChatGPT Plus is only available through a web browser (developers can also pay to access the API for incorporation into their products), rather than being integrated into the digital clinical system being used by the GP. My experimenting with the tool during consultations has revealed it to generate value in three distinct ways:
1) Searching for evidence
Performing a literature search on the fly inside a 10-minute appointment is prohibitively challenging. When I do so, I access the various databases that store medical evidence (e.g. PubMed), my academic institution’s online library or, when very pushed for time, Google Scholar. These resources usually produce a long list of results based on the key terms that I selected, with varying degrees of relevance and quality. I then select promising publications and interpret them for the patient ‘live.’ Recently, however, when a patient asked me whether any evidence exists pertaining to the efficacy of hypnotherapy in smoking cessation, I instead turned to ChatGPT Plus. I directed the question at the model interface as if it were a human, and specifically requested the kind of evidence that I would consider most trustworthy, along with their citations:
The previous iteration of ChatGPT – based on GPT-3.5 – was prone to artificial ‘hallucinations’ in which it provided fictitious material with unwavering confidence.5 These hallucinations were interspersed among entirely factual information, and were thus often difficult to detect and potentially misleading. Before upgrading to ChatGPT Plus, searching for evidence with ChatGPT frequently presented a study with a number of substantial errors, such as the wrong DOI number, an inaccurate author list or, most concerningly, a misrepresentative summary of the study’s key findings.
These problems appear to have been essentially corrected by GPT-4. As demonstrated above, the model humbly warns the user that it was only trained on data up to September 2021, so it is unable to consider any studies published beyond that date. Crucially, each of the three studies is real, their citations precise, and the summary provided accurately represents the key findings. In consultation with my patient, I was sure to look up the cited studies and interpret them from their published manuscripts, but I foresee a future in which user confidence in ChatGPT becomes strong enough to trust the model’s summarisation without the need to cross-reference with the original source.
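The cross-referencing step described above can be made a little quicker with a small helper. The following is a minimal sketch, not part of the workflow I used: it assumes the model's answer has been pasted in as plain text, and the function name `extract_doi_links` is purely illustrative. It pulls out anything that looks like a DOI and turns it into a doi.org link that can be opened and checked by hand. A link that resolves shows only that the DOI exists; the author list and the summary of findings still need to be read against the published manuscript.

```python
import re

# Loose pattern for modern DOIs, following the form recommended by Crossref.
# It is a heuristic: it can miss unusual DOIs or catch stray punctuation.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_doi_links(answer_text: str) -> list[str]:
    """Return a doi.org link for each distinct DOI-like string in the text."""
    links: list[str] = []
    for match in DOI_PATTERN.findall(answer_text):
        # Drop sentence punctuation swept up by the pattern. Assumption: the
        # DOI itself does not end in one of these characters.
        doi = match.rstrip(".,;:)")
        link = f"https://doi.org/{doi}"
        if link not in links:
            links.append(link)
    return links


if __name__ == "__main__":
    # Sample text built from reference 5 of this article.
    sample = "Alkaissi and McFarlane, Cureus 2023. DOI: 10.7759/cureus.35179."
    for link in extract_doi_links(sample):
        print(link)
```

Each printed link can then be opened in the browser and compared with what the model claimed about the study.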
Most important to GPs in their short consultations are the time-saving advantages offered by ChatGPT. Asking ChatGPT to summarise the latest evidence is substantially more efficient than searching various medical databases using key terms, filtering out irrelevant and low-quality results, and interpreting strong studies in real-time. Once the model has access to the latest data (such as having full access to the open internet),6 its ability to ‘understand’ questions posed in natural language, and to summarise evidence in the style requested, will mean that literature searching is no longer too time-consuming to inform consultations.
2) Searching for clinical guidance
Clinical guidance is frequently updated, and the documents that contain it are often enormous. Locating the most up-to-date guidance, and the relevant information within it, therefore poses significant challenges for time-pressured GPs. ChatGPT promises to relieve the stress of this frequently-encountered problem in practice. By asking the model a specific question regarding management, and asking it to base its answer upon the most recent iteration of reliable guidance, the GP stands to save substantial time in-consultation. As previously, however, key limitations are the recency of the data upon which ChatGPT has been trained, and the degree to which its answers can be trusted. In my practice, I cross-referenced its answers with the published document on which they were based. With GPT-4, answers are generally highly accurate, although it is inherently unable to include information that was digitally published after September 2021.
The same approach could be taken to finding local patient support groups for particular conditions. I have asked ChatGPT Plus to identify and summarise, for example, support groups for patients living with Parkinson’s Disease in their local area, and to summarise what takes place in such groups. While ChatGPT was prone to hallucinations (providing plausible-sounding details of phantom support groups in amongst real ones), GPT-4 suffers from that affliction much less frequently, whilst also being humbler in the confidence of its pronouncements. The model is, however, limited to data up to September 2021, meaning support groups that have ceased to function may still be included in its answers, while new groups will not feature in them. Again, cross-checking with the source data is therefore mandatory practice for any GP incorporating this tool into their practice.
3) Creating patient information leaflets
Patient information leaflets can empower patients to take ownership of and self-manage their condition at home. However, these leaflets are often hard to locate, and may not be available for the specific condition that a particular patient has. ChatGPT is able to collect information from specified sources that pertain to particular ailments, and summarise it in lay language accessible to patients.
I have produced patient information leaflets using similar GPT prompts and, after checking the sources and reading every word, printed each one out or emailed it to the patient it was personally designed for. By including relevant details about the patient – such as their age, comorbidities, and any recent surgical interventions – ChatGPT can produce an individualised information leaflet that is specific to the patient in front of me.
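To illustrate how such details might be assembled into a prompt, here is a minimal sketch. It is not the wording I used in practice: the `LeafletRequest` fields, the function name and the phrasing of the prompt are all assumptions made for the example. It uses only non-identifying details, and the text it produces would still need to be pasted into the model and the resulting leaflet checked line by line.

```python
from dataclasses import dataclass, field


@dataclass
class LeafletRequest:
    """Non-identifying details used to tailor a leaflet (hypothetical structure)."""
    condition: str
    age: int
    comorbidities: list[str] = field(default_factory=list)
    recent_surgery: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


def build_leaflet_prompt(req: LeafletRequest) -> str:
    """Assemble a plain-text prompt asking for an individualised leaflet."""
    lines = [
        f"Write a patient information leaflet about {req.condition}.",
        "Use lay language that a patient can follow at home.",
        f"The patient is {req.age} years old.",
    ]
    # Optional details are only mentioned when they have been supplied.
    if req.comorbidities:
        lines.append("Comorbidities: " + ", ".join(req.comorbidities) + ".")
    if req.recent_surgery:
        lines.append("Recent surgical interventions: " + ", ".join(req.recent_surgery) + ".")
    if req.sources:
        lines.append("Base the leaflet only on these sources and cite them: "
                     + "; ".join(req.sources) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    request = LeafletRequest(
        condition="plantar fasciitis",
        age=58,
        comorbidities=["type 2 diabetes"],
        sources=["the relevant national guideline"],
    )
    print(build_leaflet_prompt(request))
```

Keeping the source list in the prompt makes the later checking step easier, because the leaflet can be compared against the named documents.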
In summary, GPT-4, which forms the architecture of ChatGPT Plus, is substantially more powerful, and therefore potentially more useful in clinical practice, than its GPT-3.5 predecessor. However, the model is not licensed for use in clinical practice, and therefore must not be relied upon to deliver clinical care. As such, my use of the tool in clinical practice is, without exception, cross-referenced with the original sources, such as the published study or clinical guidelines. Yet, the substantial improvement in competence and reliability between GPT-3.5 and GPT-4 is staggering and, I predict, it is only a matter of time before this tool becomes a fully-integrated (and licensed) feature in contemporary general practice. The following is how I imagine this landscape will unfold.
The Future
I predict that, by June 2025, the latest iteration of GPT will be fully embedded into digital clinical systems used by front line GPs (such as SystmOne and EMIS). This model will have live access to the open internet, and will constitute an enormously powerful tool that the GP can use as a ‘co-pilot’ in their clinical practice. This will offer a vast list of potential benefits to the organisation, planning, and delivery of primary care. Furthermore, and in addition to complete user confidence in the benefits described in The Present, co-pilots will help GPs to care for patients in-consultation in the following three ways:
1) Summarisation of medical records and recent events
For some patients – particularly those with whom the GP is unfamiliar – the amount of work done by the GP before the beginning of the consultation is equal to that which takes place in the consultation itself. A patient with multiple comorbidities and various repeat medications, who has recently consulted with a string of different GPs for the same or different ailments, and had primary care investigations, acute prescriptions, repeat medication changes, outpatient appointments, A&E visits, inpatient stays, and contacts with the ambulance service, 111 service, and Urgent Treatment Centres, requires significant time investment just for the GP to come up to speed with recent events, all before the patient enters the consulting room. However, by being fully embedded into the digital clinical system, and with its ability to ‘understand,’ summarise and present written text in natural language format, GP co-pilots promise to save significant amounts of GP time by presenting a summarised account of the patient’s medical records, including recent events, instantaneously. Rather than navigating around an endless number of screens, tabs and letters, the GP will be brought up to speed simply by reading a brief summary paragraph, which could be prompted to include any outstanding tasks, the patient’s previously-expressed ideas, concerns and expectations, or even overlooked features of their clinical management when compared to relevant clinical guidelines.
2) Live writing of summarised notes
The necessary process of writing notes is both time-consuming and time spent away from the direct provision of compassionate care. In combination with a speech-to-text API, the GP co-pilot will be able to listen to the consultation (producing a transcript), summarise it, and automatically record the summary in the clinical notes as soon as the consultation has concluded. Below is a basic example of ChatGPT Plus’ ability to summarise unformatted transcribed language between two individuals. This is the foundation capability upon which the live writing of summarised notes will be based in GP co-pilots.
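The listen, transcribe and summarise sequence described above can be sketched as a two-step pipeline. This is an illustration of the shape only, under stated assumptions: `draft_note`, `ConsultationNote` and the two stand-in functions are hypothetical, and no real transcription service or language model is called. In a real co-pilot the two callables would be replaced by whichever licensed services the clinical system integrates.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConsultationNote:
    """Keeps the full transcript next to its summary (hypothetical structure)."""
    transcript: str
    summary: str


def draft_note(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    summarise: Callable[[str], str],
) -> ConsultationNote:
    """Run the two steps in order: audio to transcript, transcript to summary."""
    transcript = transcribe(audio)
    summary = summarise(transcript)
    return ConsultationNote(transcript=transcript, summary=summary)


if __name__ == "__main__":
    # Stand-ins for illustration only: the "audio" is already text, and the
    # "summary" is simply the first sentence of the transcript.
    def fake_transcribe(audio: bytes) -> str:
        return audio.decode("utf-8")

    def fake_summarise(text: str) -> str:
        return text.split(".")[0].strip() + "."

    note = draft_note(
        b"Patient reports a cough for two weeks. No fever. Advised to return if worse.",
        fake_transcribe,
        fake_summarise,
    )
    print(note.summary)
```

Storing the transcript alongside the summary is a design choice made for the sketch: it allows the summary to be checked against what was actually said.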
3) Offering differentials and treatment plans
While listening to and ‘understanding’ the on-going consultation, the GP co-pilot will suggest differential diagnoses (which could be rearranged in order of likelihood in real-time as the consultation progresses) and management plans (as informed by the co-pilot’s accessing and understanding of relevant clinical guidelines). The GP is then freely able to follow or to ignore the co-pilot’s suggestions.
In summary, GP co-pilots will serve as enhancing augmentations, rather than complete replacements, of frontline practising GPs. Their crucial benefit will be as time-saving tools, which allow GPs to spend a greater proportion of the consultation doing the thing that even fully actualised artificial general intelligences will be unable to achieve – the development of true, compassionate rapport between two human beings. While these speculations, and the integration of these emergent technologies into medicine and wider healthcare, raise a long list of ethical and legal concerns not limited to privacy, confidentiality, data ownership, and clinical accountability, this will not stop the development and incorporation of these tools into contemporary general practice. This is coming. I predict that, by June 2025, all GPs will be using some form of LLM-powered co-pilot in their day-to-day practice. For me, this is to be embraced – with the necessary wisdom and precautions to ensure a smooth and safe transition. Used (and regulated) correctly, these tools will enhance patient care, improve GP working conditions, and relieve pressures on general practice in a transformational manner. Get ready.
References
- R Armitage. Interviewing Hippocrates: a conversation with the father of Western medicine. BJGP Life 04 May 2023. https://bjgplife.com/interviewing-hippocrates-a-conversation-with-the-father-of-western-medicine/
- R Armitage. ChatGPT: a threat to medical education? BJGP Life 11 May 2023. https://bjgplife.com/chatgpt-a-threat-to-medical-education/
- R Armitage. ChatGPT: what it means for general practice. BJGP Life 02 January 2023. https://bjgplife.com/chatgpt-what-it-means-for-general-practice/
- OpenAI. GPT-4. https://openai.com/product/gpt-4 [accessed 11 May 2023]
- H Alkaissi and SI McFarlane. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus 19 February 2023; 15(2): e35179. DOI: 10.7759/cureus.35179.
- OpenAI. ChatGPT Plugins. https://openai.com/blog/chatgpt-plugins [accessed 11 May 2023]
Featured Photo by Glenn Carstens-Peters on Unsplash