Keeping up with business speak can be challenging.
Dogfooding, for instance, refers to “the use of a newly developed product or service by a company’s staff to test it before it is made available to customers.” Trendjacking, on the other hand, is what businesses do when they jump on trending topics, hashtags, events, and memes.
The AI world has its own cryptic terminology: large language models (LLMs) and generative pre-trained transformers (GPTs) are at the top of the list at the moment. A recent commentary in Nature explains that an LLM is “a machine-learning system that autonomously learns from data and can produce sophisticated and seemingly intelligent writing after training on a massive dataset of text.” Similarly, a GPT is a language model that taps into a massive text dataset to generate what reads like human-written text.
Articles about the promise and peril associated with these new digital tools are everywhere. On the one hand, they hold promise for medical researchers and clinicians because they may be capable of analyzing a massive collection of published papers on a specific subject and providing a quick summary of the most relevant information. On the other hand, because these tools indiscriminately gather information from the internet and social media, they are subject to the same prejudices, misinformation, and nonsense that fill many of these resources. This two-edged sword raises the question: How do we create a framework for the ethical use of LLMs?
“Not a major leap”
Stefan Harrer, with the Digital Health Cooperative Research Centre in Melbourne, Australia, addressed this challenge in a recent Lancet eBioMedicine paper. He points out that these models are not a major leap in the ability of computers to analyze data logically. Many GPT enthusiasts are under the impression that these tools bring AI a step closer to duplicating human intelligence and all the complex cognitive processes it entails. In fact, LLMs represent “an illusion of intelligence.” In other words: “As sophisticated as LLM-powered chatbot responses might look, they represent nothing more than the model’s extensive statistical knowledge of which words have preceded others in text that it has previously seen. They comprehend none of the language they deal with, neither the prompts they are being fed nor their responses.”
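Harrer’s point about “statistical knowledge of which words have preceded others” can be made concrete with a deliberately tiny sketch. The toy bigram model below (the corpus, function names, and all details are illustrative, not any real LLM’s implementation) predicts the next word purely by counting which word most often followed the current one in its training text; real LLMs use vastly larger models and data, but the underlying idea of pattern continuation, without comprehension, is the same:

```python
from collections import Counter, defaultdict

# Toy training corpus; a real LLM trains on billions of word sequences.
corpus = ("the patient has a fever the patient has a cough "
          "the doctor sees the patient").split()

# Tally which word follows each word (a "bigram" count).
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word, or None if unseen."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))    # "patient" follows "the" most often here
print(predict_next("zebra"))  # None: the model knows nothing outside its data
```

Nothing in this program “understands” patients or fevers; it simply reproduces the most frequent continuation it has seen, which is the illusion-of-intelligence point in miniature.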
In addition, while LLMs can occasionally produce a truthful, relevant — even creative — report on a specific topic, they are equally capable of generating a document that directly opposes the sound reasoning and analysis put forth in the previous report.
How can LLMs’ “hallucinations” be addressed? Harrer proposes a regulatory framework that ensures AI systems are designed to augment the human decision-making process, not replace it.
They should also “regularly produce easily accessible and quantifiable performance, usage, and impact metrics explaining when and how AI is used to assist decision-making and allowing to detect potential bias.” Equally important, any AI system or document that relies on an LLM should clearly disclose that its content was created with these tools.
Similar concerns have prompted several medical journals to take a firm position on the use of LLMs in research and publishing. JAMA, for instance, recently revised its guidelines for authors, stating: “Nonhuman artificial intelligence, language models, machine learning, or similar technologies do not qualify for authorship. If these models or tools are used to create content or assist with writing or manuscript preparation, authors must take responsibility for the integrity of the content generated by these tools.”
LLMs may not be the apocalyptic nightmare some critics fear, but they certainly require due diligence by thought leaders, health care professionals, and the general public.
This piece, written by John Halamka, MD, president, and Paul Cerrato, senior research analyst and communications specialist at Mayo Clinic Platform, was originally posted to their blog, Digital Health Frontier.