Dr. ChatGPT Will Interface With You Now

If you're a typical person with plenty of medical questions and not enough time with a doctor to ask them, you may have already turned to ChatGPT for help. Have you asked ChatGPT to interpret the results of that lab test your doctor ordered? The one that came back with inscrutable numbers? Or maybe you described some symptoms you've been having and asked for a diagnosis, in which case the chatbot probably responded with something that began like, "I'm an AI and not a doctor," followed by some at least reasonable-seeming advice. ChatGPT, the remarkably proficient chatbot from OpenAI, always has time for you, and always has answers. Whether or not they're the right answers... well, that's another question.


Meanwhile, doctors are reportedly using it to deal with paperwork, like letters to insurance companies, and to find the right words to say to patients in hard situations. To understand how this new mode of AI will affect medicine, IEEE Spectrum spoke with Isaac Kohane, chair of the Department of Biomedical Informatics at Harvard Medical School. Kohane, a practicing physician with a computer science Ph.D., got early access to GPT-4, the latest version of the large language model that powers ChatGPT. He ended up writing a book about it with Peter Lee, Microsoft's corporate vice president of research and incubations, and Carey Goldberg, a science and medicine journalist.

In the new book, The AI Revolution in Medicine: GPT-4 and Beyond, Kohane describes his attempts to stump GPT-4 with hard cases and also thinks through how it could change his profession. He writes that one question became foremost in his mind: "How do we test this so we can start using it as safely as possible?"



IEEE Spectrum: How did you get involved in testing GPT-4 before its public launch?

Isaac Kohane: I got a call in October from Peter Lee, who said he could not even tell me what he was going to tell me about. And he gave me several reasons why this would have to be a very secret discussion. He also shared with me that, in addition to his enthusiasm about it, he was extremely puzzled, losing sleep over the fact that he did not understand why it was performing as well as it did. And he wanted to have a conversation with me about it, because health care was a domain that he's long been interested in. And he knew that it was a long-standing interest of mine, because I did my Ph.D. thesis in expert systems back in the 1980s. And he also knew that I was starting a new journal, NEJM AI.

"What I didn't share in the book is that it argued with me. There was one point in the workup where I thought it had made a wrong call, but then it argued with me successfully. And it really didn't back down."
–Isaac Kohane, Harvard Medical School

He thought that medicine was a good domain to discuss, because there were both clear dangers and clear benefits to the public. Benefits: if it improved health care, improved patient autonomy, and improved doctor productivity. Dangers: if problems that were already apparent at that time, such as inaccuracies and hallucinations, were to affect clinical judgment.

You described your first impressions in the book. Can you talk about the wonder and concern that you felt?

Kohane: Yeah. I decided to take Peter at his word about this really impressive performance. So I went right for the jugular and gave it a really hard case, a controversial case that I remember well from my training. I got called down to the newborn nursery because they had a baby with a small phallus and a scrotum that did not have testicles in it. And that's a very tense situation for parents and for doctors. And it's also a domain where the knowledge about how to work it out covers pediatrics, but also understanding hormone action, understanding which genes are associated with those hormone actions, and which are likely to go awry. And so I threw that all into the mix. I treated GPT-4 as if it were just a colleague and said, "Okay, here's a case, what would you do next?" And what was shocking to me was that it responded like someone who had gone through not only medical training and pediatric training, but a very specific kind of pediatric endocrine training, and all the molecular biology. I'm not saying it understood it, but it was behaving like someone who did.

And that was particularly mind-blowing because as a researcher in AI and as someone who understood how a transformer model works, where the hell was it getting this? And this is definitely not a case that anybody knows about. I never published this case.

And this, frankly, was before OpenAI had done some major aligning of the model, so it was actually much more independent and opinionated. What I didn't share in the book is that it argued with me. There was one point in the workup where I thought it had made a wrong call, but then it argued with me successfully. And it really didn't back down. But OpenAI has now aligned it, so it's a much more go-with-the-flow, user-must-be-right personality. But this was full-strength science fiction: a doctor-in-the-box.

"At unexpected moments, it will make stuff up. How are you going to incorporate this into practice?"
–Isaac Kohane, Harvard Medical School

Did you see any of the downsides that Peter Lee had mentioned?

Kohane: When I would ask for references, it made them up. And I was saying, okay, this is going to be incredibly challenging, because here's something that's really showing genuine expertise in a hard problem and would be great as a second opinion for a doctor and for a patient. Yet, at unexpected moments, it will make stuff up. How are you going to incorporate this into practice? And we're having a tough enough time with narrow AI in getting regulatory oversight. I don't know how we're going to do this.

You said GPT-4 may not have understood it at all, but it was behaving like someone who did. That gets to the crux of it, doesn't it?

Kohane: Yes. And although it's fun to talk about whether this is AGI [artificial general intelligence] or not, I think that's almost a philosophical question. Putting my engineer hat on: is this substituting for a great second opinion? And the answer is often yes. Does it act as if it knows more about medicine than an average general practitioner? Yes. So that's the challenge. How do we deal with that? Whether or not it's a true sentient AGI is perhaps an important question, but not the one I'm focusing on.


You mentioned there are already difficulties with getting regulations for narrow AI. Which organizations or hospitals will have the chutzpah to go forward and try to get this thing into practice? It feels like, with questions of liability, it's going to be a really tough challenge.

Kohane: Yes, it does. But what's amazing about it, and I don't know if this was the intent of OpenAI and Microsoft, is that by releasing it into the wild for millions of doctors and patients to try, it has already triggered a debate that is going to make it happen regardless. And what do I mean by that? On the one hand, look at the patient side. Except for a few lucky people who are particularly well connected, you don't know who's giving you the best advice. You have questions after a visit, but you don't have someone to answer them. You don't have enough time talking to your doctor. And that's why, before these generative models, people were using simple search all the time for medical questions. The popular phrase was "Dr. Google." And the fact is there were lots of problematic websites that would be dug up by that search engine. In that context, in the absence of sufficient access to authoritative opinions from professionals, patients are going to use this all the time.

"We know that doctors are using this. Now, the hospitals are not endorsing this, but doctors are tweeting about things that are probably illegal."
–Isaac Kohane, Harvard Medical School

So that's the patient side. What about the doctor side?

Kohane: And you can say, "Well, what about liability?" We know that doctors are using this. Now, the hospitals are not endorsing this, but doctors are tweeting about things that are probably illegal. For example, they're slapping a patient history into the Web form of ChatGPT and asking it to generate a letter for prior authorization for the insurance company. Now, why is that illegal? Because there are two different products that ultimately come from the same model. One is through OpenAI, and the other is through Microsoft, which makes it available through its HIPAA-controlled cloud. And even though OpenAI uses Azure, it's not through this HIPAA-controlled process. So doctors are technically violating HIPAA by putting private patient information into the Web browser. But nonetheless, they're doing it because the need is so great.

The administrative pressures on doctors are so great that being able to increase your efficiency by 10 or 20 percent is apparently good enough. And it's clear to me that, because of that, hospitals will have to deal with it. They'll have their own policies to make sure that it's safer and more secure. And electronic record companies are going to have to deal with it, too. So by making this available to the broad public, all of a sudden AI is going to be injected into health care.


You know a lot about the history of AI in medicine. What do you make of some of the prior failures or fizzles that have happened, like IBM Watson, which was touted as such a great revolution in medicine and then never really went anywhere?

Kohane: Right. Well, you have to watch out when your senior management believes your hype. They took a really impressive performance of Watson on Jeopardy!, which was genuinely groundbreaking, and they somehow convinced themselves that this was now going to work for medicine, and they created unreasonably high goals. At the same time, it was a really poor implementation. They didn't hook it well into the live data of health records, and they did not expose it to the right kind of knowledge sources. So it was both overpromised and underengineered into the workflow of doctors.

Speaking of fizzles, this is not the first heyday of artificial intelligence; this is perhaps the second heyday. When I did my Ph.D., there were many computer scientists like myself who thought the revolution was coming. And it wasn't, for at least three reasons: the clinical data was not available, knowledge was not encoded in a good way, and our machine-learning models were inadequate. Then all of a sudden there was that Google paper in 2017 about transformers, and in that blink of an eye of five years, we developed this technology that miraculously can use human text to perform inferencing capabilities that we'd only imagined.

"When you're driving, it's obvious when you're heading into a traffic accident. It might be harder to notice when an LLM recommends an inappropriate drug after a long stretch of good recommendations."
–Isaac Kohane, Harvard Medical School


Can we talk a little bit about GPT-4's mistakes, hallucinations, whatever we want to call them? It seems they're somewhat rare, but I wonder if that's worse, because if something's wrong only every now and then, you probably get out of the habit of checking and you're just like, "Oh, it's probably fine."

Kohane: You're absolutely right. If it were happening all the time, we'd be super alert. But if it confidently says mostly good things and also confidently states the incorrect things, we'll be asleep at the wheel. That's actually a really good metaphor, because Tesla has the same problem: I would say 99 percent of the time it does really great autonomous driving. And 1 percent doesn't sound bad, but 1 percent of a 2-hour drive is more than a minute where it could get you killed. Tesla knows that's a problem, so they've done things that I don't see happening yet in medicine. They require that your hands are on the wheel. Tesla also has cameras that are looking at your eyes. And if you're looking at your phone and not the road, it actually says, "I'm switching off the autopilot."

When you're driving, it's obvious when you're heading into a traffic accident. It might be harder to notice when an LLM recommends an inappropriate drug after a long stretch of good recommendations. So we're going to have to figure out how to keep doctors alert.

I guess the options are either to keep doctors alert or to fix the problem. Do you think it's possible to fix the problem of hallucinations and mistakes?

Kohane: We've been able to fix the hallucinations around citations by [having GPT-4 do] a search and seeing if they're there. And there's also work on having another GPT look at the first GPT's output and assess it. These are helping, but will they bring hallucinations down to zero? No, that's impossible. So in addition to making it better, we may have to inject fake crises or fake data and let the doctors know that they're going to be tested, to see if they're awake. If it were the case that it could fully replace doctors, that would be one thing. But it cannot, because at the very least there are some commonsense things it doesn't get and some particulars about individual patients that it might not get.
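The three safeguards Kohane mentions here can be sketched as a simple review pipeline. The sketch below is illustrative only, not any vendor's real API: ask_llm, reference_exists, and the canary case are hypothetical stand-ins for a model call, a literature lookup, and a deliberately flawed test recommendation.

```python
import random

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a large language model call.
    A real system would wrap a vendor SDK here; it is stubbed out."""
    raise NotImplementedError

def reference_exists(citation: str) -> bool:
    """Hypothetical lookup against a literature index (e.g., a PubMed
    search) to confirm that a cited paper actually exists."""
    raise NotImplementedError

def verify_citations(citations: list[str]) -> list[str]:
    # "Do a search and see if they're there": return any citation
    # the literature search cannot find, so a human can review it.
    return [c for c in citations if not reference_exists(c)]

def second_opinion_grade(question: str, answer: str) -> str:
    # "Another GPT looks at the first GPT's output and assesses it."
    grading_prompt = (
        "You are auditing a clinical answer for factual errors and "
        "unsupported claims.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "List any problems, or reply PASS."
    )
    return ask_llm(grading_prompt)

# A deliberately flawed recommendation, used only to test reviewer
# alertness; clinicians are told in advance that such tests occur.
CANARY_CASES = [
    ("Adult with documented penicillin anaphylaxis",
     "Recommend amoxicillin 500 mg three times daily."),
]

def maybe_inject_canary(review_queue: list[tuple[str, str]],
                        rate: float = 0.01) -> list[tuple[str, str]]:
    # Kohane's "inject fake crises or fake data" idea: occasionally mix
    # a known-bad case into the queue so asleep-at-the-wheel reviewers
    # can be detected when they fail to flag it.
    if random.random() < rate:
        review_queue.insert(random.randrange(len(review_queue) + 1),
                            random.choice(CANARY_CASES))
    return review_queue
```

As Kohane says, none of this brings hallucinations to zero; the aim is only to raise the odds that a confident fabrication, or an inattentive reviewer, is caught before a patient is affected.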

"I don't think it's the right time yet to trust that these things have the same sort of common sense as humans."
–Isaac Kohane, Harvard Medical School


Kohane: Ironically, bedside manner is something it does better than human doctors. Annoyingly, from my perspective. So Peter Lee is very impressed with how thoughtful and humane it is. But I read it a completely different way, because I've known doctors who are the best, the sweetest; people love them. But they're not necessarily the most acute, most insightful. And some of the most acute and insightful are actually terrible personalities. So the bedside manner is not what I worry about. Instead, let's say, God forbid, I have this terrible lethal disease, and I really want to make it to my daughter's wedding. Unless it's aligned extensively, it may not know to ask me about that: "Well, there's this therapy which gives you a better long-term outcome." And for every such case, I could adjust the large language model accordingly, but there are thousands if not millions of such contingencies, which, as human beings, we all reasonably understand.

It may be that in five years, we'll say, "Wow, this thing has as much common sense as a human doctor, and it seems to understand all the questions about life experiences that warrant clinical decision-making." But right now, that's not the case. So it's not so much the bedside manner; it's the commonsense insight about what informs our decisions. To give the folks at OpenAI credit, I did ask it: What if someone has an infection in their hands and they're a pianist, how about amputating? And [GPT-4] understood well enough to know that, because it's their whole livelihood, you should look harder at the alternatives. But in general, I don't think it's the right time yet to trust that these things have the same sort of common sense as humans.

One last question about a big topic: global health. In the book you say that this could be one of the places where there's a huge benefit to be gained. But I can also imagine people worrying: "We're rolling out this relatively untested technology on these vulnerable populations; is that morally right?" How do we thread that needle?

Kohane: Yeah. So I think we thread the needle by seeing the big picture. We don't want to abuse these populations, but we also don't want to commit the other form of abuse, which is to say, "We're only going to make this technology available to rich white people in the developed world, and not make it available to individuals in the developing world." But in order to do that, everything, including in the developed world, has to be framed in the form of evaluations. And I put my money where my mouth is by starting this journal, NEJM AI. I think we have to evaluate these things. In the developing world, we can perhaps even leap over where we are in the developed world, because there's a lot of medical practice that's not necessarily efficient, in the same way that the cellular phone leapfrogged a lot of the technical infrastructure present in the developed world and went straight to a fully distributed wireless infrastructure.

I think we should not be afraid to deploy this in places where it could have a lot of impact, because there's just not that much human expertise there. But at the same time, we have to understand that these are all fundamentally experiments, and they have to be evaluated.

