PARIS: Google’s artificial intelligence-powered medical chatbot has achieved a passing grade on a tough US medical licensing exam, but its answers still fall short of those from human doctors, a peer-reviewed study said on Wednesday.

Last year, the release of ChatGPT — whose developer OpenAI is backed by Google’s rival Microsoft — kicked off a race between tech giants in the burgeoning field of AI.

While much has been made of the future possibilities, and dangers, of AI, health is one area where the technology has already shown tangible progress, with algorithms able to read certain medical scans as well as humans.

Google first unveiled its AI tool for answering medical questions, called Med-PaLM, in a preprint study in December. Unlike ChatGPT, it has not been released to the public.

The US tech giant says Med-PaLM is the first large language model, an AI technique trained on vast amounts of human-produced text, to pass the US Medical Licensing Examination (USMLE).

A passing grade for the exam, which is taken by medical students and physicians-in-training in the United States, is around 60 per cent.

In February, a study said that ChatGPT had achieved passing or near-passing results.

In a peer-reviewed study published in the journal Nature on Wednesday, Google researchers said that Med-PaLM had achieved 67.6pc on USMLE-style multiple choice questions.

“Med-PaLM performs encouragingly, but remains inferior to clinicians,” the study said.

To identify and cut down on “hallucinations” — the name for when AI models offer up false information — Google said it had developed a new evaluation benchmark.

Karan Singhal, a Google researcher and lead author of the new study, said that the team had used the benchmark to test a newer version of their model, with “super exciting” results.

Med-PaLM 2 reached 86.5pc on the USMLE exam, topping the previous version by nearly 19 percentage points, according to a preprint study released in May that has not been peer-reviewed.

‘Elephant in the room’

James Davenport, a computer scientist at the UK’s University of Bath not involved in the research, said “there is an elephant in the room” for these AI-powered medical chatbots.

There is a big difference between answering “medical questions and actual medicine,” which includes diagnosing and treating genuine health problems, he said.

Anthony Cohn, an AI expert at the UK’s Leeds University, said that hallucinations would likely always be a problem for such large language models, because of their statistical nature.

Therefore, these models “should always be regarded as assistants rather than the final decision makers,” Cohn said.

Published in Dawn, July 13th, 2023
