It’s one of the world’s worst-kept secrets that large language models give blatantly incorrect answers to queries, and do so with a confidence that makes them indistinguishable from correct answers. There are a number of reasons for this: the AI may have been trained on false information; the answer may have required inferences from facts that the LLM isn’t capable of making; or some aspect of the LLM’s training may have encouraged falsehoods.
But perhaps the simplest explanation is that an LLM doesn’t recognize what constitutes a correct answer yet is compelled to provide one, so it simply makes something up, a behavior known as confabulation.
Given how people have come to rely on LLMs for everything from college essays to job searches, there’s clearly a lot of value in being able to tell if they’re making things up. Now, researchers at Oxford University say they’ve found a relatively simple way to tell if an LLM seems to be confabulating — a method that applies to all general models and a wide range of subjects — and, in the process, they’ve also found evidence that most of the alternative facts that LLMs provide are the product of confabulation.
Catching confabulation
The new research is about confabulation alone, not about cases like training on faulty inputs. As the Oxford team defines it in the paper describing their work, confabulation occurs when “LLMs make fluent claims that are both false and arbitrary, meaning that the answer depends on irrelevant details such as the random seed.”
The theory behind their work is actually quite simple: LLMs aren’t trained for accuracy; they’re simply trained on massive quantities of text, through which they learn to generate human-sounding phrasing. If enough examples in that training text consistently present something as a fact, then the LLM is likely to present it as a fact. But if the examples are few, or inconsistent in their facts, the LLM will synthesize a plausible-sounding answer that is likely to be incorrect.
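To make that concrete, here is a minimal sketch in Python of what answer generation looks like in practice. It uses GPT-2 through the Hugging Face transformers library purely as a small, easy-to-run stand-in model (it has nothing to do with the study itself) and simply draws several sampled completions for the same prompt.

```python
# Minimal sketch: an LLM samples from a learned next-token distribution, so
# prompts backed by thin or inconsistent training data can produce several
# different but equally fluent answers.
# (GPT-2 is used here only as a small stand-in model.)
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The Eiffel Tower is located in"
samples = generator(
    prompt,
    do_sample=True,           # sample instead of always taking the most likely token
    temperature=1.0,
    max_new_tokens=10,
    num_return_sequences=5,   # draw several answers to see how much they vary
)

for s in samples:
    print(s["generated_text"])
```

With a well-attested fact like this one, the sampled completions tend to agree; for a prompt the model has little consistent evidence about, the same sampling process will happily produce divergent, made-up specifics.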
However, LLMs can also find themselves in a similar situation when there are multiple options for phrasing the correct answer. To use an example from the researchers’ paper, “Paris”, “It is in Paris”, and “Paris is the capital of France” are all valid answers to “Where is the Eiffel Tower?”. Thus, statistical uncertainty, termed entropy in this context, can arise either when the LLM isn’t certain how to phrase the right answer or when it can’t identify the right answer at all.
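To see why this matters for measuring uncertainty, here is a toy calculation in Python with made-up samples (not data from the paper): if you compute entropy over raw answer strings, every distinct phrasing of “Paris” counts as a different answer, so the measured uncertainty looks high even though the model has effectively settled on a single fact.

```python
# Toy example: naive entropy over answer strings treats each phrasing of the
# same fact as a separate outcome, inflating the measured uncertainty.
import math
from collections import Counter

answers = [
    "Paris",
    "Paris",
    "It is in Paris",
    "Paris is the capital of France",
    "Rome",                      # one genuinely wrong sample
]

counts = Counter(answers)
total = len(answers)

# Shannon entropy over the distinct strings
naive_entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
print(f"naive entropy over strings: {naive_entropy:.2f} nats")
# Comes out high, even though four of the five samples agree the answer is Paris.
```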
This means that it’s not a good idea to simply force the LLM to respond with “I don’t know” whenever it is weighing several roughly equivalent answers, as doing so would block a lot of correct responses.
So the researchers instead look at something called semantic entropy. This evaluates all of the statistically likely answers produced by the LLM and determines how many of them are semantically equivalent. If a large number all mean the same thing, then the LLM is likely uncertain about phrasing but has the right answer. If not, it’s presumably in a situation where it’s prone to confabulation and should be prevented from answering.
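Here is a rough sketch of that idea in Python, continuing the toy samples from above. The important caveat is that the `same_meaning` check below is a crude placeholder for illustration only; the actual study decides whether two answers are equivalent with a model-based entailment test, which isn’t reproduced here.

```python
# Sketch of semantic entropy: group sampled answers that mean the same thing,
# then compute entropy over the groups rather than over the raw strings.
import math

def same_meaning(a: str, b: str) -> bool:
    # Crude placeholder for a semantic-equivalence test (the study uses an
    # entailment check); here we only ask whether both answers mention Paris.
    return ("paris" in a.lower()) == ("paris" in b.lower())

def semantic_entropy(answers):
    # Greedily cluster answers by meaning.
    clusters = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    # Shannon entropy over cluster probabilities.
    total = len(answers)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)

answers = [
    "Paris",
    "Paris",
    "It is in Paris",
    "Paris is the capital of France",
    "Rome",
]
print(f"semantic entropy: {semantic_entropy(answers):.2f} nats")
# Much lower than the string-level entropy above: four of the five samples
# collapse into one "Paris" cluster, so the model is effectively confident.
```

A high value here would be the warning sign: lots of sampled answers that don’t collapse into a shared meaning is exactly the pattern the researchers associate with confabulation.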