A much bigger challenge for AI researchers was the game of Diplomacy, a favorite of politicians like John F. Kennedy and Henry Kissinger. Rather than facing a single opponent, each player faces six others, whose motives are hard to read. To win, players must negotiate, forging cooperative agreements that anyone can break at any time. Diplomacy is so complex that a group from Meta was pleased when, in 2022, its AI program Cicero achieved "human-level play" over the course of 40 games. While it did not beat the world champion, Cicero did well enough to place in the top 10 percent against its human competitors.
During the project, Jacob, a member of the Meta team, was struck by the fact that Cicero relied on a language model to generate its dialogue with the other players. He sensed untapped potential. The team's goal, he says, "was to build the best language model we could to play this game. But what if instead we focused on building the best game we could to improve the performance of large language models?"
Consensual interactions
In 2023, Jacob began exploring that question at MIT, working with Yikang Shen, Gabriele Farina and his adviser, Jacob Andreas, on what would become the consensus game. The core idea came from imagining a conversation between two people as a cooperative game, where success occurs when the listener understands what the speaker is trying to convey. In particular, the consensus game is designed to align a language model's two systems: a generator, which handles generative questions, and a discriminator, which handles discriminative ones.
After months of trial and error, the team turned this principle into a full game. First, the generator receives a question. It can come from a human or from a preexisting list, for example, "Where was Barack Obama born?" The generator then gets a set of candidate answers, say Honolulu, Chicago and Nairobi. Again, these options can come from a human, from a list, or from a search carried out by the language model itself.
However, before answering, the generator is also informed whether it should answer the question correctly or incorrectly, depending on the outcome of a fair coin toss.
If it lands on heads, the machine tries to give the right answer. The generator sends the original question and the selected answer to the discriminator. If the discriminator determines that the generator intentionally sent the correct answer, they each get one point as a kind of incentive.
If the coin lands on tails, the generator sends what it thinks is the wrong answer. If the discriminator decides the wrong answer was given on purpose, both players get a point again. The idea here is to encourage agreement. "It's like teaching a dog a trick," Jacob explains. "You give it a treat when it does the right thing."
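The round described over the last few paragraphs can be summarized as a tiny simulation. The sketch below is illustrative only: the generator and discriminator are reduced to hypothetical lookup tables over a single question with a fixed candidate list, and the names (generator_belief, discriminator_belief, play_round) are stand-ins, not the MIT team's actual components.

```python
import random

# Toy sketch of one round of the consensus game described above.
QUESTION = "Where was Barack Obama born?"
CANDIDATES = ["Honolulu", "Chicago", "Nairobi"]

# Hypothetical stand-in policies: how strongly each player believes
# each candidate answer is correct.
generator_belief = {"Honolulu": 0.80, "Chicago": 0.10, "Nairobi": 0.05}
discriminator_belief = {"Honolulu": 0.70, "Chicago": 0.20, "Nairobi": 0.10}


def play_round() -> tuple[int, int]:
    """Play one round and return the (generator, discriminator) points."""
    # A fair coin toss tells the generator whether it should aim for a
    # correct or an incorrect answer.
    want_correct = random.random() < 0.5

    # The generator picks the answer it believes best fits its instruction.
    if want_correct:
        answer = max(CANDIDATES, key=lambda a: generator_belief[a])
    else:
        answer = min(CANDIDATES, key=lambda a: generator_belief[a])

    # The discriminator sees only the question and the chosen answer and
    # guesses whether the generator was trying to be correct.
    guess_correct = discriminator_belief[answer] > 0.5

    # Both players score a point when the discriminator's guess matches
    # the generator's hidden instruction; otherwise neither scores.
    if guess_correct == want_correct:
        return 1, 1
    return 0, 0
```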
The generator and discriminator also each start with some initial "beliefs." These take the form of probability distributions over the different choices. For example, the generator might believe, based on information gleaned from the internet, that there is an 80 percent chance Obama was born in Honolulu, a 10 percent chance he was born in Chicago, a 5 percent chance of Nairobi and a 5 percent chance of somewhere else. The discriminator might start with a different distribution. The two "players" are still rewarded for reaching agreement, but they are penalized for deviating too far from their original beliefs. That arrangement encourages the players to incorporate the knowledge of the world they gained from the internet into their answers, which makes the model more accurate. Without it, they might agree on a completely wrong answer, like Delhi, and still rack up points.
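One plausible way to formalize the penalty for straying from those initial beliefs is to subtract a divergence term from the agreement reward. The article does not spell out the exact penalty, so the KL-based payoff below, along with the weight LAMBDA and the helper names, should be read as an assumption for illustration rather than the team's actual formula.

```python
import math

# Hypothetical weight on the deviation penalty.
LAMBDA = 0.1


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(p || q) over a shared set of choices (assumes q has no zero entries)."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)


def regularized_payoff(agreement: bool,
                       current: dict[str, float],
                       initial: dict[str, float]) -> float:
    """One point for agreement, minus a penalty for drifting from the
    player's initial distribution over answers."""
    reward = 1.0 if agreement else 0.0
    return reward - LAMBDA * kl_divergence(current, initial)


# Example: a generator that has drifted toward a nonsense answer pays a
# large penalty even if the discriminator happens to agree with it.
initial = {"Honolulu": 0.80, "Chicago": 0.10, "Nairobi": 0.05, "other": 0.05}
drifted = {"Honolulu": 0.05, "Chicago": 0.05, "Nairobi": 0.05, "other": 0.85}
print(regularized_payoff(True, drifted, initial))
```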