IBM Research: Knowing what it knows: selected nuances of Watson's strategy

Watson learns by gathering information, but instead of neural connections, it uses algorithms to understand the natural language that information is written in. For a given Jeopardy! category and clue, these algorithms produce a confidence score for each possible response, and that score maps to a probabilistic estimate that the response is correct.

Watson honed this self-assessment of what it does and does not know by training on thousands of historical questions (Watson’s equivalent of taking a few hundred “practice tests”).

The algorithms dealing with natural language are not perfect, so there’s always some degree of uncertainty. Watson calculates its uncertainty and learns which algorithms to trust under which circumstances, such as different Jeopardy! categories.

IBM Researcher Dr. David Gondek, who developed machine learning algorithms and infrastructure that Watson uses to rank and estimate confidence in possible answers, uses the words “introducing” and “manufacturing” to show that language has many ways to refer to the same relation and can be highly contextual:

“The clue was: It was introduced by the Coca-Cola Company in 1963. Watson can find a passage stating that ‘Coca-Cola first manufactured Tab (the correct response) in 1963’, so in order to answer the question, Watson needed to understand that introducing and manufacturing can be equivalent – if a company is introducing a product. But that is highly dependent on context: if you introduce your uncle, it doesn't mean you manufactured him.”

Note that because Watson cannot hear, it does not know how Jennings or Rutter answers a clue. Watson therefore cannot use their responses in its accuracy assessment or to change a response it may be considering.

The confidence to buzz in

Those who watched the practice round could see a graphic of Watson’s confidence level in its top three possible responses, and a line that established a threshold it must reach to buzz in.

Watson’s default threshold is typically 50 percent. In other words, if its confidence estimate indicates a 50 percent or higher chance of responding correctly to a clue, it will try to buzz in.
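The buzz-in rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Watson's actual code: the function name, the dictionary of candidate responses, and the confidence values in the usage example are all hypothetical. Only the 50 percent default comes from the article.

```python
# Default taken from the article ("typically 50 percent").
DEFAULT_BUZZ_THRESHOLD = 0.50


def should_buzz(candidates, threshold=DEFAULT_BUZZ_THRESHOLD):
    """Return (response, confidence) for the top candidate if it clears
    the threshold, otherwise None (meaning: do not try to buzz in).

    candidates: dict mapping a response string to a confidence in [0, 1].
    This structure is an assumption made for the sketch.
    """
    if not candidates:
        return None
    # Pick the response with the highest confidence estimate.
    best = max(candidates, key=candidates.get)
    confidence = candidates[best]
    # "50 percent or higher" means the comparison is inclusive.
    return (best, confidence) if confidence >= threshold else None


# Hypothetical confidences for the Coca-Cola clue, for illustration only.
decision = should_buzz({"Tab": 0.82, "Sprite": 0.11, "Fresca": 0.05})
```

With these made-up numbers the top response clears the threshold, so the function returns it. If no candidate reached 0.50, it would return None and the sketch would stay silent.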

What is a Daily Double?

When a contestant selects a Daily Double, he or she can wager from $5 up to his or her current score (wagering the entire score is a True Daily Double) if that score is higher than the highest clue value on the board. If it is not, the contestant can wager up to the highest value on the board ($1,000 for Single Jeopardy! and $2,000 for Double Jeopardy!).
The first round of Jeopardy! has one Daily Double; the second round has two.
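
The wagering limits in this sidebar reduce to a small calculation. The sketch below is only an illustration of that rule: the function and constant names are hypothetical, and the dollar figures are the ones quoted above.

```python
# Highest clue value on the board per round, as quoted in the sidebar.
ROUND_TOP_VALUE = {"single": 1000, "double": 2000}
MIN_WAGER = 5


def daily_double_wager_range(current_score, round_name):
    """Return (minimum, maximum) allowed wager on a Daily Double.

    The maximum is the contestant's score when that score exceeds the
    top clue value of the round, and the top clue value otherwise.
    """
    top_value = ROUND_TOP_VALUE[round_name]
    return MIN_WAGER, max(current_score, top_value)


low_score_range = daily_double_wager_range(400, "single")    # (5, 1000)
high_score_range = daily_double_wager_range(5200, "double")  # (5, 5200)
```

A contestant with $400 in the first round can still wager up to $1,000, while a contestant with $5,200 in the second round can wager all of it.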

The buzz-in threshold can change substantially towards the end of a game. For example, Watson will lower the threshold if doing so gives it a higher chance to win or helps it avoid a statistical lockout. Conversely, if Watson is leading and its only chance of losing a game is to buzz in and respond incorrectly, it will not buzz in, no matter how confident.
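
The "do not buzz in, no matter how confident" case can be made concrete with a deliberately simplified model. The assumptions here are not from the article: a single game, one clue left before Final Jeopardy!, one opponent to worry about, and the common rule of thumb that a lead is locked when the leader has more than twice the runner-up's score (the runner-up can at most double in the final round). All function names are hypothetical, and Watson's real endgame analysis is certainly richer than this.

```python
def lock_is_safe(leader_score, opponent_score):
    """A lead is locked if the opponent cannot catch up even by doubling."""
    return leader_score > 2 * opponent_score


def must_pass_to_protect_lock(watson, best_opponent, clue_value):
    """True when staying silent keeps the lock but a wrong response loses it."""
    # Worst case if Watson stays silent: the opponent answers correctly.
    safe_if_pass = lock_is_safe(watson, best_opponent + clue_value)
    # Worst case if Watson buzzes and misses: Watson loses the clue value,
    # then the opponent rings in and answers correctly.
    safe_if_wrong = lock_is_safe(watson - clue_value, best_opponent + clue_value)
    return safe_if_pass and not safe_if_wrong


# Hypothetical scores: $20,000 vs. $7,000 with a $2,000 clue remaining.
protect = must_pass_to_protect_lock(20000, 7000, 2000)
```

In this example, passing leaves Watson ahead 20,000 to at most 9,000, which is still a lock, while a miss could make it 18,000 to 9,000, which is not. Under these assumptions the sketch says to pass regardless of confidence.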

Clue selection

If Watson gets to choose a category and clue, its first priority is finding any of the game's three Daily Doubles that remain on the board. These clues allow a contestant to wager a chosen dollar amount without worrying about the other two contestants buzzing in. Jennings, Rutter and Watson all have a high chance of answering these correctly, so Daily Doubles provide three opportunities for a critical score boost.

The Watson Research team studied the historical distribution of Daily Doubles and found they appear most frequently in the three bottom rows, with the fourth row being the most common. Daily Doubles also appear most frequently in the first column. Watson uses additional statistics to dynamically predict their location based on what has been exposed so far in a game.

Once the Daily Doubles are off the board, Watson looks for the lowest-value clue in a category that still has a significant number of high-value clues. Lower-value clues help it get the gist of a category with less risk, so that it has a better shot at the high-value clues to come.
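
One way to picture this selection rule is the sketch below. It is an illustration under stated assumptions, not Watson's implementation: the article does not say what counts as "high value" or "a significant number", so both are hypothetical parameters here, as are the function name and the board representation.

```python
def pick_clue(board, high_value_cutoff, min_high_clues=2):
    """Pick the lowest remaining clue from a category that still holds
    at least `min_high_clues` clues worth `high_value_cutoff` or more.

    board: dict mapping category name to a list of remaining dollar values.
    Returns (category, value), or None if no category qualifies.
    Ties go to the category listed first.
    """
    best = None
    for category, values in board.items():
        if not values:
            continue  # category already cleared
        high_clues = [v for v in values if v >= high_value_cutoff]
        if len(high_clues) < min_high_clues:
            continue  # not enough high-value clues left to be worth scouting
        lowest = min(values)
        if best is None or lowest < best[1]:
            best = (category, lowest)
    return best


# Hypothetical Double Jeopardy! board state, for illustration only.
board = {
    "A": [400, 1200, 1600, 2000],
    "B": [400, 800],
    "C": [800, 1600, 2000],
}
choice = pick_clue(board, high_value_cutoff=1200)
```

Here category "B" is skipped because it has no high-value clues left, and "A" wins over "C" because its cheapest remaining clue ($400) costs less to explore.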