2.03.2011

Knowing what it knows: selected nuances of Watson's strategy

Editor’s note: This guest post from IBM Researcher Dr. Jon Lenchner is the second article in a three-part series about how Watson plays America’s favorite quiz show®.

Watson learns by gathering information, but instead of neural connections, it uses algorithms to understand the natural language that information is written in. For each Jeopardy! category and clue, these algorithms produce a confidence score that maps to a probabilistic estimate that a given response is correct.

Watson honed this self-assessment of what it does and does not know by training on thousands of historical questions (Watson’s equivalent of taking a few hundred “practice tests”).

The algorithms dealing with natural language are not perfect, so there’s always some degree of uncertainty. Watson calculates its uncertainty and learns which algorithms to trust under which circumstances, such as different Jeopardy! categories.
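The post says Watson's confidence maps to a probability and that Watson learned this mapping from historical questions, but it does not describe the method. As one illustrative technique (histogram binning, not necessarily what Watson used), raw answer scores from practice questions can be bucketed, and each bucket's observed accuracy becomes the probability estimate for new scores that fall into it. The function name `fit_binned_calibration` and the practice data are hypothetical.

```python
from collections import defaultdict


def fit_binned_calibration(history, n_bins=10):
    """Build a function mapping a raw answer score to an estimated P(correct).

    history is a list of (raw_score, was_correct) pairs from practice
    questions; raw scores are assumed to lie in [0, 1].
    """
    hits = defaultdict(int)    # correct responses seen in each bin
    totals = defaultdict(int)  # all responses seen in each bin

    def bin_of(score):
        # Equal-width bins; a score of exactly 1.0 goes into the top bin.
        return min(int(score * n_bins), n_bins - 1)

    for score, correct in history:
        b = bin_of(score)
        totals[b] += 1
        hits[b] += int(correct)

    def confidence(score):
        b = bin_of(score)
        if totals.get(b, 0) == 0:
            return None  # no practice data fell into this bin
        return hits[b] / totals[b]

    return confidence


# Hypothetical practice results: (raw score, whether the response was right).
practice = [(0.05, False), (0.15, False), (0.91, False), (0.92, True), (0.95, True)]
estimate = fit_binned_calibration(practice)
print(estimate(0.93))  # share of correct practice answers in the top bin
```

With thousands of practice questions instead of five, each bin's estimate becomes much more stable; keeping separate tables per category type would be one way to reflect "which algorithms to trust under which circumstances."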

IBM Researcher Dr. David Gondek, who developed the machine learning algorithms and infrastructure that Watson uses to rank possible answers and estimate its confidence in them, uses the words “introducing” and “manufacturing” to show that language has many ways to refer to the same relation, and that those ways can be highly contextual:

The clue was: It was introduced by the Coca-Cola Company in 1963. Watson can find a passage stating that ‘Coca-Cola first manufactured Tab (the correct response) in 1963’, so in order to answer the question, Watson needed to understand that introducing and manufacturing can be equivalent – if a company is introducing a product. But that is highly dependent on context: if you introduce your uncle, it doesn't mean you manufactured him.
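The Tab example can be made concrete with a toy rule: two relation words count as equivalent only when the thing they apply to has a particular type. This is a hand-built illustration of context dependence, not how Watson learned such equivalences; the `PARAPHRASES` table and the `relations_match` function are hypothetical.

```python
# Hypothetical paraphrase table: a pair of relation words maps to the set of
# object types for which the two words may be treated as equivalent.
PARAPHRASES = {
    frozenset({"introduced", "manufactured"}): {"product"},
}


def relations_match(clue_relation, passage_relation, object_type):
    """Return True if passage_relation can stand in for clue_relation here."""
    if clue_relation == passage_relation:
        return True
    allowed_types = PARAPHRASES.get(frozenset({clue_relation, passage_relation}), set())
    return object_type in allowed_types


# "Coca-Cola manufactured Tab" supports "Coca-Cola introduced Tab" ...
print(relations_match("introduced", "manufactured", "product"))  # True
# ... but introducing your uncle does not mean you manufactured him.
print(relations_match("introduced", "manufactured", "person"))   # False
```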

Watson also exhibits dynamic learning within categories. Watson observes the correct answers to clues to verify it is interpreting the category correctly. The sparring matches offer good examples of Watson making these in-game adjustments. Not only does Watson get better at answering as clues in a category are revealed, but its understanding of its own in-category ability is also refined.
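The post does not say how Watson represents this in-category self-assessment. One simple way to model the idea is a Beta-Bernoulli update: start from a neutral prior, and after each revealed answer, count whether Watson's top candidate would have been right. This is an illustrative stand-in, and the `CategoryBelief` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CategoryBelief:
    """Beta(alpha, beta) belief about Watson's accuracy within one category."""
    alpha: float = 1.0  # prior pseudo-count of correct top answers
    beta: float = 1.0   # prior pseudo-count of incorrect top answers

    def observe(self, top_answer: str, revealed_answer: str) -> None:
        # Once the correct response to a clue is revealed, check whether
        # Watson's top candidate would have matched it (case-insensitive).
        if top_answer.strip().lower() == revealed_answer.strip().lower():
            self.alpha += 1
        else:
            self.beta += 1

    def expected_accuracy(self) -> float:
        # Mean of the Beta distribution: the current accuracy estimate.
        return self.alpha / (self.alpha + self.beta)


belief = CategoryBelief()
belief.observe("Tab", "tab")       # top candidate matched the revealed answer
belief.observe("Sprite", "Fanta")  # top candidate was wrong
belief.observe("Fanta", "Fanta")
print(belief.expected_accuracy())
```

Each revealed clue shifts the estimate, which mirrors the post's point that Watson's sense of its own in-category ability sharpens as a category is played.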

Note that because Watson cannot hear, it does not know how Jennings or Rutter responded to a clue. Watson therefore cannot use their responses in its accuracy assessment or to change a response it may be considering.



The confidence to buzz in

Those who watched the practice round could see a graphic of Watson’s confidence in its top three possible responses, along with a line marking the threshold its confidence must reach before it buzzes in.

Watson’s default threshold is typically 50 percent. In other words, if Watson estimates a 50 percent or higher chance of responding correctly to a clue, it will try to buzz in.

What is a Daily Double?

When a contestant selects a Daily Double, he or she can wager anywhere from $5 up to either his or her current score, if that is higher than the highest clue value on the board, or otherwise up to the highest clue value on the board ($1,000 in Single Jeopardy! and $2,000 in Double Jeopardy!). Wagering one's entire current score is known as a True Daily Double.
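The wager limits above reduce to a short calculation: the floor is always $5, and the cap is the larger of the contestant's score and the round's top clue value. A minimal sketch, where the function name and the round keys are our own labels:

```python
# Highest clue value on the board in each round, per the rules above.
TOP_CLUE_VALUE = {"single": 1000, "double": 2000}


def daily_double_wager_range(current_score, round_name):
    """Return the (minimum, maximum) legal Daily Double wager in dollars."""
    # Cap: the contestant's score or the round's top clue value,
    # whichever is larger. Floor: always $5.
    return 5, max(current_score, TOP_CLUE_VALUE[round_name])


print(daily_double_wager_range(3400, "single"))  # (5, 3400): score exceeds $1,000
print(daily_double_wager_range(600, "double"))   # (5, 2000): board cap applies
```

A contestant with a negative score can still wager up to the board cap, since the maximum never falls below the round's top clue value.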

The first round of Jeopardy! has one Daily Double; the second round has two.

But the buzz threshold is game-state dependent.

The threshold can change substantially towards the end of a game. For example, Watson will lower the threshold if doing so gives it a higher chance to win, for instance to avoid a statistical lockout. Conversely, if Watson is leading and its only way to lose the game is to buzz in and respond incorrectly, it will not buzz in, no matter how confident it is.
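The post describes the buzz decision only qualitatively. The sketch below turns the three behaviors it mentions (a 50 percent default, a lower threshold when Watson must take risks to catch up, and never buzzing when only a wrong response could cost it the game) into fixed checks. Everything else is an assumption: the name `should_buzz`, the `remaining_value` bound, the 30 percent catch-up threshold, and the simplification that ignores Final Jeopardy! wagering. The post implies Watson weighs its actual chance of winning, so hand-written rules like these are only a rough stand-in.

```python
def should_buzz(confidence, my_score, opponent_scores, clue_value,
                remaining_value, default_threshold=0.5,
                catch_up_threshold=0.3):
    """Decide whether to buzz on the current clue (simplified model).

    remaining_value: total dollar value of all clues still in play,
    including this one, used as an upper bound on what anyone can gain.
    Final Jeopardy! is ignored in this sketch.
    """
    leader = max(opponent_scores)

    # Leading case: if staying silent already guarantees the win but a
    # wrong response would lose clue_value and let the leader catch up,
    # never buzz, however confident Watson is.
    safe_if_silent = my_score > leader + remaining_value
    safe_if_wrong = my_score - clue_value > leader + remaining_value
    if safe_if_silent and not safe_if_wrong:
        return False

    # Trailing case: if passing on this clue leaves too little value on
    # the board to overtake the leader, accept more risk.
    deficit = leader - my_score
    if deficit > 0 and deficit >= remaining_value - clue_value:
        return confidence >= catch_up_threshold

    return confidence >= default_threshold


# Comfortable lead that only a wrong response could endanger: stay silent.
print(should_buzz(0.99, 10000, [4000, 3000], 2000, 4000))  # False
# Far behind late in the game: buzz on lower confidence.
print(should_buzz(0.35, 2000, [6000, 1000], 2000, 5000))   # True
```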

Clue selection

If Watson gets to choose a category and clue, its first priority is finding whichever of the game's three Daily Doubles remain. These clues allow a contestant to wager a specific dollar amount on the clue without worrying about the other two contestants buzzing in. Jennings, Rutter and Watson all have a high chance of answering these correctly, so Daily Doubles provide three opportunities for a critical score boost.

The Watson Research team studied the historical distribution of Daily Doubles and found that they appear most frequently in the bottom three rows, with the fourth row being the most common. Daily Doubles also appear most frequently in the first column. Watson uses further statistics to dynamically predict their location based on which clues have been revealed so far in a game.
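One simple way to use such statistics, sketched here under assumptions, is to count how often past Daily Doubles landed in each board cell and renormalize those counts over the cells not yet revealed in the current game. The function name and the tiny count table in the test are placeholders, not the team's data, and the post says Watson used more statistics than this single table.

```python
def daily_double_probabilities(historical_counts, revealed):
    """Estimate where a hidden Daily Double is likely to be.

    historical_counts[row][col]: number of past Daily Doubles seen in that
    cell (row 0 is the top, lowest-value row).
    revealed: set of (row, col) cells already played in this game.
    Returns a dict mapping each unrevealed cell to a probability.
    """
    weights = {
        (r, c): count
        for r, row in enumerate(historical_counts)
        for c, count in enumerate(row)
        if (r, c) not in revealed
    }
    if not weights:
        return {}  # every cell has been revealed
    total = sum(weights.values())
    if total == 0:
        # No historical signal among the open cells: fall back to uniform.
        return {cell: 1 / len(weights) for cell in weights}
    return {cell: w / total for cell, w in weights.items()}
```

Picking the open cell with the highest probability, for example `max(probs, key=probs.get)`, corresponds to the "hunt for Daily Doubles" priority described above.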

Once the Daily Doubles are off the board, Watson looks for the lowest-value clue in a category that still has a significant number of high-value clues remaining. Lower-value clues help it get the gist of a category with less risk, so that it has a better shot at the high-value clues to come.
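The selection rule above can be sketched as a small function. What counts as a "high-value" clue and a "significant number" of them is not specified in the post, so `high_value_cutoff` and `min_high_value_clues` are hypothetical parameters, as is the fallback of taking the cheapest clue anywhere when no category qualifies.

```python
def pick_clue(board, high_value_cutoff, min_high_value_clues=2):
    """Choose a (category, value) to select once the Daily Doubles are gone.

    board maps each category name to the list of clue values still unplayed.
    """
    candidates = []
    for category, values in board.items():
        if not values:
            continue
        n_high = sum(1 for v in values if v >= high_value_cutoff)
        if n_high >= min_high_value_clues:
            # Prefer the cheapest entry point; break ties toward the
            # category with more high-value clues left to exploit.
            candidates.append((min(values), -n_high, category))
    if not candidates:
        # No category qualifies: fall back to the cheapest clue anywhere.
        candidates = [(min(values), 0, category)
                      for category, values in board.items() if values]
    if not candidates:
        return None  # the board is empty
    value, _, category = min(candidates)
    return category, value


board = {"HISTORY": [400, 1600, 2000], "SCIENCE": [1200, 1600], "OPERA": [800]}
print(pick_clue(board, high_value_cutoff=1600))  # ('HISTORY', 400)
```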

29 comments:

  1. I assumed from prior postings that Watson could hear the questions being read. How does Watson get the Jeopardy answers in real time, then?

  2. @Rocket:

    An extract from a previous post contains your answer.

    "Watson depends on text messaging, sent over TCP/IP, in order to receive the clue. At exactly the moment that the clue is revealed on the game board, a text is sent electronically to Watson’s POWER7 chips. So, Watson receives the clue text at the same time it hits Brad Rutter’s and Ken Jennings’ retinas."

  3. Wow! We don't see Jeopardy! in Australia, so I look forward to seeing some video links.
    Good luck, Watson!!

  4. If you are outside the USA, you can probably catch video of the show later on YouTube or some other website.
    The show last night (day 2) was pretty amazing.

  5. I wonder if there's some disadvantage to Ken and Brad because of human I/O bottlenecks. Judging from the show, most human contestants respond after Alex has read the clue.

  6. Watson is able to process the text question instantaneously, while the contestants have to read the question and are actually hindered in processing the question by the announcer speaking the question aloud. It would be a fairer contest to allow the contestants at least a second to read the question (without the announcer).

  7. Alan nailed it. The fact that the human contestants have to read or listen to the clue puts them at a significant disadvantage. The technology behind Watson is incredible but the Jeopardy Challenge is essentially a PR stunt. There is simply no way for a human competitor to beat Watson to the buzzer with the consistency needed to actually have a chance to win.

  8. About the computer being able to buzz in faster than a human because of reading time: (From Wikipedia) Contestants must wait until the host finishes reading the clue before ringing in. Ringing in before this point locks the contestant out for one fourth of a second. Lights mounted around the game board illuminate to indicate when contestants may ring in, and the contestant has five seconds to offer a response. Additionally, a tone sounds in conjunction with the illuminated lights on episodes that feature visually-impaired contestants.

    Before Trebek's second season, contestants were able to ring in at any time after the clue had been revealed, and a buzzer would sound whenever someone rang in. According to Trebek, the buzzer sound was "distracting to the viewers" and sometimes presented problems, as contestants would inadvertently ring in too soon, or ring in so quickly that by the time he finished reading the clue, the contestant's five-second limit had expired. He also said that, by not allowing anyone to ring in until the clue was finished, home viewers could play along more easily, and faster contestants would be less likely to dominate the game.

  9. At first blush, my reaction was along the lines of what Anonymous said above... however, consider that Watson then has to take the clue it *read* faster and perform natural language parsing on it. Its reading speed is not the important factor here.

    That being said, I could see giving the humans a head start to see how much of a difference it makes.

  10. How does Watson know when to buzz in? I thought that everyone has to wait for a 'guy in the back' to 'release' the buzzers. If Watson isn't hearing the question but is told when it is possible to buzz in, while the humans have to rely on reflexes to judge when Alex finishes the question, the contest is skewed quite favorably toward the computer. (I would assume that both Brad and Ken would attempt to hit the buzzers about 80-90% of the time.)

  11. To answer your question (again from Wikipedia):
    Originally Watson buzzed in electronically, but Jeopardy! requested that it physically press a button, as the human contestants would. Even with a robotic "finger" pressing the buzzer, Watson remained faster than its human competitors. Jennings noted, "If you're trying to win on the show, the buzzer is all," and that Watson "can knock out a microsecond-precise buzz every single time with little or no variation. Human reflexes can't compete with computer circuits in this regard." Also, Watson could avoid the time-penalty for accidentally buzzing in too early, because it was electronically notified when to buzz, whereas the human contestants had to anticipate the right moment.

  12. With regard to:
    "The threshold can change substantially towards the end of a game. For example, Watson will lower the threshold if it gives a higher chance to win or, for example, to avoid a statistical lockout.
    Analogously, if Watson is leading and its only chance of losing a game is to buzz in and respond incorrectly, it will not buzz in, no matter how confident."



    If Watson could lose by buzzing in and answering incorrectly, couldn't he lose by an opponent answering correctly?
    It would seem that buzzing in would be the best strategy against Jennings/Rutter-caliber opponents.

  13. "Watson is able to process the text question instantaneously..."

    "The fact that the human contestants have to read or listen to the clue puts them at a significant disadvantage."

    ======

    HA! xD

    It's amazing to me that so many people can believe such a silly thing!

    Computers have the advantage when it comes to READING?! Since when??

    One of the major accomplishments of Watson in the first place is that it can "read" and "understand" such a wide variety of Jeopardy questions (or answers)... AT. ALL.

    Just "understanding" the incredible subtleties in a *HUMAN* language (and Jeopardy questions/answers, in particular) requires an enormously sophisticated and complex AI (ahem, *NON-HUMAN*) computer system. I, for one, am astonished that IBM was able to do it.

    Kudos to the entire Watson team for the amazing job that you have done!

    [For the record, no one can buzz in to answer until the question (pardon, "answer") has been read by Alex. So the fact that Watson receives the question/answer as an electronic text file is completely irrelevant. As IBM has stated elsewhere, humans actually have an advantage over Watson because they can anticipate the exact moment when Alex will finish speaking and their buzzers will be enabled.]

  14. Here is a question: if the humans gave up on trying to think of an answer and just concentrated on ringing the buzzer as quickly as possible, would they get a chance to answer any more questions? If they would not get more chances, then it is pretty clear that the contest is about reaction time, not question answering. I am amazed at Watson; its answers are incredible, but I'm not sure the contest is exactly what it appears to be.

  15. By one intuition, Watson should perform almost as well, if not as well, on high-value questions as on low-value ones. The questions are harder not because they are harder to understand, but because the responses are more obscure. Watson should be good on obscure data, though perhaps it would find fewer graphs to confirm a response there. Have you found this to be the case?

  16. What were some of the top data sources stored in the 15 petabytes of disk storage used by Watson? How are they stored so that Watson can search them so fast?

  17. Fantastic comments.
    Please keep posting your comments and questions. We will be using some of them at the TED.com LIVE event “Final Jeopardy and the Future of IBM Watson” tomorrow, 2/17, at 11:30 am ET.
    Please tune in at http://www.ted.com/pages/view/id/593.
    Thank you,
    Kevin Winterfield
    Editor

  18. I think IBM and Watson have done a very fine job.

    But to make the competition fairer, Watson should get the question only after it has been read aloud. The humans are somewhat disturbed by receiving both text and sound at the same time. Especially on long questions, Watson has an answer before the reading aloud has stopped, while the humans have hardly gotten through their initial fuss and stress. In this situation Watson is too good and wins almost every time.

  19. 2 comments:

    1. Agree with niss and others. It certainly gives an advantage to Watson, since humans naturally will tend to wait until they hear the question - or at least be somewhat distracted by the reading. Granted, we can point to plenty of disadvantages for the computer too, but the 'start of processing time' advantage clearly goes to Watson - and is probably significant for many of the clues. Since the real intent was to show if Watson can 'think' better, it would be nice to null out (at least some of) this obvious I/O advantage.

    2. It was interesting to see that many times Watson's 2nd or 3rd choices were wacky. Good fodder to tweak algorithms, to winnow answers down more reliably into the right realm. It would be interesting to hear the Watson team's comments on why the 1st-3rd choices were so disparate at times.

    Thanks - great accomplishment and good entertainment.

  20. As broadband expands and gets faster, the text-based information that Watson relies on will diminish. IBM should work with the blind community to give YouTube the "narrative network" treatment, and with the deaf community to expand closed captioning to other video services. Then harvest the closed captions and descriptions.

  21. Please view the replay from today's TEDtalk with members of the IBM Watson team for more information on Watson - http://www.ted.com/webcast/archive/event/ibmwatson

    Kevin
    Editor

  22. If Watson could lose by buzzing in and answering incorrectly, couldn't he lose by an opponent answering correctly?

    Watson's incorrect response could be followed by an opponent's correct response (unless both opponents already have provided incorrect responses, in which case your scenario doesn't apply). Conversely, if Watson does nothing, the potential damage is halved. (An opponent's score can increase, but Watson's score won't decrease by the same amount.)

    The scenario described in the article is one in which it's mathematically impossible for Watson to lose unless it responds incorrectly and an opponent responds correctly. If an opponent's correct response alone would cause Watson to lose, Watson will rely on its confidence (as usual).

  23. I think it picks the easiest question in a category to help it establish an initial weighting for the "clue" in the category title (how relevant it is to the actual answer). I would assume the weighting updates as more questions/answers are revealed, and it may even influence the weighting of other categories on the board...

  24. Want to know about Watson?

  25. What fantasy football draft strategy do you use?

  26. Excellent work, thanks for everything!

  27. I would be interested in knowing these researchers' response to Searle's Chinese Room problem.

    What we need now is a good open-source voice recognition program with speech corpora (à la CMU Sphinx or something similar) and a good TTS engine. (Remember the good old days of ViaVoice?)

    Also, has IBM thought of using a humanoid robotics platform for Watson's UI?

  28. The meaning of "intelligence", and even the presence of "intelligence", is in my opinion just a supposition. I would like to think there is no such thing as "intelligence", just as there is no such thing as "reality"; both Searle's Chinese Room problem and the philosophical position of a "mind" are by and large mere choices, and outcomes of choices, without any "intentionality" to start with.
