The buzzer sounds.
Jeopardy! host Alex Trebek: “Watson?”
IBM Watson: “What is …”
This scenario will play out during February’s airing of the Jeopardy! quiz show, when IBM’s Question Answering system, Watson, challenges two of the game’s greatest champions, Ken Jennings and Brad Rutter.
Watson, however, cannot “see” or “hear” anything – so how can he play a Jeopardy! game?
When host Alex Trebek finishes stating a clue, a human operator (who works for Jeopardy!) turns on a “Buzzer Enable” light on stage to indicate that contestants can “buzz in” and answer. At exactly the moment the “Buzzer Enable” light is activated, Watson’s system receives a signal that the buzzer is open.
Watson’s avatar, which viewers will see behind a standard Jeopardy! podium, is designer Joshua Davis’ artistic representation of the machine. It does not provide eyes or ears for Watson. Instead, Watson receives each clue as a text message sent over TCP/IP. At exactly the moment the clue is revealed on the game board, the clue text is sent electronically to Watson’s POWER7 chips. So Watson receives the clue text at the same time it hits Brad Rutter’s and Ken Jennings’ retinas.
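To make the idea of receiving a clue as text over TCP/IP concrete, here is a minimal sketch. The message format (one UTF-8 line ending in a newline), the function names and the sample clue are all assumptions made for illustration; the actual protocol between the game system and Watson is not described here.

```python
import socket


def read_clue(conn: socket.socket, max_bytes: int = 4096) -> str:
    """Read one newline-terminated UTF-8 message from a connected socket.

    The newline framing is an assumption for this sketch, not the real protocol.
    """
    buf = bytearray()
    while b"\n" not in buf:
        chunk = conn.recv(1024)
        if not chunk:  # sender closed the connection before the terminator
            break
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ValueError("clue exceeds max_bytes")
    line, _, _ = bytes(buf).partition(b"\n")
    return line.decode("utf-8")


def demo() -> str:
    # socketpair() returns two connected endpoints in one process; here they
    # stand in for the game system (sender) and Watson (receiver).
    game_side, watson_side = socket.socketpair()
    with game_side, watson_side:
        game_side.sendall("This is a sample clue.\n".encode("utf-8"))
        return read_clue(watson_side)
```

Calling `demo()` sends the sample clue from one endpoint and reads it back as a string on the other, which is all "seeing" the clue amounts to in this simplified picture.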
Watson uses IBM’s DeepQA technology, running on optimized IBM POWER7 servers, to analyze a Jeopardy! clue and produce a response. Each response comes with an associated confidence: an estimated probability that the answer is correct. If his confidence is high enough, Watson may decide to buzz in. To do this, Watson sends a signal to a mechanical thumb, which is mounted on exactly the same type of Jeopardy! buzzer used by human contestants. Just like Ken and Brad, Watson must physically depress a button to buzz in.
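The decision to buzz can be pictured as a simple threshold test on the confidence of the best candidate response. The sketch below is only an illustration: the `Candidate` type, the `should_buzz` function and the 0.5 threshold are hypothetical, and nothing here says how Watson actually sets or adjusts its threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    answer: str
    confidence: float  # estimated probability that the answer is correct


def should_buzz(candidates: Sequence[Candidate],
                threshold: float = 0.5) -> Optional[Candidate]:
    """Return the most confident candidate if it clears the threshold, else None.

    The default threshold is a placeholder value chosen for this example.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence >= threshold else None
```

For example, given candidates with confidences 0.9 and 0.3, `should_buzz` returns the 0.9 candidate; given a single candidate at 0.2, it returns `None` and Watson would stay silent.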
Watson’s buzzing is not instantaneous. For some clues he may not complete the question answering computation in time to make the decision to buzz in. For all clues, even if he does have an answer and confidence ready in time, he still has to respond to the signal and physically depress the button.
The best human contestants don’t wait for Trebek to finish reading a clue; they anticipate when he will finish. They time their “buzz” for the instant when the last word leaves Trebek’s mouth and the “Buzzer Enable” light turns on. Watson cannot anticipate. He can only react to the enable signal. While Watson reacts at an impressive speed, humans can and do buzz in faster than his best possible reaction time.
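A toy timing model shows why reacting can lose to anticipating. All of it is assumed for illustration: the function names, the rule that Watson presses once both the enable signal and his answer are available plus a fixed reaction delay, and every millisecond value in the usage examples. None of these figures are measurements.

```python
def watson_press_ms(enable_ms: int, ready_ms: int, reaction_ms: int) -> int:
    """Time at which Watson's button is pressed in this toy model.

    Watson can only react: he needs the enable signal and a finished answer,
    and then still has to physically depress the button.
    """
    return max(enable_ms, ready_ms) + reaction_ms


def human_press_ms(enable_ms: int, offset_ms: int) -> int:
    """Time at which a human presses; a well-timed anticipation means a small offset."""
    return enable_ms + offset_ms


def first_to_buzz(enable_ms: int, ready_ms: int,
                  reaction_ms: int, human_offset_ms: int) -> str:
    """Return "watson", "human" or "tie" depending on who presses first."""
    w = watson_press_ms(enable_ms, ready_ms, reaction_ms)
    h = human_press_ms(enable_ms, human_offset_ms)
    if w == h:
        return "tie"
    return "watson" if w < h else "human"
```

With the buzzer enabled at 3000 ms and Watson’s answer ready at 2500 ms, a made-up 10 ms reaction delay loses to a human who lands 5 ms after the light (`first_to_buzz(3000, 2500, 10, 5)` gives `"human"`) but beats one who lands 200 ms after it. If the answer is not ready until 3400 ms, the human wins even with the slower 200 ms press.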
Speaking when signaled
When answering a clue, Watson must convert his answer from text into speech to verbally respond like any other contestant. An operator prompts Watson to speak his answer. The operator has no control over what Watson might say. The operator just ensures that Watson will speak at the right moment and not interrupt the host or others.
The sound of Watson’s voice is synthesized, based on a human’s voice. Since it’s not possible to record someone speaking every imaginable word and phrase – all the more so given the vast range of topics and knowledge that even a single game of Jeopardy! demands – an IBM text-to-speech (TTS) engine “speaks” Watson’s answer. Watson’s speech must also be highly accurate, as mispronunciations of an ambiguous response may be judged incorrect.
Categories and clues
Watson autonomously selects categories and clues, based on algorithms that – just as his human opponents will do – take into consideration the available clues, the score and game position, knowledge of clues previously revealed, and other factors. In the next article of the series, we will take a closer look at how Watson chooses a Jeopardy! category and clue.
Note: As Watson cannot see or hear, he cannot respond to video or audio clues. Jeopardy! has agreed to omit them, just as it has for contestants who are visually or hearing impaired. Watson did take and pass the same Jeopardy! contestant test that humans take to qualify for the show. Find out more about Watson at ibmwatson.com.