How Watson “sees,” “hears,” and “speaks” to play Jeopardy!

Editor’s note: This guest post from IBM Researcher Dr. David Gondek is the first article in a three-part series about how Watson plays America’s favorite quiz show®.

The buzzer sounds.

Jeopardy! host Alex Trebek: “Watson?”

IBM Watson: “What is …”

This scenario will play out on February’s airing of the Jeopardy! quiz show when IBM’s Question Answering system, Watson, will challenge two of the game’s greatest champions, Ken Jennings and Brad Rutter.

Watson, however, cannot “see” or “hear” anything – so how can he play a Jeopardy! game?

Chips, not retinas

When host Alex Trebek finishes stating a clue, a human operator (who works for Jeopardy!) turns on a “Buzzer Enable” light on stage to indicate that contestants can “buzz in” and answer. At exactly the moment the “Buzzer Enable” light is activated, Watson’s system receives a signal that the buzzer is open.

Watson’s avatar, which viewers will see behind a standard Jeopardy! podium, is designer Joshua Davis’ artistic representation of the machine. It does not provide eyes or ears for Watson. Instead, Watson depends on text messaging, sent over TCP/IP, in order to receive the clue. At exactly the moment that the clue is revealed on the game board, a text is sent electronically to Watson’s POWER7 chips. So, Watson receives the clue text at the same time it hits Brad Rutter’s and Ken Jennings’ retinas.

Watson uses IBM’s DeepQA technology, running on optimized IBM POWER7 servers, to analyze a Jeopardy! clue and produce a response. Each response comes with an associated confidence, an estimated probability that the answer is correct. If his confidence is high enough, Watson may decide to buzz in. To do this, Watson sends a signal to a mechanical thumb, which is mounted on exactly the same type of Jeopardy! buzzer used by human contestants. Just like Ken and Brad, Watson must physically depress a button to buzz in.
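
The buzz decision described above can be pictured with a small Python sketch. This is only an illustration, not Watson’s actual logic: the `Candidate` class, the `should_buzz` function, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A hypothetical candidate response with its estimated confidence."""
    response: str
    confidence: float  # estimated probability (0.0 to 1.0) that the response is correct


def should_buzz(candidates: List[Candidate], threshold: float = 0.5) -> Optional[Candidate]:
    """Return the top candidate if its confidence meets the threshold, else None.

    The threshold value is an assumption made for this sketch.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best if best.confidence >= threshold else None


# Example with made-up confidences: one strong candidate, so the sketch would buzz.
ranked = [Candidate("What is Toronto?", 0.14), Candidate("What is Chicago?", 0.83)]
choice = should_buzz(ranked)
print(choice.response if choice else "Stay silent")
```

A real system could plausibly vary the threshold with the game situation; the fixed value here just keeps the idea visible.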

Watson’s buzzing is not instantaneous. For some clues he may not complete the question answering computation in time to make the decision to buzz in. For all clues, even if he does have an answer and confidence ready in time, he still has to respond to the signal and physically depress the button.

The best human contestants don’t wait for Trebek to finish reading a clue. Instead, they anticipate when he will finish and time their “buzz” for the instant when the last word leaves his mouth and the “Buzzer Enable” light turns on. Watson cannot anticipate. He can only react to the enable signal. While Watson reacts at an impressive speed, humans can and do buzz in faster than his best possible reaction time.
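
To make the timing argument concrete, here is a rough Python sketch. All numbers and names (`watson_press_time`, `first_to_buzz`, the delays) are invented for illustration, and the rule that presses before the enable signal simply do not count is an assumption of the sketch, not a statement of the show’s rules.

```python
from typing import Dict, Optional


def watson_press_time(enable_time: float, answer_ready_time: float,
                      actuation_delay: float) -> float:
    """Earliest moment a purely reactive player can have the button pressed.

    The player must wait for both the enable signal and its own answer,
    then still needs time to physically depress the button.
    """
    return max(enable_time, answer_ready_time) + actuation_delay


def first_to_buzz(press_times: Dict[str, float], enable_time: float) -> Optional[str]:
    """Return the player with the earliest press at or after the enable signal.

    Assumption for this sketch: presses before the enable signal are ignored.
    """
    valid = {name: t for name, t in press_times.items() if t >= enable_time}
    if not valid:
        return None
    return min(valid, key=valid.get)


# Hypothetical timeline in seconds; the enable signal arrives at t = 10.0.
enable = 10.0
presses = {
    # Watson had his answer at 9.2 s but can only react to the signal.
    "Watson": watson_press_time(enable, answer_ready_time=9.2, actuation_delay=0.01),
    # A human who anticipated well and pressed almost exactly on the signal.
    "Ken": 10.005,
}
print(first_to_buzz(presses, enable))
```

With these made-up numbers the anticipating human wins the buzz, which is the scenario the paragraph above describes. Change the human’s press time to a slower reaction and Watson wins instead.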

Speaking when signaled

When answering a clue, Watson must convert his answer from text into speech to verbally respond like any other contestant. An operator prompts Watson to speak his answer. The operator has no control over what Watson might say. The operator just ensures that Watson will speak at the right moment and not interrupt the host or others.

The sound of Watson’s voice is synthesized, based on a human’s voice. Since it’s not possible to record someone speaking every possible word and phrase imaginable – all the more so given the vast range of topics and knowledge that even a single game of Jeopardy! demands – an IBM text-to-speech engine (TTS) “speaks” Watson’s answer. And Watson’s speech must be highly accurate, as mispronunciations of an ambiguous response may be judged incorrect.

Categories and clues

Watson autonomously selects categories and clues, based on algorithms that, just as his human opponents will do, take into consideration the available clues, the score and game position, knowledge of clues previously revealed, and other factors. In the next article of the series, we will take a closer look at how Watson chooses a Jeopardy! category and clue.

Note: As Watson cannot see or hear, he cannot respond to video or audio clues. Jeopardy! has agreed to omit them, just as they have with contestants who are visually or hearing impaired. Watson did take and pass the same Jeopardy! contestant test that humans take to qualify for the show. Find out more about Watson at ibmwatson.com.


  1. I can't believe that IBM went through all the trouble of creating and meeting this challenge and will NOT compete on a level playing field. How can they call it a competition when Watson gets the input as TCP/IP text? Just like the human players, it should only get the audio and visual streams that the other two players are getting. Buzzing in and timing are a big part of the game. Watson should have to rely on his own eyes and ears like the others, not be given some back door to the questions. THIS IS NOT A LEVEL PLAYING FIELD!!! It is still a great accomplishment, but for such a public event IBM should go the full 10 yards.

  2. How does Watson get feedback about:
    - whether its answer was correct
    - whether another contestant's answer was correct, and what they guessed
    - the correct answer, if no one guesses

    If Watson computes an answer with low confidence, and a human opponent answers and gets it wrong, can Watson re-evaluate its answer+confidence?

  3. You realize, David, that you're the one that they're going to have to travel back in time to eliminate after the cyborgs take over the world. :)

    Joel at alien-ufo-research.com

  4. @Anonymous: it's more level than you think. Watson can't ring in as soon as the buzzer is ready and then take time to answer it. The human players can buzz in and use their 2-3 seconds to come up with the answer. Watson can't buzz in until it has an answer AND confidence that it's right. It also can't hear the wrong answers given by contestants and adjust his answer based on that - so in theory he can say the exact same wrong answer as just given. Humans have the edge in "tactics" so it's much more level than you think....

  5. This is impressive, but I would be more impressed if Watson had a camera pointed at the screen that shows the question, used it to read the question, and then gave the answer, instead of using TCP/IP. This would have been like a cyborg against a human :)

  6. Why does it have to "see" the screen when it could "listen" to the question audibly?

  7. Great post!
    Despite some of the comments here, I think Ken and Brad have a slight advantage over Watson. I go into this on my blog today: "Humans vs. Watson (Programmed by Humans): Who Has The Advantage?"


  8. Having toyed around with NLP for decades, I appreciate some of the hurdles ... so hafta say, Congratulations guys, the speed and accuracy are pretty amazing.

  9. hmm ... @IBMWatson seems to have a slight advantage as it instantly receives questions via tcp/ip without delays/distractions. This is a HUGE advantage in terms of computing time. Would it be more fair to its human competitors if it either receives the questions only when Alex finishes reading them or if it uses OCR to evaluate them?

  10. To those saying Watson has an unfair advantage with buzzing in, if that were true, why wouldn't it win every buzz-in?

    Also, I read elsewhere that contestants on Jeopardy! cannot buzz in until Alex finishes reading the question AND an in-studio light behind him turns on. I believe it is safe to assume that Watson has to receive a signal of some sort that coincides with the turning on of said light that allows him to buzz in, should he choose to do so.

    Lastly, the buzz-in mechanism is a servo - basically a finger pressing the button, just like a human.

    In my opinion, the above points provide for a reasonably level playing field.

    FULL DISCLAIMER: I work for IBM but have nothing to do with the Watson project. So I am, if anything, a bit biased :)

  11. While it's impressive from an NLP perspective, Watson is exploiting a flaw in the game. At this level of competition, Jeopardy devolves into a buzzer battle, and no human is going to consistently beat a robot with nanosecond reflexes and a millisecond servo thrown in for token fairness. It is impressive that Watson can compete at this level, but it would be nice if they ramped up the difficulty of the questions, so that the computer was actually "outsmarting" the humans instead of out-clicking them.

    And yes Watson doesn't win every buzzer battle because it depends on how long it takes him to compute and how long Alex spends talking. However, he doesn't have to win them all to rack up a lot of dollars. Ken and Brad might know 90% of the answers, but if Watson can quickly get half the answers then he'll win based on the buzzer.

  12. " Pragnesh said...
    This is impressive, but I would be more impressed if Watson had a camera pointed at the screen that shows the question, used it to read the question, and then gave the answer, instead of using TCP/IP. This would have been like a cyborg against a human :)"

    Absolutely Pragnesh, let us know when you've accomplished that.

  13. @Ken
    What's stopping Watson from using the same tactic at some point? If the computation is taking too long then buzz in before and use that 2 or 3 seconds to find the answer. Watson was very impressive and knew the majority of the answers so this isn't all that risky. I believe Ken Jennings would perform a similar strategy in his winning streak to keep control of the game.

  14. As I review many of the comments on the web with regard to this competition, I am discouraged to see how unappreciated the Watson team's accomplishments are.
    Kudos to you folks for your efforts and the historic advances you've brought to computing. I look forward to future developments.

  15. Re TCP/IP, the problem was not to build an artificial eye. But it's sometimes actually slower to wait for the whole message to arrive than to be able to jump to a key word at the end, as Brad or Ken can do visually, then buzz immediately without waiting, in anticipation of being able to answer the question. If you saw "...six wives." you might well jump to the answer: Henry VIII. You don't need "Tudor", "sixteenth century", and "king". Poor Watson, he just has to wait and wait for that TCP/IP message to finish.

  16. Can you explain how Watson missed the Final Jeopardy question? Did it not understand the "US" in US Cities?
    Thanks. It was very interesting to watch, and I can't wait for Watson V2.

  17. Please use a vision technique for reading the question and catching the buzzer light. Right now, Watson has some advantages granted by IBM that even a robot cannot have. The buzzer battle made the guys frustrated on day 2. I am sure they knew a lot of the answers before the "Buzzer Enable" light came on.

  18. I do believe that the way Watson receives the clues does give him an advantage in the game, but I don't believe it gives him a $25K advantage.

    Many Jeopardy clues are easily solved within the first few words of the clue, when you take the category into context. If Watson is controlled by the same buzzer mechanism that Ken and Brad are, then Ken and Brad should reasonably know the answer to about 1/2 the clues in the same amount of time as Watson. Now, maybe the physical reflex of the computer is faster than a human's reflex can be. This could cause a distinct advantage to Watson.

    I do want to offer my congratulations to the entire IBM team. You have created something that is both awe-inspiring and frightening as heck. Even more, you have unveiled the technology in a way that is easily accessible to a vast number of people, sparking discussion and debate amongst people who may have never read a sci-fi novel, or who are not techie nerds. For that alone, you should be proud of yourselves.

  19. @Sung: http://asmarterplanet.com/blog/2011/02/watson-on-jeopardy-day-two-the-confusion-over-an-airport-clue.html

  20. Watson getting the input as text instead of visually is really no different than if humans compete against a hypothetical alien species that can speed-read thousands of times faster than humans. More generally, Watson and humans handle Jeopardy! so differently at so many levels, just about every difference can be called out as an advantage or disadvantage. I don't really know if there is such a thing as a "level playing field" for something like this. Better to view it as an entertaining experiment and a test of the strength and weaknesses of Watson's question-answering abilities.

    It's almost like comparing a human to a horse. The horse will always outrun the human in a simple raceway, but left in a maze without a rider, the human will likely find the way out much faster. And Jeopardy! in this case is like a weird mix of a simple raceway and a maze.

    Well, at least all this perceived unfairness means the humans can have a face-saving "out" when the machine claims victory in the challenge (based on what we see so far). ;p

  21. Unlike the victorious "John Henry", in this case, having lost to the new technology, the defeated Jennings and Rutter can live on to fight another day. The parallels of the thrusting action of 'driving steel' and 'pushing the button' in this man vs. machine challenge are tantalising!

    Kudos to the development team, and thanks for the entertainment.

  22. Fantastic comments.
    Please keep posting your comments and questions. We will be using some of them at the TED.com LIVE event “Final Jeopardy and the Future of IBM Watson” tomorrow, 2/17, at 11:30 am ET.
    Please tune in at http://www.ted.com/pages/view/id/593.
    Thank you,
    Kevin Winterfield

  23. So is the entire clue fed to the computer all at once? Maybe it should be typed in... as the "who can buzz in the fastest" contest was completely unfair. If this was a "fair" match, Ken and Brad would have been able to buzz in first more often (THIS IS FACT....random chance should have given them more chances to buzz in first). When Ken went on his streak of 74, his CLEAR advantage was his ability to buzz in first most of the time. I must admit the ability to answer the questions is very high tech and the next step toward having a "Star Trek" type computer, but it was NOT fair... Why? Probably human response time vs. computer response time, plus the time for the lights to "light up". I think the "light" the players see should light up a fraction of a second sooner than Watson gets the signal.. the computer probably got the signal to buzz in before the light had actually turned on. Footage of what the contestants see as far as a light would be very helpful in this regard. I would love to see a log of the buzzer times for each contestant, to see how long it takes Watson to buzz in after he receives the signal and how long it takes the average Jeopardy player to buzz in after seeing a light or guessing when Alex finishes. Plus, I'd love to see how many times Brad and Ken attempted to buzz in on questions where Watson won the buzzer contest. Brad and Ken looked like they were helpless.. not because they didn't know the answers (because THEY DID) but because they didn't even have a chance to buzz in. Sorry, not impressed... the computer had far too much of an advantage here...

  24. What a stunning accomplishment for IBM. I've dreamed of the day when we have vastly intelligent computers and it seems if IBM keeps working on this, it could be here soon.

    I'd like to see this applied to the medical field.

  25. This "grand challenge" was not about machine vision nor was it about human-vs-machine reaction time on the buzzer. The real accomplishment was interacting on another level than computers usually do, namely 0 and 1. Understanding ambiguous questions, reading through masses of human-written information, extracting the clues, weighting the risk of false answers... in real-time... that, to me, is the Watson Jeopardy challenge.

    As a knowledge worker I find it increasingly difficult to keep up with the omnipresent information flood, and I welcome every kind of machine help in that respect.

    I guess we will see that technology come up not only for medicine, but also in every call center, in financial analysis, in business analysis, in "blogosphere surveys", in patent research, etc.
    And why not also in economics and politics? Many false decisions based on too little evidence might have been avoided in the past.

  26. Watson is just an enormous Anti-Spam Engine!
    For those who understand how most anti-spam works, the systems similarly identify selected keywords, rank them according to an algorithm, and assign a confidence level. Depending on the confidence level, various actions can then be initiated.

    And yes, anti-spam also can deploy a "learning" algorithm, in that case Bayesian Analysis.

    Kudos to the Design Team, it's a watershed on the path to the Singularity.

  27. Please view the replay from today's TEDtalk with members of the IBM Watson team for more information on Watson - http://www.ted.com/webcast/archive/event/ibmwatson


  28. Let's get real here. Effectively, when Watson so deemed, the Jeopardy! staff (electronically) pressed its button when they activated the lights. For the idea that Watson read all those text documents and responded from that, forget it. By the time the game was played, that information was not text anymore. Watson was really accessing a database. A database, I might add, that was much more than the text Watson was exposed to. As for the Jeopardy! clues being natural language, only in a remote stretch of the term. Sure they are English text, but they are highly structured word puzzles that almost nobody would use in natural conversation or composition. Try the Turing test if you want to demonstrate natural language. I wonder if the database would be much help in a conversation. I would think the ranking of possible responses (and also the actual responses) used for Jeopardy! play would be pretty useless. I imagine creating natural language is much harder than information retrieval.

  29. How did Watson's approach to behaviourism support egalitarian philosophies?