2.13.2011

Watson’s wagering strategies

Editor’s note: This guest post from IBM Researcher Dr. Gerald Tesauro is the third article in a three-part series about how Watson plays America’s favorite quiz show®.

Daily Doubles and Final Jeopardy! are often the most critical junctures of a Jeopardy! game; the amount wagered can make a big difference in a player’s overall chances to win. How does Watson decide on the amount?



Daily Double wagering

In principle, to compute the best Daily Double (DD) bet, a player must answer two basic questions:

(1) How likely am I to answer the DD clue correctly?

(2) How much will a given bet increase or decrease my winning chances when I get the DD right or wrong?


Humans are at best only able to make crude estimates of these quantities. By contrast, Watson uses advanced mathematical models that can answer both questions with far greater precision than humans can achieve.

Call out box: Match Play

The Watson-Jeopardy Challenge is spread over two games, with combined totals determining the winner. This style of play requires different strategies than a typical game. Final Jeopardy! of game one is analogous to “half time,” so it calls for different strategies from all competitors than game two, which is the last chance to win.

To address the first question, Watson uses an “in-category DD confidence” model. Based on thousands of tests on historical Jeopardy! categories containing DDs, the model estimates Watson’s DD accuracy, given the number of previously seen clues in the category that Watson got right and wrong.
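The post does not describe the internals of the in-category DD confidence model, so the following is only a rough sketch of the idea. It swaps in a simple smoothed estimate: start from a baseline accuracy and shift it according to how many clues in the category Watson has already answered right or wrong. The function name `dd_confidence` and the `prior_accuracy` and `prior_weight` values are hypothetical and are not Watson's actual parameters.

```python
def dd_confidence(right: int, wrong: int,
                  prior_accuracy: float = 0.7,
                  prior_weight: float = 4.0) -> float:
    """Estimate DD accuracy from clues already seen in the category.

    Hypothetical stand-in for the model in the post: the baseline accuracy
    counts as `prior_weight` pseudo-observations and is blended with the
    observed right/wrong counts.
    """
    if right < 0 or wrong < 0:
        raise ValueError("counts must be non-negative")
    return (prior_accuracy * prior_weight + right) / (prior_weight + right + wrong)


if __name__ == "__main__":
    # With no clues seen, the estimate is just the baseline.
    print(dd_confidence(0, 0))
    # Three correct answers in the category raise the estimate.
    print(dd_confidence(3, 0))
    # Three misses lower it.
    print(dd_confidence(0, 3))
```

A model fitted to thousands of historical categories, as the post describes, would replace the fixed baseline and weight with learned values.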

Watson tackles the second question by using a Game State Evaluator (GSE), a complex regression model that estimates Watson’s winning chances at any stage of the game, given the information set that describes the current game state (for example, the scores of the three players, the number of remaining clues, the value of remaining clues, and the number of remaining DDs).

The GSE was trained over the course of millions of simulated Jeopardy! contests pitting Watson against two simulated human opponents. The human opponent models in these simulations capture important statistical profiles of human contestants, such as how often contestants attempt to buzz in, how often they are right when they win the buzz, and their accuracy on DDs and Final Jeopardy!

Optimal wagering

By combining the GSE with the in-category DD confidence, Watson can compute an overall expected chance to win the game for any given DD bet. This analysis runs for every legal betting amount – from the $5 DD minimum, to its entire bankroll for a True Daily Double – to come up with an optimal amount. The calculation also uses risk analytics to trade off expected winning chances against the risk of a particular bet.
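The search over legal bets can be sketched roughly as follows. This is a minimal illustration, not Watson's code: `win_prob` stands in for the GSE and is reduced here to a function of Watson's score alone, `toy_gse` is an invented logistic curve, and the risk adjustment mentioned above is left out entirely.

```python
import math
from typing import Callable


def best_dd_bet(score: int, p_correct: float,
                win_prob: Callable[[int], float],
                min_bet: int = 5) -> tuple[int, float]:
    """Return (bet, expected win chance) over every legal bet.

    Assumes score >= min_bet and that the maximum bet is the whole bankroll
    (a True Daily Double), as in the post.
    """
    best_bet, best_value = min_bet, -1.0
    for bet in range(min_bet, score + 1):
        # Expected win chance: weight the two resulting game states
        # by the chance of answering right or wrong.
        value = (p_correct * win_prob(score + bet)
                 + (1.0 - p_correct) * win_prob(score - bet))
        if value > best_value:
            best_bet, best_value = bet, value
    return best_bet, best_value


def toy_gse(score: int) -> float:
    """Hypothetical evaluator: win chance rises smoothly with score."""
    return 1.0 / (1.0 + math.exp(-(score - 15_000) / 4_000))


if __name__ == "__main__":
    bet, chance = best_dd_bet(score=12_000, p_correct=0.75, win_prob=toy_gse)
    print(f"bet ${bet}, expected win chance {chance:.3f}")
```

Because every dollar amount is evaluated, nothing in such a search favors round numbers, which is consistent with the non-round bets described below.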

Watson’s resulting bet might seem unusual, in that it frequently may be far more aggressive, or far more conservative, than typical human bets. The amount may also take on non-round values (i.e., not an exact multiple of $100). Such values may make the arithmetic a little more challenging for the humans when computing their bets.

Final Jeopardy! wagering

In calculating a Final Jeopardy! (FJ) wager, Watson first needs to know if it is playing a single game or a two-game match [see Call out box: Match Play]. In the latter case, Watson will use very different strategies for game one and game two. The analysis for game one is similar to Daily Double analysis: Watson uses a statistical model of likely human bets, human FJ accuracy, and Watson’s FJ accuracy to calculate its expected winning chances for every legal bet. It then selects the bet giving the best risk-adjusted chance to win the match.

While there are no previously revealed clues in the FJ round, Watson does obtain evidence of its likely FJ accuracy from the category title. Given the title, Watson first computes several salient features via Natural Language Processing analysis. It then consults a “FJ prior accuracy” regression model, based on Watson’s performance on thousands of historical FJ categories, to predict Watson’s accuracy given the category features.


Wagering in game two of a match is similar to FJ in ordinary games. The predominant consideration is score positioning (first, second or third place). In some cases, the contestants may need to use strategic reasoning as in games like Rock-Paper-Scissors: each player tries to predict the opponents’ bets, while taking into account the fact that the opponents are also trying to predict that player’s own bet.

Watson has been programmed with a library of known FJ strategy rules, such as Two-Thirds Betting and Shore’s Conjecture. The research team also added novel rules for some special situations which we discovered.[1]

Depending on the situation, Watson will either bet according to a suitable strategy rule, or it will run a real-time simulation to calculate the best bet, among all legal bets. For the match with Ken and Brad, Watson will also take into account the prize values for second place ($300,000) and third place ($200,000), leading to a different objective than simply trying to win the match.
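To see how prize values change the objective, consider a small sketch that scores a bet by expected prize money instead of by win probability. The second and third place prizes come from the paragraph above; the $1,000,000 first prize is taken from the publicly announced match terms rather than from this post, and the finishing-place probabilities for the two bets are invented purely for illustration.

```python
# Prize for each finishing place in the match.
PRIZES = {1: 1_000_000, 2: 300_000, 3: 200_000}


def expected_prize(place_probs: dict[int, float]) -> float:
    """Expected prize given the probability of finishing in each place."""
    return sum(PRIZES[place] * p for place, p in place_probs.items())


# Hypothetical outcomes of two candidate bets (made-up numbers).
cautious_bet = {1: 0.90, 2: 0.10, 3: 0.00}
aggressive_bet = {1: 0.91, 2: 0.00, 3: 0.09}

if __name__ == "__main__":
    for name, probs in (("cautious", cautious_bet), ("aggressive", aggressive_bet)):
        print(name, probs[1], expected_prize(probs))
```

In this made-up case the aggressive bet has the higher chance of winning, but the cautious bet has the higher expected prize because it never drops to third place.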


[1] One such rule in ordinary FJ applies when the leader’s score exactly equals the sum of the other two players’ scores, for example, if Watson has $20,000 and the two humans have $13,000 and $7,000. Watson would normally bet $6,001, to win by $1 when the second place player doubles her score. However, in this case Watson will bet $6,000 to tie for first place. The reason is that if Watson bets $6,001 and is wrong, it gives the third place player a chance to win by $1 ($14,000 to $13,999) if the second place player is wrong.
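The arithmetic in this footnote can be checked by listing every right/wrong combination for the three players. The sketch below assumes, for illustration only, that both trailing players bet everything: the second place player tries to double up and the third place player stakes all $7,000. The function names are hypothetical.

```python
from itertools import product


def final_score(score: int, bet: int, correct: bool) -> int:
    """Score after Final Jeopardy! given the bet and whether it paid off."""
    return score + bet if correct else score - bet


def leader_results(leader_bet: int) -> dict[tuple[bool, bool, bool], str]:
    """Map (leader right, second right, third right) to the leader's result."""
    scores = (20_000, 13_000, 7_000)
    bets = (leader_bet, 13_000, 7_000)  # assumption: trailers bet everything
    results = {}
    for outcome in product((True, False), repeat=3):
        finals = [final_score(s, b, c) for s, b, c in zip(scores, bets, outcome)]
        top = max(finals)
        if finals[0] < top:
            results[outcome] = "loss"
        elif finals.count(top) > 1:
            results[outcome] = "tie"
        else:
            results[outcome] = "win"
    return results


if __name__ == "__main__":
    for bet in (6_001, 6_000):
        tally = list(leader_results(bet).values())
        print(bet, {r: tally.count(r) for r in ("win", "tie", "loss")})
```

Under these assumptions, the case where only the third place player answers correctly is a loss for the leader at $6,001 ($13,999 against $14,000) and a tie at $6,000, which is the situation the rule is meant to cover.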

25 comments:

  1. Claiming discovery on first = second + third? That's the only major flaw in the article -- that concept has been well-known in Jeopardy! circles for years. Otherwise, it's great.

  2. It would be interesting to know which novel wagering scenario situations the Watson team discovered. The "first equals second plus third" scenario described in the footnote is not one of them, having been known to Jeopardy! enthusiasts and implemented in the J! Archive wagering calculator for many years, predating the Watson project. The wagering calculator, which does not use any machine learning or otherwise statistics-based analysis but instead consists of a very large if-statement tree, has identified over 129 basic wagering scenarios, and there are many that properly call for a bet to tie. Among them:

    A = 2(B - C) (the "Faith Love" scenario)
    Exact fractions (2/3, 3/4, 4/5...)
    A = 2B - C ("Evenly spaced scores")
    A = B + C ("First equals second plus third")
    A = B + C/2, B != C ("First equals second plus half of third")
    A = 2C, C < (2/3)B ("First is twice third, with third less than two-thirds second", discovered by Jeopardy! Message Board user slam)
    A = 2C, B = 2C ("first and second both twice third")
    B + C = 3/2 A ("Second plus third equals three-halves first", discovered by Jeopardy! Message Board user Gneq with additional conditions suggested by Jeopardy! Message Board user K703)
    A = B = C ("Three-way tie")
    A = B, C > A/2 (The "tortoise and the hares" scenario)
    A = B, C < B/2 (The probably mis-named "Prisoner's dilemma" scenario)
    All the various "lock-tie" scenarios where B = A/2

    Hopefully, the Watson team will find proper occasion and forum to publish their wagering findings in full, as they would be beneficial to all Jeopardy! enthusiasts and game theory buffs.

    All the best,
    Robert K S

  3. Great article, Gerald! As you can see from the comments, this is an area of intense interest among hardcore Jeopardy fans. I found it interesting that many of your strategies confirmed my own thoughts about how to attack aspects of the game. It demonstrates that the same good ideas can be developed independently by different people. I'm sure that's also the case with the "first equals second plus third" scenario. I've spent a lot of time on the J! Archive and I think it's great, but I was not aware of that scenario since it is not specifically documented on the site. The site's wagering calculator may have 129 scenarios, but to the user, it tends to be a black box. It is not always clear what happens between the input and the output.

    I agree with Robert that it would be interesting to read a lot more detail about the strategies that Watson uses. I think some of Watson's techniques may influence the way humans play the game in the future. Robert is justifiably proud of all his work to document and preserve Jeopardy! history. I hope the Watson team will be equally generous in sharing what they have learned.

  4. I thought I discovered a really nice restaurant a few days ago. Now that I've read the previous comments I see that I was mistaken.

    Is there any chance IBM can develop a method by which people with internet access can learn to understand human language?

  5. This is amazing! Last night we were laughing at the $1246 wager, and so I'm happy to know how it came about.

  6. Does Watson start gathering related data as soon as the FJ topic is shown, or does it wait through the commercial break for the answer (clue) to be shown before starting?

  7. When do you think we can realistically expect to see blog posts from Watson himself (at least genuine capability)? Not self aware yet, but comment and summarize his own experiences and what he (it) learned from interacting with humans? I think if you add a "heartbeat" to Watson constantly interacting with people and learning from them with a purpose of better understanding people it may even get one step closer to being truly self aware (of course depends on algo limitations inside).

  8. @SteveK:

    I can't speak for the Watson team, but I highly doubt Watson is designed in such a way that he can produce interesting blog posts. I'm sure the closest thing he has to a memory are log files listing the various heuristics that offered up possible answers and their scores, as well as various other debugging information - fascinating stuff to us programmers, completely incomprehensible to the untrained, and chock-full of trade secrets :)

  9. I'm curious on a similar but broader question than @kamlesh asked... if Watson has more time in general, does he use it? e.g. someone beats Watson to the buzzer and then gets it wrong -- is Watson still refining results? Similar question for the Final Jeopardy question and the final wager.

    I'm thinking along the lines of some chess-playing computers where the more time they are given the further through the possibilities they search.

  10. I found Rick's comment about not having enough time to complete the processing interesting. In show #3 you could clearly see the human contestants buzzed in before they had completely determined their answer. Perhaps a strategy around probability of getting this question correct to ring in early. [Maybe similar to the betting heuristics.]

    All in all - AWESOME performance by Watson and the human players.

  11. Fantastic comments.
    Please keep posting your comments and questions. We will be using some of them at the TED.com LIVE event “Final Jeopardy and the Future of IBM Watson” tomorrow, 2/17, at 11:30 am ET.
    Please tune in at http://www.ted.com/pages/view/id/593.
    Thank you,
    Kevin Winterfield
    Editor

  12. I'm willing to wager it's more likely a random number generator

  13. I'm really interested in Watson's "buzz threshold" -- are the algorithms similar to the wagering strategies? I took a stab at it on my blog (article is linked); am I close?

    Cheers,
    David

  14. Very Interesting blog post, thank you.

    Wagering was the one place where I felt Watson didn't pass the Turing test during the game. (In other words, could Watson pass for a human player to another human judging only on its responses made during the game?)

    I think even Watson's flubs could appear as human mistakes if one only read a transcript of the games without any text mentioning Watson as a computer. Even the final jeopardy US Cities answer of 'Toronto' appeared to me like a joke response from someone who didn't know the real answer.

    But Watson's wagers were the one thing that seemed distinctly non-human to me. To me, the wagers gave the best clue that Watson's intelligence was not human.

    I think that's why the audience would chuckle a bit whenever Watson made wagers. The wager values seemed like something a machine would calculate rather than values human intelligence would calculate. It was like a little slip up, where Watson revealed itself as non-human.

    But then again, I still have my doubts about Ken Jennings being a real human, too. :-)

  15. I'm a little confused about Watson's wager last night...It seems that he wagered enough so that if he got the question wrong and Ken had doubled his score then Watson would've lost. I understand that Watson takes into account his confidence in the category, other players' bets, and his goal of winning the game. Given his bet, he was obviously confident in the category. However, even if he was 99.999% sure of getting it right, couldn't he have wagered 0 dollars and guaranteed himself a win?

    I'd understand his wager if the goal was dollar maximization but essentially all dollars were relative last night as each player was awarded a dollar amount based on where they finished in relation to other players. Given this, why did he choose such a wager?

  16. @Rick Carter - Not that I know for certain, but I'd wager (but only in even $100 increments - I am human after all) that Watson does, in fact, continue refining its answer because I did notice a few instances where Ken or Brad would ring in, and while they were answering, Watson's "guesses" would alter once or twice in that time.

    The thing I was wondering throughout, since Watson was SO quick to be able to ring in, was at what point was it fed the question: as soon as the text on screen was revealed and Alex started reading; some time in the duration of the reading; or not until Alex finished reading. I couldn't help but think that it was being fed the text for the question as soon as it was revealed and Alex started reading, which if true gave Watson an "unfair" (or at least unrealistic) advantage. A computer can consume and begin processing a text file virtually instantaneously. The humans were limited to reading and/or hearing the answer read, and only at some point a second or two later would they have consumed enough information to begin processing.

  17. I didn't quite understand that final bet either. Watson had such a large lead, there was no reason to bet anything. Also, he could have bet all of his points because the closest that the 2nd place person could come was like 32K which was still smaller than the first day total for Watson. I think he was going for brownie points in winning the round (that day's game). It may appear to be poor sportsmanship to pile it on like that, but it did provide a chance for the other players to win the round. None of this probably had anything to do with the calculations - just my anthropomorphizing my newfound hero - Watson.

  18. I am very skeptical about Watson. A few questions:
    1) We know that the question is transmitted via an interface to Watson. So my question: when is the transmission made? A nanosecond after the question is posted? So by the time humans read and understand the question, a computer already has an answer. Is that fair?
    2) How does it so accurately pick out the Daily Doubles? This leads me to think that the whole concept of Daily Doubles not being random needs to be re-evaluated.
    3) Game 1 Final Jeopardy - Watson loses and the bet was pennies while Game 2 Final Jeopardy - Watson has a huge lead, yet bets a substantial amount and wins. So my question: Is Watson placing a bet before the question is revealed like all humans or after?
    I might sound like a conspiracy theorist but I am not. Just trying to clear some confusion.

  19. Please view the replay from today's TEDtalk with members of the IBM Watson team for more information on Watson - http://www.ted.com/webcast/archive/event/ibmwatson

    Kevin
    Editor

  20. @Abeat:
    If Ken had elected to double his money, he would have had a 2-day total of $41200.

    If Watson had answered incorrectly, his 2-day total would have been $41201.

  21. all I think about when I read about this is : T2.

  22. @ Jay-Milwaukee
    1. There's a bunch of info out there about the buzzer timing, though I haven't looked at it yet. Whatever the answers, this was intended as an exhibition, so I don't see a need to be "fair", but more importantly, "fair" cuts both ways. Human players can read the clue themselves and begin their thought process as soon as it's revealed. There's a limitation on how soon players can ring in, but they can have an answer sooner. Sending the text file when Alex finishes the question might be considered unfair to Watson.

    2. Why do you think Watson was "so accurate" in picking daily doubles? Very little about Jeopardy is random, but let's imagine it is. In that case each of 3 players could be expected to get 2 of the 6 daily doubles in the two games and choose 40 of the 120 total questions.

    Now consider the progress of the game. When the 2nd round starts 1 of 3 players has already found the first DD. Let's assume the 2nd DD is found by a different player. There's now a 2 in 3 chance that the final DD will be found by a player who found a previous DD and a 1 in 3 chance it will be found by the remaining player. That means that each player finding their "fair share" of DD's is only half as likely as one player finding at least twice their "fair share".

    Now consider that the game isn't random. Unless each player is equal one of them should do better than the others. If one player is doing well and choosing more than 1/3 of the questions they're more likely to find each DD than either of their opponents. I forget what the sequence was for the two games, but Watson certainly chose more than 1/3 of the questions. We already know that the most likely outcome is that one player will find at least 2 of the DD's so it shouldn't be too surprising if it was Watson.

    So what was the probability of Watson's success at finding DD's? Ken Jennings got at least one DD, and I'm almost certain Brad Rutter got at least one, so at most Watson got 4 of the 6. Without an exact count, I'd guess that Watson chose at least 50% of the questions, so we could expect Watson to have gotten at least 3 of the 6. If Watson chose 55% of the questions we could expect him to have found 55% of daily doubles, but it's impossible to actually find 3.3 of them. In that case the result HAD to be statistically improbable, but only a little bit improbable. OTOH, if Watson chose at least 80 of the 120 questions 4 of the 6 DD's is exactly what we should have expected. In the end there was absolutely nothing unusual about the result even if it was slightly improbable.

    3. In the first game, even with a high confidence in a correct answer, a small wager for FJ guaranteed a substantial lead after the first day. That seems like a sensible strategy for any player with a big lead and so-so confidence in a correct answer.

  23. Excellent word you've done boys... I think something particular, but in general is OK...


  24. is Pascal Wager a good strategy to trick God into letting you enter heaven?
