Using Probabilistic Knowledge And Simulation To Play Poker (Darse Billings).pdf

Using Probabilistic Knowledge and Simulation to Play Poker

Darse Billings, Lourdes Peña, Jonathan Schaeffer, Duane Szafron

Department of Computing Science, University of Alberta

Edmonton, Alberta Canada T6G 2H1

{darse, pena, jonathan,

duane}@cs.ualberta.ca

Abstract

Until recently, artificial intelligence researchers who use games

as their experimental testbed have concentrated on games of

perfect information. Many of these games have been amenable to

brute-force search techniques. In contrast, games of imperfect

information, such as bridge and poker, contain hidden information

making similar search techniques impractical. This paper

describes recent progress in developing a high-performance poker-

playing program. The advances come in two forms. First, we

introduce a new betting strategy that returns a probabilistic betting

decision, a probability triple, that gives the likelihood of a fold,

call or raise occurring in a given situation. This component unifies

all the expert knowledge used in the program, does a better job of

representing the type of decision making needed to play strong

poker, and improves the way information is propagated

throughout the program. Second, real-time simulations are used to

compute the expected values of betting decisions. The program

generates an instance of the missing data, subject to any

constraints that have been learned, and then simulates the rest of

the game to determine a numerical result. By repeating this a

sufficient number of times, a statistically meaningful sample is

used in the program’s decision–making process. Experimental

results show that these enhancements each represent major

advances in the strength of computer poker programs.

search is not successful since it is often impractical to

search the game trees that result from all possible

instances of the missing information.

Two examples of imperfect information games are

bridge and poker. Recently, at least two research groups

have made an effort to achieve high-performance bridge-

playing programs [Ginsberg, 1999; Smith et al ., 1998].

The progress has been impressive, and we may not have

to wait long for a world-championship caliber program.

Until now, the computing community has largely

ignored poker (a recent exception being [Koller and

Pfeffer, 1997]). However, poker has several attributes that

make it an interesting and challenging domain for

mainstream AI research [Billings et. al ., 1998a].

We are attempting to build a program that is capable of

beating the best human poker players. We have chosen to

study the game of Texas Hold’em, the poker variation

used to determine the world champion in the annual

World Series of Poker. Hold’em is considered the most

strategically complex poker variant that is widely played.

Our program, Loki, is a reasonably strong player (as

judged by its success playing on the Internet) [Billings et.

al ., 1998a; 1998b]. The current limitation in the

program’s play is its betting strategy - deciding when to

fold, call/check, or raise/bet. A betting strategy attempts

to determine which betting action will maximize the

expected winnings (or minimize the losses) for a hand.

The previous version of Loki used several expert-

knowledge evaluation functions to make betting

decisions. These routines were rigid in the sense that they

always returned a single value: the “best” betting

decision. Although these evaluation functions allowed

Loki to play better than average poker, it was inadequate

to play at a world-class level, since continually upgrading

this knowledge is difficult and error-prone.

This paper introduces two major advances in the

capabilities of computer-poker-playing programs. Each is

shown experimentally to result in substantial

improvements in Loki’s play.

First, this paper introduces a new betting strategy that

returns a probability triple as the knowledge

representation of the evaluation function. The routine

returns three probabilities (one each for fold, call/check,

and raise/bet). The program can then randomly select the

betting decision in accordance with the probability triples.

Representing decisions as a probability distribution better

captures the type of information needed to perform well

in a noisy environment, where randomized strategies and

misinformation are important aspects of strong play. This

1. Introduction

Past research efforts in computer game-playing have

concentrated on building high-performance chess

programs. With the Deep Blue victory over World Chess

Champion Garry Kasparov, a milestone has been

achieved but, more importantly, the artificial intelligence

community has been liberated from the chess “problem”.

The consequence is that in recent years a number of

interesting games have attracted the attention of AI

researchers, where the research results promise a wider

range of applicability than has been seen for chess.

Computer success has been achieved in deterministic

perfect information games, like chess, checkers and

Othello, largely due to so-called brute-force search. The

correlation of search speed to program performance gave

an easy recipe to program success: build a faster search

engine. The Deep Blue team took this to an extreme,

analyzing roughly 250 million chess positions per second.

In contrast, until recently imperfect information games

have attracted little attention in the literature. In these

games, no player knows the complete state and each

player has to infer the missing information to maximize

the chances of success. For these games, brute-force

component also allows us to unify the expert knowledge

in a poker program, since the same component can be

used for betting decisions, opponent modeling, and

interpreting opponent actions.

Second, Loki now bases its betting strategy on a

simulation-based approach; we call it selective sampling .

It simulates the outcome of each hand, by generating

opponent hands from the sample space of all appropriate

possibilities, trying each betting alternative (call/check,

bet/raise) to find the one that produces the highest

expected winnings. A good definition of appropriate

hands is one of the key concepts in defining selective

sampling and it is one of the main topics of this paper. As

with brute-force search in chess, the simulation (search)

implicitly uncovers information that improves the quality

of a decision. With selective sampling, the knowledge

applied to a simulation quantifies the value of each

choice, improving the chance of making a good decision.

Simulation-based approaches have been used in other

games, such as backgammon [Tesauro, 1995], bridge

[Ginsberg, 1999], and Scrabble 1 [Sheppard, 1998]. The

simulation methods presented in this paper are quite

similar to those used by Ginsberg in Gib, although there

are several distinctions in the details, due to differences in

the games.

For deterministic perfect information games, there is a

well-known framework for constructing these

applications (based on the alpha-beta algorithm). For

games with imperfect information, no such framework

exists. For handling this broader scope of games we

propose that selective sampling simulation be such a

framework.

to act, one of three betting options is available: fold,

call/check, or raise/bet. There is normally a maximum of

three raises allowed per betting round. The betting option

rotates clockwise until each player has matched the

current bet or folded. If there is only one player remaining

(all others having folded) that player is the winner and is

awarded the pot without having to reveal their cards.

3. Building a Poker Program

A minimal set of requirements for a strong poker-

playing program includes assessing hand strength and

potential, betting strategy, bluffing, unpredictability and

opponent modeling. Descriptions of these as they are

implemented in our program, Loki, can be found in

[Billings et. al ., 1998a; 1998b]. There are several other

identifiable characteristics that may not be necessary to

play reasonably strong poker, but may eventually be

required for world-class play.

The architecture of the previous version of Loki, which

we now call Loki-1, is shown in Figure 1. In the diagram,

rectangles are major components, rounded rectangles are

major data structures, and ovals are actions. The data

follows the arrows between components. An annotated

arrow indicates how many times data moves between the

components for each of our betting actions.

To make a betting decision, the Bettor calls the Hand

Evaluator to obtain an assessment of the strength of the

current cards. The Bettor uses this hand strength, the

public game state data and expert-defined betting

knowledge to generate an action (bet, call or raise). To

evaluate a hand, the Hand Evaluator enumerates over all

possible opponent hands and counts how many of them

would win, lose or tie the given hand. After the flop, the

probability for each possible opponent hand is different.

For example, the probability that hole cards of Ace-Ace

are held after the flop is much higher than 7-2, since most

players will fold 7-2. Each possible hand has a weight in

the Weight Table for each opponent, and these weights

are modified after each opponent action. Updating the

probabilities for all hands is a process called re-weighting .

After each opponent action, the Opponent Modeler calls

the Hand Evaluator once for each possible hand and

increases or decreases the weight for that case to be

consistent with the new information. The Hand Evaluator

uses the Weight Table values to bias the calculation,

giving greater weight to the more likely hands. The

absolute values of the probabilities are of little

consequence, since only the relative weights affect the

later calculations.

Loki-1 uses expert knowledge in four places:

1. Pre-computed tables of expected income rates are

used to evaluate its hand before the pre-flop, and to

assign initial weight probabilities for the various

possible opponent hands.

2. The Opponent Modeler applies re-weighting rules to

modify the opponent hand weights based on the

previous weights, new cards on the board, opponent

betting actions, and other contextual information.

2. Texas Hold’em

A hand of Texas Hold’em begins with the pre-flop ,

where each player is dealt two hole cards face down,

followed by the first round of betting. Three community

cards are then dealt face up on the table, called the flop ,

and the second round of betting occurs. On the turn , a

fourth community card is dealt face up and another round

of betting ensues. Finally, on the river , a fifth community

card is dealt face up and the final round of betting occurs.

All players still in the game turn over their two hidden

cards for the showdown . The best five card poker hand

formed from the two hole cards and the five community

cards wins the pot. If a tie occurs, the pot is split.

Typically, Texas Hold’em is played with 8 to 10 players.

Limit Texas Hold’em has a structured betting system,

where the order and amount of betting is strictly

controlled in each betting round. 2 There are two

denominations of bets, called the small bet and the big bet

($10 and $20 in this paper). In the first two betting

rounds, all bets and raises are $10, while in the last two

rounds they are $20. In general, when it is a player’s turn

1 ™ Milton-Bradley company.

2 In No-limit Texas Hold'em, there are no restrictions on the size of

bets.

Opponent

Model

1081

Opponent Modeler

Each time it is Loki-2’s turn to bet, the Action Selector

uses a single probability triple to decide what action to

take (note that the Bettor is gone). For example, if the

triple [0.0,0.8,0.2] is given, then the Action Selector

would call 80% of the time, raise 20% of the time, and

never fold. The choice can be made by generating a

random number, allowing the program to vary its play,

even in identical situations. This is analagous to a mixed

strategy in game theory, but the probablility triple

implicitly contains contextual information resulting in

better informed decisions which, on average, can out-

perform a game theoretic approach.

The Triple Generator is responsible for generating

probability triples. As shown in Figure 2, this routine is

now at the heart of Loki-2. The Triple Generator takes a

two-card hand and calls the Hand Evaluator to evaluate

the cards in the current context. It uses the resulting hand

value, the current game state, and expert-defined betting

rules to compute the triple. Note that in addition to using

the Triple Generator to produce a triple for our known

hand, it can also be used to assess the likely behavior of

the opponent holding any possible hand.

For the Hand Evaluator to assess a hand, it compares

that hand against all possible opponent holdings. To do

this, it uses the opponent Weight Table. In Loki-2, the

Opponent Modeler now uses probability triples to update

this table after each opponent action. To accomplish this,

the Triple Generator is called for each possible two-card

hand. It then multiplies each weight in the Weight Table

by the entry in the probability triple that corresponds to

the opponent’s action. For example, suppose the previous

weight for Ace-Ace is 0.7 (meaning that if it has been

dealt, there is a 70% chance the opponent would have

played it in exactly the manner observed so far), and the

opponent now calls. If the probability triple for the current

context is [0.0, 0.2, 0.8], then the updated weight for this

case would be 0.7 x 0.2 = 0.14. The relative likelihood of

the opponent holding Ace-Ace has decreased to 14%

because there was no raise. The call value of 0.2 reflects

the possibility that this particular opponent might

deliberately try to mislead us by calling instead of raising.

Using a probability distribution allows us to account for

uncertainty in our beliefs, which was not handled by the

previous architecture. This process of updating the weight

table is repeated for each entry.

An important advantage of the probability triple

representation is that imperfect information is restricted to

the Triple Generator and does not affect the rest of the

algorithm. This is similar to the way that alpha-beta

search restricts knowledge to the evaluation function. The

probability triple framework allows the “messy” elements

of the program to be amalgamated into one component,

which can then be treated as a “black box” by the rest of

the system. Thus, aspects like game-specific information,

complex expert-defined rule systems, and knowledge of

human behavior are all isolated from the engine that uses

this input.

weight

table

70%

KK 65%

1081

Public

Game

State

. . .

Hand Evaluator

1081

entries

Hand

Betting

Rule-base

Bettor

Public

Data

Figure 1. The architecture of Loki-1.

3. The Hand Evaluator uses enumeration techniques to

compute hand strength and hand potential for any hand

based on the game state and the opponent model.

4. The Bettor uses a set of expert-defined rules and a

hand assessment provided by the Hand Evaluator for

each betting decision: fold, call/check or raise/bet.

This design has several limitations. First, expert

knowledge appears in various places in the program,

making Loki difficult to maintain and improve. Second,

the Bettor returns a single value (fold, call, raise), which

does not reflect the probabilistic nature of betting

decisions. Finally, the opponent modeler does not

distinguish between the different actions that an opponent

might take. A call/check versus a bet/raise gives valuable

information about the opponent’s cards. These issues led

to a redesign of how knowledge is used in Loki.

The new version of Loki, called Loki-2 makes two

fundamental changes to the architecture. First, it

introduces a useful, new data object called a probability

triple that is used throughout the program (Section 4).

Second, simulation with selective sampling is used to

refine the betting strategy (Section 5). Loki-2 can be used

with or without simulation, as shown in Figure 2. With

simulation, the Simulator component replaces the simpler

Action Selector.

fold, call

or raise

4. Probability Triples

A probability triple is an ordered triple of values, PT =

[ f , c , r ], such that f + c + r = 1.0, representing the

probability distribution that the next betting action in a

given context is a fold, call, or raise, respectively.

Probability triples are used in three places in Loki-2. The

Action Selector uses a probability triple to decide on a

course of action (fold, call, raise). The Opponent Modeler

uses an array of probability triples to update the opponent

weight tables. The Simulator (see Section 5) uses

probability triples to choose actions for simulated

opponent hands.

Private

Data

Opponent

Model

1081

O pponent Modeler

w eight table

70%

K K 65%

.2 .8

.3 .7

1081

Public

G ame

State

. . .

Triple Generator

1081 entries

H and Evaluator

H and

A ction Selector

Simulator

Betting

Rule-base

fold, call

or raise

Figure 2. Using the Triple Generator in Loki-2.

The current architecture also suggests future

enhancements, such as better methods for opponent

modeling. For example, the cards seen at the showdown

reveal clues about how that opponent perceived each

decision during the hand. These hindsight observations can

be used to adaptively measure important characteristics

like aggressiveness, predictability, affinity for draws, and

so forth. The Opponent Modeler can maintain each of these

properties for use by the Triple Generator, which combines

the information in proper balance with all the other factors.

The knowledge is implicitly encoded in the probability

distribution, and is thereby passed on to all components of

the system.

Since the more objective aspects of the game could

eventually be well defined, the ultimate strength of the

program may depend on the success in handling imperfect

information, and the more nebulous aspects of the game,

such as opponent modeling.

permit), the same approach is not feasible for real

imperfect information games because there are too many

possibilities to consider [Koller and Pfeffer, 1997].

Consider a 10-player game of Texas Hold’em. By the

time the flop cards are seen , some players may have folded.

Let’s assume one player bets, and it is Loki’s turn to act.

The program must choose between folding, calling or

raising. Which one is the best decision? 1

After the program’s decision, every other active player

will be faced with a similar choice. In effect, there is a

branching factor of 3 possible actions for each player, and

there may be several such decisions in each betting round.

Further, there are still two betting rounds to come, each of

which may involve several players, and one of many (45 or

44) unknown cards. Computing the complete poker

decision tree in real time is in general, prohibitively

expensive. Since we cannot consider all possible

combinations of hands, future cards, and actions, we

examine only a representative sample from the

possibilities. A larger sample and more informed selection

process will increase the probability that we can draw

meaningful conclusions.

5. Simulation-Based Betting Strategy

The original Bettor component consisted of expert-

defined rules, based on hand strength, hand potential, game

conditions, and probabilities. A professional poker player

defined the system as a first approximation of the return on

investment for each betting decision. As other aspects of

Loki improved, this simplistic betting strategy became the

limiting factor to the playing strength of the program.

Unfortunately, any rule-based system is inherently rigid,

and even simple changes were difficult to implement and

verify for correctness. A more flexible, computation-based

approach was needed.

In effect, a knowledge-based betting strategy is

equivalent to a static evaluation function. Given the current

state of the game, it attempts to determine the action that

yields the best result. If we use deterministic perfect

information games as a model, the obvious extension is to

add search to the evaluation function. While this is easy to

achieve in a perfect-information game such as chess

(consider all possible moves as deeply as resources

5.1 An Expected Value Based Betting Strategy

Loki-2’s new betting strategy consists of playing out

many likely scenarios to determine how much money each

decision will win or lose. Every time it faces a decision,

Loki-2 invokes the Simulator to get an estimate of the

expected value (EV) of each betting action (see the dashed

box in Figure 2 with the Simulator replacing the Action

Selector). A simulation consists of playing out the hand a

specified number of times, from the current state of the

game through to the end. Folding is considered to have a

zero EV, because we do not make any future profit or loss.

Each trial is played out twice—once to consider the

consequences of a check/call and once to consider a

1 “Best” is subjective. Here we do not consider other plays, such as

deliberately misrepresenting the hand to the opponents. The expected

value for a whole session is more important than the expected value for a

single hand.

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: