6.1 AI Applications - Minimax Algorithm

Author: John Rieffel, Union College

To Cite:

Rieffel, John. “AI Applications - Minimax Algorithm”. PDC for Beginners, edited by CSinParallel. 2022. Available Online. https://doi.org/10.55682/AOPD4880

6.1.0 Introduction

In many ways, the progress of Artificial Intelligence research can be measured through the history of human-competitive game play. In 1994, an AI called “Chinook” defeated checkers champion Marion Tinsley. In 1997, IBM’s Deep Blue defeated Gary Kasparov in a six-game match. In 2017, AlphaGo defeated Ke Jie, the world’s top-ranked Go player, earning the rank of 9 dan.

At the heart of these AI milestones is a simple algorithm called Minimax that dates back to John von Neumann (1928), although Claude Shannon (1950) is often given credit for recognizing its applications to games like chess. Minimax is an optimal strategy for deterministic two-player zero-sum games of perfect information.

Note

Deterministic means that there are no random events like dice that determine the state of the game. Zero-sum means that there is only one winner (such that both players’ final scores add up to zero). Perfect Information means that both players can see all the pieces in play (like chess or checkers), rather than keeping some information secret (like poker).

6.1.1 Example: Tic-Tac-Toe

The game of Tic-Tac-Toe (known as Naughts and Crosses in England) involves two players, designated by their respective marks, “X” and “O”. Each player takes turns placing their mark an empty spot within a 3x3 board. “X” typically goes first. The first player to have three marks in a row horizontally, vertically or diagonally wins. A draw occurs if there are no empty spaces an no player has won.

Tic-Tac-Toe is determinstic because there is no element of chance. It has perfect information because both players can see the board at all times. And it is zero-sum because if one player wins, the other loses. We can think of winning as awarding +1 point, and losing penalizing -1 points (and so the sum of both player’s scores will always be zero). Maximizing your score is therefore the same thing as minimizing your opponent’s score.

Here’s a board, let’s say you’re player X, where would you go to win?

../_images/ex-0-x-win-draw.png

Here’s another board - where should you move to maximize your score?

../_images/ex-1-X-lose-draw.png

6.1.5 Programming Implementations

We’ll now explore the code implementations for the serial and parallel approaches.

In both cases, the central recursive minmax function that is run for a starting legal board move configuration down to its leaves is as follows:

//minimax in a single recursive function
// you call max if it is your move
// and min if it is your opponent's move.
int minimax(int * board, int player) {
    //How is the position like for player (their turn) on board?
    int winner = win(board);   //is the board a win?
    if(winner != 0) return winner*player; //base case

    int curbestmove = -1; //the best move possible
    int curbestscore = -2;//Losing moves are preferred to no move
    int i;
    for(i = 0; i < BOARDSIZE; ++i) {//For all moves,
        if(board[i] == 0) {//If legal,
            board[i] = player;//Try the move
	//    draw(board);   // debug
            int thisScore = -1 * minimax(board, player*-1);
            if(thisScore > curbestscore) {
                curbestscore = thisScore;
                curbestmove = i;
            }//Pick the one that's worst for the opponent
            board[i] = 0;//Reset board after try
        }
    }
    if(curbestmove == -1) return 0;
    return curbestscore;
}

The board is a 1x9 array of integers whose values correspond to player pieces (0 is empty). The for loop iterates through all possible legal moves, keeping track of the best move it has found along the way. By using the value 1 to correspond to the maximizing player, and -1 to correspond to the minimizing player allows us to negate scores when passed from a min node to a max node (and vice versa).

Serial Implementation

Here’s how we would implement the play that the computer does to calculate its best move from all the subtrees:

void computerMove(int * board) {
    int move = -1;
    int score = -2;
    int i;
    // This loop is the easiest part to parallelize
    for(i = 0; i < BOARDSIZE; ++i) {
        if(board[i] == 0) {
            board[i] = 1;
            int tempScore = -minimax(board, -1);
            board[i] = 0;
            if(tempScore > score) {
                score = tempScore;
                move = i;
            }
        }
    }
    //returns a score based on minimax tree at a given node.
    board[move] = 1;
}

OpenMP Implementation

In the OpenMP implementation we distribute subtrees of legal moves cyclicly across all threads. A complication of this is that we need to make a private copy of the 9x9 board for each thread. Dynamic scheduling allows the first idle thread to pick up the next legal move. We need a critical section at the point where the threads compare their current move against the global best move.

void computerMove(int * board, int nthreads) {
    int bestmove = -1;
    int score = -2;
    int i;
    //printf("computer move:\n");
    //draw(board);
    #pragma omp parallel num_threads(nthreads) 
    {
        int *privateboard = malloc(9*sizeof(int));
        memcpy((void *)privateboard,(void *)board,9*sizeof(int));

        #pragma omp for schedule(dynamic,1)
        for(i = 0; i < BOARDSIZE; ++i) {
            if(privateboard[i] == 0) {
                privateboard[i] = 1;
                int tempScore = -minimax(privateboard, -1);
                privateboard[i] = 0;
                //critical to protect shared variables
                #pragma omp critical
                if(tempScore > score) {
                     score = tempScore;
                     bestmove = i;
                }
            }
        }
    }
    //returns a score based on minimax tree at a given node.
    board[bestmove] = 1;
}

MPI Implementation

An MPI implementation is very similar, although like all distributed memory applications we don’t need to worry about shared variables or critical sections.

Specifically, to calculate the computer’s next move we broadcast the current state of the board to all worker nodes. If there are p nodes, then each node uses a for loop to iterate through every pth legal move, and calls minimax on the corresponding subtree.

void computerMove(int * board, int rank, int p) {
    int move = -1;
    int score = -2;
    MPI_Bcast(board,BOARDSIZE,MPI_INT,0,MPI_COMM_WORLD);
    for(int i = rank; i < BOARDSIZE; i += p) {
        if(board[i] == 0) {
            board[i] = 1;
            int tempScore = -minimax(board, -1);
            board[i] = 0;
            if(tempScore > score) {
                score = tempScore;
                move = i;
            }
        }
    }
    int local_best[2] = {score, move};
    int global_best[2];
    MPI_Reduce(local_best,global_best,1,MPI_2INT,MPI_MAXLOC,0,MPI_COMM_WORLD);
    if (rank == 0) {
        board[global_best[1]] = 1;
    }
}

Exercise for a challenge

  1. If you’d like a challenge, develop a complete MPI version from the above hint for what the computerMove function would be like.

6.1.6 Further exploration

Download the code files from the links below or from a GitHub repo for the CSInParallel Project. The code is inside a folder called BeginnerCommonApplications/minimax.

To see this code in action, you will need to get your own copy and run it on a terminal, because in the game you play the computer, seeing each new move that counters yours.

You have attempted of activities on this page