4.3 AI Applications - Minimax Algorithm

Author: John Rieffel, Union College

To Cite:

Rieffel, John. “AI Applications - Minimax Algorithm”. PDC for Beginners, edited by CSinParallel. 2022. Available Online. https://doi.org/10.55682/AOPD4880

4.3.0 Introduction

In many ways, the progress of Artificial Intelligence research can be measured through the history of human-competitive game play. In 1994, an AI called “Chinook” defeated checkers champion Marion Tinsley. In 1997, IBM’s Deep Blue defeated Gary Kasparov in a six-game match. In 2017, AlphaGo defeated Ke Jie, the world’s top-ranked Go player, earning the rank of 9 dan.

At the heart of these AI milestones is a simple algorithm called Minimax that dates back to John von Neumann (1928), although Claude Shannon (1950) is often given credit for recognizing its applications to games like chess. Minimax is an optimal strategy for deterministic two-player zero-sum games of perfect information.

Note

Deterministic means that there are no random events like dice that determine the state of the game. Zero-sum means that there is only one winner (such that both players’ final scores add up to zero). Perfect Information means that both players can see all the pieces in play (like chess or checkers), rather than keeping some information secret (like poker).

4.3.1 Example: Tic-Tac-Toe

The game of Tic-Tac-Toe (known as Naughts and Crosses in England) involves two players, designated by their respective marks, “X” and “O”. Each player takes turns placing their mark an empty spot within a 3x3 board. “X” typically goes first. The first player to have three marks in a row horizontally, vertically or diagonally wins. A draw occurs if there are no empty spaces an no player has won.

Tic-Tac-Toe is determinstic because there is no element of chance. It has perfect information because both players can see the board at all times. And it is zero-sum because if one player wins, the other loses. We can think of winning as awarding +1 point, and losing penalizing -1 points (and so the sum of both player’s scores will always be zero). Maximizing your score is therefore the same thing as minimizing your opponent’s score.

Here’s a board, let’s say you’re player X, where would you go to win?

../_images/ex-0-x-win-draw.png

Here’s another board - where should you move to maximize your score?

../_images/ex-1-X-lose-draw.png

4.3.5 Programming Implementations

We’ll now explore the code implementations for the serial and parallel approaches.

Note

Links to source code for these examples is provided at the end of the chapter.

Serial Implementation

Here’s how we would implement the minimax function serially in C (download the full code here). The board is a 1x9 array of integers whose values correspond to player pieces (0 is empty). The for loop iterates through all possible legal moves, keeping track of the best move it has found along the way. By using the value 1 to correspond to the maximizing player, and -1 to correspond to the minimizing player allows us to negate scores when passed from a min node to a max node (and vice versa).

//minimax in a single recursive function
// you call max if it is your move
// and min if it is your opponent's move.
int minimax(int * board, int player) {
    //How is the position like for player (their turn) on board?
    int winner = win(board);   //is the board a win?
    if(winner != 0) return winner*player; //base case

    int curbestmove = -1; //the best move possible
    int curbestscore = -2;//Losing moves are preferred to no move
    int i;
    for(i = 0; i < BOARDSIZE; ++i) {//For all moves,
        if(board[i] == 0) {//If legal,
            board[i] = player;//Try the move
	//    draw(board);
 //	    getchar();
            int thisScore = -1 * minimax(board, player*-1);
            if(thisScore > curbestscore) {
                curbestscore = thisScore;
                curbestmove = i;
            }//Pick the one that's worst for the opponent
            board[i] = 0;//Reset board after try
        }
    }
    if(curbestmove == -1) return 0;
    return curbestscore;
}

OpenMP Implementation

In this OpenMP implementation (download here) we distribute subtrees of legal moves cyclicly across all threads. A complication of this is that we need to make a private copy of the 9x9 board for each thread. Dynamic scheduling allows the first idle thread to pick up the next legal move. We need a critical section at the point where the threads compare their current move against the global best move.

void computerMove(int * board, int nthreads) {
    int bestmove = -1;
    int score = -2;
    int i;
    //printf("computer move:\n");
    //draw(board);
    #pragma omp parallel num_threads(nthreads) 
    {
        int *privateboard = malloc(9*sizeof(int));
        memcpy((void *)privateboard,(void *)board,9*sizeof(int));

        #pragma omp for schedule(dynamic,1)
        for(i = 0; i < BOARDSIZE; ++i) {
            if(privateboard[i] == 0) {
                privateboard[i] = 1;
                int tempScore = -minimax(privateboard, -1);
                privateboard[i] = 0;
                //critical to protect private variables
                #pragma omp critical
                if(tempScore > score) {
                     score = tempScore;
                     bestmove = i;
                }
            }
        }
    }
    //returns a score based on minimax tree at a given node.
    board[bestmove] = 1;
}

MPI Implementation

Our MPI implementation is very similar, although like all distributed memory applications we don’t need to worry about shared variables or critical sections.

Specifically, to calculate the computer’s next move we broadcast the current state of the board to all worker nodes. If there are p nodes, then each node uses a for loop to iterate through every pth legal move, and calls minimax on the corresponding subtree.

void computerMove(int * board, int rank, int p) {
    int move = -1;
    int score = -2;
    MPI_Bcast(board,BOARDSIZE,MPI_INT,0,MPI_COMM_WORLD);
    for(int i = rank; i < BOARDSIZE; i += p) {
        if(board[i] == 0) {
            board[i] = 1;
            int tempScore = -minimax(board, -1);
            board[i] = 0;
            if(tempScore > score) {
                score = tempScore;
                move = i;
            }
        }
    }
    int local_best[2] = {score, move};
    int global_best[2];
    MPI_Reduce(local_best,global_best,1,MPI_2INT,MPI_MAXLOC,0,MPI_COMM_WORLD);
    if (rank == 0) {
        board[global_best[1]] = 1;
    }
}

4.3.6 Further exploration

Code Links:

You have attempted of activities on this page