Reimplementing Blondie24

by Justin Skycak

Extending Fogel's tic-tac-toe player to the game of checkers.

This post is a chapter in the book Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. Suggested citation: Skycak, J. (2022). Reimplementing Blondie24. Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. https://justinmath.com/reimplementing-blondie24/


Fogel and Chellapilla’s Blondie24 was published over the course of two papers. Here we shall address the first paper, Evolving Neural Networks to Play Checkers without Relying on Expert Knowledge, published in 1999.

This first version of Blondie24 operated on principles similar to those of Fogel’s tic-tac-toe player, described previously. However, there are a number of important differences, which are detailed below.

Neural Network Architecture

The neural net consists of the following layers:

  • Input Layer: $32$ linearly-activated nodes and $1$ bias node (a checkers board has $64$ squares, but only the $32$ dark squares are playable)
  • First Hidden Layer: $40$ tanh-activated nodes and $1$ bias node
  • Second Hidden Layer: $10$ tanh-activated nodes and $1$ bias node
  • Output Layer: $1$ tanh-activated node

Additionally, there is a special node called a piece difference node whose activity is the sum of the activities of the $32$ input nodes. The piece difference node connects directly to the output node, bypassing the hidden layers. The connection, of course, has a variable weight that is learned by the network.

In total, there are $1742$ weights (including the weight of the piece difference node).
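To make the architecture concrete, here is a minimal sketch of the forward pass using NumPy. The function and parameter names (forward, W1, W2, W3, w_piece_diff) are illustrative choices, not taken from the original paper.

```python
import numpy as np

def forward(board_input, W1, W2, W3, w_piece_diff):
    # board_input: length-32 vector of board values (1, 0, -1, K, -K)
    # W1: 40x33 weights (32 inputs + bias), W2: 10x41, W3: 1x11
    x = np.append(board_input, 1)        # input layer plus bias node
    h1 = np.tanh(W1 @ x)                 # first hidden layer (40 tanh nodes)
    h1 = np.append(h1, 1)                # bias node
    h2 = np.tanh(W2 @ h1)                # second hidden layer (10 tanh nodes)
    h2 = np.append(h2, 1)                # bias node
    piece_diff = np.sum(board_input)     # piece difference node
    # output node combines the second hidden layer and the piece difference node
    return np.tanh(W3 @ h2 + w_piece_diff * piece_diff)[0]

# Weight count: 33*40 + 41*10 + 11*1 + 1 = 1320 + 410 + 11 + 1 = 1742
```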


Converting Board to Input

This is similar to tic-tac-toe in that the player’s own regular pieces are labeled with $1,$ empty squares with $0,$ and opponent regular pieces with $-1.$ However, the player’s own king pieces are labeled with $K,$ and the opponent’s with $-K,$ where $K$ is a variable that is learned by the network.
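As a sketch, assuming the board is represented as a list of $32$ entries (one per playable square) labeled with string codes from the perspective of the player to move, the conversion might look like the following. The representation and names here are assumptions, not the paper's.

```python
def board_to_input(board, K):
    # Hypothetical board representation: a list of 32 piece codes,
    # one per playable square, from the perspective of the player to move.
    codes = {'my_piece': 1, 'empty': 0, 'opp_piece': -1,
             'my_king': K, 'opp_king': -K}
    return [codes[square] for square in board]
```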


Converting Output to Action

An action is chosen via the minimax algorithm using the following heuristic evaluation function. As the network learns, this heuristic evaluation function will become more accurate.

  1. If a board state is a win or a loss, return $1$ or $-1$ respectively.
  2. Otherwise, pass the board state as input to the neural network and return the activity of the output node.

The search depth is set to $d=4$ to allow for reasonable execution times.
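The following is a minimal sketch of this scheme. It assumes each network is stored as a dict of its weights and $K$ value (as in the initialization sketch in the Evolution Procedure section below), that boards are encoded from the perspective of the network making the move, and that the checkers helpers is_win, is_loss, legal_moves, and apply_move are provided by your own implementation.

```python
def evaluate(board, network):
    # Heuristic evaluation: exact values at wins/losses, network output otherwise.
    if is_win(board):
        return 1
    if is_loss(board):
        return -1
    return forward(board_to_input(board, network['K']), network['W1'],
                   network['W2'], network['W3'], network['w_piece_diff'])

def minimax(board, network, depth, maximizing):
    if depth == 0 or is_win(board) or is_loss(board):
        return evaluate(board, network)
    values = [minimax(apply_move(board, move), network, depth - 1, not maximizing)
              for move in legal_moves(board, maximizing)]
    return max(values) if maximizing else min(values)

def choose_action(board, network, depth=4):
    # Pick the move whose resulting board has the highest minimax value.
    return max(legal_moves(board, True),
               key=lambda move: minimax(apply_move(board, move), network, depth - 1, False))
```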

Evolution Procedure

The initial population consists of $15$ networks with initial weights randomly chosen from the range $[-0.2, 0.2],$ mutation rates set to $\alpha = 0.05,$ and $K=2.$ Each network is initialized with the same number of nodes (as described earlier), which remains constant throughout the course of evolution (i.e. nodes are neither added nor deleted).
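A minimal sketch of this initialization, using the weight layout from the architecture sketch above (the dictionary representation is an illustrative choice):

```python
import numpy as np

def initial_network():
    return {
        'W1': np.random.uniform(-0.2, 0.2, size=(40, 33)),   # input (+bias) -> hidden 1
        'W2': np.random.uniform(-0.2, 0.2, size=(10, 41)),   # hidden 1 (+bias) -> hidden 2
        'W3': np.random.uniform(-0.2, 0.2, size=(1, 11)),    # hidden 2 (+bias) -> output
        'w_piece_diff': np.random.uniform(-0.2, 0.2),        # piece difference connection
        'alpha': 0.05,                                        # mutation rate
        'K': 2.0,                                             # king value
    }

population = [initial_network() for _ in range(15)]
```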


Replication

The evolution procedure follows the same rules as those described previously when evolving neural network regressors.

However, in addition to updating the mutation rate and weights, $K$ is also updated through the following rule:

$\begin{align*} K^\text{child} = K^\text{parent} e^{ N(0,1) / \sqrt{2} } \end{align*}$


Note that $K$ is constrained to the range $[1,3],$ meaning that

  • if $K$ falls below $1$ then it is immediately set to $1,$ and
  • if $K$ rises above $3$ then it is immediately set to $3.$
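A minimal sketch of this $K$ update, with the clamp applied immediately after the mutation (the function name is illustrative):

```python
import numpy as np

def mutate_K(parent_K):
    child_K = parent_K * np.exp(np.random.normal(0, 1) / np.sqrt(2))
    return min(max(child_K, 1), 3)   # constrain K to the range [1, 3]
```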


Evaluation

Each of the $30$ networks in a generation plays a game of checkers against $5$ other networks randomly selected (with replacement) from the generation. The network is allowed to move first during each game, and it receives a payoff of $1$ for a win, $0$ for a tie, and $-2$ for a loss. (A tie is declared after $100$ moves by each player with no winner.)

The $15$ networks with the highest total payoffs are selected as the parents of the next generation.
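The following is a minimal sketch of these evaluation and selection steps. The function play_game is an assumption: it is taken to play one game with its first argument moving first and to return $1,$ $0,$ or $-1$ from that player's perspective.

```python
import random

def evaluate_generation(population):
    payoffs = {id(net): 0 for net in population}
    for net in population:
        # 5 opponents chosen randomly with replacement from the rest of the generation
        opponents = random.choices([n for n in population if n is not net], k=5)
        for opponent in opponents:
            result = play_game(net, opponent)               # net moves first
            payoffs[id(net)] += {1: 1, 0: 0, -1: -2}[result]
    return payoffs

def select_parents(population, payoffs):
    # The 15 networks with the highest total payoffs become the next parents.
    return sorted(population, key=lambda net: payoffs[id(net)], reverse=True)[:15]
```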

Performance Curve

In their paper, Fogel and Chellapilla did not create a curve to demonstrate performance as a function of the number of generations. Instead, they played their final network against human players online and demonstrated that it achieved an impressive performance rating.

Here, we will create a performance curve by playing the evolving networks against an external algorithmic strategy and measuring their performance. This can be accomplished as follows:

  1. Develop a heuristic checkers player by hand that plays slightly intelligently. It should capitalize on obvious opportunities to move its pieces forward and jump opponent pieces, but it should not attempt to plan into the future.
  2. During each generation of the evolutionary procedure, before replication, play each of the $15$ parent networks against your heuristic player and compute the average payoff.
  3. Keep evolving new generations until the average payoff levels off.

The resulting plot should show that the average payoff increases with the number of generations (up to some point), demonstrating that the evolving networks are learning to play checkers intelligently.
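Putting the pieces together, the full loop might look like the following sketch. It reuses the initialization, evaluation, and selection sketches from earlier; mutate (the full replication rule, including the $K$ update), play_game, and heuristic_player are assumed to be implemented separately, and num_generations is a parameter you choose.

```python
import matplotlib.pyplot as plt

num_generations = 100                    # illustrative; run until the payoff levels off
parents = [initial_network() for _ in range(15)]
average_payoffs = []

for generation in range(num_generations):
    # Replication: each parent produces one mutated offspring, giving 30 networks.
    population = parents + [mutate(parent) for parent in parents]
    payoffs = evaluate_generation(population)
    parents = select_parents(population, payoffs)
    # Before the next replication, measure the new parents against the heuristic player.
    average_payoffs.append(
        sum({1: 1, 0: 0, -1: -2}[play_game(net, heuristic_player)] for net in parents)
        / len(parents)
    )

plt.plot(average_payoffs)
plt.xlabel('generation')
plt.ylabel('average payoff against heuristic player')
plt.show()
```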

Keep in mind that your hand-crafted heuristic checkers player is not actually used during the evolution procedure – it is only used to measure how the evolved networks perform against an external opponent. So, anything that the evolving networks “learn” is organic and self-taught, not tailored to the specifics of your hand-crafted player.

