Starting To Play

In this chapter we’ll start work on the code that will run a game of Noughts and Crosses.

Types Of Player

The ultimate goal is to create a framework that makes it easy to run MENACE with a variety of different player types.

Eventually we’ll implement several types of players as APL functions.

We’ll also write a game_runner operator which will run a game with any combination of player type.

  1. A random_player will pick a legal move at random.

  2. A menace_player will learn by using MENACE’s strategy to chose which move to make.

  3. A human_player will enable a human to play the game.

  4. A perfect_player will make the best possible moves throughout the game.

In this chapter we will implement the random_player.

I don’t yet know a way to capture user input from a program running in an APL jupyter notebook, but it’s easy to do when running a program using Dyalog’s RIDE interface. Appendix B (currently under construction) will describe how to run a version of MENACE that also supports a human_player.

Each of these players will be implemented as an APL function which will take the current MENACE configuration as an argument, make a move and return an updated configuration as its result.

The Game Runner Code

We’ll plug the players into a game_runnner which we will implement as a defined operator.

Recall that in APL, a function returns data but an operator returns a function. Our game_runner will take a pair of player functions as arguments and return a function that will play an entire game.

That function will take an argument that captures the MENACE configuration and return an updated configuration as its result.

It’s possible to use a nested array to hold the configuraton data but that would requre the code to know positional information. In software we can dientify data by position or by name. In this case, using meaningful names will help readability. In APL we can use a new type of data structure called a *namespace’.

Namespaces in Dyalog APL

One use of namespaces is to represent hierarchical data structures.

We’ll create a two-level structure to represent a configuration. The top level will contain data like anticipated postions and their bead counts. It will contain a nested structure that has information about the game in progress.

Retaining data across multiple sessions

In this chapter we will assume that all the games are played in a single session. In the next chapter we will implement a simple way of retaining that data so that MENACE can learn across multiple sessions.

The MENACE configuration

The MENACE configuration will contain the information needed by the players and the game_runner.

Throughout a game all player types need to know the configuration for the current board. If we track the moves actually made during a game, the current board is simply the last of the moves in the current game, so the game history contains it.

Eventually we may want to examine MENACE’s performance over time. For that reason the configuration will maintain a history of wins, draws and losses. The game_runner will eventually need to maintain that information but the player functions will ignore it.

The original version of MENACE chose its moves and tracked their value by using beads in a matchbox. If a matchbox was used during a game its contents were adjusted when the game was over.

A player that applies MENACE’s strategy needs to access and update that information so we will eventually need to include it in the MENACE configuration. It’s not needed by a random_player so we will omit it for now.

In summary, then, we will start with a MENACE Namespace that contains two sorts of data:

  1. A history of the current game, held as a matrix of position vectors, one row per move.

  2. A vector of MENACE’s game results. A win is represented by 1, a draw by 0 and a loss by ¯1.

We’ll start with the usual dance :)

)copy notebook5
done
./notebook5.dws saved Wed Dec 30 11:42:23 2020
⎕io  0 

Now we can create the initial configuration as a Namespce. We do that by using the system function or quad-name ⎕ns.

The first element contains the current board position. Initially that should be an empty board. The second element contains the results of all games played, which should initially be set to be an empty vector.

config  ⎕ns ''
config.game  1 90
config.results  

Playing The Game

Now we can start work on the game_runner.

The game_runner will be something we haven’t met before: a defined operator or dop.

Recall that in APL a function returns data but an operator returns a function.

We’ve encountered two types of APL function: primitive functions like +, , which are built into the APL interpreter, and dfns, like initial_counts where we defined and named APL code so that we could reuse it wherever it was needed without repetition.

We’ve also encoutered a number of primitive operators, like rank , reduction / and outer product ∘..

game_runner needs to cope with different types of player, and we will do that by providing player functions as arguments to game_player.

A typical game would look like this: config menace_player game_runner random_player config where config is the MENACE configuration data we discussed above and menace_player and random_player are the functions we desicribed at the beginning of this chapter.

Let’s start by defining a random_player as it’s simpler to implement.

We’ll invoke the function by providing a menace configuration as its argument. That contains the current board configuration, and the random_player function should

  1. pick a legal move at random, and then

  2. update the configuration appropriately and return it as the result.

The second step is a little more compicated than it seems, because the player doesn’t know whether it’s playing × or . Luckily it can find that out from the current board position. If the numbers of ‘×’s and ’s are the same, it’s × to play. If not, it’s ’s turn.

Let’s build the code step by step.

g  config.game ⍝ find the game so far
cp  ,¯1g ⍝ find current position
0 0 0 0 0 0 0 0 0
e  0=cp   ⍝ boolean vector of empty board positions. A 1 means the position is available
c  +/e   ⍝ how many empty squares are there? (At the start of the game, all of them).
9

The player neets to pick an empty position at random.

Surprise, surprise! APL has a primitive function to do that.

Roll is represetned by ?. It returns a random number in the range specied by its right argument. So ?5 will return a number between zero and four (inclusive), assuming that ⎕io is zero.

We’ll use it to pick a move at random and then update the game.

i  e ⍝ find indices of the empty positions - these are the legal moves
np  {1+>/+/1 2∘.=} ⍝ who is the next player? 1 for ×, 2 for ○
cp[i[?⍴i]]  np cp ⍝ pick a legal move at random and update the current position
cp
0 0 0 1 0 0 0 0 0

Next, we’ll update the current game by appending the new position to the matrix of moves and put the updated game into the configuration.

config.game  cp

Now we’ll create functions to access and update the configuration. These hide the internal structure of the configuration, making it easier to change later if we need to.

cpos  {,¯1} ⍝ current position from game

We can combine all those steps into the random_player function.

]dinput
random_player  { ⍝ make a legal but random move
cp  cpos .game
i  e0=cp
cp[i[?+/e]]  np cp
.game  cp ⍝ is that OK?
 }

Let’s try it out.

config.game  1 90
config.results  
config  random_player config
config.game
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

If we run random_player again it will make a move for ×.

config  random_player config
config.game
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 2 0 0 0 0

It would get rather tedious if we had to manually run random_player every time we wanted to make another move. APL has a power operator which we can use to execute something repeatedly.

The code below creates a fresh configuration and then runs the random_player 9 times.

config.game  (1 90)
config.results  0
config(random_player  9) config
config.game
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 2 0 1 0 0 1 0 0 2 2 0 1 0 0 1 0 1 2 2 0 1 0 0 1 0 1 2 2 2 1 0 0 1 0 1 2 2 2 1 1 0 1 0 1 2 2 2 1 1 2 1 1 1 2 2 2 1 1 2

What happens if we try to play again?

config  random_player config
DOMAIN ERROR
random_player[3] cp[i[?+/e]]←np cp
                     ∧

The board is full and our code falls over. That’s a problem with our code.

There’s another problem with our design so far. There’s nothing to stop the game when a player has won.

Luckily these problems are easy to fix.

We can write a try operator to stop invoking the player as soon as the game is over. We saw above that our first attempt at play failed if we carried on beyond the 9th move.

More On Operators

Operators can be created by the programmer as dops.

Recall that a function in APL returns data, but an operator returns a function.

The argument(s) to the operator are often functions, but can be arrays.

The definiton of a dop uses its code and its arguments to create a new function.

Within a dop, ⍺⍺ refers to the left argument of the operator and ⍵⍵ to the right argument, if any.

As in the case of a dfn, the function that’s created will get its left argument (if any) from and its left arguemnt from .

draw  {0=+/0=} ⍝ is the game drawn?
gf  wf  draw ⍝ the game is finished if it's been won or the board is full
try {gf cpos .game:⍵  ⍺⍺ } ⍝ if the game is over, do nothing; otherwise invoke the player

Let’s repeat our new code 10 times and see if it copes OK.

init  {.game  (1 90)  .results  0   }
config  init ⎕ns ''
config({random_player try }  10) config
list config.game
┌───┬───┬───┬───┬───┬───┬───┬───┐ │...│...│...│.×.│.×.│.××│.××│.××│ │...│...│...│...│.○.│.○.│○○.│○○×│ │...│..×│.○×│.○×│.○×│.○×│.○×│.○×│ └───┴───┴───┴───┴───┴───┴───┴───┘

Great! Even when we run it ten times, the code stops adding moves as soon as the game is over.

At the moment we can only play games with random_players, but later we will want to use other types of player. We’ll write a play_round operator which will take two functions, one the left, one on the right, and create a function that

  1. takes a configuration as its argument,

  2. asks the function on the left to make a move,

  3. asks the function on the right to make a move, and then

  4. returns the updated configuration.

Here it is.

play_round  {⍺⍺ try    ⍵⍵ try   }
config  init ⎕ns ''
config  {random_player play_round random_player } config 
config.game
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
config  {random_player play_round random_player } config 
config.game
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 1 0

We need to update the history before we can write game_runner.

It will update the confiuration history with the result of the last game and return the updated configuration.

We’ll start by writing result to find the result of the last game.

result will check if the game was drawn, and if so it will return a 0. If not, it will check the number of empty squares.

If the count is even, the x played last and must have won, so it will return a 1. If the count is odd, played last and the game was lost, so it will return a ¯1.

We’ll use the primitive function residue (written |) to see if the number of empty squares is even or odd.

empty_count  { +/0=cpos  }
result  { draw cpos ⍵: 0  1 ¯1[2|empty_count ]}

Now we can develop game_runner.

It will start by setting a new starting postion in the configuration it’s given.

game_runner will repeatedly play a round uop to 9 times leaving the game unchanged if it has been won.

Then it will update the configuration with the game result and return the updated configuration.

]dinput
game_runner  { ⍝ play a game given two players and an initial configuration
.game  1 90
tmp  (⍺⍺ play_round ⍵⍵)9
.results , result .game

}
config  init ⎕ns ''
config  random_player game_runner random_playerconfig
config.game
config.results
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 2 0 0 1 0 1 0 0 0 2 0 0 1 2 1 0 0 0 2 0 0 1 2 1 0 1 0 2 0 0 1 2 1 0 1 2 2 0 0 1 2 1 1 1 2 2 0 0 1 2 1 1 1 2 2 2 0 1 2 1 1 1 2 2 2 1
0 0
config  random_player game_runner random_playerconfig
config.game
config.results
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 2 0 0 1 0 0 0 0 0 2 0 1 1 0 0 0 2 0 2 0 1 1 0 0 0 2 0 2 1 1 1 0 2 0 2 0 2 1 1
0 0 ¯1

Let’s play twenty more games and see the results.

config  {random_player game_runner random_player}20config
config.results
0 0 ¯1 0 1 0 0 1 ¯1 ¯1 ¯1 1 ¯1 0 ¯1 1 0 1 0 1 ¯1 0 0
)save notebook6 -force
notebook6.dws saved Tue Jan 5 08:29:54 2021