Implementing A Simplified MENACE Player

Tracking MENACE’s Beads

The original MATCHBOX-based MENACE remembered the lessons from past games using matchboxes and beads.

  1. It used one matchbox for each board position that MENACE might enounter.

  2. Each machbox was labelled. The label showed the position represented by the box, and showed each possible move from that position. Possible moves were shown in different colours.

  3. Each matchbox contained beads of colours that matched those on the label. The beads determined the probability that MENACE would make the corresponding move.

In our implementation we will maintain the bead information as a two element vector, with each element an array.

The first element will be a vector of decoded (numerical) positions. The second element will be a matrix of bead counts, with one row per possible position and 9 columns.

Each column will corespond to one of the possible moves from the position, and will contain the number of virtual beads corresponding to that move. Some moves will not be possible because the sqare is already occupied. We will initialse the bead counts for such positions to zero, and the counts will never get updated, since the correponding moves will never be made.

TODO: Add illustration of a position (in vector and display form) and its bead counts

The values for possible moves will be initialised to some fixed number when a new MENACE configuration is created, and will then get adjusted at the end of each game by MENACE’s learning rule.

The bead data will form a new third element of the MENACE configuration.

MENACE’s Learning Rule

At the end of a game MENACE adjusted the bead counts.

  1. If MENACE won it added three beads of the same colour matching those picked during the game.

  2. If MENACE drew it added one bead of the appropriate colour to each matchbox used.

  3. If MENACE lost it removed three beads of the appropriate colour from each matchbox used.

We’ll extend game_runner to do the same to our virtual bead counts.

Let’s start by copying in our earlier work.

)copy notebook6
done
./notebook6.dws saved Tue Jan 5 08:29:54 2021
⎕io  0

Simplifying And Extending MENACE’s Approach

In his original design for MENACE, Donald Michie eliminated redundant board positions from consideration and specified that MENACE always made the first move.

He did this in order to reduce the number of machboxes required to a manageable number. A secondary benefit was that one game could provide reinforement for a whole family of symetric versions.

Since we are representing the matchboxed by values in an array we are not constrained in the same way, and the time taken to train the system is also less of a problem.

We’ll also allow a menace_player to be the first or second player.

We would need more code to match a real game position to its canonical representation, so we won’t do so in this chapter. We could maintain a virtual matchbox for beads for every possible position, but many of those could never occur in play. We can reduce the 19683 positions considerably by using the ok function which checks the number of ×s and s.

Let’s see how many positions are ok.

all  encode numbers 3*9 ⍝ all possible positions
ok{0 1-+/1 2∘.=}      ⍝ do positions have counts of × and ○ that could occur in play?
good  (ok all)/numbers    ⍝ find numbers of all positions that are ok
good                      ⍝ how many positions are ok?
6046

Six thousand and forty-six would be a lot of matchboxes, but an APL array of that size is not a problem on a modern computer.

We’ll maintain virtual bean counts for each position in good.

The bead data has three parts which we’ll store in the configuration Namespace.

The first is config.good - a vector of decoded positions.

The second is config.beads a matrix of bead counts, with one row for each position in good, and nine columns correspnding to the nine squares on the board. Each column shoud contain the initial bean count for those positions that are empty, and zero for each position that has already been filled.

We’ll also need to keep track of the beads that were selected by a menace player, so that we can adjust the bead counts when the game is over. We’ll look at how to do that below.

Let’s sart by initialising the virtual bead data. MENACE started with 20 beads in each matchbox, but we will make the starting count to a changeable parameter.

starting_count  20
bc  ((good),9) starting_count

Now we need to set the bead counts to zero for each position that has been filled.

We’ll do that in stages. First we’ll convert encode values in good, That will give us a matrix gbp with one good board position in each row and one column per square of the board.

Next we’ll look for the zeros in gbp, keep the corresponding positions of bead counts unchanged, and set the rest to zero.

gbp  encode good ⍝ a matrix with one board position per row
zero  0=gbp ⍝ a matrix with a 1 for each non-empty position and a 0 for every empty one.
bc  bc × zero
list 5gbp
┌───┬───┬───┬───┬───┐ │...│...│...│...│...│ │...│...│...│...│...│ │...│..×│.×.│.×○│.○×│ └───┴───┴───┴───┴───┘
5bc
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 0 20 20 20 20 20 20 20 0 20 20 20 20 20 20 20 20 0 0 20 20 20 20 20 20 20 0 0

That looks correct. Let’s turn that into a function initial_counts which will take an array of positions as its left argument and the number of intial counts as its right argument.

initial_counts  { bc  ((),9)   bc × 0=encode }
good initial_counts 20
6046 9

Now we can easily create the extended initial configuration.

We’ll write a function called initialise which will create the initial configuration.

initialise  {(1 90)  (() ( initial_counts )) }
]dinput
init  { ⍝ create an initial configuration (TODO: update descritption above
config  ⎕ns ''
config.game  (1 90) 
config.results   
config.good  
config.beads   initial_counts  
config.menace_moves  
config }
config  good init 20

Now we can start work on the implementation of menace_player.

Like the random_player, our menace_player function should expect a menace configuration as its argument, work out how to move, and return an updated configuration as its result.

It will work out which move to make by looking at the bead counts for each move it could make from the current position and chosing one at random. The probablilty of each possible move should reflect the proportion of beads allocated to that move.

We’ll also need to work out how to give the game_runner the data it needs to update the bead counts when the game is over.

Let’s start by writing a function that will pick a move index given a vector of bead counts.

beads  2 0 1 0 3 0 0 0 4
pick  {(0,+\)⍸?+/}

pick uses two APL primitives that we have not used before.

Scan, represented by \, is with + used to create a running total.

Interval Index, represented by , allows us to find which position corresponds to the random number created by ?.

Let’s try out pick.

pick beads ⍝ NB: the result may change each time this is run!
4

Let’s do a more thorough analysis.

The code below creates a 100,000 element vector in which each element is the enclosed character vector pick beads. It then uses execute each ⍎¨ to execute each of those statements. It uses outer product equals ∘.= to create a table with a 1 for each occurrence of the nine possible values, and adds up each row using plus reduction +/. That gives a count of hom many times each possible value was chosen.

Then it divides that by the total which is the number of trials. That gives us a vector of probabilites.

For the given value of beads we would expect those to be 0.2 0 01. 0 0.3 0 0 0 0.4 and the results we get are close to the theoretical values.

picks  ¨100000⍴⊂'pick beads'
counts  +/(9)∘.= picks
counts÷+/counts
0.20051 0 0.0998 0 0.29799 0 0 0 0.4017
]dinput
menace_player  { ⍝ make a move using menace's rules and save it for later bead update
cp  cpos .game
counts  .beads[.gooddecode cp;] ⍝ find the bead counts corresponding to the current position
cp[pick counts]  move  np cp ⍝ pick a move and update cp
.game  cp
.menace_moves, (decode cp) move ⍝ add position and move to the menace-based moves for this game

}
config  menace_player config
config.game
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

We’ll need to modify game_runner slightly. It needs to clear the menace_moves at the start of a game and to update the bead_counts at the end of a game.

The we’ll change the definition of game_runner and then create a new function update_counts_in for it to use. update_counts_in will take a configuration and return the updated comfiguration as its result.

]dinput
game_runner  { ⍝ play a game given two players and an initial configuration
.game  1 90
.menace_moves  
tmp  (⍺⍺ play_round ⍵⍵)9
.results , result .game
update_counts_in 
}

TODO: remove when working!

config.menace_moves
┌────┐ │81 1│ └────┘
adjustment  { draw ⍵: 1  ¯3 3[2|+/0] }
c  ⎕ns ''
adjustment 1 1 0 2 2 2 1 0 0
¯3
adjustment 1 1 1 2 2 0 0 0 0
3
adjustment 1 2 1 2 1 2 1 2 1
1
who  {'○×'[2|empty w]} ⍝ who is the current player?
find_moves_for  {=who } ⍝ what moves were made by a player?
winner_at_end  {}
]dinput
update_counts_in  { ⍝ update the bead counts for moves made by a `menace_player`
⍝ find ajustment based on whether × won, drew or lost
a  '×' ajustment cpos .game
⍝ find all menace moves by '×'

⍝ update them
⍝ do the same for ○
 
}
config  menace_player game_runner random_playerconfig
VALUE ERROR: Undefined name: ajustment
update_counts_in[2] a←'×'ajustment cpos ⍵.game
                         ∧
config  {menace_player game_runner random_player}100config
config.menace_moves
VALUE ERROR: Undefined name: p
update_counts_in[2] a←'×'ajustment p ⍵.game
                                   ∧
┌────┬──────┬──────┬───────┬───────┬───────┬───────┬───────┬───────┐ │27 1│1485 2│1486 1│14608 2│16795 1│16957 2│17200 1│17206 2│17215 1│ └────┴──────┴──────┴───────┴───────┴───────┴───────┴───────┴───────┘

TODO: move/update the text below.

That’s encouraging, but it points out our remaining problem: the current code uses bead counts but it does not update them so it is not learning.

Fixing that needs a bit of thought.

We can’t just change menace_player. If menace_player is the first player, and the second player wins, the meance_player will not be invoked after the win, so it can’t update the bead counts.

We can’t just change game_runner. To fix that we need to make two changes - one to game_runner and one to menace_player.

game_runner knows when a game is over, and can find the result from config. It needs to update bead counts for every move made by a

)save notebook7 -force
notebook7.dws saved Tue Jan 5 08:33:19 2021