Implementing A Simplified MENACE Player¶

Tracking MENACE’s Beads¶

The original MATCHBOX-based MENACE remembered the lessons from past games using matchboxes and beads.

It used one matchbox for each board position that MENACE might enounter.
Each machbox was labelled. The label showed the position represented by the box, and showed each possible move from that position. Possible moves were shown in different colours.
Each matchbox contained beads of colours that matched those on the label. The beads determined the probability that MENACE would make the corresponding move.

In our implementation we will maintain the bead information as a two element vector, with each element an array.

The first element will be a vector of decoded (numerical) positions. The second element will be a matrix of bead counts, with one row per possible position and 9 columns.

Each column will corespond to one of the possible moves from the position, and will contain the number of virtual beads corresponding to that move. Some moves will not be possible because the sqare is already occupied. We will initialse the bead counts for such positions to zero, and the counts will never get updated, since the correponding moves will never be made.

TODO: Add illustration of a position (in vector and display form) and its bead counts

The values for possible moves will be initialised to some fixed number when a new MENACE configuration is created, and will then get adjusted at the end of each game by MENACE’s learning rule.

The bead data will form a new third element of the MENACE configuration.

MENACE’s Learning Rule¶

At the end of a game MENACE adjusted the bead counts.

If MENACE won it added three beads of the same colour matching those picked during the game.
If MENACE drew it added one bead of the appropriate colour to each matchbox used.
If MENACE lost it removed three beads of the appropriate colour from each matchbox used.

We’ll extend game_runner to do the same to our virtual bead counts.

Let’s start by copying in our earlier work.

)copy notebook6

done

./notebook6.dws saved Tue Jan 5 08:29:54 2021

⎕io ← 0

Simplifying And Extending MENACE’s Approach¶

In his original design for MENACE, Donald Michie eliminated redundant board positions from consideration and specified that MENACE always made the first move.

He did this in order to reduce the number of machboxes required to a manageable number. A secondary benefit was that one game could provide reinforement for a whole family of symetric versions.

Since we are representing the matchboxed by values in an array we are not constrained in the same way, and the time taken to train the system is also less of a problem.

We’ll also allow a menace_player to be the first or second player.

We would need more code to match a real game position to its canonical representation, so we won’t do so in this chapter. We could maintain a virtual matchbox for beads for every possible position, but many of those could never occur in play. We can reduce the 19683 positions considerably by using the ok function which checks the number of ×s and ○s.

Let’s see how many positions are ok.

all ← encode numbers← ⍳3*9 ⍝ all possible positions
ok←{0 1∊⍨-⌿+/1 2∘.=⍵}      ⍝ do positions have counts of × and ○ that could occur in play?
good ← (ok all)/numbers    ⍝ find numbers of all positions that are ok
⍴good                      ⍝ how many positions are ok?

6046

Six thousand and forty-six would be a lot of matchboxes, but an APL array of that size is not a problem on a modern computer.

We’ll maintain virtual bean counts for each position in good.

The bead data has three parts which we’ll store in the configuration Namespace.

The first is config.good - a vector of decoded positions.

The second is config.beads a matrix of bead counts, with one row for each position in good, and nine columns correspnding to the nine squares on the board. Each column shoud contain the initial bean count for those positions that are empty, and zero for each position that has already been filled.

We’ll also need to keep track of the beads that were selected by a menace player, so that we can adjust the bead counts when the game is over. We’ll look at how to do that below.

Let’s sart by initialising the virtual bead data. MENACE started with 20 beads in each matchbox, but we will make the starting count to a changeable parameter.

starting_count ← 20
bc ← ((⍴good),9)⍴ starting_count

Now we need to set the bead counts to zero for each position that has been filled.

We’ll do that in stages. First we’ll convert encode values in good, That will give us a matrix gbp with one good board position in each row and one column per square of the board.

Next we’ll look for the zeros in gbp, keep the corresponding positions of bead counts unchanged, and set the rest to zero.

gbp ← encode good ⍝ a matrix with one board position per row
zero ← 0=gbp ⍝ a matrix with a 1 for each non-empty position and a 0 for every empty one.
bc ← bc × zero

list 5↑gbp

┌───┬───┬───┬───┬───┐ │...│...│...│...│...│ │...│...│...│...│...│ │...│..×│.×.│.×○│.○×│ └───┴───┴───┴───┴───┘

5↑bc

20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 0 20 20 20 20 20 20 20 0 20 20 20 20 20 20 20 20 0 0 20 20 20 20 20 20 20 0 0

That looks correct. Let’s turn that into a function initial_counts which will take an array of positions as its left argument and the number of intial counts as its right argument.

initial_counts ← { bc ← ((⍴⍺),9)⍴ ⍵ ⋄ bc × 0=encode ⍺}

⍴good initial_counts 20

6046 9

Now we can easily create the extended initial configuration.

We’ll write a function called initialise which will create the initial configuration.

initialise ← {(1 9⍴0) ⍬ ((⍺) (⍺ initial_counts ⍵)) ⍬}

]dinput
init ← { ⍝ create an initial configuration (TODO: update descritption above
config ← ⎕ns ''
config.game ← (1 9⍴0) 
config.results ← ⍬ 
config.good ← ⍺
config.beads ← ⍺ initial_counts ⍵ 
config.menace_moves ← ⍬
config }

config ← good init 20

Now we can start work on the implementation of menace_player.

Like the random_player, our menace_player function should expect a menace configuration as its argument, work out how to move, and return an updated configuration as its result.

It will work out which move to make by looking at the bead counts for each move it could make from the current position and chosing one at random. The probablilty of each possible move should reflect the proportion of beads allocated to that move.

We’ll also need to work out how to give the game_runner the data it needs to update the bead counts when the game is over.

Let’s start by writing a function that will pick a move index given a vector of bead counts.

beads ← 2 0 1 0 3 0 0 0 4
pick ← {(0,+\⍵)⍸?+/⍵}

pick uses two APL primitives that we have not used before.

Scan, represented by \, is with + used to create a running total.

Interval Index, represented by ⍸, allows us to find which position corresponds to the random number created by ?.

Let’s try out pick.

pick beads ⍝ NB: the result may change each time this is run!

4

Let’s do a more thorough analysis.

The code below creates a 100,000 element vector in which each element is the enclosed character vector pick beads. It then uses execute each ⍎¨ to execute each of those statements. It uses outer product equals ∘.= to create a table with a 1 for each occurrence of the nine possible values, and adds up each row using plus reduction +/. That gives a count of hom many times each possible value was chosen.

Then it divides that by the total which is the number of trials. That gives us a vector of probabilites.

For the given value of beads we would expect those to be 0.2 0 01. 0 0.3 0 0 0 0.4 and the results we get are close to the theoretical values.

picks ← ⍎¨100000⍴⊂'pick beads'
counts ← +/(⍳9)∘.= picks
counts÷+/counts

0.20051 0 0.0998 0 0.29799 0 0 0 0.4017

]dinput
menace_player ← { ⍝ make a move using menace's rules and save it for later bead update
cp ← cpos ⍵.game
counts ← ⍵.beads[⍵.good⍳decode cp;] ⍝ find the bead counts corresponding to the current position
cp[pick counts] ← move ← np cp ⍝ pick a move and update cp
⍵.game ⍪← cp
⍵.menace_moves,← ⊂(decode cp) move ⍝ add position and move to the menace-based moves for this game
⍵
}

config ← menace_player config
config.game

0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

We’ll need to modify game_runner slightly. It needs to clear the menace_moves at the start of a game and to update the bead_counts at the end of a game.

The we’ll change the definition of game_runner and then create a new function update_counts_in for it to use. update_counts_in will take a configuration and return the updated comfiguration as its result.

]dinput
game_runner ← { ⍝ play a game given two players and an initial configuration
⍵.game ← 1 9⍴0
⍵.menace_moves ← ⍬
tmp ← (⍺⍺ play_round ⍵⍵)⍣9⊢⍵
⍵.results ,← result ⍵.game
update_counts_in ⍵
}

TODO: remove when working!

config.menace_moves

┌────┐ │81 1│ └────┘

adjustment ← { draw ⍵: 1 ⋄ ¯3 3[2|+/0≠⍵] }
c ← ⎕ns ''
adjustment 1 1 0 2 2 2 1 0 0

¯3

adjustment 1 1 1 2 2 0 0 0 0

3

adjustment 1 2 1 2 1 2 1 2 1

1

who ← {'○×'[2|empty w]} ⍝ who is the current player?
find_moves_for ← {⍺⌿⍵=who ⍺} ⍝ what moves were made by a player?
winner_at_end ← {}

]dinput
update_counts_in ← { ⍝ update the bead counts for moves made by a `menace_player`
⍝ find ajustment based on whether × won, drew or lost
a ← '×' ajustment cpos ⍵.game
⍝ find all menace moves by '×'

⍝ update them
⍝ do the same for ○
⍵ 
}

config ← menace_player game_runner random_player⊢config

VALUE ERROR: Undefined name: ajustment
update_counts_in[2] a←'×'ajustment cpos ⍵.game
                         ∧

config ← {menace_player game_runner random_player⊢⍵}⍣100⊢config

config.menace_moves

VALUE ERROR: Undefined name: p
update_counts_in[2] a←'×'ajustment p ⍵.game
                                   ∧

┌────┬──────┬──────┬───────┬───────┬───────┬───────┬───────┬───────┐ │27 1│1485 2│1486 1│14608 2│16795 1│16957 2│17200 1│17206 2│17215 1│ └────┴──────┴──────┴───────┴───────┴───────┴───────┴───────┴───────┘

TODO: move/update the text below.

That’s encouraging, but it points out our remaining problem: the current code uses bead counts but it does not update them so it is not learning.

Fixing that needs a bit of thought.

We can’t just change menace_player. If menace_player is the first player, and the second player wins, the meance_player will not be invoked after the win, so it can’t update the bead counts.

We can’t just change game_runner. To fix that we need to make two changes - one to game_runner and one to menace_player.

game_runner knows when a game is over, and can find the result from config. It needs to update bead counts for every move made by a

)save notebook7 -force

notebook7.dws saved Tue Jan 5 08:33:19 2021

MENACE Revisited

Implementing A Simplified MENACE Player¶

Tracking MENACE’s Beads¶

MENACE’s Learning Rule¶

Simplifying And Extending MENACE’s Approach¶