Reinforcement Learning

Exploring MENACE in APL

Michie’s MENACE used matchboxes to remember what had happened in previous games. Michie used one matchbox for each possible board configuration, and the player picked a coloured bead at random to decide what move MENACE would make.

How many relevant board configurations are there?

This isn’t a trivial question, since many board configurations could never be reached in play.

There’s an obvious upper limit, though. We can count all the board configurations without considering whether they could be reached.

Each square of the board can be empty, contain a nought or contain a cross. There are three possibilities for each of the nine cells, so there are three to the power 9 in total.

Here’s how we calculate that in APL.

Note

Exponentiation is represented by * in APL. APL uses × for multiplication.

3*9
done
19683

It’s interesting to know how many positions there are, but we’ll need to eliminate the ones that are impossible or redundant.

We need some way of representing a board position.

We can show each board position as a 3 by 3 matrix of characters: a × for a cross, a for a nought, and a . for an unfilled square. That means that an empty board does not just look like white space.

Suppose the first player placed an ‘x’ in the centre. We could create that board like this:

3 3'....×....'
... .×. ...

In APL we create character literals by enclosing their characters in single quotation marks. The result is a character vector (a list of characters).

We can create a matrix by asking APL to reshape the vector. stands for reshape; it reshapes the array on its right using the shape on its left.

That way of displaying a board position is great for humans, but there are better ways to represent a board in an APL program. The way we’ll do it is to represent a board as a numeric vector. The vector will have a zero for an unfilled position, a 1 for an × and a 2 for a .

The board we saw above would be represented by the vector 0 0 0 0 1 0 0 0 0.

How can we convert that to something we can visualise? We’ll need to use the vector as an index to an array of characters.

As you may know, the mathematical world is divided about how to index things. Pure mathematicians tend to count from 0, but applied mathematicians often count from 1.

Note

APL supports both approaches. You tell APL which you want to use by setting the index origin to 0 or 1.

In APL, the index origin is a system variable called ⎕io. The line below sets it to 0.

⎕io  0

As you can see, in APL, is used for assignment. The = symbol is used to test for equality.

Here’s the code to convert the vector 0 0 0 0 1 0 0 0 0 to a human-friendly board position.

3 3'.×○'[0 0 0 0 1 0 0 0 0]
... .×. ...

We’re going to use that expression whenever we want to turn a board vector into a human-friendly diagram.

We can use APL’s direct definition to create a function which we can use repeatedly.

show  {3 3'.×○'[]}
show 0 0 0 0 1 0 0 0 0 
... .×. ...

How can we generate all the 19683 board positions?

The first step is to generate all the numbers from 0 to 19682.

In APL we can do that using . IN the lines below, we’ll store the list of numbers in a variable called boards and then display the first 6 numbers just to check that APL did what we wanted.

boards  19683
boards[0 1 2 3 4 5]
0 1 2 3 4 5

That looks good.

We picked the first six elements of boards using indexing, but there’s an easier way. We can use APL’s take function.

6boards
0 1 2 3 4 5

How can we convert those numbers to board positions?

We can use a primitive (built-in) APL function called encode which converts an integer to a representation in any number base we chose. We’ll convert each number to its ternary (base 3) representation.

encoded  (93)boards

What’s the result? We won’t display it, since it contains a large array, but we can find out its shape.

encoded
9 19683

Earlier we used with a shape on the left, and an array on the right to reshape a vector into a matrix. Here we used with just an array on the right. Used that way, returns the shape of the array.

We can use to take 9 rows and 6 columns from the encoded matrix.

9 6encoded
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 2 0 1 2

Each column is a board position. It feels more natural to have the board positions as rows, and APL has a handy function transpose written which will do the job.

9 6encoded
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 2

We’re going to do a lot of encoding, so it makes sense to create an encode function we can use repeatedly.

In APL you can create a function by using a special syntax described below. It’s not the only way to define functions, but it’s the approach we will take thoughout this book.

Functions created this way are referred to as dfns.

Dfns are created by using braces ({ and } to include the APL code to be run when the function is invoked.

In APL, functions can take a single argument on the right or two arguments, one on the left and one on the right.

Functions that take a single argument to the right are monadic. Functions that sit between two arguments are dyadic.

Within a dfn the argument on the right is represented by the Greek letter omega and the left arument, if there is one, is represented by alpha .

Note

It’s possible to create ambivalent functions for which the left option is optional, though we do not need or use them in this book.

It is also possible to create anonymous (un-named) functions but throughout this book we will use assignment to associate the dfn by a name which we can use to invoke it.

For more information about dfns, see the APL Wiki.

We’ll also add comments to our code by using APL’s comment symbol, the lamp .

When APL encounters a lamp symbol in code it ignores the remainder of the line.

Two examples should claify the notation.

increment  {1 + } ⍝ adds one to each element of its argument
increment 2 4 6 8 10
3 5 7 9 11
add  { + } ⍝ adds its left and right arguments
2 add 3
5
2 4 6 add 1 2 5
3 6 11

Now let’s define our encode function

encode  {(93)} ⍝ integer(s) to board vector/matrix

Lets’ try that out.

encode 6
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 2

Later on we will want to go in the opposite direction from a vector or vectors to the numbers that correspond to each one.

The function decode does that.

decode  {3⊥⍉} ⍝ convert position vector(s) to the unique number(s) corresponding to the position(s)
decode encode 6
0 1 2 3 4 5

What do these boards look like? Let’s try to use the show function we wrote earlier.

show encode 6
... ... ...

Oops!

That just shows the first board position, which is rather boring, and not what we wanted.

The current definition of show uses 3 3⍴ to generate its result, so we only ever get 9 characters arranged as a 3 by 3 matrix. We need to modify show so that it displays each board.

We’ll use APL’s rank operator to control how the show function is applied to its argument.

Functions And Operators

In APL a function returns data, but an operator returns a function.

The argument(s) to the operator are most often exisiting functions, primitive or defined, which are combined or modified to create a new function, but sometimes they are arrays.1

The function created by an operator will need one or two data arguments to transform into its result.

Some operators return monadic functions which take a single argument on their right.

Some return dyadic functions which take two arguments: one on the left of the function and one on the right.

Note

It’s also possible to create ambivalent functions for whihc the left option is optional, though we do not need or use them in this book.

The Rank Operator

The rank operator applies the function on its left to cells of its arguments to create the cells of its result. The scope of the cells are determined by the numbers on its right.

Rank is one of the most useful APL operators and Dyalog have published some great videos about its use:

A gentle introduction The Rank Operator

and (more specialised)

The Rank Operator and Dyadic Transpose

Advanced Use of The Rank Operator

In this case we want show to apply to each vector along the last dimension of the argument we provide.

3 3⍴⍤ 1 allows us to do that.

We’ll use the samefunction to separate the 1 used as an argument to rank from the expression that picks the characters to display.

show  {3 3 1 '.×○'[]}
show encode 6
... ... ... ... ... ..× ... ... ..○ ... ... .×. ... ... .×× ... ... .×○

Let’s look at the shape of show’s result.

show encode 6
6 3 3

show now converts each vector on its right to a matrix. If we apply show to a matrix with one board per row it returns a 3d array with one plane for each board configuration.

Sometimes it can be easier to see the positions going across the page rather than down it.

Here’s how we can make use of rank again to make our output neater.

2 show encode 6
┌───┬───┬───┬───┬───┬───┐ │...│...│...│...│...│...│ │...│...│...│...│...│...│ │...│..×│..○│.×.│.××│.×○│ └───┴───┴───┴───┴───┴───┘

The expression ⊂⍤2 converts the cube (or 3D array) of characters returned by show into a vector (a 1D array or list) of matrices (2D arrays).

It uses one new function enclose .

Enclose creates a scalar from its right argument. It allows you to create arrays of arrays.

Rank was described above.

Defining another useful function

In the discussion that follows, we’ll use the expression ⊂⍤2 show a lot, so let’s make it a function.

list  {2 show }
list encode 6
┌───┬───┬───┬───┬───┬───┐ │...│...│...│...│...│...│ │...│...│...│...│...│...│ │...│..×│..○│.×.│.××│.×○│ └───┴───┴───┴───┴───┴───┘

Summary

We’ve now created the tools we need to work on the next stage of our implementation of MENACE.

In the next notebook we will look at the relationship between symmetries (rotations and reflections) of board configurations.

We will want to use encode, decode and show so we will save our work ready to reuse in the next notebook.

)save notebook3 is an APL system command. It saves the functions and variables we’ve created in a workspace file called notebook2.dws.

In this case we need to use the -force option because the notebook software silently loads a workspace with a different name.

APL will not change that name unless we tell it to.

)save notebook1 -force
notebook1.dws saved Mon Jan 4 07:22:50 2021

1

As you have seen, rank takes a function on its left and a numeric argument on its right.