Cognitive Psychology
Perception in Chess
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4(1), 55-81.
This paper develops a technique for isolating and studying the perceptual structures that chess players perceive.
The basic lesson: to figure out your chunks for some material, copy it, glancing back at the source only when you need to. Each glance marks the point where you load a new chunk, so the glance log tells you exactly what your chunks are and how big they are.
Experiment: Copy a Program
Program: sudoku.lhs
from http://www.cs.nott.ac.uk/~pszgmh/sudoku.lhs
Method: Record myself using recordmydesktop.
Instruction: Copy the three functions beginning with prune.
prune :: Matrix Choices -> Matrix Choices
prune = pruneBy boxes . pruneBy cols . pruneBy rows
        where pruneBy f = f . map reduce . f

reduce :: Row Choices -> Row Choices
reduce xss = [xs `minus` singles | xs <- xss]
             where singles = concat (filter single xss)

minus :: Choices -> Choices -> Choices
xs `minus` ys = if single xs then xs else xs \\ ys
Video length: 2m35s
Glance log for the source file (one entry at the start of each glance and one at the end):
- 0:00:02 :: start
- 0:00:36 :: setup stuff; including `prune ::` and `prune =`
- 0:00:43 :: start
- 0:00:46 :: end (pruneBy rows . pruneBy cols)
- 0:00:52 :: start
- 0:00:55 :: end (where pruneBy = )
- 0:01:04 :: start
- 0:01:07 :: end (f . map reduce . f)
- 0:01:12 :: start
- 0:01:13 :: end (add f to lhs)
- 0:01:16 :: start
- 0:01:17 :: end (reduce :: Row Choices -> Row Choices)
- 0:01:23 :: start
- 0:01:26 :: end (reduce xss =)
- 0:01:30 :: start
- 0:01:31 :: end ([xs `minus` singles | xs <- xss])
- 0:01:41 :: start
- 0:01:44 :: end (where xs =)
- 0:01:50 :: start
- 0:01:51 :: end (fix error; singles = concat ())
- 0:01:56 :: start
- 0:01:58 :: end (filter simple xss)
- 0:02:03 :: start
- 0:02:04 :: end (minus ::)
- 0:02:06 :: start
- 0:02:08 :: end (Choices -> Choices -> Choices)
- 0:02:13 :: start
- 0:02:14 :: end (xs `minus` ys =)
- 0:02:18 :: start
- 0:02:21 :: end (if single xs then xs else xs \\ ys)
- 0:02:29 :: check
- 0:02:30 :: end
Convert to seconds:
- 2 :: start
- 36 :: setup stuff; including `prune ::` and `prune =`
- 43 :: start
- 46 :: end (pruneBy rows . pruneBy cols)
- 52 :: start
- 55 :: end (where pruneBy = )
- 64 :: start
- 67 :: end (f . map reduce . f)
- 72 :: start
- 73 :: end (add f to lhs)
- 76 :: start
- 77 :: end (reduce :: Row Choices -> Row Choices)
- 83 :: start
- 86 :: end (reduce xss =)
- 90 :: start
- 91 :: end ([xs `minus` singles | xs <- xss])
- 101 :: start
- 104 :: end (where xs =)
- 110 :: start
- 111 :: end (fix error; singles = concat ())
- 116 :: start
- 118 :: end (filter simple xss)
- 123 :: start
- 124 :: end (minus ::)
- 126 :: start
- 128 :: end (Choices -> Choices -> Choices)
- 133 :: start
- 134 :: end (xs `minus` ys =)
- 138 :: start
- 141 :: end (if single xs then xs else xs \\ ys)
- 149 :: check
- 150 :: end
Glance interval lengths:
[3,3,3,1,1,3,1,3,1,2,1,2,1,3]
Mean glance interval = 2s
Standard deviation = 0.92s
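For reference, a minimal Haskell sketch (mine, not the paper's; the helper names are my own) of how these numbers fall out of the seconds log above. Mean and standard deviation are the plain population statistics:

-- (start, end) of each glance, in seconds, copied from the log above.
glances :: [(Double, Double)]
glances =
  [ (43,46), (52,55), (64,67), (72,73), (76,77), (83,86), (90,91)
  , (101,104), (110,111), (116,118), (123,124), (126,128), (133,134), (138,141) ]

glanceIntervals :: [Double]
glanceIntervals = [e - s | (s, e) <- glances]   -- [3,3,3,1,1,3,1,3,1,2,1,2,1,3]

mean, stdDev :: [Double] -> Double
mean xs   = sum xs / fromIntegral (length xs)
stdDev xs = sqrt (mean [(x - mean xs) ^ 2 | x <- xs])

main :: IO ()
main = print (mean glanceIntervals, stdDev glanceIntervals)   -- (2.0, ~0.92)

The writing intervals below are the complementary gaps, from the end of one glance to the start of the next.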
Each glance seems to take around 2s, which is what the chunking theory predicts:
The pieces placed on the board by the subject in the perception task after a single glance correspond to a single chunk. About 2 sec is required to recognize a chunk and store a label for it in short-term memory.
Writing interval lengths:
[6,9,5,3,6,4,10,6,5,5,2,5,4,8]
Mean writing interval = 5.5s
Standard deviation = 2.1s
What do my chunks look like?
pruneBy rows . pruneBy cols
where pruneBy =
f . map reduce . f
<add f to lhs>
reduce :: Row Choices -> Row Choices
reduce xss =
[xs `minus` singles | xs <- xss]
where xs =
<fix error> singles = concat
filter simple xss
minus ::
Choices -> Choices -> Choices
xs `minus` ys =
if single xs then xs else xs \\ ys
Reading the Paper
Mnemonic: “between-glance” = there’s a glance between the two pieces; “within-glance” = there’s no glance in between
An examination of the z scores shows that the between-glance probabilities are much closer to the chance levels than are the within-glance probabilities. In contrast, the within-glance probabilities are higher than chance for pairs of pieces with several relations, and lower than chance for pairs with few relations. In particular, the relations AP, DC, DPC, PCS, and DPCS have high probabilities, C, S, and null (-) relations have lower-than-chance probabilities.
So, pieces in the same chunk are closely related while pieces in successive chunks are not.
Question: How do they calculate the deviation score Z? It’s not Z = (Po - Pe) / SE.
Thus, from the within-glance relations, it appears that subjects are noticing the pawn structure, clusters of pieces of the same color, and attack and defense relations over small spatial distances.
Apparently, subjects are noticing the same kinds of structures in the random positions as in the game positions even though such structures are rare in the random positions.
Recalling from Memory
Question: Why should people recalling chess pieces do so in 2s chunks?
In the perception task, they glanced at the source chess board, loaded their working memory with as much information as they could comfortably carry, and unloaded it on the target board. So, the time they took for each glance told you the size of their working memory. But why would they take the same time when recalling stuff from memory? It’s not like they’re loading their working memory… again… wait…
Hypothesis: They’re loading information from long-term memory. We assume that time taken to load a “chunk” from long-term memory is the same as the time taken to load a “chunk” from the senses. So, the time taken between the last piece of one chunk and the first piece of the next tells you the loading time.
Note that the two loading times could have been different. Loading from long-term memory could have been much faster (or vice versa). But it doesn’t seem to be.
Also note that there doesn’t seem to be some kind of queue where you can partially load pieces of the next chunk as soon as you’re done with the first pieces of the previous chunk so that there’s never a gap in production. Instead, you wait till you’re done with one chunk before you start with the next. (This could be because you need to pay attention to the label of a chunk to recall its contents, and since your attention is on using the pieces of the previous chunk, you aren’t able to load the next chunk.)
Experiment: Recall an Unfamiliar Function
Program: HughesPJ.hs
Method: Record myself using recordmydesktop.
Instruction: Memorize fill1 (14 lines of code) in 3.5 minutes.
What it actually was:
fill1 :: Bool -> RDoc -> Int -> [Doc] -> Doc
fill1 _ _ k _ | k `seq` False = undefined
fill1 _ NoDoc _ _ = NoDoc
fill1 g (p `Union` q) k ys = fill1 g p k ys `union_` (aboveNest q False k (fill g ys))
fill1 g Empty k ys = mkNest k (fill g ys)
fill1 g (Nest n p) k ys = nest_ n (fill1 g p (k - n) ys)
fill1 g (NilAbove p) k ys = nilAbove_ (aboveNest p False k (fill g ys))
fill1 g (TextBeside s sl p) k ys = textBeside_ s sl (fillNB g p (k - sl) ys)
fill1 _ (Above {}) _ _ = error "fill1 Above"
fill1 _ (Beside {}) _ _ = error "fill1 Beside"
How I remembered it:
fill1 :: Bool -> RDoc -> Int -> [Doc] -> Doc
fill1 _ _ k _ | k `seq` False = undefined
fill1 _ NoDoc _ _ = NoDoc
fill1 g Empty k ys = mkNest_ k (fill g ys)
fill1 g NestAbove {n = n, p = p} k ys = nest_ n (fill1 g p (k - n) ys)
fill1 _ NilAbove _ _ = something
fill1 _ NilBeside _ _ = something
fill1 _ Above {} _ _ = error
fill1 _ Beside {} _ _ = error
What I missed:
fill1 g (p `Union` q) k ys = fill1 g p k ys `union_` (aboveNest q False k (fill g ys))
fill1 g (NilAbove p) k ys = nilAbove_ (aboveNest p False k (fill g ys))
fill1 g (TextBeside s sl p) k ys = textBeside_ s sl (fillNB g p (k - sl) ys)
Time to memorize: 3m30s
Time log (start and end of each chunk):
- 0:03:33 :: start writing
- 0:03:37 :: end
- 0:03:43 :: start type
- 0:03:47 :: end
- 0:03:48 :: start int to end
- 0:03:50 :: end
- 0:03:53 :: start fill1
- 0:03:56 :: end
- 0:03:58 :: start k
- 0:03:59 :: end
- 0:04:04 :: start k seq
- 0:04:08 :: end
- 0:04:11 :: start undefined
- 0:04:13 :: end
- 0:04:16 :: start lhs
- 0:04:17 :: end
- 0:04:18 :: start nodoc
- 0:04:22 :: end
- 0:04:25 :: start _
- 0:04:26 :: end
- 0:04:28 :: start nodoc
- 0:04:29 :: end
- 0:04:32 :: start copy
- 0:04:34 :: end
- 0:04:35 :: start empty
- 0:04:36 :: end
- 0:04:41 :: start nestabv
- 0:04:44 :: end
- 0:04:46 :: start nildoc
- 0:04:47 :: end
- 0:04:54 :: start nilabov
- 0:04:58 :: end
- 0:05:00 :: start above
- 0:05:01 :: end
- 0:05:02 :: start beside
- 0:05:03 :: end
- 0:05:06 :: start error
- 0:05:07 :: end
- 0:05:08 :: start error
- 0:05:09 :: end
- 0:05:15 :: start
- 0:05:20 :: end
- 0:05:21 :: start
- 0:05:25 :: end
- 0:05:27 :: start
- 0:05:28 :: end
- 0:05:31 :: start
- 0:05:32 :: end
- 0:05:42 :: start
- 0:05:43 :: end
- 0:05:54 :: start
- 0:05:55 :: end
- 0:05:56 :: start
- 0:05:58 :: end
- 0:06:00 :: start
- 0:06:04 :: end
- 0:06:05 :: start
- 0:06:06 :: end
- 0:06:09 :: start
- 0:06:10 :: end
- 0:06:22 :: start
- 0:06:23 :: end
- 0:06:28 :: start
- 0:06:30 :: end
- 0:06:32 :: start
- 0:06:33 :: end
- 0:06:36 :: start
- 0:06:37 :: end
- 0:06:40 :: start
- 0:06:46 :: end
- 0:06:52 :: start
- 0:06:54 :: end
- 0:07:33 :: start
- 0:07:35 :: end
In seconds:
213 start writing
217 end
223 start type
227 end
228 start int to end
230 end
233 start fill1
236 end
238 start k
239 end
244 start k seq
248 end
251 start undefined
253 end
256 start lhs
257 end
258 start nodoc
262 end
265 start _
266 end
268 start nodoc
269 end
272 start copy
274 end
275 start empty
276 end
281 start nestabv
284 end
286 start nildoc
287 end
294 start nilabov
298 end
300 start above
301 end
302 start beside
303 end
306 start error
307 end
308 start error
309 end
315 start
320 end
321 start
325 end
327 start
328 end
331 start
332 end
342 start
343 end
354 start
355 end
356 start
358 end
360 start
364 end
365 start
366 end
369 start
370 end
382 start
383 end
388 start
390 end
392 start
393 end
396 start
397 end
400 start
406 end
412 start
414 end
453 start
455 end
460 dummy start (used only to give the last chunk a recall interval)
Writing intervals (= end - start):
[4,4,2,3,1,4,2,1,4,1,1,2,1,3,1,4,1,1,1,1,5,4,1,1,1,1,2,4,1,1,1,2,1,1,6,2,2]
Mean = 2.1s
Standard deviation = 1.4s
Median and mode = 1s
Recall intervals (= start of this chunk - end of previous chunk):
[6,1,3,2,5,3,3,1,3,2,3,1,5,2,7,2,1,3,1,6,1,2,3,10,11,1,2,1,3,12,5,2,3,3,6,39,5]
Mean = 4.5s
Standard deviation = 6.3s
Median and mode = 3s
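For the record, a sketch (mine, with hypothetical names) of how these two lists are computed from the (start, end) pairs. Only the first few pairs are written out, and the dummy start at 460 is what gives the last chunk a recall interval:

chunkTimes :: [(Int, Int)]   -- (start, end) of writing each chunk, in seconds
chunkTimes = [(213,217), (223,227), (228,230), (233,236), (238,239)]  -- ... and so on, up to (453,455)

writingIntervals :: [Int]    -- time spent writing out each chunk
writingIntervals = [e - s | (s, e) <- chunkTimes]

recallIntervals :: [Int]     -- pause between finishing one chunk and starting the next
recallIntervals = zipWith (-) (map fst (tail chunkTimes)) (map snd chunkTimes)
-- Appending the dummy start (460) to the list of starts yields the final interval as well.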
I was struggling to recall anything. Needless to say, the final program doesn’t really work.
We should also note in passing that errors usually occur toward the end of the protocol; subjects usually report first what they know, and fast, and these errors generally have long latencies and few relations.
Amen. In the end, all I could do was add fill1 _ NilAbove _ _ = something.
Continuing with the Paper
The processes that occur during an interval of more than 2 sec between the placing of two pieces appear to be significantly different from the processes that occur during an interval of less than 2 sec.
The long pauses mark the retrieval of a new chunk; the short ones are successive pieces from the same chunk.
Recall that in the perception task there was a systematic difference in the duration of the glances as a function of chess skill, with less time being taken by the more skilled players. But the average number of pieces per glance did not vary systematically as a function of chess skill.
…
Thus, it appears that the chunks are about the same size in both tasks, but that chess skill is reflected in the speed with which chunks are perceived in the perception task and the size of the chunks in the memory task.
Chunk size and Memory span
The general practice of the subjects was to place first those groups of pieces they thought they remembered well, then to search memory for additional pieces. During the first phase, placing pieces in recall without “problem solving,” chunks tended to be relatively large and errors relatively few. During the second phase, pieces tended to be placed one by one (pawns sometimes by pairs or triads), time being taken for deliberation between pieces. Errors were relatively frequent, and in many instances the player appeared to be determining where pieces ought to be (i.e., where they would function well, or where they are often posted in actual games), rather than recalling where he had actually seen them.
I filled in Empty and NilAbove at the end, because I was least sure of these.
We observe, second, that the average number of chunks for each subject is well within the memory span, as hypothesized; but, contrary to our expectation, the number of chunks is related to chess skill.
Hmm… Was I holding the function in working memory? It didn’t feel that way. Well, it wasn’t like a phone number that I would forget if I thought about something else.
Observation: I took 3.5 minutes to memorize that function. The chess players in the experiment had 5 seconds to memorize the chess board.
So, I’d been building stronger associations in my mind.
Experiment: Unfamiliar function, short time
Program: HughesPJ.hs
Method: Record myself using recordmydesktop.
Instruction: Memorize fullRender in 5 seconds.
I took around 8s.
Actual code:
fullRender OneLineMode _ _ txt end doc = easy_display space_text txt end (reduceDoc doc)
fullRender LeftMode _ _ txt end doc = easy_display nl_text txt end (reduceDoc doc)
fullRender the_mode line_length ribbons_per_line txt end doc
  = display the_mode line_length ribbon_length txt end best_doc
  where
    best_doc = best the_mode hacked_line_length ribbon_length (reduceDoc doc)

    hacked_line_length, ribbon_length :: Int
    ribbon_length = round (fromIntegral line_length / ribbons_per_line)
    hacked_line_length = case the_mode of
                           ZigZagMode -> maxBound
                           _          -> line_length
My answer:
fullRender OneLiner = easy_display ...
fullRender SomethingElse = easy_display ...
fullRender FullOn = display ...
where best_display = ...
foo = ...
bar = ...
baz = ...
LOL.
Observation: Missed the parameters completely. Missed the fact that the last argument to easy_display was actually a function call.
It was best_doc, not best_display. Didn't even notice the case statement at the end. Nor the type signature. Didn't even have time to understand that the last three lines were just calculating some kind of length. Or that best_doc just calculated the best document and passed it to display.
In short, I didn’t figure out what the function really did.
But I did notice the overall structure:
- do some simple thing for OneLineMode
- a similar simple thing for LeftMode, and then
- some complex thing for the general mode (using some smaller functions)
I inferred that easy_display was for the simple special cases, and display for the general case.
That’s all.
EPAM-like Models of Recognition and Learning
Feigenbaum, E. A., & Simon, H. A. (1984). EPAM-like models of recognition and learning. Cognitive Science, 8, 305-336.
[FAMILIARIZE] begins by sorting an object through the net. If no image is found, an initial image is created containing only the information that was examined along the path to the net leaf. However, if an image exists, FAM checks object and image for differences. If no differences are found, at least to the extent of the (possibly incomplete) information in the …
So, the human mind might create a new category using only the information that it used to examine the new object.
TODO: example?
If differences [between the object and our proposed category] are found, that means a confusion (failure to discriminate adequately) exists, and the discrimination net must be augmented. The strategy of DISC is to modify the discrimination net as little as possible to accomplish the augmentation. The existing net tests are used to the extent possible to accomplish the discrimination. Starting with the root node of the net, the tests are applied to both object and discrepant image. If a test is encountered for which the outcomes differ, which can happen if the image is attached to a n.e.c. branch, the object and the image are each given a new branch appropriately labeled with its test value. Thus, discrimination learning is forced to the highest possible levels of the existing tree before any new subtrees will be added. For an understanding of subsequent sections of this paper, it is important to keep in mind that in the n-ary nets of EPAM III, net growth is biased toward breadth, rather than depth.
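To make that concrete, here is a heavily simplified Haskell sketch (my own toy version, not EPAM's actual representation; all names are hypothetical) of such a discrimination net: internal nodes test one attribute and branch on its value, leaves hold a possibly incomplete image, and recognition is just sorting an object down to a leaf. FAMILIARIZE would then compare the object against the image at that leaf, and DISC would grow the net when they differ.

import qualified Data.Map as Map

type Attribute = String
type Value     = String
type Object    = Map.Map Attribute Value        -- an object: attribute/value pairs
type Image     = Object                         -- a stored, possibly partial, image

data Net = Leaf Image                           -- a leaf holds the image built so far
         | Test Attribute (Map.Map Value Net)   -- an internal node tests one attribute

-- Sort an object through the net to the image it is recognized as (if any).
sortObject :: Object -> Net -> Maybe Image
sortObject _   (Leaf img)       = Just img
sortObject obj (Test attr kids) = do
  v    <- Map.lookup attr obj   -- apply the node's test to the object
  next <- Map.lookup v kids     -- follow the branch labelled with that value
  sortObject obj next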
A Study of Thinking
Examples that a categorization algorithm must explain:
People learn to distinguish conceptually between daylight color film and indoor color film, between different cuts of meat, between fresh vegetables and stale ones, between policemen and subway guards, between detergents and soap flakes, between honest and crooked politicians, between a flow of traffic that permits crossing the street safely and a flow that one should not risk. How may one go about analyzing the learning process that leads to such rational behaviour?
– A Study of Thinking, Bruner et al., 1956
They’ve covered some very real, very “complex” categories that we humans recognize every single day. None of these is a toy example. Some of them hold a lot of value too.
Note that these are categories that I can’t recognize even when I try to. Others I can recognize when I pay attention, such as perfectionism or opportunity costs or resentment.
Virtually all the effective strategies for attaining concepts depend upon the use of some sort of initial focus.
Hmm… We might remember the abstract definition using a concrete exemplar. For example, the book suggests that we use 3 as a reference prime number, noticing that it is divisible only by itself and 1.
The authors suggest that we then form a “typical instance” of the category to help us “keep order” (?). For example, subjects were apparently better able to point to the typical colour of an orange than the range of colours of oranges.
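As a throwaway illustration of pairing the defining attribute with a reference instance (my example, not the book's), here is the "divisible only by itself and 1" test in Haskell, with 3 as the exemplar:

isPrime :: Int -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1]
-- isPrime 3 == True: 3 is the concrete instance we hang the abstract definition on.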
Scope Insensitivity
But what about the billions of other children in the world? Why isn’t it a bad idea to help this one child, when that causes the value of all the other children to go down? How can it be significantly better to have 1,329,342,410 happy children than 1,329,342,409, but then somewhat worse to have seven more at 1,329,342,417?
Observation: It takes me time to differentiate between 1,329,342,409 and 1,329,342,417. I have to read past eight similar digits (“1,329,342,4”) before I can come to the part that actually differs (“09” vs “17”).
Hypothesis: Scope insensitivity <- we can't differentiate large numbers, because they are largely similar and we don't invest the time needed to dig down to the digits that differ.
Hypothesis: If we had a notation that emphasized the differing parts, maybe we would not be so scope-insensitive after all.
TODO: Find a way to differentiate large numbers in a way that appeals to the human mind.
What about scientific notation? Avogadro's number (6.022 x 10^23) focusses on the first four significant digits of a large number. Planck's constant (6.626 x 10^-34 J.s) focusses on the last four decimal digits of a number with 37 decimal digits (or something like that).
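A toy sketch of what such a notation could do (purely illustrative; splitCommon is a name I made up): strip the shared prefix of two numerals so that only the digits that differ are left to compare.

-- Split two numerals into their common prefix and the differing tails.
splitCommon :: String -> String -> (String, String, String)
splitCommon xs ys = go xs ys []
  where
    go (a:as) (b:bs) acc | a == b = go as bs (a : acc)
    go as     bs     acc          = (reverse acc, as, bs)

-- splitCommon "1,329,342,409" "1,329,342,417" == ("1,329,342,4","09","17")

Showing only "09" vs "17" puts the actual difference in front of the eyes, which is what the hypothesis above asks for.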
Experimental Lessons
Lesson: Keep the source and target buffers in separate workspaces so that you don’t copy by staring.