©1990, 1995

7.2.7 Deriving rules for contexts

Another aspect of the validity of the contexts derived in the analysis is whether they can themselves be predicted from the variables in the simulation. This would be crucial to simulating the performance of a human operator, since before the rules from a particular context can be used, one must first determine which context applies. At the same time, the question arises whether we can measure in some way the difference between the two subjects' context structures.

To these ends, the final intervals of both subjects were each divided into three sections, and data was prepared with many of the likely variables as attributes, and the contexts that we have used above as class values. The data sets were E1, E2, and E3 from MT's interval E, and H1, H2 and H3 from AJ's interval H. Both context structures were used with data from both subjects: that is, the data was put into both the representation derived from that data, and also what should have been a less well-fitting representation from the other subject. For each of the two representations, rules were induced on each of the 6 sets of data, and these rules were tested on each of the 6 sets.
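The train-on-each, test-on-each procedure just described can be sketched as follows. This is a minimal illustration only: the learner is a stand-in (a simple single-attribute, 1R-style rule), not the induction algorithm used in the study, and the function names and the data layout (a list of attribute-dictionary and class pairs per data set) are assumptions made for the example.

```python
from collections import Counter, defaultdict

def induce_one_rule(examples):
    """Stand-in learner (1R-style): choose the single attribute whose
    value -> majority-class table makes the fewest training errors."""
    default = Counter(cls for _, cls in examples).most_common(1)[0][0]
    best = None
    for attr in examples[0][0]:
        counts = defaultdict(Counter)
        for attrs, cls in examples:
            counts[attrs[attr]][cls] += 1
        table = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(table[attrs[attr]] != cls for attrs, cls in examples)
        if best is None or errors < best[0]:
            best = (errors, attr, table)
    _, best_attr, best_table = best
    # Unseen attribute values fall back to the overall majority class.
    return lambda attrs: best_table.get(attrs[best_attr], default)

def accuracy(rule, examples):
    """Fraction of examples whose class the rule predicts correctly."""
    return sum(rule(attrs) == cls for attrs, cls in examples) / len(examples)

def cross_test(datasets):
    """Induce rules on every data set and test them on every data set.

    Returns a dict mapping (training set, test set) to accuracy."""
    rules = {name: induce_one_rule(ex) for name, ex in datasets.items()}
    return {(train, test): accuracy(rules[train], datasets[test])
            for train in datasets for test in datasets}
```

With the six sets E1 to E3 and H1 to H3 as keys, `cross_test` would yield the 36 cells of one table; running it once per representation would give the two tables.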


Table 7.23: Testing representation rules using contexts from AJ


Table 7.24: Testing representation rules using contexts from MT

The results of this analysis are given in Tables 7.23 and 7.24. The first point to be recognised is that the leading diagonal of these tables must be discounted, since the training set and the test set are the same, giving much higher accuracy values.

The next point to consider is the comparison of the figures from the top left and bottom right quadrants (where the training set and test set come from the same subject) and figures from the top right and bottom left quadrants (where the training set and test set cross between the two subjects). On the whole, the figures for the crossed training and test sets are lower than for the homogeneous case. This suggests that the rules for the contexts differ between the two subjects, even in this case where the same context structure is being used to process the original data.
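The quadrant comparison can be made explicit with a small calculation. The sketch below assumes an accuracy table held as a dictionary keyed by (training set, test set) and a mapping from each set name to its subject; both layouts and the function name are assumptions for illustration, and no figures from the tables are reproduced here.

```python
from statistics import mean

def quadrant_means(acc, subject_of):
    """Mean accuracy for same-subject and for crossed train/test pairs.

    acc maps (train_set, test_set) -> accuracy; subject_of maps a set
    name to its subject. Leading-diagonal cells (train == test) are
    discounted, as in the text."""
    same, crossed = [], []
    for (train, test), value in acc.items():
        if train == test:
            continue  # training and test set identical: discount
        if subject_of[train] == subject_of[test]:
            same.append(value)
        else:
            crossed.append(value)
    return mean(same), mean(crossed)
```

A same-subject mean noticeably above the crossed mean would correspond to the pattern described above, though with so few cells a difference of means is only a rough summary.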

The third point worthy of consideration comes from a comparison between the two tables. Though it is difficult to pick out any very marked differences, it would appear that where the training set is prepared with its proper representation (H with AJ, E with MT), the distinction between the performance of own-subject test sets and other-subject test sets is more marked. Thus, in Table 7.23, having discounted the leading diagonal element, there is no clear difference between the performance of test sets on the training set E1, which is from MT's data. For the H training sets, however, there is in each case a marked difference between the E and H test set performance. This could be due to an appropriate context structure leading to clearer context selection rules, which in turn lead to clearer distinctions between individuals.

Looking at the table again, one sees that the performance with E2 as test set is consistently lower than for the other E test sets (suggesting an unrecognised cause operating), and this tendency contributes to the effect just described as the third point. But even discounting the E2 test set results, there is still a slight trend in the way described. If one discounts E2 as a reliable test set, then Table 7.24 shows the same pattern as the other table. But the figures supporting this third point are far from conclusive, so more evidence would be needed to establish this as an effect.

As we have already seen that different contexts can differ markedly from each other, it also makes sense to look at the performance of the rules induced for predicting each context separately. This was done for both subjects, and the results are presented in Tables 7.25 and 7.26.


Table 7.25: Accuracy of rules for contexts for AJ, for the last interval


Table 7.26: Accuracy of rules for contexts for MT, for the last interval

These two tables show to what extent rules can be constructed to predict the context itself, from the attributes included in the analysis. It is important to note from these tables that the overall accuracy figures in previous tables do not consistently reflect the predictability of individual context use. For some contexts, their selection would appear to be highly rule-based, e.g., both subjects' ship search contexts. It is interesting that the ship search contexts are so highly predictable, despite the fact (above) that within the context no good predictive rules could be discovered. Other contexts are not so rule-based. This could be due to a number of reasons.

  1. They could be fictitious contexts thrown up by the analysis, having no foundation in human cognition. We have already raised this doubt about MT's ROV non-graphic context.
  2. They may not be selected in a systematic way. For example, one may suspect that the General Position Indicator is used sporadically. This would account for the very low accuracies for the GPI contexts.
  3. The analysis may not have included the attributes on the basis of which they are selected. The general cable contexts, for example, have a low predictability, which might be surprising, given the high predictability of actions within the context. This would be a good candidate to stimulate searching for further attributes to govern the context selection process.
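The gap between overall accuracy and the predictability of individual contexts can be illustrated with a short sketch. It assumes that per-context accuracy means the fraction of examples of each context that the rules classify correctly; the exact measure behind Tables 7.25 and 7.26 may differ, and the function name is hypothetical.

```python
from collections import Counter

def per_context_accuracy(actual, predicted):
    """Fraction of the examples of each context that were predicted correctly.

    actual and predicted are equal-length sequences of context labels."""
    totals = Counter(actual)
    correct = Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            correct[a] += 1
    return {context: correct[context] / n for context, n in totals.items()}
```

For instance, if eight of ten examples belong to a frequent context and the rules always predict that context, overall accuracy is 0.8 while the rarer context scores 0.0, which is the kind of masking noted above.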

7.2.8 Further analysis of the ROV data

The ROV non-graphic context posed the question of whether this was a fictitious context, or alternatively a real context in which there were no straightforward rules inducible on the basis of the attributes chosen. One test for this was to attempt to make the granularity of the contexts finer. This was done by setting the absorption distance to 1 rather than 2 in the process of construction of the contexts. As a result, more putative contexts were produced, and these then served as the basis for another similar process of rule induction and testing. This revealed little difference from the previous analysis. The same contexts were still dominant, with the same general patterns of results: the other contexts were generally low in examples, and offered no further coherent insight into the context structure. This result also serves to cast doubt on the value of pursuing still finer context divisions.

Perhaps a more challenging question was raised by the ROV visual context for MT, and the ROV visual and ROV approach contexts of AJ. We have here what look like well-defined contexts, yet the overall performance of induced rules is not as high as one might hope for, looking at the performance of rules in other contexts. Why not? One possible reason worth investigating was that during ROV manoeuvring, there are three concurrent tasks: to deal with speed, direction, and height. It may be that these tasks interfere with each other, because the human controller cannot attend to all at once, and that therefore at times more than one action may become appropriate according to simple rules. But the human will only be able to deal with one at a time, and any set of combined rules may predict either, but cannot predict both simultaneously.

A method of testing this is to separate out the control actions for the three different sub-tasks, and to see how rules induced for each sub-task's actions separately (together with null actions) compare in performance with the rules for all the actions together, which we have already discussed. Because null actions are only included when there is a reasonable thinking break, it is plausible to suppose that this process would avoid the potential clashes, although of course it cannot do anything about the extra fuzziness introduced by the actions having had to be delayed. This analysis of the sub-tasks turns out not to be strongly suggestive of any particular explanation of why the overall accuracy figures for the ROV contexts are not very high. It is given in Appendix C.
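The separation of the data by sub-task can be sketched as below. The action names and the null-action label are hypothetical placeholders, since the actual effector names are not listed in this section, and the data layout (pairs of attribute dictionary and action) is likewise an assumption.

```python
NULL_ACTION = "null"  # hypothetical label for a recorded null action

# Hypothetical action names for the three concurrent ROV sub-tasks.
SUBTASK_ACTIONS = {
    "speed": {"faster", "slower"},
    "direction": {"turn_left", "turn_right"},
    "height": {"rise", "dive"},
}

def split_by_subtask(examples, subtask_actions=SUBTASK_ACTIONS,
                     null_action=NULL_ACTION):
    """Return one data set per sub-task, holding only that sub-task's
    actions together with the null actions."""
    return {
        task: [(attrs, action) for attrs, action in examples
               if action in actions or action == null_action]
        for task, actions in subtask_actions.items()
    }
```

Rules would then be induced on each of the three reduced sets and their accuracy compared with that of rules induced on the full set.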

In an area so barely explored, it cannot be doubted that there must be other methods of analysis which have not been pursued here: further discussion is in the next section (7.3).

7.2.9 Verbal reports of task performance

At this point we shall turn to verbal reports both as a means of explaining some of the findings here, and of highlighting some of the problems, to be discussed in the next section (§ 7.3). For both AJ and MT, on the same day as their last trial, the author and the player discussed a replay of their final, highest-scoring run, and this discussion was recorded on audio tape. This replay was a version using the expanded file, with the facility to stop, go slowly or fast, backwards, or skip forwards or backwards.

This study is not primarily a study of verbal data, and therefore far less than a full analysis of the verbal reports is offered here. We will rather pick out certain points that are relevant to the general issues under consideration. The extracts below are quoted as nearly verbatim as possible, because in most cases the subjects did not (and perhaps could not) give concise, accurate accounts of the rules they were using. In the extracts, "I" stands for the author/experimenter.

7.2.9.1 Distinguishing contexts where there is no difference in sensor usage

One of the potential failings of the method of analysis described is that it will not distinguish contexts that have the same sensor usage. An example of such an undistinguished context is the start, described by both subjects.


MT: The first objective is to try to hit the red square at roughly, to go through the corner axis of the square I'm aiming for that point. I start off by just, er, aiming to go full ahead then I move onto my display screen and switch off the, er; fix the ship in the centre, and reduce the scale by double the amount (whatever it is). This sets up the screen for the right sort of like distance I'm going to be using now when I want to be, er, when I want to use the screen. I would then --- that's a reasonable amount of time to actually come back and start turning the ship then: the ship's going a sufficiently amount, er, speed forward, then start turning it to the port, so it's going to actually hit this, aiming at round about 300 degrees to be able to hit it in the right direction.

AJ: Go into full ahead, I want to get as near to the area as possible. I know roughly the direction of the area.

I: So you do that of course without looking at anything.

AJ: Yes; which is a bit unfair; you should change the area each time. Right, now, checking the heading, see how far I've gone, 340, that's fine, I don't have to adjust that.

I: But what wouldn't be fine? But you've gone centre rudder there I notice.

AJ: Centre rudder, yes. 340, that's OK, that's a fine bearing. If it was going towards, er, if it was still about 350 say, I'd want to have a wee bit more port. Yeah, I've just remembered, I want to get the position indicator ready, in case I have to look ...


This could well be regarded as a separate context, since special rules apply that do not apply anywhere else. But the sensor usage is fundamentally the same as for the general ship searching: that is, no sensors on all the time. A more subtle approach would be needed to distinguish this context from other similar ones on the basis of the players' actions and information usage.

7.2.9.2 High-level concepts in ship searching

We noted above the lack of effective rules coming from the process of rule induction, for ship searching contexts. This is not surprising, given the kind of high-level concepts employed by the subjects describing their searching strategy.


AJ: What's my strategy? I usually just keep going along, um, the bottom half.

I: Yes. So where roughly abouts?

AJ: Say, about the middle of the bottom half.

I: About two and a half squares up from the bottom? 250 metres up from the bottom?

AJ: Three, three I'd say.

I: Yes, 300 metres up from the bottom. So you go along the bottom and back round the top, do you?

AJ: Yes. But usually it never works that way.


MT: The next bit I'm looking for is the easterly direction, and when it's about 600, what I intend to do then is to turn it North, and hit 0 degrees and just go North, bring it round south, north and south, and then back in. It's a pattern to follow through the, er, the maze, the red maze you give us. It adds some rules and directions to where I'm going within there, rather than search aimlessly.

These strategies would be very difficult to discover from the data, and without them we cannot very well make sense of the decisions taken in this context.

7.2.9.3 Using information from a combination of sensors

The discussions brought out the fact that some sensors were used together, to form a new compound quantity, which seemed more likely to figure in the rules. Here is an example of a quantity that was not included in the analysis, and could therefore be partly responsible for the fact that the rules generated were less than optimal.


MT: ... look at the range. I should have also looked at the height, or the depth.

I: Yes, the height.

MT: And decided on when I want to actually thrust down to, to get there.

I: Do you feel yourself making some sort of intuitive judgement of angle, on the range and height together?

MT: Yes.

I: Right, and when do you --- have you formalised that in your head, or is that just a sort of vague idea?

MT: I don't want to be diving too deeply, er, by being too close. The problem is, you end up losing the vehicle underneath you ...


There were other instances in the discussions which could have been taken to imply that certain quantities were playing a part in operational rules, which were derived from the quantities displayed, rather than being displayed directly.
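A derived quantity of this kind could be added to the data as an extra attribute before rule induction. The sketch below computes an angle from range and height, following the "judgement of angle" mentioned in the extract. It assumes that range is a horizontal distance and height a vertical one, in the same units; the attribute names and the function names are hypothetical.

```python
import math

def dive_angle_degrees(range_m, height_m):
    """Angle below the horizontal implied by a horizontal range and a
    vertical height, in degrees (assumed geometry, see lead-in)."""
    return math.degrees(math.atan2(height_m, range_m))

def with_dive_angle(attrs, range_key="range", height_key="height"):
    """Return a copy of an attribute dictionary with the derived
    'dive_angle' attribute added; the original is left unchanged."""
    derived = dict(attrs)
    derived["dive_angle"] = dive_angle_degrees(attrs[range_key],
                                               attrs[height_key])
    return derived
```

A steep angle corresponds to being close to the target relative to the height, which is the situation MT describes wanting to avoid.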

7.2.9.4 Verbal reports of context structure

Both subjects were asked explicitly how they would describe the structure of the task in terms of phases. Subject MT came up with approximately the following outline.

  1. Startup.
  2. Hunting phase. When a mine is found, work out if it's obtainable within the desired path.
  3. Slowing down. Includes consideration of direction for next movement.
  4. Stopping phase.
  5. ROV location (turning).
  6. Approach to the mine. Check what it is.
  7. Slowing down and stopping: fine manoeuvring.
  8. Recover ROV.

There were also a few non-standard situations that were recognised as having separate rules.

Subject AJ's reported outline pattern can be summarised as follows.

  1. General search pattern.
  2. Approach to target.
  3. ROV handling.
  4. Pulling in the ROV and restarting the ship.

Getting stuck in the mud (on the sea-bed) was another obvious separate context, as indeed were other recoveries from mistakes.

The phases mentioned by the two subjects have some similarities with the contexts produced in the analysis described earlier. The number of them is comparable, and some of them can be identified with one of the analysed contexts. However, they are not clearly identical, either with each other or with the analysed contexts, and this casts doubt on the idea that the context analysis procedure is perfect.

7.2.9.5 Conscious changes in strategy or tactics

Both subjects reported recent changes in the way they performed the task. An example of what might be called a strategic change was given by MT.


MT: The last 3 turns, probably from 8th August, or the go before then, there's been, er, a conscious switch in the rules that's been used, to generate the strategy used for finding the ships, pointing the ships. The direction to --- the rules to go up and down --- there's various mines been left around, right on the edge, which I've not been getting, because I've been wandering from one mine to the other, which meant I'd come back to the base,

I: And you wouldn't have finished, yes.


At a more tactical level, AJ reported having just started to use the "turn" effectors, where he had previously exclusively used the "kick" effectors to change direction. He also said that he had recently begun "going by feel", rather than, presumably, going by conscious rules.

Here is an important point for the experimental methodology. Even after 20 or 30 hours of practice on this task, the players' performance was still in a state of flux, and hence, since stable rules would be easier to discover, a longer period of practice would be better. This flux is also consistent with the observation, for some of the tables above, that the rules induced for one time interval performed much less well when tested against data from a different interval. Ideally, an experiment such as this should be long enough for the rules to stabilise, which would mean both that subjects did not report recent changes, and that rules induced on one interval performed equally well on neighbouring intervals.

An interesting and important additional point is that neither subject reported recently changing their view of the structure of the task, in terms of stages or contexts.

7.2.9.6 Other points

Another factor seen as leading to change in task performance was the experience of recent problems. MT talked a lot about confidence, how it was lost, and how this affected the tactics. AJ reported not taking a certain action because of recent experience of failure. Changes in tactics for these reasons would also be reflected in more poorly performing rules being induced from intervals including such changes.

Among the many other interesting facets of the discussions, most of which are less directly relevant here, one was clearly apparent: the two subjects differed in many aspects of the way they performed the task.
