When a human interacts with a complex system, where there is a large amount of information potentially available, there is a basic HCI issue beyond ergonomics, screen design, or choice of the medium of interaction. That is: what information should be selected, given priority, or made easy to access? Quite probably there will not be a static answer to this; it will depend on the situation, or context. An important consideration here concerns the human involved. What is their strategy for dealing with the system? What information is needed to implement this strategy? A designer of a complex system can, of course, ignore these questions, and the frequent result is that information is presented in the same fashion as it is collected: from the many sensors or sources of information to a multitude of displays arranged statically around the control area [Woods, 1991]. This ducking of the cognitive issues can be held at least partly responsible for errors and disasters.
There are further reasons to be concerned with representation. Many of the chapters in this volume involve learning in some way, and it is well recognised in the machine learning community that the representation of the material to be learned crucially affects the effectiveness of learning algorithms, whether symbolic, connectionist, or otherwise. One could therefore say that finding a good representation is a large part of learning.
Clearly, if a human does operate in a contextual way, the contextual divisions and articulation are important aspects of their representation of the task or the system with which they are working. In effect, a context-based approach divides a task into a number of separate micro-representations. In each context, only a limited number of rules need to be considered, and only a limited number of items of information are relevant to these rules. If, for any task, a human does use this contextual approach, then the patterns we recognise in records of human actions are going to be clearer if we take into account the contextuality than if we assume that the whole strategy is monolithic.
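Schematically, such a contextual strategy can be pictured as a small collection of micro-representations, each naming the few variables and rules that matter within it. The sketch below is purely illustrative (the context names, variables, and rules are invented, loosely in the flavour of car driving) and is intended only to show the shape of the structure, not any particular subject's strategy.

```python
# Illustrative sketch only: a contextual strategy as a set of micro-representations.
# Each context attends to a few variables and consults only a few rules.
CONTEXTS = {
    "motorway_cruising": {
        "variables": ["gap_ahead_m", "own_speed_kmh"],
        "rules": [
            (lambda s: s["gap_ahead_m"] < 40, "ease_off"),
            (lambda s: s["own_speed_kmh"] < 90, "accelerate"),
        ],
    },
    "parking": {
        "variables": ["distance_to_kerb_cm", "clearance_behind_cm"],
        "rules": [
            (lambda s: s["clearance_behind_cm"] < 30, "stop"),
            (lambda s: s["distance_to_kerb_cm"] > 20, "reverse_and_turn"),
        ],
    },
}

def act(context, situation):
    """Consult only the rules of the current context; all other variables are ignored."""
    for condition, action in CONTEXTS[context]["rules"]:
        if condition(situation):
            return action
    return None  # null action: no rule of this context fires

print(act("motorway_cruising", {"gap_ahead_m": 120, "own_speed_kmh": 80}))  # accelerate
```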
The dependence of ruliness on representation can be illustrated from common experience. Many people must have played party games, such as the one where a pair of scissors is passed from person to person around a ring, accompanied by a declaration, at each passing, of whether the scissors are `crossed' or `uncrossed'. Most people, confronted by the task of classifying each passing of the scissors as crossed or not, seem to attempt at first to induce a rule, from examples of the reported state of the scissors, based on a representation that includes visible characteristics of the scissors. The amusing part of the game is in seeing how much experience is needed, by those initially naive to the game, to find a representation adequate to induce the (very simple) rule which determines crossedness, perhaps despite misleading actions and clues from those in the know. To the naive player, the examples are unruly, because all tentative rules, induced from a few examples using representations including only obvious features of the scissors, are contradicted by subsequent examples. To those in the know, the examples are perfectly ruly. Several other party games are based around similar learning tasks.
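The point can be made concrete with a toy illustration (the attributes and data below are invented, and the `hidden' attribute is simply a stand-in for whatever actually determines the label): under a representation restricted to the obvious visible attributes the examples contradict one another, while a representation including the determining attribute makes them perfectly ruly.

```python
# Toy illustration: the same examples look unruly or ruly depending on the representation.
EXAMPLES = [
    # (blades_open, point_first, hidden_attr, label) -- all invented
    (True,  True,  True,  "crossed"),
    (True,  False, False, "uncrossed"),
    (False, True,  True,  "crossed"),
    (True,  True,  False, "uncrossed"),
    (False, False, True,  "crossed"),
]

def contradictions(examples, attribute_indices):
    """Count examples that agree with an earlier one on the chosen attributes
    but disagree on the label, i.e. how far the data resist any consistent rule."""
    seen, clashes = {}, 0
    for ex in examples:
        key = tuple(ex[i] for i in attribute_indices)
        label = ex[-1]
        if key in seen and seen[key] != label:
            clashes += 1
        seen.setdefault(key, label)
    return clashes

print(contradictions(EXAMPLES, [0, 1]))  # visible attributes only: contradictions found
print(contradictions(EXAMPLES, [2]))     # with the determining attribute: none
```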
The perceived ruliness of data is also dependent on the method of learning. Perhaps it is because learning is innate to humans that it seems difficult to be aware of the learning processes that we use. In machine learning, in contrast, the algorithms or methods are well-defined and repeatable, although the advantages and disadvantages of different approaches are not yet fully clear, since the subject could be regarded as still at an early stage of development. But there are many examples of academic papers investigating differences between the performance of different algorithms (e.g., [Gams and Lavrac, 1987]), so at least we can safely say that the apparent ruliness discovered in data depends on the learning technique or algorithm.
Having recognised that ruliness is dependent on representation and learning method, there are at least two ways to use this result. First, if we fix the representation, we can compare learning methods by comparing the apparent ruliness of the same data under different algorithms; second, we can compare representations by fixing the learning method and comparing the ruliness of the same data represented differently. It is this second comparison that we will be discussing further in this chapter.
For these, and other practical reasons, the author chose to develop a purpose-built simulation game on a graphics workstation, designed to provide tractable data while attempting to retain relevance to tasks in the real world. The task chosen was nautical mine-hunting, with a ship and a remotely-operated vehicle (ROV) connected by a cable. Subjects needed several hours of learning and practice before developing a reasonable facility at the task. Their actions, mediated through discrete mouse-button clicks, were recorded for analysis. The game had a scoring system, to encourage a uniform appreciation of the task. A more detailed description of this work, and further discussion of most of the issues raised in this chapter, can be found in the author's doctoral dissertation [Grant 1990].
The system-level actions were obvious enough, since the interface had been designed to make the characterisation of actions relatively easy by avoiding analogue input. Situations at the system level were described using the relevant information that went into forming the displays. Cognitive-level actions were devised by following a procedure similar to other approaches such as chunking [Laird et al. 1987] and plan recognition [Davenport and Weir 1986]: it was assumed that if a short sequence of actions occurred frequently, it was probably being treated by the human as a single, compound action. The representation of situations at a more cognitive level posed more problems, however. There were a reasonably large number of different variables in the system, and it seemed clear from first-hand experience that only a small selection of these would be relevant to a particular action. We wanted to know which variables were relevant, so that examples could be prepared in which those variables defined the situations, paired with the actions (which could be null actions).
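A minimal sketch of such frequency-based chunking is given below (this is not the analysis code actually used; the action names in the example log are invented, and the sub-sequence length and frequency threshold are arbitrary).

```python
from collections import Counter

def candidate_chunks(action_log, max_len=3, min_count=5):
    """Count contiguous sub-sequences of length 2..max_len in the recorded action
    stream and return those frequent enough to be treated as compound actions."""
    counts = Counter()
    for length in range(2, max_len + 1):
        for i in range(len(action_log) - length + 1):
            counts[tuple(action_log[i:i + length])] += 1
    return {seq: n for seq, n in counts.items() if n >= min_count}

# e.g. a stream of discrete mouse-button actions (names invented for illustration)
log = ["select_rov", "thrust_fwd", "thrust_fwd", "select_sonar", "ping"] * 10
print(candidate_chunks(log))
```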
At this first stage of experimentation, suitable representations for particular actions were hand-crafted with up to about a dozen variables: on the one hand (the system level), using variables that were explicitly present, and on the other hand (the more cognitive level), aggregating them into higher-level variables that seemed plausible from experience.
When portions of the data had been selected, represented in different ways, and assembled together, the ruliness was assessed in terms of the performance of the rule-induction program CN2 [Clark and Niblett 1989] on the data. Parts of the data were taken as training sets, and rules were induced connecting the situations (as attributes) and the actions (as decision classes). The induced rules were then tested on other data, not part of the training sets, and the degree to which the predictions of the rules matched the new data was recorded.
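The shape of this assessment can be sketched as follows. CN2 itself is not assumed to be to hand, so a decision-tree learner stands in for the rule inducer, and the attribute and action arrays are placeholders; the sketch returns both the overall test accuracy and the improvement over a default rule (here taken as always predicting the most frequent action in the training set).

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def assess_ruliness(train_X, train_y, test_X, test_y):
    """Induce rules from the training set, test them on held-out examples, and
    return (overall accuracy, improvement over the default rule)."""
    learner = DecisionTreeClassifier(min_samples_leaf=3)  # stand-in for CN2
    learner.fit(train_X, train_y)
    overall = learner.score(test_X, test_y)

    # Default rule: always predict the action most frequent in the training set.
    default_action = Counter(train_y).most_common(1)[0][0]
    default_acc = float(np.mean(np.asarray(test_y) == default_action))
    return overall, overall - default_acc
```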
The next interesting analysis used data relating to the ROV speed control. Table 1 shows some summary results from the analysis of one subject's data. From left to right, the different columns show data relating to a sequence of time periods of practice, from earlier to later. From top to bottom, the broad divisions (RS0, RS1, RS2) are between three parallel analyses using three slightly different representations. Within each of these divisions, the `relative' figure is the improvement in predictive power of the rules, over the default rule, as previously explained. The `examples' figure gives the number of examples in the data set, and the `overall' figure is the absolute accuracy of the induced rules at classifying examples in each set of data. The third column's data was used to induce rules, and hence the accuracy of the predictions on that data is artificially good, as the same data served as both training and test data. Neglecting this column, the most important feature of the table is the difference in relative accuracy of the rules induced under the three representations. We can see a clear, if small, increase in the relative accuracy of RS1 and RS2, intended to be more cognitive-level representations, compared with RS0, the lower-level representation, across the figures in the central block of the table. The low relative accuracy figures in the outermost columns suggest that any rules being followed in these periods were substantially different from the rules discovered in the training set. This is consistent with continued learning and development over time.
The research reported in Grant [1990] took this first step. All the information available via the control interface was given a cost, and could be turned on or off at any time. When the subjects progressed to attempting to maximise their scores, they turned off all those parts of the display that they could do without, leaving the ones they needed. From the data collected, it was possible to tell, for every action that was taken, what information was visible at that time. One clear result of this experiment was to show that different people used the sensors differently. Further analysis was needed to discover more about the contextuality of human control.
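For illustration, reconstructing the visible information at each action amounts to replaying the recorded on/off events; the event format below is invented for the sketch and is not the format of the original logs.

```python
def visible_at_actions(events):
    """events: time-ordered (time, kind, payload) tuples, where kind is
    'sensor_on', 'sensor_off', or 'action'. Returns (time, action, visible sensors)."""
    visible = set()
    tagged = []
    for time, kind, payload in events:
        if kind == "sensor_on":
            visible.add(payload)
        elif kind == "sensor_off":
            visible.discard(payload)
        elif kind == "action":
            tagged.append((time, payload, frozenset(visible)))
    return tagged

events = [(0.0, "sensor_on", "sonar"), (1.2, "action", "thrust_fwd"),
          (2.5, "sensor_off", "sonar"), (3.0, "action", "stop")]
print(visible_at_actions(events))
```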
The data analysis then proceeded as follows. The examples were divided into the different contexts, according to the sensors that were visible at that time. Each context had its own selection of variables, assembled from information about the sensors that had been visible, and it was with those attributes that a separate rule induction was performed for each context.
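A sketch of this per-context preparation is given below, under an assumed example format of (visible sensors, situation, action); the derivation of attributes is simplified to one attribute per visible sensor, which is an assumption of the illustration rather than a description of the original analysis.

```python
from collections import defaultdict

def split_by_context(examples):
    """Group examples by the set of sensors visible when each action was taken.
    Each example is assumed to be (visible_sensors, situation_dict, action)."""
    contexts = defaultdict(list)
    for visible, situation, action in examples:
        contexts[frozenset(visible)].append((situation, action))
    return contexts

def context_datasets(contexts):
    """For each context, keep only the attributes derived from its visible sensors,
    ready for a separate rule induction per context."""
    datasets = {}
    for visible, rows in contexts.items():
        attributes = sorted(visible)
        X = [[situation[a] for a in attributes] for situation, _ in rows]
        y = [action for _, action in rows]
        datasets[visible] = (attributes, X, y)
    return datasets
```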
Some of the results are reproduced in Tables 2 and 3. In order to avoid the problem of using the training set as a test set, the data from each time period (C--H) were divided into two, simply by allocating alternate examples to the two sets (e.g., C0 and C1). The subscript 0 sets were used for training, and the subscript 1 sets for testing. Each cell of these tables gives, first, the overall accuracy of the rules induced from the training set when tested on the examples of the test set (``overall'' in Table 1) and, second, a positive or negative figure indicating the difference between this overall accuracy and the accuracy of the default rule, as explained above (``relative'' in Table 1). The number of examples in each set is given with its name: thus, the set C0 had 60 examples. The results were quite striking, though they amounted to less than a good model of human skill. Some contexts were highly ruly (e.g., Table 3), and some were less so, down to some (e.g., Table 2) where the learned rules actually performed worse than the default rule. If the data in these tables are analysed all together, the performance figures fall between the ranges of performance typical of the tables taken separately.
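The mechanics of the split and of the cell figures can be sketched as follows (a minimal illustration, not the original analysis code); the `overall' figure would come from testing the induced rules as in the earlier sketch, and the `relative' figure is that accuracy minus the accuracy of the default rule.

```python
from collections import Counter

def alternate_split(examples):
    """Allocate alternate examples to the training and test halves (e.g. C0 and C1)."""
    return examples[0::2], examples[1::2]

def relative_accuracy(overall, train_actions, test_actions):
    """'Relative' figure: overall accuracy minus the accuracy of the default rule,
    i.e. always predicting the action most frequent in the training half."""
    default = Counter(train_actions).most_common(1)[0][0]
    default_acc = sum(a == default for a in test_actions) / len(test_actions)
    return overall - default_acc
```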
Clearly there were great differences in the character of the subjects' behaviour between these different contexts, which means at least that the contexts bear a relationship to some important feature of the human way of doing things. What we can see here, then, is that the concept of context, as used here, is confirmed as significant by an analysis based on ruliness. This can be seen as related to Rasmussen's concepts of skill-, rule- and knowledge-based behaviour in process operators (e.g., [Rasmussen 1983]). If a set of data for one context appears ruly in terms of low-level variables, then we might imagine it as being a context in which the operator is at a skill- or a rule-based level. In contrast, where a context is unruly in terms of simple variables, there is a choice of conclusion. Either there are pattern-based variables present which the learning algorithm is incapable of detecting, or there is higher-level (knowledge-based) processing going on which, again, is not uncovered by the learning algorithm.
The methods reported here are far from ideal, and, as we look at the problems, we can see opportunities for further progress along these lines of investigation. To start with, the information-costing interface is a special case, and makes the task different from the task with a more ordinary interface. How could we adapt the methods here for use with a wider range of interfaces? One possibility is to extend the ruliness analysis so that it serves also to pick out the different contexts in the first place, as well as confirming their existence, as is described above. This remains an area for future research.
Another point mentioned above concerned analogue channels of information and control. In order to work towards a human representation, and to assess ruliness, we have to be able to find some symbolic, or at least discrete, variables on the basis of the analogue channels that are measured. When looking at car driving, for example, we can describe the actions at various levels of granularity, and it remains a taxing problem to relate these different levels together effectively. Perhaps here we should not be looking at either symbolic or connectionist models alone, but rather together. We would expect different aspects of human skill to show up at different levels of granularity. When looking at detailed physical movements, we should consider how we can model, or allow for, physiological and psycho-motor aspects of human skill; at a larger granularity, there is the more purely cognitive side, exemplified by conscious reasoning. Clearly no one current methodology is optimally suited for modelling this whole range of aspects of human skill.
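To make the discretisation step concrete, a minimal sketch is given below; the channel, the thresholds, and the symbolic labels are invented for the illustration, and in practice the bins themselves would need to be chosen, or learned, to suit the human representation rather than fixed arbitrarily.

```python
def discretise(value, thresholds, labels):
    """Map a continuous reading onto one of a small set of symbolic values.
    thresholds must be sorted ascending and len(labels) == len(thresholds) + 1."""
    for threshold, label in zip(thresholds, labels):
        if value < threshold:
            return label
    return labels[-1]

# e.g. turning a continuous steering-wheel angle (degrees) into a discrete variable
print(discretise(-12.0, [-10, -2, 2, 10],
                 ["hard_left", "left", "straight", "right", "hard_right"]))
```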
Looking ahead further, the initial design of complex tasks and interfaces could be helped by an appreciation of the ways in which humans habitually organise tasks. This would go forward from the `Model Human Processor' of Card, Moran and Newell [1983], to model other aspects of human ability. There would be a possibility of focusing much investigation on the characteristics of the contexts into which humans habitually divide tasks, or, to put it another way, the contextuality of people's representations. The questions one would be tackling would be: what structuring in the task makes it humanly possible to execute; and what implications does this have for the design of such tasks?
The modelling of human learning is perhaps more of a long-term interest in HCI, concerning the design and efficiency of training, and how difficult a new system is to learn, with or without externally structured training. But it is an open question, to what extent human learning can be modelled, without being corroborated by a clear knowledge of what humans have actually learned in particular situations. Traditional examinations or interviews scratch the surface; but methods building on discovering human representations could give a more thorough view, not reliant on verbal reports of uncertain veracity.
Card, S. K., Moran, T. P., and Newell, A. (1983). The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ.
Carroll, J. M., Kellogg, W. A., and Rosson, M. B. (1991). The task-artifact cycle. In: Carroll, J. M. (ed.), Designing Interaction: Psychology at the Human-Computer Interface. Cambridge University Press, Cambridge.
Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4): 261--283.
Davenport, C. and Weir, G. (1986). Plan recognition for intelligent advice and monitoring. In: Harrison, M. D. and Monk, A. F. (eds), People and Computers: Designing for Usability, pp. 296--315. Cambridge University Press.
Gams, M. and Lavrac, N. (1987). Review of five empirical learning systems within a proposed schemata. In: Bratko, I. and Lavrac, N. (eds), Progress in Machine Learning: Proceedings of EWSL-87, Bled, Yugoslavia, pp. 46--66. Sigma Press, Wilmslow.
Grant, A. S. (1990). Modelling Cognitive Aspects of Complex Control Tasks. PhD thesis, Department of Computer Science, University of Strathclyde, Glasgow.
Laird, J. E., Newell, A., and Rosenbloom, P. S. (1987). SOAR: An architecture for general intelligence. Artificial Intelligence, 33: 1--64.
Rasmussen, J. (1983). Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Transactions on Systems, Man and Cybernetics, SMC-13: 257--266.
Woods, D. D. (1991). The cognitive engineering of problem representations. In: Weir, G. R. S. and Alty, J. L. (eds), Human-Computer Interaction and Complex Systems. Academic Press, London.