1 Introduction
Everyday activities range from the easy and straightforward to the
difficult and complex. A model designed to cover simple activities
may not be adequate to explain more complex ones, whereas a more
elaborate model could well cover simpler activities as well as more
complex ones.
Tasks done using, or with the aid of,
computer systems also have a range of complexity.
In the study of human-computer interaction (HCI), it is not surprising that
the aim of modelling different human tasks has given rise to
models of differing capabilities. A model of the cognition underlying
human text-editing tasks may inform better HCI design of text editors,
but it is less likely to inform the design of interfaces for tasks
of substantially greater complexity, such as process control, traffic
control, or the management of large organisations.
Current models of cognition do not deal fully with the
cognition underlying the human performance of such complex tasks.
Part of the purpose of this paper is to draw attention to this,
and to identify some of the missing parts of the models.
Extending the range of models is also important because,
if we are to develop principles to inform HCI design for complex tasks,
we should be able to base the principles on models that successfully
cover the phenomena that have already been widely observed and discussed,
for example by Bainbridge
[2],
[3],
Rasmussen
[8]
and Woods
[11].
New theoretical developments could help in the making of
models that could ultimately be used in HCI design for complex systems.
In the belief that some aspects of everyday life
(such as social and family interaction, and household management)
are at least as complex as process control tasks, here we choose to
look at the problems of process control.
This paper first draws attention to phenomena in process control
that pose particular problems for current models.
Here, we use the example of protocols from steel
process operators, gathered by Bainbridge
[2].
The difficulties are then spelled out for some examples of current
models of cognition.
A new idea for modelling is then proposed, based around the idea of
an immediate human cognitive task context. This draws on ideas
from previous models, but pays specific attention to the transition
between contexts.
2 Characteristics of a complex task
Much has been written on the nature of complex tasks from the HCI
viewpoint (see references given above). We will here focus on a
relatively early example of this kind of study, that of Bainbridge
[2]. The reason for this is that Bainbridge
reproduces a long protocol, and on the basis of the protocol data
constructs a model of some of the
cognition behind the task actions. This is probably one of the fullest
accounts of the analysis of a verbal protocol from a complex task.
Protocols do not directly reveal the operator's knowledge structures, in
the sense of naming the mental structures and their contents.
The contrary belief, that they do, was previously held by some
psychologists, and is called
classical introspectionism by Prætorius and Duncan
[7].
Instead, we must infer mental phenomena from the protocols.
To illustrate the nature of this inference,
we shall start with a protocol extract.
In Bainbridge's experiment, the subjects were asked to `think aloud' or
`talk about everything you think of'.
To give a short extract
(not selected for any particular purpose)
from the protocol in Bainbridge
[2]:
    I'm going to have to
    I'm going to really have to, yes, manipulate on
    I'll leave B and C on full tilt
    now I'm going to have to manipulate one of these
    if I want to cut down
    if I'm going to go over it
    4 minutes and I've used 6
    I've got A oxidising
These kinds of protocol phrases do not make a great deal of sense out of
context, and one cannot with any degree of certainty infer the
characteristics of process control tasks directly from them.
Bainbridge carefully analysed the patterns to be found in the protocols,
looking for recurrent routines or sequences of actions able to
explain the phrases. She admits that this is impossible to do
in a completely formal way, but she was attempting to be parsimonious
in the explanatory framework, while at the same time taking account of
human limitations in working memory.
For our purposes, it is more important that Bainbridge shows
such analysis to be possible than whether her
particular analysis is the best possible.
The fact that such an analysis is both extensive and coherent
supports the idea that phenomena of the type described occur.
The model that Bainbridge then constructs is able to account for
the majority of the protocol phrases.
At the highest level, the model consists of interrelated
``routines'' and ``sequences''.
These specify the information sought and used by the human, the basis
for the decisions taken, and the conditions under which the subject
moves to another routine or sequence.
These are seen as the largest units of cognitive behaviour in this
particular task that occur as a whole.
The most important aspect of Bainbridge's analysis, for our present
purposes, is her identification of some characteristics
of her routines and sequences.
Her findings are consistent with the findings in another experiment
using a task of probably similar complexity, which
we shall now briefly consider.
In this other study
[5],
subjects were given a dynamic control task
which involved simulated naval mine-hunting.
The interface in this task
had a facility for turning the information displays
on and off, and there was a cost (in terms of game score)
associated with the visibility of any piece of information.
When the subjects had sufficient practice, they were able
to perform the task using only a small number of information
sources at one time. These information sources were used regularly
at particular stages of the task, and could be seen as supporting
the rules for action that were employed during that stage.
The common point in the analysis of these different studies
was the identification of cognitive structures that are compatible with
human limitations, and that can be invoked to explain at once both
the information needs of the task in terms of immediate decisions,
and the information needs in terms of decisions to change context.
Although other models have described the division of human
activities into separate modules (under a variety of names),
none has convincingly integrated the explanation of
both this division and the movement between contexts, in a
description of a complex task, where contextuality
is central to the human strategy.
3 Other models and their lack of perfect match
The aim of this section is not primarily to review previous models,
but only to point out briefly, for a few models,
where it appears that they do not easily account for
the kinds of cognition attributable to process control.
SOAR
SOAR [6]
is a well-known model of general intelligence,
although it is not primarily designed as a model of human cognition.
It is based on clearly-expressed hypotheses about the structure
of an architecture capable of general intelligence, and these hypotheses
have certain implications for the way in which SOAR would be capable
of modelling cognition in a task like process control.
The hypotheses of goal structure, problem space and universal subgoaling
map out a problem-solving strategy in which the context can be identified
with the current
goal, and on this may depend the problem space and the operators that
are appropriate. However, this goal or context structure is of limited
flexibility. It appears that the goal or context can only be changed
in two ways: firstly by success or failure, back to the parent goal;
and secondly by generating a subgoal. This discounts the possibility
of switching context in a variety of ways, without return to the
former goal or context. To connect this with everyday life,
surely the experience of being interrupted and
losing track of what one was doing must be nearly universal.
In process control tasks, it is important to allow for interruptions.
An alarm must take one out of one's present context,
and although memory of the context that was left may remain for a while,
there is no guarantee of returning to it.
Another hypothesis of SOAR is to do with control knowledge.
According to this, ``Any decision can be controlled by indefinite
amounts of knowledge, both domain dependent and independent.''
This invites models of process control which violate established
human limits on attention, memory and speed of processing.
ACT* and PUPS
A useful summary paper by Anderson
[1] takes us through
the essential features of ACT* and PUPS, where they are presented
as theories of learning, rather than as theories
of how knowledge is put into action.
Two questions follow from this: firstly,
can these theories learn the structures that
are suggested by empirical studies of complex tasks?
And secondly, would they learn them from the experience
that we could expect them to work from?
We may accept that
it is possible to create a context-like task structure
using a production system, simply because in principle
anything that can be modelled at all can be done with
production systems. A more important practical
question is whether it can be done conveniently.
There must be room to doubt this
in the case of ACT* and PUPS, since there is no explicit
provision for such a structure in Anderson's proposed
memory systems, and no detailed theory of how knowledge is used
in practice, such as could be applied to complex tasks.
MOPs
Schank's MOP (memory organization packet) theory
[9] is constructed primarily
to deal with natural story fragments, as was its predecessor,
Schank and Abelson's Scripts
[10].
According to Schank
[9], MOPs serve to sequence scenes, such
that if you are in a certain MOP, then you will a fortiori know
what scene will come next. But he does not discuss attention, and the
way in which a MOP would be chosen in everyday life at a particular
time.
General points
Bartlett's theory of schemata
[4]
is a theory of memory, rather than of learning or action.
Theories of learning rest on theories of memory, since what is
learned must be specified in terms of what is stored and remembered.
It is equally difficult to imagine a theory of action without a theory
of memory.
Therefore, theories of memory will constrain both theories of learning
and of action. One of the claims made in this paper is that current theories
are not well-matched with the apparent phenomena of process control and
other complex tasks. What is proposed below amounts to an enhancement of
a theory of what is stored in memory, together with a theory of action
closely bound to it. It also invites a theory of learning, but this is not
yet developed.
4 An outline model of human cognitive contexts
The situation that is described most easily by the model
is of a person engaged in a task (particularly a demanding task),
and for whom, at any time, there are a small
number of appropriate decisions or actions.
This person will be attending to information corresponding to
those decisions that are relevant to that situation, and perhaps
also other information.
This is most clear in well-practiced tasks, where the human has learned
both how to perform the task, and what information is needed for it.
When we look at less practiced tasks, or novice behaviour, this is much
less clear. So, for clarity of exposition, we here start by
considering a well-learned task.
Well-learned complex tasks are likely to have well-defined stages,
and each stage will have its own relevant information and
decisions to be made, as outlined above. What we here
(naturalistically) call a `context' may then be thought of as a knowledge
structure corresponding to a stage of a task, including those
cognitive constituents that are peculiar to that stage, such as
the information requirements, the appropriate decisions, etc.
The context as here described would then be an obvious candidate for
something that is stored in long-term memory, and recalled as a whole,
as a viable unit of task strategy appropriate to some stage of some
task.
If tasks are organised into contexts, the question arises, how are
contexts selected or moved between?
One apparently obvious possibility is to have a method (or function)
mapping situations onto contexts, in which case the human would
monitor the appropriate variables in the situation,
so that the correct context could be determined.
But there is no compelling reason why contexts need to be fully
determined by the situation existing at a particular time only:
they could have a dependency on history as well.
A real (rather than ideal) thermostat is an obvious example:
there is always a small interval between the switching-on temperature
and the switching-off temperature, and within these limits one has to
know the history of the temperature or the state
to determine the present state of the thermostat.
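The thermostat example can be made concrete in a few lines of code (an illustration, not from the paper): because of the switching interval, the device's present state cannot be read off from the current temperature alone, but depends on its history.

```python
class Thermostat:
    """A real thermostat with hysteresis: its state depends on history,
    not just on the current temperature reading."""

    def __init__(self, on_below=18.0, off_above=20.0):
        self.on_below = on_below    # switch heating on below this temperature
        self.off_above = off_above  # switch heating off above this temperature
        self.heating = False        # stored state: the device's 'history'

    def update(self, temperature):
        if temperature < self.on_below:
            self.heating = True
        elif temperature > self.off_above:
            self.heating = False
        # Between the two thresholds the state is left unchanged: two
        # thermostats both reading 19.0 degrees may be in different states.
        return self.heating
```

Two instances reading the same in-between temperature can thus be in different states, depending only on what they saw before.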
Furthermore, the way in which people sometimes get disoriented in a
complex task suggests that there is no simple function mapping
observable variables onto contexts. If there were, it would permit rapid
and effective reorientation, which manifestly often fails to happen.
There is a further objection to theories that have an independent function
mapping situations to contexts. What variables should be monitored
by such a function at any particular time?
There are two possible answers.
Firstly, all variables that could cause a context shift are monitored
concurrently. This is difficult to reconcile with the limitations
of human ability, though it is not unreasonable to assume that some few
variables are monitored thus.
Secondly, the variables determining context shifts at the lowest level
are selected according to a higher-level context.
This would mean that the same question would reappear at a higher level.
What variables determine the higher-level `meta-context', as we might call it?
Neither of these answers appears satisfactory.
But the information necessary to guide the human into the appropriate
context must be monitored and processed somehow.
Sometimes, the structure of a task
may permit this information to be channelled through a medium, or modality,
different from the information necessary for action or decision-making
within the current task context; but this is by no means universal.
The channel used for the current context could also warn of a change
of context --- particularly, for example, when the value of a certain
variable goes outside the normal bounds appropriate to the context.
It then makes sense to consider the mechanism for
determining the next context to be part of the current context.
It is not necessary to prescribe whether this mechanism uses rules, patterns,
triggers or whatever, because it does not alter the form of the model.
Suffice it to say that the model needs to account for the process
of context changing, as well as the information needed to support that process.
A promising approach to modelling context changing would be
to have both context-dependent and context-independent context changing
mechanisms. We are all familiar with the way in which immediate physical
threat, or other strong emotion, can interrupt a task independently
of the stage of the task (after allowing perhaps for differing
degrees of concentration).
But the transitions between contexts within a task seem to be
carried out in a context-dependent manner.
We could invoke a hybrid model to account for this, in that
the context-dependent within-task transitions could be modelled
by a sequential, symbolic process, whereas the context-independent transitions
could be modelled by parallel processes, working at a sub-symbolic level.
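The hybrid arrangement just described can be sketched as a control loop (a purely illustrative sketch; the names and the dictionary-of-rules representation are assumptions, not the paper's notation): a context-independent alarm check runs on every cycle regardless of context, while ordinary within-task transitions are decided by the current context's own rule.

```python
def run_task(contexts, start, situations, alarm_check):
    """Hybrid context switching: context-independent interrupts (alarms,
    threats) are checked on every cycle regardless of the current context,
    while ordinary within-task transitions are decided by the current
    context's own rules."""
    current = start
    trace = []
    for situation in situations:
        interrupt = alarm_check(situation)  # context-independent channel
        if interrupt is not None:
            current = interrupt             # pre-empts the current context
        else:
            # Context-dependent channel: the current context's own rule
            # either names the next context or returns None (stay put).
            current = contexts[current](situation) or current
        trace.append(current)
    return trace
```

Note that after an interrupt there is no automatic return to the interrupted context, matching the observation above that memory of the abandoned context may fade.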
The essential components of a human context are then:
the information that is needed within that context,
together with whatever processes or procedures are needed to
gather or arrive at that information;
some means of deciding when to move to another context,
based on some of the information used;
and rules that connect (possibly) other information used
to the decisions that need to be taken in that context.
It is not very difficult to imagine how a wide range of human activities
(including much from everyday life)
could be described in terms of a context model of this kind.
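The essential components listed above can be sketched as a data structure (all names here are illustrative assumptions, not the paper's terminology): a context bundles the information it attends to, its action rules, and its transition rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A rule maps the visible situation to an outcome name, or None.
Rule = Callable[[dict], Optional[str]]

@dataclass
class Context:
    """A sketch of a task context as a stored unit of strategy: the
    information it needs, rules for acting on that information, and
    rules for moving to another context."""
    name: str
    information_sources: List[str]  # the variables attended to in this context
    action_rules: List[Rule]        # visible situation -> action, or None
    transition_rules: List[Rule]    # visible situation -> next context, or None

    def step(self, situation: dict):
        """Return (action, next_context) for the current situation."""
        # Only information relevant to this context is attended to.
        visible = {k: v for k, v in situation.items()
                   if k in self.information_sources}
        action = next((a for rule in self.action_rules
                       if (a := rule(visible)) is not None), None)
        next_context = next((c for rule in self.transition_rules
                             if (c := rule(visible)) is not None), None)
        return action, next_context
```

The point of the sketch is that the transition rules live inside the context itself, as argued above, rather than in a separate situation-to-context function.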
So far, this model has been described as a model of
the operator's mental processes,
but it also clearly relates to what the operator could have
in mind (though the degree of conscious awareness is unspecified,
as with other kinds of mental models).
According to the model, an operator will be attending to just those
sources of information that are relevant to the context.
This includes the information necessary for the rules governing task
actions in that context, and also the information necessary for the
rules governing the changes of context within the overall structure of
the task.
This attention could, perhaps, be centred around a
mental model in the sense of visual imagery,
though one cannot claim that this kind of image is necessary.
Also in the operator's mind, at some level,
will be some awareness of what the other
contexts are that connect with the current context, but the mental
structures associated with these other contexts need not be
fully brought to mind until the particular context is entered.
Thus, if a different model image is required by a different context
(as would often be the case), that image would come to mind
at the point of switching to its appropriate context.
Reasons for a structure of medium-sized contexts
One of the problems of context-like theories is in setting limits
for the size of a context.
Clearly, we could construct a reasonable model of people's context
structure based only on observation and analysis of people's actual
performance (as Bainbridge, op. cit.), but that by itself would give
only a weak guide, if any, to predicting the context structure that
a human will develop in response to a particular complex activity.
One of the potential strongpoints of the theory being proposed is that
the combination of rules in context and rules for change of context
could form the basis for prediction of the optimum size range for
contexts for human use.
What is done here is only a sketch of how such a demonstration might be
done.
To begin with, we need to note what the cognitively demanding aspects of
task performance are, which is related to what the limiting features of
cognition are. To do this fully would be a major task, so for the time
being let us simply equate the cognitive demands to the limitations on
working memory for the items that are central to this model, namely: the
information that needs to be monitored at one time; the complexity of
the rules (or whatever) governing decisions within the context;
and the complexity of the rules governing transition to other contexts.
There is plenty of scope for the enhancement of the theory by
incorporating better models of cognition here.
We may imagine a range of structures representing task cognition, from,
at one end, strategies
where all the information is always present --- that is,
where there is only one context; to, at the other end,
strategies where each decision has a context to itself.
If there is only one, large context, all information and rules will be
present concurrently, so there is much within-context rule complexity,
but no rules necessary for the changing of context.
If, on the other hand, there is only going to be
one decision rule in each context (and therefore little complexity here),
there will be relatively many transitions possible between
different contexts, and thus much complexity to
the rules governing the transition between contexts.
What will happen in between?
As the number of contexts increases, so the size of each context
will decrease, in terms of the amount of information necessary
and the complexity of decision rules internal to that context.
We may conjecture that for a given increase in the size of
the contexts (decrease in number of contexts)
there would be a more than proportionate increase
in the complexity of rules. Intuitively, this would be because
in a larger context, the decision rules would have to deal with the
extra load of distinguishing between different parts of the context.
Going the opposite way along the scale, as we increase the number
of contexts (the size of each context decreasing),
the complexity of the transition rules is going to
increase (from zero). Again, it is not unreasonable to conjecture
that as the number of contexts increases, the complexity of
the transition rules will increase more than proportionately,
intuitively because the number of possible transitions
increases exponentially with the number of contexts.
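The conjectured trade-off can be made concrete with a toy calculation. The functional forms and numbers below are illustrative assumptions only (the paper conjectures more-than-proportionate growth without committing to a formula); the point is simply that super-linear growth at both ends yields a minimum at a moderate number of medium-sized contexts.

```python
def cognitive_demand(n_contexts, n_decisions=24, exponent=1.5):
    """Toy model of the conjectured trade-off. An exponent > 1 encodes
    the conjecture that rule complexity grows more than proportionately,
    both within contexts and between them; the functional form and all
    numbers are assumptions made for illustration only."""
    context_size = n_decisions / n_contexts
    within = n_contexts * context_size ** exponent  # rules inside each context
    between = n_contexts ** exponent                # rules for changing context
    return within + between

# Demand is highest at the two extremes (one big context, or one
# context per decision) and lowest somewhere in between.
demands = {n: cognitive_demand(n) for n in (1, 2, 3, 4, 6, 8, 12, 24)}
best = min(demands, key=demands.get)
```

Under these assumed parameters the minimum falls at an intermediate number of contexts, consistent with the shape conjectured for Figure 1.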
The resulting schematic graph is shown in Figure 1.
Figure 1: A schematic diagram of a possible relationship
between context size and cognitive demand
When the effects of intra-context and inter-context rules and information
are added together, it is reasonable to suppose that there will be
some moderate size of context where the total human
cognitive demand of executing the task will be at a minimum.
This corresponds to the structure that we would expect humans
to use, on the basis that humans are likely to opt for
the easiest task performance strategy.
It should be emphasised that this graph is currently conjectural,
and more work needs to be done in establishing
its correct form.
5 Implications for computer information systems and HCI
Computer technology offers us the possibility of putting into
effect insights gained in the study of the human use of information.
If these studies of information use in complex tasks were successful,
it might prove possible to design
information systems to take into account human context structures,
and to present just the right relevant information at the right time.
Context modelling studies would provide the basis for assessing
what information was minimally necessary at any particular stage in
a task, and thus interfaces could be designed that provided no more
than what was required. Whereas this may not be an important issue
in other fields, confusion caused by excess information has been
held partly responsible for incidents such as Three Mile Island.
A key feature of this kind of interface might be the keeping track
of human task context, providing information specific to each
context, and facilitating the transition between contexts.
To enable easy context transition, we could envisage either an
automatic system, where the display was changed according to what
the system reckoned matched the human context transitions,
or else, in a graphical interface, buttons could be provided
for effecting a change of display according to context, and perhaps
the system could flash the appropriate button when calculations
revealed that the human would be likely to want to change context.
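The kind of interface envisaged here could be sketched as follows. This is a hypothetical design, not an existing system: the function names, the mapping of contexts to display configurations, and the bounds-based prompting rule are all assumptions made for illustration.

```python
def suggest_display(current_context, readings, display_config, bounds):
    """Sketch of a context-tracking interface. display_config maps each
    context to the instruments shown while it is active; bounds maps each
    context to rules of the form (variable, low, high, next_context) used
    to prompt a change of context."""
    shown = display_config[current_context]
    prompts = []
    for variable, low, high, next_context in bounds.get(current_context, []):
        value = readings.get(variable)
        if value is not None and not (low <= value <= high):
            # A monitored variable has left its normal range for this
            # context: flash the button for the likely next context.
            prompts.append(next_context)
    return shown, prompts
```

Only the information relevant to the current context is displayed, and the transition prompt uses the same bounds information that, in the model, signals a change of context.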
A context model could thus help in the
design of interfaces to complex tasks.
Acknowledgements
I am grateful to Angela Sasse for discussion of the form of this paper.
(c) Copyright Simon Grant.