Cognitive Dimensions and An Empirical Evaluation: Lessons Learned
Francesmary Modugno
University of Washington
Seattle, WA 98115
fm@cs.washington.edu
http://www.cs.washington.edu/homes/fm
ABSTRACT
We discuss usability problems uncovered by a Cognitive Dimensions (CD)
analysis of a demonstrational desktop and verified by an empirical
evaluation. These combined analyses provide lessons for those
selecting usability evaluation techniques and those developing
demonstrational systems: CD's are often overlooked for evaluation;
CD's can be learned and used quickly; CD's can help designers
understand and evaluate the differences between alternative designs;
non-empirical evaluation techniques can guide the interpretation of
empirical data and shed light on overlooked aspects of a system; and
demonstrational systems should support programming strategy selection.
Keywords:
cognitive dimensions, usability evaluation, programming by
demonstration, end-user programming
INTRODUCTION AND MOTIVATION
System designers often make tradeoffs to satisfy design goals. Also,
before embarking on a costly empirical evaluation, designers usually
employ non-empirical evaluation techniques (e.g., heuristic evaluation)
to uncover potential usability problems quickly and cheaply.
Which technique(s) can help designers understand the tradeoffs in a
design or between designs and provide them with feedback on how to
improve a design without requiring that they become experts in the
technique? We present a case study of one technique, Cognitive
Dimensions [2] (CD's), and share the lessons learned by doing a CD
analysis of a system and then comparing the results with an empirical
study.
COGNITIVE DIMENSIONS
Cognitive Dimensions are a framework for a broad-brush assessment of a
system's form and structure. To evaluate a system, the user analyzes
it along each of 12 dimensions. The dimensions, which are grounded in
psychological theory, can provide insight into the cognitively
important aspects of a system and can reveal potential usability
problems. (For details of how the dimensions derive from
psychological theory and how they are applied, see [2].)
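To make the procedure concrete, here is a minimal sketch of one way an
analyst might record a CD walkthrough as structured notes. The record
format is our own illustration, not part of the framework, and only
dimensions discussed later in this paper appear; the sample
observations are likewise illustrative.

    # A minimal sketch of recording a CD walkthrough as structured
    # notes. Green's framework prescribes the dimensions, not any
    # particular bookkeeping; this format is hypothetical.
    DIMENSIONS = ["role expressiveness", "terseness", "look-ahead"]

    def cd_walkthrough(system, notes):
        """Print one assessment per dimension, flagging any not yet made."""
        print(f"CD analysis of {system}:")
        for dim in DIMENSIONS:
            print(f"  {dim}: {notes.get(dim, 'NOT YET ANALYZED')}")

    cd_walkthrough("Pursuit (mostly graphical language)", {
        "role expressiveness": "icons closely mirror desktop objects",
        "terseness": "less of the program fits in one window",
    })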
The Pursuit Desktop
We analyzed Pursuit [6], a demonstrational desktop
(similar to the Macintosh Finder) whose goal is to enable
non-programmers to construct programs containing loops, variables and
conditionals without having to develop programming expertise. To
create a program, users demonstrate its actions on files and
folders on the desktop, and Pursuit infers a general procedure. An
open problem for demonstrational systems is how to represent the
inferred program. We explored two equivalent languages to represent
the evolving program while the user demonstrates it: a mostly
graphical language containing icons for data and operations, and a
mostly textual language containing icons for data and text for
operations. We developed two Pursuit prototypes that differed only
in how they represented the evolving program.
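As a rough illustration of the demonstrational idea (and only an
illustration; Pursuit's actual inference works over desktop objects
and richer state), the Python sketch below generalizes a repeated
action sequence on several files into a loop. The function name and
the string form of the inferred program are our assumptions.

    def infer_program(demonstrated):
        # Toy PBD generalizer: if the user applies the same operation
        # sequence to several distinct files, infer a loop over a
        # variable; otherwise keep the literal action sequence.
        by_target = {}
        for op, target in demonstrated:
            by_target.setdefault(target, []).append(op)
        sequences = {tuple(ops) for ops in by_target.values()}
        if len(by_target) > 1 and len(sequences) == 1:
            body = "; ".join(f"{op}(FILE)" for op in sequences.pop())
            return f"for each FILE in {sorted(by_target)}: {body}"
        return "; ".join(f"{op}({t})" for op, t in demonstrated)

    # Demonstrating copy-then-compress on two files yields a loop:
    demo = [("copy", "paper1.tex"), ("compress", "paper1.tex"),
            ("copy", "paper2.tex"), ("compress", "paper2.tex")]
    print(infer_program(demo))
    # for each FILE in ['paper1.tex', 'paper2.tex']: copy(FILE); compress(FILE)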
The Cognitive Dimensions of Pursuit
The goal of the CD analysis was to understand the tradeoffs between
the two languages and to gain insight into the impact the languages
might have on Pursuit's effectiveness. A second goal was to
uncover potential usability problems prior to the empirical
evaluation. We chose CD's over other analysis techniques
because CD's explicitly explore design tradeoffs. Also, as
novices to usability evaluation, we wanted a technique that we could
learn and use quickly. Moreover, we wanted to get a deeper
understanding of the user's interaction with the system, not just find
interface problems.
After reading papers on CD's, we spent a day thinking about how each
dimension applied to Pursuit. In a few days, we detailed the results
(see [6]). The analysis (1) revealed insights about Pursuit's overall
design, (2) provided a way to characterize the differences between the two
representation languages, and (3) clarified the tradeoffs between these
differences. For example, the mostly graphical language is more role
expressive, meaning it more closely reflects desktop objects and
operations. The mostly textual language is terser, meaning more of it
appears in the program window. Thus greater role expressiveness comes
at the cost of terseness, and greater terseness at the cost of role
expressiveness.
The Strategy-Choosing Problem.
A surprising result, uncovered while analyzing Pursuit along the
Look-Ahead dimension, applies to Pursuit as well as to other
demonstrational systems. Look-Ahead constraints impose an order on
user actions. For example, to select a menu item the user must first
expose the menu. These constraints require users to plan before
they execute any actions -- the more planning, the greater the burden
(i.e., look-ahead) on the user.
In Pursuit (and some other demonstrational systems), users must decide
a priori how to demonstrate a program. That is, the user must
determine the specification strategy, which involves thoroughly
examining the state of the desktop and inferring state changes that
may result from intermediate program actions. We refer to this as the
strategy-choosing problem. We added a feature to Pursuit to
automatically handle certain classes of this problem: during the
demonstration, if Pursuit recognizes an inappropriate demonstration
strategy, it notifies the user, changes the strategy, updates the
program to reflect the change, and enables the user to continue the
demonstration. This reduces look-ahead because the user need not
exhaustively examine the system state before beginning a demonstration.
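A sketch of how such mid-demonstration repair might look, under a
hypothetical detection rule (operating on files one at a time versus
on a selected set); the event vocabulary and program representation
below are our assumptions, not Pursuit's internals.

    def add_step(program, step):
        # program: {'strategy': 'one-at-a-time' | 'set-at-once',
        #           'steps': [...]}; step: {'op': ..., 'targets': ...}
        # If the new step contradicts the declared strategy, notify
        # the user, switch strategies, rewrite the steps recorded so
        # far, and let the demonstration continue (no restart needed).
        implied = ("set-at-once" if step["targets"] == "selection"
                   else "one-at-a-time")
        if implied != program["strategy"]:
            print(f"Strategy changed to {implied}; program updated.")
            program["strategy"] = implied
            for earlier in program["steps"]:
                earlier["targets"] = step["targets"]
        program["steps"].append(step)
        return program

    prog = {"strategy": "one-at-a-time", "steps": []}
    add_step(prog, {"op": "copy", "targets": "single-file"})
    add_step(prog, {"op": "compress", "targets": "selection"})  # repaired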
EMPIRICAL EVALUATION OF PURSUIT
After incorporating the changes suggested by the CD analysis into the
prototypes, we performed a user study. Sixteen non-programmers were
randomly assigned to use one of the prototypes and were given program
construction and comprehension tasks. Both groups successfully
constructed and comprehended programs containing loops, variables and
conditionals. Thus, Pursuit met its goal of enabling non-programmers
to access the power of programming.
We also wanted to understand the effects of the language tradeoffs on
the usability of Pursuit. An interesting result was the effect
on users' ability to construct programs: the mostly
graphical language group was twice as accurate in constructing
programs (F(1,28)=13.00, p<.002) and was also better at
comprehending programs containing control constructs and variables
(t(14)=1.84, p<.04). Since user actions to construct a program
are identical for both prototypes, these differences could only be due
to the different representation languages. These findings were
consistent with the CD analysis, which suggested that since the mostly
graphical language was more role expressive and closer to the
representations in the interface, it might better facilitate learning
and comprehension.
The Strategy-Choosing Problem Revisited.
The study also confirmed the strategy-choosing problem. By examining
the log files from the program construction tasks, we discovered that
users often had difficulty determining how to demonstrate a
program. Of the 16 users, all but one chose an incorrect
demonstration strategy at least once. In only 18% of these cases did
the user eventually create a correct program -- by starting the
programming task over with another strategy.
Recall that Pursuit incorporated a feature to handle the
strategy-choosing problem. Although the mechanism was not documented
(to reduce what users had to learn prior to the construction task), 9
of the 16 users stumbled upon it accidentally. Of those 9 users, 6
went on to correctly construct the program by adopting the new
strategy and continuing the demonstration.
Thus, the mechanism provided a 67% recovery rate from a strategy
error without the user starting over, compared with an 18% recovery
rate in general when the user started over.
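The tabulation behind these rates is simple; the sketch below shows
the kind of counting we did over the construction-task logs, with a
hypothetical event vocabulary standing in for the real log format.

    def recovery_rates(user_logs):
        # user_logs: one event list per user. 'wrong_strategy' marks
        # an initial strategy error; 'feature_triggered' marks
        # Pursuit's automatic strategy change firing; and
        # 'completed_program' marks eventual success on the task.
        wrong = feature_hit = feature_ok = restart_ok = 0
        for events in user_logs:
            if "wrong_strategy" not in events:
                continue
            wrong += 1
            if "feature_triggered" in events:
                feature_hit += 1
                feature_ok += "completed_program" in events
            else:
                restart_ok += "completed_program" in events
        return {"users with a wrong strategy": wrong,
                "feature fired": feature_hit,
                "recovered via feature": feature_ok,
                "recovered by restarting": restart_ok}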
DISCUSSION AND CONCLUSIONS
There is ongoing study of the effectiveness, applicability,
learnability and usability of different usability evaluation
techniques. Much of this work compares performance outcomes of the
different techniques (e.g., [1, 3, 5]), although John [4] has used the
case-study approach to understand what people do when using
these techniques. Our work supplements these results by adding CD's
as an evaluation technique to study and by suggesting further
investigation into how each of these techniques might interact with a
formal user study.
Our experience has taught us several lessons. First, a computer
scientist with little psychology or HCI training can learn and use
CD's in a few days. For designers, we recommend CD's not only for
revealing potential usability problems, but also for understanding
different design tradeoffs and their potential impact on usability.
The discovery of the strategy-choosing problem in Pursuit and the
confirmation by the empirical study of the severity of this problem
suggest that designers of demonstrational systems need to consider
ways to support the strategy-selection process for users. The ability
to provide this support, at least for some types of strategy-selection
errors, was demonstrated by the successful use of a feature added to
Pursuit.
Finally, the CD analysis influenced how we analyzed the empirical
data. Because the CD analysis revealed the strategy-choosing problem, we
looked for confirmatory evidence in the data logs. We might not have
looked for this problem otherwise. Had we not, we might have missed a
stumbling block for users (at least until the empirical study revealed
it), incurred the greater cost of fixing it after the study (including
additional user testing), and learned less from the user study.
Moreover, the log analysis prompted by the CD analysis not only showed
us how our solution to the
strategy-choosing problem helped users, it also revealed particular
instances where it failed and suggested future research into devising
mechanisms for handling different types of strategy-selection
problems in demonstrational systems.
REFERENCES
1. D. L. Cuomo and C. D. Bowen. Understanding Usability Issues
Addressed by Three User-System Interface Evaluation Techniques.
Interacting with Computers, 6(1):86-108, 1994.
2. T. R. G. Green. Cognitive Dimensions of Notations. In People and
Computers V, 1989.
3. R. Jeffries et al. User Interface Evaluation in the Real World: A
Comparison of Four Techniques. In Proceedings of CHI '91.
4. B. E. John and H. Packer. Learning and Using the Cognitive
Walkthrough Method: A Case Study Approach. In Proceedings of CHI '95.
5. C. M. Karat. A Comparison of User Interface Evaluation Methods. In
J. Nielsen and R. L. Mack, editors, Usability Inspection Methods.
6. F. Modugno. Extending End-User Programming in a Visual Shell with
Programming by Demonstration and Graphical Language Techniques. PhD
thesis, Carnegie Mellon University, March 1995.