1. Always start by "drawing a fence" around
the unit, program, agency, group of agencies or functions for which performance
measures are to be developed.
The Short Answer
There are many different ways to do this (see 3.1). Here's
one approach that goes directly to performance measures themselves:
All performance measures (that have ever existed for any program in the
history of the universe) fall into one of four categories, derived from the
intersection of quantity and quality vs. effort and
effect.
            QUANTITY                                          QUALITY
EFFORT      What did we do?                                   How well did we do it?
            How much service did we deliver?                  How well did we deliver service?
EFFECT      Is anyone better off (#)?                         Is anyone better off (%)?
            How much change for the better did we produce?    What quality of change for the better did we produce?
In each quadrant the questions are answered with # or % data
statements:
What did we do? (e.g. # clients served, #
activities performed).
How well did we do it? (e.g. % timely actions, %
complete actions, client staff ratio, staff turnover rate, unit cost).
Is anyone better off? (# and % of clients who
show improvement in skills/knowledge, attitude, behavior or
circumstance).
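As a rough illustration of the four categories, the sketch below tags each measure by row (effort or effect) and column (quantity or quality) and looks up its quadrant. The `Measure` class, its field names and the label strings are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass

# Quadrant labels keyed by (row, column) of the four-quadrant chart.
QUADRANTS = {
    ("effort", "quantity"): "upper left: What did we do?",
    ("effort", "quality"): "upper right: How well did we do it?",
    ("effect", "quantity"): "lower left: Is anyone better off (#)?",
    ("effect", "quality"): "lower right: Is anyone better off (%)?",
}


@dataclass(frozen=True)
class Measure:
    """A performance measure tagged by row and column (hypothetical structure)."""
    name: str
    row: str     # "effort" or "effect"
    column: str  # "quantity" or "quality"

    def quadrant(self) -> str:
        # Raises KeyError if row/column is not one of the four valid pairs.
        return QUADRANTS[(self.row, self.column)]


examples = [
    Measure("# clients served", "effort", "quantity"),
    Measure("% timely actions", "effort", "quality"),
    Measure("# clients who show improvement in skills", "effect", "quantity"),
    Measure("% clients who show improvement in skills", "effect", "quality"),
]

for m in examples:
    print(f"{m.name} -> {m.quadrant()}")
```

Note that some upper right measures, such as unit cost or staff turnover rate, are ratios or rates and not strict percentages; they would still be tagged as effort and quality in this sketch.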
(1) The first step in any performance measurement work is to identify what
organizational entity or function we are talking about. This can be thought of as a "fence drawing"
exercise. We will draw a fence around the thing whose performance is to be
measured. This could be an agency, a program, a subprogram or a component unit
or activity of the program. Or it could be a function of the
organization which crosses organizational lines. The idea is simple: take a picture of the
organization in whatever form it makes sense to you. Draw a line around all of
it or a piece of it. And consider the performance of what's inside the fence.
(2) Service systems and systems reform and integration: Fences can also be
drawn around a set of related programs or agencies that make up a service
system (e.g. the out of home care system including child welfare, juvenile
justice, mental health and education), and performance measures developed for
the system as a whole. This kind of process should be among the first things
done in any systems reform effort. (Note: service integration and systems reform
are means to the end
of better results, not ends in themselves.) For example, in discussions of
service integration (as a possible component of reform), we could consider the following performance measures to test whether we were making
progress from the client's perspective.
Average number of workers and case plans per family in the
system
Average number of offices that clients must visit each
month.
Average number of bus changes required for clients to get
to current offices.
This kind of information could be gathered on a sample basis. Baselines
could be created and the performance accountability process described in this
guide could be used to drive the numbers down. Performance measures can have
the effect, as in this case, of giving an operational definition to an
otherwise vague notion like "service integration."
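A minimal sketch of how a baseline for these service integration measures might be computed from a sample of families. The field names and the sample values are invented for illustration; they are not data from any real system.

```python
from statistics import mean

# Hypothetical sample: one record per sampled family.
sample = [
    {"workers": 4, "case_plans": 3, "offices_per_month": 2},
    {"workers": 2, "case_plans": 2, "offices_per_month": 1},
    {"workers": 6, "case_plans": 4, "offices_per_month": 3},
]


def baseline(records, field):
    """Average of one field across the sampled families."""
    return mean(r[field] for r in records)


for field in ("workers", "case_plans", "offices_per_month"):
    print(f"average {field} per family: {baseline(sample, field)}")
```

Repeating the same calculation on later samples gives the trend against which progress in driving the numbers down can be judged.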
(3) TECHNIQUE: Here is a five step process
that's the best way to help people identify performance measures, select
the most important ones and identify a data development agenda.
Step 1. HOW MUCH WE DO (Upper Left): Draw the four quadrants on a big piece
of flip chart paper. Start in the upper left quadrant. First put down the
measure "# of customers served" in the upper left quadrant. Ask if
there are better, more specific ways to count customers or important
subcategories of customers, and list them (e.g. # of families served,
# of children with disabilities served, etc.). Next
ask what activities are performed. Convert each activity into a measure (e.g.
"we train people" becomes # of people trained.) When you're
finished, ask if there are any major activities that are not listed.
Step 2. HOW WELL DO WE DO IT? HOW WELL DO WE PERFORM THESE ACTIVITIES?
(Upper Right): Ask people to
review the standard measures for this quadrant that apply to most if not all
programs, services or activities (e.g. unit cost, staff turnover, etc.) These
are shown on the "Separating the Wheat From Chaff" worksheet in the upper right quadrant
under "standard measures." Write each answer in the upper right
quadrant. Next take each activity listed in the upper left and ask if there
are measures that tell whether that particular activity was performed well. If
you get blank looks, ask if timeliness matters, if accuracy matters. Convert
each answer into a measure and be specific (e.g. the timeliness of case
reviews becomes "percent of case reviews completed on
time" or "percent of case reviews completed within 30 days after
opening.")
Step 3. IS ANYONE BETTER OFF? (Lower Left and Lower Right): Ask "In
what ways could clients be better off as a result of getting this service? How would we know if
they were better off in measurable terms?" Create pairs of
measures (# and %) for each answer (e.g. # and % of clients who get jobs above the
minimum wage). The # answers go in the lower left; the % answers go in the lower right.
There are two ways to state these kinds of measures: point in time and
improvement over time (e.g. % of children with good attendance this report
card period vs. % of
children whose attendance improved since the last report card period).
This is the most interesting and challenging part of this process. Dig
deep into the different ways this can show up in the lives of the people
served. Explore each of the four categories of "better-offness":
skills/knowledge, attitude, behavior and circumstance. If
people get stuck, try the reverse question: "If your service was terrible,
how would it show up in the lives of your clients?"
Look first for data that is already collected. Then be creative about
things that could/should be counted and the ways in which data could be
generated. It is not always necessary to do 100% reporting. Sampling can be
used, either regular and continuous sampling or one time studies based on
sampling. Pre and post testing can be used to show improvement in skills,
knowledge or attitude. Surveys can be used which ask clients to self report
improvement or benefits.
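The difference between the point in time and improvement over time forms mentioned in Step 3 can be sketched as follows. The attendance figures and the 90% threshold for "good attendance" are assumptions made up for this example.

```python
# Hypothetical attendance rates (share of days attended) per child:
# (previous report card period, current report card period).
attendance = {
    "child_a": (0.80, 0.95),
    "child_b": (0.96, 0.92),
    "child_c": (0.70, 0.75),
    "child_d": (0.97, 0.93),
}
GOOD_ATTENDANCE = 0.90  # assumed threshold, not defined in the guide


def pct(count, total):
    """Percentage, returning 0.0 when there is nobody to count."""
    return 100.0 * count / total if total else 0.0


def point_in_time(data, threshold=GOOD_ATTENDANCE):
    """% of children with good attendance in the current period."""
    good = sum(1 for _, current in data.values() if current >= threshold)
    return pct(good, len(data))


def improvement_over_time(data):
    """% of children whose attendance improved since the previous period."""
    improved = sum(1 for previous, current in data.values() if current > previous)
    return pct(improved, len(data))


print(point_in_time(attendance))          # 75.0
print(improvement_over_time(attendance))  # 50.0
```

The two forms can tell different stories about the same children, which is one reason to decide deliberately which form to report.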
NOTE: Every performance measure has two incarnations: a lay definition and
a technical definition. The lay definition is one that anyone could understand
(e.g. percentage of clients who got jobs). The technical definition,
for percentages, exactly specifies the numerator and denominator (e.g. the
number of clients who got jobs this month, divided by the total number of
clients enrolled in the program at any time during the month).
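To make the technical definition concrete, here is a sketch that computes the example percentage with an explicit numerator and denominator. The `Client` record and its fields are hypothetical. The sketch also assumes the numerator counts only clients who are in the denominator, which the lay definition leaves open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Client:
    enrolled_from: date
    enrolled_to: Optional[date]  # None means still enrolled
    job_start: Optional[date]    # None means no job yet


def pct_clients_who_got_jobs(clients, month_start, month_end):
    """Numerator: enrolled clients who got jobs this month.
    Denominator: clients enrolled at any time during the month."""
    enrolled = [
        c for c in clients
        if c.enrolled_from <= month_end
        and (c.enrolled_to is None or c.enrolled_to >= month_start)
    ]
    got_jobs = [
        c for c in enrolled
        if c.job_start is not None and month_start <= c.job_start <= month_end
    ]
    if not enrolled:
        return None  # the percentage is undefined with an empty denominator
    return 100.0 * len(got_jobs) / len(enrolled)


clients = [
    Client(date(2024, 1, 1), None, date(2024, 3, 10)),               # job in March
    Client(date(2024, 2, 1), date(2024, 3, 5), None),                # left in March
    Client(date(2024, 3, 20), None, None),                           # joined in March
    Client(date(2024, 3, 1), None, None),                            # no job yet
    Client(date(2024, 1, 1), date(2024, 2, 28), date(2024, 2, 15)),  # not enrolled in March
]

march = pct_clients_who_got_jobs(clients, date(2024, 3, 1), date(2024, 3, 31))
print(march)  # 25.0
```

Writing the definition this way forces the choices a lay definition hides, such as whether a client who left mid-month still counts in the denominator.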
Now you have filled in the four quadrants with as many entries as you
can. Next we select the most important measures and a data development
agenda. Here's a SHORT CUT way to do that:
Step 4. HEADLINE MEASURES: Identify the measures in the upper right and
lower right quadrants for which there is (good) data. This means decent data is
available today (or could be produced with little effort). Circle each
one of these measures with a colored marker. Ask "If you had to talk
about your program with just one of these circled measures, which one would it
be?" Put a star by the answer. Then ask "If you could have a second
measure... and a third?" You should identify no more than 4 or 5 measures. And those
should be a mix of upper right and lower right measures. These choices
represent a working
list of headline measures for the program.
Step 5. DATA DEVELOPMENT AGENDA: Ask "If you could buy one of the
measures for which you don't have data, which one would it be?" Mark that
with a different colored marker. "If you could have a second measure...
and a third?" List
4 or 5 measures. This is the beginning of your data development agenda in priority order.
(4) The longer and more thorough method for selecting
performance measures involves rating each measure High, Medium or Low on three
criteria: Communication Power, Proxy Power and Data Power.
Communication Power:
Does the performance measure communicate to a broad range of audiences? It is possible
to think of this in terms of the public square test. If you had to
stand in a public square and explain the performance of this program to your neighbors, what two or three
measures would you use?
Proxy Power:
Does the performance measure say something of central importance about the
program (agency or service system)? Can this measure stand as a proxy for the
most important things the program does?
Data Power:
Do we have quality data on a timely basis? We need data which is reliable and consistent. And we need timely data so we can see progress - or the
lack thereof - on a regular and
frequent basis.
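One possible way to work with the High, Medium and Low ratings is sketched below. The numeric scores, the ranking by total score and the cut-offs are assumptions made for illustration; the guide itself only says to rate each measure on the three criteria. The example ratings are invented.

```python
SCORE = {"H": 3, "M": 2, "L": 1}

# Hypothetical ratings per measure: (communication, proxy, data) power.
ratings = {
    "% of clients who get jobs": ("H", "H", "M"),
    "% of case reviews completed on time": ("M", "M", "H"),
    "staff turnover rate": ("L", "M", "H"),
    "% of clients who keep jobs 6 months or more": ("H", "H", "L"),
}


def total(rating):
    """Sum of the three scores (an assumed scoring rule)."""
    return sum(SCORE[r] for r in rating)


def headline_candidates(ratings):
    """Measures with at least Medium data power, ranked by total score."""
    with_data = [m for m, r in ratings.items() if SCORE[r[2]] >= SCORE["M"]]
    return sorted(with_data, key=lambda m: total(ratings[m]), reverse=True)


def data_development_agenda(ratings):
    """Measures rated High on communication and proxy power but Low on data power."""
    return [m for m, r in ratings.items() if r[0] == "H" and r[1] == "H" and r[2] == "L"]


print(headline_candidates(ratings))
print(data_development_agenda(ratings))
```

The people doing the rating would still make the final choice; a score like this only orders the candidates for discussion.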
(5) Both methods will lead to the same list. The SHORT CUT works
because the "forced choice" process leads people intuitively to
think about communication and proxy power. When they do this for measures
where they have data, the selected measures are the Headline Measures. When
they do this for measures where they do not have data, the selected measures
are the Data Development Agenda.
This process will lead to a three part list of performance measures:
Headline Performance Measures
Those 3 to 5 measures you would use to present or explain your
program's performance to policy makers or to the public.
Secondary Measures
All other measures for which you now have data. These measures will
be used to help manage the program. And they will often figure in the
story behind the curve for headline measures.
Data Development Agenda
Measures you would like to have. These should be listed in priority
order. Since data is expensive both in dollars and worker time, you must
make a judgment about how far down this list you can afford to go.
The headline measures are the starting point for using data to improve
program performance.
(6) Several things to keep in mind here: It is best if the program or
service, for which performance measures are developed, has some organizational identity. Performance accountability is about holding
managers accountable for the performance of what it is they manage. If the
thing to be measured has no organizational identity, then there is no person
or persons who can be held accountable for its performance.
This does not mean
that the thing to be measured must be a box on the organization chart or a
physical unit in a single geographic location. In matrix management, for
example, it can be a function that cuts across organization lines for which some person or
persons has been given lead responsibility (for example budgeting or staff
development, where some staff may be decentralized but the function is still
managed or "led" by someone). It can be a program which operates in
many different locations. The notion of fence drawing is
flexible enough to work with any organizational structure, old or new.
(7) Second thing to keep in mind: When you are trying to teach these ideas to
new people, start with small units which have a clear identity. Then move on to
larger units and functions without physical organizational identity.
(8) Third: performance measurement starts with the idea of customers or
clients. CUSTOMERS are people who can be made better or worse off by the
services of the program.
Performance measurement is an easier discussion for
organizational entities that can clearly identify their customers. So, for
example, direct service programs like child support enforcement or mentoring
will have a head start on programs or activities whose customers are
less clear.
Performance measurement of customer well-being is harder for administrative functions such as budget, personnel, general services etc.
It will be necessary to spend some quality
time helping these people understand/discover who their customers are. Hint: for administrative functions the customers are
often the managers of the agency itself. And customer satisfaction turns out
to be the most important lower right quadrant measure. (See 3.10)
(9) One of the best ways to teach this method is to conduct a
"fishbowl" at the front of the room. Get four or five people to
volunteer who know a particular program well. Position them in chairs in a small
semi-circle at the front of the room, facing forward (i.e. back to everyone
else). Conduct a short session (15 to 20 minutes) using the technique above.
Periodically pause to ask if the larger audience has any questions. If time
permits, break the larger group into groups of 6 and have them pick a program.
One member of the group then leads the group through the 5 steps of the
technique above. Depending on time, two or three rounds of this could be done.
Debrief the large group. "What worked and didn't work about this
experience? What did you learn? How many think they could lead a small group of
coworkers through this thinking process?"
(10) Technical note: Some people correctly point out that client results
actually have two components which parallel the difference between results and
indicators at the population level, i.e. a plain language statement of client
well-being (clients are self sufficient) and a measurement that describes this
condition of well-being (# and % of clients who get jobs and keep them 6
months or more). In practice, these two ideas are addressed in a single step
in the thinking process which asks "In
what ways could clients be better off as a result of getting this service? How would we know if
they were better off in measurable terms?" (step 3 above). Experience
suggests that when these two questions are separated as they are (and must be)
at the population level (e.g. first fully answer in plain language, then take
each plain language statement and identify measures that can serve as
proxy) then the process loses its common sense feel and becomes
unnecessarily complicated and time consuming. One interesting and usable
variation of this approach, used by the Department of Developmental Services
in California, listed all client results in plain language, and then developed
a set of measures for the group of client results as a whole (i.e. not
condition by condition).
(11) Obscure note #232: Some people wonder
why the progression from least important to most important runs from the
upper left to the lower right. There are 23 other possibilities (six
variations for each placement of most important). And some other systems
place the most important category in the "first read" upper left quadrant (6
ways to do this). Here's why. In this country we read from left to right and
from top to bottom. So the natural progression of reading a 4 quadrant chart
is upper left, upper right, lower left, lower right. This would obviously be
different for Hebrew or Chinese ideograms, which proceed in different
directions. In Results-Based Accountability, we get the "How much did we do?"
question and set of measures out of the way first. "Yes, you work hard. Yes,
you do a lot of things. Yes, you see a lot of clients. Yes, it takes a lot
of time. You're great. We love you. Can we move on now?" We let people
get the credit trap out of their system. Yes, they get credit for all their
hard work in the upper left quadrant. With this out of the way, it is much
easier to have the rest of the discussion. It is also essential to
understand who your customers are and what you do, in order to answer the
next two questions. "How well did we do it?" is next. Having established
what people do and for whom, we can now go on to examine how well they
perform the functions of their job(s). We set aside effects for customers
for a moment and focus on how well the service "plays" are executed. We'll
deal with whether we scored a goal or won the game in a minute. We also
think of course that there is a relationship between how well we deliver
service and whether our customers are better off. It helps to understand
these "drivers" of better-offness before getting to the third and fourth
quadrants. Finally we come to "Is anyone better off?" Here we look at
numbers and percentage pairs of measures. The raw numbers are less important
than the percentages (except in the case of small numbers), and so we put
them in the "next read" quadrant, lower left. So we read from upper left to
lower right because this is the natural progression in thinking about what
programs do and how to measure performance, and because this then matches
the natural sequence of reading in most countries. But there is nothing
magic or absolute about this. A number of people over the years have said
they find it easier to start with the "Is anyone better off?" lower right
quadrant and work backwards to the other questions. Nothing wrong with that
if it works for you. I do ask, however, that, when presenting the model, you
keep the order of the quadrants as they are, for the simple reason that
thousands of people have seen them this way, and switching them now could
cause unnecessary confusion. Thanks.