Commitment considered harmful

Disclaimer: This content falls into the realm of Systems Thinking. Systems which include humans are complex and changes to them often result in unintended consequences. As this post covers ‘productivity’ metrics we should tread carefully and remember the warning from Eliyahu M. Goldratt, “Tell me how you measure me, and I will tell you how I will behave.”

Estimating the amount of work that will be completed has always been a controversial topic. In 2011 the Scrum Guide was updated and the term ‘commitment’ was replaced with ‘forecast’. Regardless of the terminology, most Scrum teams make a guess as to how much they will do in a sprint, and this is typically reported to management.

There are generally three techniques teams use to decide on the forecast. A team can use a ‘gut feeling’ and take an educated guess as to how much will be completed. Alternatively, a team may take an average of the last three to five sprints and use that as the forecast. Thirdly, there is a technique called yesterday’s weather, in which the actual points delivered in the previous sprint are used as the forecast for the next sprint. I have concerns about all three approaches: I feel that each of them leads a team into overcommitting.

When teams use their ‘gut feeling’ they are subject to a number of biases. There is the planning fallacy, which is the tendency for people to underestimate how long it will take to complete a task, even when they have done similar tasks before. Hofstadter’s Law states that “it always takes longer than you expect, even when you take into account Hofstadter’s Law.” The authors at scrumplop.org point out that motivated teams naturally set higher goals for themselves; however, this does not always result in an improvement:

It’s human nature that individuals and teams with self-esteem set increasingly higher goals for themselves. By trying to achieve these goals, teams learn. Sometimes such strivings leads to immediate improvement, particularly when the team challenges itself to improve through use of a new-found technique or technology. Sometimes, however, the new technique doesn’t pan out and the team’s performance remains the same or even gets worse.

Sometimes, the team raises the bar just to raise the bar, either to test the limits or to unconsciously publish its bravado. Even these self-challenges can pay off, especially in the short term. It is more common that these higher levels are unsustainable in the long term.

Therefore: In most cases, the number of Estimation Points completed in the last Sprint is the most reliable predictor of how many Estimation Points will be completed in the next Sprint.

The quote above ends with scrumplop.org recommending yesterday’s weather as the most reliable way to forecast the work that will be done in the next sprint. Unlike with a ‘gut feeling’, with ‘yesterday’s weather’ the team can’t simply choose a high velocity (potentially hiding from reality). To get a high commitment, the team needs to have actually delivered a high amount in the previous sprint.

This may work for mature teams with stable platforms. It may, however, be less successful when there are many disruptions, e.g. changes in personnel, many technical unknowns, or unpredictable development and test environments. The following diagram is based on data taken from a real project. It shows actual delivery plotted against a retrofitted forecast based on yesterday’s weather. In this case the team would have met the goal ~52% of the time. This may not be consistent enough for the team’s planning needs, and this success rate may negatively affect the team’s motivation over time.

Forecast based on ‘Yesterday’s Weather’

Note: The forecasts (commitments) have been retrofitted to the graph. The team may have had different results had they used yesterday’s weather during planning.
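A retrofit like this is easy to reproduce for your own team. The following is a minimal sketch, not the calculation behind the chart above: the function name and the sample velocities are hypothetical, and it assumes that ‘meeting the goal’ means delivering at least as many points as the forecast.

```python
def yesterdays_weather_hit_rate(velocities):
    """Fraction of sprints that met a forecast equal to the previous sprint's actual."""
    # Pair each sprint's actual (used as the forecast) with the following sprint's actual.
    # The first sprint has no predecessor, so it is never scored.
    pairs = list(zip(velocities, velocities[1:]))
    if not pairs:
        raise ValueError("need at least two sprints")
    # Assumption: a forecast counts as met when actual >= forecast.
    hits = sum(1 for forecast, actual in pairs if actual >= forecast)
    return hits / len(pairs)


# Hypothetical velocities for illustration only, not the project data in the chart.
sample = [20, 24, 18, 22, 25, 19, 21, 23]
print(f"Forecast met in {yesterdays_weather_hit_rate(sample):.0%} of sprints")
```

Swapping in your own sprint history gives a quick view of how often yesterday’s weather would have held for your team.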

If the actual velocity varies enough to make yesterday’s weather unreliable, a team may decide to take the average of the last few sprints (typically 3 – 6 sprints). If we consider the definition of average, this means that about half of the actuals will fall above the average and about half will be below. Statistically speaking, we only have a 50% chance of meeting the forecast if we base this on the average.

A retrofitted forecast based on the total average means that the commitment is met 50% of the time

A retrofitted forecast based on a rolling average of 5 sprints means that the commitment is met 46% of the time
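The two average-based retrofits can be sketched in the same way. This is an illustrative sketch under my own assumptions: the function names and sample data are hypothetical, a forecast counts as met when the actual is at least the forecast, and the rolling version only scores sprints that have a full window of history behind them.

```python
from statistics import mean


def total_average_hit_rate(velocities):
    """Fraction of sprints at or above the mean of the whole series (a hindsight forecast)."""
    overall = mean(velocities)
    return sum(1 for v in velocities if v >= overall) / len(velocities)


def rolling_average_hit_rate(velocities, window=5):
    """Fraction of sprints that met the mean of the preceding `window` sprints."""
    if len(velocities) <= window:
        raise ValueError("need more sprints than the window size")
    hits = 0
    trials = 0
    for i in range(window, len(velocities)):
        # The forecast for sprint i uses only sprints that finished before it.
        forecast = mean(velocities[i - window:i])
        trials += 1
        if velocities[i] >= forecast:
            hits += 1
    return hits / trials


# Hypothetical velocities for illustration only.
sample = [20, 24, 18, 22, 25, 19, 21, 23]
print(f"Total average met:   {total_average_hit_rate(sample):.0%}")
print(f"Rolling average met: {rolling_average_hit_rate(sample):.0%}")
```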

For the reasons stated above, I believe that these well-intentioned forecasting techniques may lead a team into overcommitting on a regular basis. Not only does this make release planning and budgeting difficult, but over time it can have a detrimental effect on a team. Team members may stop caring about meeting the commitment and sprint goal, which can result in a general lack of motivation. If your team is suffering from this, you may want to try using a control chart to help you set realistic forecasts.

The four process states

Control Charts are typically used to determine if a manufacturing or business process is in a state of statistical control. Carl Berardinelli explains that there are four process states (seen above). He states that “every process falls into one of these states at any given time, but will not remain in that state. All processes will migrate toward the state of chaos. Companies typically begin some type of improvement effort when a process reaches the state of chaos (although arguably they would be better served to initiate improvement plans at the brink of chaos or threshold state). Control charts are robust and effective tools to use as part of the strategy used to detect this natural process degradation”.

The following link shows children explaining how Flowcharts and Control Charts are used at their kindergarten.

Above is a Control Chart derived from a team’s velocity. The dark blue line is the actual velocity over the past 29 sprints. The grey line represents the average actual velocity. The red lines show the upper and lower control limits, which are plotted at one standard deviation from the average.

In statistics, the standard deviation (represented by the Greek letter sigma, σ) is used to show how consistent data points are relative to the average. A higher standard deviation means that the data points are spread further from the average than with a lower standard deviation. In the graph above, the further apart the red lines are, the more variation there is.
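To make the limits concrete, here is a minimal sketch of how the average and the one-sigma control limits could be calculated. The function name and the sample velocities are hypothetical, and I am assuming the population standard deviation (`pstdev`); using the sample standard deviation (`stdev`) would give slightly wider limits.

```python
from statistics import mean, pstdev


def control_limits(velocities, sigmas=1.0):
    """Return (lower limit, average, upper limit) for a series of sprint velocities."""
    centre = mean(velocities)
    # Assumption: population standard deviation of the whole history.
    spread = pstdev(velocities)
    return centre - sigmas * spread, centre, centre + sigmas * spread


# Hypothetical velocities for illustration only.
sample = [20, 24, 18, 22, 25, 19, 21, 23]
lower, centre, upper = control_limits(sample)
print(f"lower={lower:.1f} average={centre:.1f} upper={upper:.1f}")
```

Passing `sigmas=2` or `sigmas=3` gives the wider control lines used on a conventional control chart.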

The following diagram, from Carl Berardinelli, shows the relationship of a Control Chart to the normal curve.

Statistically speaking, given a normal distribution, there is roughly a 68% chance that the points delivered in the next sprint will fall between these two control limits. Therefore, in the example above the team is likely to deliver more than 15 points but fewer than 26 points in the next sprint.

These values (15 – 26) are based on one standard deviation (σ) above and below the average. The images above show up to three standard deviations from the average. In manufacturing, it can be necessary to use up to six standard deviations from the average; this is where Six Sigma gets its name.

An alternate representation of a ‘sprint control chart’ showing additional control lines

Diagram from Musing Mathematically

Mathematically speaking, and again assuming a normal distribution, there is a 16% chance of the next data point falling below Lower Control Limit 1, meaning that there is an 84% chance that the team will deliver at least this amount. The team can therefore confidently commit to delivering this amount. This is not to be seen as a cap or limit; it is simply a commitment that the team is statistically likely to deliver on.
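These percentages follow directly from the normal distribution and can be checked with Python’s standard library. This is only a sanity check of the arithmetic; it assumes that velocity really is normally distributed, which may not hold for a real team.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist()

# Probability of landing within one standard deviation of the average.
within_one_sigma = standard.cdf(1) - standard.cdf(-1)

# Probability of landing at or above the lower (minus one sigma) control limit.
above_lower_limit = 1 - standard.cdf(-1)

print(f"Within one sigma:  {within_one_sigma:.1%}")
print(f"Above lower limit: {above_lower_limit:.1%}")
```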

Note that a Control Chart (and other charts, like a burn down) shows symptoms. There needs to be a discussion with the team to discover the root cause of either positive or negative symptoms. WikiHow explains the general rules which are followed when assessing a Control Chart.

The above data shows a team using different forecasting techniques:
Sprint 1 – 8: Gut feeling (educated guessing)
Sprint 9 – 18: Yesterday’s weather
Sprint 19 – 29: Committing to the Lower Control Limit

The chart above is not meant to imply that setting a commitment to the lower control limit will decrease variance. It does show that the team would have met the commitment more often if they had based their forecast on the Lower Control Limit 1.

While the maths may add up, is this truly good advice for teams? Are the previously mentioned concerns about overcommitting really that valid? Shouldn’t we use challenging stretch goals to put pressure on teams? While answering these questions is up to you, I would like to put forward the following views.

In the book Oops!, Aubrey Daniels argues that stretch goals are ineffective and a waste of time and money. Daniels cites one study showing that when individuals repeatedly fail to reach stretch goals, their performance declines. Psychologist Karl Weick, author of “Small Wins”, argues that people often become overwhelmed and discouraged when faced with massive and complex problems. He recommends breaking up larger problems into smaller, tractable challenges that produce visible results. Weick explains that the strategy of “small wins” can often generate more action and more complete solutions to major problems because it enables people to make slow, steady progress.

When using the control chart, the first small, team-level win is to pass Lower Control Limit 1. Meeting this means the team has met its forecast for the sprint. The team can then set the next goal of surpassing the average. Again, they will only pass this second goal about half of the time, but at least they will have met their commitment and had one small win.

In this way we might avoid the long-term consequences of missing goals, and benefit from “the happiness advantage” – just like Amy the unicorn.

Additional notes:

What about #NoEstimates?

Some teams have dropped estimating and forecasting. This can work; however, I believe it is still good to have some form of short-term goal, whether that is meeting a forecast or something else. Many people prefer to keep track of the number of stories rather than the story points. If you are doing this you can still use a control chart; just base the data on the number of stories.

What’s next?

I am looking into ways of making the chart more responsive to change, for example by using a Moving Range Control Chart or a Control Chart with a rolling average and deviation, as shown in the chart below.

Control Chart with a rolling average and deviation
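As a rough sketch of the rolling variant, the limits for each sprint can be recalculated from only the most recent sprints. The function name, the window of five sprints and the sample data are all assumptions of mine for illustration, and I have again used the population standard deviation.

```python
from statistics import mean, pstdev


def rolling_control_limits(velocities, window=5, sigmas=1.0):
    """Return a list of (sprint_index, lower, average, upper) tuples.

    The limits for sprint i are based on the `window` sprints before it,
    so the final row is a forecast for the sprint that has not yet run.
    """
    if len(velocities) < window:
        raise ValueError("need at least `window` sprints of history")
    rows = []
    for i in range(window, len(velocities) + 1):
        recent = velocities[i - window:i]
        centre = mean(recent)
        spread = pstdev(recent)
        rows.append((i, centre - sigmas * spread, centre, centre + sigmas * spread))
    return rows


# Hypothetical velocities for illustration only.
sample = [20, 24, 18, 22, 25, 19, 21, 23]
for sprint, lower, centre, upper in rolling_control_limits(sample):
    print(f"sprint {sprint + 1}: lower={lower:.1f} average={centre:.1f} upper={upper:.1f}")
```

A shorter window reacts faster to change but produces noisier limits, so the window size is a trade-off to experiment with.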

3 thoughts on “Commitment considered harmful”

  1. Hi Daryn,
    This stuff is great fun, but is it really what teams need?

    I’d start with this observation: “most Scrum teams make a guess as to how much they will do in a sprint, and this is typically reported to management.” True, it typically is. But, should it be? Whatever this amount is called, isn’t it an internal control for the team? Why is it being reported to management? We know only too well what managers like to do with numbers like that! Rather, wouldn’t the team be better off using that commitment, forecast, estimate, whatever, to give an update on how confident they are that the next release will deliver sufficient value? Reporting sprint-by-sprint forecasts seems like an invitation to micromanage.

    Also, be careful with averages. In general, this: “the definition of average […] means that about half of the actuals will fall above the average and about half will be below” is true for the median. We only see half the values above the mean, and half below, for symmetrical distributions. Eyeballing your examples suggests to me that in fact the velocity figures are not symmetrically distributed (and therefore not normally distributed). Characteristics of long runs of large numbers of multiples of manufactured articles, like weights and diameters, do tend to vary in a way often well represented by the normal distribution, and this is the well understood “common cause variation”. But for something like a development team’s velocity, representing the outcome of a (small) group of people doing a mostly novel activity over a short time, I’d expect “special cause” variation to dominate, which is not easily modelled.

    When I look at your charts, I imagine the distress in the team around, say, sprints 17 to 21, things are just getting worse and worse, they can’t seem to turn the corner, can’t seem to get stuff done, even as little as they did last sprint. Do we really need the control chart to tell us that there is a problem? Sprint after sprint they get less done: catastrophe! I’d bet money that an unpredicted, unexpected, special cause jumped out at them in sprint 18 and they took a long, long time to recover. And so on.

    As you know, I love this kind of stuff, and I’d love if helping teams perform could be this quantitative and crisp, but I doubt it more and more every day.

    Best,
    Keith

  2. It appears your embedded images are no longer displaying. I loved the write up, but it would have made more of an impact if I could have seen the images. Can you fix this for us?
