Disclaimer: This content falls into the realm of Systems Thinking. Systems which include humans are complex and changes to them often result in unintended consequences. As this post covers ‘productivity’ metrics we should tread carefully and remember the warning from Eliyahu M. Goldratt, “Tell me how you measure me, and I will tell you how I will behave.”
Estimating the amount of work that will be completed has always been a controversial topic. In 2011 the Scrum guide was updated and the term commitment was replaced with forecast. Regardless of the terminology, most Scrum teams make a guess as to how much they will do in a sprint, and this is typically reported to management.
There are generally three techniques teams use to decide on the forecast. A team can use a ‘gut feeling’ and take an educated guess at how much will be completed. Alternatively, a team may take the average of the last three to five sprints and use that as the forecast. Thirdly, there is a technique called ‘yesterday’s weather’, in which the actual points delivered in the previous sprint are used as the forecast for the next sprint. I have concerns about all three of these approaches: I feel that each of them leads a team into overcommitting.
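To make the two mechanical techniques concrete, here is a minimal sketch in Python. The velocity numbers are invented for illustration; only the arithmetic matters.

```python
# Hypothetical sprint velocities in story points; the numbers are illustrative only.
velocities = [23, 31, 18, 27, 25, 30, 22]

# 'Yesterday's weather': the previous sprint's actual becomes the next forecast.
yesterdays_weather = velocities[-1]

# Rolling average of the last five sprints (three to five is typical).
last_five = velocities[-5:]
rolling_average = sum(last_five) / len(last_five)

print(yesterdays_weather)  # 22
print(rolling_average)     # 24.4
```

Note that the ‘gut feeling’ approach has no such formula, which is exactly why it is exposed to the biases discussed next.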
When teams use their ‘gut feeling’ they are subject to a number of biases. There is the planning fallacy, the tendency for people to underestimate how long a task will take, even when they have done similar tasks before. Hofstadter’s Law states that “it always takes longer than you expect, even when you take into account Hofstadter’s Law.” On scrumplop.org they point out that motivated teams naturally set higher goals for themselves; however, this does not always result in an improvement.
It’s human nature that individuals and teams with self-esteem set increasingly higher goals for themselves. By trying to achieve these goals, teams learn. Sometimes such strivings lead to immediate improvement, particularly when the team challenges itself to improve through use of a new-found technique or technology. Sometimes, however, the new technique doesn’t pan out and the team’s performance remains the same or even gets worse.
Sometimes, the team raises the bar just to raise the bar, either to test the limits or to unconsciously publish its bravado. Even these self-challenges can pay off, especially in the short term. It is more common that these higher levels are unsustainable in the long term.
Therefore: In most cases, the number of Estimation Points completed in the last Sprint is the most reliable predictor of how many Estimation Points will be completed in the next Sprint.
The quote above ends with scrumplop.org recommending ‘yesterday’s weather’ as the most reliable way to forecast the work that will be done in the next sprint. Unlike a ‘gut feeling’, with ‘yesterday’s weather’ the team can’t simply choose a high velocity (potentially hiding from reality). To get a high commitment, the team needs to have actually delivered a high amount in the previous sprint.
This may work for mature teams with stable platforms, but it may be less successful when there are many disruptions, e.g. changes in personnel, many technical unknowns, or unpredictable development and test environments. The following diagram shows data taken from a real project: actual delivery plotted against a retrofitted forecast based on ‘yesterday’s weather’. In this case the team would have met the goal ~52% of the time. This may not be consistent enough for the team’s planning needs, and such a success rate may negatively affect the team’s motivation over time.
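The ~52% figure is just a hit rate, and it can be checked mechanically. The sketch below uses invented numbers (the real project data is not reproduced here) and counts how often a sprint’s actual met or beat the previous sprint’s actual:

```python
# Hypothetical velocities; the project data behind the ~52% is not reproduced here.
velocities = [25, 18, 30, 22, 27, 20, 33, 24, 19, 28]

# Under 'yesterday's weather', each actual is forecast by the actual before it.
forecasts = velocities[:-1]
actuals = velocities[1:]

hits = sum(actual >= forecast for forecast, actual in zip(forecasts, actuals))
hit_rate = hits / len(actuals)

print(f"{hit_rate:.0%}")  # 44% for this made-up series
```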
Forecast based on ‘Yesterday’s Weather’
Note: The forecasts/commitments have been retrofitted to the graph. The team may have had different results had they used ‘yesterday’s weather’ during planning.
If the actual velocity varies enough to make ‘yesterday’s weather’ unreliable, a team may decide to take the average of the last few sprints (typically 3–6). By the nature of an average, for roughly symmetric data about half of the actuals will fall above it and about half below. Statistically speaking, we only have roughly a 50% chance of meeting a forecast based on the average.
A retrofitted forecast based on the total average means that the commitment is met 50% of the time
A retrofitted forecast based on a rolling average of 5 sprints means that the commitment is met 46% of the time
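The same hit-rate check can be applied to an average-based forecast. A minimal sketch, again with invented numbers (the 50% and 46% figures above come from the real project data):

```python
# Hypothetical velocities; the post's 46% figure comes from 29 real sprints.
velocities = [25, 18, 30, 22, 27, 20, 33, 24, 19, 28]
window = 5  # rolling window size; the post uses 5 sprints

# Forecast each sprint from the rolling average of the preceding `window` sprints.
pairs = [
    (sum(velocities[i - window:i]) / window, velocities[i])
    for i in range(window, len(velocities))
]
hit_rate = sum(actual >= forecast for forecast, actual in pairs) / len(pairs)

print(f"{hit_rate:.0%}")  # 40% for this made-up series
```

As the text predicts, the average-based forecast lands near a coin flip: roughly half the sprints fall below the average used to forecast them.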
For the reasons stated above, I believe that these well-intentioned forecasting techniques may lead a team into overcommitting on a regular basis. Not only does this make release planning and budgeting difficult, but over time it can have a detrimental effect on the team. Team members may stop caring about meeting the commitment and the sprint goal, which can result in a general lack of motivation. If your team is suffering from this, you may want to try using a control chart to help you set realistic forecasts.
Control Charts are typically used to determine if a manufacturing or business process is in a state of statistical control. Carl Berardinelli explains that there are four process states (seen above). He states that “every process falls into one of these states at any given time, but will not remain in that state. All processes will migrate toward the state of chaos. Companies typically begin some type of improvement effort when a process reaches the state of chaos (although arguably they would be better served to initiate improvement plans at the brink of chaos or threshold state). Control charts are robust and effective tools to use as part of the strategy used to detect this natural process degradation”.
The following link shows children explaining how Flowcharts and Control Charts are used at their kindergarten.
Above is a Control Chart derived from a team’s velocity. The dark blue line is the actual velocity over the past 29 sprints. The grey line represents the average actual velocity. The red lines show the upper and lower control limits, plotted at one standard deviation from the average.
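The centre line and control limits can be derived from the velocity series alone. A minimal sketch, using invented numbers and the one-standard-deviation limits described above (note that classic Shewhart charts typically use three standard deviations):

```python
import statistics

# Hypothetical velocities standing in for the 29 real sprints.
velocities = [25, 18, 30, 22, 27, 20, 33, 24, 19, 28]

average = statistics.mean(velocities)   # the grey centre line
sigma = statistics.pstdev(velocities)   # population standard deviation

upper_limit = average + sigma           # red line above the average
lower_limit = average - sigma           # red line below the average

print(round(lower_limit, 1), round(average, 1), round(upper_limit, 1))
```

A forecast set near the lower limit will, by construction, be met far more often than one set at the average, which is the property that makes the chart useful for planning.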