Standard Deviation Challenged
Thursday 24th May 2018
Side by side

The Charts

The charts show one of many runs, with the old method (left) and the new method (right) side by side.
There's now a narrower left-to-right 'cluster', but that is expected from the new method.
A better measure of success is, for every 1,000 runs:-

  1. How many greens are there?
  2. How many reds?
  3. How far out did the worst 10% of reds start?
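The three-question scorecard above can be sketched in code. This is a hypothetical scoring function, not the paper's actual harness: the run labels and miss distances are invented, and 'how far out the worst 10% of reds start' is read here as the smallest miss distance among the worst decile of reds.

```python
# Hypothetical scoring of simulation runs (illustrative only).
# Each run is a (label, miss_distance) tuple: 'green' = hit, 'red' = miss.

def score_runs(runs):
    """Return (green count, red count, distance at which the worst 10% of reds start)."""
    greens = sum(1 for label, _ in runs if label == "green")
    reds = [dist for label, dist in runs if label == "red"]
    worst_decile = sorted(reds, reverse=True)[: max(1, len(reds) // 10)]
    return greens, len(reds), min(worst_decile) if worst_decile else 0.0

# Invented example: 3 greens, 2 reds at distances 5 and 12
runs = [("green", 0), ("red", 5), ("green", 0), ("red", 12), ("green", 0)]
print(score_runs(runs))  # → (3, 2, 12)
```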

Originally published by Cranfield


In this paper, first published in 2002, we found that:-


Across a wide range of rates of sale, forecast methods, lead times and service levels, standard deviation compounded the forecast error (in the total of cycle + safety stock) more often than it cancelled or partly cancelled it.

Communicating the Results

The intended audience was busy line managers with no grounding in statistics. We needed something punchy and very visual to capture such a wide range of input variables and outcomes.

We set up a competition between the conventional method and the proposed alternative (above). To win, one method had to score more 'hits', fewer bad misses, and have 900 of the 1,000 points closer than the other method. In each run, the computer randomly chose and recalculated 10 cases from the ~4,500 possible combinations in the model.
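The win condition can be sketched as a function over the two methods' errors on the same points. The thresholds for a 'hit' and a 'bad miss' below are assumptions for the sketch, not the paper's exact definitions.

```python
# Illustrative head-to-head comparison of two methods by their absolute
# errors over the same points. Thresholds are invented for the sketch.

def compete(errors_a, errors_b, hit=1.0, bad_miss=5.0, closer_needed=900):
    """Return 'A', 'B', or 'tie' under the three-part win condition."""
    hits_a = sum(e <= hit for e in errors_a)
    hits_b = sum(e <= hit for e in errors_b)
    bad_a = sum(e >= bad_miss for e in errors_a)
    bad_b = sum(e >= bad_miss for e in errors_b)
    closer_a = sum(a < b for a, b in zip(errors_a, errors_b))
    closer_b = len(errors_a) - closer_a
    if hits_a > hits_b and bad_a < bad_b and closer_a >= closer_needed:
        return "A"
    if hits_b > hits_a and bad_b < bad_a and closer_b >= closer_needed:
        return "B"
    return "tie"
```

A method must beat its rival on all three counts at once; otherwise the run is a tie, which matches the all-or-nothing framing of the trial.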

The proposed method won 85% of trials and tied the rest. The textbook method never won.

Other lines of enquiry

We looked at stepped demand, and considered building in trend or seasonality, or extending the range of forecast methods.

Stepped demand showed nothing new. Past SD is simply a poor predictor of future SD (colloquially, 'SD is a poor predictor of itself'), and that error reinforces the forecast error more than it cancels it. In the words of one: 'If SD were a forecast method we would never use it. Once you isolate SD from all the noise around it, it's simply too bad a forecast of itself to be usable.' In the circumstances, making demand more complex by inducing a step would prove nothing new.

This is ironic. Step changes in demand are a special case, a minority subset of the whole. Yet the only reason we reforecast is that we think there might have been a step change; if there is no change, there's no reason to reforecast. 'To find a needle in a haystack, don't start by adding hay.' It seems to us that if the simple-case SD (no trend, no seasonality) is unsound, then complication will simply increase the decibels on something already too noisy to use.
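The 'SD is a poor predictor of itself' claim can be illustrated with a toy version of the check: split a demand history in half and compare the two halves' SDs. The demand series below is invented; the paper's evidence came from the full simulation model, not this sketch.

```python
from statistics import stdev

# Toy check of how well past SD predicts future SD: compare the sample
# SD of the first half of a demand history with that of the second half.

def split_half_sds(demand):
    half = len(demand) // 2
    return stdev(demand[:half]), stdev(demand[half:])

demand = [4, 9, 2, 7, 0, 12, 3, 6, 1, 15, 2, 8]  # invented history
past_sd, future_sd = split_half_sds(demand)
print(f"past SD = {past_sd:.1f}, future SD = {future_sd:.1f}")
```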

We think these nuances are red herrings - the SD problem is structural, and the occasional stroke of luck at particular settings should not divert us from the search for a better method.

The search for alternatives

In forecasting it's widely accepted that we can predict a family with more accuracy than an individual within the family. This is the principle underlying the insurance and assurance markets: that swings cancel roundabouts. It may be widely understood; it's less widely practised. Individuals and (especially) computer systems forecast at the SKU (Stock Keeping Unit, i.e. product) level, and sometimes at SKU + shop level, because they can, not because they should.

'Dynamic response to real-time sales data' (a direct quote from a software sales brochure) starts from an assumption that demand has changed, and that we should therefore do something about it - in this case, change the store stock. In a couple of case studies we've shown how this doubles the shop's out-of-stocks. A 'change' signalled by a sale or lack of sale (and the subsequent reforecast) isn't necessarily a change in demand; it may be pure luck.

Grouping products into 'volatility families' seemed a fruitful place to start the search for an alternative to SKU-by-SKU SD calculation. Can products be grouped by volatility, and the SD for the group be used for each member?

  1. Would the coefficient of variation (the SD divided by the mean) be a better place to start?
  2. Or SD as a percentage of the square root of the mean? (SD%)

In theory they can (especially the last, for reasons beyond this paper), and there's some empirical evidence that this would work in practice.
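The two candidate measures can be computed per SKU as below. This is a minimal sketch on an invented demand history, taking SD% as the sample SD divided by the square root of the mean, per the definition above.

```python
from math import sqrt
from statistics import mean, stdev

# The two candidate volatility measures, computed on one SKU's history.
# The demand history is invented for illustration.

def volatility_measures(history):
    m, sd = mean(history), stdev(history)
    cv = sd / m            # coefficient of variation: SD / mean
    sd_pct = sd / sqrt(m)  # 'SD%': SD / sqrt(mean), before converting to percent
    return cv, sd_pct

history = [4, 9, 2, 7, 5, 12, 3, 6]
cv, sd_pct = volatility_measures(history)
print(f"CV = {cv:.2f}, SD% = {sd_pct:.0%}")
```

SKUs whose CV or SD% cluster together would be candidates for the same volatility family, with the family figure standing in for each member's own SD.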

For example, we know that spares demand for expensive items follows a predictable skew curve, while cheap spares are more erratic.[1] This reflects a 'one to fit, one for the van' buying pattern for cheap items.

Elsewhere, we know that companies that have given unreliable ex-stock service experience demand amplification, and there's some clustering of the SKU-by-SKU SD% for such a firm.

Further, different firms experience different amounts of amplification - if a firm has higher-than-normal amplification on one product, it will tend to have similar amplification on all products.

Colloquially, 'If demand is erratic, it's universally erratic'.

  1. For the same rate of consumption, cheap spares might have an SD of 1.8 times the square root of the forecast (SD180%).
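The SD180% figure in the footnote amounts to SD = 1.8 × √forecast. A worked example with an invented forecast value:

```python
from math import sqrt

# Worked example of the footnote's SD180% relationship:
# SD = 1.8 * sqrt(forecast rate of consumption).

def sd_from_forecast(forecast, sd_pct=1.8):
    return sd_pct * sqrt(forecast)

print(sd_from_forecast(25))  # 1.8 * 5 = 9.0
```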

If most of the inputs are in control most of the time, the outcome is still erratic.
© 2002 - 2018 Supply Chain Tools Ltd.