Why should you remove outliers?

Data are sometimes recorded incorrectly, and exceptional events can produce observations that are correct but not representative of normal operations. If such abnormal observations (outliers) are included, they bias your cost estimates. They should therefore be identified and removed before you attempt cost estimation.

To identify potential outliers, first ask whether exceptional events occurred in specific periods. Then plot the data to see whether any observations fall off the general pattern. For instance, look at the following scatter graph:

Clearly, two observations appear off pattern (in red):

Here is the cost function obtained with a least-squares regression when these outliers are included (in red) or excluded (in blue):

In this case, including the outliers reduces the slope (which underestimates the unit variable cost) and increases the intercept (which overestimates the fixed costs). The direction of these effects depends on where the outliers lie, but in every case the estimates are biased.
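
The same effect can be reproduced numerically. The sketch below is only an illustration: the activity levels, the costs, the two outliers and the helper name least_squares are all invented for this example and are not the data behind the graphs. It fits a straight-line cost function by ordinary least squares twice, once without and once with two unusually expensive low-volume periods.

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical normal periods: fixed costs of 5000 plus 20 per unit.
units = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
costs = [5000 + 20 * u for u in units]

# Hypothetical outliers: two low-volume periods with abnormally high costs.
outlier_units = [150, 200]
outlier_costs = [16000, 17000]

# Estimate the cost function without and with the outliers.
intercept_clean, slope_clean = least_squares(units, costs)
intercept_all, slope_all = least_squares(
    units + outlier_units, costs + outlier_costs
)

print(f"Excluding outliers: fixed = {intercept_clean:.0f}, variable = {slope_clean:.2f} per unit")
print(f"Including outliers: fixed = {intercept_all:.0f}, variable = {slope_all:.2f} per unit")
```

With these invented numbers, the fit that excludes the outliers recovers the assumed fixed costs of 5000 and the variable cost of 20 per unit, while the fit that includes them gives a lower slope and a higher intercept. Placing the outliers elsewhere (for example, at high volumes) would shift the estimates in a different direction.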
