Why we should take the safety performance of small samples with a grain of salt

Safety, when expressed quantitatively as the probability of a loss over some unit of exposure, is in effect a proportional rate. This is useful because we can compare the performance of different systems or operations when one has lots of operating hours, and potentially lots of accidents, while another has only a few operating hours and therefore fewer accidents.

The problem is that these sorts of comparisons are actually unfair, and the reason is what’s called the law of large numbers. In essence it says we can expect small samples to show much greater variability than large ones. Because a small cohort of systems will operate for fewer hours than a larger cohort, you’ll most likely see the small cohorts occupying both the highest and lowest loss rate positions in any such comparison of safety.
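To make the league-table effect concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: every operator is given exactly the same true accident rate, accidents are modelled as Poisson events, and the fleet sizes and hours are hypothetical numbers rather than real data.

```python
import math
import random

TRUE_RATE = 1e-5          # assumed: 1 accident per 100,000 hours, identical for everyone
SMALL_HOURS = 50_000      # hypothetical small operator
LARGE_HOURS = 10_000_000  # hypothetical large operator


def poisson(rng, lam):
    """Draw a Poisson count with Knuth's multiplication method (fine for lam up to a few hundred)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def league_table(rng, n_small=20, n_large=20):
    """Return (observed rate, hours) pairs sorted from lowest to highest observed rate."""
    fleets = [SMALL_HOURS] * n_small + [LARGE_HOURS] * n_large
    return sorted((poisson(rng, TRUE_RATE * h) / h, h) for h in fleets)


def share_small_at_both_ends(trials=200, seed=1):
    """Fraction of simulated tables whose best AND worst rows are both small operators."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        table = league_table(rng)
        if table[0][1] == SMALL_HOURS and table[-1][1] == SMALL_HOURS:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(share_small_at_both_ends())
```

Under these assumptions the small operators should take both the top and the bottom of the table in nearly every run, even though nobody in the simulation is actually any safer than anybody else.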

To give you a feel for this effect, our friend de Moivre found that the size of a typical discrepancy (the standard deviation) goes up in proportion to the square root of the number of samples. As we divide that number by the total number of samples to get our proportional rate, the proportional discrepancy scales as one over the square root of the sample size, so it is going to increase as our sample gets smaller.
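The square-root argument can be written out as a short derivation. This is a sketch under a simple binomial model, which is an assumption not stated above: n independent units of exposure, each with the same loss probability p, with X the number of losses observed and \hat{p} the observed rate (all symbols introduced here for illustration).

```latex
\begin{align*}
X &\sim \mathrm{Binomial}(n, p) \\
\operatorname{sd}(X) &= \sqrt{n\,p\,(1-p)} \;\propto\; \sqrt{n} \\
\hat{p} &= \frac{X}{n} \\
\operatorname{sd}(\hat{p}) &= \frac{\sqrt{n\,p\,(1-p)}}{n}
  = \sqrt{\frac{p\,(1-p)}{n}} \;\propto\; \frac{1}{\sqrt{n}}
\end{align*}
```

So under this model, cutting the exposure to a quarter doubles the typical discrepancy in the observed rate.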

Statistics don’t count for anything. They have no place in engineering anywhere.

— Will Willoughby, Head of the Apollo reliability and safety program, The Space Shuttle: A Case of Subjective Engineering, Bell & Esch, 1989

As a real-world example, think about what one accident did to the safety statistics of the Concorde fleet. The day before the accident it was the safest aircraft in the world; the day after it was suddenly the worst, all because the number of hours flown by the fleet was so small. Now, was the aircraft really the safest and then the least safe in the space of two days? No. To understand what’s happening here we need to realise that the law of large numbers works by diluting what’s already happened with new data until the past, in Concorde’s case one accident, becomes proportionally negligible. In the case of Concorde there was just not a big enough pool of data, and therefore the contribution of one historical event significantly affected the proportional rate.
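The dilution effect is simple arithmetic, and a small sketch shows its shape. The departure counts below are hypothetical round numbers chosen for illustration, not actual Concorde or airline figures, and the function names are made up for this example.

```python
def rate_per_million(accidents, departures):
    """Observed accidents per million departures."""
    return accidents * 1_000_000 / departures


def dilution(accidents, departures_at_accident, later_departures):
    """Observed rate as further accident-free departures accumulate after the accident."""
    return [
        (extra, rate_per_million(accidents, departures_at_accident + extra))
        for extra in later_departures
    ]


# Hypothetical small fleet: one accident after 100,000 departures.
small_fleet = dilution(1, 100_000, [0, 100_000, 900_000])

# Hypothetical large fleet: one accident after 50,000,000 departures.
large_fleet = dilution(1, 50_000_000, [0])

if __name__ == "__main__":
    for extra, rate in small_fleet:
        print(f"small fleet, +{extra:>9,} clean departures: {rate:.2f} per million")
    for extra, rate in large_fleet:
        print(f"large fleet, +{extra:>9,} clean departures: {rate:.2f} per million")
```

With these made-up numbers the small fleet needs to fly nine times its entire history again, without a further accident, just to bring its rate down by a factor of ten, and it would still sit well above the large fleet's figure.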

So if someone recommends Djibouti AirExpress because of their incredible safety record when compared to a large international carrier, or advances the theory that the safety regulator of a smaller country does a bang-up job, comparatively speaking, take a moment to reflect that the law of large numbers is probably not your friend.