Archives For Safety

When you look at the safety performance of industries which have a consistent focus on safety as part of the social permit, nuclear or aviation are the canonical examples, you see that over time increases in safety tend to plateau out. This looks like some form of a learning curve, but what’s the mechanism, or mechanisms that actually drives this process? I believe there are two factors at play here, firstly the increasing marginal cost of improvement and secondly the problem of learning from events that we are trying to prevent.

Increasing marginal cost is simply an economist’s way of stating that it will cost more to achieve that next increment in performance. For example, airbags are more expensive than seat-belts by roughly an order of magnitude (based on replacement costs) however airbags only deliver 8% reduced mortality when used in conjunction with seat belts, see Crandall (2001). As a result the next increment in safety takes longer and costs more (1).

The learning factor is in someways like an informational version of the marginal cost rule. As we reduce accident rates accidents become rarer. Now one of the traditional ways in which safety improvements occur is through studying accidents when they occur and then to eliminate or mitigate identified causal factors. Obviously as the accident rate decreases this likewise the opportunity for improvement also decreases. When accidents do occur we have a further problem because (definitionally) the cause of the accident will comprise a highly unlikely combination of factors that are needed to defeat the existing safety measures. Corrective actions for such rare combination of events therefore are highly specific to that event’s context and conversely will have far less universal applicability.  For example the lessons of metal fatigue learned from the Comet airliner disaster has had universal applicability to all aircraft designs ever since. But the QF-72 automation upset off Learmouth? Well those lessons, relating to the specific fault tolerance architecture of the A330, are much harder to generalise and therefore have less epistemic strength.

In summary not only does it cost more with each increasing increment of safety but our opportunity to learn through accidents is steadily reduced as their arrival rate and individual epistemic value (2) reduce.

Notes

1. In some circumstances we may also introduce other risks, see for example the death and severe injury caused to small children from air bag deployments.

2. In a Popperian sense.

References

1. Crandall, C.S., Olson, L.M.,  P. Sklar, D.P., Mortality Reduction with Air Bag and Seat Belt Use in Head-on Passenger Car Collisions, American Journal of Epidemiology, Volume 153, Issue 3, 1 February 2001, Pages 219–224, https://doi.org/10.1093/aje/153.3.219.

One of the perennial problems we face in a system safety program is how to come up with a convincing proof for the proposition that a system is safe. Because it’s hard to prove a negative (in this case the absence of future accidents) the usual approach is to pursue a proof by contradiction, that is develop the negative proposition that the system is unsafe, then prove that this is not true, normally by showing that the set of identified specific propositions of `un-safety’ have been eliminated or controlled to an acceptable level.  Enter the term `hazard’, which in this context is simply shorthand for  a specific proposition about the unsafeness of a system. Now interestingly when we parse the set of definitions of hazard we find the recurring use of terms like, ‘condition’, ‘state’, ‘situation’ and ‘events’ that should they occur will inevitably lead to an ‘accident’ or ‘mishap’. So broadly speaking a hazard is a explanation based on a defined set of phenomena, that argues that if they are present, and given there exists some relevant domain source (1) of hazard an accident will occur. All of which seems to indicate that hazards belong to a class of explanatory models called covering laws. As an explanatory class Covering laws models were developed by the logical positivist philosophers Hempel and Popper because of what they saw as problems with an over reliance on inductive arguments as to causality.

As a covering law explanation of unsafeness a hazard posits phenomenological facts (system states, human errors, hardware/software failures and so on) that confer what’s called nomic expectability on the accident (the thing being explained). That is, the phenomenological facts combined with some covering law (natural and logical), require the accident to happen, and this is what we call a hazard. We can see an archetypal example in the Source-Mechanism-Outcome model of Swallom, i.e. if we have both a source and a set of mechanisms in that model then we may expect an accident (Ericson 2005). While logical positivism had the last nails driven into it’s coffin by Kuhn and others in the 1960s and it’s true, as Kuhn and others pointed out, that covering model explanations have their fair share of problems so to do other methods (2). The one advantage that covering models do possess over other explanatory models however is that they largely avoid the problems of causal arguments. Which may well be why they persist in engineering arguments about safety.

Notes

1. The source in this instance is the ‘covering law’.

2. Such as counterfactual, statistical relevance or causal explanations.

References

Ericson, C.A. Hazard Analysis Techniques for System Safety, page 93, John Wiley and Sons, Hoboken, New Jersey, 2005.

And not quite as simple as you think…

The testimony of Michael Barr, in the recent Oklahoma Toyota court case highlighted problems with the design of Toyota’s watchdog timer for their Camry ETCS-i  throttle control system, amongst other things, which got me thinking about the pervasive role that watchdogs play in safety critical systems. The great strength of watchdogs is of course that they provide a safety mechanism which resides outside the state machine, which gives them fundamental design independence from what’s going on inside. By their nature they’re also simple and small scale beasts, thereby satisfying the economy of mechanism principle.

Continue Reading…

Taboo transactions and the safety dilemma Again my thanks goes to Ross Anderson over on the Light Blue Touchpaper blog for the reference, this time to a paper by Alan Fiske  an anthropologist and Philip Tetlock a social psychologist, on what they terms taboo transactions. What they point out is that there are domains of sharing in society which each work on different rules; communal, versus reciprocal obligations for example, or authority versus market. And within each domain we socially ‘transact’ trade-offs between equivalent social goods.

Continue Reading…

So what do gambling, thermodynamics and risk all have in common?

Continue Reading...

In June of 2011 the Australian Safety Critical Systems Association (ASCSA) published a short discussion paper on what they believed to be the philosophical principles necessary to successfully guide the development of a safety critical system. The paper identified eight management and eight technical principles, but do these principles do justice to the purported purpose of the paper?

Continue Reading…

Why something as simple as control stick design can break an aircrew’s situational awareness

One of the less often considered aspects of situational awareness in the cockpit is the element of knowing what the ‘guy in the other seat is doing’. This is a particularly important part of cockpit error management because without a shared understanding of what someone is doing it’s kind of difficult to detect errors.

Continue Reading…