Archives For Hazards

One of the perennial problems we face in a system safety program is how to come up with a convincing proof of the proposition that a system is safe. Because it’s hard to prove a negative (in this case the absence of future accidents), the usual approach is to pursue a proof by contradiction: develop the negative proposition that the system is unsafe, then show that this is not true, normally by demonstrating that the set of identified specific propositions of ‘un-safety’ has been eliminated or controlled to an acceptable level. Enter the term ‘hazard’, which in this context is simply shorthand for a specific proposition about the unsafeness of a system. Interestingly, when we parse the various definitions of hazard we find the recurring use of terms like ‘condition’, ‘state’, ‘situation’ and ‘event’ that, should they occur, will inevitably lead to an ‘accident’ or ‘mishap’. So broadly speaking a hazard is a causal explanation, based on a defined set of phenomena, which argues that if those phenomena are present then, given a relevant domain ‘law’, there will be an accident. All of which seems to indicate that hazards belong to a class of explanatory models called covering laws. Covering law models were developed as an explanatory class by the logical positivist philosophers Hempel and Popper because of what they saw as problems with an over-reliance on inductive arguments about causality.

As a covering law explanation of unsafeness, a hazard posits phenomenological facts (system states, human errors, hardware/software failures and so on) that confer what’s called nomic expectability on the accident (the thing being explained). That is, the phenomenological facts, combined with some covering law (natural or logical), require the accident to happen, and this is what we call a hazard. We can see an archetypal example in the Source-Mechanism-Outcome model of Swallom, i.e. if all three elements of that model are present then we may expect an accident (Ericson 2005). While logical positivism had the last nails driven into its coffin by Kuhn and others in the 1960s, and while it’s true, as they pointed out, that covering law explanations have their fair share of problems, so too do the other methods (1). The one advantage that covering law models do possess over other explanatory models is that they largely avoid the problems of causal arguments, as their makers intended, which may well be why they persist in engineering arguments about safety.
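To make the covering law reading concrete, here’s a minimal sketch in Python of a hazard as a Source-Mechanism-Outcome triple, where the mishap is ‘nomically expected’ only when all three elements are present. The class and function names, and the example labels, are purely illustrative assumptions, not drawn from Ericson or Swallom:

    from dataclasses import dataclass

    @dataclass
    class Hazard:
        # Toy Source-Mechanism-Outcome triple (labels are illustrative only)
        source: str      # hazardous energy or condition, e.g. a pressurised fuel leak
        mechanism: str   # the means by which the source can do harm, e.g. an ignition path
        outcome: str     # the loss event, e.g. a fire injuring the crew

    def mishap_expected(source_present: bool, mechanism_present: bool, target_exposed: bool) -> bool:
        # Covering law reading: given the 'law' that the three elements together
        # produce the outcome, the mishap is nomically expected when all are present
        return source_present and mechanism_present and target_exposed

    fire = Hazard('pressurised fuel leak', 'ignition source adjacent to leak', 'fire injuring crew')
    print(mishap_expected(True, True, True))    # True  - the explanation requires the accident
    print(mishap_expected(True, False, True))   # False - remove the mechanism and no accident follows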

Notes

1. Such as counterfactual, statistical relevance and causal explanations.

References

Ericson, C.A. Hazard Analysis Techniques for System Safety, page 93, John Wiley and Sons, Hoboken, New Jersey, 2005.

Perusing the FAA’s system safety handbook while doing some research for a current job, I came upon an interesting definition of severity. What’s interesting is that the FAA introduces the concept of safety margin reduction as a specific form of severity (loss).

Here’s a summary of Table 3-2 from the handbook:

  • Catastrophic – ‘Multiple fatalities and/or loss of system’
  • Major – ‘Significant reduction in safety margin…’
  • Minor – ‘Slight reduction in safety margin…’

If we think about what a reduction in safety margin means for a functional system, it represents a system state that’s a precursor to a mishap, with the remaining margin representing some intervening set of states. But a system state of reduced safety margin (let’s call it a hazard state) is causally linked to a mishap state, else we wouldn’t care, and must therefore inherit its severity. The problem is that in the FAA’s definition severity levels have been arbitrarily assigned to specific degrees of safety margin reduction, yet all of these could still be causally linked to a catastrophic event, e.g. a mid-air collision.

What the FAA’s Systems Engineering Council (SEC) has done is conflate severity with likelihood; as a result their severity definition is actually a risk definition, at least when it comes to safety margin hazards. The problem with this approach is that we end up under-treating risks as per classical risk theory. For example, say we have a potential reduction in safety margin that is also causally linked to a catastrophic outcome. Per Table 3-2, if the reduction were classified as ‘slight’ we would assess the probability and, given the minor severity, decide to do nothing, even though in reality the severity is still catastrophic. If, on the other hand, we decided to make decisions based on severity alone, we would still end up making a hidden risk judgement, one that depends on the likelihood of propagation from hazard state to accident state (which the handbook leaves undefined). So basically the definitions set you up for trouble even before you start.
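To see how the under-treatment plays out, here’s a minimal sketch using a toy risk matrix in the MIL-STD-882 style; the severity and likelihood categories, the matrix entries and the example figures are all illustrative assumptions, none of them come from the FAA handbook:

    # Toy risk matrix in the MIL-STD-882 style (illustrative entries only)
    RISK_MATRIX = {
        ('catastrophic', 'probable'):   'high',
        ('catastrophic', 'remote'):     'serious',
        ('catastrophic', 'improbable'): 'medium',
        ('minor', 'probable'):          'medium',
        ('minor', 'remote'):            'low',
        ('minor', 'improbable'):        'low',
    }

    def assess(severity: str, likelihood: str) -> str:
        return RISK_MATRIX[(severity, likelihood)]

    # A 'slight' reduction in separation margin that can propagate to a mid-air collision.
    # Table 3-2 reading: severity is taken from the margin reduction itself.
    print(assess('minor', 'probable'))        # 'medium' - likely tolerated, nothing done

    # Loss event reading: severity is inherited from the collision, with the (undefined)
    # propagation likelihood discounting 'probable' down to, say, 'remote'.
    print(assess('catastrophic', 'remote'))   # 'serious' - demands treatment

Same hazard, two assessments a full risk level apart; the gap is exactly the hidden likelihood judgement buried in the severity definition.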

My guess is that the SEC decided to fill in the lesser severities with hazard states because, for an ATM system, true mishaps are almost invariably catastrophic, and they were left scratching their heads for lesser severity mishap definitions. Enter the safety margin reduction hazard. The take-home from all this is that severity needs to be based on the loss event; introducing intermediate hybrid hazard/severity state definitions leads inevitably to an incoherent definition of risk. Oh, and (as far as I am aware) this malformed definition has spread everywhere…

Although you would expect a discipline like safety engineering to have a well-defined and agreed set of foundational concepts, strangely the definition of just what a hazard is (one such foundational concept) remains elusive, with a range of different standards introducing differing definitions.
