Archives For Analysis

One of the perennial problems we face in a system safety program is how to come up with a convincing proof of the proposition that a system is safe. Because it's hard to prove a negative (in this case the absence of future accidents), the usual approach is a proof by contradiction: develop the negative proposition that the system is unsafe, then show that this is not true, normally by showing that the set of identified specific propositions of 'un-safety' have been eliminated or controlled to an acceptable level. Enter the term 'hazard', which in this context is simply shorthand for a specific proposition about the unsafeness of a system. Interestingly, when we parse the various definitions of hazard we find the recurring use of terms like 'condition', 'state', 'situation' and 'event' which, should they occur, will inevitably lead to an 'accident' or 'mishap'. So broadly speaking a hazard is a causal explanation, based on a defined set of phenomena, which argues that if those phenomena are present then, given a relevant domain 'law', there will be an accident. All of which seems to indicate that hazards belong to a class of explanatory models called covering laws. Covering law models were developed by the philosophers Hempel and Popper in response to what they saw as problems with an over-reliance on inductive arguments about causality.

As a covering law explanation of unsafeness a hazard posits phenomenological facts (system states, human errors, hardware/software failures and so on) that confer what's called nomic expectability on the accident (the thing being explained). That is, the phenomenological facts, combined with some covering law (natural or logical), require the accident to happen, and this is what we call a hazard. We can see an archetypal example in Swallom's Source-Mechanism-Outcome model, i.e. if we have all three elements of that model then we may expect an accident (Ericson 2005). Logical positivism had the last nails driven into its coffin by Kuhn and others in the 1960s, and it's true that covering law explanations have their fair share of problems, but so too do the other explanatory methods (1). The one advantage that covering law models do possess is that they largely avoid the problems of causal arguments, as their makers intended, which may well be why they persist in engineering arguments about safety.
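For readers who haven't met it, Hempel's deductive-nomological schema (the canonical covering law form) recast in hazard terms looks roughly like this; the labelling is my own paraphrase of the standard textbook presentation rather than anything drawn from Ericson:

$$
\begin{array}{ll}
C_1, C_2, \ldots, C_k & \text{(antecedent conditions: system states, failures, errors)}\\
L_1, L_2, \ldots, L_r & \text{(covering laws: physical, logical or domain regularities)}\\
\hline
E & \text{(explanandum: the accident or mishap)}
\end{array}
$$

The premises above the line are what a hazard statement bundles together; the claim of nomic expectability is simply that, given them, E follows.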

Notes

1. Such as counterfactual, statistical relevance and causal explanations.

References

Ericson, C.A., Hazard Analysis Techniques for System Safety, John Wiley and Sons, Hoboken, New Jersey, 2005, p. 93.

Screwtape (Image source: end time info)

More infernal statistics

Well, here we are again. Given recent developments in the infernal region it seems like a good time for another post. Have you ever, dear reader, been faced with the problem of how to achieve an unachievable safety target? Well, worry no longer! Herewith is Screwtape's patented man-based mitigation medicine.

The first thing we do is introduce the concept of 'mitigation', ah, what a beautiful word that is. You see, it's saying that it's OK that your system doesn't meet its safety target, because you can claim credit for the action of an external mitigator in the environment. Probability-wise, if the probability of an accident is P_a, then P_a equals the product of your system's failure probability P_s and the probability that some external mitigation also fails P_m, or P_a = P_s × P_m.

So let's use operator intervention as our mitigator, lovely and vague. But how to come up with a low enough P_m? Easy, we just look at the accident rate that has occurred for this (or a like) system and assume that those accidents were the ones where operator mitigation was unsuccessful. Voilà, we get our really small numbers.

Now, an alert reader might point out that this is totally bogus, and that P_m is actually the likelihood of operator failure given that the system has failed. Operators failing, as those pestilential authors of the WASH-1400 study have pointed out, is actually quite likely. But I say, if your customer is so observant and on the ball then clearly you are not doing your job right. Try harder or I may eat your soul, yum yum.

Yours hungrily, 

Screwtape.
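Screwtape's arithmetic deserves an editorial footnote. Here's a minimal sketch, in Python with purely illustrative numbers, of the difference between back-fitting P_m from an observed accident rate and treating it honestly as the conditional probability of operator failure given a system failure:

```python
# Illustrative numbers only; none of these values come from a real system.
p_s = 1.0e-3        # assumed probability of system failure (per mission, say)
target = 1.0e-7     # assumed accident probability target

# Screwtape's trick: back-fit P_m from the historically observed accident rate,
# silently assuming every non-accident was a successful operator intervention.
observed_accident_rate = 1.0e-7
p_m_bogus = observed_accident_rate / p_s
print(p_s * p_m_bogus)        # ~1e-07, 'meets' the target by construction

# The honest version: P_m is P(operator fails to intervene | system has failed).
# The human reliability literature (WASH-1400 and its successors) puts this
# somewhere between 0.1 and 1 for a stressed operator; 0.1 is optimistic here.
p_m_honest = 0.1
print(p_s * p_m_honest)       # 1e-04, three orders of magnitude above the target
```

The point of the sketch is simply that the factorisation P_a = P_s × P_m only helps if P_m is the conditional term, which is exactly the term Screwtape hopes nobody asks about.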


Why writing a safety case might (actually) be a good idea

Frequent readers of my blog would probably realise that I'm a little sceptical of safety cases. As Scrooge remarked to Marley's ghost, "There's more of gravy than of grave about you, whatever you are!" So too for safety cases, oft more gravy than gravitas about them in my opinion, regardless of what their proponents might think.


Lady Justice (Image source: Jongleur CC-BY-SA-3.0)

Or how I learned to stop worrying about trifles and love the Act

One of the Achilles' heels of the current Australian WH&S legislation is that it provides no clear point at which you should stop caring about potential harm. While there are reasons for this, it does mean that we can end up with some theatre-of-the-absurd moments where someone seriously proposes paper cuts as a risk of concern.

The traditional response to such claims of risk is to point out that the law does not, in fact, concern itself with such trifles. Or, more pragmatically, that as you are highly unlikely to be prosecuted over a paper cut it's not worth worrying about.

787 Battery after fire (Image source: NTSB)

The NTSB have released their final report on the Boeing 787 Dreamliner Li-Ion battery fires. The report makes interesting reading, but for me the most telling point is summarised in conclusion seven, which I quote below.

Conclusion 7. Boeing’s electrical power system safety assessment did not consider the most severe effects of a cell internal short circuit and include requirements to mitigate related risks, and the review of the assessment by Boeing authorized representatives and Federal Aviation Administration certification engineers did not reveal this deficiency.

NTSB/AIR-14/01 (p. 78)

In other words, Boeing got themselves into a position where the 'assumed worst case' in their safety assessment was far less severe than the reality. This failure to imagine the worst meant that when they aggressively weight-optimised the battery design, instead of thermally optimising it, they were unwittingly running much higher risks than they believed.

The first principle is that you must not fool yourself, and you are the easiest person to fool.

Richard P. Feynman

I'm also thinking that Boeing's behaviour is consistent with what McDermid et al. call probative blindness. That is, the safety activities that were conducted were intended to comply with regulatory requirements rather than to actually determine what hazards existed and what risk they posed.

… there is a high level of corporate confidence in the safety of the [Nimrod aircraft]. However, the lack of structured evidence to support this confidence clearly requires rectifying, in order to meet forthcoming legislation and to achieve compliance.

Nimrod Safety Management Plan 2002 (1)

As the quote from the Nimrod program deftly illustrates, safety analyses are often (2) conducted simply to confirm what we already 'know', that the system is safe; non-probative, if you will. In these circumstances the objective is compliance with the regulations rather than the generation of evidence that our system might be unsafe. In such circumstances doing more or better safety analysis is unlikely to prevent an accident, because the evidence will not cause beliefs to change; belief, it seems, is a powerful thing.
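To put some rough numbers on that last claim, here is a back-of-the-envelope Bayesian sketch, with entirely invented figures rather than anything drawn from Rae et al., of why a compliance-driven analysis barely shifts belief: if the analysis will report 'safe' almost regardless of the true state of the system, its likelihood ratio is close to one and the posterior sits on top of the prior.

```python
# Invented figures for illustration only.
prior_safe = 0.99              # prior corporate confidence that the system is safe
p_says_safe_if_safe = 0.99     # a compliance-driven analysis almost always reports 'safe'
p_says_safe_if_unsafe = 0.95   # ...even when the system is in fact unsafe

# Bayes' rule: P(safe | analysis says 'safe')
posterior_safe = (p_says_safe_if_safe * prior_safe) / (
    p_says_safe_if_safe * prior_safe + p_says_safe_if_unsafe * (1 - prior_safe)
)
print(round(posterior_safe, 4))   # ~0.9904, i.e. the analysis has told us almost nothing
```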

The Boeing battery saga also illustrates how much regulators like the FAA actually rely on the technical competence of those being regulated, and how fragile that regulatory relationship is when it comes to dealing with the safety of emerging technologies.

Notes

1. As quoted in Probative Blindness: How Safety Activity can fail to Update Beliefs about Safety, A.J. Rae, J.A. McDermid, R.D. Alexander, M. Nicholson (IET SSCS Conference 2014).

2. Actually, in aerospace I'd assert that it's normal practice to carry out hazard analyses simply to comply with a regulatory requirement. As far as the organisation commissioning them is concerned, the results are going to tell them what they already know: that the system is safe.

An interesting post by Mike Thicke over at Cloud Chamber on the potential use of prediction markets to predict the location of MH370. Prediction markets integrate 'diffused' knowledge using a market mechanism to derive a predicted likelihood: essentially, market prices are assigned to the various outcomes and treated as analogues of their likelihood, and market trading then establishes what the market 'thinks' each outcome is worth. The technique has a long and colourful history, but it does seem to work. As an aside, prediction markets are still predicting a No vote in the upcoming referendum on Scottish independence, despite recent polls to the contrary.

Returning to the MH370 saga, if the ATSB is not intending to use a Bayesian search plan then one could in principle crowdsource the effort through such a prediction market, running the market dynamically with prices updating as new information comes in from the ongoing search. Any investors out there?
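For those unfamiliar with what a Bayesian search plan actually does, here's a minimal sketch in Python. The cells, priors and detection probability are invented for illustration and bear no relation to the actual MH370 search grid; the update rule itself is the standard one, and it's exactly the sort of new information a dynamic market would be pricing in.

```python
# Illustrative cells and probabilities only, not the real MH370 search data.
priors = {"cell_A": 0.5, "cell_B": 0.3, "cell_C": 0.2}   # prior P(wreck in cell)
p_detect = 0.7   # assumed probability of detecting the wreck if we search the right cell

def update_after_unsuccessful_search(priors, searched_cell, p_detect):
    """Posterior P(wreck in each cell) given an unsuccessful search of searched_cell."""
    posterior = dict(priors)
    posterior[searched_cell] *= (1.0 - p_detect)   # discount the searched cell
    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}

print(update_after_unsuccessful_search(priors, "cell_A", p_detect))
# cell_A falls to ~0.23 while cell_B (~0.46) and cell_C (~0.31) rise accordingly
```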

Enshrined in Australia's current workplace health and safety legislation is the principle of 'So Far As Is Reasonably Practicable'. In essence SFAIRP requires you to eliminate risk, or to reduce it to a negligible level, so far as is (surprise) reasonably practicable. While there's been a lot of commentary on the increased requirements for diligence (read: industry moaning and groaning), there's been little or no consideration of the 'theory of risk' that backs this legislative principle and how it shapes the current legislation, let alone whether for good or ill. So I thought I'd take a stab at it. 🙂