Archives For Analysis

One of the perennial problems we face in a system safety program is how to come up with a convincing proof for the proposition that a system is safe. Because it’s hard to prove a negative (in this case the absence of future accidents) the usual approach is to pursue a proof by contradiction: develop the negative proposition that the system is unsafe, then prove that this is not true, normally by showing that the set of identified specific propositions of ‘un-safety’ have been eliminated or controlled to an acceptable level. Enter the term ‘hazard’, which in this context is simply shorthand for a specific proposition about the unsafeness of a system. Now, interestingly, when we parse the set of definitions of hazard we find the recurring use of terms like ‘condition’, ‘state’, ‘situation’ and ‘events’ that, should they occur, will inevitably lead to an ‘accident’ or ‘mishap’. So broadly speaking a hazard is an explanation based on a defined set of phenomena, which argues that if they are present, and given there exists some relevant domain source (1) of hazard, an accident will occur. All of which seems to indicate that hazards belong to a class of explanatory models called covering laws. As an explanatory class, covering law models were developed by the logical positivist philosophers Hempel and Popper because of what they saw as problems with an over-reliance on inductive arguments as to causality.

As a covering law explanation of unsafeness a hazard posits phenomenological facts (system states, human errors, hardware/software failures and so on) that confer what’s called nomic expectability on the accident (the thing being explained). That is, the phenomenological facts combined with some covering law (natural or logical) require the accident to happen, and this is what we call a hazard. We can see an archetypal example in the Source-Mechanism-Outcome model of Swallom, i.e. if we have both a source and a set of mechanisms in that model then we may expect an accident (Ericson 2005). Logical positivism had the last nails driven into its coffin by Kuhn and others in the 1960s and it’s true, as they pointed out, that covering model explanations have their fair share of problems, but so too do other methods (2). The one advantage that covering models do possess over other explanatory models, however, is that they largely avoid the problems of causal arguments, which may well be why they persist in engineering arguments about safety.


1. The source in this instance is the ‘covering law’.

2. Such as counterfactual, statistical relevance or causal explanations.


Ericson, C.A. Hazard Analysis Techniques for System Safety, page 93, John Wiley and Sons, Hoboken, New Jersey, 2005.

To err is inhuman


Screwtape (Image source: end time info)

More infernal statistics

Well, here we are again. Given recent developments in the infernal region it seems like a good time for another post. Have you ever, dear reader, been faced with the problem of how to achieve an unachievable safety target? Well, worry no longer! Herewith is Screwtape’s patented man-based mitigation medicine.

The first thing we do is introduce the concept of ‘mitigation’, ah what a beautiful word that is. You see it’s saying that it’s OK that your system doesn’t meet its safety target, because you can claim credit for the action of an external mitigator in the environment. Probability wise, if the probability of an accident is P_a then P_a equals the product of your system’s failure probability P_s and the probability that some external mitigation also fails P_m, or P_a = P_s × P_m.
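For the arithmetically inclined, the product can be sketched in a few lines of Python. This is only an illustration: the function name and the numbers are hypothetical, and the product rule holds only when P_m is read as the probability that the mitigator fails given that the system has already failed.

```python
def accident_probability(p_system_failure: float, p_mitigation_failure: float) -> float:
    """Return P_a = P_s * P_m.

    p_mitigation_failure is taken to be the probability that the
    mitigation fails given that the system has failed.
    """
    for p in (p_system_failure, p_mitigation_failure):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_system_failure * p_mitigation_failure


# Hypothetical values, chosen only to show the arithmetic.
P_S = 1e-3  # system failure probability
P_M = 1e-2  # mitigation failure probability
P_A = accident_probability(P_S, P_M)
```

The smaller the P_m you can claim, the smaller the P_a you can report, which is why the choice of P_m carries the whole argument.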

So let’s use operator intervention as our mitigator, lovely and vague. But how to come up with a low enough P_m? Easy, we just look at the accident rate that has occurred for this or a like system and assume that these were due to operator mitigation being unsuccessful. Voila, we get our really small numbers. 

Now, an alert reader might point out that this is totally bogus and that P_m is actually the likelihood of operator failure when the system fails. Operators failing, as those pestilential authors of the WASH-1400 study have pointed out, is actually quite likely. But I say, if your customer is so observant and on the ball then clearly you are not doing your job right. Try harder or I may eat your soul, yum yum.

Yours hungrily, 



Why writing a safety case might (actually) be a good idea

Frequent readers of my blog would probably realise that I’m a little sceptical of safety cases. As Scrooge remarked to Marley’s ghost, “There’s more of gravy than of grave about you, whatever you are!” So too for safety cases, oft more gravy than gravitas about them in my opinion, regardless of what their proponents might think.


Lady Justice (Image source: Jongleur CC-BY-SA-3.0)

Or how I learned to stop worrying about trifles and love the Act

One of the Achilles heels of the current Australian WH&S legislation is that it provides no clear point at which you should stop caring about potential harm. While there are reasons for this, it does mean that we can end up with some theatre of the absurd moments where someone seriously proposes paper cuts as a risk of concern.

The traditional response to such claims of risk is to point out that actually the law rarely concerns itself with such trifles. Or more pragmatically, as you are highly unlikely to be prosecuted over a paper cut, it’s not worth worrying about.

787 Battery after fire (Image source: NTSB)

The NTSB have released their final report on the Boeing 787 Dreamliner Li-Ion battery fires. The report makes interesting reading, but for me the most telling point is summarised in conclusion seven, which I quote below.

Conclusion 7. Boeing’s electrical power system safety assessment did not consider the most severe effects of a cell internal short circuit and include requirements to mitigate related risks, and the review of the assessment by Boeing authorized representatives and Federal Aviation Administration certification engineers did not reveal this deficiency.

NTSB/AIR-14/01 (p. 78)

In other words Boeing got themselves into a position with their safety assessment where their ‘assumed worst case’ was much less severe than the reality. This failure to imagine the worst ensured that when they aggressively weight-optimised the battery design, instead of thermally optimising it, the risks they were unwittingly running were so much higher.

The first principle is that you must not fool yourself, and that you are the easiest person to fool.

Richard P. Feynman

I’m also thinking that the behaviour of Boeing is consistent with what McDermid et al. call probative blindness. That is, the safety activities that were conducted were intended to comply with regulatory requirements rather than actually determine what hazards existed and their risk.

… there is a high level of corporate confidence in the safety of the [Nimrod aircraft]. However, the lack of structured evidence to support this confidence clearly requires rectifying, in order to meet forthcoming legislation and to achieve compliance.

Nimrod Safety Management Plan 2002 (1)

As the quote from the Nimrod program deftly illustrates, often (2) safety analyses are conducted simply to confirm what we already ‘know’, that the system is safe; they are non-probative if you will. In these circumstances the objective is compliance with the regulations rather than to generate evidence that our system is unsafe. Doing more or better safety analysis is then unlikely to prevent an accident, because the evidence will not cause beliefs to change. Belief, it seems, is a powerful thing.

The Boeing battery saga also illustrates how much regulators like the FAA actually rely on the technical competence of those being regulated, and how fragile that regulatory relationship is when it comes to dealing with the safety of emerging technologies.


1. As quoted in Probative Blindness: How Safety Activity can fail to Update Beliefs about Safety, A J Rae, J A McDermid, R D Alexander, M Nicholson (IET SSCS Conference 2014).

2. Actually in aerospace I’d assert that it’s normal practice to carry out hazard analyses simply to comply with a regulatory requirement. As far as the organisation commissioning them is concerned the results are going to tell them what they know already, that the system is safe.

An interesting post by Mike Thicke over at Cloud Chamber on the potential use of prediction markets to predict the location of MH370. Prediction markets integrate ‘diffused’ knowledge using a market mechanism to derive a predicted likelihood. Essentially, market prices are assigned to various outcomes and are treated as analogs of their likelihood. Market trading then establishes what the market ‘thinks’ is the value of each outcome. The technique has a long and colourful history, but it does seem to work. As an aside, prediction markets are still predicting a No vote in the upcoming referendum on Scottish Independence despite recent polls to the contrary.

Returning to the MH370 saga, if the ATSB is not intending to use a Bayesian search plan then one could in principle crowd source the effort through such a prediction market. One could run the market in a dynamic fashion with the market prices updating as new information comes in from the ongoing search. Any investors out there?
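For readers unfamiliar with the idea, here is a minimal sketch of how market prices might feed a Bayesian search. It is an illustration only, not the ATSB’s method: the search cells, prices and detection probability are all hypothetical. Prices are normalised into a prior over search cells, and Bayes’ rule is then applied after an unsuccessful search of one cell.

```python
def normalise(weights):
    """Scale non-negative weights (e.g. market prices) so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {cell: w / total for cell, w in weights.items()}


def update_after_failed_search(prior, searched_cell, p_detect):
    """Posterior over cells after searching one cell and finding nothing.

    p_detect is the probability the search finds the target, given the
    target really is in the searched cell.
    """
    p_nothing_found = 1.0 - prior[searched_cell] * p_detect
    if p_nothing_found <= 0:
        raise ValueError("a failed search is impossible under this prior")
    posterior = {}
    for cell, p in prior.items():
        # A miss is certain if the target is elsewhere, and has
        # probability (1 - p_detect) if it is in the searched cell.
        likelihood = (1.0 - p_detect) if cell == searched_cell else 1.0
        posterior[cell] = p * likelihood / p_nothing_found
    return posterior


# Hypothetical market prices for three hypothetical search cells.
prices = {"north": 0.30, "central": 0.90, "south": 0.30}
prior = normalise(prices)
posterior = update_after_failed_search(prior, "central", p_detect=0.8)
```

Each unsuccessful search shifts belief away from the searched cell and towards the others, which is the same kind of updating one would hope the market prices perform as new information comes in.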

Enshrined in Australia’s current workplace health and safety legislation is the principle of ‘So Far As Is Reasonably Practicable’. In essence SFAIRP requires you to eliminate risk or, failing that, to reduce it as far as is (surprise) reasonably practicable. While there’s been a lot of commentary on the increased requirements for diligence (read industry moaning and groaning) there’s been little or no consideration of what ‘theory of risk’ backs this legislative principle and how it shapes the current legislation, let alone whether for good or ill. So I thought I’d take a stab at it. 🙂


On Artificial Intelligence as ethical prosthesis

Out here in the grim meat-hook present of Reaper missions and Predator drone strikes we’re already well down the track to a future in which decisions as to who lives and who dies are made less and less by human beings, and more and more by automation. Although there’s been a lot of ‘sexy’ discussion recently of the possibility of purely AI decision making, the current panic misses the real issue du jour: how well current day hybrid human-automation systems make such decisions, and the potential for the incremental abrogation of moral authority by the human part of this cybernetic system as the automation in this synthesis becomes progressively more sophisticated and suasive.

As Dijkstra pointed out in the context of programming, one of the problems or biases humans have in thinking about automation is that because it ‘does stuff’, we find the need to imbue it with agency, and from there it’s a short step to treating the automation as a partner in decision making. From this very human misunderstanding it’s almost inevitable that the decision maker holding such a view will feel that the responsibility for decisions is shared, and so diluted, thereby opening up potential for choice shift in decision making. As the degree of sophistication of such automation increases, of course, this effect becomes stronger and stronger, even though ‘on paper’ we would not recognise the AI as a rational being in the Kantian sense.

Even the design of decision support system interfaces can pose tricky problems when an ethical component is present, as the dimensions of ethical problem solving (time intensiveness, consideration, uncertainty, uniqueness and reflection) directly conflict with those that make for efficient automation (brevity, formulaic, simplification, certainty and repetition). This inherent conflict ensures that the interaction of automation and human ethical decision making becomes a tangled and conflicted mess. Technologists of course look at the way in which human beings make such decisions in the real world and believe, rightly or wrongly, that automation can do better. What we should remember is that such automation is still a proxy for the designer. If the designer has no real understanding of the needs of the user in forming such ethical decisions then, if the past is any guide, we are up for a future of poorly conceived decision support systems, with all the inevitable and unfortunate consequences that attend. In fact I feel confident in predicting that the designers of such systems will, once again, automate their biases about how humans and automation should interact, with unpleasant surprises for all.

In a broader sense what we’re doing with this current debate is essentially rehashing the old arguments between two world views on the proper role of automation. On the one hand we have the view that automation is intended to supplant those messy, unreliable humans, in the current context effecting an unintentional ethical prosthetic. On the other hand we have the view that automation can and should be used to assist and augment human capabilities, that is, it should be used to support and develop people’s innate ethical sense. Unfortunately in this current debate it also looks like the prosthesis school of thought is winning out. My view is that if we continue in this approach of ‘automating out’ moral decision making we will inevitably end up with the amputation of ethical sense in the decision maker, long before killer robots stalk the battlefield, or the high street of your home town.

Hazard checklists


As I had to throw together an example checklist for a course I’m running, here it is. I’ve also given a little bit of a commentary on the use, advantages and disadvantages of checklists as well. Enjoy. 🙂

NASA safety handbook cover

Way, way back in 2011 NASA published the first volume of their planned two volume epic on system safety titled, strangely enough, “NASA System Safety Handbook Volume 1, System Safety Framework and Concepts for Implementation”. Catchy, eh?


Cleveland street train overrun (Image source: ATSB)

The ATSB has released its preliminary report of its investigation into the Cleveland Street overrun accident, which I covered in an earlier post, and it makes interesting reading.


One of the recurring problems in running hazard identification workshops is being faced by a group whose members are passively refusing to engage in the process.

A technique that I’ve found quite valuable in breaking participants out of that mindset is TRIZ, or the Theory of Solving Problems Creatively (teoriya resheniya izobretatelskikh zadatch).


Warsaw A320 Accident (Image Source: Unknown)

One of the questions that we should ask whenever an accident occurs is whether we could have identified the causes during design? And if we didn’t, is there a flaw in our safety process?


One of my somewhat perennial concerns when reviewing a functional hazard analysis (FHA) is what’s termed the completeness question: in this case, whether all the potentially hazardous functional failure modes have been considered, and to what degree.