Archives For Categories of risk

When you look at the safety performance of industries that maintain a consistent focus on safety as part of their social permit to operate, nuclear power and aviation being the canonical examples, you see that over time the gains in safety tend to plateau. This looks like some form of learning curve, but what is the mechanism, or mechanisms, that actually drives this process? I believe there are two factors at play: firstly the increasing marginal cost of improvement, and secondly the problem of learning from the very events we are trying to prevent.

Increasing marginal cost is simply an economist’s way of saying that each successive increment in performance costs more to achieve than the last. For example, airbags are more expensive than seat belts by roughly an order of magnitude (based on replacement costs), yet airbags deliver only an 8% reduction in mortality when used in conjunction with seat belts (Crandall 2001). As a result each further increment in safety takes longer and costs more (1).
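As a back-of-envelope illustration of that point (a sketch of my own, not Crandall’s analysis: the dollar values and the seat-belt effectiveness figure are assumptions, only the order-of-magnitude cost ratio and the 8% incremental benefit come from the paragraph above), the cost per unit of risk reduction jumps dramatically for the second measure:

```python
# Back-of-envelope sketch only. The dollar figures and the seat-belt
# effectiveness are illustrative assumptions; the ~10x cost ratio and the
# 8% incremental airbag benefit are from the text above (Crandall 2001).
seat_belt_cost = 100.0               # assumed replacement cost, $
airbag_cost = 10 * seat_belt_cost    # "roughly an order of magnitude" more

seat_belt_benefit = 50.0   # assumed mortality reduction, %, for belts alone
airbag_benefit = 8.0       # additional reduction, %, when added to belts

print(f"seat belt: ${seat_belt_cost / seat_belt_benefit:.0f} per percentage point")
print(f"airbag:    ${airbag_cost / airbag_benefit:.0f} per percentage point")
# On these assumptions the marginal measure costs ~60x more per unit of benefit.
```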

The learning factor is in some ways an informational version of the marginal cost rule. As we reduce accident rates, accidents become rarer. One of the traditional ways in which safety improvements occur is by studying accidents when they happen and then eliminating or mitigating the identified causal factors. Obviously, as the accident rate decreases, so too does this opportunity for improvement. When accidents do occur we face a further problem, because (definitionally) the cause will comprise a highly unlikely combination of factors, the combination needed to defeat the existing safety measures. Corrective actions for such rare combinations of events are therefore highly specific to that event’s context and have correspondingly less universal applicability. For example, the lessons on metal fatigue learned from the Comet airliner disaster have applied to every aircraft design since. But the QF-72 automation upset off Learmonth? Those lessons, tied to the specific fault tolerance architecture of the A330, are much harder to generalise and therefore carry less epistemic strength.
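To make the plateau mechanism concrete, here is a minimal simulation sketch of my own (not a model from the original post): accidents arrive at random at some underlying rate, and each investigated accident yields a lesson that removes a fixed fraction of the residual risk. The starting rate and the improvement fraction are purely illustrative.

```python
import random

random.seed(1)

rate = 1.0          # accidents per year; illustrative starting value
improvement = 0.2   # fraction of residual risk removed per lesson learned (assumed)
years = 200
history = []

for year in range(years):
    # Crude Poisson arrivals: many small Bernoulli trials per year.
    accidents = sum(1 for _ in range(1000) if random.random() < rate / 1000)
    rate *= (1 - improvement) ** accidents   # learn a lesson from each accident
    history.append(rate)

for year in range(0, years, 25):
    print(f"year {year:3d}: underlying accident rate ~ {history[year]:.3f} per year")
```

Early on, lessons arrive roughly annually and the rate falls quickly; later the gaps between accidents stretch out, the lessons dry up, and the curve flattens into the plateau described above, quite apart from any decline in how generalisable each lesson is.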

In summary, not only does each further increment of safety cost more, but our opportunity to learn from accidents steadily shrinks as both their arrival rate and their individual epistemic value (2) decline.

Notes

1. In some circumstances we may also introduce new risks; see for example the deaths and severe injuries caused to small children by air bag deployments.

2. In a Popperian sense.

References

1. Crandall, C.S., Olson, L.M., Sklar, D.P., Mortality Reduction with Air Bag and Seat Belt Use in Head-on Passenger Car Collisions, American Journal of Epidemiology, Volume 153, Issue 3, 1 February 2001, Pages 219–224, https://doi.org/10.1093/aje/153.3.219.

Donald Trump

Image source: AP/LM Otero

A Trump presidency in the wings, who’d have thought it! And what a total shock it was to all those pollsters, commentators and apparatchiks who are now trying to explain why they got it so wrong. All of which is a textbook example of what students of risk theory call a Black Swan event. Continue Reading…


A short article on (you guessed it) risk, uncertainty and unpleasant surprises for the 25th Anniversary issue of the UK SCS Club’s Newsletter, in which I introduce a unified theory of risk management that brings together the treatment of aleatory, epistemic and ontological risk, and formalises the Rumsfeld four-quadrant risk model that I’ve used for a while as a teaching aid.

My thanks once again to Felix Redmill for the opportunity to contribute.  🙂

Crowley (Image source: Warner Bros. TV)

The psychological basis of uncertainty

There’s a famous psychological experiment conducted by Daniel Ellsberg, eponymously called the Ellsberg paradox, in which he showed that people overwhelmingly prefer a betting scenario in which the probabilities are known over one in which the odds are ambiguous, even when the potential winnings might be greater. Continue Reading…
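To see that ambiguity aversion at work, here is a small sketch of my own of the standard urn version of the paradox (the textbook setup, not necessarily the one in the post): an urn holds 30 red balls and 60 balls that are black or yellow in unknown proportion. People typically prefer to bet on red rather than black, yet also prefer black-or-yellow to red-or-yellow, and no single probability assigned to ‘black’ makes both choices the expected-value maximising ones.

```python
# Ellsberg urn sketch: 30 red balls, 60 black-or-yellow in unknown proportion.
# Whatever composition you assume, the two typical choices cannot both be
# expected-value maximising at the same time.
p_red = 30 / 90

for n_black in range(0, 61, 10):   # every assumed urn composition
    p_black = n_black / 90
    p_yellow = (60 - n_black) / 90
    bet_red_better = p_red > p_black                                        # choice 1
    bet_black_or_yellow_better = (p_black + p_yellow) > (p_red + p_yellow)  # choice 2
    print(f"assumed black balls = {n_black:2d}: "
          f"red beats black: {bet_red_better}, "
          f"black+yellow beats red+yellow: {bet_black_or_yellow_better}")
```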


One of the problems we face in estimating risk is that as our uncertainty increases, our ability to express it precisely (e.g. numerically) weakens, to the point where under deep uncertainty (1) we definitionally cannot make a direct estimate of risk in the classical sense. Continue Reading…

Inspecting Tacoma Narrows (Image source: Public domain)

We don’t know what we don’t know

The Tacoma Narrows bridge stands, or rather falls, as a classic example of what happens when we run up against the limits of our knowledge. The failure of the bridge, due to a then unknown torsional aeroelastic flutter mode to which the bridge, with its high span-to-width ratio, was particularly vulnerable, is almost a textbook example of ontological risk. Continue Reading…

I’ve rewritten my post on epistemic, aleatory and ontological risk pretty much completely, enjoy.

SFAIRP

The current Work Health and Safety (WHS) legislation of Australia formalises the common law principle of reasonable practicability in regard to the elimination or minimisation of risks associated with industrial hazards. Having had the advantage of working through this with a couple of clients, I’ve drawn up the flowchart above as my interpretation of what reasonable practicability looks like as a process, annotated with cross-references to the legislation and guidance material. What’s most interesting is that the process is determinedly not about tolerance of risk, but is instead firmly focused on what can reasonably and practicably be done. Continue Reading…

For those of you interested in such things, there’s a thread running over on the Safety Critical Mail List at Bielefeld on software failure, sparked off by Peter Ladkin’s post over on Abnormal Distribution on the same subject. Whether software can be said to fail, and whether you can use the term reliability to describe it, is one of those strange attractors about which the list tends to orbit. An interesting discussion, although at times I did think we were playing a variant of Wittgenstein’s definition game.

And my opinion? Glad you asked.

Yes, of course software fails. That its failure is not the same as the pseudo-random failure we posit for hardware components is neither here nor there. Continue Reading…

Why we should take the safety performance of small samples with a grain of salt

Safety, when expressed quantitatively as the probability of a loss over some unit of exposure, is in effect a proportional rate. This is useful because it lets us compare the performance of different systems or operations even when one has lots of operating hours, and potentially lots of accidents, while another has only a few operating hours and therefore fewer accidents. Continue Reading…
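To put a rough number on that grain of salt, here is an illustrative sketch of my own (the underlying rate and the exposures are assumed, and the normal approximation used is crude at low counts): two operations with exactly the same underlying accident rate, but very different exposures, can legitimately show wildly different observed rates.

```python
import math

true_rate = 1e-5   # accidents per operating hour; assumed for illustration

for hours in (1e3, 1e5, 1e7):
    expected = true_rate * hours
    # Rough 95% band on the observed count, treating arrivals as Poisson and
    # using a crude normal approximation (poor at low counts, but indicative).
    spread = 1.96 * math.sqrt(expected)
    low = max(0.0, expected - spread) / hours
    high = (expected + spread) / hours
    print(f"{hours:>12,.0f} h exposure: observed rate anywhere from "
          f"{low:.1e} to {high:.1e} per hour is unremarkable")
```

With only a thousand hours of exposure an observed rate some twenty times the true one is entirely unremarkable; with ten million hours the band tightens to within roughly twenty per cent.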


Or how do we measure the unknown?

The problem is that as our understanding and control of known risks increase, the remaining risk in any system becomes increasingly dominated by the ‘unknown’. The higher the integrity of our systems, the more uncertainty we have over the unknown and unknowable residual risk. What we need is a way to measure, express and reason about such deep uncertainty; I don’t mean tools like the Pascalian calculus or Bayesian prior belief structures, but a way to measure and judge ontological uncertainty.

Even if we can’t measure ontological uncertainty directly, perhaps there are indirect measures? Perhaps there’s a way to infer something from the Platonic shadow that such uncertainty casts on the wall, so to speak. Nassim Taleb would say no; the unknowability of such events is, after all, the central thesis of his Ludic Fallacy. But I still think it’s worth exploring, because while he might be right, he may also be wrong.

*With apologies to Nassim Taleb.

Well it was either Crowley or Kylie Minogue given the title of the post, so think yourselves lucky (Image source: Warner Brothers TV)

Sometimes it’s just a choice between bad and worse

If we accept that different types of uncertainty create different types of risk then it follows that we may in fact be able to trade one type of risk for another, and in certain circumstances this may be a preferable option.

Continue Reading…

Interesting article on old-school rail safety and lessons for the modern nuclear industry. As a somewhat ironic addendum, the early nuclear industry safety studies also overlooked the risks posed by large inventories of fuel rods on site, the assumption at the time being that they’d be shipped off to a reprocessing facility as soon as possible. It’s hard to predict the future. 🙂

NASA safety handbook cover

Way, way back in 2011 NASA published the first volume of their planned two-volume epic on system safety, titled, strangely enough, “NASA System Safety Handbook Volume 1, System Safety Framework and Concepts for Implementation”. Catchy, eh?

Continue Reading…

Monument to the conquerors of space Moscow (Copyright)

Engineers as the agents of evolution

Continue Reading…

As Weick pointed out, to manage the unexpected we need to be reliably mindful, not reliably mindless. Obvious as that truism may be, those who invest heavily in plans, procedures, process and policy end up perpetuating and reinforcing a whole raft of expectations about how the world is, thereby investing in an organisational culture of mindlessness rather than mindfulness. Once we understand that process inherently slides towards a state of organisational mindlessness, we can see that a process-oriented risk management standard such as ISO 31000 perversely cultivates a climate of inattentiveness right where we should be most attentive and mindful. Nor am I alone in my assessment of ISO 31000; see for example John Adams’ criticism of the standard as not fit for purpose, or Kaplan and Mikes’ assessment of it as essentially ‘not relevant’. Process is no substitute for paying attention.

Don’t get me wrong, there’s nothing inherently wrong with a small dollop of process; it’s just that its place is not centre stage in an international standard that purports to be about risk, not if you’re looking for an effective outcome. In real life it’s the unexpected, those black swans of Nassim Taleb’s flying in the dark skies of ignorance, that have the greatest effect, and about which ISO 31000 has nothing to say.

Postscript

Also, applying ISO 31000’s classical risk management to workplace health and safety may actually be illegal in some jurisdictions (like Australia), where the legislation is based on a backwards-looking principle of due diligence rather than a prospective, risk-based approach.

And not quite as simple as you think…

The testimony of Michael Barr in the recent Oklahoma Toyota court case highlighted, amongst other things, problems with the design of Toyota’s watchdog timer for the Camry ETCS-i throttle control system, which got me thinking about the pervasive role that watchdogs play in safety-critical systems. The great strength of watchdogs is of course that they provide a safety mechanism that resides outside the state machine, which gives them fundamental design independence from whatever is going on inside. By their nature they’re also simple, small-scale beasts, thereby satisfying the economy of mechanism principle.

Continue Reading…
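On that watchdog pattern, here is a minimal generic sketch (my own illustration of the idea, not Toyota’s ETCS-i design or any code from the post): the watchdog holds no knowledge of the control loop’s internal state, it only checks that the loop keeps ‘kicking’ it within a deadline, and forces a safe state if the kicks stop.

```python
import threading
import time

KICK_DEADLINE_S = 0.5              # assumed deadline, purely illustrative
last_kick = time.monotonic()

def kick():
    """Called by the control loop each cycle to signal it is still alive."""
    global last_kick
    last_kick = time.monotonic()

def watchdog(enter_safe_state):
    """Independent monitor: no insight into the loop, only into its heartbeat."""
    while True:
        time.sleep(KICK_DEADLINE_S / 5)
        if time.monotonic() - last_kick > KICK_DEADLINE_S:
            enter_safe_state()
            return

def safe_state():
    print("watchdog expired: commanding a safe state (illustrative action)")

threading.Thread(target=watchdog, args=(safe_state,), daemon=True).start()

# Simulated control loop that 'hangs' after a few cycles and stops kicking.
for _ in range(5):
    kick()
    time.sleep(0.1)
time.sleep(1.0)    # loop hung; the watchdog fires during this interval
```

The value of the pattern lies in that independence: the watchdog’s correctness doesn’t depend on the internals of the logic it guards.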

Singularity (Image source: Tecnoscience)

Or ‘On the breakdown of Bayesian techniques in the presence of knowledge singularities’

One of the abiding problems of safety-critical ‘first of’ systems is that you face, as David Collingridge observed, a double-bind dilemma:

  1. Initially an information problem, because the ‘real’ safety issues (hazards) and their risks cannot easily be identified or quantified until the system is deployed, but
  2. By the time the system is deployed you face a power (inertia) problem; that is, control or change is difficult once the system is delivered. Eliminating a hazard at that point is usually very difficult, and we can only mitigate it in some fashion. Continue Reading…

The igloo of uncertainty (Image source: UNEP 2010)

Ethics, uncertainty and decision making

The name of the model made me smile, but this article, The Ethics of Uncertainty by Tannert, Elvers and Jandrig, argues that where uncertainty exists, research should be considered part of an ethical approach to managing risk.

Continue Reading…

An articulated guess beats an unspoken assumption

Frederick Brooks

A point that Fred Brooks makes in his recent work The Design of Design is that it’s wiser to make specific assumptions explicitly, even if that entails guessing their values, than to leave the assumptions unstated and vague because ‘we just don’t know’. Brooks notes that while specific and explicit assumptions may be questioned, implicit and vague ones definitely won’t be. If a critical aspect of your design rests on such fuzzy, unarticulated assumptions, the results can be dire.

Continue Reading…

Cleveland street train overrun (Image source: ATSB)

The ATSB has released its preliminary report on its investigation into the Cleveland street overrun accident, which I covered in an earlier post, and it makes interesting reading.

Continue Reading…

4100 class crew escape pod #0

On the subject of near misses…

Presumably the use of the crew cab as an escape pod was not actually high on the list of design goals for the 4000 and 4100 class locomotives, and thankfully the locomotives involved in the recent derailment at Ambrose were unmanned.

Continue Reading…

The pentagon is functioning (Image Source: USN)

…And there are still unknown unknowns

A while ago I posted a short piece on the difference between aleatory, epistemic and ontological uncertainty, using Don Rumsfeld’s famous news conference comments as a good introduction to the subject.

Continue Reading…

Well it sounded reasonable…

One of the things that’s concerned me for a while is the potentially malign narrative power of a published safety case. For those unfamiliar with the term, a safety case can be defined as a structured argument supported by a body of evidence that provides a compelling, comprehensible and valid case that a system is safe for a given application in a given environment. And I have not yet read a safety case that didn’t purport to be exactly that.

Continue Reading…

In June of 2011 the Australian Safety Critical Systems Association (ASCSA) published a short discussion paper on what they believed to be the philosophical principles necessary to successfully guide the development of safety-critical systems. The paper identified eight management and eight technical principles, but do these principles do justice to the purported purpose of the paper?

Continue Reading…

I’ve recently been reading John Downer on what he terms the Myth of Mechanical Objectivity. To summarise John’s argument: he points out that once the risk of an extreme event has been ‘formally’ assessed as so low as to be acceptable, it becomes very hard for society and its institutions to justify preparing for it (Downer 2011).

Continue Reading…

Did the designers of the Japanese seawalls consider all the factors?

In an eerie parallel with the Blayais nuclear power plant flooding incident, it appears that the designers of tsunami protection for the Japanese coastal cities and infrastructure hit by the 2011 earthquake did not consider all the combinations of environmental factors that determine the height of a tsunami.

Continue Reading…

What a near-miss flooding incident at a French reactor plant in 1999, its aftermath, and the subsequent Fukushima plant disaster can tell us about fault tolerance and designing for reactor safety.

Continue Reading…

One of the tenets of safety engineering is that simple systems are better. Many practical reasons are advanced to justify this assertion, but I’ve always wondered what theoretical justification, if any, there was for such a position.

Continue Reading…

In your professional life there are certain critical works that open your eyes to how lucidly a technical argument can be stated. Continue Reading…

Risk - what is it?

The continuum of uncertainty

What do an eighteenth-century mathematician and a twentieth-century US Secretary of Defence have to do with engineering and risk? The answer is that both thought about uncertainty and risk, and the differing definitions they arrived at neatly illustrate that there is more to the concept of risk than just likelihood multiplied by consequence, which in turn has significant implications for engineering risk management.

Editorial note: I’ve pretty much completely revised this post since the original, hope you like it.

Continue Reading…