Archives For Risk

What is risk, how do we categorise it, and how do we deal with it?

MH 370 search vessel (Image source: ATSB)

Once more with feeling

Sonar vessels searching for Malaysia Airlines Flight MH370 in the southern Indian Ocean may have missed the jet, the ATSB's Chief Commissioner Martin Dolan has told News Online. He went on to point out the uncertainties involved and the difficulty of terrain that could mask the signature of wreckage, and noted that problematic areas would therefore need to be re-surveyed. Despite all that, the Commissioner was confident that the wreckage site would be found by June. Me, I'm not so sure.


The long gone, but not forgotten, second issue of the UK MoD's safety management standard DEFSTAN 00-56 introduced the concept of a qualitative likelihood of Incredible. This is, however, not just another likelihood category. The intention of the standard writers was that it would be used to capture risks that were deemed effectively impossible, given the assumptions about the domain and system. The category was to be applied to those scenarios where the hazard had been designed out, where the design concept had been assessed and the posited hazard turned out to be simply not applicable, or where some non-probabilistic technique (think mathematical proof) was used to verify the safety of the system. Such a category records that yes, it's effectively impossible, while retaining the record of assessment should it become necessary to revisit it: a useful mechanism.

A.1.19 Incredible. Believed to have a probability of occurrence too low for expression in meaningful numerical terms.

DEFSTAN 00-56 Issue 2

I've seen this approach mangled in a number of hazard analyses where the disjoint nature of the Incredible category was not recognised, and it was thereafter assigned a specific likelihood that followed on in decadal fashion from the next highest category. Yes, difficulties ensued. The key is that Incredible is not the next likelihood bin after Improbable; it is in fact beyond the end of the line, where we park those hazards that we have judged to have an immeasurably small likelihood of occurrence. This, we are asserting, will not happen, and we are as confident of that fact as one can ever be.
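To make the "disjoint category" point concrete, here is a minimal sketch of how a hazard log might keep Incredible outside the frequency scale rather than tacking it on as one more decade. The numeric bin boundaries and the names `Likelihood`, `Assessment`, `assess_by_rate` and `assess_as_incredible` are hypothetical, chosen for illustration; they are not taken from DEFSTAN 00-56.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Likelihood:
    name: str
    # Lower bound of the bin in events per operating hour (hypothetical values).
    # None marks a category that is not a frequency bin at all.
    lower_per_hour: Optional[float]

# Hypothetical decadal frequency scale, for illustration only.
FREQUENCY_BINS = (
    Likelihood("Frequent", 1e-3),
    Likelihood("Probable", 1e-4),
    Likelihood("Occasional", 1e-5),
    Likelihood("Remote", 1e-6),
    Likelihood("Improbable", 0.0),  # open-ended: catches every smaller estimate
)

# Incredible lies outside the scale, not one decade below Improbable.
INCREDIBLE = Likelihood("Incredible", None)

@dataclass(frozen=True)
class Assessment:
    hazard: str
    likelihood: Likelihood
    basis: str  # the record of assessment, kept so the judgement can be revisited

def assess_by_rate(hazard: str, rate_per_hour: float) -> Assessment:
    """Bin a numeric estimate. However tiny, it lands in Improbable, never Incredible."""
    if rate_per_hour < 0:
        raise ValueError("rate must be non-negative")
    for bin_ in FREQUENCY_BINS:
        if rate_per_hour >= bin_.lower_per_hour:
            return Assessment(hazard, bin_, f"estimated rate {rate_per_hour:g}/h")
    raise AssertionError("unreachable: Improbable has a lower bound of 0")

def assess_as_incredible(hazard: str, argument: str) -> Assessment:
    """Incredible is reached by reasoned argument (design-out, proof), not by a number."""
    if not argument.strip():
        raise ValueError("an Incredible judgement needs a recorded argument")
    return Assessment(hazard, INCREDIBLE, argument)

print(assess_by_rate("H1", 1e-12).likelihood.name)  # Improbable
print(assess_as_incredible("H2", "hazard designed out").likelihood.name)  # Incredible
```

The design choice is that no numeric input can ever produce Incredible: the only route in is a recorded argument, which is what keeps the category from collapsing into "the next bin down".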

“Incredible” may be exceptionally defined in terms of reasoned argument that does not rely solely on numerical probabilities.

DEFSTAN 00-56 Issue 2

To put it another way, the category reflects a statement of our degree of belief that an event will not occur, rather than an assertion about its frequency of occurrence, as the other subjective categories make. What the standard writers have unwittingly done is introduce a superset, in which the 'no hazard exists' set is represented by Incredible and the other likelihoods form the 'a hazard exists' set. All of which starts to sound like a mashup of frequentist probabilities with Dempster-Shafer belief structures. Promising; it's a pity the standard committee didn't take the concept further.
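One way to read the Dempster-Shafer remark is sketched below: a mass assignment over the frame {hazard, no_hazard}, where an Incredible judgement puts its weight on "no hazard" and leaves a small residue uncommitted rather than assigning the hazard any frequency. This is an interpretation for illustration, not something DEFSTAN 00-56 defines, and the 0.95/0.05 split is a hypothetical number.

```python
import math

# Frame of discernment: the two states the Incredible judgement speaks about.
HAZARD = frozenset({"hazard"})
NO_HAZARD = frozenset({"no_hazard"})
THETA = HAZARD | NO_HAZARD  # "we don't know": mass placed here is uncommitted

def check_mass(mass: dict) -> None:
    """A basic mass assignment: non-empty subsets of THETA, masses summing to 1."""
    for focal, m in mass.items():
        if not focal or not focal <= THETA or m < 0:
            raise ValueError(f"bad focal element {set(focal)} with mass {m}")
    if not math.isclose(sum(mass.values()), 1.0):
        raise ValueError("masses must sum to 1")

def belief(mass: dict, a: frozenset) -> float:
    """Bel(A): mass committed to subsets of A (evidence that supports A)."""
    return sum(m for focal, m in mass.items() if focal <= a)

def plausibility(mass: dict, a: frozenset) -> float:
    """Pl(A): mass on sets that intersect A (evidence that does not rule A out)."""
    return sum(m for focal, m in mass.items() if focal & a)

# Hypothetical numbers: strong support for "no hazard" from a design-out argument,
# with a small uncommitted residue for doubts about our domain assumptions.
incredible = {NO_HAZARD: 0.95, THETA: 0.05}
check_mass(incredible)

print(belief(incredible, NO_HAZARD))     # 0.95
print(belief(incredible, HAZARD))        # 0
print(plausibility(incredible, HAZARD))  # 0.05
```

Note that Bel(hazard) is zero: nothing here asserts a frequency for the hazard. The residual doubt shows up only as an upper bound, Pl(hazard), which is the "degree of belief rather than frequency" distinction in numeric form.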

Postscript

The other pity is that the standard committee didn’t link this idea of “incredible” to Borel’s law. Had they done so we would have a mechanism to make explicit what I call the infinite monkey’s safety argument.

Crowley (Image source: Warner Bros. TV)

The psychological basis of uncertainty

There's a famous psychological experiment conducted by Ellsberg, eponymously called the Ellsberg paradox, in which he showed that people overwhelmingly prefer a betting scenario in which the probabilities are known over one in which the odds are ambiguous, even if the potential for winning might be greater.


One of the problems that we face in estimating risk is that, as our uncertainty increases, our ability to express it in a precise fashion (e.g. numerically) weakens, to the point where for deep uncertainty (1) we by definition cannot make a direct estimate of risk in the classical sense.

Perusing the FAA’s system safety handbook while doing some research for a current job, I came upon an interesting definition of severities. What’s interesting is that the FAA introduces the concept of safety margin reduction as a specific form of severity (loss).

Here's a summary of Table 3-2 from the handbook:

  • Catastrophic – ‘Multiple fatalities and/or loss of system’
  • Major – ‘Significant reduction in safety margin…’
  • Minor – ‘Slight reduction in safety margin…’

If we think about safety margins in a functional system, a reduced margin represents a system state that's a precursor to a mishap, with the margin itself representing some intervening set of states. But a system state of reduced safety margin (let's call it a hazard state) is causally linked to a mishap state, else we wouldn't care, and must therefore inherit its severity. The problem is that in the FAA's definition they have arbitrarily assigned severity levels to specific hazardous degrees of safety margin reduction, yet all of these could still be causally linked to a catastrophic event, e.g. a mid-air collision.

What the FAA's Systems Engineering Council (SEC) has done is conflate severity with likelihood; as a result, their severity definition is actually a risk definition, at least when it comes to safety margin hazards. The problem with this approach is that we end up under-treating risks as judged by classical risk theory. For example, say we have a potential reduction in safety margin that is also causally linked to a catastrophic outcome. Per Table 3-2, if the reduction were classified as 'slight', we would assess the probability and, given the minor severity, decide to do nothing, even though in reality the severity is still catastrophic. If, on the other hand, we decided to make decisions based on severity alone, we would still end up making a hidden risk judgement, depending on the likelihood of propagation from hazard state to accident state (which the handbook leaves undefined). So basically the definitions set you up for trouble even before you start.
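The hidden propagation step can be made explicit with a small sketch. The two hazard states, their rates and their propagation probabilities are hypothetical numbers chosen for illustration, and `table_view` / `loss_view` are illustrative names, not anything from the FAA handbook. Both hazards lead to the same mid-air collision at the same overall rate, yet a margin-based table rates one Minor and the other Major.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class HazardState:
    name: str
    table_severity: str   # severity assigned to the hazard state by a margin-based table
    rate_per_hour: float  # hypothetical rate of entering the hazard state
    p_propagation: float  # hypothetical P(mishap | hazard state); undefined in the handbook
    mishap: str
    mishap_severity: str  # severity of the loss event the hazard can lead to

def table_view(h: HazardState) -> tuple[str, float]:
    """Severity read off the hazard state; the propagation step is silently absorbed."""
    return h.table_severity, h.rate_per_hour

def loss_view(h: HazardState) -> tuple[str, float]:
    """Severity taken from the loss event; likelihood is the rate of actually reaching it."""
    return h.mishap_severity, h.rate_per_hour * h.p_propagation

hazards = [
    HazardState("slight margin reduction", "Minor", 1e-3, 1e-4,
                "mid-air collision", "Catastrophic"),
    HazardState("significant margin reduction", "Major", 1e-5, 1e-2,
                "mid-air collision", "Catastrophic"),
]

for h in hazards:
    t_sev, t_rate = table_view(h)
    l_sev, l_rate = loss_view(h)
    print(f"{h.name}: table says {t_sev} at {t_rate:.0e}/h; "
          f"loss-based says {l_sev} at {l_rate:.0e}/h")
```

Under the loss-based view both hazards carry a catastrophic severity at roughly the same rate, which is the comparison a risk matrix is meant to support; the table view hides that behind two different severity labels.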

My guess is that the SEC decided to fill in the lesser severities with hazard states because, for an ATM system, true mishaps are almost invariably catastrophic, and they were left scratching their heads for lesser-severity mishap definitions. Enter the safety margin reduction hazard. The take-home from all this is that severity needs to be based on the loss event; introducing intermediate hybrid hazard/severity state definitions leads inevitably to an incoherent definition of risk. Oh, and (as far as I am aware) this malformed definition has spread everywhere…

National Terrorism Threat Advisory System

With much pomp and circumstance the Attorney General and our top state security mandarins have rolled out the brand new threat level advisory system. Congrats to us, we are now the proud owners of a five-runged ladder of terror. There's just one small, teeny tiny, insignificant problem: it just doesn't work. Yep, that's right, as a tool for communication it's completely void of meaning, useless in fact, a hopelessly vacuous piece of security theatre.

You see, the levels of this scale are based on likelihood. But whoever designed the scale forgot to specify the duration over which the likelihood is being estimated. And without that duration it's just a meaningless list of words.

Here's how likelihood works. Say you ask me whether it's likely to rain tomorrow and I say 'unlikely'. Now ask me whether it will rain in the next week; well, that's a bit more likely, isn't it? OK, so next you ask me whether it'll rain in the next year. Well, unless you live in Alice Springs, the answer is going to be even more likely, maybe almost certain. So you can see that the duration we're thinking of affects the likelihood we come up with, because it's a cumulative measure.

Now ask me whether a terrorist threat will eventuate tomorrow. I'd probably say it was so unlikely as to be 'Not expected'. But if you asked me whether one might occur in the next year, I'd say (as we're accumulating exposure) it'd be more likely, maybe even 'Probable', while if the question were about a decade of exposure I'd almost certainly say it was 'Certain'. So you see how a scale without a duration means absolutely nothing. In fact it's much worse than nothing: it actually causes misunderstanding, because I may be thinking about threats across the next year while you may be thinking about threats occurring in the next month. So it actually communicates negative information.
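The cumulative effect can be shown with the standard "at least one event in n periods" formula, P = 1 - (1 - p)^n, assuming independent days with a constant daily probability p. Both the daily probability and the numeric cut-offs used to map probabilities onto the scale's words are hypothetical, since the advisory system publishes neither; the two intermediate level names ('Possible', 'Expected') are likewise assumed here.

```python
def prob_at_least_one(p_per_day: float, days: int) -> float:
    """P(at least one event in `days`), assuming independent days and a constant daily probability."""
    if not 0.0 <= p_per_day <= 1.0:
        raise ValueError("p_per_day must be a probability")
    return 1.0 - (1.0 - p_per_day) ** days

# Hypothetical cut-offs: the advisory scale itself gives no numbers and no duration.
LEVELS = [
    (0.01, "Not expected"),
    (0.20, "Possible"),
    (0.50, "Probable"),
    (0.95, "Expected"),
]

def level(p: float) -> str:
    """Map a probability onto the scale's words using the hypothetical cut-offs above."""
    for upper, name in LEVELS:
        if p < upper:
            return name
    return "Certain"

P_PER_DAY = 1e-3  # hypothetical daily probability, held fixed throughout

for label, days in [("tomorrow", 1), ("next month", 30), ("next year", 365), ("next decade", 3650)]:
    p = prob_at_least_one(P_PER_DAY, days)
    print(f"{label:>12}: P = {p:.3f} -> {level(p)}")
```

With these assumed numbers the same unchanged underlying threat reads as 'Not expected' for tomorrow, 'Possible' for the next month, 'Probable' for the next year and 'Certain' for the next decade. Nothing about the threat changed; only the unstated exposure period did.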

And this took years of consideration, according to the Attorney General. Man, we are governed by second-raters. Puts head in hands.

Screwtape (Image source: end time info)

How to deal with those pesky high risks without even trying

Screwtape here,

One of my clients recently came to me with what seemed to be an insurmountable problem in getting his facility accepted despite the presence of an unacceptably high risk of a catastrophic accident. The regulator was not happy, and neither were all those mothers with placards outside his office every morning. Most upsetting. 'Not a problem,' said I, 'let me introduce you to the Screwtape LLC patented cut-and-come-again risk refactoring strategy.' Please forgive me now, dear reader, for without further ado we must do some math.

Risk is defined as the loss times the probability of loss, or R = L x P (1), which is the reverse of expectation. Now, interestingly, if we have a set of individual risks we can add them together to get the total risk; for our facility we might say that total risk is R_f = R_1 + R_2 + R_3 + … + R_n. 'So what, Screwtape? This will not pacify those angry mothers!' I hear you say. Ahh, now bear with me as I show you how we can hide, err I mean refactor, our unacceptable risk in plain view. Let us also posit that we have a number of systems S_1, S_2, S_3 and so on in our facility… Well, instead of looking at the total facility risk, let's go down inside our facility and look at risks at the system level. Given that the probability of each system causing an accident is (by definition) much less, why then, per system, the risk must also be less! If you don't get an acceptable risk at the system level, then go down to the subsystem or equipment level.

The coup de grâce is to present this ensemble of subsystem risks as a voluminous and comprehensive list (2), thereby convincing everyone of the earnestness of your endeavours, but to omit any consideration of ensemble risk (3). Of course one should be scrupulously careful that the numbers add up, even though you don't present them. After all, there's no point in getting caught stealing a penny while engaged in purloining the Bank of England! For extra points we can utilise subjective measures of risk rather than numeric ones, thereby obfuscating the proceedings further.

Needless to say, my client went away a happy man, the facility was built, and the total risk of operation was hidden right there in plain sight… ah, how I love the remorseless bloody hand of progress.

Infernally yours,

Screwtape

Notes

1. Where R = Risk, L = Loss, and P = Probability, after de Moivre. I believe Screwtape keeps de Moivre's heart in a jar on his desk. (Ed.).

2. The technical term for this is a Preliminary Hazard Analysis.

3. Screwtape omitted to note that total risk remains the same, all we’ve done is budgeted it out across an ensemble of subsystems, i.e. R_f = R_s1 + R_s2 + R_s3 (Ed.).
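The editor's point in note 3 is easy to check with a few numbers. The sketch below uses hypothetical figures (ten subsystems, a per-item acceptance threshold, an assumed loss per accident) and treats each R as an expected loss, so the subsystem risks sum to the facility risk. Every line of the subsystem list passes, while the total it quietly adds up to does not.

```python
import math

def risk(loss: float, probability: float) -> float:
    """R = L x P, as in note 1."""
    return loss * probability

LOSS = 10.0                # hypothetical loss per accident (e.g. fatalities)
SUBSYSTEM_P = [1e-5] * 10  # hypothetical per-year accident probability for S_1 … S_10
ACCEPTABLE = 5e-4          # hypothetical acceptance threshold, same units as R

subsystem_risks = [risk(LOSS, p) for p in SUBSYSTEM_P]
total_risk = sum(subsystem_risks)  # R_f = R_s1 + R_s2 + … + R_sn

every_item_passes = all(r <= ACCEPTABLE for r in subsystem_risks)
ensemble_passes = total_risk <= ACCEPTABLE
print(every_item_passes, ensemble_passes)  # True False
```

Splitting the facility further only shrinks each line item; it never changes `total_risk`, which is exactly the figure the voluminous list leaves out.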