Archives For Risk
What is risk, how dow we categorise it and deal with it.
Bayes and the search for MH370
We are now approximately 60% of the way through searching the MH370 search area, and so far nothing. Which is unfortunate because as the search goes on the cost continues to go up for the taxpayer (and yes I am one of those). What’s more unfortunate, and not a little annoying, is that that through all this the ATSB continues to stonily ignore the use of a powerful search technique that’s been used to find everything from lost nuclear submarines to the wreckage of passenger aircraft. Continue Reading…
Here’s an interesting graph that compares Class A mishap rates for USN manned aviation (pretty much from float plane to Super-Hornet) against the USAF’s drone programs. Interesting that both programs steadily track down decade by decade, even in the absence of formal system safety programs for most of the time (1).
The USAF drone program start out with around the 60 mishaps per 100,000 flight hour rate (equivalent to the USN transitioning to fast jets at the close of the 1940s) and maintains a steeper decrease rate that the USN aviation program. As a result while the USAF drones program is tail chasing the USN it still looks like it’ll hit parity with the USN sometime in the 2040s.
So why is the USAF drone program doing better in pulling down the accident rate, even when they don’t have a formal MIL-STD-882 safety program?
Well for one a higher degree of automation does have comparitive advantages. Although the USN’s carrier aircraft can do auto-land, they generally choose not to, as pilot’s need to keep their professional skills up, and human error during landing/takeoff inevitably drives the mishap rate up. Therefore a simple thing like implementing an auto-land function for drones (landing a drone is as it turns out not easy) has a comparatively greater bang for your safety buck. There’s also inherently higher risks of loss of control and mid air collision when air combat manoeuvring, or running into things when flying helicopters at low level which are operational hazards that drones generally don’t have to worry about.
For another, the development cycle for drones tends to be quicker than manned aviation, and drones have a ‘some what’ looser certification regime, so improvements from the next generation of drone design tend to roll into an expanding operational fleet more quickly. Having a higher cycle rate also helps retain and sustain the corporate memory of the design teams.
Finally there’s the lessons learned effect. With drones the hazards usually don’t need to be identified and then characterised. In contrast with the early days of jet age naval aviation the hazards drone face are usually well understood with well understood solutions, and whether these are addressed effectively has more to do with programmatic cost concerns than a lack of understanding. Conversely when it actually comes time to do something like put de-icing onto a drone, there’s a whole lot of experience that can be brought to bear with a very good chance of first time success.
A final question. Looking at the above do we think that the application of rigorous ‘FAA like’ processes or standards like ARP 4761, ARP 4754 and DO-178 would really improve matters?
Hmmm… maybe not a lot.
1. As a historical note while the F-14 program had the first USN aircraft system safety program (it was a small scale contractor in house effort) it was actually the F/A-18 which had the first customer mandated and funded system safety program per MIL-STD-882. USAF drone programs have not had formal system safety programs, as far as I’m aware.
For those of you interested in such things, there’s an interesting thread running over on the Safety Critical Mail List at Bielefeld on software failure. Sparked off by Peter Ladkin’s post over on Abnormal Distribution on the same subject. Whether software can be said to fail and whether you can use the term reliability to describe it is one of those strange attractors about which the list tends to orbit. An interesting discussion, although at times I did think we were playing a variant of Wittgenstein’s definition game.
And my opinion? Glad you asked.
Yes of course software fails. That it’s failure is not the same as the pseudo-random failure that we posit to hardware components is neither here nor there. Continue Reading…
Safety when expressed quantitatively as the probability of a loss over some unit of exposure, is in effect a proportional rate. This is useful as we can compare the performance of different systems or operations when one has of operating hours, and potentially lots of accidents while another has only a few operating hours and therefore fewer accidents. Continue Reading…
I’ll give you a hint it’s not pretty
Current Australian rail and workplace safety legislation requires that safety risks be either eliminated, or if that’s not possible be reduced, ‘so far as is reasonably practical’. The intent is to ensure that all reasonable practicable precautions are in place, not to achieve some target level of risk.
There are two elements to what is ‘reasonably practicable’. A duty-holder must first consider what can be done – that is, what is possible in the circumstances for ensuring health and safety. They must then consider whether it is reasonable, in the circumstances to do all that is possible. This means that what can be done should be done unless it is reasonable in the circumstances for the duty-holder to do something less.
This is a real and intractable problem for standards that determine the degree of effort applied to treat a hazard using an initial assessment of risk (1). Nor can the legislation be put aside through appeals to such formalisms as the ALARP principle, or the invocation of a standard such as AS 61508 (2). In essence if it is practical to do something, regardless of the degree of risk, then something should be done. Continue Reading…
A short (and possibly evil) treatise on SILs from our guest blogger
May I introduce myself? The name’s Screwtape, some of you might have heard of me from that short and nasty book by C.S. Lewis. All lies of course, and I would know, about lies that is… baboom tish! Anyway the world has moved on and I’m sure that you’d be completely unsurprised to hear that I’ve branched out into software consulting now. I do find the software industry one that is oh so over-ripe for the plucking of immortal souls, ah but I digress. Your good host has asked me here today to render a few words on the question of risk based safety integrity levels and how to turn such pesky ideals, akin in many ways to those other notions of christian virtue, to your own ends. Continue Reading…
Practical risk management, or why I love living in Australia
We’re into the ninth day of closed beaches here with two large great whites spotted ‘patrolling our shores’, whatever that means. Of course in Australia closed doesn’t actually mean the beaches are padlocked, not yet anyway. We just put a sign up and people can make their own minds up as to whether they wish to run the risk of being bitten. In my books a sensible approach to the issue, one that balances societal responsibility with personal freedom. I mean it’s not like they’re as dangerous as bicycles Continue Reading…
I was cleaning out my (metaphorical) sock drawer and came across this rough guide to the workings of the Australian Defence standard on software safety DEF(AUST) 5679. The guide was written around 2006 for Issue 1 of the standard, although many of the issues it discussed persisted into Issue 2, which hit the streets in 2008.
DEF (AUST) 5679 is an interesting standard, one can see that the authors, Tony Cant amongst them, put a lot of thought into the methodology behind the standard, unfortunately it’s suffered from a failure to achieve large scale adoption and usage.
So here’s my thoughts at the time on how to actually use the standard to best advantage, I also threw in some concepts on how to deal with xOTS components within the DEF (AUST) 5679 framework.
Or how do we measure the unknown?
The problem is that as our understanding and control of known risks increases, the remaining risk in any system become increasingly dominated by the ‘unknown‘. The higher the integrity of our systems the more uncertainty we have over the unknown and unknowable residual risk. What we need is a way to measure, express and reason about such deep uncertainty, and I don’t mean tools like Pascalian calculus or Bayesian prior belief structures, but a way to measure and judge ontological uncertainty.
Even if we can’t measure ontological uncertainty directly perhaps there are indirect measures? Perhaps there’s a way to infer something from the platonic shadow that such uncertainty casts on the wall, so to speak. Nassim Taleb would say no, the unknowability of such events is the central thesis of his Ludic Fallacy after all. But I still think it’s worthwhile exploring, because while he might be right, he may also be wrong.
*With apologies to Nassim Taleb.
Well if news from the G20 is anything to go by we may be on the verge of a seismic shift in how the challenge of climate change is treated. Our Prime Ministers denial notwithstanding :)
A report issued by the US Chemical Safety Board on Monday entitled “Regulatory Report: Chevron Richmond Refinery Pipe Rupture and Fire,” calls on California to make changes to the way it manages process safety.
The report is worth a read as it looks at various regulatory regimes in a fairly balanced fashion. A strong independent competent regulator is seen as a key factor for success by the reports authors, regardless of the regulatory mechanisms. I don’t however think the evidence is as strong as the report makes out that safety case/goal based safety regimes perform ‘all that better’ than other regulatory regimes. Would have also been nice if they’d compared and contrasted against other industries, like aviation.
A quick report from sunny Manchester, where I’m attending the IET’s annual combined conference on system safety and cyber security. Day one of the conference proper and I got to be lead off with the first keynote. I was thinking about getting everyone to do some Tai Chii to limber up (maybe next year). Thanks once again to Dr Carl Sandom for inviting me over, it was a pleasure. I just hope the audience felt the same way. :)
Interesting article on old school rail safety and lessons for the modern nuclear industry. As a somewhat ironic addendum the early nuclear industry safety studies also overlooked the risks posed by large inventories of fuel rods on site, the then assumption being that they’d be shipped off to a reprocessing facility as soon as possible, it’s hard to predict the future. :)
And in news just to hand, the first Ebola case is reported in the US. It’ll be very interesting to see what happens next, and how much transmission rate is driven by cultural and socio-economic effects…
I realise that you are not directly responsible for the repeal of the carbon tax by the current government, and I also realise that we the voting public need to man up and shoulder the responsibility for the government and their actions. I even appreciate that if you did wish to retain the carbon tax as a green surcharge, the current government would undoubtedly act to force your hand.
But really, I have to draw the line at your latest correspondence. Simply stamping the latest bill with “SAVINGS FROM REMOVING THE CARBON TAX” scarcely does the benefits of this legislative windfall justice. You have, I fear, entirely undersold the comprehensive social, moral and economic benefits that accrue through the return of this saving to your customers. I submit therefore for your corporate attention some alternatives slogans:
- “Savings from removing the carbon tax…you’ll pay for it later”
- “Savings from removing the carbon tax…buy a bigger air conditioner, you’ll need it”
- “Savings from removing the carbon tax…we also have a unique coal seam investment opportunity”
- “Savings from removing the carbon tax, invest in climate change!”
- “Savings from removing the carbon tax, look up the word ‘venal’, yep that’s you”
- “Savings from removing the carbon tax, because a bigger flatscreen TV is worth your children’s future”
- “Savings from removing the carbon tax, disinvesting in the future”
So be brave and take advantage of this singular opportunity to fully invest your corporate reputation in the truly wonderful outcomes of this prescient and clear sighted decision by our federal government.
An interesting post by Mike Thicke over at Cloud Chamber on the potential use of prediction markets to predict the location of MH370. Prediction markets integrate ‘diffused’ knowledge using a market mechanism to derive a predicted likelihood, essentially market prices are assigned to various outcomes and are treated as analogs of their likelihood. Market trading then established what the market ‘thinks’ is the value of each outcome. The technique has a long and colourful history, but it does seem to work. As an aside prediction markets are still predicting a No vote in the upcoming referendum on Scottish Independence despite recent polls to the contrary.
Returning to the MH370 saga, if the ATSB is not intending to use a Bayesian search plan then one could in principle crowd source the effort through such a prediction market. One could run the market in a dynamic fashion with the market prices updating as new information comes in from the ongoing search. Any investors out there?
Enshrined in Australia’s current workplace health and safety legislation is the principle of ‘So Far As Is Reasonably Practical’. In essence SFAIRP requires you to eliminate or to reduce risk to a negligible level as is (surprise) reasonably practical. While there’s been a lot of commentary on the increased requirements for diligence (read industry moaning and groaning) there’s been little or no consideration of what is the ‘theory of risk’ that backs this legislative principle and how it shapes the current legislation, let alone whether for good or ill. So I thought I’d take a stab at it. :) Continue Reading…
Finding MH370 is going to be a bitch
The aircraft has gone down in an area which is the undersea equivalent of the eastern slopes of the Rockies, well before anyone mapped them. Add to that a search area of thousands of square kilometres in about an isolated a spot as you can imagine, a search zone interpolated from satellite pings and you can see that it’s going to be tough.
Just received a text from my gas and electricity supplier. Good news! My gas and electricity bills will come down by about 4 and 8% respectively due to the repeal of the carbon tax in Australia. Of course we had to doom the planetary ecosystem and condemn our children to runaway climate change but hey, think of the $550 we get back per year. And, how can it get any better, now we’re also seen as a nation of environmental wreckers. I think I’ll go an invest the money in that AGL Hunter coal seam gas project, y’know thinking global, acting local. Thanks Prime Minister Abbott, thanks!
The limits of rational-legal authority
One of the underlying and unquestioned aspects of modern western society is that the power of the state is derived from a rational-legal authority, that is in the Weberian sense of a purposive or instrumental rationality in pursuing some end. But what if it isn’t? What if the decisions of the state are more based on belief in how people ought to behave and how things ought to be rather than reality? What, in other words, if the lunatics really are running the asylum?
Way, way back in 2011 NASA published the first volume of their planned two volume epic on system safety titled strangely enough “NASA System Safety Handbook Volume 1, System Safety Framework and Concepts for Implementation“, catchy eh?
As I was asked a question on risk homeostasis at the course I’m teaching, here without further ado is John Adam’s tour de force on The failure of seat belt legislation. Collectively, the group of countries that had not passed seat belt laws experienced a greater decrease than the group that had passed laws. Now John doesn’t directly draw the conclusion, but I will, that the seat belt laws kill more people than they save.
And it gets worse, in 1989 the British Government made seat belt wearing compulsory for children under 14 years old in the rear seats of cars, the result? In the year after there was an increase of almost 10% in the numbers of children killed in rear seats, and of almost 12% in the numbers injured (both above background increases). If not enacted there would be young adults now walking around today enjoying their lives, but of course the legislation was passed and we have to live with the consequences.
Now I could forgive the well intentioned who passed these laws, if when it became apparent that they were having a completely contrary effect they repealed them. But what I can’t forgive is the blind persistence, in practices that clearly kill more than they save. What can we make of this depraved indifference, other than people and organisations will sacrifice almost anything and anyone rather than admit they’re wrong?
Process is no substitute for paying attention
As Weick has pointed out, to manage the unexpected we need to be reliably mindful, not reliably mindless. Obvious as that truism may be, those who invest heavily in plans, procedures, process and policy also end up perpetuating and reinforcing a whole raft of expectations, and thus investing in an organisational culture of mindlessness rather than mindfulness.
John Adams has an interesting take on the bureaucratic approach to risk management in his post reducing zero risk.
The problem is that each decision to further reduce an already acceptably low risk is always defended as being ‘cheap’, but when you add up the increments it’s the death of a thousand cuts, because no one ever considers the aggregated opportunity cost of course.
This remorseless slide of our public and private institutions into a hysteria of risk aversion seems to me to be be due to an inherent societal psychosis that nations sharing the english common law tradition are prone to. At best we end up with pointless safety theatre, at worst we end up bankrupting our culture.
The above info graphic courtesy of Jeff Masters Wunderblog blog says it all, 6 out of the 13 most destructive superstorms have occurred after 1998.
Interestingly the study is circa 2011 but I’ve seen no reflection in Australia on the uncomfortable fact that the study found, i.e that all we are doing with such schemes is shifting the death rate to an older cohort. Of course all the adults can sit back and congratulate themselves on a job well done, except it simply doesn’t work, and worse yet sucks resources and attention away from searching for more effective remedies.
In essence we’ve done nothing as a society to address teenage driving related deaths, safety theatre of the worst sort…
And not quite as simple as you think…
The testimony of Michael Barr, in the recent Oklahoma Toyota court case highlighted problems with the design of Toyota’s watchdog timer for their Camry ETCS-i throttle control system, amongst other things, which got me thinking about the pervasive role that watchdogs play in safety critical systems.
Why risk communication is tricky…
An interesting post by Ross Anderson on the problems of risk communication, in the wake of the savage storm that the UK has just experienced. Doubly interesting to compare the UK’s disaster communication during this storm to that of the NSW governments during our recent bushfires.
Or ‘On the breakdown of Bayesian techniques in the presence of knowledge singularities’
One of the abiding problems of safety critical ‘first of’ systems is that you face, as David Collingridge observed, a double bind dilemma:
- Initially an information problem because ‘real’ safety issues (hazards) and their risk cannot be easily identified or quantified until the system is deployed, but
- By the time the system is deployed you now face a power (inertia) problem, that is control or change is difficult once the system is deployed or delivered. Eliminating a hazard is usually very difficult and we can only mitigate them in some fashion.
Why saying the wrong thing at the wrong time is sometimes necessary
The Green’s senator Adam Bandt has kicked up a storm of controversy amongst the running dogs of the press after pointing out in this Guardian article that climate change means a greater frequency of bad heat waves which means in turn a greater frequency of bad bush fires. Read the article if you have a moment, I liked his invoking the shade of Ronald Reagan to judge the current government especially. Continue Reading…
The consensus project: Yes there is one on climate change
Despite what you may see in the media, yes there is an overwhelming consensus on climate change (it’s happening), what the cause is (our use of fossil fuels) and what we can do about it (a whole bunch of things with today’s tech). Here’s the link to the projects web page, neat info graphics…enjoy.
Oh and if like me you live in Australia I’d start getting used to the increasing frequency of extreme weather events and bush-fires, the only uncertainty left is whether we can put the brakes on in time to prevent a complete catastrophe.
Taboo transactions and the safety dilemma Again my thanks goes to Ross Anderson over on the Light Blue Touchpaper blog for the reference, this time to a paper by Alan Fiske an anthropologist and Philip Tetlock a social psychologist, on what they terms taboo transactions. What they point out is that there are domains of sharing in society which each work on different rules; communal, versus reciprocal obligations for example, or authority versus market. And within each domain we socially ‘transact’ trade-offs between equivalent social goods.
I was reading a post by Ross Anderson on his dismal experiences at John Lewis, and ran across the term security theatre, I’ve actually heard the term, before, it was orignally coined by Bruce Schneier, but this time it got me thinking about how much activity in the safety field is really nothing more than theatrical devices that give the appearance of achieving safety, but not the reality. From zero harm initiatives to hi-vis vests, from the stylised playbook of public consultation to the use of safety integrity levels that purport to show a system is safe. How much of this adds any real value?
Worse yet, and as with security theatre, an entire industry has grown up around this culture of risk, which in reality amounts to a culture of risk aversion in western society. As I see it risk as a cultural concept is like fire, a dangerous tool and an even more terrible master.
In 1996 the European Space Agency lost their brand new Ariane 5 launcher on it’s first flight. Here’s a recently updated annotated version of that report. I’d also note that the software that faulted was written using Ada a ‘strongly typed’ language, which does point to a few small problems with the use of such languages.
An articulated guess beats an unspoken assumption
A point that Fred Brooks makes in his recent work the Design of Design is that it’s wiser to explicitly make specific assumptions, even if that entails guessing the values, rather than leave the assumption un-stated and vague because ‘we just don’t know’. Brooks notes that while specific and explicit assumptions may be questioned, implicit and vague ones definitely won’t be. If a critical aspect of your design rests upon such fuzzy unarticulated assumptions, then the results can be dire.
From Les Hatton, here’s how, in four easy steps:
- Insist on using R = F x C in your assessment. This will panic HR (People go into HR to avoid nasty things like multiplication.)
- Put “end of universe” as risk number 1 (Rationale: R = F x C. Since the end of the universe has an infinite consequence C, then no matter how small the frequency F, the Risk is also infinite)
- Ignore all other risks as insignificant
- Wait for call from HR…
A humorous note, amongst many, in an excellent presentation on the fell effect that bureaucracies can have upon the development of safety critical systems. I would add my own small corollary that when you see warning notes on microwaves and hot water services the risk assessment lunatics have taken over the asylum…
On the subject of near misses…
Presumably the use of the crew cab as an escape pod was not actually high on the list of design goals for the 4000 and 4100 class locomotives, and thankfully the locomotives involved in the recent derailment at Ambrose were unmanned.
Well it sounded reasonable…
One of the things that’s concerned me for a while is the potentially malign narrative power of a published safety case.
Why sometimes simpler is better in safety engineering.
I was thinking about how the dubious concept of ‘safety integrity levels’ continues to persist in spite of protracted criticism. in essence if the flaws in the concept of SILs are so obvious why they still persist?
Another in the occasional series of posts on systems engineering, here’s a guide to evaluating technical risk, based on the degree of technical maturity of the solution.
The idea of using technical maturity as an analog for technical risk first appears (to my knowledge) in the 1983 Systems Engineering Management Guide produced by the Defense Systems Management College (1).
Using such analogs is not unusual in engineering, you usually find it practiced where measuring the actual parameter is too difficult. For example architects use floor area as an analog for cost during concept design because collecting detailed cost data at that point is not really feasible.
While you can introduce other analogs, such as complexity and interdependence, as a first pass assessment of inherent feasibility I’ve found that the basic question of ‘have we done this before’ to be a powerful one.
1. The 1983 edition is IMO the best of all the Guides with subsequent editions of the DSMC guide rather more ‘theoretic’ and not as useful, possibly because the 1983 edition was produced by Lockheed Martin Missile and Space Companies Systems Engineering Directorate. Or to put it another way it was produced by people who wrote about how they actually did their job… :)