It is better to be vaguely right than exactly wrong
How an invention that flew on the SR-71 could help commercial aviation today
In a previous post on unusual attitude I talked about the use of pitch ladders as a means of providing greater attensity to aircraft attitude as well as a better indication of what the aircraft is doing, having entered into it. There are, of course, still disadvantages to this because such data in a commercial aircraft is usually presented ‘eyes down’, and in high stress, high workload situations it can be difficult to maintain an instrument scan pattern. There is however an alternative, and one that has a number of allied advantages. Continue Reading…
Unreliable airspeed events pose a significant challenge (and safety risk) because such situations throw onto aircrew the most difficult (and error prone) of human cognitive tasks, that of ‘understanding’ a novel situation. This results in a double whammy for unreliable airspeed incidents. That is, the likelihood of an error in ‘understanding’ is far greater than any other error type, and having made that sort of error it’s highly likely that it’s going to be a fatal one. Continue Reading…
In case you’re wondering what’s going on dear reader, human factors can be a bit dry, and the occasional poster style blog posts you may have noted are my attempt to hydrate the subject a little. The continuing series can be found on the page imaginatively titled Human error in pictures, and who knows, someone may find it useful…
The chess board is the world, the pieces are the phenomena of the universe, the rules of the game are what we call the laws of Nature. The player on the other side is hidden from us. We know that his play is always fair, just and patient. But we also know, to our cost, that he never overlooks a mistake, or makes the smallest allowance for ignorance.
Safety, when expressed quantitatively as the probability of a loss over some unit of exposure, is in effect a proportional rate. This is useful as we can compare the performance of different systems or operations when one has lots of operating hours, and potentially lots of accidents, while another has only a few operating hours and therefore fewer accidents. Continue Reading…
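As a minimal sketch of that normalisation, the snippet below divides losses by exposure and scales the result to a common base. The function name `loss_rate`, the 100,000 hour base and both sets of fleet figures are assumptions invented for illustration; none of them come from the post.

```python
def loss_rate(losses: int, exposure_hours: float, per_hours: float = 100_000.0) -> float:
    """Return the number of losses per `per_hours` hours of exposure."""
    if exposure_hours <= 0:
        raise ValueError("exposure_hours must be positive")
    return losses / exposure_hours * per_hours


# Hypothetical figures, made up purely to illustrate the comparison.
fleet_a = loss_rate(losses=12, exposure_hours=4_000_000)  # lots of hours, lots of accidents
fleet_b = loss_rate(losses=2, exposure_hours=250_000)     # few hours, few accidents

print(f"Fleet A: {fleet_a:.2f} losses per 100,000 hours")
print(f"Fleet B: {fleet_b:.2f} losses per 100,000 hours")
```

With these made-up numbers the fleet with fewer accidents in absolute terms has the higher rate, which is the sort of comparison a proportional rate makes possible.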
What the future of high assurance may look like, DARPA’s HACMS, open source and formal from the ground up.
Originally posted on A Critical Systems Blog:
Some of the work I lead at Galois was highlighted in the initial story on 60 Minutes last night, a spot interviewing Dan Kaufman at DARPA. I’m Galois’ principal investigator for the HACMS program, focused on building more reliable software for automobiles and aircraft and other embedded systems. The piece provides a nice overview for the general public on why software security matters and what DARPA is doing about it; HACMS is one piece of that story.
I was busy getting married when filming was scheduled, but two of my colleagues (Dylan McNamee and Pat Hickey) appear in brief cameos in the segment (don’t blink!). Good work, folks! I’m proud of my team and the work we’ve accomplished so far.
You can see more details about how we have been building better programming languages for embedded systems and using them to build unpiloted air vehicle software here.
A while ago, while I was working on a project that would have been based (in part) in Queensland I was asked to look at the implications of the Registered Professional Engineers Queensland act for the project, and in particular for software development. For those not familiar, the Act provides for the registration of professional engineers to practise in Queensland. If you’re not registered you can’t practise unless you’re supervised by a registered engineer. Upon registering you then become liable to a statutory Board of Professional Engineers for your professional conduct. Oh yes, and practising without coverage is a crime.
While the act is oriented squarely at the provision of professional services, don’t presume that it is solely the concern of consultancies. Continue Reading…
The important thing is to stop lying to yourself. A man who lies to himself, and believes his own lies, becomes unable to recognise the truth, either in himself or in anyone else.
I’ll give you a hint: it’s not pretty
Current Australian rail and workplace safety legislation requires that safety risks be either eliminated, or if that’s not possible be reduced, ‘so far as is reasonably practicable’. The intent is to ensure that all reasonably practicable precautions are in place, not to achieve some target level of risk.
There are two elements to what is ‘reasonably practicable’. A duty-holder must first consider what can be done – that is, what is possible in the circumstances for ensuring health and safety. They must then consider whether it is reasonable, in the circumstances to do all that is possible. This means that what can be done should be done unless it is reasonable in the circumstances for the duty-holder to do something less.
This is a real and intractable problem for standards that determine the degree of effort applied to treat a hazard using an initial assessment of risk (1). Nor can the legislation be put aside through appeals to such formalisms as the ALARP principle, or the invocation of a standard such as AS 61508 (2). In essence if you can do something, regardless of the degree of risk, then something should be done. Continue Reading…
A short (and possibly evil) treatise on SILs from our guest blogger
May I introduce myself? The name’s Screwtape, some of you might have heard of me from that short and nasty book by C.S. Lewis. All lies of course, and I would know, about lies that is… baboom tish! Anyway the world has moved on and I’m sure that you’d be completely unsurprised to hear that I’ve branched out into software consulting now. I do find the software industry one that is oh so over-ripe for the plucking of immortal souls, ah but I digress. Your good host has asked me here today to render a few words on the question of risk based safety integrity levels and how to turn such pesky ideals, akin in many ways to those other notions of Christian virtue, to your own ends. Continue Reading…
Stall warning and Alternate law
This post is part of the Airbus aircraft family and system safety thread.
According to an investigator from Indonesia’s National Transportation Safety Committee (NTSC) several alarms, including the stall warning, could be heard going off on the Cockpit Voice Recorder’s tape.
Now why is that so significant?
Aviation is in itself not inherently dangerous. But to an even greater degree than the sea, it is terribly unforgiving of any carelessness, incapacity or neglect.
The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.
The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 32,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 12 sold-out performances for that many people to see it.
This post is part of the Airbus aircraft family and system safety thread.
While there’s often a lot of discussion about short term response of aircraft to control inputs, in practice it’s often the long term response of the aircraft state vector at constant thrust and neutral control inputs that’s just as important to flight control system designers. In the case of Airbus the selection by the designers of a modified C* feedback loop (1) for primary pitch axis control law (Airbus 1998) in flight has led to what you’d call interesting consequences. Continue Reading…
Practical risk management, or why I love living in Australia
We’re into the ninth day of closed beaches here with two large great whites spotted ‘patrolling our shores’, whatever that means. Of course in Australia closed doesn’t actually mean the beaches are padlocked, not yet anyway. We just put a sign up and people can make their own minds up as to whether they wish to run the risk of being bitten. In my books a sensible approach to the issue, one that balances societal responsibility with personal freedom. I mean it’s not like they’re as dangerous as bicycles Continue Reading…
A short digression on who vs whom, neatly illustrating why writing requirements in natural English can be so damn difficult… I also love the idea of spider fastballs :)
I was cleaning out my (metaphorical) sock drawer and came across this rough guide to the workings of the Australian Defence standard on software safety DEF(AUST) 5679. The guide was written around 2006 for Issue 1 of the standard, although many of the issues it discussed persisted into Issue 2, which hit the streets in 2008.
DEF (AUST) 5679 is an interesting standard. One can see that the authors, Tony Cant amongst them, put a lot of thought into the methodology behind the standard; unfortunately it’s suffered from a failure to achieve large scale adoption and usage.
So here are my thoughts at the time on how to actually use the standard to best advantage. I also threw in some concepts on how to deal with xOTS components within the DEF (AUST) 5679 framework.
So what did happen?
This post is part of the Airbus aircraft family and system safety thread.
While the media ‘knows’ that the aircraft climbed steeply before rapidly descending, we should remember that this supposition relies on the self reported altitude and speed of the aircraft. So we should be cautious about presuming that what we see on a radar screen is actually what happened to the aircraft. There are of course also disturbing similarities to the circumstances in which Air France AF447 was lost, yet at this moment all they are are similarities. One thing’s for sure though, there’ll be little sleep in Toulouse until the FDRs are recovered.
Or how do we measure the unknown?
The problem is that as our understanding and control of known risks increases, the remaining risk in any system becomes increasingly dominated by the ‘unknown’. The higher the integrity of our systems the more uncertainty we have over the unknown and unknowable residual risk. What we need is a way to measure, express and reason about such deep uncertainty, and I don’t mean tools like Pascalian calculus or Bayesian prior belief structures, but a way to measure and judge ontological uncertainty.
Even if we can’t measure ontological uncertainty directly perhaps there are indirect measures? Perhaps there’s a way to infer something from the platonic shadow that such uncertainty casts on the wall, so to speak. Nassim Taleb would say no, the unknowability of such events is the central thesis of his Ludic Fallacy after all. But I still think it’s worthwhile exploring, because while he might be right, he may also be wrong.
*With apologies to Nassim Taleb.
The Dreamliner and the Network
Big complicated technologies are rarely (perhaps never) developed by one organisation. Instead they’re a patchwork quilt of individual systems which are developed by domain experts, with the whole being stitched together by a single authority/agency. This practice is nothing new, it’s been around since the earliest days of the cybernetic era, it’s a classic tool that organisations and engineers use to deal with industrial scale design tasks (1). But what is different is that we no longer design systems, and systems of systems, as loose federations of entities. We now think of and design our systems as networks, and thus our systems of systems have become ‘networks of networks’ that exhibit much greater degrees of interdependence.
The NTSB have released their final report on the Boeing 787 Dreamliner Li-Ion battery fires. The report makes interesting reading, but for me the most telling point is summarised in conclusion seven, which I quote below.
Conclusion 7. Boeing’s electrical power system safety assessment did not consider the most severe effects of a cell internal short circuit and include requirements to mitigate related risks, and the review of the assessment by Boeing authorized representatives and Federal Aviation Administration certification engineers did not reveal this deficiency.
NTSB/AIR-14/01 (p78)
In other words Boeing got themselves into a position with their safety assessment where their ‘assumed worst case’ was much less severe than the reality. This failure to imagine the worst ensured that when they aggressively weight optimised the battery design instead of thermally optimising it, the risks they were actually running were unwittingly so much higher.
The first principle is that you must not fool yourself, and that you are the easiest person to fool
Richard P. Feynman
I’m also thinking that the behaviour of Boeing is consistent with what McDermid et al. call probative blindness. That is, the safety activities that were conducted were intended to comply with regulatory requirements rather than actually determine what hazards existed and their risk.
… there is a high level of corporate confidence in the safety of the [Nimrod aircraft]. However, the lack of structured evidence to support this confidence clearly requires rectifying, in order to meet forthcoming legislation and to achieve compliance.
Nimrod Safety Management Plan 2002 (1)
As the quote from the Nimrod program deftly illustrates, often (2) safety analyses are conducted simply to confirm what we already ‘know’, that the system is safe; non-probative if you will. In these circumstances the objective is compliance with the regulations rather than to generate evidence that our system is unsafe. In such circumstances doing more or better safety analysis is unlikely to prevent an accident because the evidence will not cause beliefs to change. Belief, it seems, is a powerful thing.
The Boeing battery saga also illustrates how much regulators like the FAA actually rely on the technical competence of those being regulated, and how fragile that regulatory relationship is when it comes to dealing with the safety of emerging technologies.
1. As quoted in Probative Blindness: How Safety Activity can fail to Update Beliefs about Safety, A J Rae*, J A McDermid, R D Alexander, M Nicholson (IET SSCS Conference 2014).
2. Actually in aerospace I’d assert that it’s normal practice to carry out hazard analyses simply to comply with a regulatory requirement. As far as the organisation commissioning them is concerned the results are going to tell them what they know already, that the system is safe.
Here’s a short tutorial I put together (in a bit of a rush) about the ‘mechanics’ of producing compliance findings as part of the ADF’s Airworthiness Regime. Hopefully this will be of assistance to anyone faced with the task of making compliance findings, managing the compliance finding process or dealing with the ADF airworthiness certification ‘beast’.
The tutorial is a mix of how to think about and judge evidence, drawing upon legal principles, and how to use practical argumentation models to structure the finding. No Dempster Shafer logic yet, perhaps in the next tutorial.
Anyway, hope you enjoy it. :)
A safety engineer is someone who builds castles in the air and an operator is someone who goes and lives in them. But nature is the one who collects the rent…
Well if news from the G20 is anything to go by we may be on the verge of a seismic shift in how the challenge of climate change is treated. Our Prime Minister’s denial notwithstanding :)
A report issued by the US Chemical Safety Board on Monday entitled “Regulatory Report: Chevron Richmond Refinery Pipe Rupture and Fire,” calls on California to make changes to the way it manages process safety.
The report is worth a read as it looks at various regulatory regimes in a fairly balanced fashion. A strong independent competent regulator is seen as a key factor for success by the report’s authors, regardless of the regulatory mechanisms. I don’t however think the evidence is as strong as the report makes out that safety case/goal based safety regimes perform ‘all that better’ than other regulatory regimes. Would have also been nice if they’d compared and contrasted against other industries, like aviation.
So I’ve been invited to give a talk on risk at the conference dinner. Should be interesting.
When is an interlock not an interlock?
I was working on an interface problem the other day. The problem related to how to judge when a payload (attached to a carrier bus) had left the parent (like the Huygens lander leaving the Cassini spacecraft above). Now I could use what’s called the ‘interlock interface’ which is a discrete ‘loop back’ that runs through the bus to payload connector then turns around and heads back into the bus again. The interlock interface is there to provide a means for the carrier’s avionics to determine if the payload is electrically mated to the bus. So should I use this as an indication that the payload has left the carrier bus as well? Well maybe not.
A quick report from sunny Manchester, where I’m attending the IET’s annual combined conference on system safety and cyber security. Day one of the conference proper and I got to lead off with the first keynote. I was thinking about getting everyone to do some Tai Chi to limber up (maybe next year). Thanks once again to Dr Carl Sandom for inviting me over, it was a pleasure. I just hope the audience felt the same way. :)
An interesting article in Forbes on human error in a very unforgiving environment, i.e. treating Ebola patients, and an excellent use of basic statistics to prove that cumulative risk tends to do just that, accumulate. As the number of patients being treated in the west is pretty low at the moment it also gives a good indication of just how infectious Ebola is. One might also infer that the western medical establishment is not quite so smart as it thought it was, at least when it comes to treating the big E safely.
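The way cumulative risk accumulates can be sketched with a few lines of arithmetic. This is a minimal illustration and not necessarily the calculation used in the Forbes article: it assumes every exposure is independent and carries the same error probability, and the 1% per-exposure figure is a hypothetical value chosen only to show the shape of the result.

```python
def cumulative_risk(p_per_exposure: float, n_exposures: int) -> float:
    """Probability of at least one failure across n independent exposures."""
    if not 0.0 <= p_per_exposure <= 1.0:
        raise ValueError("p_per_exposure must lie between 0 and 1")
    if n_exposures < 0:
        raise ValueError("n_exposures must not be negative")
    # Chance of surviving every exposure is (1 - p)^n; the risk is its complement.
    return 1.0 - (1.0 - p_per_exposure) ** n_exposures


# Hypothetical 1% chance of a protocol slip per patient contact.
P_SLIP = 0.01
for contacts in (1, 10, 50, 100):
    print(f"{contacts:>3} contacts: {cumulative_risk(P_SLIP, contacts):.1%}")
```

Under those assumptions a risk that looks small for any single contact grows steadily with the number of contacts, even though nothing about each individual contact has changed.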
Of course the moment of international zen in the whole story had to be the comment by the head of the CDC, Dr Frieden, that, and I quote, “clearly there was a breach in protocol”, a perfect example of affirming the consequent. As James Reason pointed out years ago there are two ways of dealing with human error, so I guess we know where the head of the CDC stands on that question. :)
Well after a week away teaching system safety to Navy in the depths of Victoria I’m off again! Current destination, the IET’s System Safety and Cyber Security conference in Manchester.
Just to mention that if you’re coming to the workshop day I’ll be running one on safety cases, so if you’re interested drop by and pull up a chair. :)
TCAS, emergent properties and risk trade-offs
There’s been some comment from various regulators regarding the use of Traffic Collision Avoidance System (TCAS) on the ground. Experience shows that TCAS is sometimes turned on and off at the same time as the Mode S transponder. Eurocontrol doesn’t like it and is quite explicit about their dislike, ‘do not use it while taxiing’ they say, likewise the FAA also states that you should ‘minimise use on ground’. There are legitimate reasons for this dislike, having too many TCAS transponders operating within a specific area can degrade system performance as well as potentially interfering with airport ground radars. And as the FAA point out, operating with the ADS-B transponder on will also ensure that the aircraft is visible to ATC and other ADS-B (in) equipped aircraft (1). Which leaves us with the question, why are aircrew using TCAS on the ground? Is it because it’s just easy enough to turn on at the push back? Or is there another reason?
Interesting article on old school rail safety and lessons for the modern nuclear industry. As a somewhat ironic addendum the early nuclear industry safety studies also overlooked the risks posed by large inventories of fuel rods on site, the then assumption being that they’d be shipped off to a reprocessing facility as soon as possible, it’s hard to predict the future. :)
And in news just to hand, the first Ebola case is reported in the US. It’ll be very interesting to see what happens next, and how much transmission rate is driven by cultural and socio-economic effects…
In case anyone missed it the Ebola outbreak in Africa is now into the ‘explosive’ phase of the classic logistic growth curve, see this article from New Scientist for more details. For a small world perspective on pandemics see my earlier post on the H1N1 outbreak.
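For readers who haven’t met the logistic curve, here is a minimal sketch of its shape. The parameter values (`CAPACITY`, `RATE`, `MIDPOINT`) are arbitrary assumptions picked for illustration and are not fitted to any Ebola data; the point is only that the number of new cases per time step climbs steeply towards the middle of the curve before tailing off.

```python
import math


def logistic(t: float, capacity: float, rate: float, midpoint: float) -> float:
    """Cumulative cases at time t under a simple logistic growth model."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


# Arbitrary illustrative parameters, not estimates for any real outbreak.
CAPACITY, RATE, MIDPOINT = 1000.0, 0.5, 20.0

# Cumulative cases at each time step, then the new cases between steps.
cases = [logistic(t, CAPACITY, RATE, MIDPOINT) for t in range(41)]
new_cases = [later - earlier for earlier, later in zip(cases, cases[1:])]

# The step with the most new cases sits next to the curve's midpoint.
peak_step = max(range(len(new_cases)), key=new_cases.__getitem__)
print(f"Cases at midpoint: {cases[int(MIDPOINT)]:.0f}")
print(f"Most new cases in the step starting at t={peak_step}")
```

Early on the curve is hard to tell apart from slow growth, which is part of why the steep middle section tends to arrive as a surprise.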
Here in the west we get all the rhetoric about Islamic State as an existential threat but little to nothing about the big E, even though this epidemic will undoubtedly kill more people than that bunch of crazies ever will. Ebola doesn’t hate us for who we are, but it’ll damn well kill a lot of people regardless.
Another worrying thought is that the more cases, the more generations of the disease clock over and the more chance there is for a much worse variant to emerge that’s got global legs. We’ve been gruesomely lucky to date that Ebola is so nasty, because it tends to burn out before going too far, but that can change very quickly. This is a small world, and what happens inside a village in West Africa actually matters to people in London, Paris, Sydney or Moscow. Were I PM that’s where I’d be sending assistance, not back into the cauldron of the Middle East…