I’ve just finished reading an interesting post by Andrew Rae on the missing aspects of engineering education (Mind the Feynman gap) which parallels my more specific concerns, and possibly unkinder comments, about the lack of professionalism in the software community.
Archives For Psychology
Human psychology and the role it plays in decision making under uncertainty.
As I was asked a question on risk homeostasis at the course I’m teaching, here without further ado is John Adams’ tour de force on The failure of seat belt legislation. Collectively, the group of countries that had not passed seat belt laws experienced a greater decrease in road deaths than the group that had passed laws. Now John doesn’t directly draw the conclusion, but I will, that the seat belt laws kill more people than they save.
And it gets worse. In 1989 the British Government made seat belt wearing compulsory for children under 14 years old in the rear seats of cars. The result? In the year after there was an increase of almost 10% in the numbers of children killed in rear seats, and of almost 12% in the numbers injured (both above background increases). Had the law not been enacted there would be young adults walking around today enjoying their lives, but of course the legislation was passed and we have to live with the consequences.
Now I could forgive the well intentioned who passed these laws, if when it became apparent that they were having a completely contrary effect they repealed them. But what I can’t forgive is the blind persistence in practices that clearly kill more than they save. What can we make of this depraved indifference, other than that people and organisations will sacrifice almost anything and anyone rather than admit they’re wrong?
Well I can’t believe I’m saying this but those happy clappers of the software development world, the proponents of Agile, Scrum and the like might (grits teeth), actually, have a point. At least when it comes to the development of novel software systems in circumstances of uncertainty, and possibly even for high assurance systems.
Mindfulness and paying attention to the wrong things
As I talked about in a previous post on the Deepwater Horizon disaster, I believe one of the underlying reasons, perhaps the reason, for Deepwater’s problems escalating into a catastrophe was the attentional blindness of management to the indicators of problems on the rig, and that this blindness was due in large part to a corporate focus on individual worker injury rates at the expense of thinking about those rare but catastrophic risks that James Reason calls organisational accidents. And, in a coincidence to end all coincidences, there was actually a high level management team visiting just prior to the disaster to congratulate the crew on their seven years of injury free operations.
So it was kind of interesting to read in James Reason’s latest work ‘A Life in Error’ his conclusion that the road to epic organisational accidents is paved with declining or low Lost Time Injury Frequency Rates (LTIFR). He goes on to give the following examples in support:
- Westray mining disaster (1992), Canada. 26 miners died, but the company had received an award for reducing the LTIFR.
- Moura mining disaster (1994), Queensland. 11 miners died. The company had halved its LTIFR in the four years preceding the accident.
- Longford gas plant explosion (1998), Victoria. Two died, eight injured. Safety was directed to reducing LTIFR rather than identifying and fixing the major hazards of un-repaired equipment.
- Texas City explosion (2005), Texas. The Independent Safety Review panel identified that BP relied on injury rates to evaluate safety performance.
As Reason concludes, the causes of accidents that result in a direct (and individual) injury are very different to those that result in what he calls an organisational accident, that is one that is both rare and truly catastrophic. Therefore data gathered on LTIFR tells you nothing about the likelihood of such a catastrophic event, and as it turns out can be quite misleading. My belief is that not only is such data misleading, its salience actively channelises management attention, thereby ensuring the organisation is effectively unable to see the indications of impending disaster.
So if you see an organisation whose operations can go catastrophically wrong, but all you hear from management is proud pronouncements as to how they’re reducing their lost time injury rate, then you might want to consider maintaining a safe, perhaps very safe, distance.
Reason’s A Life in Error is an excellent read by the way, I give it four omitted critical procedural steps out of five. :)
How do we give meaning to experience in the midst of crisis?
Instead people strive to create a view of it by establishing a common framework into which events can be fitted to make sense of the world, what Weick (1993) calls a process of sensemaking. And what is true for individuals is also true for the organisations they make up. In return people also use an organisation to make sense of what’s going on, especially in situations of uncertainty, ambiguity or contradiction.
Gregory (Scotland Yard detective): “Is there any other point to which you would wish to draw my attention?”
Holmes: “To the curious incident of the dog in the night-time.”
Gregory: “The dog did nothing in the night-time.”
Holmes: “That was the curious incident.”
What you pay attention to dictates what you’ll miss
The point that the great detective was making was that the absence of something was the evidence which the Scotland Yard detective had overlooked. Holmes of course using imagination and intuition did identify that this was in fact the vital clue. Such a plot device works marvelously well because almost all of us, like detective Gregory, fail to recognise that such an absence is actually ‘there’ in a sense, let alone that it’s important.
John Adams has an interesting take on the bureaucratic approach to risk management in his post reducing zero risk.
The problem is that each decision to further reduce an already acceptably low risk is always defended as being ‘cheap’, but when you add up the increments it’s the death of a thousand cuts, because no one ever considers the aggregated opportunity cost of course.
This remorseless slide of our public and private institutions into a hysteria of risk aversion seems to me to be due to an inherent societal psychosis that nations sharing the English common law tradition are prone to. At best we end up with pointless safety theatre, at worst we end up bankrupting our culture.
In a slight segue, I was reading Bruce Schneier’s blog on security and came across this post on the psychology behind fraud. Bruce points to this post on why, yes I know, ‘good people do bad things’. The explanation that researchers such as Ann Tenbrunsel of Notre Dame offer is that in the same way that we are boundedly rational in other aspects of decision making so too are we in our ethical decisions.
In particular, the way in which decision problems were framed seems to have a great impact upon how we make decisions. Basically if a problem was framed without an ethical dimension then decision makers were much less likely to consider that aspect.
In addition to framing effects, researchers studying collusion in fraud cases found that most people seem to act from an honest desire simply to help others, regardless of any attendant ethical issues.
What fascinates me is how closely such research parallels the work in system safety and human error. Clearly if management works within a frame based upon performance and efficiency, they are simply going to overlook the down side completely, and in a desire to be helpful everyone else ‘goes along for the ride’.
There is as I see it a concrete recommendation that comes out of this research that we can apply to safety; that fundamentally safety management systems need to be designed to take account of our weaknesses as boundedly rational actors.
One of the perennial issues in regulating the safety of technological systems is how prescriptively one should write the regulations. At one end of the spectrum is a rule based approach, where very specific norms are imposed and at least in theory there is little ambiguity in either their interpretation or application. At the other end you have performance standards, which are much more open-ended, allowing a regulator to make circumstance specific determinations as to whether the standard has been met.
Taboo transactions and the safety dilemma
Again my thanks go to Ross Anderson over on the Light Blue Touchpaper blog for the reference, this time to a paper by Alan Fiske, an anthropologist, and Philip Tetlock, a social psychologist, on what they term taboo transactions. What they point out is that there are domains of sharing in society which each work on different rules; communal versus reciprocal obligations for example, or authority versus market. And within each domain we socially ‘transact’ trade-offs between equivalent social goods.
I was reading a post by Ross Anderson on his dismal experiences at John Lewis, and ran across the term security theatre. I’ve actually heard the term before (it was originally coined by Bruce Schneier), but this time it got me thinking about how much activity in the safety field is really nothing more than theatrical devices that give the appearance of achieving safety, but not the reality. From zero harm initiatives to hi-vis vests, from the stylised playbook of public consultation to the use of safety integrity levels that purport to show a system is safe. How much of this adds any real value?
Worse yet, and as with security theatre, an entire industry has grown up around this culture of risk, which in reality amounts to a culture of risk aversion in western society. As I see it risk as a cultural concept is like fire, a dangerous tool and an even more terrible master.
From Les Hatton, here’s how, in four easy steps:
- Insist on using R = F x C in your assessment. This will panic HR (People go into HR to avoid nasty things like multiplication.)
- Put “end of universe” as risk number 1 (Rationale: R = F x C. Since the end of the universe has an infinite consequence C, then no matter how small the frequency F, the Risk is also infinite)
- Ignore all other risks as insignificant
- Wait for call from HR…
A humorous note, amongst many, in an excellent presentation on the fell effect that bureaucracies can have upon the development of safety critical systems. I would add my own small corollary that when you see warning notes on microwaves and hot water services the risk assessment lunatics have taken over the asylum…
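Hatton's second step can be checked with a few lines of arithmetic. The following is a minimal sketch only: the register entries, their frequencies and their consequence values are hypothetical numbers chosen for illustration, not figures from the presentation. It shows that once a single consequence is allowed to be infinite, R = F x C ranks that entry first no matter how small its (non-zero) frequency.

```python
import math

def risk(frequency: float, consequence: float) -> float:
    """Classical point-estimate risk, R = F x C."""
    return frequency * consequence

# Hypothetical register: (name, frequency per year, consequence in arbitrary units).
register = [
    ("end of universe", 1e-30, math.inf),
    ("server room fire", 1e-3, 5e6),
    ("paper cut", 10.0, 1.0),
]

# Rank the register from highest to lowest risk.
ranked = sorted(register, key=lambda entry: risk(entry[1], entry[2]), reverse=True)
top_name = ranked[0][0]
print(top_name)
```

Any positive frequency multiplied by an infinite consequence is infinite, so every finite risk in the register is swamped, which is exactly step three of the recipe.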
With apologies to the philosopher George Santayana, I’ll make the point that the BMW Head Up Display technology is in fact not the unalloyed blessing promised by BMW in their marketing material.
On the subject of near misses…
Presumably the use of the crew cab as an escape pod was not actually high on the list of design goals for the 4000 and 4100 class locomotives, and thankfully the locomotives involved in the recent derailment at Ambrose were unmanned.
Occasional readers of this blog might have noticed my preoccupation with unreliable airspeed and the human factors and system design issues that attend it. So it was with some interest that I read the recent paper by Sathy Silva of MIT and Roger Nicholson of Boeing on aviation accidents involving unreliable airspeed.
But, we tested it? Didn’t we?
Earlier reports of the Boeing 787 lithium battery initial development indicated that Boeing engineers had conducted tests to confirm that a single cell failure would not lead to a cascading thermal runaway amongst the remaining batteries. According to these reports their tests were successful, so what went wrong?
Well it sounded reasonable…
One of the things that’s concerned me for a while is the potentially malign narrative power of a published safety case.
Why sometimes simpler is better in safety engineering.
I’ve just finished up the working week with a day long Safety Conversations and Observations course conducted by Dr Robert Long of Human Dymensions. A good, actually very good, course with an excellent balance between the theory of risk psychology and the practicalities of successfully carrying out safety conversations. I’d recommend it to any organisation that’s seeking to take their safety culture beyond systems and paperwork. Although he’s not a great fan of engineers. :)
One of the recurring problems in running hazard identification workshops is being faced by a group whose members are passively refusing to engage in the process.
A technique that I’ve found quite valuable in breaking participants out of that mindset is TRIZ, or the Theory of Solving Problems Creatively (teoriya resheniya izobretatelskikh zadatch).
The following is an extract from Kevin Driscoll’s Murphy Was an Optimist presentation at SAFECOMP 2010. Here Kevin does the maths to show how a lack of exposure to failures over a small sample size of operating hours leads to a normalcy bias amongst designers and a rejection of proposed failure modes as ‘not credible’.
The reason I find it of especial interest is that it gives, at least in part, an empirical argument as to why designers find it difficult to anticipate the system accidents of Charles Perrow’s Normal Accident Theory.
Kevin’s argument also supports John Downer’s (2010) concept of Epistemic accidents. John defines epistemic accidents as those that occur because of an erroneous technological assumption, even though there were good reasons to hold that assumption before the accident.
Kevin’s argument illustrates that engineers as technological actors must make decisions in which their knowledge is inherently limited and so their design choices will exhibit bounded rationality.
In effect the higher the dependability of a system, the greater the mismatch between designer experience and system operational hours, and therefore the tighter the bounds on the rationality of design choices and their underpinning assumptions. The tighter the bounds, the greater the effect that cognitive biases, such as the Normalcy Bias, will have.
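The exposure mismatch can be illustrated with a rough calculation. This is a sketch under stated assumptions rather than a reproduction of Kevin's own working: it assumes a constant failure rate (an exponential model), and the failure rate, the length of a designer's career and the fleet operating hours are all hypothetical figures picked for illustration.

```python
import math

def p_observe_failure(rate_per_hour: float, exposure_hours: float) -> float:
    """Probability of seeing at least one failure, assuming a constant failure rate.

    P = 1 - exp(-rate * hours); expm1 keeps precision when the product is tiny.
    """
    return -math.expm1(-rate_per_hour * exposure_hours)

# Hypothetical figures, for illustration only.
rate = 1e-9                # failures per operating hour for the failure mode
career_hours = 40 * 2000   # one designer's working life: 40 years x 2000 h/year
fleet_hours = 1e9          # cumulative operating hours across a whole fleet

p_designer = p_observe_failure(rate, career_hours)
p_fleet = p_observe_failure(rate, fleet_hours)

print(f"designer sees the failure: {p_designer:.2e}")
print(f"fleet sees the failure:    {p_fleet:.2f}")
```

Under these assumed numbers a designer watching the system for an entire career would almost certainly never see the failure, while the fleet as a whole is more likely than not to experience it, which is the gap that makes a real failure mode feel ‘not credible’.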
Of course there are other reasons for such bounded rationality, see Logic, Mathematics and Science are Not Enough for a discussion of these.
Just finished reading the excellent paper A Conundrum: Logic, Mathematics and Science Are Not Enough by John Holloway on the swirling currents of politics, economics and emotion that can surround and affect any discussions of safety. The paper neatly illustrates why the canonical rational-philosophical model of expert knowledge is inherently flawed.
What I find interesting as a practicing engineer is that although everyday debates and discussions with your peers emphasise the subjectivity of engineering ‘knowledge’, as engineers we all still like to pretend and behave as if it is not.
The ‘Oh #%*!’ moment captured above definitely qualifies for the vigorous application of the rule that when the fire’s too hot, the water’s too deep or the smoke’s too thick, leave. :-)
But in fact in this incident the pilot actually had to convince the navigator that he needed to leave ‘right now!’. The navigator it turned out was so fixated on shutting down the aircraft’s avionics system he didn’t realise how bad things were, nor recognise that immediate evacuation was the correct response.
In a recent NRCOHSR white paper on the Deepwater Horizon explosion Professor Andrew Hopkins of the Australian National University argued that the Transocean and BP management teams that were visiting the rig on the day of the accident failed to detect the unsafe well condition because of biases in their audit practices.
An interesting theory of risk perception and communication is put forward by Kahan (2012) in the context of climate risk.
Why the risk matrix?
For new systems we generally do not have statistical data on accidents, and high consequence events are, we hope, quite rare leaving us with a paucity of information. So we usually end up basing any risk assessment upon low base rate data, and having to fall back upon some form of subjective (and qualitative) method of risk assessment.
Risk matrices were developed to guide such qualitative risk assessments and decision making, and the form of these matrices is based on a mix of decision and classical risk theory. The matrix is widely described in safety and risk literature and has become one of the less questioned staples of risk management.
Despite this there are plenty of poorly constructed and ill thought out risk matrices out there, in both the literature and standards, and many users remain unaware of the degree of epistemic uncertainty that the use of a risk matrix introduces. So this post attempts to establish some basic principles of construction as an aid to improving the state of practice and understanding.
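To make the object under discussion concrete, here is a minimal sketch of a qualitative risk matrix as a lookup table. Everything in it is an assumption made for illustration: the category labels, the 5 x 4 layout and the risk classes in each cell are hypothetical and are not taken from any particular standard. The monotonicity check is likewise just one plausible consistency test, not a principle claimed by this post.

```python
# Hypothetical ordinal scales, lowest to highest.
LIKELIHOOD = ["improbable", "remote", "occasional", "probable", "frequent"]
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]

# Rows follow LIKELIHOOD, columns follow SEVERITY.
# Cell values are risk classes: 1 = low, 2 = medium, 3 = high, 4 = extreme.
MATRIX = [
    [1, 1, 2, 3],
    [1, 2, 2, 3],
    [1, 2, 3, 4],
    [2, 3, 3, 4],
    [2, 3, 4, 4],
]

def risk_class(likelihood: str, severity: str) -> int:
    """Look up the risk class for a qualitative likelihood/severity pair."""
    return MATRIX[LIKELIHOOD.index(likelihood)][SEVERITY.index(severity)]

def is_monotonic(matrix: list[list[int]]) -> bool:
    """True if risk never decreases as likelihood or severity increases."""
    rows_ok = all(a <= b for row in matrix for a, b in zip(row, row[1:]))
    cols_ok = all(a <= b for col in zip(*matrix) for a, b in zip(col, col[1:]))
    return rows_ok and cols_ok

print(risk_class("remote", "catastrophic"))
print(is_monotonic(MATRIX))
```

Note how much is hidden in the cell assignments: two hazards that land in the same cell may differ by orders of magnitude in underlying risk, which is one source of the epistemic uncertainty a matrix introduces.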
In an article published in the online magazine Spectrum Eliza Strickland has charted the first 24 hours at Fukushima. A sobering description of the difficulty of the task facing the operators in the wake of the tsunami.
Her article identified a number of specific lessons about nuclear plant design, so in this post I thought I’d look at whether more general lessons for high consequence system design could be inferred in turn from her list.
Why We Automate Failure
A recent post on the interface issues surrounding the use of side-stick controllers in current generation passenger aircraft led me to think more generally about the current pre-eminence of software driven visual displays and why we persist in their use even though there may be a mismatch between what they can provide and what the operator needs.
Airbus’s side stick improves crew comfort and control, but is there a hidden cost?
The Airbus FBW side stick flight control has vastly improved the comfort of aircrew flying the Airbus fleet, much as the original Airbus designers predicted (Corps, 188). But the implementation also expresses the Airbus approach to flight control laws and that company’s implicit assumptions about the way in which humans interact with automation and each other. Here the record is more problematic.
Out of the loop, aircrew and unreliable airspeed at high altitude
The BEA’s third interim report on AF 447 highlights the vulnerability of aircrew when their usually reliable automation fails in the challenging operational environment of high altitude flight.
This post is part of the Airbus aircraft family and system safety thread.
How the marking of a traffic speed hump provides a classic example of a false affordance and an unintentional hazard.
So I’ve read the BEA report from one end to the other and overall it’s a solid and creditable effort. The report will probably disappoint those who are looking for a smoking gun; once again we see a system accident in which the outcome is derived from a complex interaction of system, environment, circumstance and human behavior.
However I do consider that the conclusions, and therefore recommendations, are hasty and incomplete.
This post is part of the Airbus aircraft family and system safety thread.
Why something as simple as control stick design can break an aircrew’s situational awareness
One of the less often considered aspects of situational awareness in the cockpit is the element of knowing what the ‘guy in the other seat is doing’. This is a particularly important part of cockpit error management because without a shared understanding of what someone is doing it’s kind of difficult to detect errors.
Requirements completeness and the AF447 stall warning
Reading through the BEA’s precis of the data contained on Air France’s AF447 Flight Data Recorder you find that during the final minutes of AF447 the aircraft’s stall warning ceased, even though the aircraft was still stalled, thereby removing a significant cue to the aircrew that they had flown the aircraft into a deep stall.
Good and bad in the design of an Oliver Hazard Perry class frigate’s ECS propulsion control console HMI.
A small question for the ATSB
According to the preliminary ATSB report the crew of QF32 took approximately 50 minutes to process all the Electronic Centralised Aircraft Monitor (ECAM) messages. This was despite this normal crew of three being augmented by a check captain in training and a senior check captain.
Because they typically use unity ratio (1:1) pitch scales, aircraft primary flight displays provide a pitch display that is limited by the vertical field of view. This display can move very rapidly and be difficult to use in unusual attitude recoveries, becoming another adverse performance shaping factor for aircrew in such a scenario. Trials by the USAF have conclusively demonstrated that an articulated style of pitch ladder can reduce disorientation of aircrew in such situations.
I attended the annual Rail Safety conference for 2011 earlier in the year and one of the speakers was Group Captain Alan Clements, the Director Defence Aviation Safety and Air Force Safety. His presentation was interesting both for where the ADO is going with their aviation safety management system and for the historical perspective and statistics it provided.
I think it was John Norman who pointed out that accidents in complex automated systems often arise because of unintended interactions between operator and automation where both are trying to control the same system.
Now John’s comment is an insightful one, but the follow on question is, logically, how are automation and operator trying to control the system?
James Reason would classify this as a violation rather than error
What the economic theory of sunk costs tells us about plan continuation bias
Plan continuation bias is a recognised and subtle cognitive bias that tends to force the continuation of an existing plan or course of action even in the face of changing conditions. In the field of aerospace it has been recognised as a significant causal factor in accidents, with a 2004 NASA study finding that in 9 out of the 19 accidents studied aircrew exhibited this behavioural bias. One explanation of this behaviour may be a version of the well known ‘sunk cost’ economic heuristic.
What the Cry Wolf effect tells us about pilots’ problems with unreliable air data
In a recurring series of incidents air crew have consistently demonstrated difficulty in firstly identifying and then subsequently dealing with unreliable air data and warnings. To me figuring out why this difficulty occurs is essential to addressing what has become a significant issue in air safety.
Knowing the outcome of an accident flight does not ‘explain’ the accident
Hindsight bias and its mutually reinforcing cognitive cousin the just world hypothesis are traditional parts of public comment on a major air accident investigation when pilot error is revealed as a causal factor. The public comment in various forums after the release of the BEA’s precis on AF447 is no exception.
This post is part of the Airbus aircraft family and system safety thread.
Why more information does not automatically reduce risk
I recently re-read the article Risks and Riddles by Gregory Treverton on the difference between a puzzle and a mystery. Treverton’s thesis, taken up by Malcolm Gladwell in Open Secrets, is that there is a significant difference between puzzles, in which the answer hinges on a known missing piece, and mysteries, in which the answer is contingent upon information that may be ambiguous or even in conflict.