Archives For Safety

The practice of safety engineering in various high consequence industries.

A UAV and COMAIR near miss over Kabul illustrates the problem of emergent hazards when we integrate systems or operate existing systems in operational contexts not considered by their designers.

Continue Reading...

For those interested, the interim report by Mike Weightman, the UK’s Inspector of Nuclear Installations, on lessons from Fukushima has been released.

Continue Reading...

A near disaster in space 40 years ago serves as a salutary lesson on Common Cause Failure (CCF).

Two days after the launch of Apollo 13 an oxygen tank ruptured, crippling the Apollo service module upon which the astronauts depended and precipitating a desperate life or death struggle for survival. But leaving aside what was possibly NASA’s finest hour, the causes of this near disaster provide important lessons for designing damage resistant architectures.

Continue Reading…

Why we risk…

15/05/2011

Why taking risk is an inherent part of the human condition

On the 6th of May 1968 Neil Armstrong stepped aboard the Lunar Lander Test Vehicle (LLTV) for a routine training mission. During the flight the vehicle went out of control and crashed with Armstrong ejecting to safety seconds before impact. Continue Reading…

Blayais Plant (Image source: Wikipedia Commons)

What a near miss flooding incident at a French nuclear plant in 1999 and the 2011 Fukushima disaster can tell us about fault tolerance and designing for reactor safety

Continue Reading…

Retrieval of the data recorders heralds the end of the beginning for the AF 447 accident investigation, rather than the beginning of the end…

Continue Reading...

For the STS 134 mission NASA has estimated a 1 in 90 chance of loss of vehicle and crew (LOCV) based on a Probabilistic Risk Assessment (PRA). But should we believe this number?

Continue Reading...

Fukushima NPP March 17 (Image Source: AP)

There are few purely technical problems…

The Washington Post has discovered that concerns about the vulnerability of the Fukushima Daiichi plant to potential tsunami events were brushed aside at a review of nuclear plant safety conducted in the aftermath of the Kobe earthquake. Yet at other plants the Japanese National Institute of Advanced Industrial Science and Technology (NISA) had directed the panel of engineers and geologists to consider tsunami events.

Continue Reading…

QF32 Redux

29/03/2011

QF32 - No. 1 engine failure to shutdown

The ABC’s treatment of the QF 32 incident treads familiar and slightly disappointing ground

While I thought that the ABC 4 Corners program’s treatment of the QF 32 incident was a creditable effort, I have to say that I was unimpressed by the producers’ homing in on a (presumed) Rolls Royce production error as the casus belli.

The report focused almost entirely upon the engine rotor burst and its proximal cause but failed to discuss (for example) the situational overload introduced by the ECAM fault reporting, or for that matter why a single rotor burst should have caused so much cascading damage and so nearly led to the loss of the aircraft.

Overall two out of four stars 🙂

If, however, you’re interested in a discussion of the deeper issues arising from this incident then see:

  1. Lessons from QF32. A discussion of some immediate lessons that could be learned from the QF 32 accident;
  2. The ATSB QF32 preliminary report. A commentary on the preliminary report and its strengths and weaknesses;
  3. Rotor bursts and single points of failure. A review and discussion of the underlying certification basis for commercial aircraft and protection from rotor burst events;
  4. Rotor bursts and single points of failure (Part II). Discusses differences between the damage sustained by QF 32 and that premised by a contemporary report issued by the AIA on rotor bursts;
  5. A hard rain is gonna fall. An analysis of the 2006 American Airlines rotor burst incident that indicated problems with the FAA’s assumed rotor burst debris patterns; and
  6. Lies, damn lies and statistics. A statistical analysis, looking at the AIA 2010 report on rotor bursts and its underestimation of their risk.

There has been a good deal of print and perspiration expended in the OH&S community on the principle of Zero Harm, with the proponents of Zero Harm taking the position that no industrial accident is acceptable, regardless of how small it is. There are, however, certain problems with their position.

Continue Reading…


Why we may be carrying an order of magnitude greater risk from aircraft engine rotor bursts than we thought

One of the ‘implicit’ conclusions of the 2010 AIA study on the threat posed by jet engine rotor bursts was that the fleet of modern aircraft designed to meet FAA circular AC 20-128A also met the FAA established safety target of a 1 in 20 likelihood of a catastrophic loss in the event of an engine rotor burst.

Continue Reading…

On June 2, 2006, an American Airlines B767-223(ER), N330AA, equipped with General Electric (GE) CF6-80A engines experienced an uncontained failure of the high pressure turbine (HPT) stage 1 disk in the No. 1 (left) engine during a high-power ground run for maintenance at Los Angeles International Airport (LAX), Los Angeles, California.

To provide a better appreciation of aircraft level effects I’ve taken the NTSB summary description of the damage sustained by the aircraft and illustrated it with pictures taken of the accident by bystanders and technical staff.

Continue Reading...

QF 72 (Image Source: Terence Ong)

The QF 72 accident illustrates the significant effects that ‘small field’ decisions can have on overall system safety Continue Reading…

A report by the AIA on engine rotor bursts and their expected severity raises questions about the levels of damage sustained by QF 32.

Continue Reading...

It appears that the underlying certification basis for aircraft safety in the event of an intermediate power turbine rotor burst is not supported by the rotor failure seen on QF 32.

Continue Reading...

The ATSB has released the preliminary report on the QF 32 A380 uncontained engine failure. While the report sheds light on a number of key issues in the investigation and certainly provides a ‘smoking gun’ for the engine failure, I was left a little underwhelmed by the entire report.

Continue Reading...

Lessons from QF 32

06/11/2010

The recent Qantas QF32 engine failure illustrates the problems of dealing with common cause failure

This post is part of the Airbus aircraft family and system safety thread.

Updated: 15 Nov 2012

Generally the reason we have more than one of anything on a passenger aircraft is because we know that components can fail, so independent redundancy is the cornerstone strategy for achieving the required levels of system reliability and safety. But while overall aircraft safety is predicated on the independence of these components, the reality is that the catastrophic failure of one component can also affect adjacent equipment and systems, leading to what are termed common cause failures.

Continue Reading…

The Titanic effect

27/09/2010

So why did the Titanic sink? The reason highlights the role of implicit design assumptions in complex accidents and the interaction of design with operations of safety critical systems

Continue Reading...

The fallout from the QF 72 in flight accident has now reached the courts with Australian Aviation reporting that passengers and crew have taken up a joint class action against Airbus and Northrop Grumman (the manufacturer of the faulty Air Data Inertial Reference Unit).

Continue Reading...

The reality is that when pilots fly through an icing event or a driver steers through a skid the aircraft or car is not intelligent, the intelligence is actually in the head of the designer, the automation is merely his proxy.

Continue Reading...

Disappointingly the Black Saturday royal commission report makes no mention of the effect of cognitive biases upon making a ‘stay or go’ decision, instead assuming that such decisions are made in a completely rational fashion. As Black Saturday and other disasters show, this is rarely the case.

Continue Reading...

Lead Tangara car damage (Source: Commission report)

On the 31st of January 2003 at approx. 7:14 am a four car Tangara passenger train on run C311 from Sydney Central to Port Kembla (G7) oversped on a downhill gradient leading into a curve and left the track. The train driver and six passengers were killed and the remaining passengers suffered various injuries ranging from minor bruising and lacerations to severe disabling injuries. Continue Reading…

Over the last couple of months I’ve posted on various incidents involving the Airbus A330 aircraft from the perspective of system safety. As these posts are scattered through my blog I thought I’d pull them together, the earliest post is at the bottom.

Continue Reading...

Last week the FAA released an Airworthiness Directive (2010-06-09) for the Boeing 777 aircraft to prevent inadvertent engagement of the autopilot during takeoff roll, which could result in a rejected takeoff and a runway overrun. But what are the deeper issues behind the incident?

Continue Reading...

At the height of the cold war with bombers carrying nuclear weapons on airborne alert and the strategic forces of both sides on a knife edge the possibility that a nuclear weapon could go off purely by accident and trigger nuclear war was a disquieting one.

Both sides realised that the risk of inadvertently starting World War III had to be minimised, and on the American side after several near misses in the 40s and 50s engineers at Los Alamos and Sandia labs started to work seriously on how to prevent nuclear weapons from going off by accident.

Continue Reading…

A330 Right hand AoA probes (Image source: ATSB)

I’ve just finished reading the ATSB’s second interim report on the QF 72 in flight upset that resulted in two uncommanded pitch over events (1). In this accident one of the Air Data Inertial Reference Units (ADIRU) provided erroneous data, in the form of transient spikes in the value of the angle of attack (AoA) parameter, to the flight control computers, which then initiated two uncommanded extreme pitch overs.

This post is part of the Airbus aircraft family and system safety thread. Continue Reading…

I attended the Australian Rail Safety Conference 2010 in Melbourne this week. The conference’s theme was safety leadership and as a result we had a broad spread of corporate executives present providing their views on the leadership aspect of safety.

Continue Reading...

So far as we know flight AF 447 fell out of the sky with its systems performing as their designers had specified, if not how they expected, right up to the point that it impacted the surface of the ocean.

So how is it possible that incorrect air data could simultaneously cause upsets in aircraft functions as disparate as engine thrust management, flight law protection and traffic avoidance?

Continue Reading...

One of the positive outcomes from a disaster such as Black Saturday is that a window of opportunity opens in which opinions, behaviour and even public policy can be changed.

Continue Reading...

So, a year on from the Black Saturday fires and the royal commission established in their aftermath is working its way to a conclusion. While the commission has certainly been busy, I guess you could say that I was left unsatisfied by the recommendations.

Continue Reading...

From the BEA’s second interim report (BEA 2009) we now know that AF 447 was flown into the water in a deep stall. Given the training and experience of the flight crew how did they end up in such a situation?

Continue Reading...

Reading the 2nd BEA interim report’s analysis of ACARS message timing provides us with a further refinement of a calculation of AF 447’s terminal vertical speed (posted here) based on the cabin vertical speed advisory.

Continue Reading…

After several months of undersea searching for the black boxes of AF 447, no joy. So, let’s ask a simple question. Why should the FDRs end up on the sea bed in the first place?

Continue Reading...

The use of median value voting algorithms as part of fault tolerant design has become an almost ubiquitous design solution, especially for avionics systems. But have we really considered their suitability?
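
To make the question concrete, here is a minimal sketch of a median voter over three redundant readings. The function name `median_vote` and the sample values are mine, chosen purely for illustration; they are not taken from any avionics specification. The sketch shows the property that makes median voting attractive (a single wild reading is ignored) and the case where it offers no protection (two channels that are wrong in the same direction outvote the healthy one).

```python
from statistics import median


def median_vote(readings):
    """Return the median of a set of redundant sensor readings.

    Hypothetical helper for illustration only.
    """
    return median(readings)


# One faulty channel (50.0) among three: the voter passes a healthy value.
one_bad = median_vote([2.1, 2.0, 50.0])

# Two channels sharing a fault (say, a common cause such as icing):
# the voter now passes an erroneous value and rejects the healthy 2.0.
two_bad = median_vote([50.0, 50.6, 2.0])

print(one_bad, two_bad)
```

The voter masks independent single failures well, but by construction it follows the majority, so its protection rests on the assumption that the channels do not fail together.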

Continue Reading...

The TCAS II specification credibility window can provide us with an insight into the magnitude of the initial unreliable air data parameters in the AF 447 disaster.

Continue Reading...

The latest revision of the BEA report on the AF 447 accident omits mention of the last ACARS message received, a cabin vertical speed advisory. But from this message we can infer at least approximations of the final segments of AF 447’s flight profile.

Continue Reading...

Pitot sensor (Source: BEA)

The theory of Highly Optimised Tolerance (HOT) predicts that as technological systems evolve to become more robust to common perturbations they still remain vulnerable to rare events (Carlson, Doyle 2002) and this theory may give us an insight into the performance of modern integrated air data systems in the face of in-flight icing incidents. 

Continue Reading…

Ariane 501 Launch

I was cleaning up some of my reference material and came across a copy of the ESA board of investigation report into the Ariane 501 accident. I’ve added my own personal observations, as well as those of other commentators, to the report. Continue Reading…

In one’s professional life there are certain critical works that open your eyes to how lucidly a technical argument can be stated. Continue Reading…

The Queensland Transport Rail Safety Unit (QTRSU) report into the fatal rail accident at Mindi in 2007 offers a good example of problem framing bias effect during safety investigations.

Continue Reading...

Invalid air data may have triggered the cabin pressure differential safety function on AF 447.

Continue Reading...

Recent incidents involving Airbus aircraft have again focused attention on their approach to cockpit automation and its interaction with the crew.

Underlying the current debate is perhaps a general view that the automation should somehow be ‘perfect’, and that failure of automation is also a form of moral failing (1). While this weltanschauung undoubtedly serves certain social and psychological needs the debate it engenders doesn’t really further productive discussion on what could or indeed should be done to improve cockpit automation. So let’s take a closer look at the Airbus protection laws implemented in the flight control automation and compare it with how experienced aircrew actually make decisions in the cockpit.

Continue Reading…

A cross walk of the interim investigation accident reports issued by the ATSB and BEA for the QF72 and AF447 accidents respectively shows that in both accidents the inertial reference units that are part of the onboard air data inertial reference unit (ADIRU) that exhibited anomalous behaviour also declared a failure. Why did this occur?

Continue Reading...

Fire has been an integral part of the Australian ecosystem for tens of thousands of years. Both the landscape and its native inhabitants have adapted to this periodic cycle of fire and regeneration. These fires are not bolts from the blue; they occur regularly and predictably, yet modern Australians seem to have difficulty understanding that their land will burn, regularly, and sometimes catastrophically.

So why do we studiously avoid serious consideration of the hazards of living in a country that regularly produces firestorms? Why, in the time of fire, do we go through the same cycle of shock, recrimination, exhortations to do better, diminishing interest and finally forgetfulness?

Continue Reading...

Author’s note. Below is my original post on the potential causes of the AF 447 cabin altitude advisory. I concluded that there were a number of potential causes, one of which could be an erroneous altitude input from the ADIRU. What I didn’t consider was that the altitude advisory could have been triggered by correct operation of the cabin pressure control system; see The AF 447 cabin vertical speed advisory and Pt II for more on this.

The last ACARS transmission received from AF 447 was the ECAM advisory that the cabin altitude (pressure) variation had exceeded 1,800 ft/min for greater than 5 seconds. While some commentators have taken this message to indicate that the aircraft had suffered a catastrophic structural failure, all we really know is that at that point there was a rapid change in reported cabin altitude. Given the strong indications of unreliable air data from other on-board systems, perhaps it’s worthwhile having a look for other potential causes of such rapid cabin pressure changes.
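
As a rough illustration of what such an advisory condition amounts to, the sketch below checks a series of reported cabin altitude samples for a rate of change above 1,800 ft/min that persists for more than 5 seconds. Only those two thresholds come from the advisory described above. The function name, the `(time, altitude)` sample format and the use of the absolute rate (so that climbs and descents are treated alike) are my assumptions, and this is not the logic of the actual cabin pressure controller.

```python
def advisory_triggered(samples, rate_limit_ft_min=1800.0, persist_s=5.0):
    """Return True if the cabin altitude rate stays above the limit
    for longer than persist_s seconds.

    samples: list of (time_s, cabin_altitude_ft) tuples in time order.
    Hypothetical helper; the sample format is an assumption.
    """
    exceed_start = None  # time at which the current exceedance began
    for (t0, alt0), (t1, alt1) in zip(samples, samples[1:]):
        rate_ft_min = (alt1 - alt0) / (t1 - t0) * 60.0
        if abs(rate_ft_min) > rate_limit_ft_min:
            if exceed_start is None:
                exceed_start = t0
            if t1 - exceed_start > persist_s:
                return True
        else:
            exceed_start = None  # exceedance interrupted, start again
    return False


# Reported cabin altitude changing at 40 ft/s (2,400 ft/min) for 7 seconds.
sustained = [(t, 8000 + 40 * t) for t in range(8)]

# The same rate for only 3 seconds, then level.
brief = [(0, 8000), (1, 8040), (2, 8080), (3, 8120),
         (4, 8120), (5, 8120), (6, 8120), (7, 8120)]

print(advisory_triggered(sustained), advisory_triggered(brief))
```

Note that the check works on the reported altitude only, so it cannot tell a real pressure change from an erroneous input.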

Continue Reading…

TCAS Indicator (Image Source: Public Domain)

What TCAS can tell us about AF447 (Updated 27 Sept 09)

The BEA interim report on the AF447 accident confirms that the Traffic Alert and Collision Avoidance System (TCAS) had become inoperative during the early part of the event sequence for an, as yet, unidentified reason. The explanation may actually be fairly straightforward and lie within the fault tolerance requirements of the TCAS specification. Continue Reading…

Flaws in the glass

19/07/2009

DO-178B and the B-777 9M-MRG Incident

In August 2005 a Boeing 777 experienced an in-flight upset caused by the aircraft’s Air Data Inertial Reference Unit (ADIRU) generating erroneous acceleration data. The software fault that caused this upset raises questions in turn about the DO-178 software development process. A subsequent investigation of the accident by the Australian Transport Safety Bureau (ATSB) identified that the following had occurred:

  • accelerometer #5 failed on the first of June in a false high value output mode,
  • the ADIRU excluded accelerometer #5 from use in its computations,
  • the ADIRU unit remained in service with this failed component (1),
  • power to the ADIRU was cycled (causing a system reset),
  • accelerometer #6 then failed in-flight,
  • accelerometer #6 was excluded from use by the ADIRU,
  • the ADIRU then re-admitted accelerometer #5 into its computations, and
  • erroneous acceleration values were output to the flight computer.
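
The sequence above can be replayed with a toy model. This is a sketch under stated assumptions, not the actual ADIRU software: the class `ToyAdiru`, the split between a fault record that survives power cycles and one that is rebuilt each session, and the `honour_persistent` flag are all hypothetical, invented to show one way a previously excluded sensor could find its way back into the computation after a reset.

```python
class ToyAdiru:
    """Toy fault-exclusion model (hypothetical, for illustration only)."""

    def __init__(self, sensor_ids, honour_persistent):
        self.sensor_ids = set(sensor_ids)
        self.honour_persistent = honour_persistent
        self.persistent_faults = set()  # assumed to survive power cycles
        self.session_faults = set()     # assumed to be cleared on reset
        self.in_use = set(self.sensor_ids)

    def power_cycle(self):
        # On start-up the unit excludes everything in the persistent record.
        self.session_faults = set()
        self.in_use = self.sensor_ids - self.persistent_faults

    def report_failure(self, sensor_id):
        # Record the failure, then rebuild the set of usable sensors.
        self.session_faults.add(sensor_id)
        self.persistent_faults.add(sensor_id)
        excluded = set(self.session_faults)
        if self.honour_persistent:
            excluded |= self.persistent_faults
        # With honour_persistent False the rebuild only looks at failures
        # seen since the last reset, so older faults are forgotten here.
        self.in_use = self.sensor_ids - excluded


def replay(honour_persistent):
    """Replay the listed event sequence and return the sensors in use."""
    unit = ToyAdiru(range(1, 7), honour_persistent)
    unit.report_failure(5)   # accelerometer #5 fails and is excluded
    unit.power_cycle()       # power is cycled, #5 is still excluded
    unit.report_failure(6)   # accelerometer #6 fails in flight
    return unit.in_use


print(sorted(replay(honour_persistent=False)))  # #5 is back in use
print(sorted(replay(honour_persistent=True)))   # #5 and #6 both excluded
```

In the toy model the defect only shows itself when a second failure follows a reset, which is the kind of multi-event path that is easily missed when each requirement is verified in isolation.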

Continue Reading…

Reading the ATSB interim report on the QF72 in flight accident one could easily overlook the statement, “…the crew reported that the (ECAM (1)) messages were constantly scrolling, and they could not effectively interact with the ECAM to action and/or clear the messages.” So why did the A330 ECAM display fail during such a critical event?

Continue Reading...

If the theory of Highly Optimised Tolerance (HOT) holds true then we should be able to see a change in the distribution of the severity of adverse events as the design paradigm for a family of systems moves from the ‘just make it work’ stage to the ‘optimise for robustness’ stage. This is something we can actually test through observation of real world systems.

Continue Reading...

The statement by Airbus regarding the robustness of the Airbus AoA voting logic disclosed in the ATSB QF72 accident report raises some interesting questions as to what was actually meant by the term robustness.

Continue Reading...