On the brittleness of software


Air France Tail plane (Image Source: Agencia Brasil CCLA 2.5)

Requirements completeness and the AF447 stall warning

Reading through the BEA's précis of the data contained on Air France AF447's Flight Data Recorder, you find that during the final minutes of AF447 the aircraft's stall warning ceased even though the aircraft was still stalled, thereby removing a significant cue to the aircrew that they had flown the aircraft into a deep stall.

As the BEA noted, the loss of stall warning was due to the sensed speed values decreasing below a set threshold, beyond which they were considered invalid. When the speed values drop below 60 knots, the angle of attack (alpha) values are invalidated, and as alpha is required for stall warning, the warning ceases.

Note: When the measured speeds are below 60 kt, the measured angle of attack values are considered invalid and are not taken into account by the systems. When they are below 30 kt, the speed values themselves are considered invalid.

BEA, Update on Investigation, 27 May 2011

Normally, loss of the stall warning at low air speed is perfectly appropriate. For example, as the aircraft slows on touchdown you don't want the stall warning sounding continuously, so in that context the automation performed its job to specification. But in the case of AF447, once the aircraft had departed from controlled flight into a deep stall, the same logic removed a critical cue to the aircrew as to the state of the aircraft.

In fact the effect was much worse than just removing a cue, because the logic actually ensured that during stall recovery, as the air speed picked up, the stall warning would recommence, presenting the crew with a counter-intuitive warning during any recovery.
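The behaviour described above can be sketched as a few lines of conditional logic. This is purely an illustration of the brittleness, not the actual flight software: the 60 kt threshold comes from the BEA quote above, but the function, variable names, and the stall alpha threshold are invented for the example.

```python
# Hypothetical sketch of the brittle stall-warning logic described above.
# Only the 60 kt alpha-validity threshold is from the BEA update; all other
# names and values are illustrative assumptions.

SPEED_ALPHA_VALID_KT = 60.0  # below this, angle of attack is treated as invalid
STALL_ALPHA_DEG = 10.0       # illustrative stall threshold, not the real value

def stall_warning(speed_kt: float, alpha_deg: float) -> bool:
    """Return True if the stall warning should sound."""
    if speed_kt < SPEED_ALPHA_VALID_KT:
        # Alpha is deemed invalid, so the warning is suppressed --
        # even if the aircraft is in fact deeply stalled.
        return False
    return alpha_deg > STALL_ALPHA_DEG

# Deep stall: very low forward speed, very high angle of attack -> silence.
print(stall_warning(45.0, 40.0))   # False

# Nose-down recovery: speed builds back above 60 kt while still stalled ->
# the warning sounds again, just as the crew makes the correct input.
print(stall_warning(80.0, 40.0))   # True
```

The sketch makes the 'hole' obvious: below 60 kt the function can never return True, regardless of how stalled the aircraft actually is.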

Now, a loss of forward airspeed is a fundamental aspect of entering a deep stall. So it appears that the software designers assumed the aircraft would not be flown into this flight state. Of course, when we make an assumption during design, a degree of epistemic risk attends it.

Within the context of such an assumption the cessation of the stall warning makes perfect sense, and is perfectly reasonable (in fact desirable) and safe. Outside that context, what James Reason calls a 'latent condition' had been introduced into the flight software, waiting for the right circumstances to emerge and affect the crew's response.

This unintended behavior highlights the difference between software's response to an ambiguous situation and a human operator's. The aircraft's software had no internal 'model of flight' that it used to advise the crew. Instead, 'brittle' procedural rules provided a set of warnings, and in the circumstances of AF447 they broke.

So why did the designers make such an assumption? Well, the truth is that we may never know. Design records (even for flight software) don’t normally go down to that level of introspection.

One reason may simply be the industrial scale of developing a software product of a million SLOC. In a multi-developer, multi-year effort it is quite difficult to cover all the interactions, and in this case the interaction may have arisen from a failure of communication between separate development teams.

Another reason may be that the conditional logic was simply not expressed clearly enough that the implications could be understood. As Nancy Leveson has pointed out, often how we represent and communicate complex logical statements is just as important in avoiding error as actually understanding the functional requirements.
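Leveson's point about representation can be made concrete (my example, not anything from the report): the same validity rules read very differently as nested conditionals than as an explicit decision table. The thresholds are the 60 kt and 30 kt values from the BEA quote above; everything else is invented.

```python
# Hypothetical illustration: the same air-data validity rules written two ways.
# Thresholds (60 kt, 30 kt) are from the BEA quote; all names are invented.

def validity_nested(speed_kt: float):
    """Nested conditionals: the 'no stall warning below 60 kt' hole is easy to miss."""
    if speed_kt >= 60.0:
        return ("speed valid", "alpha valid")
    elif speed_kt >= 30.0:
        return ("speed valid", "alpha invalid")
    else:
        return ("speed invalid", "alpha invalid")

# A decision table makes every regime and its consequence visible at a glance,
# including the regime in which no stall warning is possible at all.
VALIDITY_TABLE = [
    # (min speed, max speed,     speed status,    alpha status)
    (60.0, float("inf"), "speed valid",   "alpha valid"),
    (30.0, 60.0,         "speed valid",   "alpha invalid"),
    (0.0,  30.0,         "speed invalid", "alpha invalid"),
]

def validity_table(speed_kt: float):
    for lo, hi, speed_status, alpha_status in VALIDITY_TABLE:
        if lo <= speed_kt < hi:
            return (speed_status, alpha_status)

print(validity_nested(45.0))  # ('speed valid', 'alpha invalid')
print(validity_table(45.0))   # ('speed valid', 'alpha invalid')
```

The two are functionally identical, but the tabular form invites the reviewer to ask of each row "and what does the crew see here?", which is precisely the question that appears to have gone unasked.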

Finally, a cultural perspective: one of the fundamental paradigms of large-aircraft flight safety has been to prevent the aircraft from departing into uncontrolled flight. If your world view is shaped by providing a barrier between safe and unsafe operational states, then you may not consider what happens on the other side of stall entry as carefully…

My fundamental criticism of the resultant design approach, regardless of causation, is that the stall warning function failed to operate with a ‘never give up strategy’. To be brutal, the software quit before the pilots did.

This post is part of the Airbus aircraft family and system safety thread.

3 responses to On the brittleness of software


    “If your world view is shaped by providing a barrier between safe and unsafe operational states, then you may not consider what happens on the other side of stall entry as carefully…”

    I would be interested in taking this line of thinking a bit further: was the aircraft at that point so far outside the normal flight envelope that recovery would have been excessively difficult anyway (given the altitude)? The stall warning was ignored for quite some time before the values were considered invalid. And in that case, isn’t it better to optimize its behavior for “normal” flight, so unnecessary risk of error isn’t introduced by trying to detect more states?


      Matthew Squair 08/08/2011 at 6:25 pm

      Thanks for your comment, John.

      Yes, people do tend to assume that the lines drawn on a flight envelope are 'solid' walls and that flight outside them is impossible. In practice one can fly an aircraft into a deep stall, such as AF447's, and recover, given enough altitude. Likewise one can 'loft' right through maximum altitude given enough speed. These concepts are fairly well understood in the high performance (read military) aviation community.

      If you look at the flight data contained in the BEA's third interim report you'll see a point at which the pilot puts in a control-forward input and the aircraft actually responds by picking up air speed. Then the stall warning kicks in and control-back inputs resume…

      So yes, up to some point the aircraft was eminently recoverable, but the crew, for complex reasons that are beyond this comment, failed to recognize that their model of the system was invalid and that they were stalled. Check the voice recorder: no mention of 'stall'.


Trackbacks and Pingbacks:

  1. AF447 wreckage found - Page 128 - PPRuNe Forums - August 4, 2011