Flaws in the glass

19/07/2009 — 6 Comments

DO-178B and the B-777 9M-MRG Incident

In August 2005 a Boeing 777 experienced an in-flight upset caused by the aircraft’s Air Data Inertial Reference Unit (ADIRU) generating erroneous acceleration data. The software fault behind this upset in turn raises questions about the DO-178 software development process. The subsequent investigation by the Australian Transport Safety Bureau (ATSB) identified that the following had occurred:

  • accelerometer #5 failed on the first of June in a false high value output mode,
  • the ADIRU excluded accelerometer #5 from use in its computations,
  • the ADIRU unit remained in service with this failed component (1),
  • power to the ADIRU was cycled (causing a system reset),
  • accelerometer #6 then failed in-flight,
  • accelerometer #6 was excluded from use by the ADIRU,
  • the ADIRU then re-admitted accelerometer #5 into its computations, and
  • erroneous acceleration values were output to the flight computer.

A more severe event was prevented by the inclusion of Mid Value Select (MVS) voting in the primary flight computer, which compared the acceleration values from the Secondary Attitude Air Data Reference Unit (SAARU) with those being generated by the ADIRU (2).
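To make the voting idea concrete, here is a minimal sketch of a mid value select. The function is generic; the input names and values are mine, chosen purely for illustration, and are not how the 777 primary flight computer actually sources or combines its inputs.

    # Minimal illustration of mid value select (MVS) voting. A single
    # wildly erroneous source can never be the selected (middle) value.
    def mid_value_select(a, b, c):
        return sorted((a, b, c))[1]

    # Hypothetical values only: the ADIRU channel outputs a spurious spike,
    # while the SAARU and a previously selected value remain plausible.
    adiru_accel = 20.0    # erroneous high acceleration (m/s^2)
    saaru_accel = 0.2     # standby reference value
    prior_accel = 0.3     # last selected value

    print(mid_value_select(adiru_accel, saaru_accel, prior_accel))  # -> 0.3

With three inputs a single erroneous source always ends up at an extreme and is never passed on, whatever the other two sources are doing.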

Why did it happen?

While the status of the failed unit was recorded in the on-board maintenance computer memory, that memory was not checked by the ADIRU Operational Program Software (OPS) during the start-up initialisation sequence. The ADIRU software therefore treated the accelerometer fault as a transient event and allowed the accelerometer to be re-instated. The ATSB determined that a software ‘anomaly’ (but let’s call it what it was, a latent software fault) existed in the ADIRU OPS that allowed the re-entry of accelerometer #5 into the channel. Previous versions of the software had masked this latent fault, but software changes in a later release exposed it, allowing the above scenario to occur. The ATSB went on to state that:
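A minimal sketch of how such a latent fault can arise is below. Everything in it is hypothetical (the names, structure and data are mine, not the actual ADIRU OPS design); it simply shows a volatile health model being rebuilt at start-up without consulting the persistent fault record.

    # Hypothetical sketch of the fault mechanism described above.
    class AccelerometerChannel:
        def __init__(self, ids):
            # Volatile view of accelerometer health, rebuilt on every start-up.
            self.healthy = {i: True for i in ids}

        def startup(self, maintenance_log):
            # Latent fault: the persistent fault record is never consulted,
            # so every accelerometer comes back 'healthy' after a power cycle.
            self.healthy = {i: True for i in self.healthy}

        def exclude(self, unit, maintenance_log):
            self.healthy[unit] = False
            maintenance_log.add(unit)   # fault recorded persistently, but
                                        # never read back at start-up

    log = set()
    adiru = AccelerometerChannel(ids=range(1, 7))
    adiru.exclude(5, log)       # accelerometer #5 fails and is excluded
    adiru.startup(log)          # power to the ADIRU is cycled
    adiru.exclude(6, log)       # accelerometer #6 fails in flight
    print(adiru.healthy[5])     # -> True: #5 has quietly been re-admitted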

The certification of the ADIRU operational program software (OPS) was dependent on it being tested against the requirements specified in the initial design. The conditions involved in this event were not identified in the testing requirements, so were not tested.

From the ATSB’s perspective at least, the root cause of the software fault was an incompleteness in the requirements specification, resulting in the OPS having an incorrect model of the system upon startup. Safety engineers have frequently pointed out that inadequate specifications are the primary cause of safety-related software problems, and have identified incompleteness in software specifications as a major source of these problems; see for example the work of Leveson (1995) and Lutz (1993). Leveson further posits that these incompleteness errors are related to fundamental human cognitive biases (Leveson 2000).

But what about DO-178B?

The ADIRU OPS had been previously certified as meeting the requirements of DO-178B, the de facto software certification standard for aviation software. So, if the software had been developed to DO-178, why was this fault still present? To answer this question one needs to understand that DO-178 is a process standard. Simplistically, one formulates high-level requirements at one end, follows the DO-178 process of deriving more and more detailed requirements and verifying them religiously, and finally out pops a verified software product at the other end. If you follow the process your software is deemed to be safe. As a process standard DO-178 was not intended to, nor does it, incorporate any lessons learned from previous accidents or incidents involving software. Unlike other engineering standards, which embody the remembrance of past mistakes, DO-178 is therefore ‘memory-less’.

As a result there is nothing in DO-178 that prevents a software development team from recapitulating the sins of the past, or in this case, failing to ensure that the internal OPS model of the accelerometer state was updated at startup to reflect the actual accelerometer operational state. Had such a requirement been levied, the ADIRU software would have been re-verified against it at each new release, and it’s reasonable to expect that the ‘masked’ software fault would have been detected when the masking functionality was removed.
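Continuing the illustrative sketch from earlier (all names remain hypothetical), that missing requirement and a requirements-based test of the kind DO-178 mandates might look something like this:

    # Hypothetical corrected initialisation: accelerometers with a persistent
    # fault record are restored as failed rather than treated as transient,
    # plus a requirements-based test for that behaviour.
    def initialise_health(ids, maintenance_log):
        return {i: (i not in maintenance_log) for i in ids}

    def test_failed_accelerometer_stays_excluded():
        maintenance_log = {5}                      # #5 failed before the reset
        health = initialise_health(range(1, 7), maintenance_log)
        assert health[5] is False                  # must not be re-admitted
        assert all(health[i] for i in (1, 2, 3, 4, 6))

    test_failed_accelerometer_stays_excluded()

A test like this would then have been run against every subsequent release, including the one in which the masking functionality was removed.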

Completeness criteria

As it happens the missing requirement identified above can be derived directly from a set of completeness criteria developed to identify missing, incorrect or ambiguous requirements for process control software (Leveson 1995). These criteria have been part of the published literature for more than a decade, yet the question of software requirements completeness has still not been meaningfully addressed by the current crop of software safety standards, including of course DO-178B. So where does that leave us? Clearly DO-178 did not deliver a safe software product in this instance, yet this failure seems not to have triggered any alarm bells with either regulators or standardisation bodies.
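One of those criteria, roughly, is that required behaviour must be specified for every combination of state and event, including start-up with the plant in an arbitrary state. In a simple state/event table form that criterion is even mechanically checkable. The sketch below is my own toy model of the incident, not Leveson’s notation and not the actual ADIRU specification:

    # Toy completeness check: every (state, event) pair in a requirements
    # level transition table must have defined behaviour.
    STATES = {"IN_USE", "EXCLUDED"}
    EVENTS = {"FAULT_DETECTED", "FAULT_CLEARED", "POWER_CYCLE"}

    TRANSITIONS = {
        ("IN_USE",   "FAULT_DETECTED"): "EXCLUDED",
        ("EXCLUDED", "FAULT_CLEARED"):  "IN_USE",
        ("IN_USE",   "POWER_CYCLE"):    "IN_USE",
        # Nothing is said about a previously EXCLUDED unit on POWER_CYCLE.
    }

    missing = [(s, e) for s in STATES for e in EVENTS
               if (s, e) not in TRANSITIONS]
    print("Behaviour unspecified for:", missing)
    # Among the flagged gaps is ('EXCLUDED', 'POWER_CYCLE'), which is
    # exactly the gap exercised in the 9M-MRG event.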

Unsettling isn’t it.

Notes

1.  The ADIRU was designed with fault containment areas (FCAs) that allowed continued operation with an unserviceable item in any of the FCAs. This allowed operators to defer maintenance until the number of serviceable fault containment modules (FCMs) in any single area fell below the minimum specified by the component manufacturer.

2.  As an aside, we should definitely thank the Boeing engineers who decided to leave the MVS algorithm in place, despite theoretical analyses indicating that the erroneous ADIRU outputs it protects against could not occur.

References

1.  ATSB Transport Safety Report, Aviation Occurrence Investigation AO-200503722, Final Report: In-flight upset event, 240 km north-west of Perth, WA, Boeing Company 777-200, 9M-MRG, 1 August 2005.

2.  Leveson, N.G., Safeware: System Safety and Computers, Addison-Wesley Pub., 1995.

3.  Leveson, N.G., Completeness in Formal Specification Language Design for Process-Control Systems, Proc. of the 3rd ACM SIGSOFT Workshop on Formal Methods in Software Practice, pp. 75–87, 2000.

4.  Lutz, R.R., Analyzing Software Requirements Errors in Safety-Critical, Embedded Systems, Proc. of the IEEE International Symposium on Requirements Engineering, pp. 35–46, 1993.

6 responses to Flaws in the glass

  1. 
    stallwarning 20/05/2013 at 9:44 am

    Is it right that Boeing only suffered one case of UAS since 2005 whilst Airbus suffered 40+ cases?
    So the Airbus architecture with 3x ADRs has no real back-up compared with Boeing’s separate ADIRUs and SAARUs?

    • 
      Matthew Squair 20/05/2013 at 4:09 pm

      I can’t comment as to the total number of UAS events that Boeing has had in that time-frame across its various fleets. The Airbus architecture does have a standby attitude reference called the Integrated Standby Instrument System (ISIS), unfortunately it’s fed off the same air data probes as the primary channels.

      • 
        stallwarning 20/05/2013 at 7:40 pm

        …and therein lies the deadly and nasty rub!
        This is what I was trying to fathom.
        Many thanks!

      • 
        stallwarning 21/05/2013 at 7:40 am

        ISFD:
        Shares pitot/static probes with one of the ADIRUs but as far as I know it does all its own processing.
        So, short of a physical probe malfunction, the ISFD should be OK.

        ISIS:
        Same data probes (the #3) but it does not go through any air data or IR reference units.
        It has its own air data and attitude reference.

      • 
        Matthew Squair 21/05/2013 at 10:54 am

        If by ‘it has its own…’ you mean it is directly connected to the standby static and pitot probes pneumatic lines without an intervening ADM or ADR then yes.

        However ISIS does share a probe and static port with ADR 3 (the standby probes). The Captain’s probes feed ADR1, the Copilot’s probes feed ADR2, and the standby probes feed ADR3 and (directly) ISIS. The standby pitot is on the same side as the Captain’s pitot.

      • 
        stallwarning 21/05/2013 at 11:17 pm

        Airbus uses very different fault handling logic from Boeing’s.
        http://stallwarning.wordpress.com/2013/04/19/adiru/
