DO-178B and the B-777 9M-MRG Incident
In August 2005 a Boeing 777 experienced an in-flight upset caused by the aircraft’s Air Data Inertial Reference Unit (ADIRU) generating erroneous acceleration data. The software fault that caused this upset raises questions in turn about the DO-178 software development process. A subsequent investigation of the incident by the Australian Transport Safety Bureau (ATSB) identified that the following had occurred:
- accelerometer #5 failed on the first of June in a false high value output mode,
- the ADIRU excluded accelerometer #5 from use in its computations,
- the ADIRU remained in service with this failed component (1),
- power to the ADIRU was cycled (causing a system reset),
- accelerometer #6 then failed in-flight,
- accelerometer #6 was excluded from use by the ADIRU,
- the ADIRU then re-admitted accelerometer #5 into its computations, and
- erroneous acceleration values were output to the flight computer.
A more severe event was prevented by the inclusion of Mid Value Select (MVS) voting in the primary flight computer, which compared the Standby Air-data Attitude Reference Unit (SAARU) acceleration values to those being generated by the ADIRU (2).
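As a rough illustration of why mid-value voting helps, here is a minimal sketch of a generic mid-value select over three redundant signals. It is not the flight computer's actual algorithm: the function name `mid_value_select`, the choice of three inputs and the sample values are all assumptions made for the example.

```python
def mid_value_select(a: float, b: float, c: float) -> float:
    """Return the middle of three redundant signal values."""
    return sorted((a, b, c))[1]


# Hypothetical sample values: two sources agree on roughly 1 g,
# while a third reports a false high acceleration.
healthy_1, healthy_2, faulty = 9.81, 9.79, 35.0

selected = mid_value_select(healthy_1, faulty, healthy_2)
print(selected)  # 9.81
```

Because the voter always picks the middle value, a single wild input is never passed straight through; the selected value is bounded by the two remaining sources.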
Why did it happen?
While the status of the failed unit was recorded in the on-board maintenance computer memory, that memory was not checked by the ADIRU Operational Program Software (OPS) during the start-up initialisation sequence. The ADIRU software then treated the accelerometer fault as a transient event and allowed the accelerometer to be reinstated. The ATSB determined that a software ‘anomaly’ (but let’s call it what it was, a latent software fault) existed in the ADIRU OPS that allowed the re-entry of accelerometer #5 into the channel. Previous versions of the software had masked this latent fault, but software changes in a later release exposed it, allowing the above scenario to occur. The ATSB went on to state that:
The certification of the ADIRU operational program software (OPS) was dependent on it being tested against the requirements specified in the initial design. The conditions involved in this event were not identified in the testing requirements, so were not tested.
From the ATSB’s perspective at least, the root cause of the software fault was an incompleteness in the requirements specification, which left the OPS with an incorrect model of the system at startup. Safety engineers have frequently pointed out that inadequate specifications are the primary cause of safety-related software problems; see for example the work of Leveson (1995) and Lutz (1993), which identifies incompleteness in software specifications as a major source of these problems. Leveson posits that these incompleteness errors are related to fundamental human cognitive biases (Leveson 2000).
But what about DO-178B?
The ADIRU OPS had previously been certified as meeting the requirements of DO-178B, the de facto software certification standard for aviation software. So, if the software had been developed to DO-178, why was this fault still present? To answer this question one needs to understand that DO-178 is a process standard. Simplistically, one formulates high-level requirements at one end, then follows the DO-178 process of deriving more and more detailed requirements and verifying them religiously, until finally out pops a verified software product at the other end. If you follow the process, your software is deemed to be safe. As a process standard, DO-178 was not intended to incorporate, nor does it incorporate, any lessons learned from previous accidents or incidents involving software. So unlike other engineering standards, which embody the remembrance of past mistakes, DO-178 is ‘memory-less’.
As a result there is nothing in DO-178 that prevents a software development team from recapitulating the sins of the past, or in this case, failing to ensure that the internal OPS model of the accelerometer state was updated to reflect the actual accelerometer operational state at startup. Had such a requirement been levied the ADIRU software would have been re-certified as meeting it at each new release and it’s reasonable to expect that the ‘masked’ software fault would have been detected when the masking functionality was removed.
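To make the missing requirement concrete, here is a minimal sketch of it as an executable check against a toy model. Everything in it is hypothetical: `ToyAdiru`, the `restore_at_startup` switch and `exclusion_survives_reset` are invented for illustration and say nothing about how the real OPS is structured. The toy also collapses the event sequence: here the reset alone forgets the exclusion, whereas in the incident accelerometer #5 was re-admitted only after accelerometer #6 later failed.

```python
from typing import Set


class ToyAdiru:
    """Hypothetical, heavily simplified model. Not the real ADIRU OPS."""

    def __init__(self, maintenance_memory: Set[int], restore_at_startup: bool):
        # Non-volatile fault record; survives power cycles.
        self.maintenance_memory = maintenance_memory
        # Assumed switch: does start-up consult the fault record?
        self.restore_at_startup = restore_at_startup
        self.excluded: Set[int] = set()
        self.power_cycle()

    def power_cycle(self) -> None:
        """Reset the volatile exclusion list, as a system reset would."""
        if self.restore_at_startup:
            self.excluded = set(self.maintenance_memory)
        else:
            self.excluded = set()

    def report_fault(self, accelerometer: int) -> None:
        """Exclude a failed accelerometer and log it persistently."""
        self.excluded.add(accelerometer)
        self.maintenance_memory.add(accelerometer)

    def in_use(self, accelerometer: int) -> bool:
        return accelerometer not in self.excluded


def exclusion_survives_reset(restore_at_startup: bool) -> bool:
    """Requirement check: a logged fault must still be excluded after reset."""
    adiru = ToyAdiru(set(), restore_at_startup)
    adiru.report_fault(5)
    adiru.power_cycle()
    return not adiru.in_use(5)


print(exclusion_survives_reset(restore_at_startup=False))  # False
print(exclusion_survives_reset(restore_at_startup=True))   # True
```

A check of this kind, rerun at every release, is the sort of regression test that would be expected to fail once whatever was masking the fault had been removed.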
As it happens the missing requirement identified above was derived directly from a set of completeness criteria developed to identify missing, incorrect or ambiguous requirements for process control software (Leveson 1995). These criteria have been part of the published literature for more than a decade, yet the question of software requirements completeness has still not been meaningfully addressed by the current crop of software safety standards, including of course DO-178B. So where does that leave us? Clearly DO-178 has not delivered a safe software product in this instance, yet this failure seems not to have triggered any alarm bells with either regulators or standardisation bodies.
Unsettling, isn’t it?
Notes
1. The ADIRU was designed with fault containment areas (FCAs) that allowed continued operation with an unserviceable fault containment module (FCM) in any of the FCAs. This allowed operators to defer maintenance until the number of serviceable FCMs in any single area was less than that specified by the component manufacturer.
2. As an aside, we should definitely thank the Boeing engineers who decided to leave the MVS algorithm in place, despite theoretical analyses indicating that such erroneous ADIRU outputs could not occur.
References
1. ATSB Transport Safety Report, Aviation Occurrence Investigation AO-200503722, Final Report: In-flight upset event, 240 km north-west of Perth, WA, Boeing Company 777-200, 9M-MRG, 1 August 2005.
2. Leveson, N.G., Safeware: System Safety and Computers, Addison-Wesley Pub., 1995.
3. Leveson, N.G., Completeness in Formal Specification Language Design for Process-Control Systems, Proc. of the ACM SIGSOFT 3rd Workshop on Formal Methods in Software Practice, pp 75–87, 2000.
4. Lutz, R.R., Analysing software requirements errors in safety-critical, embedded systems, Proc. of the IEEE International Symposium on Requirements Engineering, pp 35–46, 1993.