### Archives For software safety

A short tutorial on the architectural principles of integrity level partitioning,  I wrote this a while ago, but the fundamentals remain the same. Partitioning is a very powerful design technique but if you apply it you also need to be aware that it can interact with all sorts of other system design attributes, like scheduling and fault tolerance to name but two.

The material is drawn from may different sources, which unfortunately at the time I didn’t reference, so all I can do is offer a general acknowledgement here. You can also find a more permanent link to the tutorial on my publications page.

I was cleaning out my (metaphorical) sock drawer and came across this rough guide to the workings of the Australian Defence standard on software safety DEF(AUST) 5679. The guide was written around 2006 for Issue 1 of the standard, although many of the issues it discussed persisted into Issue 2, which hit the streets in 2008.

DEF (AUST) 5679 is an interesting standard, one can see that the authors, Tony Cant amongst them, put a lot of thought into the methodology behind the standard, unfortunately it’s suffered from a failure to achieve large scale adoption and usage.

So here’s my thoughts at the time on how to actually use the standard to best advantage, I also threw in some concepts on how to deal with xOTS components within the DEF (AUST) 5679 framework.

Enjoy :)

For those interested, here’s a draft of the ‘Fundamentals of system safety‘ module from a course that I teach on system safety. Of course if you want the full effect, you’ll just have to come along. :)

From Les Hatton, here’s how, in four easy steps:

1. Insist on using R = F x C in your assessment. This will panic HR (People go into HR to avoid nasty things like multiplication.)
2. Put “end of universe” as risk number 1 (Rationale: R = F x C. Since the end of the universe has an infinite consequence C, then no matter how small the frequency F, the Risk is also infinite)
3. Ignore all other risks as insignificant
4. Wait for call from HR…

A humorous note, amongst many, in an excellent presentation on the fell effect that bureaucracies can have upon the development of safety critical systems. I would add my own small corollary that when you see warning notes on microwaves and hot water services the risk assessment lunatics have taken over the asylum…

The QF 72 accident illustrates the significant effects that ‘small field’ decisions can have on overall system safety Continue Reading…

How do ya do and shake hands, shake hands, shake hands. How do ya do and shake hands and state your name and business…

Lewis Carrol, Through the Looking Glass

You would have thought after the Leveson and Knight experiments that the  theory that independently written software would only contain independent faults was dead and buried, another beautiful theory shot down by hard cold fact.  But unfortunately like many great errors the theory of n-versioning keeps on keeping on (1).

I was cleaning up some of my reference material and came across a copy of the ESA board of investigation report into the Ariane 501 accident. I’ve added my own personal observations, as well as those of other commentators, to the report. Continue Reading…

Recent incidents involving Airbus aircraft have again focused attention on their approach to cockpit automation and it’s interaction with the crew.

Underlying the current debate is perhaps a general view that the automation should somehow be ‘perfect’, and that failure of automation is also a form of moral failing (1). While this weltanschauung undoubtedly serves certain social and psychological needs the debate it engenders doesn’t really further productive discussion on what could or indeed should be done to improve cockpit automation. So let’s take a closer look at the Airbus protection laws implemented in the flight control automation and compare it with how experienced aircrew actually make decisions in the cockpit.

Authors Note. Below is my original post on the potential causes of the AF 447 cabin altitude advisory, I concluded that there were a number of potential causes one of which could be an erroneous altitude input from the ADIRU. What I didn’t consider was that the altitude advisory could have been triggered by correct operation of the cabin pressure control system, see  The AF 447 cabin vertical speed advisory and Pt II for more on this.

The last ACARS transmision received from AF 447 was the ECAM advisory that the cabin altitude (pressure) variation had exceeded 1,800 ft/min for greater than 5 seconds. While some commentators have taken this message to indicate that the aircraft had suffered a catastrophic structural failure, all we really know is that at that point there was a rapid change in reported cabin altitude. Given the strong indications of unreliable air data from other on-board systems, perhaps it’s worthwhile having a look for other potential causes of such rapid cabin pressure changes.

The DO-178 Software Development Standard and the B-777 9M-MRG Incident

In August 2005 a Boeing 777 experienced an in-flight upset caused by the aircraft’s Air Data Inertial Reference Unit (ADIRU), generating erroneous acceleration data. The software fault that caused this upset points to flaws in the DO-178 software development process. Continue Reading…

The statement by, AirBus regarding the robustness of the AirBus AOA voting logic disclosed in the ATSB QF72 accident report raises some interesting questions as to what was actually meant by the term robustness.