Archives For Software
The 15 commandments of the god of the machine
Herewith are the 15 commandments for thine safety critical software, as spoken by the machine god unto his prophet Kopetz.
- Thou shalt regard the system safety case as thy tabernacle of safety and derive thine critical software failure modes and requirements from it.
- Thou shalt adopt a fundamentally safe architecture and define thy fault tolerance hypothesis as part of this, even unto the definition of fault containment regions (FCRs), their modes of failure and their likelihood.
- Thine fault tolerance shall include the start-up, operating and shutdown states.
- Thine system shall be partitioned to ‘divide and conquer’ the design. Yea, such partitioning shall include the precise specification of component interfaces by time and value, such that all manner of men shall comprehend them.
- Thine project team shall develop a consistent model of time and state, for even unto the concept of state and fault recovery by voting is the definition of time important.
- Yea, even though thou hast selected a safety architecture pleasing to the lord, yet it is but a house built upon the sand if no ‘programming in the small’ error detection and fault recovery is provided.
- Thou shalt ensure that errors are contained and do not propagate through the system, for an error idly propagated to a service interface is displeasing to the lord god of safety and invalidates thy righteous claims of independence (an interface guard is sketched after these commandments).
- Thou shalt ensure that independent channels and components do not have common mode failures, for it is said that homogeneous redundant channels protect only against random hardware failures, neither against common external causes such as EMI or power loss, nor against the common software design fault.
- Thine voting software shall follow the self-confidence principle, for it is said that if the self-confidence principle is observed then a correct FCR will always make the correct decision under the assumption of a single faulty FCR, and only a faulty FCR will make false decisions (a voter sketch follows these commandments).
- Thou shalt hide and separate thy fault-tolerance mechanisms so that they do not introduce fear, doubt and further design errors unto the developers of the application code.
- Thou shalt design thy system for diagnosis, for it is said that even a righteously designed fault tolerant system may hide faults from view, yet thy system’s maintainers must still find and replace the affected LRU.
- Thine interfaces shall be helpful and forgive the operator his errors; neither shall thine system dump the problem in the operator’s lap without prior warning of impending doom.
- Thine software shall record every single anomaly, for thy lord god requires that every anomaly observed during operation be investigated until a root cause is found.
- Thou shalt mitigate the further hazards introduced by thy design decisions, for better it is that thou not program in C++, yet still is it righteous to prevent the dangling of thine pointers and the leaking of thy memory (a RAII sketch follows these commandments).
- Thou shalt develop a consistent fault recovery strategy such that, even in the face of violations of thy fault hypothesis, thine system shall restart and never give up.
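To make the ‘programming in the small’ error detection and containment commandments a little more concrete, here’s a minimal sketch of a guard at a component’s service interface. It’s illustrative only: the altitude parameter, range limits and function name are my own inventions, not anything from Kopetz, and a real system would signal the failure to its fault recovery mechanism rather than just print.

```cpp
// interface_guard.cpp -- an illustrative 'programming in the small' guard:
// validate a value at the component's service interface so that an error
// is detected and contained within its own fault containment region
// instead of propagating to the components that consume the value.
#include <iostream>
#include <optional>

// Assumed interface contract: altitude in metres must lie within the
// sensor's physical range. The limits here are invented for the example.
constexpr double kMinAltitudeM = -500.0;
constexpr double kMaxAltitudeM = 20000.0;

std::optional<double> read_altitude(double raw)
{
    // Detect the error at the interface and refuse to pass it on.
    if (raw < kMinAltitudeM || raw > kMaxAltitudeM) {
        return std::nullopt;  // signal failure; trigger local fault recovery
    }
    return raw;
}

int main()
{
    for (double raw : {10500.0, 1.0e9 /* a corrupted value */}) {
        if (auto alt = read_altitude(raw)) {
            std::cout << "altitude ok: " << *alt << " m\n";
        } else {
            std::cout << "value rejected at interface, error contained\n";
        }
    }
}
```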
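And for the voting commandment, a minimal sketch of a 2-out-of-3 majority voter of the kind an FCR might run over its own and its peers’ results. It assumes exact agreement between channels (a real voter compares within an agreed tolerance on a consistent time base, hence the commandment on a common model of time), and the integer-valued channels are purely illustrative.

```cpp
// tmr_voter.cpp -- an illustrative 2-out-of-3 majority voter. Under the
// single-fault hypothesis a correct FCR voting over all three channel
// values (including its own) always finds a correct majority; only the
// faulty FCR can decide wrongly, which is the self-confidence principle.
#include <array>
#include <cstddef>
#include <iostream>
#include <optional>

// One value per FCR channel. Exact agreement is assumed here for
// simplicity; a real voter compares within an agreed tolerance.
std::optional<int> majority_vote(const std::array<int, 3>& channels)
{
    for (std::size_t i = 0; i < channels.size(); ++i) {
        std::size_t agree = 0;
        for (std::size_t j = 0; j < channels.size(); ++j) {
            if (channels[i] == channels[j]) ++agree;
        }
        if (agree >= 2) return channels[i];  // at least two channels agree
    }
    return std::nullopt;  // no majority: the fault hypothesis was violated
}

int main()
{
    // Channel 2 has suffered the (single) assumed fault.
    const std::array<int, 3> channels{42, 42, 17};
    if (auto value = majority_vote(channels)) {
        std::cout << "voted value: " << *value << '\n';  // prints 42
    } else {
        std::cout << "no majority: enter the fail-safe state\n";
    }
}
```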
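Finally, on mitigating the hazards of thy design decisions: if C++ it must be, then RAII and smart pointers are the conventional mitigation for dangling pointers and leaks. A minimal sketch, with an invented SensorReading type for illustration:

```cpp
// raii_sketch.cpp -- an illustrative use of RAII and std::unique_ptr to
// mitigate the dangling-pointer and memory-leak hazards that choosing
// C++ introduces. SensorReading is an invented type for the example.
#include <iostream>
#include <memory>
#include <utility>

struct SensorReading {
    double altitude_m;
};

int main()
{
    // unique_ptr owns the reading: the memory is released automatically
    // when the owner goes out of scope, even on early return or
    // exception, so there is no delete to forget and no leak.
    auto reading = std::make_unique<SensorReading>(SensorReading{10500.0});
    std::cout << "altitude: " << reading->altitude_m << " m\n";

    // Ownership transfer is explicit; after the move the original
    // handle is null rather than dangling.
    std::unique_ptr<SensorReading> owner = std::move(reading);
    if (!reading) {
        std::cout << "original handle emptied, not dangling\n";
    }
    std::cout << "owner still valid: " << owner->altitude_m << " m\n";
}
```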
To err is human, but to really screw it up takes a team of humans and computers…
How did a state of the art cruiser operated by one of the world’s superpowers end up shooting down an innocent passenger aircraft? To answer that question (at least in part) here’s a case study, part of the system safety course I teach, that looks at some of the causal factors in the incident.
In the immediate aftermath of this disaster there was a lot of reflection, and work done, on how humans and complex systems interact. However, one question has so far gone unasked: what if the crew of the USS Vincennes had just used the combat system as it was intended? What would have happened if they’d implemented a doctrinal ruleset that reflected the rules of engagement they were operating under and simply let the system do its job? After all, it was not the software that confused altitude with range on the display, or misused the IFF system, or was confused by track IDs being recycled… no, that was the crew.
Hannibal ante portas!
A recent article in Wired discloses how hospital drug pumps can be hacked and the firmware controlling them modified at will. Although in theory the comms module and motherboard should be separated by an air gap, in practice there’s a serial link cunningly installed to allow firmware to be updated via the interwebz.
As the Romans found, once you’ve built a road that a legion can march down it’s entirely possible for Hannibal and his elephants to march right up it. Thus proving once again, if proof be needed, that there’s nothing really new under the sun. In a similar vein we probably won’t see any real reform in this area until someone is actually killed or injured.
This has been another Internet of Things moment of zen.
A while ago, while I was working on a project that would have been based (in part) in Queensland, I was asked to look at the implications of the Registered Professional Engineers Queensland Act for the project, and in particular for software development. For those not familiar, the Act provides for the registration of professional engineers to practise in Queensland. If you’re not registered you can’t practise unless you’re supervised by a registered engineer. Upon registering you become liable to a statutory Board of Professional Engineers for your professional conduct. Oh yes, and practising without coverage is a crime.
While the Act is oriented squarely at the provision of professional services, don’t presume that it is solely the concern of consultancies.
Stall warning and Alternate law
This post is part of the Airbus aircraft family and system safety thread.
According to an investigator from Indonesia’s National Transportation Safety Committee (NTSC), several alarms, including the stall warning, could be heard going off on the Cockpit Voice Recorder’s tape.
Now why is that so significant?
I was cleaning out my (metaphorical) sock drawer and came across this rough guide to the workings of the Australian Defence standard on software safety, DEF (AUST) 5679. The guide was written around 2006 for Issue 1 of the standard, although many of the issues it discussed persisted into Issue 2, which hit the streets in 2008.
DEF (AUST) 5679 is an interesting standard: one can see that the authors, Tony Cant amongst them, put a lot of thought into the methodology behind it. Unfortunately, it has suffered from a failure to achieve large-scale adoption and use.
So here are my thoughts at the time on how to actually use the standard to best advantage. I also threw in some concepts on how to deal with xOTS components within the DEF (AUST) 5679 framework.
Enshrined in Australia’s current workplace health and safety legislation is the principle of ‘So Far As Is Reasonably Practicable’ (SFAIRP). In essence, SFAIRP requires you to eliminate risk, or to reduce it to a negligible level, so far as is (surprise) reasonably practicable. While there’s been a lot of commentary on the increased requirements for diligence (read industry moaning and groaning), there’s been little or no consideration of what ‘theory of risk’ backs this legislative principle and how it shapes the current legislation, let alone whether for good or ill. So I thought I’d take a stab at it. 🙂
The DEF STAN 00-55 monster is back!!
That’s right, moves are afoot to reboot the cancelled UK MOD standard for software safety, DEF STAN 00-55. See the UK SCSC’s Event Diary for an opportunity to meet and greet the writers. They’ll have the standard up for an initial look on-line sometime in July as well, so stay posted.