Archives For Safety

The practice of safety engineering in various high consequence industries.

About time I hear you say!🙂

Yes I’ve just rewritten a post on functional failure taxonomies to include how to use them to gauge the completeness of your analysis. This came out of a question I was asked in a workshop that went something like, ‘Ok mr big-shot consultant tell us, exactly how do we validate that our analysis is complete?’. That’s actually a fair question, standards like EUROCONTROL’s SAM Handbook and ARP 4761 tell you you ought to, but are not that helpful in the how to do it department. Hence this post.

Using a taxonomy to determine the coverage of the analysis is one approach to determining completeness. The other is to perform at least two analyses using different techniques and then compare the overlap of hazards using a capture/recapture technique. If there’s a high degree of overlap you can be confident there’s only a small hidden population of hazards as yet unidentified. If there’s a very low overlap, you may have a problem.


The 15 commandments of the god  of the machine

Herewith, are the 15 commandments for thine safety critical software as spoken by the machine god unto his prophet Kopetz.

  1. Thou shalt regard the system safety case as thy tabernacle of safety and derive thine critical software failure modes and requirements from it.
  2. Thou shalt adopt a fundamentally safe architecture and define thy fault tolerance hypothesis as part of this. Even unto the definition of fault containment regions, their modes of failure and likelihood.
  3. Thine fault tolerance shall include start-up operating and shutdown states
  4. Thine system shall be partitioned to ‘divide and conquer’ the design. Yea such partitioning shall include the precise specification of component interfaces by time and value such that  all manner of men shall comprehend them
  5. Thine project team shall develop a consistent model of time and state for even unto the concept of states and fault recovery by voting is the definition of time important.
  6. Yea even though thou hast selected a safety architecture pleasing to the lord, yet it is but a house built upon the sand, if no ‘programming in the small’ error detection and fault recovery is provided.
  7. Thou shall ensure that errors are contained and do not propagate through the system for a error idly propagated  to a service interface is displeasing to the lord god of safety and invalidates your righteous claims of independence.
  8. Thou shall ensure independent channels and components do not have common mode failures for it is said that homogenous redundant channels protect only from random hardware failures  neither from the common external cause such as EMI or power loss, nor from the common software design fault.
  9. Thine voting software shall follow the self-confidence principle for it is said that if the self-confidence principle is observed then a correct FCR will always make the correct decision under the assumption of a single faulty FCR, and only a faulty FCR will make false decisions.
  10. Thou shall hide and separate thy fault-tolerance mechanisms so that they do not introduce fear, doubt and further design errors unto the developers of the application code.
  11. Thou shall design your system for diagnosis for it is said that even a righteously designed fault tolerant system my hide such faults from view whereas thy systems maintainers must replace the affected LRU.
  12. Thine interfaces shall be helpful and forgive the operator his errors neither shall thine system dump the problem in the operators lap without prior warning of impending doom.
  13. Thine software shall record every single anomaly for your lord god requires that every anomaly observed during operation must be investigated until a root cause is defined
  14. Though shall mitigate further hazards introduced by your design decisions for better it is that you not program in C++ yet still is it righteous to prevent the dangling of thine pointers and memory leaks
  15. Though shall develop a consistent fault recovery strategy such that even in the face of violations of your fault hypothesis thine system shall restart and never give up.

MH370 underwater search area map (Image source- Australian Govt)

After millions of dollars and years of effort the ATSB has suspended it’s search for the wreck of MH370. There’s some bureaucratic weasel words, but we are done people. Of course had the ATSB applied Bayesian search techniques, as the USN did in the successful search for it’s missing  USS Scorpion, we might actually know where it is.

M1 Risk_Spectrum_redux

A short article on (you guessed it) risk, uncertainty and unpleasant surprises for the 25th Anniversary issue of the UK SCS Club’s Newsletter, in which I introduce a unified theory of risk management that brings together aleatory, epistemic and ontological risk management and formalises the Rumsfeld four quadrant risk model which I’ve used for a while as a teaching aid.

My thanks once again to Felix Redmill for the opportunity to contribute.  :)

Joshua Brown screen grab

Keep your eyes on the road, and your hands upon the wheel…

The first fatality involving the use of Tesla’s autopilot* occurred last May. The Guardian reported that the autopilot sensors on the Model S failed to distinguish a white tractor-trailer crossing the highway against a bright sky and promptly tried to drive under the trailer, with decapitating results. What’s emerged is that the driver had a history of driving at speed and also of using the automation beyond the maker’s intent, e.g. operating the vehicle hands off rather than hands on, as the screen grab above indicates. Indeed recent reports indicate that immediately prior to the accident he was travelling fast (maybe too fast) whilst watching a Harry Potter DVD. There also appears to be a community of like minded spirits out there who are intent on seeing how far they can push the automation… sigh.  Continue Reading…

System Safety Fundamentals Concept Cloud

Have just updated the safety case module for the system safety course I teach at UNSW. Have revised it to include John Rushby’s approach to determining the soundness and strength of a safety argument (I like his simplification and separation of concerns strategy). Enjoy!


Why writing a safety case might (actually) be a good idea

Frequent readers of my blog would probably realise that I’m a little sceptical of safety cases, as Scrooge remarked to Morely’s ghost, “There’s more of gravy than of grave about you, whatever you are!” So to for safety cases, oft more gravy than gravitas about them in my opinion, regardless of what their proponents might think.

Continue Reading…