Archives For Uncategorized

Inertia has a thousand fathers…

Matthew Squair

Challenger flight 51-L crew

Vale Challenger

The anniversary of the loss of Challenger passed by on Thursday. In memoriam, I’ve updated my post that deals with the failure of communication that I think lies at the heart of that disaster.

The first principle is that you must not fool yourself, and you are the easiest person to fool.

Richard P. Feynman

Here’s a copy of the presentation that I gave at ASSC 2015 on how to use MIL-STD-882C to demonstrate compliance with the WHS Act 2011. The model Australian Workplace Health and Safety (WHS) Act places new and quite onerous requirements upon manufacturers, suppliers and end-user organisations, including the requirement to demonstrate due diligence in the discharge of individual and corporate responsibilities. Traditionally, contracts have steered clear of invoking WHS legislation in anything other than the most abstract form; unfortunately, such traditional approaches provide little evidence with which to demonstrate compliance with the WHS Act.

The presentation describes an approach to establishing compliance with the WHS Act (2011) using the combination of a contracted MIL-STD-882C system safety program and a compliance finding methodology. The advantages and effectiveness of this approach, in terms of establishing compliance with the Act and effectively discharging the responsibilities of both supplier and acquirer, are illustrated using a case study of a major aircraft modification program. The limitations of the approach are then discussed, given the significant difference between the decision-making criteria of classic system safety and the so far as is reasonably practicable principle.

More woes for OPM, and pause for thought for the proponents of centralised government data stores. If you build it, they will come… and steal it.

ASSC 2015 Brisbane

29/05/2015 — 3 Comments

  
Just attended the Australian System Safety Conference; the venue was the Customs House, right on the river. Lots of excellent speakers and interesting papers; I particularly enjoyed Drew Rae’s on tribalism in system safety. The keynotes on resilience by John Bergstrom and cyber-security by Chris Johnson were also very good. I gave a presentation on the use of MIL-STD-882 as a tool for demonstrating compliance with the WHS Act, a subject that only a mother could love. Favourite moment? Watching the attendees’ faces when I told them that 61508 didn’t comply with the law. :)

Thanks again to Kate Thomson and John Davies for reviewing the legal aspects of my paper. Much appreciated, guys.

Just added a short case study on the Patriot software timing error to the software safety micro-course page. Peter Ladkin has also subjected the accident to a Why-Because Analysis.

What the?

14/02/2015 — Leave a comment

In case you’re wondering what’s going on, dear reader: human factors can be a bit dry, and the occasional poster-style blog posts you may have noted are my attempt to hydrate the subject a little. The continuing series can be found on the page imaginatively titled Human error in pictures, and who knows, someone may find it useful…

An interesting little exposition of the current state of the practice in information risk management, using the metaphor of the bald tire, on the FAIR wiki. The authors observe that there’s much more shamanistic ritual (dressed up as ‘best practice’) than we’d like to think in risk assessment. A statement that I endorse; actually I think it’s mummery for the most part, but ahem, don’t tell the kids.

Their point is twofold. First, while experience and intuition are vital, on their own they give little purchase for critical examination. Second, if you want to manage you must measure, and to measure you need to define.

A disclaimer: I’m neither familiar with nor a proponent of the FAIR tool, and I strongly doubt whether we can ever put risk management onto a truly scientific footing; much like engineering, there’s more art than artifice. But it’s an interesting commentary nonetheless.

I give it 011 out of 101 tooled-up script kiddies.

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here's an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 32,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 12 sold-out performances for that many people to see it.

Click here to see the complete report.

Yep, that’s right: due to popular demand I’m running ZEIT 8236 System Safety as an intensive delivery mode course in the second session at ADFA, from the 13th to the 17th of July 2015. If you want a flavour, here’s the introductory module. Remember, I love this stuff. :)

EECON 2014

07/11/2014 — Leave a comment

So I’ve been invited to give a talk on risk at the conference dinner. Should be interesting.

An interesting article in Forbes on human error in a very unforgiving environment, i.e. treating Ebola patients, and an excellent use of basic statistics to prove that cumulative risk tends to do just that, accumulate. As the number of patients being treated in the West is pretty low at the moment, it also gives a good indication of just how infectious Ebola is. One might also infer that the western medical establishment is not quite so smart as it thought it was, at least when it comes to treating the big E safely.
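
If you want to see the accumulation at work, here’s a minimal sketch in Python; the 1% per-encounter breach probability is purely an illustrative assumption of mine, not a figure from the article:

# Probability of at least one breach across n independent encounters.
def cumulative_risk(p_per_encounter, n_encounters):
    return 1.0 - (1.0 - p_per_encounter) ** n_encounters

for n in (1, 10, 50, 100):
    print(f"{n:3d} encounters: {cumulative_risk(0.01, n):.1%}")
# 1 encounter ~1.0%; 100 encounters ~63.4%: small risks compound quickly.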

Of course, the moment of international zen in the whole story had to be the comment by the head of the CDC, Dr Frieden, that, and I quote, “clearly there was a breach in protocol”, a perfect example of affirming the consequent. As James Reason pointed out years ago, there are two ways of dealing with human error, blame the person or fix the system, so I guess we know where the head of the CDC stands on that question. :)

If you were wondering why the Outliers post was, ahem, a little rough, I accidentally posted an initial draft rather than the final version. I’ve now released the right one.


On Artificial Intelligence as Ethical Prosthesis

Out here in the grim meat-hook present of Reaper missions and Predator drone strikes, we’re already well down the track to a future in which decisions as to who lives and who dies are made less and less by human beings, and more and more by automation.

Continue Reading…

Just added a modified version of the venerable subjective 882 hazard risk matrix to my useful stuff page, in which I fix a few issues that have bugged me about that particular tool; see Risk and the Matrix for a fuller discussion of the problems with risk matrices.

For those of you with a strong interest in such things, I’ve translated the matrix into Cartesian coordinates, revised the risk zones and definitions to make the matrix ‘De Moivre theorem’ compliant (and a touch more conservative), added the AIAA’s combinatorial probability thresholds, introduced a calibration point and added the ALARP principle.

Who knows, maybe the US DoD will pick it up… but probably not. :)

MIL-STD-882 Hazard Risk Matrix (Modified).
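
For the curious, the gist of what ‘De Moivre compliant’ means can be sketched in a few lines of Python: classify each cell by the product of probability and severity, so that cells sitting on the same iso-risk contour always land in the same zone. The thresholds and values below are illustrative assumptions only, not those of the modified matrix:

# Toy iso-risk classification: zones follow risk = probability x severity,
# the property that makes a matrix internally consistent. All numbers
# here are illustrative, not the modified 882 matrix's actual values.
PROBABILITIES = [1e-6, 1e-4, 1e-2, 1e-1, 1.0]   # likelihood bands
SEVERITIES = [1e3, 1e4, 1e6, 1e7]               # notional loss per event

def risk_zone(p, severity):
    risk = p * severity          # expected loss on this cell
    if risk >= 1e4:
        return "HIGH"
    elif risk >= 1e1:
        return "MEDIUM"          # broadly the ALARP region
    return "LOW"

for s in reversed(SEVERITIES):
    print(f"{s:9.0e}: {[risk_zone(p, s) for p in PROBABILITIES]}")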

 

I’ve put the original Def Stan 00-55 (both parts) onto my resources page for those who are interested in doing a compare and contrast between the old and the new (whenever its RFC is released). I’ll be interested to see whether the standard’s reluctance to buy into the whole ‘safety by following a process’ argument is maintained in the next iteration. The problem of arguing from fault density to safety that they allude to also remains, I believe, insurmountable.

The justification of how the SRS development process is expected to deliver SRS of the required safety integrity level, mainly on the basis of the performance of the process on previous projects, is covered in 7.4 and annex E. However, in general the process used is a very weak predictor of the safety integrity level attained in a particular case, because of the variability from project to project. Instrumentation of the process to obtain repeatable data is difficult and enormously expensive, and capturing the important human factors aspects is still an active research area. Furthermore, even very high quality processes only predict the fault density of the software, and the problem of predicting safety integrity from fault density is insurmountable at the time of writing (unless it is possible to argue for zero faults).

Def Stan 00-55 Issue 2 Part 2 Cl. 7.3.1

Just as an aside, the original release of Def Stan 00-56 is also worth a look, as it contains the methodology for the assignment of safety integrity levels. Basically, for a single function, or for N>1 non-independent functions, the SIL assigned to the function(s) is derived from the worst credible accident severity (much like DO-178). In the case of N>1 independent functions, one of these functions gets a SIL based on severity, but the remainder have a SIL rating apportioned to them based on risk criteria. From which you can infer that the authors, just like the aviation community, were rather mistrustful of using estimates of probability in assuring a first line of defence. :)
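
As a rough sketch of that apportionment scheme (my own paraphrase under simplifying assumptions; the severity-to-SIL mapping and the one-level-down apportionment rule below are illustrative stand-ins, not the standard’s actual tables):

# Illustrative Def Stan 00-56 (Issue 1) style SIL apportionment.
SEVERITY_TO_SIL = {"catastrophic": 4, "critical": 3, "marginal": 2, "negligible": 1}

def assign_sils(severity, n_functions, independent):
    base = SEVERITY_TO_SIL[severity]   # SIL from worst credible accident severity
    if n_functions == 1 or not independent:
        # Single or non-independent functions all carry the severity-derived SIL.
        return [base] * n_functions
    # Independent functions: the first line of defence keeps the severity-derived
    # SIL; the rest are apportioned (here, naively, one level lower, standing in
    # for the standard's risk-criteria based apportionment).
    return [base] + [max(base - 1, 1)] * (n_functions - 1)

print(assign_sils("catastrophic", 3, independent=True))    # [4, 3, 3]
print(assign_sils("catastrophic", 3, independent=False))   # [4, 4, 4]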

Preamble

The following is a critique of a teleconference conducted on 16 March between the UK embassy in Japan and the UK Government’s senior scientific advisor and members of SAGE, a UK government crisis panel formed in the aftermath of the Japanese tsunami to advise on the Fukushima crisis. These comments pertain specifically to the 16 March (UK time) teleconference with the British embassy and to the minutes of the SAGE meetings on the 15th and 16th that preceded that teleconference. Continue Reading…

I’ve just reread Peter Ladkin’s 2008 dissection of the conceptual problems of IEC 61508 here, and having just worked through a recent project in which 61508 SILs were applied, I tend to agree that SIL is still a bad idea, done badly… I’d also add that, the HSE’s opinion notwithstanding, I don’t actually see that the a priori application of a risk-derived SIL level to a specific software development acquits one’s ‘so far as is reasonably practicable’ duty of care. Of course if your regulator says it does, why then, smile bravely and compliment him on the wonderful cut of his new clothes. On the other hand, if you’re designing the safety system for a nuclear plant, maybe have a look at how the aviation industry does business with its Design Assurance Levels. :)


The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 24,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 9 sold-out performances for that many people to see it.

Click here to see the complete report.

And I’ve just updated the philosophical principles for acquiring safety critical systems. All suggestions welcome…

Enjoy :)

Colossus: The Forbin Project (Image source: movie still)

Risk as uncontrollability…

The venerable safety standard MIL-STD-882 introduced the concept of software hazard and risk in Revision C of that standard. Rather than using the classical definition of risk as a combination of severity and likelihood, the authors struck off down quite a different, and interesting, path.

Continue Reading…
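
For those who haven’t met it, the flavour of the 882C approach can be sketched as follows: risk is indexed by how much control or authority the software exercises over the hazard, crossed with severity, with no likelihood term at all. The category wording and the index rule below are my paraphrase, illustrative rather than quoted from the standard:

# Toy illustration of MIL-STD-882C's software risk idea: control
# authority x severity, no likelihood. Wording and the index rule are
# paraphrased illustrations, not the standard's actual table.
SOFTWARE_CONTROL = {
    1: "software exercises autonomous control over the hazard",
    2: "software in control, but time exists for independent intervention",
    3: "software issues commands requiring human action to complete",
    4: "software has no control over hazardous hardware",
}
SEVERITY = {1: "catastrophic", 2: "critical", 3: "marginal", 4: "negligible"}

def sw_hazard_risk_index(control, severity):
    # Lower index = more critical, by analogy with 882C's Software Hazard
    # Risk Index; this simple sum stands in for the real lookup table.
    return control + severity - 1     # 1 (worst) .. 7 (least critical)

for c in (1, 2, 3, 4):
    print(f"{SEVERITY[1]} hazard, control category {c}: index {sw_hazard_risk_index(c, 1)}")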

In an earlier post I had a look at the role played by design authorities in an organisation, which can have a major effect upon both safety and project success. My focus in that post was on the authority aspect.

However, another perspective on the role that a design authority performs is that of someone who is able to understand both the operational requirements for a system (those that define a need) and the technical requirements (those that define a solution), and, most importantly, is able to translate between them.

This is a role that is well understood in architecture, but one that seems to have diminished and dwindled in engineering, where projects of any complexity are more often undertaken by large bureaucratic organisations, which also traditionally fear assigning responsibility to one person.

Provided as part of the QR show bag for the CORE 2012 conference. The irony of a detachable cab being completely unintentional…


For somebody. :)

787 Lithium Battery (Image Source: JTSB)

But, we tested it? Didn’t we?

Earlier reports on the initial development of the Boeing 787 lithium battery indicated that Boeing engineers had conducted tests to confirm that a single cell failure would not lead to a cascading thermal runaway amongst the remaining cells. According to these reports their tests were successful, so what went wrong?

Continue Reading…

Over on the RVS Bielefeld site Peter Ladkin has just put up a white paper entitled 61508 Weaknesses and Anomalies, which looks at the problems with the current version of the IEC 61508 functional safety standard, part 6 of which sits on my desk even as we speak. Comments are welcome.

For my own contributions to the commentary on IEC 61508 see Buncefield the alternate view, Component SIL rating memes and SILs and Safety Myths.

Dr Nancy Leveson will be teaching a week-long class on system safety this year at the Talaris Conference Center in Seattle from July 15-19.

Her focus will be on the new techniques and approaches described in her latest book Engineering a Safer World. Should be excellent.

See the class announcement for further details.

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 29,000 views in 2012. If each view were a film, this blog would power 7 film festivals.

Click here to see the complete report.

A Working Definition and a Key Question…

One of the truisms of system safety theory is that safety is an emergent attribute of the design. Looking at this in reverse, it also implies that system accidents are emergent, if unintended, attributes of a system. So when we talk about system accidents and hazards we’re really (if we accept the base definition) talking about an emergent attribute of a system.

But what are emergent attributes? Now that is a very interesting philosophical question, and answering it might help clarify what exactly is a system accident…

Just put the final, conference-ready draft of my Writing Specs for Fun and Profit up. Enjoy!

As readers may have noted, I’ve changed the name of my blog to Critical Uncertainties, which I think better represents what it’s all about; well, at least that’s my intent. :-)

I’m still debating whether to go the whole hog and register a domain name; it has its advantages and disadvantages. But if I do, don’t worry, my host WordPress.com will make sure that you don’t get lost on the way here.

P.S. I did think of a few alternative names. Some better than others…


Neil Armstrong died yesterday at the age of 82, and rather than celebrating his achievements as an astronaut, marvellous though they are, I’d like to pay tribute here to his work as an engineer and test pilot.

Before Apollo, Neil Armstrong was a test pilot flying the X-15 rocket plane for NACA and its successor NASA, and during his test flying he came up with what they ended up calling the Armstrong spiral. The manoeuvre was a descending glide spiral that tightened the turn radius as the glide speed reduced. Armstrong’s manoeuvre was so highly regarded that it was later adopted by the Space Shuttle program.
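
The aerodynamic logic is simple enough to show: in a coordinated turn the radius goes as v²/(g·tan(bank)), so holding a constant bank angle while the glide decelerates tightens the spiral all by itself. The speeds and bank angle below are illustrative, not X-15 figures:

import math

G = 9.81                        # gravitational acceleration, m/s^2
BANK = math.radians(30)         # constant bank angle held through the glide

for v in (200.0, 150.0, 100.0):             # decaying glide speed, m/s
    radius = v ** 2 / (G * math.tan(BANK))  # coordinated-turn radius
    print(f"v = {v:5.1f} m/s -> turn radius ~ {radius / 1000:.1f} km")
# 200 m/s gives ~7.1 km; 100 m/s gives ~1.8 km: the spiral tightens as speed bleeds off.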

Fast forward to 4 November 2010, and Richard De Crespigny, the captain of QF32, after experiencing a catastrophic engine failure and faced with the potential of a glide back to Changi, remembered and used the Armstrong approach in his plan for an engine-out approach.

To misquote Shakespeare, sometimes the good that men do is not interred with them.

How driver training problems for the M113 Armoured Personnel Carrier provide an insight into the ecology of interface design.

Continue Reading…
Tweedle Dum and Dee (Image source: Wikipedia Commons)
How do ya do and shake hands, shake hands, shake hands. How do ya do and shake hands and state your name and business…

Lewis Carroll, Through the Looking Glass

You would have thought, after the Leveson and Knight experiments, that the theory that independently written software would contain only independent faults was dead and buried, another beautiful theory shot down by cold hard fact. But unfortunately, like many great errors, the theory of N-versioning keeps on keeping on (1).
Continue Reading…
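
For a back-of-the-envelope feel for why the independence assumption matters (the numbers here are invented for illustration, not Knight and Leveson’s data):

# If two independently written versions really failed independently, the
# chance of both failing on the same input would be the product of their
# individual failure probabilities. Knight and Leveson observed coincident
# failures at rates far above that product; the x100 correlation factor
# below is purely illustrative.
p_a = 1e-4                       # failure probability of version A per input
p_b = 1e-4                       # failure probability of version B per input

assumed_common = p_a * p_b       # the N-version assumption: 1e-8
correlated_common = assumed_common * 100   # faults cluster on the same hard inputs

print(f"assumed common-failure probability:   {assumed_common:.0e}")
print(f"with correlated faults (x100 factor): {correlated_common:.0e}")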