Archives For Uncategorized

Imperial College London has just updated its report on non-pharmaceutical interventions to reduce death rates and prevent the health care system being overwhelmed. The news is not good.

They first modelled traditional mitigation strategies that seek to slow but not stop the spread, i.e. to 'flatten the curve', for Great Britain and the United States. For an unmitigated epidemic they found you'd end up with 510,000 deaths in Great Britain, and that's before accounting for the effect on mortality of health systems being overwhelmed. Even with a fully optimised set of mitigations in place they found that peak critical care demand would only be reduced by two-thirds and deaths halved. This 'optimal' scenario would still result in a peak demand on critical care beds eight times the available capacity.

Their conclusion? That epidemic suppression is the only viable strategy at the current time. This has profound implications for Australia, which still appears to be on a mitigation path. Even if we do our very best, mitigation reduces the death rate by at best 50%, which still translates to something on the order of 100,000 deaths. To put that in context, that's more than Australia lost in two world wars. The associated number of critically ill patients would also undoubtedly overwhelm our critical care system.

The only viable alternative Imperial College identified was to suppress the epidemic, i.e. to reduce R (the reproduction number) to close to 1 or below. Doing so would need a combination of strict case isolation, population-level social distancing, household quarantine and school and university closures, kept in place for at least five months. Having suppressed the epidemic, a combination of rigorous case isolation and contact tracing would (hopefully) then be able to deal with subsequent outbreaks.

However Australia is not doing this. The Prime Minister has made it very plain that he's not going to 'turn Australia off and then back on again'. We also seem to be underestimating the numbers (see this Guardian article on NSW Health's estimates). So, in the absence of the State Governments breaking ranks, we are now on an express train ride (see the chart below) to a national disaster of epic proportions. Jesus.

[Chart: NSWCOVID19]

 

Farewell

22/02/2019

This will be the last post on this website, so if you want to grab any of the media available under Useful Stuff, feel free.

Facebook and Google back Labor changes to laws which break encryption

A debate on tools, assurance and ethics

If you want to know where Crew Resource Management as a discipline started, then you need to read NASA Technical Memorandum 78482, or "A Simulator Study of the Interaction of Pilot Workload With Errors, Vigilance, and Decisions" by H.P. Ruffell Smith, the British-born physician and pilot. Before this study it was hours in the seat and line seniority that mattered when things went to hell. After it the aviation industry started to realise that crews rose or fell on the basis of how well they worked together, and that a good captain got the best out of his team. Today, whether crews get it right, as they did on QF72, or terribly wrong, as they did on AF447, the lens that we view their performance through has been irrevocably shaped by the work of Ruffell Smith. From little seeds great oaks grow indeed.

Update to the MH-370 hidden lesson post just published, in which I go into a little more detail on what I think could be done to prevent another such tragedy.

Talking to one another not intuitive for engineers…

Bath Iron Works Corporation Report 1995

One of the things that they don't teach you at university is that as an engineer you will never have enough time. There's never the time in the schedule to execute that perfect design process in your head, the one that will answer all questions, satisfy all stakeholders and optimise your solution to three decimal places. Worse yet, you're going to be asked to commit to parts of your solution in detail before you've finished the overall design because, for example, 'we need to order the steel bill now because there's a 6 month lead time, so where's the steel bill?'. Then there's dealing with the 'internal stakeholders' in the design process, who all have competing needs and agendas. You generally end up with the electrical team hating the mechanicals, nobody talking to structure and everybody hating manufacturing (1).

So good engineering managers (2) spend a lot of their time managing the risk of early design commitment and the inherent concurrency of the program, disentangling design snarls and adjudicating turf wars over scarce resources (3). Get it right and you're only late by the usual amount; get it wrong and very bad things can happen. Strangely you'll not find a lot of guidance in traditional engineering education on these issues, but for my part what I've found helpful is a pragmatic design process that actually supports you in doing the tough stuff (4). Oh, and being able to manage outsourcing of design would also be great (5). This all gets even more difficult when you're trying to do vehicle design, which I liken to trying to stick 5 litres of stuff into a 4 litre container. So at the link below is my architecting approach to managing at least part of the insanity. I doubt that there'll ever be a perfect answer to this, there are far too many constraints, competing agendas and just plain cussedness of human beings. But if your last project was anarchy dialled up to eleven it might be worth considering some of these possible approaches. Hope it helps, and good luck!

Notes

1. It is a truth universally acknowledged that engineers are notoriously bad at communicating.

2. We don’t talk about the bad.

3. Such as cable and piping routes, whose sensor goes at the top of the mast, mass budgets and power constraints. I'm sure we've all been there.

4. My observation is that (some) engineers tend to design their processes to be perfect and conveniently ignore the ugly messiness of the world, because they are uncomfortable with being accountable for decisions made under uncertainty. Of course when you can't both follow these processes and get the job done, these same engineers will use this as a shield from all blame, e.g. 'If you'd only followed our process…' say they; 'sure…' say I.

5. A traditional management ploy to reduce costs, but rarely does management consider that you then need to manage that outsourced effort, which takes a particular mix of skills. Yet another kettle of dead fish, as Boeing found out on the B787.

Reference

1.  A Vehicle design process v1.0

So here’s a question for the safety engineers at Airbus. Why display unreliable airspeed data if it truly is that unreliable?

In slightly longer form: if (for example) air data is so unreliable that your automation needs to automatically drop out of its primary mode, and your QRH procedure is then to manually fly pitch and thrust (1), then why not also automatically present a display page that provides only the data pilots can trust and need in order to execute the QRH procedure (2)? Not doing so smacks of 'awkward automation', where the engineers automate the easy tasks but leave the hard tasks to the human, usually with comments in the flight manual to the effect that, "as it's way too difficult to cover all failure scenarios in the software, it's over to you, brave aviator" (3). This response is however something of a cop-out, as what is needed is not a canned response to such events but rather a flexible decision and situational awareness (SA) toolset that can assist the aircrew in responding to unprecedented events (see for example both QF72 and AF447), events that inherently demand sense-making as a precursor to decision making (4). Some suggestions follow:

  1. Redesign the attitude display with articulated pitch ladders, or a Malcolm's horizon, to improve situational awareness.
  2. Provide a fallback AoA source using an AoA estimator (a rough sketch of the idea follows this list).
  3. Provide direct access to flight data parameters such as mach number and AoA to support troubleshooting (5).
  4. Provide an ability to 'turn off' coupling within calculated air data to allow rougher but more robust processing to continue.
  5. Use non-Aristotelian logic to better model the trustworthiness of air data.
  6. Provide the current master/slave hierarchy status amongst voting channels to aircrew.
  7. Provide an obvious and intuitive way to remove a faulted channel, allowing flight under reversionary laws (7).
  8. Inform aircrew of the specific protection mode activation and the reasons (i.e. flight data) triggering that activation (8).
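
On suggestion 2, the crudest form of AoA estimator is the kinematic relation that, in the longitudinal plane and in still air, angle of attack is roughly pitch attitude minus flight path angle. The sketch below is purely illustrative (it is nobody's actual implementation); a real estimator would blend inertial and air data through a filter and correct for wind, bank and sideslip.

    # Crude kinematic AoA estimate: alpha ~= theta - gamma (longitudinal, wings level).
    # Illustration only, not any manufacturer's implementation. A real estimator would
    # filter inertial and air data and account for wind, bank and sideslip.
    import math

    def estimate_aoa_deg(pitch_deg, vertical_speed_mps, ground_speed_mps):
        # Inertial flight path angle from vertical speed and ground speed
        gamma_rad = math.atan2(vertical_speed_mps, ground_speed_mps)
        return pitch_deg - math.degrees(gamma_rad)

    # Example: 5 degrees nose up, climbing at 2.5 m/s with 230 m/s ground speed
    print(f"Estimated AoA: {estimate_aoa_deg(5.0, 2.5, 230.0):.1f} degrees")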

As aviation systems get deeper and more complex this need to support aircrew in such events will not diminish; in fact it is likely to increase, if the past history of automation is any guide to the future.

Notes

1. The BEA report on the AF447 disaster surveyed Airbus pilots for their response to unreliable airspeed and found that in most cases aircrew, rather sensibly, put their hands in their laps, as the aircraft was already in a safe state, and waited for the icing-induced condition to clear.

2. Although the Airbus Back Up Speed Display (BUSS) does use angle-of-attack data to provide a speed range, and GPS height data to replace barometric altitude, it has problems at high altitude, where mach number rather than airspeed becomes significant and the stall threshold changes with mach number (which the BUSS does not know). As a result its use is (as per the Airbus manuals) restricted to below FL250.

3. What system designers do, in the abstract, is decompose and allocate system level behaviours to system components. Of course once you do that you then need to ensure that the component can do the job, and has the necessary support. Except, apparently, if the component in question is a human and is therefore considered to be 'outside' your system.

4. Another way of looking at the problem is that the automation is the other crew member in the cockpit. Such tools allow the human and automation to ‘discuss’ the emerging situation in a meaningful (and low bandwidth) way so as to develop a shared understanding of the situation (6).

5. For example, in the Airbus design, although AoA and mach number are calculated by the ADR and transmitted to the PRIM fourteen times a second, they are not directly available to aircrew.

6. Yet another way of looking at the problem is that the principles of ecological design need to be applied to the aircrew task of dealing with contingency situations.

7. For example, in the Airbus design the current procedure is to reach up above the Captain's side of the overhead instrument panel and deselect two ADRs… which two, and the criteria for choosing them, are not however detailed by the manufacturer.

8. As the QF72 accident showed, where erroneous flight data triggers a protection law it is important to indicate what the flight protection laws are responding to.

One of the perennial problems we face in a system safety program is how to come up with a convincing proof of the proposition that a system is safe. Because it's hard to prove a negative (in this case the absence of future accidents) the usual approach is to pursue a proof by contradiction: develop the negative proposition that the system is unsafe, then show that this is not true, normally by showing that the set of identified specific propositions of 'un-safety' have been eliminated or controlled to an acceptable level. Enter the term 'hazard', which in this context is simply shorthand for a specific proposition about the unsafeness of a system. Interestingly, when we parse the set of definitions of hazard we find the recurring use of terms like 'condition', 'state', 'situation' and 'event' that, should they occur, will inevitably lead to an 'accident' or 'mishap'. So broadly speaking a hazard is an explanation, based on a defined set of phenomena, which argues that if these are present, and given there exists some relevant domain source (1) of hazard, an accident will occur. All of which seems to indicate that hazards belong to a class of explanatory models called covering laws. Covering law models were developed by the philosophers Hempel and Popper because of what they saw as problems with an over-reliance on inductive arguments as to causality.

As a covering law explanation of unsafeness a hazard posits phenomenological facts (system states, human errors, hardware/software failures and so on) that confer what's called nomic expectability on the accident (the thing being explained). That is, the phenomenological facts, combined with some covering law (natural or logical), require the accident to happen, and this is what we call a hazard. We can see an archetypal example in Swallom's Source-Mechanism-Outcome model, i.e. if we have both a source and a set of mechanisms in that model then we may expect an accident (Ericson 2005). While logical positivism had the last nails driven into its coffin by Kuhn and others in the 1960s, and it's true that covering law explanations have their fair share of problems, so too do the other methods (2). The one advantage that covering law models do possess over other explanatory models is that they largely avoid the problems of causal arguments, which may well be why they persist in engineering arguments about safety.
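
For reference, Hempel's deductive-nomological schema can be written as below; mapping the hazard terminology onto it is my gloss on the above, not something drawn from Ericson.

\[
\{C_1,\ldots,C_k\} \cup \{L_1,\ldots,L_r\} \;\vdash\; E
\]

where the C_i are the antecedent conditions (the hazard's phenomena), the L_j are the covering laws (the domain 'source' of note 1), and E is the explanandum (the accident or mishap). Nomic expectability is then just the claim that, given the C's and the L's, E was to be expected.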

Notes

1. The source in this instance is the ‘covering law’.

2. Such as counterfactual, statistical relevance or causal explanations.

References

Ericson, C.A. Hazard Analysis Techniques for System Safety, page 93, John Wiley and Sons, Hoboken, New Jersey, 2005.

Here’s a working draft of the introduction and first chapter of my book…. Enjoy 🙂

We are hectored on an almost daily basis about the imminent threat of Islamic extremism and how we must respond firmly to this real and present danger. Indeed we have proceeded far enough up the escalation-of-response ladder that this presumably existential threat is now being used to justify talk of internment without trial. So what is the probability that, if you were murdered, the murderer would be an immigrant terrorist?

In NSW in 2014 there were 86 homicides, of which 1 was directly related to the act of a homegrown Islamist terrorist (1). So there's a 1 in 86 chance that, if you were murdered in that year, it was at the hands of a mentally disturbed asylum seeker (2). Hmm, sounds risky, but is it? Well, there were approximately 2.5 million people in NSW in 2014, so the likelihood of being murdered (in that year) is in the first instance 3.44e-5. To figure out the likelihood of being murdered and that murder being committed by a terrorist, we just multiply this base rate by the probability that it was at the hands of a 'terrorist', ending up with 4e-7, or 4 chances in 10 million, in that year. If we consider subsequent and prior years, in which nothing happened, that likelihood becomes even smaller.
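
As a minimal sketch of the arithmetic, using the figures quoted above:

    # Back-of-envelope calculation using the figures quoted in the post
    homicides = 86            # NSW homicides in 2014 (as quoted above)
    terrorist_homicides = 1   # the Lindt cafe siege death attributed to the 'terrorist'
    population = 2.5e6        # the population figure used in the post

    p_murdered = homicides / population                              # ~3.44e-5
    p_terrorist_given_murdered = terrorist_homicides / homicides     # 1/86
    p_murdered_by_terrorist = p_murdered * p_terrorist_given_murdered
    print(f"P(murdered by a terrorist that year) = {p_murdered_by_terrorist:.0e}")  # ~4e-7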

Based on this 4 in 10 million chance the NSW government intends to build a super-max 2 prison in NSW, and fill it with ‘terrorists’ while the Federal government enacts more anti-terrorism laws that take us down the road to the surveillance state, if we’re not already there yet. The glaring difference between the perception of risk and the actuality is one that politicians and commentators alike seem oblivious to (3).

Notes

1. One death during the Lindt Café siege could be directly attributed to the 'terrorist'.

2. Asylum was sought and granted in 2001, under the then Liberal-National Coalition government.

3. An action that also ignores the role of prisons in converting inmates to Islam as a route to recruiting their criminal, anti-social and violent sub-populations in the service of Sunni extremists.

The Sydney Morning Herald published an article this morning recounting the QF72 midair accident from the point of view of the crew and passengers; you can find the story at this link. I've previously covered the technical aspects of the accident here, the underlying integrative architecture program that brought us to this point here, and the consequences here. So it was interesting to reflect on the event from the human perspective. Karl Weick points out in his influential paper on the Mann Gulch fire disaster that small organisations, for example the crew of an airliner, are vulnerable to what he termed a cosmology episode, that is, a moment in which one abruptly and deeply feels that the universe is no longer a rational, orderly system. In the case of QF72 this was initiated by the simultaneous stall and overspeed warnings, followed by the abrupt pitch-over of the aircraft as the flight protection laws engaged for no apparent reason.

Weick further posits that what makes such an episode so shattering is that both the sense of what is occurring and the means to rebuild that sense collapse together. In the Mann Gulch blaze the fire team's organisation attenuated and finally broke down as the situation eroded, until at the end they could not comprehend the one action that would have saved their lives: to build an escape fire. Aircrew implicitly rely on the aircraft's systems to 'make sense' of the situation; a significant failure such as occurred on QF72 denies them both an understanding of what is happening and the ability to rebuild that understanding. Weick also noted that in such crises organisations are important, as they help provide people with order and meaning in ill-defined and uncertain circumstances, which has interesting implications when we look at the automation in the cockpit as another member of the team.

“The plane is not communicating with me. It’s in meltdown. The systems are all vying for attention but they are not telling me anything…It’s high-risk and I don’t know what’s going to happen.”

Capt. Kevin Sullivan (QF72 flight)

From this Weickian viewpoint we see the aircraft's automation as both part of the situation ('what is happening?') and as a member of the crew ('why is it doing that, can I trust it?'). Thus the crew of QF72 were faced with both a vu jàdé moment and the allied disintegration of the human-machine partnership that could have helped them make sense of the situation. The challenge the QF72 crew faced was not to form a decision based on clear data and well-rehearsed procedures from the flight manual; instead they faced a much more unnerving loss of meaning as the situation outstripped their past experience.

“Damn it! We're going to crash. It can't be true!” (copilot #1)

“But, what's happening?” (copilot #2)

AF447 CVR transcript (final words)

Nor was this an isolated incident. One study of other such 'unreliable airspeed' events found that errors in understanding were both far more likely to occur than other error types and, when they did occur, far more likely to end in a fatal accident. In fact the study found that all accidents with a fatal outcome involved an error of detection or understanding, with the majority being errors of understanding. From Weick's perspective, then, the collapse of sensemaking is the knock-out blow in such scenarios, as the last words of the Air France AF447 crew so grimly illustrate. Luckily, in the case of QF72 the aircrew were able to contain this collapse and rebuild their sense of the situation; in other such failures, such as AF447, the crew were not.

 

For those of you who might be wondering at the lack of recent posts I’m a little pre-occupied at the moment as I’m writing a book. Hope to have a first draft ready in July. ; )

A recent workplace health and safety case in Australia has emphasised that an employer does not have to provide training for tasks that are considered to be 'relatively' straightforward. The presiding judge also found that while changes to the workplace could in theory be made, in practice it would be unreasonable to demand that the employer make such changes. The judge's decision was subsequently upheld on appeal.

What’s interesting is the close reasoning of the court (and the appellate court) to establish what is reasonable and practicable in the circumstances. While the legal system is not perfect it does have a long standing set of practices and procedures for getting at the truth. Perhaps we may be able to learn something from the legal profession when thinking about the safety of critical systems. More on this later.

Cowie v Gungahlin Veterinary Services Pty Ltd [2016] ACTSC 311 (25 October 2016)

The second part of the SBS documentary is online now; this episode looks at the IoT.

To err is inhuman

10/11/2016

Screwtape (Image source: end time info)

More infernal statistics

Well, here we are again. Given recent developments in the infernal region it seems like a good time for another post. Have you ever, dear reader, been faced with the problem of how to achieve an unachievable safety target? Well, worry no longer! Herewith is Screwtape's patented man-based mitigation medicine.

The first thing we do is introduce the concept of 'mitigation'; ah, what a beautiful word that is. You see, it's saying that it's OK that your system doesn't meet its safety target, because you can claim credit for the action of an external mitigator in the environment. Probability-wise, if the probability of an accident is P_a, then P_a equals the product of your system's failure probability P_s and the probability that the external mitigation also fails, P_m; that is, P_a = P_s X P_m.

So let’s use operator intervention as our mitigator, lovely and vague. But how to come up with a low enough P_m? Easy, we just look at the accident rate that has occurred for this or a like system and assume that these were due to operator mitigation being unsuccessful. Voila, we get our really small numbers. 

Now, an alert reader might point out that this is totally bogus, and that P_m is actually the likelihood of operator failure when the system fails. Operators failing, as those pestilential authors of the WASH-1400 study have pointed out, is actually quite likely. But I say, if your customer is so observant and on the ball then clearly you are not doing your job right. Try harder or I may eat your soul, yum yum.
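
Allow me to spell out the sums, with some conveniently illustrative numbers (none of them, I assure you, from any real system):

    # Conveniently illustrative numbers only; not from any real system
    p_s = 1e-3   # probability the system fails and needs mitigating
                 # (the 'unachievable' accident probability target, say, is 1e-6)

    # The patented trick: back P_m out of the observed accident rate of a
    # 'like' system, so that the target is met by construction
    observed_accident_rate = 1e-6
    p_m_trick = observed_accident_rate / p_s
    print("With the trick:  P_a =", p_s * p_m_trick)   # ~1e-06, target met

    # What the alert reader computes: P_m is really P(operator fails | system
    # fails), which as noted above is actually quite likely (0.1 assumed here)
    p_m_honest = 0.1
    print("Honestly though: P_a =", p_s * p_m_honest)  # ~1e-04, two orders short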

Yours hungrily, 

Screwtape.

One of the great mistakes is to judge policies and programs by their intentions rather than their results

Milton Friedman

About time I hear you say! 🙂

Yes, I've just rewritten a post on functional failure taxonomies to include how to use them to gauge the completeness of your analysis. This came out of a question I was asked in a workshop, which went something like, 'OK Mr big-shot consultant, tell us exactly how we validate that our analysis is complete?'. That's actually a fair question; standards like EUROCONTROL's SAM Handbook and ARP 4761 tell you that you ought to, but are not that helpful in the how-to-do-it department. Hence this post.

Using a taxonomy to determine the coverage of the analysis is one approach to determining completeness. The other is to perform at least two analyses using different techniques and then compare the overlap of hazards using a capture/recapture technique. If there’s a high degree of overlap you can be confident there’s only a small hidden population of hazards as yet unidentified. If there’s a very low overlap, you may have a problem.
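
As a rough sketch of how the capture/recapture arithmetic works, here is the standard Lincoln-Petersen estimator; the hazard counts below are invented for illustration.

    # Capture/recapture (Lincoln-Petersen) estimate of the total hazard population.
    # The hazard counts below are invented for illustration.
    n1 = 40   # hazards identified by technique 1 (say, a functional hazard analysis)
    n2 = 35   # hazards identified by technique 2 (say, a HAZOP)
    m = 30    # hazards identified by both, i.e. the overlap

    total_estimate = n1 * n2 / m       # estimated size of the total hazard population
    identified = n1 + n2 - m           # distinct hazards actually identified
    hidden = total_estimate - identified
    print(f"Estimated total: {total_estimate:.0f}, identified: {identified}, "
          f"estimated hidden: {hidden:.0f}")
    # A high overlap (m close to n1 and n2) implies a small hidden population;
    # a low overlap blows the estimate out, and you probably have a problem.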

Apparently 2 million Australians trying to use the one ABS website, because they're convinced the government will fine them if they don't, is the freshly minted definition of a 'Distributed Denial of Service (DDoS)' attack. 🙂

Alternative theory: who would have thought that foreign nationals (oh all right, we all know it's the Chinese*) might try to disrupt the census in revenge for those drug cheat comments at the Olympics?

Interesting times. I hope the AEC is taking notes for their next go at electronic voting.

#censusfail

#Censusfail

09/08/2016

Currently enjoying watching the ABS Census website burn to the ground. Ah schadenfreude, how sweet you are. 

Census time again, and those practical jokers at the Australian Bureau of Statistics have managed to spring a beauty on the Australian public. The joke being that, rather than collecting the data anonymously, you are now required to fill in your name and address, which the ABS will retain (1). This is a bad idea; in fact it's a very bad idea, not quite as bad as, say, getting stuck in a never-ending land war in the Middle East, but certainly much worse than experiments in online voting. Continue Reading…

Inertia has a thousand fathers…

Matthew Squair

[Image: Challenger flight 51-L crew]

Vale Challenger

The anniversary of the loss of Challenger passed by on Thursday. In memoriam, I've updated my post that deals with the failure of communication that I think lies at the heart of that disaster.

The first principle is that you must not fool yourself and you are the easiest person to fool.

Richard P. Feynman

WHS and MIL-STD-882

27/09/2015

Here's a copy of the presentation that I gave at ASSC 2015 on how to use MIL-STD-882C to demonstrate compliance with the WHS Act 2011. The model Australian Workplace Health and Safety (WHS) Act places new and quite onerous requirements upon manufacturers, suppliers and end-user organisations, including the need to demonstrate due diligence in the discharge of individual and corporate responsibilities. Traditionally contracts have steered clear of invoking Workplace Health and Safety (WHS) legislation in anything other than the most abstract form; unfortunately such traditional approaches provide little evidence with which to demonstrate compliance with the WHS Act.

The presentation describes an approach to establishing compliance with the WHS Act (2011) using the combination of a contracted MIL-STD-882C system safety program and a compliance finding methodology. The advantages and effectiveness of this approach, in terms of establishing compliance with the Act and the effective discharge of the responsibilities of both supplier and acquirer, are illustrated using a case study of a major aircraft modification program. Limitations of the approach are then discussed, given the significant difference between the decision-making criteria of classic system safety and the so far as is reasonably practicable principle.

More woes for OPM, and pause for thought for the proponents of centralized government data stores. If you build it they will come…and steal it.

ASSC 2015 Brisbane

29/05/2015

  
Just attended the Australian System Safety Conference; the venue was the Customs House, right on the river. Lots of excellent speakers and interesting papers; I particularly enjoyed Drew Rae's on tribalism in system safety. The keynotes on resilience by John Bergstrom and on cyber-security by Chris Johnson were also very good. I gave a presentation on the use of MIL-STD-882 as a tool for demonstrating compliance with the WHS Act, a subject that only a mother could love. Favourite moment? Watching the attendees' faces when I told them that 61508 didn't comply with the law. 🙂

Thanks again to Kate Thomson and John Davies for reviewing the legal aspects of my paper. Much appreciated guys.

Just added a short case study on the Patriot software timing error to the software safety micro-course page. Peter Ladkin has also subjected the accident to a Why-Because Analysis.

What the?

14/02/2015

In case you're wondering what's going on, dear reader: human factors can be a bit dry, and the occasional poster-style blog posts you may have noted are my attempt to hydrate the subject a little. The continuing series can be found on the page imaginatively titled Human error in pictures, and who knows, someone may find it useful…

An interesting little exposition of the current state of the practice in information risk management, using the metaphor of the bald tire, on the FAIR wiki. The authors observe that there's much more shamanistic ritual (dressed up as 'best practice') in risk assessment than we'd like to think. A statement that I endorse; actually I think it's mummery for the most part, but ahem, don't tell the kids.

Their point is twofold. First, while experience and intuition are vital, on their own they give little grip for critical examination. Second, if you want to manage you must measure, and to measure you need to define.

A disclaimer: I'm neither familiar with nor a proponent of the FAIR tool, and I strongly doubt whether we can ever put risk management onto a truly scientific footing; much like engineering, there's more art than artifice. But it's an interesting commentary nonetheless.

I give it 011 out of 101 tooled-up script kiddies.

2014 in review

22/01/2015

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here's an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 32,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 12 sold-out performances for that many people to see it.

Click here to see the complete report.

Yep that’s right, due to popular demand I’m running ZEIT 8236 System Safety as an Intensive Delivery mode course in the second session at ADFA from the 13th to 17th of July 2015. If you want a flavour, here’s the introductory module. Remember, I love this stuff. 🙂

EECON 2014

07/11/2014

So I've been invited to give a talk on risk at the conference dinner. Should be interesting.

An interesting article in Forbes on human error in a very unforgiving environment, i.e. treating Ebola patients, and an excellent use of basic statistics to prove that cumulative risk tends to do just that, accumulate. As the number of patients being treated in the West is pretty low at the moment, it also gives a good indication of just how infectious Ebola is. One might also infer that the Western medical establishment is not quite so smart as it thought it was, at least when it comes to treating the big E safely.
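
To make the 'cumulative risk accumulates' point concrete, the sketch below computes the risk over repeated exposures as 1 - (1 - p)^n; the per-exposure probability is invented for illustration, not taken from the Forbes article.

    # Cumulative risk over n independent exposures: 1 - (1 - p)^n
    p = 0.005                      # assumed per-exposure infection probability (illustrative)
    for n in (1, 10, 100, 500):    # number of exposure episodes
        cumulative = 1 - (1 - p) ** n
        print(f"n = {n:3d}  cumulative risk = {cumulative:.1%}")
    # Even a small per-exposure risk climbs steadily: ~0.5%, ~4.9%, ~39%, ~92%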

Of course the moment of international zen in the whole story had to be the comment by the head of the CDC, Dr Frieden, that, and I quote, "clearly there was a breach in protocol", a perfect example of affirming the consequent. As James Reason pointed out years ago there are two ways of dealing with human error, so I guess we know where the head of the CDC stands on that question. 🙂

If you were wondering why the Outliers post was, ahem, a little rough, I accidentally posted an initial draft rather than the final version. I've now released the right one.

 

On Artificial Intelligence as ethical prosthesis

Out here in the grim meat-hook present of Reaper missions and Predator drone strikes we're already well down the track to a future in which decisions as to who lives and who dies are made less and less by human beings, and more and more by automation. Although there's been a lot of 'sexy' discussion recently of the possibility of purely AI decision making, the current panic misses the real issue du jour, that is, the question of how well current day hybrid human-automation systems make such decisions, and the potential for the incremental abrogation of moral authority by the human part of this cybernetic system as the automation in the synthesis becomes progressively more sophisticated and suasive.

As Dijkstra pointed out in the context of programming, one of the problems or biases humans have in thinking about automation is that, because it 'does stuff', we feel the need to imbue it with agency, and from there it's a short step to treating the automation as a partner in decision making. From this very human misunderstanding it's almost inevitable that a decision maker holding such a view will feel that the responsibility for decisions is shared, and thereby diluted, opening up the potential for choice shift in decision making. As the sophistication of such automation increases this effect becomes stronger and stronger, even though 'on paper' we would not recognise the AI as a rational being in the Kantian sense.

Even the design of decision support system interfaces can pose tricky problems when an ethical component is present, as the dimensions of ethical problem solving (time intensiveness, consideration, uncertainty, uniqueness and reflection) directly conflict with those that make for efficient automation (brevity, formula, simplification, certainty and repetition). This inherent conflict ensures that the interaction of automation and human ethical decision making becomes a tangled and conflicted mess. Technologists of course look at the way in which human beings make such decisions in the real world and believe, rightly or wrongly, that automation can do better. What we should remember is that such automation is still a proxy for the designer; if the designer has no real understanding of the needs of the user in forming such ethical decisions then, if the past is any guide, we are up for a future of poorly conceived decision support systems, with all the inevitable and unfortunate consequences that attend. In fact I feel confident in predicting that the designers of such systems will, once again, automate their biases about how humans and automation should interact, with unpleasant surprises for all.

In a broader sense what we're doing with this current debate is essentially rehashing the old argument between two world views on the proper role of automation. On one side automation is intended to supplant those messy, unreliable humans, in the current context effecting an unintentional ethical prosthetic. On the other we have the view that automation can and should be used to assist and augment human capabilities, that is, it should be used to support and develop people's innate ethical sense. Unfortunately in the current debate it also looks like the prosthesis school of thought is winning out. My view is that if we continue with this approach of 'automating out' moral decision making we will inevitably end up with the amputation of ethical sense in the decision maker, long before killer robots stalk the battlefield, or the high street of your home town.

Just added a modified version of the venerable subjective 882 hazard risk matrix to my useful stuff page, in which I fix a few issues that have bugged me about that particular tool; see Risk and the Matrix for a fuller discussion of the problems with risk matrices. For those of you with a strong interest in such things, I've translated the matrix into Cartesian coordinates, revised the risk zones and definitions to make the matrix 'De Moivre theorem' compliant (and a touch more conservative), added the AIAA's combinatorial probability thresholds and introduced a calibration point.

Update: I subsequently added the reasonably practicable principle to the decision-making criteria, to address the Australian legal concept of SFAIRP, and adjusted the risk curves to reflect how non-ergodic (catastrophic) risks should be treated differently.

MIL-STD-882 Hazard Risk Matrix (Modified).
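
As an aside, 'De Moivre theorem compliant' I take to mean that the risk zones follow iso-risk contours of expected loss (R = P x C). The toy sketch below shows what that property looks like; the severity weights, probability bands and zone thresholds are all invented for illustration and are not those of the modified matrix.

    # Toy illustration of iso-risk (R = P x C) zoning for a hazard risk matrix.
    # Severity costs, probability bands and zone thresholds are illustrative only.
    severities = {"Catastrophic": 1e3, "Critical": 1e2, "Marginal": 1e1, "Negligible": 1e0}
    probabilities = {"Frequent": 1e-1, "Probable": 1e-2, "Occasional": 1e-3,
                     "Remote": 1e-4, "Improbable": 1e-5}

    def zone(expected_loss):
        # Zones are defined purely by expected loss thresholds (the iso-risk property)
        if expected_loss >= 1e1:
            return "HIGH"
        if expected_loss >= 1e-1:
            return "MED"
        return "LOW"

    for name, cost in severities.items():
        cells = "  ".join(f"{zone(p * cost):>4}" for p in probabilities.values())
        print(f"{name:>12}: {cells}")
    # Cells with equal P x C get the same rating wherever they sit in the matrix,
    # which is what makes the zoning consistent with risk as expected loss.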

 

I've put the original Def Stan 00-55 (both parts) onto my resources page for those who are interested in doing a compare and contrast between the old and the new (whenever its RFC is released). I'll be interested to see whether the standard's reluctance to buy into the whole 'safety by following a process' argument is maintained in the next iteration. The problem of arguing from fault density to safety that it alludes to also remains, I believe, insurmountable.

The justification of how the SRS development process is expected to deliver SRS of the required safety integrity level, mainly on the basis of the performance of the process on previous projects, is covered in 7.4 and annex E. However, in general the process used is a very weak predictor of the safety integrity level attained in a particular case, because of the variability from project to project. Instrumentation of the process to obtain repeatable data is difficult and enormously expensive, and capturing the important human factors aspects is still an active research area. Furthermore, even very high quality processes only predict the fault density of the software, and the problem of predicting safety integrity from fault density is insurmountable at the time of writing (unless it is possible to argue for zero faults).

Def Stan 00-55 Issue 2 Part 2 Cl. 7.3.1

Just as an aside, the original release of Def Stan 00-56 is also worth a look, as it contains the methodology for the assignment of safety integrity levels. Basically, for a single function, or for N>1 non-independent functions, the SIL assigned to the function(s) is derived from the worst credible accident severity (much like DO-178). In the case of N>1 independent functions, one of these functions gets a SIL based on severity but the remainder have a SIL apportioned to them based on risk criteria. From which you can infer that the authors, just like the aviation community, were rather mistrustful of using estimates of probability in assuring a first line of defence. 🙂
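
A minimal sketch of that assignment logic as I read it from the description above; the severity-to-SIL table and the risk-derived SIL used for the remaining functions are invented placeholders, not the standard's actual values.

    # Sketch of the original Def Stan 00-56 SIL assignment logic as described above.
    # The severity-to-SIL mapping and the risk-derived SIL are illustrative only.
    SEVERITY_TO_SIL = {"catastrophic": 4, "critical": 3, "marginal": 2, "negligible": 1}

    def assign_sils(worst_severity, n_functions, independent, risk_derived_sil=2):
        """SILs for the N functions protecting against a given accident."""
        severity_sil = SEVERITY_TO_SIL[worst_severity]
        if n_functions == 1 or not independent:
            # Single function, or non-independent functions: all carry the SIL
            # derived from the worst credible accident severity.
            return [severity_sil] * n_functions
        # Independent functions: one carries the severity-derived SIL, the
        # remainder are apportioned a (typically lower) risk-derived SIL.
        return [severity_sil] + [risk_derived_sil] * (n_functions - 1)

    print(assign_sils("catastrophic", 1, independent=False))   # [4]
    print(assign_sils("catastrophic", 3, independent=False))   # [4, 4, 4]
    print(assign_sils("catastrophic", 3, independent=True))    # [4, 2, 2]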

Preamble

The following is a critique of a teleconference conducted on 16 March between the UK embassy in Japan and the UK Government's senior scientific advisor and members of SAGE, a UK government crisis panel formed in the aftermath of the Japanese tsunami to advise on the Fukushima crisis. These comments pertain specifically to the 16 March (UK time) teleconference with the British embassy and the minutes of the SAGE meetings on the 15th and 16th that preceded that teleconference. Continue Reading…

IEC 61508 dissected

28/04/2014

I've just reread Peter Ladkin's 2008 dissection of the conceptual problems of IEC 61508 here, and having just worked through a recent project in which 61508 SILs were applied, I tend to agree that SIL is still a bad idea, done badly… I'd also add that, the HSE's opinion notwithstanding, I don't actually see that the a priori application of a risk-derived SIL to a specific software development acquits one's 'so far as is reasonably practicable' duty of care. Of course if your regulator says it does, why then, smile bravely and compliment him on the wonderful cut of his new clothes. On the other hand, if you're designing the safety system for a nuclear plant, maybe have a look at how the aviation industry does business with its Design Assurance Levels. 🙂

2013 in review

03/01/2014

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 24,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 9 sold-out performances for that many people to see it.

Click here to see the complete report.

And I’ve just updated the philosophical principles for acquiring safety critical systems. All suggestions welcome…

Enjoy 🙂

Colossus: The Forbin Project (Image source: movie still)

Risk as uncontrollability…

The venerable safety standard MIL-STD-882 introduced the concept of software hazard and risk in Revision C. Rather than using the classical definition of risk as a combination of severity and likelihood, the authors struck off down quite a different, and interesting, path.

Continue Reading…

In an earlier post I had a look at the role played by design authorities in an organisation, which can have a major effect upon both safety and project success. My focus in that post was on the authority aspect.

However another perspective on the role of a design authority is that of someone who is able to understand both the operational requirements for a system (those that define a need) and the technical requirements (those that define a solution), and most importantly is able to translate between them.

This is a role that is well understood in architecture, but one that seems to have diminished and dwindled in engineering, where projects of any complexity are more often undertaken by large bureaucratic organisations, which also traditionally fear assigning responsibility to a single person.

[Image: 20130405-110510.jpg]
Provided as part of the QR show bag for the CORE 2012 conference. The irony of a detachable cab being completely unintentional…

[Image: 20130223-170419.jpg]

For somebody. 🙂

787 Lithium Battery (Image Source: JTSB)

But, we tested it? Didn’t we?

Earlier reports on the initial development of the Boeing 787 lithium battery indicated that Boeing engineers had conducted tests to confirm that a single cell failure would not lead to a cascading thermal runaway amongst the remaining cells. According to these reports the tests were successful, so what went wrong?

Continue Reading…

Over on the RVS Bielefeld site Peter Ladkin has just put up a white paper entitled 61508 Weaknesses and Anomalies, which looks at the problems with the current version of the IEC 61508 functional safety standard, part 6 of which sits on my desk even as we speak. Comments are welcome.

For my own contributions to the commentary on IEC 61508 see Buncefield the alternate view, Component SIL rating memes and SILs and Safety Myths.

Dr Nancy Leveson will be teaching a week-long class on system safety this year at the Talaris Conference Center in Seattle from July 15-19.

Her focus will be on the new techniques and approaches described in her latest book Engineering a Safer World. Should be excellent.

See the class announcement for further details.