Archives For Safety

When you look at the safety performance of industries which have a consistent focus on safety as part of their social permit (nuclear and aviation are the canonical examples), you see that over time increases in safety tend to plateau. This looks like some form of learning curve, but what’s the mechanism, or mechanisms, that actually drive this process? I believe there are two factors at play here: firstly the increasing marginal cost of improvement, and secondly the problem of learning from the events that we are trying to prevent.

Increasing marginal cost is simply an economist’s way of stating that it will cost more to achieve that next increment in performance. For example, airbags are more expensive than seat belts by roughly an order of magnitude (based on replacement costs); however, airbags only deliver an 8% reduction in mortality when used in conjunction with seat belts, see Crandall (2001). As a result the next increment in safety takes longer and costs more (1).

The learning factor is in some ways like an informational version of the marginal cost rule. As we reduce accident rates, accidents become rarer. Now one of the traditional ways in which safety improvements occur is through studying accidents when they occur and then eliminating or mitigating the identified causal factors. Obviously, as the accident rate decreases the opportunity for improvement likewise decreases. When accidents do occur we have a further problem because (definitionally) the cause of the accident will comprise a highly unlikely combination of factors that are needed to defeat the existing safety measures. Corrective actions for such rare combinations of events are therefore highly specific to that event’s context and, conversely, will have far less universal applicability. For example, the lessons of metal fatigue learned from the Comet airliner disaster have had universal applicability to all aircraft designs ever since. But the QF-72 automation upset off Learmonth? Well, those lessons, relating to the specific fault tolerance architecture of the A330, are much harder to generalise and therefore have less epistemic strength.

In summary, not only does it cost more with each increment of safety, but our opportunity to learn through accidents steadily reduces as both their arrival rate and their individual epistemic value (2) fall.
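The plateau can be illustrated with a toy model. This is a sketch under an assumption made purely for illustration, not something derived from accident data: every investigated accident removes a fixed fraction of the remaining accident rate. The names `initial_rate` and `learning_fraction` are hypothetical.

```python
def waits_between_accidents(initial_rate, learning_fraction, n_accidents):
    """Expected wait (in years) before each successive accident.

    Toy assumption: accidents arrive as a Poisson process, and each
    investigated accident removes a fixed fraction of the remaining rate.
    """
    rate = initial_rate
    waits = []
    for _ in range(n_accidents):
        waits.append(1.0 / rate)           # mean wait at the current rate
        rate *= 1.0 - learning_fraction    # lesson learned, rate drops
    return waits


waits = waits_between_accidents(initial_rate=10.0, learning_fraction=0.2, n_accidents=20)
elapsed = 0.0
for number, wait in enumerate(waits, start=1):
    elapsed += wait
    if number % 5 == 0:
        print(f"accident {number:2d}: wait {wait:6.2f} years, elapsed {elapsed:6.2f} years")
```

Each lesson is worth the same proportional improvement, yet the lessons arrive further and further apart, so improvement per calendar year slows. The sketch ignores the second effect described above (later lessons being less general), which would flatten the curve further.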


1. In some circumstances we may also introduce other risks, see for example the death and severe injury caused to small children from air bag deployments.

2. In a Popperian sense.


1. Crandall, C.S., Olson, L.M., Sklar, D.P., Mortality Reduction with Air Bag and Seat Belt Use in Head-on Passenger Car Collisions, American Journal of Epidemiology, Volume 153, Issue 3, 1 February 2001, Pages 219–224.

One of the perennial problems we face in a system safety program is how to come up with a convincing proof for the proposition that a system is safe. Because it’s hard to prove a negative (in this case the absence of future accidents) the usual approach is to pursue a proof by contradiction: develop the negative proposition that the system is unsafe, then prove that this is not true, normally by showing that the set of identified specific propositions of ‘un-safety’ have been eliminated or controlled to an acceptable level. Enter the term ‘hazard’, which in this context is simply shorthand for a specific proposition about the unsafeness of a system. Now, interestingly, when we parse the set of definitions of hazard we find the recurring use of terms like ‘condition’, ‘state’, ‘situation’ and ‘events’ that, should they occur, will inevitably lead to an ‘accident’ or ‘mishap’. So broadly speaking a hazard is an explanation based on a defined set of phenomena, which argues that if they are present, and given there exists some relevant domain source (1) of hazard, an accident will occur. All of which seems to indicate that hazards belong to a class of explanatory models called covering laws. As an explanatory class, covering law models were developed by the logical positivist philosophers Hempel and Popper because of what they saw as problems with an over reliance on inductive arguments as to causality.

As a covering law explanation of unsafeness a hazard posits phenomenological facts (system states, human errors, hardware/software failures and so on) that confer what’s called nomic expectability on the accident (the thing being explained). That is, the phenomenological facts combined with some covering law (natural and logical) require the accident to happen, and this is what we call a hazard. We can see an archetypal example in the Source-Mechanism-Outcome model of Swallom, i.e. if we have both a source and a set of mechanisms in that model then we may expect an accident (Ericson 2005). While logical positivism had the last nails driven into its coffin by Kuhn and others in the 1960s, and it’s true, as Kuhn and others pointed out, that covering model explanations have their fair share of problems, so too do other methods (2). The one advantage that covering models do possess over other explanatory models, however, is that they largely avoid the problems of causal arguments. Which may well be why they persist in engineering arguments about safety.


1. The source in this instance is the ‘covering law’.

2. Such as counterfactual, statistical relevance or causal explanations.


Ericson, C.A. Hazard Analysis Techniques for System Safety, page 93, John Wiley and Sons, Hoboken, New Jersey, 2005.

And not quite as simple as you think…

The testimony of Michael Barr in the recent Oklahoma Toyota court case highlighted, amongst other things, problems with the design of Toyota’s watchdog timer for their Camry ETCS-i throttle control system, which got me thinking about the pervasive role that watchdogs play in safety critical systems. The great strength of watchdogs is of course that they provide a safety mechanism which resides outside the state machine, which gives them fundamental design independence from what’s going on inside. By their nature they’re also simple and small scale beasts, thereby satisfying the economy of mechanism principle.
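To make the idea concrete, here is a minimal sketch of a software watchdog. It is an illustration only and is not Toyota’s design; real watchdogs are usually independent hardware timers that reset the processor. The class name `Watchdog` and its parameters are hypothetical.

```python
import time


class Watchdog:
    """Minimal software watchdog: trips if not kicked within `timeout_s`."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock              # injectable so the sketch can be tested
        self._last_kick = clock()

    def kick(self):
        """Called by the monitored task to signal that it is still alive."""
        self._last_kick = self._clock()

    def expired(self):
        """Checked by the independent monitor, never by the task itself."""
        return self._clock() - self._last_kick > self.timeout_s
```

The monitored task only ever calls `kick()`, while the decision about whether the task has hung is made elsewhere through `expired()`. That separation is the design independence described above, and the whole mechanism fits in a dozen lines.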


Taboo transactions and the safety dilemma

Again my thanks go to Ross Anderson over on the Light Blue Touchpaper blog for the reference, this time to a paper by Alan Fiske, an anthropologist, and Philip Tetlock, a social psychologist, on what they term taboo transactions. What they point out is that there are domains of sharing in society which each work on different rules: communal versus reciprocal obligations, for example, or authority versus market. And within each domain we socially ‘transact’ trade-offs between equivalent social goods.


So what do gambling, thermodynamics and risk all have in common?


In June of 2011 the Australian Safety Critical Systems Association (ASCSA) published a short discussion paper on what they believed to be the philosophical principles necessary to successfully guide the development of a safety critical system. The paper identified eight management and eight technical principles, but do these principles do justice to the purported purpose of the paper?


Why something as simple as control stick design can break an aircrew’s situational awareness

One of the less often considered aspects of situational awareness in the cockpit is the element of knowing what the ‘guy in the other seat is doing’. This is a particularly important part of cockpit error management because without a shared understanding of what someone is doing it’s kind of difficult to detect errors.


Fighter Cockpit Rear View Mirror

What the economic theory of sunk costs tells us about plan continuation bias

Plan continuation bias is a recognised and subtle cognitive bias that tends to force the continuation of an existing plan or course of action even in the face of changing conditions. In the field of aerospace it has been recognised as a significant causal factor in accidents, with a 2004 NASA study finding that in 9 out of the 19 accidents studied aircrew exhibited this behavioural bias. One explanation of this behaviour may be a version of the well known ‘sunk cost’ economic heuristic.


One of the tenets of safety engineering is that simple systems are better. Many practical reasons are advanced to justify this assertion, but I’ve always wondered what theoretical justification, if any, there was for such a position.


In one’s professional life there are certain critical works that open your eyes to how lucidly a technical argument can be stated.

TCAS Indicator (Image Source: Public Domain)

What TCAS can tell us about AF447 (Updated 27 Sept 09)

The BEA interim report on the AF447 accident confirms that the Traffic Alert and Collision Avoidance System (TCAS) had become inoperative during the early part of the event sequence for an, as yet, unidentified reason. The explanation may actually be fairly straightforward and lie within the fault tolerance requirements of the TCAS specification.

If the theory of Highly Optimised Tolerance (HOT) holds true then we should be able to see a change in the distribution of the severity of adverse events as the design paradigm for a family of systems moves from the ‘just make it work’ stage to the ‘optimise for robustness’ stage. This is something we can actually test through observation of real world systems.


The effect of poorly considered originating requirements (the recommendations of the Waterfall accident commissioner) upon system safety requirements for a passenger emergency door release function.


The use of integrity levels to achieve ultra high levels of safety has become an ‘accepted wisdom’ in the safety community. Yet I, and others, remain unconvinced as to their efficacy. In this post I argue that integrity levels are not scientific in any real sense of that term which leads in turn to the reasonable question of whether they work.

Testability and disconfirmation

The basis of science is empirical observation and inductive reasoning. For example, we may observe that swans are white and therefore form a theory that all swans must be white. But as Hume pointed out, inductive reasoning is inherently limited because the premises of an inductive argument support but cannot logically entail the conclusion. In our example a single black swan is sufficient to refute our theory, despite there being a thousand white swans. This does not mean that a theory cannot be useful (that is, it works), but just because a theory has worked a number of times does not mean that it is proven to be true. For example, we can build ten bridges that stay up (our theory is useful) but there is nothing to say that the eleventh will not fall down due to effects not considered in the existing theory of how to design bridges. As a test of a theory cannot prove the truth of that theory, only disprove it, when we say a theory is testable we are not saying that we can prove it, only that there exists an opportunity to disprove or falsify it. This concept of disproof is very much akin to the legal principle of finding a person ‘not guilty’, rather than ‘innocent’, of a charge. Which leaves us with a problem as to how science really works, if we presume that it’s science’s job to prove things.

The response of the philosopher Karl Popper to this problem of induction was to accept this inability to absolutely prove the truth and conclude that because we can never prove the truth of a scientific theory, science has to advance on the basis of the falsification of existing theories and replacing them with theories that better explain the facts (Popper 1968). From Popper’s perspective a good theory is one that offers us ample opportunity to falsify it. Conversely a theory which is not refutable by any conceivable means is non-scientific. Irrefutability is in fact not a virtue of a theory (as people often think) but a vice (Popper 1968).  To achieve falsifiability, according to Popper, a theory therefore needs to be:

  1. precisely stated (i.e. unambiguous),
  2. wide ranging, and
  3. testable in practical terms.

As a corollary, if a theory does not satisfy these criteria it should not be considered scientific (Popper 1968). For example, we could develop a design hypothesis as to how we could design a bridge to span the straits of Gibraltar using as yet undeveloped hyper-strength materials, but as we have no practical way to test such a theory it should not be considered scientific. Another way to look at it is that a ‘good’ scientific theory is a prohibition: it forbids certain things to happen. The more a theory forbids, the better it is, because it gives us greater scope to falsify it. For example, if we build a very slender bridge using deflection theory (we do X), the design hypothesis forbids the bridge from falling down under specific deck loads (if X then not Y), which is eminently testable. Confirmations should also count only if they are the result of risky predictions. That is, if based on our original theory or understanding we expect an event that is incompatible with some new hypothesis, then the occurrence of that event would refute our new hypothesis. If on the other hand a new hypothesis predicts pretty much the same results as the accepted theory then there’s not much at risk. So in our bridge example, if our new theory of bridge construction predicts that such a bridge will not fail under a known load, whereas the older theory based on traditional techniques predicts that it will, then there’s a clear, and testable, difference.

From an engineering perspective this means that the confirmation of a theory comes when it allows engineers to do something beyond the current state of the art. For example, we could use a new bridge deflection theory to design a lighter and more slender bridge span for given wind loads. If the bridge stands under the loads then the results would count as confirmation of the new hypothesis, as our old theory would have predicted failure. Confirming evidence should not count except when it is the result of a genuine test of our hypothesis and it can be presented as a serious, but unsuccessful, attempt to falsify the hypothesis and whose results corroborate the evidence (Popper 1968). Essentially our theories and hypotheses need to have some ‘skin in the game’. Again using the bridge design example, if both the original theory and the new design hypothesis predict the survival of a bridge this does not represent a genuine test of the new theory. But if that new bridge could only be built using the new hypothesis and the bridge subsequently falls down, then that new hypothesis will inevitably be scrutinised and either rejected or adjusted, as happened with the Tacoma Narrows disaster. Our new hypothesis has, in effect, lots of epistemic ‘skin in the game’.

The scientific theory of Software Integrity Levels (SILs)

As the concept of safety integrity levels is most entrenched within the software community I’ll stick with them for the moment, noting that the issues raised below are just as valid for safety integrity levels when applied to hardware. The theory of safety integrity levels for software can be expressed as follows:

  1. Software failures are ‘systematic’ that is they result from systematic faults in the software specification, design or production processes,
  2. As such software failures are not random in nature, given the correct set of inputs or environmental conditions the failure will always occur,
  3. The requirement for ultrahigh reliability (for example 10^-9 per hour) of safety functions makes traditional reliability testing of software to demonstrate such reliability impossible,
  4. The use of specific development processes will deliver the required reliability by reducing the number of latent faults that could cause a software failure but this comes at a cost,
  5. Therefore based on an assessment of risk an ‘integrity level’ is assigned to the safety function. The higher the risk the greater the integrity level assigned,
  6. This integrity level represents the required reliability of the safety function, and
  7. To achieve the integrity level a set of processes are applied to the specification, design & production processes, these are defined as an associated software integrity level.

Problems with SILs as a scientific theory

SILs are fundamentally untestable

Unfortunately even the lowest target failure rates for safety functions (e.g. 10^-5 per hr) are already beyond practical verification (Littlewood-Strigini 1993). We therefore have no practical, independent and empirical way to demonstrate that application of a SIL (or any other posited technique) will achieve the required reliability (freedom from accident). So we end up with a circular argument where we can only demonstrate achieving a specific SIL by the evidence of carrying out the processes that define that SIL (McDermid 2001).
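To get a feel for the scale of the problem, the sketch below applies a standard zero-failure demonstration calculation. It is offered as an illustration rather than as the cited authors’ own analysis, and it assumes a constant failure rate: if the true rate is λ, the probability of seeing no failures in t hours is exp(-λt), so claiming a rate no worse than λ with confidence C requires t ≥ -ln(1 - C)/λ failure-free hours. The function name and the 99% confidence figure are assumptions chosen for the example.

```python
import math

HOURS_PER_YEAR = 24 * 365


def failure_free_hours(target_rate_per_hour, confidence):
    """Failure-free test hours needed to claim the target rate.

    Assumes a constant failure rate (exponential model) and that
    not a single failure is observed during the test.
    """
    return -math.log(1.0 - confidence) / target_rate_per_hour


for rate in (1e-5, 1e-9):
    hours = failure_free_hours(rate, confidence=0.99)
    years = hours / HOURS_PER_YEAR
    print(f"{rate:.0e} per hour: {hours:.2e} hours, about {years:,.0f} years on one test unit")
```

Under these assumptions even a target of 10^-5 per hour needs roughly 460,000 failure-free hours, which is more than fifty years of continuous operation of a single unit, and a target of 10^-9 per hour needs more than half a million years.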

SIL allocation is non-trivial

A number of different techniques can be used to allocate integrity level requirements, ranging from the Consequence/Autonomy models of DO-178B and MIL-STD-882 to the Risk Matrices of IEC 61508 (1). Because of these differences SIL allocation cannot be said to be a consistent, and therefore precisely defined, activity. This makes refutation of the theory difficult, as a failure could be argued, ad hoc, to be due to the incorrect allocation of SILs rather than to SILs themselves.

SIL Activities are inconsistent from standard to standard

The many SIL based standards vary widely in the methods invoked and the degree of tailoring that a project can apply. DO-178B defines a basic development process but focuses upon software product testing and inspection to assure safety. Other standards such as DEF STAN 00-55 focus on the definition of safety requirements that are acquitted through evidence. Some standards, such as DEF AUST 5679, emphasise the use of formal methods to achieve the highest integrity levels while others, such as IEC 61508, invoke a broad range of techniques to deliver a safety function at a required integrity level. As a result there is no single consistent, and therefore wide ranging, ‘theory of SILs’; rather each is specific to the project and company ‘instance’.

SIL activities are applied inconsistently

The majority of SIL standards allow a degree of tailoring of process to the specific project or company. While this is understandable given the range of projects and industry contexts, it results in an inherently inconsistent application of processes across projects. As an example from aviation, within that industry’s software community there has been a vigorous debate over the application of various methods of achieving the Modified Condition/Decision Coverage criteria of DO-178B (Chilenski 2001). Because of this variability of application it is impossible to say with precision that a specific standard has been fully applied. The lack of precision then makes it difficult to argue, should an accident occur, that the standard failed, because it could always be argued after the fact that it was a fault of application rather than an inherent fault in the process standard that caused the failure. This is what Popper calls a conventionalist twist, because it can be used to explain away inconvenient results. This problem of application is further exacerbated by the standardisation bodies expressing their requirements in terms of recommendations (IEC 61508) or guidance (DO-178) rather than requirements, thereby allowing process variance without either justification or demonstration of equivalence.
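For readers unfamiliar with the criterion, the sketch below illustrates one reading of MC/DC, usually called ‘unique-cause’: each condition must be shown to independently affect the decision by a pair of tests that differ only in that condition and produce different outcomes. The decision `a and (b or c)` and the function names are hypothetical, chosen only for illustration.

```python
from itertools import product


def unique_cause_pairs(decision, n_conditions):
    """For each condition, list the pairs of test vectors that differ only
    in that condition and give different decision outcomes."""
    pairs = {index: [] for index in range(n_conditions)}
    for vector in product((False, True), repeat=n_conditions):
        for index in range(n_conditions):
            if vector[index]:
                continue  # visit each pair once, starting from its False side
            flipped = vector[:index] + (True,) + vector[index + 1:]
            if decision(*vector) != decision(*flipped):
                pairs[index].append((vector, flipped))
    return pairs


def example_decision(a, b, c):
    # Hypothetical decision, not taken from any standard or real system.
    return a and (b or c)


pairs = unique_cause_pairs(example_decision, 3)
for index, name in enumerate("abc"):
    print(name, pairs[index])
```

Other forms of the criterion discussed in the literature relax the requirement that all other conditions be held fixed, which changes the set of acceptable test pairs. Two projects can therefore both claim MC/DC coverage while having demonstrated different things.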

SIL activities are ambiguous as to outcome

While the SIL standards are intended to deliver both intermediate and final products with low defect rates, the logical argument as to how each process achieves or contributes to such an outcome is not so clear. The problem becomes worse as the process moves away from proximal activities that directly impact the final delivered product and towards the distal activities of managing the process. For example, DO-178 Table A-1 requires the preparation of a plan for the software aspects of certification. While planning a process is certainly a ‘good thing’, the problem is that it is difficult to link the quality of overall planning to a specific and hazardous fault in a product. All that can be said about a plan is that it represents a planning activity and ensures that, if adhered to, subsequent efforts are carried out in a planned way and are auditable against the plan. Having developed a software product to the SIL requirements we then find that the end product behaves much like software that has not been developed to such a standard. In essence SILs make no risky predictions given, as noted above, that the purported reliability of the software is not empirically testable. Even should a latent software fault exist, as long as the correct set of circumstances never arises in practice the software will operate safely.


Given the problems identified above we must conclude that, however much SILs have become the accepted wisdom, they do not satisfy the requirements of a scientific theory. They may have a seductive simplicity but they are, it seems, closer to astrology than to science or engineering. Unfortunately, while the software community continues to cling to such concepts it stifles serious investigation into the real question of what constitutes safe software.


Chilenski, J. J. (2001), An Investigation of Three Forms of the Modified Condition Decision Coverage (MCDC) Criterion, FAA Tech Center Report DOT/FAA/AR-01/18.

Fowler, D., Application of IEC 61508 to Air Traffic Management and Similar Complex Critical Systems – Methods and Mythology, in Lessons in System Safety: Proceedings of the Eighth Safety-Critical Systems Symposium, Anderson, T., Redmill, F. (ed.s), pp 226-245, Southampton, UK, Springer Verlag.

Littlewood, B. & Strigini, L. (1993), Validation of Ultra-High Dependability for Software-based Systems. Comm. of the ACM, 36(11):69–80.

McDermid, J.A., Pumfrey, D.J., Software Safety: Why is there no Consensus?, Proceedings of the International System Safety Conference (ISSC) 2001, Huntsville, System Safety Society, 2001.

Popper, K.R., Conjectures and Refutations, Third Ed., Routledge Pub., 1968.

Redmill, F., Safety Integrity Levels – Theory and Problems, in Lessons in System Safety: Proceedings of the Eighth Safety-Critical Systems Symposium, Anderson, T., Redmill, F. (ed.s), pp 1-20, Southampton, UK, Springer Verlag.


1. There have been a multitude of qualitative and quantitative methods proposed for SIL assignment, so many that it sometimes seems that safety professionals take a perverse delight in propagating new techniques. Some of the more common include (with source):

  • Consequence (control loss) (MISRA),
  • Software authority (MIL-STD-882),
  • Consequence (loss severity) (DO 178),
  • Quantitative risk method (IEC 61508),
  • Risk graph and calibrated risk graph (IEC 61508/IEC 61511),
  • Hazardous event severity matrix (IEC 61508),
  • Hybrid consequence and risk matrix (DEF STAN 00-56),
  • Semi-quantitative method (IEC 61511),
  • Safety layer matrix method (IEC 61511), and
  • Layer of protection analysis (IEC 61511).