In June of 2011 the Australian Safety Critical Systems Association (ASCSA) published a short discussion paper on what they believed to be the philosophical principles necessary to successfully guide the development of a safety critical system. The paper identified eight management and eight technical principles, but do these principles do justice to the purported purpose of the paper?
The Oxford dictionary defines a principle as “a fundamental truth or proposition that serves as the foundation for a system of belief or behaviour or for a chain of reasoning”. Clearly the authors intended the principles they enumerated to serve as the foundational concepts for the safety management and engineering performed when developing safety critical systems.
A worthy objective, but I’m afraid I didn’t really come away with the feeling that they had nailed the philosophy part. A lot of good common-sense statements, but philosophy? As in the study of the fundamental nature of knowledge, reality and existence? Not so much…
So, throwing my hat into the ring, here’s my working list of philosophical principles.
The (Very Draft) Philosophical Principles of Safety Critical Systems
- Operating a safety critical system requires a decision by society to accept the residual risk of operation.
- Decision makers have an ethical duty to eliminate known and suspected risks where it is practical to do so and, where it is not, to reduce them to an acceptable level.
- The safety of a technological system cannot be divorced from its operational, organisational and societal context.
- Risk is a social construct and can never be evaluated in a totally objective fashion.
- When evaluating risk we must be clear as to whether we are dealing with an ergodic or non-ergodic system, that is, whether averages taken across many systems tell us anything about what a single system will experience over time (see the first sketch following this list).
- The uncertainty element of risk is made up of aleatory, epistemic and ontological components (illustrated in the second sketch following this list):
  - Aleatory: the randomness of inherently random processes, for example equipment random failure rates;
  - Epistemic: uncertainty over the parameters of our model of the system, for example in the estimates of severity and probability values; and
  - Ontological: uncertainty as to the correctness or completeness of the model of a system, for example the undetected presence of design errors, requirements incompleteness, invalid models or erroneous assumptions.
- Because of the presence of epistemic and ontological uncertainty, operating risk will include both known and unknown components.
- Some unknown risks may disclose themselves during the life of the system; some may never be identified.
- Designing for safety must address both the aleatory risk posed by a system constructed from imperfect components and operated by fallible people, and the risk introduced by epistemic and ontological uncertainty.
- The safety risk of high consequence systems is dominated by epistemic and ontological risk, i.e. the risks arising from the limits of our knowledge.
- The greater the potential severity component of safety risk, the more likely the organisational focus will be on the prevention of accidents rather than the mitigation of consequences.
- The greater the severity of a potential accident, the lower the required occurrence rate, and the greater the epistemic and ontological uncertainty in any estimate of its probability.
- The greater the epistemic uncertainty of our probability estimates, the more the focus should be on reducing the severity of consequences.
- The lower the required probability of an accident, the less we can rely upon an assumption of independence of events to justify a low likelihood of occurrence (see the third sketch following this list).
- Complexity breeds ontological uncertainty and risk. The more complex a system, the more likely it is that an accident will be due to the unintended and unidentified interaction of components, rather than to singular component failures or human errors.
- Complex safety critical technological systems are highly optimised to tolerate frequently occurring failure events, but are vulnerable to rare combinations of such events.
- As a consequence of this highly optimised tolerance, the probability distribution of accident severity will be heavy tailed (see the fourth sketch following this list).
- One can never absolutely ‘prove’ the safety of a system, as such arguments are inherently inductive. We should therefore make a rigorous (and ongoing) attempt to test for and identify flaws in the safety argument, rather than attempting to prove it in some absolute sense.
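A few sketches to make the more technical of these principles concrete. First, ergodicity: the numbers below are entirely made up, and the multiplicative gamble is just a toy stand-in for a non-ergodic process, but it shows how the ensemble average and the experience of a single system over time can flatly contradict one another.

```python
import random

rng = random.Random(7)

# A multiplicative gamble as a toy non-ergodic process: each round,
# 'wealth' goes up 50% or down 40% with equal probability. The
# ensemble average grows ~5% per round, yet the typical individual
# trajectory shrinks per round (geometric mean sqrt(1.5 * 0.6) < 1).
def trajectory(rounds: int) -> float:
    w = 1.0
    for _ in range(rounds):
        w *= 1.5 if rng.random() < 0.5 else 0.6
    return w

outcomes = sorted(trajectory(20) for _ in range(100_000))
mean = sum(outcomes) / len(outcomes)
median = outcomes[len(outcomes) // 2]
print(f"ensemble mean  = {mean:.2f} (theory: 1.05**20 = 2.65)")
print(f"median outcome = {median:.2f} (theory: 0.9**10 = 0.35)")
# Averaging across the ensemble and following one system through time
# give contradictory answers: the hallmark of a non-ergodic system.
```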
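Second, the uncertainty taxonomy. A minimal Monte Carlo sketch, assuming a hypothetical mission length, failure rate and an invented lognormal belief about that rate, of how aleatory and epistemic uncertainty combine in a failure probability estimate; ontological uncertainty is, by definition, whatever the model leaves out.

```python
import math
import random

MISSION_HOURS = 1000.0  # hypothetical mission length

def p_failure(rng: random.Random) -> float:
    # Epistemic uncertainty: the true failure rate is not known, so we
    # draw it from a (made-up) lognormal belief centred near 1e-5/hour.
    rate = rng.lognormvariate(math.log(1e-5), 0.7)
    # Aleatory uncertainty: even with the rate fixed, failure within
    # the mission is an inherently random (exponential) process.
    return 1.0 - math.exp(-rate * MISSION_HOURS)

rng = random.Random(42)
samples = sorted(p_failure(rng) for _ in range(100_000))
mean = sum(samples) / len(samples)
p95 = samples[int(0.95 * len(samples))]
print(f"mean P(failure) = {mean:.1e}, 95th percentile = {p95:.1e}")
# Ontological uncertainty is everything this model omits: missing
# failure modes, design errors, wrong assumptions. No amount of
# sampling within the model can surface it.
```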
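Third, the independence assumption. A back-of-the-envelope calculation (the channel failure probability and the 1% beta factor are placeholders, not data) showing how even weak common-cause coupling swamps the ‘independent’ estimate at low probabilities.

```python
# Two nominally redundant channels, each failing with a probability of
# 1e-4 per demand (a hypothetical figure).
p_channel = 1e-4

# If the channels were truly independent, both fail together with:
p_independent = p_channel ** 2      # 1e-8

# A simple beta-factor common-cause model: assume some fraction of
# channel failures (a made-up 1% here) stem from a shared cause that
# defeats both channels at once.
beta = 0.01
p_common_cause = beta * p_channel   # 1e-6

p_both = p_independent + p_common_cause
print(f"assuming independence : {p_independent:.1e}")
print(f"with common cause term: {p_both:.1e}")
# Even 1% coupling makes the shared-cause term two orders of magnitude
# larger than the independent product, so a claimed 1e-8 rests almost
# entirely on the independence assumption holding.
```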
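Fourth, heavy tails. Comparing a thin-tailed (exponential) and a heavy-tailed (Pareto) severity distribution with the same mean, again with illustrative parameters only, shows how the heavy tail concentrates risk in rare, extreme events.

```python
import random

rng = random.Random(1)
N = 1_000_000

# Thin-tailed severities: exponential with mean 1.0.
thin = [rng.expovariate(1.0) for _ in range(N)]

# Heavy-tailed severities: Pareto with shape 1.5, rescaled so the
# mean is also 1.0 (a Pareto(1.5) on [1, inf) has mean 3).
heavy = [rng.paretovariate(1.5) / 3.0 for _ in range(N)]

for label, xs in (("thin tail ", thin), ("heavy tail", heavy)):
    frac = sum(1 for x in xs if x > 20.0) / N
    print(f"{label}: fraction of events worse than 20x the mean = {frac:.1e}")
# The exponential essentially never produces a 20x event (~2e-9); the
# Pareto does so roughly once in 500 events. Same average severity,
# radically different exposure to catastrophe.
```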
The above list is in no way complete, and I’d appreciate suggested additional principles.