Way, way back in 2011 NASA published the first volume of their planned two-volume epic on system safety, titled, strangely enough, “NASA System Safety Handbook Volume 1, System Safety Framework and Concepts for Implementation”. Catchy, eh?
I finally got around to reading it, which may give you an idea of the length of my reading list, and there are definitely aspects of the handbook that are new for safety standards. In particular the handbook requires that unknown, and therefore unquantified, hazards also be managed, which is something you just don’t find in other current safety standards such as IEC 61508, which unfortunately seem to be mired in a frequentist view of uncertainty and risk.
The handbook identifies two key objectives in managing the unknown. Firstly, to incorporate appropriate historically-informed defenses against unknown hazards into the design, for example through use of design heuristics, economy of mechanisms and so on. Secondly, to minimise the introduction of potentially adverse conditions during system realization/operation, that is, to apply best practices to avoid (for example) installing space shuttle speed brake actuators reversed, as happened with the shuttle fleet.
Precursor analysis also gets a guernsey, as the standard, again unlike other standards, addresses the management of safety risk during the operational phase of the system’s life, requiring that the results be flowed back into the standing risk analysis for the system.
The standard also defines the concept of a safety risk reserve that should be applied to safety assessments to ensure that the reported system risk reflects both the risk associated with identified hazards and that associated with as yet unidentified hazards. Interestingly the authors posit that the relative importance of such unknown hazards would decrease as operational experience accrues. For example:
- At initial transition to operations our uncertainty is high and the margin reflects this,
- As operational experience accrues, further hazards are identified and the risk balance shifts towards identified hazards (the safety risk margin is consumed),
- Where design changes to reduce hazard likelihood are incorporated the risk transfers back the other way, i.e. known risk reduces but unknowns again increase (as does the allocated safety risk margin),
- As we add features to better manage system faults both known and unknown risk are decreased, and
- Finally as the system design stabilises we reach the system’s safety goal as the unknown risk washes out to some (albeit unknown) asymptote.
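To make the bookkeeping behind that list concrete, here is a rough sketch of my own (not anything taken from the handbook) in which reported risk is simply the risk from identified hazards plus the safety risk margin held against unidentified ones. The `RiskSnapshot` class, the `meets_goal` helper, the phase names, the goal and every number are hypothetical, chosen only so that each step moves in the direction the corresponding bullet describes.

```python
from dataclasses import dataclass


@dataclass
class RiskSnapshot:
    """Hypothetical record of the risk picture at one point in the life cycle."""
    phase: str
    known_risk: float  # risk attributed to identified hazards (illustrative units)
    margin: float      # safety risk margin held against unidentified hazards

    @property
    def reported_risk(self) -> float:
        # Reported system risk covers both identified and unidentified hazards.
        return self.known_risk + self.margin


def meets_goal(snapshot: RiskSnapshot, goal: float) -> bool:
    """True when the reported risk (known plus margin) is within the safety goal."""
    return snapshot.reported_risk <= goal


# Illustrative numbers only; they are assumptions, not data from the handbook.
GOAL = 0.010
history = [
    RiskSnapshot("initial operations", known_risk=0.010, margin=0.030),
    RiskSnapshot("experience accrues", known_risk=0.020, margin=0.015),
    RiskSnapshot("design change", known_risk=0.010, margin=0.020),
    RiskSnapshot("fault management added", known_risk=0.006, margin=0.012),
    RiskSnapshot("design stabilised", known_risk=0.005, margin=0.004),
]

for snap in history:
    status = "meets goal" if meets_goal(snap, GOAL) else "above goal"
    print(f"{snap.phase:<24} known={snap.known_risk:.3f} "
          f"margin={snap.margin:.3f} reported={snap.reported_risk:.3f} ({status})")
```

The point of keeping the margin as an explicit term is that a program cannot claim to have met its goal on identified hazards alone; the allowance for what has not yet been found has to shrink as well.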
What’s encouraging about the handbook is that it gets the fundamental completeness problem of hazard identification onto the table, requires that NASA programs formally manage it and provides a set of tools to do so. In addressing epistemic and ontological risk in a meaningful fashion NASA gets four out of five belated Black Swans.