At the height of the Cold War, with bombers carrying nuclear weapons on airborne alert and the strategic forces of both sides on a knife edge, the possibility that a nuclear weapon could go off purely by accident and trigger nuclear war was a disquieting one.
Both sides realised that the risk of inadvertently starting World War III had to be minimised. On the American side, after several near misses in the 40s and 50s, engineers at the Los Alamos and Sandia labs started to work seriously on how to prevent nuclear weapons from going off by accident.
It is the stated position of the U.S. Air Force that their safeguards would prevent the occurrence of such events as are depicted in this film…
Film Title Card: Dr Strangelove
The very good news is that these efforts have been successful, to date, in preventing an accidental detonation of a nuclear device. The even better news is that the concepts the engineers developed to guide their efforts have much broader applicability than weapons safety. Unfortunately, they have not really received the broader recognition that they deserve (1).
Despite this lack of recognition, where you have a design problem in which a component must both function on demand (i.e. be reliable) and not function inadvertently (i.e. be safe), you’ll probably find yourself adopting one or more of these principles.
The 3I principles
The fundamental principles for preventing accidental detonation of a weapon are what the nuclear safety community calls the three I’s: Isolation, Incompatibility and Inoperability. They are the architectural principles that underpin nuclear weapons safety design, and while we’ll deal with each separately in the following sections, they should be understood as an integral set, where a failure to meet one principle undermines the whole.
Isolate detonation-critical components from unintended energy (electrical, thermal, mechanical)
To prevent a spurious input inadvertently triggering a weapon, the critical weapon package is isolated behind an energy barrier, with so-called strong-link safety devices controlling access through the barrier so that only a valid trigger input will pass through to the weapon. To guard against the energy barrier or a strong-link failing in an adverse environment, such as a fire, designed-in weak-link safety devices ensure that the weapon has already been rendered inoperable by that point.
Non-safety functions are deliberately separated from weapons safety and arming functionality to keep the inventory of safety critical parts as small as possible. This also gives us a separation of concerns in the design, which in turn reduces the span and complexity of the effort required to assure weapon safety. Basically, if you’re working on the ‘if it can go wrong it will’ principle, the fewer components there are, the less potential risk there is.
Such architectural concepts of isolating, separating, simplifying and controlling access are all principles that can be applied in the general case to assure the integrity of safety critical functions.
Design enabling stimuli to be unique and not found in nature
Extreme external events, such as lightning strikes on a signal line or a carrier aircraft crashing, should not lead to the initiation of a weapon. To ensure that only a deliberate action will initiate a detonation, weapons include ‘locks’ (or strong-links) which open only upon receipt of a unique signal that is incompatible with, and can’t be mistaken for, an environmental input (2)(3).
Mechanical devices are used to provide these strong-links (yes Virginia, some folk really don’t trust software) and are designed to withstand assaults from the environment, and remain functional, until specified harm levels are exceeded. Where electrical signals are used, design techniques (e.g. pseudo-random signals, electrical isolation, and the avoidance of easily spoofed signals such as AC at frequencies close to 60 Hz or low voltage DC) are used to establish incompatibility.
To deliver the required high level of assurance, strong-links are normally placed in series, and diverse technologies are used to ensure that no common failure modes exist amongst them. In practice strong-links usually comprise a combination of human intent (opened when a unique signal from a consent switch is received) in series with a weapon launch parameter (e.g. opened by an internal accelerometer providing a unique signal), so that both must be received before the weapon arming and firing signal is allowed to cross the energy barrier. Finally, to negate ‘brute strength’ attacks, the strong-link will lock out further attempts if the initial input is incorrect. This effectively makes it more likely that a weapon exposed to multiple spurious inputs will transition to a safe state than to a more hazardous one.
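The series arrangement and the lock-out behaviour can be pictured as a small state machine. What follows is only a software analogy in Python, not a description of any real device (the real strong-links are mechanical); the class names, the pulse patterns and the one-strike lock-out rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LinkState(Enum):
    CLOSED = auto()      # safe: energy path blocked, still listening
    OPEN = auto()        # the complete unique signal has been received
    LOCKED_OUT = auto()  # a wrong event was seen: blocked, ignores all input


@dataclass
class StrongLink:
    """Toy strong link, driven one event at a time (hypothetical model)."""
    unique_signal: tuple          # assumed pattern, e.g. long/short pulses
    state: LinkState = LinkState.CLOSED
    _position: int = 0            # how much of the pattern has matched so far

    def feed(self, event: str) -> LinkState:
        if self.state is not LinkState.CLOSED:
            return self.state     # OPEN and LOCKED_OUT both ignore input
        if event == self.unique_signal[self._position]:
            self._position += 1
            if self._position == len(self.unique_signal):
                self.state = LinkState.OPEN
        else:
            # One wrong event latches the link shut; there are no retries.
            self.state = LinkState.LOCKED_OUT
        return self.state


def firing_path_enabled(links) -> bool:
    """Links sit in series, so every one of them must be open."""
    return all(link.state is LinkState.OPEN for link in links)


if __name__ == "__main__":
    intent = StrongLink(unique_signal=tuple("LSSLLSLS"))      # consent switch
    trajectory = StrongLink(unique_signal=tuple("SLLSLSSL"))  # launch parameter
    for event in "LSSLLSLS":
        intent.feed(event)
    for event in "SLSLSLSL":   # noise on the second channel
        trajectory.feed(event)
    print(intent.state, trajectory.state, firing_path_enabled([intent, trajectory]))
```

Note that the only transitions out of CLOSED are to OPEN via the full pattern or to LOCKED_OUT via anything else, so random input overwhelmingly drives the link towards the safe state.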
Clearly we can also take the principle of incompatibility and apply it directly to both hardware and software control channels. For example, multi-bit flags for critical signals, cyclic redundancy checks, separate command and intent signals, and diverse and redundant signal paths (or messages) can all be seen as implementations intended to achieve signal incompatibility.
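As a rough illustration of how a few of these techniques combine in software, here is a minimal Python sketch. The flag values, the frame layout and the function names are all hypothetical, chosen only for the example; the CRC is the standard library's zlib.crc32.

```python
import struct
import zlib

# Hypothetical multi-bit flags. The two values differ in 8 of their 16 bits,
# and neither is all-zeros or all-ones, so a stuck or cleared word is invalid.
ARM_COMMAND = 0xA5C3
ARM_INTENT = 0x3C5A


def encode(flag: int, sequence: int) -> bytes:
    """Build an 8-byte frame: 16-bit flag, 16-bit sequence number, CRC-32."""
    body = struct.pack(">HH", flag, sequence)
    return body + struct.pack(">I", zlib.crc32(body))


def decode(frame: bytes):
    """Return (flag, sequence), or None if the frame fails its checks."""
    if len(frame) != 8:
        return None
    body = frame[:4]
    (crc,) = struct.unpack(">I", frame[4:])
    if zlib.crc32(body) != crc:
        return None
    return struct.unpack(">HH", body)


def arm_requested(command_frame: bytes, intent_frame: bytes) -> bool:
    """Both messages must be intact, carry the right flags, and be paired."""
    command = decode(command_frame)
    intent = decode(intent_frame)
    if command is None or intent is None:
        return False
    return (
        command[0] == ARM_COMMAND
        and intent[0] == ARM_INTENT
        and command[1] == intent[1]   # same sequence number on both paths
    )


if __name__ == "__main__":
    cmd = encode(ARM_COMMAND, 7)
    ok = encode(ARM_INTENT, 7)
    print(arm_requested(cmd, ok))    # intact, correctly paired frames
    print(arm_requested(cmd, cmd))   # command repeated on the intent path
```

The design choice worth noticing is that every failure path returns False: a corrupted frame, a repeated command or a mismatched sequence number all leave the system in the state it was already in.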
Make the weapon predictably and irreversibly inoperable before isolation is lost
So how is a situation in which the weapon is exposed to an extreme environment, such as a fire or being dropped from an aircraft, handled? Building an energy barrier that could withstand all potential accident environments would be impractical. Instead, weak links are built into the system to ensure that the weapon is rendered inoperable before the energy barrier (or a strong-link) is breached. For example, capacitors that fail when exposed to high temperature could be incorporated into a firing circuit, thereby ‘safing’ the circuit well before the energy barrier fails and allows energy to pass down it.
This principle of designing for predictable failure that leads to a safe state is one that can be applied to any system where there is a known threat environment. For example, a cryogenic pressure vessel could be fitted with a fusible link that allows de-pressurisation before a fire causes catastrophic failure of the containment vessel.
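The inoperability argument boils down to an ordering claim: across its whole tolerance band, the weak link must fail before the thing it protects does. A minimal Python sketch of that check follows. The temperatures are made up purely for illustration, the names (FailureBand, ordering_is_assured) are hypothetical, and the model deliberately ignores heating rate and thermal lag, which a real analysis could not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureBand:
    """Temperature range (deg C) over which a part may fail. Illustrative only."""
    name: str
    earliest: float   # coolest temperature at which failure can occur
    latest: float     # temperature by which failure is certain


def weak_link_margin(weak: FailureBand, barrier: FailureBand) -> float:
    """Gap between the latest weak-link failure and the earliest barrier failure."""
    return barrier.earliest - weak.latest


def ordering_is_assured(weak: FailureBand, barrier: FailureBand,
                        required_margin: float) -> bool:
    """True only if the weak link always fails first, with margin to spare."""
    return weak_link_margin(weak, barrier) >= required_margin


if __name__ == "__main__":
    # Assumed numbers, not data for any real component.
    capacitor = FailureBand("thermal weak link", earliest=180.0, latest=230.0)
    barrier = FailureBand("energy barrier", earliest=400.0, latest=600.0)
    print(weak_link_margin(capacitor, barrier))
    print(ordering_is_assured(capacitor, barrier, required_margin=100.0))
```

The comparison uses the worst case on both sides (latest weak-link failure against earliest barrier failure), because overlapping bands would mean there is some fire in which the barrier goes first.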
Applying fundamental deterministic design principles
As you might have gathered from the inoperability principle, nuclear weapons safety is underpinned by a physics-based approach to safety. The great advantage of using such fundamental physical properties is that they must occur, i.e. the probability of occurrence given the defined trigger is always unity. Basing safety features upon physics thus gives us strongly predictable, deterministic behaviour. An example of a physics-based principle would be the permanent decomposition of the Mylar in a capacitor in a high temperature event.
In the general case this principle would drive the use of inherent physical processes to raise alerts and initiate safety functions, rather than complex electronics or software. For example, spring-powered fuel valves that close when power is removed use an inherent physical process to assure safety in the event of a power outage.
Safety themes and safety cases
Decades before the concept of safety cases emerged in the process and oil industries (see for example Cullen 1990), the nuclear weapons safety community had developed the concept of a safety ‘theme’. The elements of a safety theme include how safety principles were addressed, how fundamental principles have been used and how the number of safety critical components has been minimised.
A safety theme describes in a unified fashion the principles that will be used to assure safety under all expected environments. A safety theme assists in directing effort to meet the major safety requirements and provides a framework in which to communicate the implementations to key stakeholders.
Antonio et al in Isbell 1997
In fact a safety theme closely resembles the current concept of developing standard safety case patterns in order to abstract fundamental safety strategies away from the details of specific projects (McDermid, Kelly 1997). Because other constraints and system objectives exist, there are naturally trade-offs and compromises that need to be made during a weapon’s development. In response, the nuclear safety community requires a series of iterative and progressive safety evaluations to surface and evaluate any potential deviations from the safety theme. Again, it turns out that this iterative evaluation process is mirrored in the practice of developing safety cases iteratively, from a preliminary version through to a final operational version (Chinneck et al. 2004).
Ad hoc versus predictable safety
If we place a critical control circuit board in a fire how is it going to behave? Can you guarantee that it will not generate a potentially hazardous output? If so, how many assumptions did you have to make? The reality is that in those sorts of circumstances one can’t predict with any certainty how such an ad hoc and inherently non-deterministic circuit is going to behave (4). To address this problem, the co-requisite requirement is to design for predictable ‘upon failure’ behaviour based upon fundamental principles. In the example of the fire above, we could use a capacitor in the circuit to act as a thermal weak-link that would hard fail the circuit well before a fire might compromise its behaviour.
In designing for fault tolerance, this idea of predictable failure equates to the concept of a fail-stop processor or, more generally, to the idea of developing a fault hypothesis that specifies the ways in which a system’s components will fail and then ensuring that system behaviour is deterministic (predictable) in such circumstances.
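To make the fail-stop idea a little more concrete, here is a minimal Python sketch. It is an illustration under assumptions rather than a reference design: the FailStopChannel class, the two fault checks and their thresholds are all hypothetical. The fault hypothesis is written down explicitly as a set of named checks, and any detected fault (or anything unanticipated) latches the channel into its safe output.

```python
from enum import Enum


class Output(Enum):
    SAFE = "safe"
    ACTIVE = "active"


class FailStopChannel:
    """Wraps a control function so that any detected fault halts it for good."""

    def __init__(self, compute, fault_checks):
        self._compute = compute            # normal control law
        self._fault_checks = fault_checks  # fault hypothesis: name -> predicate
        self.halted_by = None              # set once, never cleared

    def _halt(self, reason: str) -> Output:
        self.halted_by = reason
        return Output.SAFE

    def step(self, inputs: dict) -> Output:
        if self.halted_by is not None:
            return Output.SAFE             # latched: no recovery in service
        try:
            for name, fault_present in self._fault_checks.items():
                if fault_present(inputs):
                    return self._halt(name)
            return self._compute(inputs)
        except Exception:
            # Outside the fault hypothesis: still stop, still safe.
            return self._halt("unanticipated exception")


if __name__ == "__main__":
    # Assumed thresholds, for illustration only.
    checks = {
        "over-temperature": lambda x: x["temp_c"] > 85.0,
        "under-voltage": lambda x: x["volts"] < 4.5,
    }
    channel = FailStopChannel(
        lambda x: Output.ACTIVE if x["demand"] else Output.SAFE, checks
    )
    print(channel.step({"temp_c": 25.0, "volts": 5.0, "demand": True}))
    print(channel.step({"temp_c": 120.0, "volts": 5.0, "demand": True}))
    print(channel.step({"temp_c": 25.0, "volts": 5.0, "demand": True}))
    print(channel.halted_by)
```

Of course this is software guarding software, which is exactly what the physics-based approach tries to avoid relying upon; the sketch shows the shape of the behaviour, not a substitute for a thermal weak-link.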
While each of the three principles is by itself useful, the real power of the 3I principles emerges when we apply them as a set to the architecture of a system that controls a potentially hazardous function.
1. With the exception of Nancy Leveson’s Safeware.
2. That is, signals that are logically separated from both each other and naturally occurring signals, normally via signal complexity, so as to reduce the likelihood of spoofing. The assumption is that any transmission outside the safety barrier may be compromised.
3. As an example, in the 1961 Goldsboro accident, gyration of the aircraft as it broke up was enough to distort the bomb bay sufficiently to pull the manual arming pin from one of the bombs, thereby initiating the weapon arming sequence as the bomb was thrown clear, with only the pilot-actuated safe/arm switch preventing a full weapon detonation.
4. The term ad hoc is used in the sense that the circuit has been designed to perform its required functions in an implicitly assumed nominal environment, so its behaviour in an abnormal environment will therefore be ad hoc in nature.
Chinneck, P., Pumfrey, D. and McDermid, J., The HEAT/ACT Preliminary Safety Case: A case study in the use of Goal Structuring Notation, Proc. 9th Australian Workshop on Safety-Related Programmable Systems (SCS 2004), Brisbane, Australia. CRPIT, 47. Cant, T., Ed. ACS. 33-41, 2004.
Cullen, W.D., The Public Inquiry into the Piper Alpha Disaster, Dept of Energy, London, HMSO, November 1990.
Isbell, D., (Ed.), High Consequence Operations, July 21-23 1997, Sandia National Laboratories, Albuquerque, New Mexico, 1997.
McDermid, J., Kelly, T.P., Safety Case Construction and Reuse using Patterns, Conf. proc. SAFECOMP 97, pp 55-69, 1997.
Plummer, D.W, Greenwood, W.H., The History of US Nuclear Weapons Safety Devices, Sandia National Laboratories, AIAA Report 98-34-64, 1998.
Spray, S.D., Principle Based Passive Safety In Nuclear Weapon Systems, High Consequence Operations Safety Symposium, Sandia National Laboratories, Albuquerque, 13 July 1994.