Why writing a safety case might (actually) be a good idea
Frequent readers of my blog will probably have realised that I’m a little sceptical of safety cases. As Scrooge remarked to Marley’s ghost, “There’s more of gravy than of grave about you, whatever you are!” So too for safety cases: there’s oft more gravy than gravitas about them, in my opinion, regardless of what their proponents might think.
All that being said, there may still be circumstances in which a formal safety case can be genuinely useful. One circumstance in particular is managing that tricky interface between system design and lower level software design. One of the interesting problems that the current software assurance paradigm of ‘safety through correctness’ throws up is that it implicitly relies on the system requirements being correct in regard to system safety properties, and on those requirements being faithfully translated into the high level software requirements. If the correctness assumption is invalid then incidents like the Airbus G-VATL fuel starvation incident can happen, while if the requirements fail in translation then incidents similar to the loss of the Mars Polar Lander can occur. Nor is there much happiness to be had by proceeding into software design sans safety requirements and then trying to generate a set of lower level ‘software’ safety requirements after the fact. At that level the requirements we work with are not concerned with system level properties, like safety, but with the lower level functional workings of the software. ‘Safety through correctness’ standards such as DO-178 or DO-278 are quite good at ensuring robustness against these software requirements, which generally means that the system’s behaviour at that level of observation is unlikely to fail relative to the low level specifications; the hazardous behaviour is emergent (and observable) only at the higher system level.
So it seems that there’s actually a practical use for a safety case: to document what safety requirements we are concerned with at the higher system level, and what critical properties we might want to monitor to ensure the safety of the system (1). Here a safety case gives us a logical place to consolidate our thinking about what the safety properties of the system are, how ‘perfect’ any safety functions (such as monitors) must be, the fidelity of the parameters we have decided to manage and measure relative to the unsafe states we are concerned about, the potential costs of getting the response wrong (either way), and what claim limits we might express over the resultant system architecture. A further advantage of such an argument space is that it gives us a place to express, discuss and address the epistemic uncertainties that are usually an integral part of safety critical systems design, and to do so with some hope of parsimony. Once you proceed across the epistemic divide into the domain of software ‘correctness’, addressing such uncertainties becomes something of a moot point (2).
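To make the idea of monitoring system level safety properties a little more concrete, here’s a minimal sketch of such a monitor in Python. It checks two simple, diverse properties (a minimum fuel reserve and a maximum tank imbalance) rather than re-implementing the fuel management software’s own logic; the thresholds embody the ‘how perfect must the monitor be’ trade-off, since tightening them trades missed hazards for spurious alarms. All class names, parameters and threshold values are illustrative assumptions, not drawn from any real aircraft system or standard.

```python
# Hypothetical sketch of a system-level safety monitor. The monitor
# checks simple safety properties diverse from the control software's
# low-level requirements. All names and thresholds are illustrative.
from dataclasses import dataclass


@dataclass
class FuelState:
    tank_contents_kg: list[float]  # per-tank fuel quantity
    total_demand_kg: float         # fuel the engines currently require


class FuelSafetyMonitor:
    """Checks system-level safety properties of the fuel state."""

    def __init__(self, min_reserve_kg: float, max_imbalance_kg: float):
        # Thresholds encode the trade-off between spurious alarms
        # (too tight) and missed hazards (too loose).
        self.min_reserve_kg = min_reserve_kg
        self.max_imbalance_kg = max_imbalance_kg

    def check(self, state: FuelState) -> list[str]:
        alerts = []
        total = sum(state.tank_contents_kg)
        # Property 1: enough fuel remains after current demand.
        if total - state.total_demand_kg < self.min_reserve_kg:
            alerts.append("RESERVE_LOW")
        # Property 2: no gross imbalance between tanks.
        spread = max(state.tank_contents_kg) - min(state.tank_contents_kg)
        if spread > self.max_imbalance_kg:
            alerts.append("IMBALANCE")
        return alerts


monitor = FuelSafetyMonitor(min_reserve_kg=2000.0, max_imbalance_kg=1500.0)
print(monitor.check(FuelState([6000.0, 5800.0], total_demand_kg=500.0)))
# -> []
print(monitor.check(FuelState([4000.0, 100.0], total_demand_kg=2500.0)))
# -> ['RESERVE_LOW', 'IMBALANCE']
```

The point of the sketch is the diversity argument from the footnoted Littlewood and Rushby paper: because the monitored properties are stated at the system level, the monitor’s potential failures are unlikely to be correlated with defects in the low level software requirements.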
One of the other things that concerns me about safety cases, as with other ‘assurance products’, is that when they are written simply to satisfy some external fiat they are inevitably viewed as just another costly nil value document, and inevitably drift away from the design as a result. If instead we embedded the safety case within the development process, as the way that the system designers must establish and communicate a clear set of safety requirements to the software designers, then lo, our safety case actually has value and relevance to the project (3). In effect, it becomes an assurance based communications tool (4). Such a live case would be relevant, contemporaneous, and give an external assessor a direct insight into how mature the project is in its thinking about safety at one of the more critical program interfaces.
1. Bev Littlewood and John Rushby (2012) point out that such system safety properties are also ideal candidates for monitoring in a monitored architecture pattern as they are diverse from those that appear in the lower level software specifications. I recommend their paper.
2. As evinced by the often frustrating discussions between software and safety engineers.
4. As proposed by Knight and Graydon (2007).
Knight, J.C., and Graydon, P., 2007. Engineering, communication, and safety, in Proc. of the 12th Australian Workshop on Safety Critical Systems and Software and Safety-Related Programmable Systems – Vol. 86 (SCS ’07), Tony Cant (Ed.), pp. 31-39.
Littlewood, B., and Rushby, J., 2012. Reasoning about the Reliability of Diverse Two-Channel Systems in Which One Channel Is “Possibly Perfect”, IEEE Transactions on Software Engineering, 38(5), pp. 1178-1194.