One of my somewhat perennial concerns when reviewing a functional hazard analysis (FHA) is what’s termed the completeness question: in this case, whether all the potentially hazardous functional failure modes have been considered, and to what degree. The failures of a physical component are generally expressed in the physical domain and described in terms of failure modes related to the actual physical nature and design of the item. As an example, the failure modes of a switch could include: fail to open, partially open, closed, partially closed, and chatter. For physical failures we find that the description of consequence is bound up intimately with the physical ‘mode’ or mechanism of failure.
In contrast, the model of failure used in functional failure analysis concerns itself with the external effect or phenomenon rather than the underlying mechanism. See for example Ezhilchelvan and Shrivastava’s (1989) failure type hierarchy, Bondavalli and Simoncini’s (1990) value and timing domain failure classes, or Pumfrey’s (1990) service model. These are all taxonomies (1) of functional failure. Now taxonomies are generally either ‘genotype’, that is based on the underlying design, or ‘phenotype’, that is based on observable phenomena. The quoted functional failure taxonomies are, by definition, all about phenotype. The top blue boxes in the figure below illustrate this style of taxonomy.
So why is this important? As I noted in the introduction to this post, there are two questions we need to answer regarding the quality of hazard analyses. First, is it complete? And secondly, how do we know, and can we prove it? Let’s look at ARP 4754 for a second: here functional hazard analysis is supposed to examine aircraft and aircraft system functions to identify potential functional failures and to classify the severity of the hazards associated with specific failure conditions. But this all hinges on how you define a function, which bounds what you define as a functional failure.
At its most abstract a function accepts some input, transforms it, and generates an output. More completely, we can specify transition criteria (when a function is working) as well as input and output value and timing constraints (2). Even more completely, we could specify the resources that the function consumes to generate its output (power, CPU, memory, inter alia). Each level of descriptive completeness inherently constrains the completeness of any analysis of that description. So I could conduct a simplistic FHA that looks only at omission/commission failures, because I only have a first-order model available. But in doing so I may be overlooking a critical failure mode related to timing or value errors. In essence, your analysis is constrained by the level of abstraction of the system’s functional specification.
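To make the point concrete, here is a minimal sketch of such a failure-domain taxonomy in Python. The domain names follow the service-model style of the taxonomies cited above (omission, commission, timing, value); the exact names and descriptions here are illustrative, not taken from any one of those sources.

```python
from enum import Enum

class FailureDomain(Enum):
    """Phenotype functional failure domains (illustrative names)."""
    OMISSION = "service not delivered when required"
    COMMISSION = "service delivered when not required"
    EARLY = "service delivered before its timing constraint"
    LATE = "service delivered after its timing constraint"
    VALUE = "service delivered with an incorrect value"

# A first-order functional model only exposes the first two domains;
# the timing and value domains need a richer functional specification.
first_order = {FailureDomain.OMISSION, FailureDomain.COMMISSION}
full_model = set(FailureDomain)

# The domains a first-order FHA silently leaves unexamined:
unexamined = sorted(d.name for d in full_model - first_order)
print(unexamined)  # → ['EARLY', 'LATE', 'VALUE']
```

The set difference is the crux: whatever your specification cannot express, your analysis cannot cover.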
The route out of this quagmire is to develop a taxonomy of functional failure modes, which I call domains, that you can use to identify the level of incompleteness of an analysis. The top third of the figure illustrates this taxonomy, while in the middle I map the standard ‘functional specification’ constructs onto these phenotype failure modes. At the bottom I refer back to some ‘black box’ causes for the failure. This gives us the standard hazard syntax of source/mechanism/outcome in the failure domain. If functions have been allocated to components you could, of course, substitute ‘Component X’ or ‘Software Unit Y’ faults as causation.
Function X hazard. An unintended interaction with function Y (mechanism/cause, genotype) led to the spurious commission (aberrant functional output, phenotype) of function X (functional source, genotype) and subsequent mishap event Z (loss outcome, phenotype).
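That hazard syntax can be captured as a simple template. The sketch below (my own illustrative structure, not anything mandated by the standards) holds the four elements separately and renders them into the statement form above:

```python
from dataclasses import dataclass

@dataclass
class FunctionalHazard:
    function: str   # functional source (genotype)
    mechanism: str  # mechanism/cause (genotype)
    phenotype: str  # aberrant functional output (phenotype)
    outcome: str    # loss outcome (phenotype)

    def statement(self) -> str:
        return (f"{self.mechanism} led to the {self.phenotype} "
                f"of {self.function} and subsequent {self.outcome}.")

h = FunctionalHazard(
    function="function X",
    mechanism="An unintended interaction with function Y",
    phenotype="spurious commission",
    outcome="mishap event Z")
print(h.statement())
```

Keeping the four elements as separate fields, rather than free text, is what lets you later count which phenotype domains an analysis has and has not touched.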
The advantage of such a model is that if you’re doing an FHA and covering less than the exhaustive set of relationships, it highlights the degree to which your analysis is incomplete. The easiest way to check is to take your FHA worksheet, count the number of individual functional failures identified in it, and divide by the number of functions assessed. If your answer is at the ‘one or two’ end of the spectrum then the analysis is probably only considering a simple commission/omission failure model (3). If on the other hand the answer is around ‘four or five’ then you’re probably dealing with a level of completeness comparable to Pumfrey’s model. A quick cross check against the degree of functional definition available in the model will tell you whether the analyst has been pond skating or not.
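This rough cross-check is trivially mechanisable. The sketch below uses a hypothetical worksheet (the function names and failure modes are made up for illustration) to compute the failures-per-function ratio just described:

```python
# Hypothetical FHA worksheet: function name -> identified failure modes.
worksheet = {
    "provide thrust": ["omission", "commission"],
    "indicate airspeed": ["omission", "commission", "late", "value"],
    "lower landing gear": ["omission"],
}

failures = sum(len(modes) for modes in worksheet.values())
functions = len(worksheet)
ratio = failures / functions

print(f"{failures} failures over {functions} functions "
      f"= {ratio:.1f} per function")
# A ratio near 1-2 suggests a bare omission/commission model;
# nearer 4-5 suggests something closer to Pumfrey's service model.
```

It is a heuristic, not a proof of completeness: a high ratio only tells you the analyst looked across more domains, not that they looked in the right places.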
The question of what degree of completeness is required is one that you as the analyst need to consider and at the very least it should be acknowledged.
2. One should not simply consider the relationship between an input and an output, but also relationships across the set of inputs and the set of outputs. For example, there may be a safety-related constraint placed upon the environment in terms of input arrival rates.
3. Or the analyst has decided not to document the functional failures that did not result in a hazard, which is also a ‘bad thing’.
Ezhilchelvan, P. D. and Shrivastava, S. K. A Classification of Faults in Systems. Newcastle : University of Newcastle upon Tyne, 1989. Technical Report.
Bondavalli, A. and Simoncini, L. Failure Classification with Respect to Detection, Task B: Specification and Design for Dependability. s.l. : ESPRIT BRA Project, 1990. ESPRIT BRA Project 3092 Predictably Dependable Computing Systems.
Pumfrey, D.J. The Principled Design of Computer System Safety Analyses. York : Department of Computer Science, University of York, 1990. PhD Thesis.