*As a preamble I thought about titling this post “Some Like It HOT (1)”, but then thought better of inflicting such a pun on my readers.
In this post I’m going to briefly discuss the almost ubiquitous and unchallenged use of median value voting algorithms as part of fault tolerant design (1), and why this has led to control systems that exhibit Highly Optimised Tolerance or HOT (2).
This post is part of the Airbus aircraft family and system safety thread.
This is especially true for avionics systems and their sensors, where space and weight dictate the use of the minimum amount of physical redundancy, e.g. triplex redundancy schemes for air data. So, needing a redundant voting scheme, it seems a straightforward decision to use a median voting algorithm for air data systems. After all, everyone ‘knows’ that it’s the best.
Well, actually no. Median voting works well for single failures but performs quite differently once we start considering multiple failures. A study by Bass, Latif-Shabgahi and Bennett (1997) found that, while producing the largest number of correct results, median voting also produced the largest number of catastrophic (extreme value) errors.
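To make the single versus multiple failure distinction concrete, here is a minimal Python sketch of a triplex mid-value select. It is a toy illustration only, not the experimental method used in the study cited above; the function name `mid_value_select`, the airspeed figures and the failure cases are all assumptions of mine.

```python
from statistics import mean

TRUE_AIRSPEED = 250.0  # hypothetical true value in knots (assumption)


def mid_value_select(a, b, c):
    """Return the middle of three readings using only comparisons."""
    return max(min(a, b), min(max(a, b), c))


# Hypothetical triplex air data samples (all values are made up).
cases = {
    "all healthy":    (249.0, 250.0, 251.0),
    "single failure": (249.0, 250.0, 900.0),  # one sensor fails high
    "double failure": (249.0, 880.0, 900.0),  # two sensors fail high together
}

for name, readings in cases.items():
    voted = mid_value_select(*readings)
    averaged = mean(readings)
    print(
        f"{name:15s} median={voted:7.1f} (error {abs(voted - TRUE_AIRSPEED):6.1f})  "
        f"mean={averaged:7.1f} (error {abs(averaged - TRUE_AIRSPEED):6.1f})"
    )
```

With one sensor failed, the voter masks the fault completely and returns 250, while the mean is dragged up to about 466. With two sensors failed in the same direction the voter passes one of the bad values (880) straight through, and in this made-up case its error is larger than that of the simple mean (about 676).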
So here we have a system design that, while robust to common perturbations, remains vulnerable (2) to rare events. Sounds suspiciously like a HOT system, doesn’t it? What’s interesting is that current aviation regulations, the FAR/JAR, actually drive such design trade-offs by focusing on the requirement to address single points of failure. Naturally you end up with a system that is robust in the face of single failures but vulnerable to unanticipated, read unspecified, multiple failures. The vulnerability (3) of median voting algorithms is a case in point. Somewhat unsettling when you consider that you’re flying on a bunch of these algorithms.
1. Median value voting avoids sensor ‘average drag’ as a sensor value moves towards the threshold and, as a sample statistic, it handles outlier skewing of the data set better than the mean.
From a software engineering perspective it fits a triple redundant architecture, with the processing reduced to a simple comparison and selection task, which is important for real-time control applications.
2. Highly Optimised Tolerance (HOT) systems are ‘robust yet fragile’. Robust to anticipated (designed for) perturbations but fragile, or vulnerable, to rare events, design flaws and unanticipated perturbations (Carlson, Doyle 2002).
3. By vulnerable I mean that it can produce an extreme value error which has catastrophic consequences.
Bass, J.M., Latif-Shabgahi, G., Bennett, S., Experimental Comparison of Voting Algorithms in Cases of Disagreement, Proc. of the 23rd EUROMICRO Conference ’97, New Frontiers of Information Technology, p. 516, 1997.
Carlson, J.M., Doyle, J., Complexity and Robustness, Proc. of the National Academy of Sciences, vol. 99, suppl. 1, p. 2545, 19 February 2002.