Why robust system design has unexpected and strong implications for system operators
Recent work in complexity and robustness theory for engineered systems has highlighted that the architectures with which these systems are designed inherently lead to ‘robust yet fragile’ behaviour (Carlson and Doyle 2002); such systems are characterised as exhibiting Highly Optimised Tolerance (HOT).
A HOT system is ‘robust’ in the sense that it can provide reliable behaviour in the face of unreliable internal components and expected environmental perturbations. But a HOT system is also ‘fragile’ in the sense that it exhibits a hyper-sensitivity to unexpected environmental conditions or design flaws, a sensitivity that is inherently linked to the robustness structures of the system itself (1).
This vulnerability has strong implications for the human operator who is expected to intervene in response to the failure of the system, because in HOT systems operator intervention will be in response to epistemic failures (2) of the system rather than random failures of components. If this is the case, then for HOT systems traditional skill- and rule-based training for operators may simply be missing the point: when a HOT system fails, it will fail because of unanticipated causes or unique environmental conditions, demanding a knowledge-based rather than rule- or skill-based response.
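The distinction between random and epistemic failure can be sketched with a toy example. Mid-value (median) voting across redundant sensor channels is a standard robustness mechanism; the numbers and the icing scenario below are hypothetical, chosen only to echo the unreliable-airspeed case. A random single-channel fault is masked by the vote, but a design-level fault shared by all channels sails straight through it.

```python
def vote(readings):
    """Mid-value select over three redundant channels:
    returns the median, masking any single faulty channel."""
    return sorted(readings)[1]

# Random component failure: one of three airspeed channels dies.
# The voting logic masks it, and the system carries on correctly.
random_fail = [250.0, 250.5, 0.0]   # channel 3 has failed to zero
print(vote(random_fail))            # -> 250.0, the fault is masked

# Epistemic failure: all three channels share the same flawed design
# assumption (e.g. every pitot probe ices over in the same conditions),
# so the channels agree with each other while all being wrong.
common_mode = [60.0, 61.0, 59.5]    # all channels read low together
print(vote(common_mode))            # -> 60.0, plausible but wrong
```

The redundancy structure that makes the system robust to the first case is exactly what makes the second case so insidious: the vote reports a confident, consistent, incorrect value.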
A BEA study (BEA 2009) of aircrew responses to thirteen separate unreliable-airspeed incidents offers a case in point. In the majority of cases the crews, while identifying an unreliable airspeed event, did not automatically respond with the ‘canned’ procedure. So what were they doing? I believe they recognised (perhaps only unconsciously) that the situation was unprecedented, and were applying their experience and knowledge to think through the problem they faced.
The problem, of course, is that one of the (unstated) objectives of automation is to reduce the skill levels and experience required of operators. So we appear to have entered a vicious spiral in which HOT system failures require high levels of operator knowledge and associated problem-solving skills, while at the same time the automation allows less skilled and experienced operators to use the system.
Perhaps it is time to re-imagine the role of the operator in highly automated robust systems.
1. This is closely aligned to the observation that in modern systems accidents are no longer caused by ‘component failure’ but by the interaction of components (Leveson 1995).
2. Such failures may be due to design flaws and unanticipated changes in the assumed operating environment.
Carlson, J.M. and Doyle, J., ‘Complexity and Robustness’, Proceedings of the National Academy of Sciences, vol. 99, suppl. 1, p. 2545, 19 February 2002.
BEA, Interim Report no. 2 on the accident on 1 June 2009 to the Airbus A330-203, registered F-GZCP, operated by Air France, flight AF 447 Rio de Janeiro – Paris, report no. f-cp090601ae2, November 2009.
Leveson, N.G., Safeware: System Safety and Computers, Addison-Wesley, 1995.