A reader of this blog might be aware of both the difference between ergodic and non-ergodic risks, and how the presence of non-ergodicity (i.e. the possibility of irreversible catastrophic outcomes) undermines a key assumption on which Pascalian risk assessment is based. But what to do about it? Well, one practical thing we can do is ensure that when we assess risk we take into account the non-ergodic nature of such catastrophes.
One of the great mistakes is to judge policies and programs by their intentions rather than their results
Earlier this year the US Government declassified a WWII OSS field manual on sabotage. Now the Simple Sabotage Field Manual is not what you might think. No, it’s not a 101 on blowing up bridges, nor is it a cookbook for how to conduct Operation Kutschera, but rather it’s aimed at a lower-key sabotage of ordinary working practices inside the organisation. For example, using conferences and meetings to strategically delay decision making. Nobody gets killed, but that new Panzer design with the Porsche turret? Well, sorry Reichsmarschall, it’ll be buried in design committee until about 1948. Charlie Stross went on Twitter asking for modern updates to the OSS manual. I’m not sure whether that exercise increased or decreased the net sum of human happiness, but hey, it was amusing.
Which got me to thinking, if you read the OSS manual and find that every working day seems like a textbook play courtesy of the boys from Prince William Park, then shouldn’t you logically conclude that you are sitting in the middle of a war? If you see folk in your organisation regularly using moves out of the OSS playbook, they may not be just haplessly incompetent. If nothing else this should make you look at your daily fare of corporate hooey in a new light. So stay frosty people, and remember three times is enemy action.
About time, I hear you say! 🙂
Yes, I’ve just rewritten a post on functional failure taxonomies to include how to use them to gauge the completeness of your analysis. This came out of a question I was asked in a workshop that went something like, ‘OK Mr big-shot consultant, tell us, exactly how do we validate that our analysis is complete?’. That’s actually a fair question: standards like EUROCONTROL’s SAM Handbook and ARP 4761 tell you that you ought to, but are not that helpful in the how-to-do-it department. Hence this post.
Using a taxonomy to determine the coverage of the analysis is one approach to determining completeness. The other is to perform at least two analyses using different techniques and then compare the overlap of hazards using a capture/recapture technique. If there’s a high degree of overlap you can be confident there’s only a small hidden population of hazards as yet unidentified. If there’s a very low overlap, you may have a problem.
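To make the capture/recapture idea concrete, here is a minimal sketch in Python. It is an illustration and not the method from the rewritten post: it uses Chapman's bias-corrected form of the Lincoln-Petersen estimator, it assumes the two analyses are independent and that every hazard is equally likely to be found (rarely strictly true in practice), and the hazard identifiers and the function name `estimate_hidden_hazards` are made up for the example.

```python
from typing import Dict, Set


def estimate_hidden_hazards(first: Set[str], second: Set[str]) -> Dict[str, float]:
    """Estimate the total hazard population from two independent analyses.

    Uses Chapman's variant of the Lincoln-Petersen estimator, which stays
    defined even when the two analyses share no hazards at all.
    """
    overlap = len(first & second)   # hazards found by both analyses
    found = len(first | second)     # distinct hazards found so far
    # Chapman estimator: (n1 + 1)(n2 + 1) / (m + 1) - 1
    total = (len(first) + 1) * (len(second) + 1) / (overlap + 1) - 1
    return {
        "found": found,
        "overlap": overlap,
        "estimated_total": total,
        "estimated_hidden": max(0.0, total - found),
    }


# Hypothetical hazard logs from two different techniques.
analysis_a = {"H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8"}
analysis_b = {"H5", "H6", "H7", "H8", "H9", "H10", "H11", "H12"}

result = estimate_hidden_hazards(analysis_a, analysis_b)
print(result)
```

With half of each analysis overlapping, the estimate suggests a few hazards are still hidden. As the overlap shrinks the estimated total grows quickly, which is the 'you may have a problem' case.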
The 15 commandments of the god of the machine
Herewith are the 15 commandments for thine safety-critical software as spoken by the machine god unto his prophet Kopetz.
- Thou shalt regard the system safety case as thy tabernacle of safety and derive thine critical software failure modes and requirements from it.
- Thou shalt adopt a fundamentally safe architecture and define thy fault tolerance hypothesis as part of this. Even unto the definition of fault containment regions, their modes of failure and likelihood.
- Thine fault tolerance shall include start-up, operating and shutdown states.
- Thine system shall be partitioned to ‘divide and conquer’ the design. Yea such partitioning shall include the precise specification of component interfaces by time and value such that all manner of men shall comprehend them
- Thine project team shall develop a consistent model of time and state for even unto the concept of states and fault recovery by voting is the definition of time important.
- Yea even though thou hast selected a safety architecture pleasing to the lord, yet it is but a house built upon the sand, if no ‘programming in the small’ error detection and fault recovery is provided.
- Thou shalt ensure that errors are contained and do not propagate through the system, for an error idly propagated to a service interface is displeasing to the lord god of safety and invalidates your righteous claims of independence.
- Thou shalt ensure independent channels and components do not have common mode failures, for it is said that homogeneous redundant channels protect only from random hardware failures, neither from the common external cause such as EMI or power loss, nor from the common software design fault.
- Thine voting software shall follow the self-confidence principle for it is said that if the self-confidence principle is observed then a correct FCR will always make the correct decision under the assumption of a single faulty FCR, and only a faulty FCR will make false decisions.
- Thou shalt hide and separate thy fault-tolerance mechanisms so that they do not introduce fear, doubt and further design errors unto the developers of the application code.
- Thou shalt design your system for diagnosis, for it is said that even a righteously designed fault tolerant system may hide such faults from view, whereas thy system’s maintainers must replace the affected LRU.
- Thine interfaces shall be helpful and forgive the operator his errors, neither shall thine system dump the problem in the operator’s lap without prior warning of impending doom.
- Thine software shall record every single anomaly, for your lord god requires that every anomaly observed during operation must be investigated until a root cause is defined.
- Thou shalt mitigate further hazards introduced by your design decisions, for better it is that you not program in C++, yet still is it righteous to prevent the dangling of thine pointers and memory leaks.
- Thou shalt develop a consistent fault recovery strategy such that even in the face of violations of your fault hypothesis thine system shall restart and never give up.
Dispatches from the cyber-front
An interesting episode of the ABC’s Four Corners program this Monday disclosed more about the ongoing attacks against government computer networks. Four Corners’ sources confirmed that, as I predicted at the time, the Bureau of Meteorology infiltration was a beachhead operation to allow further attacks on higher value government targets (such as the Australian Geospatial-Intelligence Organisation and Intelligence/Surveillance assets such as the JORN system). OK, smug mode off.