Archives For Safety

The practice of safety engineering in various high consequence industries.

System Safety Fundamentals Concept Cloud

System safety course, now with more case studies and software safety!

Have just added a couple of case studies and some course notes of software hazards and integrity partitioning, because hey I know you guys love that sort of stuff :)

System Safety Fundamentals Concept Cloud

I have finally got around to putting my safety course notes up, enjoy. You can also find them off the main menu.

Feel free to read and use under the terms of the associated creative commons license. I’d note that these are course notes so I use a large amount of example material from other sources (because hey, a good example is a good example right?) and where I have a source these are acknowledged in the notes. If you think I’ve missed a citation or made an error, then let me know.

To err is human, but to really screw it up takes a team of humans and computers…

How did a state of the art cruiser operated by one of the worlds superpowers end up shooting down an innocent passenger aircraft? To answer that question (at least in part) here’s a case study that’s part of the system safety course I teach that looks at some of the casual factors in the incident.

In the immediate aftermath of this disaster there was a lot of reflection, and work done, on how humans and complex systems interact. However one question that has so far gone unasked is simply this. What if the crew of the USS Vincennes had just used the combat system as it was intended? What would have happened if they’d implemented a doctrinal ruleset that reflected the rules of engagement that they were operating under and simply let the system do its job? After all it was not the software that confused altitude with range on the display, or misused the IFF system, or was confused by track IDs being recycled… no, that was the crew.

Consider the effect that the choice of a single word can have upon the success or failure of a standard.The standard is DO-278A, and the word is, ‘approve’. DO-278 is the ground worlds version of the aviation communities DO-178 software assurance standard, intended to bring the same level of rigour to the software used for navigation and air traffic management. There’s just one tiny difference, while DO-178 use the word ‘certify’, DO-278 uses the word ‘approve’, and in that one word lies a vast difference in the effectiveness of these two standards.

DO-178C has traditionally been applied in the context of an independent certifier (such as the FAA or JAA) who does just that, certifies that the standard has been applied appropriately and that the design produced meets the standard. Certification is independent of the supplier/customer relationship, which has a number of clear advantages. First the certifying body is indifferent as to whether the applicant meets or does not meet the requirements of DO-178C so has greater credibility when certifying as they are clearly much less likely to suffer from any conflict of interest. Second, because there is one certifying agency there is consistent interpretation of the standard and the fostering and dissemination of corporate knowledge across the industry through advice from the regulator.

Turning to DO-278A we find that the term ‘approver’ has mysteriously (1) replaced the term ‘certify’. So who, you may ask, can approve? In fact what does approve mean? Well the long answer short is anyone can, approve and the term means whatever you make of it. What it usually results in is the standard invoked between a supplier and customer, with the customer acting as the ‘approver’ of the standards application. This has obvious and significant implications for the degree of trust that we can place in the approval given by the customer organisation. Unlike an independent certifying agency the customer clearly has a corporate interest in acquiring the system which may well conflict with the object of fully complying with the requirements of the standard. Give that ‘approval’ is given on a contract basis between two organisations and often cloaked in non-disclosure agreements there is also little to no opportunity for the dissemination of useful learnings as to how to meet the standard. Finally when dealing with previously developed software the question becomes not just ‘did you apply the standard?’, but also ‘who was it that actually approved your application?’ and ‘How did they actually interpret the standard?’.

So what to do about it? To my mind the unstated success factor for the original DO-178 standard was in fact the regulatory environment in which it was used. If you want DO-278A to be more than just a paper tiger then you should also put in place mechanism for independent certification. In these days of smaller government this is unlikely to involve a government regulator, but there’s no reason why (for example) the independent safety assessor concept embodied in IEC 61508 could not be applied with appropriate checks and balances (1). Until that happens though, don’t set too much store by pronouncements of compliance to DO-278.

Final thought, I’m currently renovating our house and have had to employ an independent certifier to sign off on critical parts of the works. Now if I have to do that for a home renovation, I don’t see why some national ANSP shouldn’t have to do it for their bright and shiny toys.


1. Perhaps Screwtape consultants were advising the committee. :)

2. One of the problems of how 61508 implement the ISA is that they’re still paid by the customer, which leads in turn to the agency problem. A better scheme would be an industry fund into which all players contribute and from which the ISA agent is paid.


…for my boat is so small and the ocean so huge

For a small close knit community like the submarine service the loss of a boat and it’s crew can strike doubly hard. The USN’s response to this disaster, was both effective and long lasting. Doubly impressive given it was implemented at the height of the Cold War. As part of the course that I teach on system safety I use the Thresher as an important case study in organisational failure, and recovery.


The RAN’s Collins class Subsafe program derived it’s strategic principles in large measure from the USNs original program. The successful recovery of HMAS Dechaineux from a flooding incident at depth illustrates the success of both the RANs Subsafe program and also its antecedent.

Interesting documentary on SBS about the Germanwings tragedy, if you want a deeper insight see my post on the dirty little secret of civilian aviation. By the way, the two person rule only works if both those people are alive.

What burns in Vegas…

Ladies and gentlemen you need to leave, like leave your luggage!

This has been another moment of aircraft evacuation Zen.

Lady Justice (Image source: Jongleur CC-BY-SA-3.0)

Or how I learned to stop worrying about trifles and love the Act

One of the Achilles heels of the current Australian WH&S legislation is that it provides no clear point at which you should stop caring about potential harm. While there are reasons for this, it does mean that we can end up with some theatre of the absurd moments where someone seriously proposes paper cuts as a risk of concern.

The traditional response to such claims of risk is to point out that actually the law rarely concerns itself with such trifles. Or more pragmatically, as you are highly unlikely to be prosecuted over a paper cut it’s not worth worrying about. Continue Reading…

Piece of wing found on La Réunion Island, is that could be flap of #MH370 ? Credit: reunion 1ere

Piece of wing found on La Réunion Island (Image source: reunion 1ere)

Why this bit of wreckage is unlikely to affect the outcome of the MH370 search

If this really is a flaperon from MH370 then it’s good news in a way because we could use wind and current data for the Indian ocean to determine where it might have gone into the water. That in turn could be used to update a probability map of where we think that MH370 went down, by adjusting our priors in the Bayesian search strategy. Thereby ensuring that all the information we have is fruitfully integrated into our search strategy.

Well… perhaps it could, if the ATSB were actually applying a Bayesian search strategy, but apparently they’re not. So the ATSB is unlikely to get the most out of this piece of evidence and the only real upside that I see to this is that it should shutdown most of the conspiracy nut jobs who reckoned MH370 had been spirited away to North Korea or some such. :)

The offending PCA serial cable linking the comms module to the motherboard (Image source: Billy Rios)

Hannibal ante portas!

A recent article in Wired discloses how hospital drug pumps can be hacked and the firmware controlling them modified at will. Although in theory the comms module and motherboard should be separated by an air gap, in practice there’s a serial link cunningly installed to allow firmware to be updated via the interwebz.

As the Romans found, once you’ve built a road that a legion can march down it’s entirely possible for Hannibal and his elephants to march right up it. Thus proving once again, if proof be needed, that there’s nothing really new under the sun. In a similar vein we probably won’t see any real reform in this area until someone is actually killed or injured.

This has been another Internet of Things moment of zen.

rocket-landing-attempt (Image source- Space X)

How to make rocket landings a bit safer easier

No one should underestimate how difficult landing a booster rocket is, let alone onto a robot barge that’s sitting in the ocean. The booster has to decelerate to a landing speed on a hatful of fuel, then maintain a fixed orientation to the deck while it descends, all the while counteracting the dynamic effects of a tall thin flexible airframe, fuel slosh, c of g changes, wind and finally landing gear bounce when you do hit. It’s enough to make an autopilot cry. Continue Reading…

Here’s a companion tutorial to the one on integrity level partitioning. This addresses more general software hazards and how to deal with them. Again you can find a more permanent link on my publications page. Enjoy :)

In celebration of upgrading the site to WP Premium here’s some gratuitous eye candy :)

A little more seriously, PIO is one of those problems that, contrary to what the name might imply, requires one to treat the aircraft and pilot as a single control system.

The enigmatic face of HAL

The problem of people

The Hal effect, named after the eponymous anti-hero of Stanley Kubrick and Arthur C. Clarke’s film 2001, is the tendency for designers to implicitly embed their cultural biases into automation. While such biases are undoubtedly a very uncertain guide it might also be worthwhile to look at the 2001 Odyssey mission from Hal’s perspective for a moment. Continue Reading…

Comet (Image source: Public domain)

Amidst all the online soul searching, and pontificating about how to deal with the problem of suicide by airliner, which is let’s face it still a very, very small risk, what you are unlikely to find is any real consideration of how we have arrived at this pass. There is as it turns out a simple one word answer, and that word is efficiency, the dirty little secret of the aviation industry. Continue Reading…


Another A320 crash

25/03/2015 — 4 Comments

Germanwings crash (Image source: AFP)

The Germanwings A320 crash

At this stage there’s not more that can be said about the particulars of this tragedy that has claimed a 150 lives in a mountainous corner of France. Disturbingly again we have an A320 aircraft descending rapidly and apparently out of control, without the crew having any time to issue a distress call. Yet more disturbing is the though that the crash might be due to the crew failing to carry out the workaround for two blocked AoA probes promulgated in this Emergency Airworthiness Directive (EAD) that was issued in December of last year. And, as the final and rather unpleasant icing on this particular cake, there is the followup question as to whether the problem covered by the directive might also have been a causal factor in the AirAsia flight 8501 crash. That, if it be the case, would be very, very nasty indeed.

Unfortunately at this stage the answer to all of the above questions is that no one knows the answer, especially as the Indonesian investigators have declined to issue any further information on the causes of the Air Asia crash. However what we can be sure of is that given the highly dependable nature of aircraft systems the answer when it comes will comprise an apparently unlikely combinations of events, actions and circumstance, because that is the nature of accidents that occur in high dependability systems. One thing that’s also for sure, there’ll be little sleep in Toulouse until the FDRs are recovered, and maybe not much after that….


if having read the EAD your’e left wondering why it directed that two ADR’s be turned off it’s simply that by doing so you push the aircraft out of what’s called Normal law, where Alpha protection is trying to drive the nose down, into Alternate law, where the (erroneous) Alpha protection is removed. Of course in order to do so you need to be able to recognise, diagnose and apply the correct action, which also generally requires training.

MH370 underwater search area map (Image source- Australian Govt)

Bayes and the search for MH370

We are now approximately 60% of the way through searching the MH370 search area, and so far nothing. Which is unfortunate because as the search goes on the cost continues to go up for the taxpayer (and yes I am one of those). What’s more unfortunate, and not a little annoying, is that that through all this the ATSB continues to stonily ignore the use of a powerful search technique that’s been used to find everything from lost nuclear submarines to the wreckage of passenger aircraft.  Continue Reading…

Here’s an interesting graph that compares Class A mishap rates for USN manned aviation (pretty much from float plane to Super-Hornet) against the USAF’s drone programs. Interesting that both programs steadily track down decade by decade, even in the absence of formal system safety programs for most of the time (1).

USN Manned Aviation vs USAF Drones

The USAF drone program start out with around the 60 mishaps per 100,000 flight hour rate (equivalent to the USN transitioning to fast jets at the close of the 1940s) and maintains a steeper decrease rate that the USN aviation program. As a result while the USAF drones program is tail chasing the USN it still looks like it’ll hit parity with the USN sometime in the 2040s.

So why is the USAF drone program doing better in pulling down the accident rate, even when they don’t have a formal MIL-STD-882 safety program?

Well for one a higher degree of automation does have comparitive advantages. Although the USN’s carrier aircraft can do auto-land, they generally choose not to, as pilot’s need to keep their professional skills up, and human error during landing/takeoff inevitably drives the mishap rate up. Therefore a simple thing like implementing an auto-land function for drones (landing a drone is as it turns out not easy) has a comparatively greater bang for your safety buck. There’s also inherently higher risks of loss of control and mid air collision when air combat manoeuvring, or running into things when flying helicopters at low level which are operational hazards that drones generally don’t have to worry about.

For another, the development cycle for drones tends to be quicker than manned aviation, and drones have a ‘some what’ looser certification regime, so improvements from the next generation of drone design tend to roll into an expanding operational fleet more quickly. Having a higher cycle rate also helps retain and sustain the corporate memory of the design teams.

Finally there’s the lessons learned effect. With drones the hazards usually don’t need to be identified and then characterised. In contrast with the early days of jet age naval aviation the hazards drone face are usually well understood with well understood solutions, and whether these are addressed effectively has more to do with programmatic cost concerns than a lack of understanding. Conversely when it actually comes time to do something like put de-icing onto a drone, there’s a whole lot of experience that can be brought to bear with a very good chance of first time success.

A final question. Looking at the above do we think that the application of rigorous ‘FAA like’ processes or standards like ARP 4761, ARP 4754 and DO-178 would really improve matters?

Hmmm… maybe not a lot.


1. As a historical note while the F-14 program had the first USN aircraft system safety program (it was a small scale contractor in house effort) it was actually the F/A-18 which had the first customer mandated and funded system safety program per MIL-STD-882. USAF drone programs have not had formal system safety programs, as far as I’m aware.
Continue Reading…

SR-71 flight instruments (Image source: triddle)

How a invention that flew on the SR-71 could help commercial aviation today 

In a previous post on unusual attitude I talked about the use of pitch ladders as a means of providing greater attensity to aircraft attitude as well as a better indication of what the aircraft is dong, having entered into it. There are, of course, still disadvantages to this because such data in a commercial aircraft is usually presented ‘eyes down’, and in high stress, high workload situations it can be difficult to maintain an instrument scan pattern. There is however an alternative, and one that has a number of allied advantages. Continue Reading…

Unreliable airspeed events pose a significant challenge (and safety risk) because such situations throw onto aircrew the most difficult (and error prone) of human cognitive tasks, that of ‘understanding’ a novel situation. This results in a double whammy for unreliable airspeed incidents. That is the likelihood of an error in ‘understanding’ is far greater than any other error type, and having made that sort of error it’s highly likely that it’s going to be a fatal one. Continue Reading…

A while ago, while I was working on a project that would have been based (in part) in Queensland I was asked to look at the implications of the Registered Professional Engineers Queensland act for the project, and in particular for software development. For those not familiar, the Act provides for the registration of professional engineers to practise in Queensland. If you’re not registered you can’t practice unless you’re supervised by a registered engineer. Upon registering you then become liable to a statutory Board of Professional Engineers for your professional conduct. Oh yes and practicing without coverage is a crime.

While the act is oriented squarely at the provision of professional services, don’t presume that it is solely the concern of consultancies.  Continue Reading…

AirAsia QZ8501 CVR (Image source - AP Photo-Achmad Ibrahim)

Stall warning and Alternate law

This post is part of the Airbus aircraft family and system safety thread.

According to an investigator from Indonesia’s National Transportation Safety Committee (NTSC) several alarms, including the stall warning, could be heard going off on the Cockpit Voice Recorder’s tape.

Now why is that so significant?

Continue Reading…

Aviation is in itself not inherently dangerous. But to an even greater degree than the sea, it is terribly unforgiving of any carelessness, incapacity or neglect.


Captain A. G. Lamplugh, British Aviation Insurance Group, London, 1930s.

I was cleaning out my (metaphorical) sock drawer and came across this rough guide to the workings of the Australian Defence standard on software safety DEF(AUST) 5679. The guide was written around 2006 for Issue 1 of the standard, although many of the issues it discussed persisted into Issue 2, which hit the streets in 2008.

DEF (AUST) 5679 is an interesting standard, one can see that the authors, Tony Cant amongst them, put a lot of thought into the methodology behind the standard, unfortunately it’s suffered from a failure to achieve large scale adoption and usage.

So here’s my thoughts at the time on how to actually use the standard to best advantage, I also threw in some concepts on how to deal with xOTS components within the DEF (AUST) 5679 framework.

Enjoy :)

Indonesian AirNav radar screenshot (Image source: Aviation Herald)

So what did happen?

This post is part of the Airbus aircraft family and system safety thread

While the media ‘knows’ that the aircraft climbed steeply before rapidly descending, we should remember that this supposition relies on the self reported altitude and speed of the aircraft. So we should be cautious about presuming that what we see on a radar screen is actually what happened to the aircraft. There are of course also disturbing similarities to the circumstances in which Air France AF447 was lost, yet at this moment all they are are similarities. One things for sure though, there’ll be little sleep in Toulouse until the FDRs are recovered.

Boeing 787-8 N787BA cockpit (Image source: Alex Beltyukov CC BY-SA 3.0)

The Dreamliner and the Network

Big complicated technologies are rarely (perhaps never) developed by one organisation. Instead they’re a patchwork quilt of individual systems which are developed by domain experts, with the whole being stitched together by a single authority/agency. This practice is nothing new, it’s been around since the earliest days of the cybernetic era, it’s a classic tool that organisations and engineers use to deal with industrial scale design tasks (1). But what is different is that we no longer design systems, and systems of systems, as loose federations of entities. We now think of and design our systems as networks, and thus our system of systems have become a ‘network of networks’ that exhibit much greater degrees of interdependence.

Continue Reading…

787 Battery after fire (Image source: NTSB)

The NTSB have released their final report on the Boeing 787 Dreamliner Li-Ion battery fires. The report makes interesting reading, but for me the most telling point is summarised in conclusion seven, which I quote below.

Conclusion 7. Boeing’s electrical power system safety assessment did not consider the most severe effects of a cell internal short circuit and include requirements to mitigate related risks, and the review of the assessment by Boeing authorized representatives and Federal Aviation Administration certification engineers did not reveal this deficiency.

NTSB/AIR-14/01  (p78 )

In other words Boeing got themselves into a position with their safety assessment where their ‘assumed worst case’ was much less worse case than the reality. This failure to imagine the worst ensured that when they aggressively weight optimised the battery design instead of thermally optimising it, the risks they were actually running were unwittingly so much higher.

The first principal is that you must not fool yourself, and that you are the easiest person to fool

Richard P. Feynman

I’m also thinking that the behaviour of Boeing is consistent with what McDermid et al, calls probative blindness. That is, the safety activities that were conducted were intended to comply with regulatory requirements rather than actually determine what hazards existed and their risk.

… there is a high level of corporate confidence in the safety of the [Nimrod aircraft]. However, the lack of structured evidence to support this confidence clearly requires rectifying, in order to meet forthcoming legislation and to achieve compliance.

Nimrod Safety Management Plan 2002 (1)

As the quote from the Nimrod program deftly illustrates, often (2) safety analyses are conducted simply to confirm what we already ‘know’ that the system is safe, non-probative if you will. In these circumstances the objective is compliance with the regulations rather than to generate evidence that our system is unsafe. In such circumstances doing more or better safety analysis is unlikely to prevent an accident because the evidence will not cause beliefs to change, belief it seems is a powerful thing.

The Boeing battery saga also illustrates how much regulators like the FAA actually rely on the technical competence of those being regulated, and how fragile that regulatory relationship is when it comes to dealing with the safety of emerging technologies.


1. As quoted in Probative Blindness: How Safety Activity can fail to Update Beliefs about Safety, A J Rae*, J A McDermid, R D Alexander, M Nicholson (IET SSCS Conference 2014).

2. Actually in aerospace I’d assert that it’s normal practice to carry out hazard analyses simply to comply with a regulatory requirement. As far as the organisation commissioning them is concerned the results are going to tell them what they know already, that the system is safe.

Here’s a short tutorial I put together (in a bit of a rush) about the ‘mechanics’ of producing compliance finding as part of the ADF’s Airworthiness Regime. Hopefully this will be of assistance to anyone faced with the task of making compliance findings, managing the compliance finding process or dealing with the ADF airworthiness certification ‘beast’.

The tutorial is a mix of how to think about and judge evidence, drawing upon legal principles, and how to use practical argumentation models to structure the finding. No Dempster Shafer logic yet, perhaps in the next tutorial.

Anyway, hope you enjoy it. :)


A report issued by the US Chemical Safety Board on Monday entitled “Regulatory Report: Chevron Richmond Refinery Pipe Rupture and Fire,” calls on California to make changes to the way it manages process safety.

The report is worth a read as it looks at various regulatory regimes in a fairly balanced fashion. A strong independent competent regulator is seen as a key factor for success by the reports authors, regardless of the regulatory mechanisms. I don’t however think the evidence is as strong as the report makes out that safety case/goal based safety regimes perform ‘all that better’ than other regulatory regimes. Would have also been nice if they’d compared and contrasted against other industries, like aviation.

Cassini Descent Module (Image source: NASA)

When is an interlock not an interlock?

I was working on an interface problem the other day. The problem related to how to judge when a payload (attached to a carrier bus like) had left the parent (like the Huygens lander leaving the Cassini spacecraft above). Now I could use what’s called the ‘interlock interface’ which is a discrete ‘loop back’ that runs through the bus to payload connector then turns around and heads back into the bus again. The interlock interface is there to provides a means for the carriers avionics to determine if the payload is electrically mated to the bus. So should I use this as an indication that the payload has left the carrier bus as well? Well maybe not.

Continue Reading…

Midlands hotel

A quick report from sunny Manchester, where I’m attending the IET’s annual combined conference on system safety and cyber security. Day one of the conference proper and I got to be lead off with the first keynote. I was thinking about getting everyone to do some Tai Chii to limber up (maybe next year). Thanks once again to Dr Carl Sandom for inviting me over, it was a pleasure. I just hope the audience felt the same way. :)

Continue Reading…

Tenerife disaster moments after the impact

TCAS, emergent properties and risk trade-offs

There’s been some comment from various regulator’s regarding the use of Traffic Collision Avoidance System (TCAS) on the ground, experience shows that TCAS is sometimes turned on and off at the same time as the Mode S transponder. Eurocontrol doesn’t like it and is quite explicit about their dislike, ‘do not use it while taxiing’ they say, likewise the FAA also states that you should ‘minimise use on ground’. There are legitimate reasons for this dislike, having too many TCAS transponders operating within a specific area can degrade system performance as well as potentially interfering with airport ground radars. And as the FAA point out operating with the AD-B transponder on will also ensure that the aircraft is visible to ATC and other ADS-B (in) equipped aircraft (1). Which leaves us with the question, why are aircrew using TCAS on the ground? Is it because it’s just easy enough to turn on at the push back? Or is there another reason?

Continue Reading…

Interesting article on old school rail safety and lessons for the modern nuclear industry. As a somewhat ironic addendum the early nuclear industry safety studies also overlooked the risks posed by large inventories of fuel rods on site, the then assumption being that they’d be shipped off to a reprocessing facility as soon as possible, it’s hard to predict the future. :)

Right hand AoA probes (Image source: ATSB)

When good voting algorithms go bad

Thinking about the QF72 incident, it struck me that average value based voting methods are based on the calculation of a population statistic. Now population statistics work well when the population is normally distributed, or otherwise clustered around some value. But if the distribution has heavy tails, we can expect that extreme values will occur fairly regularly and therefor the ‘average’ value means much less. In fact for some distributions we may not be able to put a cap on the upper value that an ‘average’ could be, e.g. it could have an infinite value and the idea of an average is therefore meaningless.

Continue Reading…

An interesting post by Mike Thicke over at Cloud Chamber on the potential use of prediction markets to predict the location of MH370. Prediction markets integrate ‘diffused’ knowledge using a market mechanism to derive a predicted likelihood, essentially market prices are assigned to various outcomes and are treated as analogs of their likelihood. Market trading then established what the market ‘thinks’ is the value of each outcome. The technique has a long and colourful history, but it does seem to work. As an aside prediction markets are still predicting a No vote in the upcoming referendum on Scottish Independence despite recent polls to the contrary.

Returning to the MH370 saga, if the ATSB is not intending to use a Bayesian search plan then one could in principle crowd source the effort through such a prediction market. One could run the market in a dynamic fashion with the market prices updating as new information comes in from the ongoing search. Any investors out there?

MH370 underwater search area map (Image source- Australian Govt)

Just saw a sound bite of our Prime Minister reiterating that we’ll spare no expense to find MH370. Throwing money is one thing, but I’m kind of hoping that the ATSB will pull it’s finger out of it’s bureaucratic ass and actually apply the best search methods to the search. Unkind? Perhaps, but then maybe the families of the lost deserve the best that we can do…

Enshrined in Australia’s current workplace health and safety legislation is the principle of ‘So Far As Is Reasonably Practicable’. In essence SFAIRP requires you to eliminate or to reduce risk to a negligible level as is (surprise) reasonably practicable. While there’s been a lot of commentary on the increased requirements for diligence (read industry moaning and groaning) there’s been little or no consideration of what is the ‘theory of risk’ that backs this legislative principle and how it shapes the current legislation, let alone whether for good or ill. So I thought I’d take a stab at it. :) Continue Reading…

Finding MH370

26/08/2014 — 1 Comment

MH370 underwater search area map (Image source- Australian Govt)

Finding MH370 is going to be a bitch

The aircraft has gone down in an area which is the undersea equivalent of the eastern slopes of the Rockies, well before anyone mapped them. Add to that a search area of thousands of square kilometres in about an isolated a spot as you can imagine, a search zone interpolated from satellite pings and you can see that it’s going to be tough.

Continue Reading…


On Artificial Intelligence as Ethical Prosthesis

Out here in the grim meat-hook present of Reaper missions and Predator drone strikes we’re already well down track to a future in which decisions as to who lives and who dies are made less and less by human beings, and more and more by automation.

Continue Reading…

Waaay back in 2002 Chris Holloway wrote a paper that used a fictional civil court case involving the hazardous failure of software to show that much of the expertise and received wisdom of software engineering was, using the standards of the US federal judiciary, junky and at best opinion based.

Rereading the transcripts of Phillip Koopman, and Michael Barr in the 2013 Toyota spaghetti monster case I am struck both by how little things have changed and how far actual state of the industry can be from state of the practice, let alone state of the art. Life recapitulates art I guess, though not in a good way.

Tweedle Dum and Dee (Image source: Wikipedia Commons)

Revisiting the Knight, Leveson experiments

In the through the looking glass world of high integrity systems, the use of N-version programming is often touted as a means to achieve extremely lower failure rates without extensive V&V, due to the postulated independence of failure in independently developed software. Unfortunately this is hockum, as Knight and Leveson amply demonstrated with their N version experiments, but there may actually be advantages to N versioning, although not quite what the proponents of it originally expected.

Continue Reading…

Hazard checklists

06/07/2014 — 1 Comment

As I had to throw together an example checklist for a course I’m running, here it is. I’ve also given a little bit of a commentary on the use, advantages and disadvantages of checklists as well. Enjoy. :)


The DEF STAN 00-55 monster is back!!

That’s right, moves are afoot to reboot the cancelled UK MOD standard for software safety, DEF STAN 00-55. See the UK SCSC’s Event Diary for an opportunity to meet and greet the writers. They’ll have the standard up for an initial look on-line sometime in July as we well, so stay posted.

Continue Reading…

Cleveland street train overrun (Image source: ATSB)

The final ATSB report, sloppy and flawed

The ATSB has released it’s final report into the Cleveland street overrun and it’s disappointing, at least when it comes to how and why a buffer stop that actually materially contributed to an overrun came to be installed at Cleveland street station. I wasn’t greatly impressed by their preliminary report and asked some questions of the ATSB at the time (their response was polite but not terribly forthcoming) so I decided to see what the final report was like before sitting in judgement.

Continue Reading…

NASA safety handbook cover

Way, way back in 2011 NASA published the first volume of their planned two volume epic on system safety titled strangely enough “NASA System Safety Handbook Volume 1, System Safety Framework and Concepts for Implementation“, catchy eh?

Continue Reading…

Current practices in formal safety argument notation such as Goal Structuring Notation (GSN) or Cause Argument Evidence (CAE) rely on the practical argument model developed by the philosopher Toulmin (1958). Toulmin focused on the justification aspects of arguments rather than the inferential and developed a model of these ‘real world’ argument based on facts, conclusions, warrants, backing and qualifier entities.

Using Toulmin’s model from evidence one can draw a conclusion, as long as it is warranted. Said warrant being possibly supported by additional backing, and possibly contingent upon some qualifying statement. Importantly one of the qualifier elements in practical arguments is what Toulmin called a ‘rebuttal’, that is some form of legitimate constraint that may be placed on the conclusion drawn, we’ll get to why that’s important in a second.

Toulmin Argumentation Example

You see Toulmin developed his model so that one could actually analyse an argument, that is argument in the verb sense of, ‘we are having a safety argument’. Formal safety arguments in safety cases however are inherently advocacy positions, and the rebuttal part of Toulmin’s model finds no part in them. In the noun world of safety cases, argument is used in the sense of, ‘there is the 12 volume safety argument on the shelf’, and if the object is to produce something rather than discuss then there’s no need for a claim and rebuttal pairing is there?

In fact you won’t find an explicit rebuttal form in either GSN or CAE as far as I am aware, it seem that the very ‘idea’ of rebuttal has been pruned from the language of both. Of course it’s hard to express a concept if you don’t have the word for it, nice little example of how language form can control the conversation. Language is power so they say.


Well I can’t believe I’m saying this but those happy clappers of the software development world, the proponents of Agile, Scrum and the like might (grits teeth), actually, have a point. At least when it comes to the development of novel software systems in circumstances of uncertainty, and possibly even for high assurance systems.

Continue Reading…

For those interested, here’s a draft of the ‘Fundamentals of system safety‘ module from a course that I teach on system safety. Of course if you want the full effect, you’ll just have to come along. :)