Just attended the Australian System Safety Conference, the venue was the Customs House right on the river. Lots of excellent speakers and interesting papers; I particularly enjoyed Drew Rae’s on tribalism in system safety. The keynotes on resilience by John Bergstrom and cyber-security by Chris Johnson were also very good. I gave a presentation on the use of MIL-STD-882 as a tool for demonstrating compliance with the WHS Act, a subject that only a mother could love. Favourite moment? Watching the attendees’ faces when I told them that 61508 didn’t comply with the law. :)

Thanks again to Kate Thomson and John Davies for reviewing the legal aspects of my paper. Much appreciated, guys.

Just added a short case study on the Patriot software timing error to the software safety micro course page. Peter Ladkin has also subjected the accident to a Why-Because Analysis.
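For anyone who hasn’t met the example before, here’s a minimal back-of-the-envelope sketch of the mechanism, using the figures from the public accounts of the accident: the Patriot’s clock counted tenths of a second in a 24-bit fixed-point register, 0.1 has no finite binary expansion, so the stored constant was chopped, losing roughly 9.5e-8 seconds on every tick. The code is purely illustrative, not the Patriot’s actual software.

```python
from fractions import Fraction

# 0.1 s chopped to a 24-bit fixed-point constant (23 fraction bits),
# per the usual account of the Patriot timing error.
stored = Fraction(int(Fraction(1, 10) * 2**23), 2**23)
per_tick_error = Fraction(1, 10) - stored        # ~9.5e-8 s lost per 0.1 s tick

ticks = 100 * 3600 * 10                          # ticks in 100 h of continuous operation
drift = float(per_tick_error * ticks)
print(f"accumulated clock drift: {drift:.2f} s")                # ~0.34 s
print(f"range gate error at Scud speed: {drift * 1676:.0f} m")  # ~575 m
```

After 100 hours of continuous operation the accumulated drift of about a third of a second shifted the range gate by over half a kilometre, enough to lose the incoming missile.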


The best defence of a secure system is openness

Ensuring the security of high consequence systems rests fundamentally upon the organisation that sustains that system. Organisational dysfunction can and does manifest itself as an inability to deal with security effectively. Seen in that light, the ‘shoot the messenger’ approach of the NSW Electoral Commission to reports of security flaws in the iVote electronic voting system does not bode well for that organisation’s ability to deal with such challenges. Continue Reading…

The Electronic Frontier Foundation reports that a flaw in the iVote system developed by the NSW Electoral Commission meant that up to 66,000 online votes were vulnerable to online attack. Michigan Computer Science Professor J. Alex Halderman and University of Melbourne Research Fellow Vanessa Teague, who had previously predicted problems, found a weakness that would have allowed an untraceable man-in-the-middle attack. The untraceable nature of that attack is important, and we’ll get back to it. Continue Reading…

Rocket landing attempt (Image source: SpaceX)

How to make rocket landings a bit easier

No one should underestimate how difficult landing a booster rocket is, let alone onto a robot barge that’s sitting in the ocean. The booster has to decelerate to landing speed on a hatful of fuel, then maintain a fixed orientation to the deck while it descends, all the while counteracting the dynamic effects of a tall thin flexible airframe, fuel slosh, centre of gravity changes, wind, and finally landing gear bounce when you do hit. It’s enough to make an autopilot cry. Continue Reading…

Once again my hometown of Newcastle is being battered by fierce winds and storms, in yet another ‘storm of the century’. The scene above is just around the corner from my place in the inner city suburb of Cooks Hill. We’re now into our second day of category two cyclonic winds and rain, with many parts of the city flooded and without power. Dungog, a small town to the north of us, is cut off and several houses there have been swept off their piers; three deaths are reported. My 8 minute walk to work this morning was an adventure to say the least.

Here’s a companion tutorial to the one on integrity level partitioning. This addresses more general software hazards and how to deal with them. Again you can find a more permanent link on my publications page. Enjoy :)

A short tutorial on the architectural principles of integrity level partitioning. I wrote this a while ago, but the fundamentals remain the same. Partitioning is a very powerful design technique, but if you apply it you also need to be aware that it can interact with all sorts of other system design attributes, like scheduling and fault tolerance to name but two.

The material is drawn from many different sources which, unfortunately, I didn’t reference at the time, so all I can do is offer a general acknowledgement here. You can also find a more permanent link to the tutorial on my publications page. A sketch of the scheduling side of partitioning follows below.
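To give a flavour of the temporal side of the tutorial, here’s a minimal sketch of the fixed cyclic partition schedule that partitioned architectures typically rest on. It’s illustrative only: the partition names, integrity levels and window sizes are invented, and no particular RTOS API is implied.

```python
MAJOR_FRAME_MS = 100

# (partition, integrity level, time window in ms) - illustrative values only
schedule = [
    ("flight_control", "high integrity",   40),
    ("navigation",     "medium integrity", 35),
    ("maintenance",    "low integrity",    25),
]

# A partitioning kernel pre-allocates the whole major frame, so an
# over-committed frame is a design error caught before anything runs.
assert sum(w for _, _, w in schedule) == MAJOR_FRAME_MS, "frame over-committed"

def run_major_frame():
    t = 0
    for name, level, window in schedule:
        # The window boundary is enforced by a kernel timer, so an overrun
        # in the low integrity partition is truncated, not propagated.
        print(f"t={t:3d} ms  dispatch {name} ({level}) for {window} ms")
        t += window

run_major_frame()
```

The point about interactions drops straight out of the sketch: once the frame is fixed, any change to one partition’s demands is also a scheduling decision about everyone else’s.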

The GAO has released its audit report on the FAA’s NextGen Air Traffic Management system. The report’s a good read and gives an excellent insight into how difficult cybersecurity can be across a national infrastructure program, like really, really difficult. At least they’re not trying to integrate military and civilian airspaces at the same time :)

My analogy is that on the cyber security front we’re effectively asking the FAA to hold a boulder over its head for the next five years or so without dropping it. And if security isn’t built into the DNA of NextGen? Well, I leave it to you, dear reader, to ponder the implications of that in this ever more connected world of ours.

In celebration of upgrading the site to WP Premium, here’s some gratuitous eye candy :)

A little more seriously, pilot induced oscillation (PIO) is one of those problems that, contrary to what the name might imply, requires one to treat the aircraft and pilot as a single control system.

The enigmatic face of HAL

The problem of people

The HAL effect, named after the eponymous anti-hero of Stanley Kubrick and Arthur C. Clarke’s film 2001, is the tendency for designers to implicitly embed their cultural biases into automation. While such biases are undoubtedly a very uncertain guide, it might also be worthwhile to look at the 2001 Odyssey mission from HAL’s perspective for a moment. Continue Reading…

Comet (Image source: Public domain)

Amidst all the online soul searching and pontificating about how to deal with the problem of suicide by airliner, which is, let’s face it, still a very, very small risk, what you are unlikely to find is any real consideration of how we have arrived at this pass. There is, as it turns out, a simple one word answer, and that word is efficiency, the dirty little secret of the aviation industry. Continue Reading…

A pilot’s view of the Germanwings disaster

Quality must be considered as embracing all factors which contribute to reliable and safe operation. What is needed is an atmosphere, a subtle attitude, an uncompromising insistence on excellence, as well as a healthy pessimism in technical matters, a pessimism which offsets the normal human tendency to expect that everything will come out right and that no accident can be foreseen — and forestalled — before it happens

Adm. Hyman Rickover (Father of the USN’s Atomic Fleet)

Data and Goliath

27/03/2015 — 1 Comment

Data and Goliath (Image Source: Bruce Schneier website)

Bruce Schneier has a new book out on the battle underway for the soul of the surveillance society: why privacy is important, and a few modest proposals on how to prevent us inadvertently selling our metadata birthright. You can find a description, reviews and more on the book’s website here. Currently sitting at number six on the NYT’s non-fiction book list. Recommend it.

Postscript

New Scientist has posted an online review of Bruce’s book here.

Or how to avoid the secret police reading your mail

Yaay! Our glorious government of Oceania has just passed the Data Retention Act 2015, with the support of the oh so loyal opposition. The dynamics of this are that both parties believe that ‘security’ is what’s called here in Oceania a ‘wedge’ issue, so they strive to outdo each other in pandering to the demands of our erstwhile secret police, lest the other side gain political capital from taking a tougher position. It’s the political equivalent of an evolutionary arms race, with each cycle of legislation becoming more and more extreme.

As a result telcos here are required to keep your metadata for three years so that the secret police can paw through the electronic equivalent of your rubbish bin any time they choose. For those who go ‘metadata, huh?’, metadata is all the add-on information that goes with your communications via the interwebz, like where your email went, and where you were when you made a call to your mother at 1.33 in the morning. Just like your rubbish bin, it can tell the secret police an awful lot about you, especially when you knit it up with other information. Continue Reading…

Another A320 crash

25/03/2015 — 4 Comments

Germanwings crash (Image source: AFP)

The Germanwings A320 crash

At this stage there’s not much more that can be said about the particulars of this tragedy, which has claimed 150 lives in a mountainous corner of France. Disturbingly, again we have an A320 descending rapidly and apparently out of control, without the crew having any time to issue a distress call. Yet more disturbing is the thought that the crash might be due to the crew failing to carry out the workaround for two blocked AoA probes promulgated in this Emergency Airworthiness Directive (EAD) issued in December of last year. And, as the final and rather unpleasant icing on this particular cake, there is the follow-up question as to whether the problem covered by the directive might also have been a causal factor in the AirAsia flight 8501 crash. That, if it be the case, would be very, very nasty indeed.

Unfortunately at this stage no one knows the answers to any of the above questions, especially as the Indonesian investigators have declined to issue any further information on the causes of the AirAsia crash. What we can be sure of is that, given the highly dependable nature of aircraft systems, the answer when it comes will comprise an apparently unlikely combination of events, actions and circumstances, because that is the nature of accidents that occur in high dependability systems. One thing’s also for sure, there’ll be little sleep in Toulouse until the FDRs are recovered, and maybe not much after that…

Postscript

If, having read the EAD, you’re left wondering why it directed that two ADRs be turned off, it’s simply that by doing so you push the aircraft out of what’s called Normal law, where alpha protection is trying to drive the nose down, into Alternate law, where the (erroneous) alpha protection is removed. Of course in order to do so you need to be able to recognise and diagnose the situation and apply the correct action, which also generally requires training. The toy sketch below shows why the two blocked probes are so pernicious in the first place.
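By way of illustration (and this is a toy model, not Airbus’s actual voting logic, with invented numbers), here’s why two probes frozen at the same erroneous value are so pernicious for a triplex sensor scheme: a median vote happily rejects one wild sensor, but follows two that fail the same way.

```python
import statistics

def voted_aoa(probes):
    # Median voting: robust against one wild sensor,
    # defenceless against two failing the same way.
    return statistics.median(probes)

true_aoa = 4.0                                # benign cruise AoA, degrees

# One wild probe: the median rejects it, as intended.
print(voted_aoa([true_aoa, true_aoa, 30.0]))  # -> 4.0

# Two probes frozen at the same high value: the median follows the liars,
# and alpha protection starts trying to drive the nose down.
print(voted_aoa([true_aoa, 30.0, 30.0]))      # -> 30.0
```

Which is why the EAD’s direction to switch two ADRs off works: it takes the voted (and wrong) data out of play and forces the reversion to Alternate law described above.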

James and Wernher

Wernher von Braun and James Webb (Image source: NASA)

The more things change, the more they stay the same…

The Saturn second stage was built by North American Aviation at its plant at Seal Beach, California, shipped to NASA’s Marshall Space Flight Center, Huntsville, Alabama, and there tested to ensure that it met contract specifications. Problems developed on this piece of the Saturn effort and Wernher von Braun began intensive investigations. Essentially his engineers completely disassembled and examined every part of every stage delivered by North American to ensure no defects. This was an enormously expensive and time-consuming process, grinding the stage’s production schedule almost to a standstill and jeopardizing the Presidential timetable.

When this happened Webb told von Braun to desist, adding that “We’ve got to trust American industry.” The issue came to a showdown at a meeting where the Marshall rocket team was asked to explain its extreme measures. While doing so, one of the engineers produced a rag and told Webb that “this is what we find in this stuff.” The contractors, the Marshall engineers believed, required extensive oversight to ensure they produced the highest quality work.

And if Marshall hadn’t been so persnickety about quality? Well, have a look at the accident investigation that followed the Apollo 1 fire for the results of sacrificing quality (and safety) on the altar of schedule.

Reference

Launius, Roger D., Apollo: A Retrospective Analysis, July 1994; quoted in Katz, Rich (Grunt Engineer, NASA Office of Logic Design), “This Is What We Find In This Stuff: A Designer Engineer’s View”, presentation, FY2005 Software/Complex Electronic Hardware Standardization Conference, Norfolk, Virginia, July 26-28, 2005.

MH370 underwater search area map (Image source: Australian Govt)

Bayes and the search for MH370

We are now approximately 60% of the way through searching the MH370 search area, and so far nothing. Which is unfortunate, because as the search goes on the cost continues to go up for the taxpayer (and yes, I am one of those). What’s more unfortunate, and not a little annoying, is that through all this the ATSB continues to stonily ignore a powerful search technique that’s been used to find everything from lost nuclear submarines to the wreckage of passenger aircraft. Continue Reading…
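The technique in question is Bayesian search theory, and its kernel fits in a few lines. The sketch below is illustrative only, with a made-up three-cell grid and detection probability: each unsuccessful search of a cell discounts that cell’s probability by the chance the wreck was there but missed, the map is renormalised, and the best next place to look can shift as a result.

```python
prior = {"A": 0.5, "B": 0.3, "C": 0.2}  # P(wreck in cell), invented numbers
p_detect = 0.8                          # P(find it | it's there and we search)

def search_and_miss(cell, belief):
    # Bayes update of the probability map after an unsuccessful search.
    posterior = dict(belief)
    posterior[cell] *= (1 - p_detect)   # the cell survived a search, discount it
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}

belief = search_and_miss("A", prior)
print({c: round(p, 3) for c, p in belief.items()})
# -> {'A': 0.167, 'B': 0.5, 'C': 0.333}: cell B is now the best bet
```

This is essentially how the USS Scorpion and AF447 were found, and at any point it also tells you what the search effort so far says about where the wreck isn’t.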

Here’s an interesting graph that compares Class A mishap rates for USN manned aviation (pretty much from floatplane to Super Hornet) against the USAF’s drone programs. Interesting that both programs steadily track down decade by decade, even in the absence of formal system safety programs for most of that time (1).

USN Manned Aviation vs USAF Drones

The USAF drone program started out at around 60 mishaps per 100,000 flight hours (equivalent to the USN transitioning to fast jets at the close of the 1940s) and maintains a steeper rate of decrease than the USN aviation program. As a result, while the USAF drone program is tail-chasing the USN, it still looks like it’ll hit parity with the USN sometime in the 2040s.

So why is the USAF drone program doing better in pulling down the accident rate, even when they don’t have a formal MIL-STD-882 safety program?

Well for one, a higher degree of automation does have comparative advantages. Although the USN’s carrier aircraft can do auto-land, they generally choose not to, as pilots need to keep their professional skills up, and human error during landing and takeoff inevitably drives the mishap rate up. Therefore a simple thing like implementing an auto-land function for drones (landing a drone is, as it turns out, not easy) has a comparatively greater bang for your safety buck. There are also inherently higher risks of loss of control and mid-air collision when air combat manoeuvring, or of running into things when flying helicopters at low level, operational hazards that drones generally don’t have to worry about.

For another, the development cycle for drones tends to be quicker than for manned aviation, and drones have a ‘somewhat’ looser certification regime, so improvements from the next generation of drone design tend to roll into an expanding operational fleet more quickly. Having a higher cycle rate also helps retain and sustain the corporate memory of the design teams.

Finally there’s the lessons learned effect. With drones the hazards usually don’t need to be identified and characterised from scratch. In contrast to the early days of jet age naval aviation, the hazards drones face are usually well understood, with well understood solutions, and whether they’re addressed effectively has more to do with programmatic cost concerns than with any lack of understanding. So when it actually comes time to do something like put de-icing onto a drone, there’s a whole lot of experience that can be brought to bear, with a very good chance of first time success.

A final question. Looking at the above, do we think that the application of rigorous ‘FAA-like’ processes or standards like ARP 4761, ARP 4754 and DO-178 would really improve matters?

Hmmm… maybe not a lot.

Notes

1. As a historical note, while the F-14 program had the first USN aircraft system safety program (a small scale, contractor in-house effort), it was actually the F/A-18 that had the first customer mandated and funded system safety program per MIL-STD-882. USAF drone programs have not had formal system safety programs, as far as I’m aware.
Continue Reading…

For those of you interested in such things, there’s an interesting thread running over on the Safety Critical Mail List at Bielefeld on software failure, sparked off by Peter Ladkin’s post over on Abnormal Distribution on the same subject. Whether software can be said to fail, and whether you can use the term reliability to describe it, is one of those strange attractors about which the list tends to orbit. An interesting discussion, although at times I did think we were playing a variant of Wittgenstein’s definition game.

And my opinion? Glad you asked.

Yes, of course software fails. That its failure is not the same as the pseudo-random failure we posit for hardware components is neither here nor there. Continue Reading…

The reasonable person is not any particular person or an average person… The reasonable person looks before he leaps, never pets a strange dog, waits for the airplane to come to a complete stop at the gate before unbuckling his seatbelt, and otherwise engages in the type of cautious conduct that annoys the rest of us… “This excellent but odious character stands like a monument in our courts of justice, vainly appealing to his fellow citizens to order their lives after his own example.”

J.M. Feinman, on the reasonable person (2010)

It is better to be vaguely right than exactly wrong

Carveth Read (1898) Logic, deductive and inductive

SR-71 flight instruments (Image source: triddle)

How an invention that flew on the SR-71 could help commercial aviation today

In a previous post on unusual attitudes I talked about the use of pitch ladders as a means of providing greater attensity to aircraft attitude, as well as a better indication of what the aircraft is doing once it has entered one. There are of course still disadvantages, because such data in a commercial aircraft is usually presented ‘eyes down’, and in high stress, high workload situations it can be difficult to maintain an instrument scan pattern. There is however an alternative, and one that has a number of allied advantages. Continue Reading…

R101 crash (Image source: public domain)

The R-101 is as safe as a house, except for the millionth chance. (Comment made shortly before boarding the doomed airship headed to India on its first real proving flight, 4 October 1930. The day before he had made his will.)

Lord Thomson, Secretary of State for Air

Unreliable airspeed events pose a significant challenge (and safety risk) because such situations throw onto aircrew the most difficult (and error prone) of human cognitive tasks, that of ‘understanding’ a novel situation. This results in a double whammy for unreliable airspeed incidents: the likelihood of an error in ‘understanding’ is far greater than for any other error type, and having made that sort of error, it’s highly likely to be a fatal one. Continue Reading…

risky shift

What the?

14/02/2015 — Leave a comment

In case you’re wondering what’s going on, dear reader: human factors can be a bit dry, and the occasional poster style blog posts you may have noted are my attempt to hydrate the subject a little. The continuing series can be found on the page imaginatively titled Human error in pictures, and who knows, someone may find it useful…

Fatigue

An interesting little exposition, on the FAIR wiki, of the current state of the practice in information risk management, using the metaphor of the bald tire. The authors observe that there’s much more shamanistic ritual (dressed up as ‘best practice’) than we’d like to think in risk assessment. A statement that I endorse; actually I think it’s mummery for the most part, but ahem, don’t tell the kids.

Their point is twofold. First, that while experience and intuition are vital, on their own they give little grip for critical examination. Second, that if you want to manage you must measure, and to measure you need to define.

A disclaimer: I’m neither familiar with nor a proponent of the FAIR tool, and I strongly doubt whether we can ever put risk management onto a truly scientific footing; much like engineering, there’s more art than artifice. But it’s an interesting commentary nonetheless.

I give it 011 out of 101 tooled-up script kiddies.

The chess board is the world, the pieces are the phenomena of the universe, the rules of the game are what we call the laws of Nature. The player on the other side is hidden from us. We know that his play is always fair, just and patient. But we also know, to our cost, that he never overlooks a mistake, or makes the smallest allowance for ignorance.

Thomas Huxley

Why we should take the safety performance of small samples with a grain of salt

Safety, when expressed quantitatively as the probability of a loss over some unit of exposure, is in effect a proportional rate. This is useful as it lets us compare the performance of different systems or operations when one has lots of operating hours, and potentially lots of accidents, while another has only a few operating hours and therefore fewer accidents. Continue Reading…
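To put a number on the grain of salt, here’s a quick sketch using exact (Garwood) Poisson confidence intervals on a loss rate. The fleets and figures are invented for illustration: both have the same point estimate, but the small sample tells you almost nothing.

```python
from scipy.stats import chi2

def rate_ci(events, hours, conf=0.95):
    # Exact (Garwood) confidence interval for a Poisson rate.
    a = 1 - conf
    lo = chi2.ppf(a / 2, 2 * events) / (2 * hours) if events else 0.0
    hi = chi2.ppf(1 - a / 2, 2 * (events + 1)) / (2 * hours)
    return lo, hi

# Fleet X: 1 loss in 20,000 h. Fleet Y: 50 losses in 1,000,000 h.
# Identical point estimates, very different uncertainty.
for name, k, t in [("X", 1, 20_000), ("Y", 50, 1_000_000)]:
    lo, hi = rate_ci(k, t)
    print(f"fleet {name}: {k / t * 1e5:.1f} per 100,000 h "
          f"(95% CI {lo * 1e5:.1f} to {hi * 1e5:.1f})")
# fleet X: 5.0 per 100,000 h (95% CI 0.1 to 27.9)
# fleet Y: 5.0 per 100,000 h (95% CI 3.7 to 6.5)
```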

Normalisation of deviance

15 Minutes

11/02/2015 — Leave a comment

Matthew Squair:

What the future of high assurance may look like, DARPA’s HACMS, open source and formal from the ground up.

Originally posted on A Critical Systems Blog:

Some of the work I lead at Galois was highlighted in the initial story on 60 Minutes last night, a spot interviewing Dan Kaufman at DARPA. I’m Galois’ principal investigator for the HACMS program, focused on building more reliable software for automobiles and aircraft and other embedded systems. The piece provides a nice overview for the general public on why software security matters and what DARPA is doing about it; HACMS is one piece of that story.

I was busy getting married when filming was scheduled, but two of my colleagues (Dylan McNamee and Pat Hickey) appear in brief cameos in the segment (don’t blink!). Good work, folks! I’m proud of my team and the work we’ve accomplished so far.

You can see more details about how we have been building better programming languages for embedded systems and using them to build unpiloted air vehicle software here.

View original

A while ago, while I was working on a project that would have been based (in part) in Queensland, I was asked to look at the implications of the Registered Professional Engineers Queensland act for the project, and in particular for software development. For those not familiar, the Act provides for the registration of professional engineers to practise in Queensland. If you’re not registered you can’t practise unless you’re supervised by a registered engineer. Upon registering you then become liable to a statutory Board of Professional Engineers for your professional conduct. Oh yes, and practising without coverage is a crime.

While the act is oriented squarely at the provision of professional services, don’t presume that it is solely the concern of consultancies.  Continue Reading…

Exceptional violation

description error

Plan continuation bias

The important thing is to stop lying to yourself. A man who lies to himself, and believes his own lies, becomes unable to recognise the truth, either in himself or in anyone else.

Fyodor Dostoyevsky

Work Health and Safety

I’ll give you a hint: it’s not pretty

Current Australian rail and workplace safety legislation requires that safety risks be either eliminated or, if that’s not possible, reduced ‘so far as is reasonably practicable’. The intent is to ensure that all reasonably practicable precautions are in place, not to achieve some target level of risk.

There are two elements to what is ‘reasonably practicable’. A duty-holder must first consider what can be done – that is, what is possible in the circumstances for ensuring health and safety. They must then consider whether it is reasonable, in the circumstances to do all that is possible. This means that what can be done should be done unless it is reasonable in the circumstances for the duty-holder to do something less.

Worksafe Australia

This is a real and intractable problem for standards that determine the degree of effort applied to treat a hazard using an initial assessment of risk (1). Nor can the legislation be put aside through appeals to such formalisms as the ALARP principle, or the invocation of a standard such as AS 61508 (2). In essence, if it is practicable to do something, regardless of the degree of risk, then something should be done. Continue Reading…


An interesting article from The Conversation on the semiotics of the Doomsday clock. Continue Reading…

Screwtape (Image source: end time info)

A short (and possibly evil) treatise on SILs from our guest blogger

May I introduce myself? The name’s Screwtape; some of you might have heard of me from that short and nasty book by C.S. Lewis. All lies of course, and I would know, about lies that is… baboom tish! Anyway, the world has moved on, and I’m sure you’d be completely unsurprised to hear that I’ve branched out into software consulting now. I do find the software industry one that is oh so over-ripe for the plucking of immortal souls, ah, but I digress. Your good host has asked me here today to render a few words on the question of risk based safety integrity levels and how to turn such pesky ideals, akin in many ways to those other notions of Christian virtue, to your own ends. Continue Reading…

AirAsia QZ8501 CVR (Image source: AP Photo/Achmad Ibrahim)

Stall warning and Alternate law

According to an investigator from Indonesia’s National Transportation Safety Committee (NTSC) several alarms, including the stall warning, could be heard going off on the Cockpit Voice Recorder’s tape.

Now why is that so significant?

Continue Reading…

Aviation is in itself not inherently dangerous. But to an even greater degree than the sea, it is terribly unforgiving of any carelessness, incapacity or neglect.

 

Captain A. G. Lamplugh, British Aviation Insurance Group, London, 1930s.

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here's an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 32,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 12 sold-out performances for that many people to see it.

Click here to see the complete report.

Sharks (Image source: Darren Pateman)

Practical risk management, or why I love living in Australia

We’re into the ninth day of closed beaches here, with two large great whites spotted ‘patrolling our shores’, whatever that means. Of course in Australia closed doesn’t actually mean the beaches are padlocked, not yet anyway. We just put a sign up, and people can make their own minds up as to whether they wish to run the risk of being bitten. In my books a sensible approach to the issue, one that balances societal responsibility with personal freedom. I mean, it’s not like they’re as dangerous as bicycles… Continue Reading…

A short digression on who vs whom, neatly illustrating why writing requirements in natural English can be so damn difficult… I also love the idea of spider fastballs :)

I was cleaning out my (metaphorical) sock drawer and came across this rough guide to the workings of the Australian Defence standard on software safety DEF(AUST) 5679. The guide was written around 2006 for Issue 1 of the standard, although many of the issues it discussed persisted into Issue 2, which hit the streets in 2008.

DEF (AUST) 5679 is an interesting standard; one can see that the authors, Tony Cant amongst them, put a lot of thought into the methodology behind it. Unfortunately it’s suffered from a failure to achieve large scale adoption and use.

So here are my thoughts at the time on how to actually use the standard to best advantage. I also threw in some concepts on how to deal with xOTS components within the DEF (AUST) 5679 framework.

Enjoy :)

Indonesian AirNav radar screenshot (Image source: Aviation Herald)

So what did happen? 

While the media ‘knows’ that the aircraft climbed steeply before rapidly descending, we should remember that this supposition relies on the aircraft’s self-reported altitude and speed. So we should be cautious about presuming that what we see on a radar screen is actually what happened to the aircraft. There are of course also disturbing similarities to the circumstances in which Air France AF447 was lost, yet at this moment they are just that, similarities. One thing’s for sure though, there’ll be little sleep in Toulouse until the FDRs are recovered.