Challenger: A failure in communication

26/08/2015 — 9 Comments

Icicles on the launch tower (Image source: NASA)

An uneasy truth about the Challenger disaster

The story of Challenger in the public imagination could be summed up as 'heroic' engineers versus 'wicked' managers, a powerful myth but unfortunately just a myth. The reality is more complex, and the causes of the decision to launch rest in part upon the failure of the participating engineers to clearly communicate the risks involved. Yes, that's right: the engineers screwed up in the first instance.

In the ideal, technical risk communication is the presentation of complex results and conclusions in a clear and unambiguous fashion. Unfortunately for the Challenger launch decision this was anything but the case. The meeting to discuss delaying the launch due to cold temperature was itself late to start and then chaotic in execution. Among the problems:

  • the use of slide charts to present complex data,
  • said charts being made up in ‘real time’ prior to the meeting,
  • the participants having to digest the information ‘in-meeting’,
  • little attention paid to how the data was presented, with many slides handwritten,
  • not one chart presented O-ring erosion and temperature together, and (perhaps most importantly),
  • not one chart referred to temperature data on all flights.

In contrast, below is a single chart presenting the flight temperatures that the O-rings had been exposed to over the flight history of the shuttle fleet. Such a chart was not presented at the launch review; had it been, both NASA and Morton Thiokol management would have been confronted with how far outside the envelope they were taking the launch. Now I'm not going to say that the engineers' failure to communicate risk to managers was the only reason that a launch decision might have been made, but in an environment where the risk has not been clearly articulated it is all too easy to let other organisational objectives take precedence; see Vaughan (1996) for a fuller treatment of these.

Having observed the engineering culture at length I've concluded that engineers tend to fall into a déformation professionnelle: a belief that the numbers and facts they work with are easily understood, universal in nature, and thus that any conclusions they might draw are self-evident truths. From such a world view it's a short step to believing that no greater effort is required in communication than simply showing other folk the numbers, or worse, just your conclusions. This is probably also why engineers tend to love PowerPoint, and why poor technical communication (you guessed it, this time using PowerPoint) was also a causal factor in the Columbia accident. The reality, most especially in the case of risk, is that the numbers are never just the numbers, and that effective technical communication is an essential element of risk management.

O-ring temperature distribution (chart)
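
To make the point concrete, here's a minimal sketch (in Python with matplotlib) of the kind of chart the engineers could have put up: every flight plotted against the temperature its O-rings saw, with the forecast launch temperature marked. The temperatures and distress counts below are illustrative placeholders, not the actual flight record; it's the shape of the picture, not the values, that carries the message.

    # Illustrative sketch only: these are placeholder values, NOT the actual
    # Rogers Commission flight data, chosen just to mimic the shape of the record.
    import matplotlib.pyplot as plt

    # (joint temperature at launch in deg F, number of O-ring distress incidents)
    flights = [
        (53, 3), (57, 1), (58, 1), (63, 1), (66, 0), (67, 0), (67, 0),
        (68, 0), (69, 0), (70, 1), (70, 0), (72, 0), (73, 0), (75, 2),
        (76, 0), (78, 0), (79, 0), (81, 0),
    ]
    temps = [t for t, _ in flights]
    distress = [d for _, d in flights]

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.scatter(temps, distress, label="all previous flights")

    # Mark the forecast launch temperature, well to the left of every prior flight.
    forecast_temp_f = 29  # placeholder for the cold forecast discussed at the review
    ax.axvline(forecast_temp_f, color="red", linestyle="--",
               label="forecast launch temperature")

    ax.set_xlabel("Joint temperature at launch (deg F)")
    ax.set_ylabel("O-ring distress incidents")
    ax.set_title("Every flight vs temperature: the chart that was never shown")
    ax.legend()
    plt.tight_layout()
    plt.show()

Plotted this way the launch sits far outside anything the fleet had flown before, which is exactly the 'outside the envelope' message that never made it onto a slide.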

Postscript on the 30th Anniversary of the loss of Challenger

Reflecting on Challenger, there's one other factor that we should consider in this story and that's the way in which risk was perceived. You see, the engineers weren't dealing with traditional risk as expressed through risk equals probability times consequence. No, what they were dealing with was deeper uncertainty about the design. Unfortunately, if you are accustomed to viewing risk through that Pascalian lens of probability theory it's hard to grasp that there are other, deeper uncertainties with their attendant risks, and even harder to create a language that deals with them.
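
For what it's worth, the contrast can be written down compactly (my notation, a sketch of the standard expected-loss formulation rather than anything from the Rogers Commission):

    \[
      R = P \times C
      \qquad\text{or, over a set of foreseeable scenarios,}\qquad
      \mathbb{E}[L] = \sum_i p_i \, c_i .
    \]

The Pascalian framing assumes the scenario set and the probabilities p_i are known; the deeper design uncertainty the engineers faced is precisely the case where they are not, so the sum cannot honestly be written down, let alone briefed on a slide.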

References

Rogers Commission, Report of the Presidential Commission on the Space Shuttle Challenger Accident, Chapter V: The Contributing Cause of the Accident, Washington, D.C., 1986.

Vaughan, D., The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, University of Chicago Press, 1996.

9 responses to Challenger: A failure in communication

  1. 

    The tang and clevis design was upside down. If the clevis had been on the top, the failure of the O-ring would still have allowed the gases to flow by. In the actual configuration the clevis is on the bottom and the tang is inserted from above, so when the O-ring failed the gases found an escape route.

    • 
      Matthew Squair 26/08/2015 at 12:30 pm

      And the reason the booster was built in pieces was that only one company could build full-length tubes; in order to allow competitive bidding per the acquisition regulations, the design requirements needed to allow a fabricated solution. Thus safety was subordinated to acquisition policy (James Oberg).

      • 

        Our NASA friends tell us that if the tang and clevis were inverted, a blow-out of the O-ring would not have penetrated the section. The gases would have flowed by. The O-ring was a cushion to protect against metal-to-metal vibration.
        http://www.gpo.gov/fdsys/pkg/GPO-CRPT-99hrpt1016/pdf/CHRG-101shrg1087-2.pdf page 184.
        Gases coming from the top of the page flow out if the O-ring is not effective. Invert the arrangement and the gases would flow by at supersonic speed, with momentum carrying most of them out the nozzle. The joint only has to last for 12 minutes.

      • 
        Matthew Squair 26/08/2015 at 7:29 pm

        Don’t forget the pressure testing of the primary packing seal as well.

  2. 

    I don't think it's just engineers that may be guilty of making assumptions about others' level of technical understanding or ability to grasp 'the bigger picture' in an instant. All safety professionals would do well to have an understanding of commerce, so that when they present important information it is done in ways that can be understood clearly enough by those charged with making decisions that they can truly be said to be informed when they make them. Without an understanding of how others may think and view a problem, it is unlikely that they will 'get' what it is you're trying to tell them.

    • 
      Matthew Squair 26/08/2015 at 7:28 pm

      No, you're right, all professionals suffer from this to some extent. But in my experience engineers are particularly bad at communicating with non-specialists in either written or verbal form, and the vast majority do not see it as part of the job description. These I describe as technicians with degrees.

      I was lucky and had basic staff skills beaten into me at an early age. :))

  3. 

    I am not sure management would react just as you predict based upon where this O-ring temperature fits in the distribution of temperatures from previous launches. So it is away from the other temperatures; unless somebody can say specifically what that means for this flight and express it as a probability and severity, it would likely not be all that interesting to managers.

    • 
      Matthew Squair 01/09/2015 at 10:30 am

      Interesting: reflect on how you've framed the problem in terms of who has the burden of proof. I'd submit that "prove it's safe" was exactly how management was thinking… because no one had articulated how far outside the flight envelope they were actually flying.

Trackbacks and Pingbacks:

  1. New PM Articles for the Week of August 24 – 30 - The Practicing IT Project Manager - August 31, 2015

    […] Matthew Squair maintains that the mid-flight explosion of shuttle Challenger resulted from a failure of engineers to communicate the nature of the risk. […]
