Why the risk matrix?
For new systems we generally do not have statistical data on accidents, and high consequence events are, we hope, quite rare leaving us with a paucity of information. So we usually end up basing any risk assessment upon low base rate data, and having to fall back upon some form of subjective (and qualitative) method of risk assessment.
Risk matrices were developed to guide such qualitative risk assessments and decision making, and the form of these matrices is based on a mix of decision and classical risk theory. The matrix is widely described in safety and risk literature and has become one of the less questioned staples of risk management.
Despite this there are plenty of poorly constructed and ill thought out risk matrices out there, in both the literature and standards, and many users remain unaware of the degree of epistemic uncertainty that the use of a risk matrix introduces. So this post attempts to establish some basic principles of construction as an aid to improving the state of practice and understanding.
Creating the Matrix
In 1711 Abraham De Moivre came up with the mathematical definition of risk as:
The Risk of losing any sum is the reverse of Expectation; and the true measure of it is, the product of the Sum adventured multiplied by the Probability of the Loss.
Abraham de Moivre, De Mensura Sortis, 1711 in the Ph. Trans. of the Royal Society
Based on de Moivre’s definition we can define a series of curves that represent the same risk level (termed iso-risk contours) on a two dimensional surface. While this is mathematically correct it’s difficult for people to use in making qualitative evaluations. So decision theorists took the iso-risk contours (actually the log of these curves) and zoned them into judgementally tractable cells (or bins) to form the risk matrix.
In a risk matrix each cell notionally represent a point on the iso-risk curve and steps in the matrix define the edges of ‘risk zones’. We also usually plot the curve using log log axes which provide straight line contours.
This binning is intended to make qualitative decisions as to severity or likelihood more tractable as human beings find it easier to select qualitative values from amongst such bins.
But unfortunately binning also introduces ambiguity into the risk assessment, if you look at the example given above you’ll see that the iso-risk contour runs through the diagonal of the cell, so ‘off diagonal’ the cell risk is lesser or greater depending on which side of the contour your looking at.
So be aware that when you bin a continuous risk contour you pay for easier decision making with an increase in epistemic uncertainty over the resultant risk assessment.
Scaling likelihood and severity
The next problem that faces us in developing a risk matrix is assigning a scale to the severity and likelihood bins. If we have a linear (log log) plot then the value of each succeeding bin’s median point should go up by an order of magnitude. So in qualitative terms we would then define ‘probable’ in the example above to be 10 times as likely as ‘remote’, ‘catastrophic’ 10 times as severe as ‘critical’ and so on.
There are two good reasons to adopt such a scaling. The first is that it’s an established technique to avoid qualitative under-estimation of values. This is because people find it easier to discriminate between values separated by an order of magnitude than by a linear scale. The second is that if we have a linear iso-risk matrix then (by definition) the scales must also be logarithmic to comply with De Moivre’s equation. In fact they should always have a negative slope of one.
Unfortunately, you’ll also find plenty of example linear risk contour matrices with non-logarithmic axes that violate this rule, for example the Australian standard for risk management AS 4360 uses such an example. While such ill formed matrices may reflect a conscious decision making strategy, for example sensitivity to extreme severities, they don’t reflect De Moivre’s theorem and the classical definition of risk.
Another decision with designing a risk matrices is how many cells to have, too few and the evaluation is too granular, too many and the decision making becomes bogged down in detailed discrimination (which as noted above is hardly warranted). The usual happy compromise sits at around 5 to 7 bins on the vertical and horizontal, which for a log log scale will give a wide field of discrimination.
The flatland problem and the semiotics of colour
The problem with a matrix is that it can only represent two dimensions of information on one graph. Thus a simple matrix may allow us to communicate risk as a function of frequency (F) and severity (S) but we still need a way to graphically associate decision rules with the identified risk zones. The traditional method adopted to do this is to use colour to designate the specific risk zones and a colour key the associated action required. As the example above illustrates the various decision zones of risk are colour coded, with the highest risks being given the most alarming colours.
While colour has inherently a strong semiotic meaning, and one intuitively understood by both expert and lay person alike, there is also a potential trap in that by setting the priorities using such a strong code we are subtly forcing an imperative. In these circumstances a colourised matrix can become a tool of persuasion rather than one of communications (Roth 2012).
One must therefore carefully consider whether the matrix is intended as a tool to assist in decision making, where a degree of persuasion may be appropriate, or whether it is one intended to assist in more open ended communication, discussion and dialogue. If the answer is the latter then other forms of visualisation may be more appropriate.
Ensure (weak) consistency of ordering
A properly formed matrices risk zones should also exhibit what is called ‘weak consistency’, that is the qualitative ordering of risk as defined by the various (coloured) zones and their cells ranks various risks (form high to low) in roughly the same way that a quantitative analysis would do so (Cox 2008).
In practical terms what this means is that if you find there are areas of the matrix where the same effort will produce a greater or lesser improvement of risk when compared to another area (Clements 96) you have a problem of construction. You should also (in principal) never be able to jump across two risk decision zones in one movement.
For example if a safety device reduces the likelihood of occurrence of a hazard by a defined factor we wouldn’t expect the risk reduction to depend upon the severity. This exemplifies the problem of inconsistency that such breakdowns in consistency introduce.
Dealing with boundary conditions
In some standards, such as MIL-STD-882, an upper arbitrary bound may be placed on severity, in which case what happens when a severity exceeds the ‘allowable’ threshold? For example, should we discriminate between a risk to more than one person versus one to a single person? For specific domains this may not be a problem, but for others where mass casualt events are a possibility it may well be of concern.
In this case if we don’t wish to add columns to the matrix we may define a sub-cell within our matrix to reflect that this is a higher than we thought level of risk. Alternatively we could define the severity for the ‘I’ column as defining a range of severities whose values include at the median point the mandated severity. So for example the catastrophic hazard bin range would be from 1 fatality to 10 fatalities.
Looking at likelihood one should also include a likelihood of ‘impossible’ so that risks that have been removed can be recorded rather than deleted. Just because we’ve retired a hazard today, doesn’t mean a subsequent design change or design error won’t resurrect the hazard to haunt us.
Calibrating the risk matrix
Risk matrices are used to make decisions about risk, and their acceptability. But how do we calibrate the risk matrix to represent an understandably acceptable risk? One way is to pick a specific bin and establish a calibrating risk scenario for it, usually drawn from the real world, for which we can argue the risk is considered broadly acceptable by society (Clements 96).
So in the matrix above we could pick cell IE and equate that to an acceptable real world risk that could result in the death of a person (a catastrophic loss). For example, ‘the risk of death in an motor vehicle accident on the way from and to your work on main arterial roads under all weather conditions cumulatively over a 25 year working career‘. This establishes the edge of the acceptable risk zone by definition and allows use to define other risk zones.
In general it’s always a good idea to provide a description of what each qualitative bin means so that people understand the meaning. If you need to one can also include numerical ranges for likelihood and severity, such as the loss values in dollars, numbers of injuries sustained and so on.
Define your exposure
One should also consider and define the number of units, people or systems exposed, clearly there is a difference between the cumulative risk posed by say one aircraft and a fleet of one hundred aircraft in service or between one bus and a thousand cars.
What may be acceptable at an individual level (for example road accident) may not be acceptable at an aggregated or societal level and risk curves may need to be adjusted accordingly. MIL-STD-882C offers a simple example of this approach.
And then define your duration
Finally and perhaps most importantly you always need to define the duration of exposure for likelihood. Without it the statement is at best meaningless and at worst misleading as different readers will have different interpretations. A 1 in 100 probability of loss over 25 years of life is a very different risk to a 1 in 100 over a single 4 hour mission.
A risk matrix is about making decisions so it needs to support the user in that regard, but, it’s use as part of a risk assessment should not be seen as a means of acquitting a duty of care. The principle of ‘so far as is reasonable practicable’ cares very little about risk in the first instance, asking only whether it would be considered reasonable to acquit the hazard. Risk assessments belong at the back end of a program when, despite our efforts we have residual risks to consider as part of evaluating our efforts in achieving a reasonably practical level of safety. A fact that modern decision makers should keep in mind.
Clements, P. Sverdrup System Safety Course Notes, 1996.
Cox, L.A. Jr., ‘What’s Wrong with Risk Matrices?’, Risk Analysis, Vol. 28, No. 2, 2008.
Cox, S., Tait, R., Safety, Reliability & Risk Management, 2nd Ed., Butterworth, Heinemann, 1998.
MIL-STD-882, System Safety Program Requirements.
Leveson, N., System Safety and Computers – A Guide to Preventing Accidents and Losses Caused by Technology, Addison Wesley, 1995.
Roth, Florian., Focal Report 9: Risk Analysis Visualizing Risk: The Use of Graphical Elements in Risk Analysis and Communications, Risk and Resilience Research Group Center for Security Studies (CSS), Zürich 2012.