Saving the N-version baby from the bath water


Tweedle Dum and Dee (Image source: Wikimedia Commons)

Revisiting the Knight and Leveson experiments

In the through-the-looking-glass world of high integrity systems, N-version programming is often touted as a means to achieve extremely low failure rates without extensive V&V, on the strength of the postulated independence of failures in independently developed software. Unfortunately this is hokum, as Knight and Leveson amply demonstrated with their N-version experiments. Yet there may still be advantages to N-versioning, although not quite the ones its proponents originally expected.

Let’s turn the argument on its head for a moment. Given that software developed by independent teams does not fail independently, a fault in one version will be correlated with a potential fault in another. What the Knight and Leveson experiments found was that these correlated faults were generally quite obscure, and crucially that the end effects and failure rates differed significantly from version to version: for example, one version failed 231 times because of a fault while a second failed only 37 times due to the same fault.

This means that running back-to-back comparison tests lets a software fault found in one version point to a more obscure common fault in another, and vice versa. The lack of independence, somewhat ironically, holds out the promise of getting to grips with subtle, hard-to-find faults that would otherwise require an inordinate amount of testing, analysis or operational use to expose. Work by Hatton and Roberts on the accuracy of N-versioned seismic software has confirmed the utility of feeding faults identified in one version back to improve the accuracy of others.
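The back-to-back comparison idea can be sketched in a few lines. The harness below is a minimal illustration, not anyone's production tool: the three `version_*` functions are hypothetical stand-ins for independently developed implementations, with a fault deliberately seeded into one of them so the comparison has something to catch.

```python
# Back-to-back comparison testing: run independently developed versions
# of the same specification on identical inputs and flag disagreements.
# The "versions" below are hypothetical stand-ins for illustration.

def version_a(x):
    return x * x

def version_b(x):
    return x ** 2

def version_c(x):
    # Seeded fault: produces the wrong result for negative inputs.
    return x * x if x >= 0 else x * x + 1

def back_to_back(versions, inputs):
    """Return the inputs on which the versions disagree, with all outputs."""
    disagreements = []
    for x in inputs:
        outputs = [v(x) for v in versions]
        if len(set(outputs)) > 1:
            disagreements.append((x, outputs))
    return disagreements

faults = back_to_back([version_a, version_b, version_c], range(-3, 4))
for x, outputs in faults:
    print(f"input {x}: versions disagree, outputs {outputs}")
```

Each recorded disagreement localises a fault in at least one version; in the Knight and Leveson view of the world, finding and fixing it then prompts a search for the correlated, possibly more obscure, fault in the other versions.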

So yes, N-versioning can be useful precisely because such versions do not fail independently: they fail in modes just different enough that their disagreement gives us a tool for weeding the obscure common faults out of our software.