Is there any guidance on the cost/benefit trade-off of using different validation techniques?

(considering non-formal (testing) vs formal and also between formal methods)

Main -> FAQ -> EM-PQAM-3


 * Theme: Exploiting Models (EM)
 * Role: PQAM

Answer
This FAQ is centred on validation technologies in the strict sense, including testing, proofs, etc. Beyond the technology itself, one must also consider other dimensions when elaborating a validation methodology, including:
 * The validation strategy: what is validated at which step of the development process
 * The risks in the development process: one must consider the errors that might be introduced at each step of the development process. A step performed automatically by a well-established tool may be more trustworthy than the same step performed by human developers.
 * The indirect cost of a failed validation: a failed validation generally triggers some corrective process, which becomes increasingly expensive as the development progresses and as the defect grows older with respect to the step that introduced it. This is discussed in the dedicated FAQs QI-PQAM-1 and QI-PQAM-2.
 * The direct cost of the validation technology: some technologies fit better at a given point in the development process. For instance, validating behaviour through exhaustive symbolic exploration is better performed on state machine models than on source code, because the state machine formalism abstracts away the implementation technology, which is not the target of validation in this case.

Considering formal validation techniques, each formal method has specific costs and benefits. These encompass not only the quality of the validation itself, but also practical aspects such as the integration of the validation into the development flow. We therefore present in this FAQ the possible costs and benefits brought by formal methods, to enable development teams to weigh the trade-off on a case-by-case basis. All formal methods have short-term costs and benefits due to the process change; here we focus only on the long-term effects, not on the transient ones.

Validation by Testing
Testing is the most widespread validation technique. In a traditional development process, it is known to be resource-consuming, with sometimes more than half the effort spent on the testing phase, depending on the targeted level of assurance.

Before detailing the cost issues, it is important to note that, among the validation techniques, only testing can be performed on the final composite system (software/hardware + physical parts). Other validation techniques rely on the analysis of some software design artefact, which might be a model from which the executable code is derived or generated, or the source code itself. For instance, consider an ABS system: it involves software, hardware, and mechanical parts, and relies on physical laws such as the elasticity of tires. Only a practical test can ensure that the overall system works properly. This is an important benefit. Note however that the difficulty of reaching a proper level of coverage grows with the complexity of the system under test, so relying on final testing as the only validation might not deliver a very high level of assurance, depending on the system under test.

Testing requires the development of test cases, and the level of assurance delivered by testing is only as good as those test cases: the validation is exactly as exhaustive as the test cases are. This is generally specified using coverage criteria (functional coverage; code coverage such as statement or branch coverage, or more complex criteria like Modified Condition/Decision Coverage). Tools can help measure these coverage criteria and identify missing tests. Reaching full coverage can be difficult, as some behaviour cannot be tested, for instance due to non-determinism in the tested artefact. Moreover, even with 100% coverage, testing can only show the presence of errors, not demonstrate their absence.
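The idea of a coverage criterion can be sketched as follows. This is a minimal, hypothetical illustration (the function, decision labels and test inputs are invented, not taken from any tool): each branch of a decision is recorded when executed, and the coverage ratio is computed over the test suite.

```python
# Hypothetical sketch: measuring branch coverage of a toy function.
covered = set()

def classify_speed(speed, limit):
    if speed > limit:              # decision D1
        covered.add("D1-true")
        return "over"
    else:
        covered.add("D1-false")
        return "ok"

# Two test cases, one exercising each branch of decision D1.
tests = [(60, 50), (40, 50)]
for speed, limit in tests:
    classify_speed(speed, limit)

branches = {"D1-true", "D1-false"}
coverage = len(covered & branches) / len(branches)
print(f"branch coverage: {coverage:.0%}")   # both branches hit -> 100%
```

Dropping either test case leaves one branch uncovered, which is exactly the kind of gap coverage tools report.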

On the practical side, testing also requires a test infrastructure to run the test cases on the tested artefact and to validate its outputs, possibly with the help of a test oracle, which also needs to be developed. This adds to the costs.
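A minimal test harness with an oracle can be sketched as follows. All names here are invented for illustration; the point is only that the oracle is an independent computation of the expected result, developed separately from the artefact under test.

```python
# Hypothetical sketch of a minimal test infrastructure with an oracle.
def artefact_under_test(x):
    return x * x                        # implementation being validated

def oracle(x):
    # Independent specification of squaring: repeated addition for x >= 0.
    return sum(x for _ in range(x)) if x >= 0 else x * x

def run_suite(cases):
    """Run every case through artefact and oracle, compare the outputs."""
    failures = [x for x in cases if artefact_under_test(x) != oracle(x)]
    return "PASS" if not failures else f"FAIL on {failures}"

print(run_suite([0, 1, 3, -2]))         # -> PASS
```

The harness and the oracle are extra artefacts to develop and maintain, which is the cost the paragraph above refers to.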

Some cost reduction factors are the following:
 * Tests can evolve with software versions/variants (software product lines) with lower incremental costs while keeping the benefits of checking for non-regression.
 * Another emerging approach, reconciling testing and formal methods, is to generate the test cases from a model. This can be automated and efficiently supported by tools, generally using some exploration technique. This approach is called Model-Based Testing (MBT) - see our DEPLOY success story (Benefits of Model-Based Test Automation). An extra benefit is that the generation can guarantee quality criteria on the test cases, such as coverage or the length of the generated test cases. Note however that although MBT decreases the cost of producing test cases, there is an additional cost in developing the model from which test cases are generated, and in the specific tool support (note that such tools are less subject to qualification/certification, as they are not in the development chain). This cost can be reduced if a formal model is already developed, although MBT might require some specific models.
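The exploration at the heart of MBT can be sketched in a few lines. This is a hypothetical toy (the state machine, its events and the coverage criterion are invented): a breadth-first traversal of a model derives one event sequence per transition, so transition coverage is guaranteed by construction.

```python
from collections import deque

# Hypothetical toy model: states mapped to {event: next_state}.
MODEL = {
    "idle":    {"insert_coin": "ready"},
    "ready":   {"press_start": "running", "refund": "idle"},
    "running": {"stop": "idle"},
}

def generate_tests(model, initial="idle"):
    """Derive one event sequence per model transition, shortest-first."""
    tests = []
    remaining = {(s, e) for s, events in model.items() for e in events}
    queue = deque([(initial, [])])
    seen_paths = set()
    while remaining and queue:
        state, path = queue.popleft()
        for event, target in model[state].items():
            new_path = path + [event]
            if (state, event) in remaining:
                remaining.discard((state, event))
                tests.append(new_path)          # this path is a test case
            if tuple(new_path) not in seen_paths and len(new_path) < 6:
                seen_paths.add(tuple(new_path))
                queue.append((target, new_path))
    return tests

for t in generate_tests(MODEL):
    print(" -> ".join(t))
```

Real MBT tools explore far richer models, but the principle is the same: the generated suite provably covers the chosen criterion on the model, while the model itself still has to be written and validated.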

Formal Validation Approaches
Formal validation approaches rely on reasoning techniques able to analyse behaviour without executing the artefact (i.e. using static analysis). For instance, a verification of source code based on abstract interpretation will envision all possible execution paths, including execution paths that cannot be controlled by any testing-based approach, for instance because of non-determinism such as thread or interrupt synchronization, or algorithmic non-determinism.
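The core idea of abstract interpretation can be illustrated with a minimal sketch. This is a hypothetical toy, not how any real analyser is implemented: instead of executing the code on concrete values, it is evaluated once over intervals covering all possible inputs, so every execution path is accounted for.

```python
# Hypothetical sketch: an interval abstract domain covering all inputs.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, other):
        # Sound abstraction of + : the result interval covers every
        # concrete sum of values drawn from the two operand intervals.
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

def check_index(i, size):
    """Flag a possible out-of-bounds access for an array of given size."""
    if i.lo < 0 or i.hi >= size:
        return "possible out-of-bounds"
    return "safe"

# A sensor value known to lie in [0, 9], plus an offset in [0, 2].
index = Interval(0, 9) + Interval(0, 2)     # -> [0, 11]
print(index, check_index(index, size=10))   # 11 >= 10: possible defect
```

Note that the verdict is sound but approximate: the analysis may flag accesses that never happen concretely, which is the false-positive price mentioned later in this FAQ.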

Depending on the chosen formal method, model development can be more or less expensive, but the benefits are generally in proportion too. This is discussed in a dedicated FAQ, QI-PQAM-2. Two factors must be considered: how far the model is from the document or artefact from which it is developed, and how complex the modelling language itself is. For instance, one can compare three different formal methods associated with different artefacts:
 * Source code: the Polyspace tool applies abstract interpretation techniques to verify source code. The cost is that of adapting the source code to the few requirements of Polyspace. The benefit is the guaranteed absence of specific kinds of bugs.
 * A communication protocol: the SPIN tool applies model-checking techniques for protocol verification. Developing the model of a communication protocol in the Promela language is not very expensive: it is a rather straightforward transcription of the protocol into a practical modelling language. The benefit is the guarantee of specific properties of the protocol, such as the absence of deadlock, or assurance about message ordering and delivery.
 * System models: the Rodin toolset relies on automated proving techniques. The development of Event-B models requires mastering refinement tactics, and the models must follow good practice to be easily amenable to proof. In this case the costs are higher, but so are the benefits, as complex properties can be proved.

For the validation performed on the analysed artefact to carry over to the final product, two conditions must hold:
 * The verified artefact is properly translated in the development process. This includes the compilation chain for source code, the development process transforming models into source code, or even generic libraries or hardware that are used in the system.
 * The approximations that are done in the context of the verification are still valid in the concrete environment. This is discussed extensively in a specific FAQ G-EA-1.

About the tooling: when one deploys such approaches, one must decide on the level of trust to be granted to the tool. Consider a formal tool that automatically checks some artefact for errors, whatever "errors" means here. Suppose that after a long verification run, the tool returns the laconic answer "No error". The level of assurance one gains from this result depends on the tool, and it is common sense that the more complex a tool is, the more likely it contains a bug. Some other tools deliver an explanation; proof-based technologies, for instance, deliver a proof that can be checked by humans, at least partially, or checked redundantly by another prover. This is further discussed in the FAQ about certification, ExFac-HM-1.

In the rest of the section, we detail more specific techniques.

A) Exhaustive Static Exploration
Models can be explored exhaustively provided they are finite, for instance through temporal model-checking technology. For instance, tools like ProB can explore B and Event-B specifications on finite domains and check for a range of errors, including deadlocks and temporal logic properties. NuSMV can exhaustively explore the behaviour of a finite state machine and verify intricate temporal conditions on it.
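The principle of exhaustive exploration can be sketched as follows. This is a hypothetical toy in the spirit of explicit-state model checkers (the transition system is invented): every reachable state of a finite system is enumerated, and states with no outgoing transition are reported as deadlocks.

```python
from collections import deque

def successors(state):
    """Hypothetical toy system: a counter modulo 4 that gets stuck at 2."""
    if state != 2:
        return [(state + 1) % 4]
    return []                      # no outgoing transition: deadlock

def find_deadlocks(initial):
    """Breadth-first enumeration of ALL reachable states."""
    seen, frontier, deadlocks = {initial}, deque([initial]), []
    while frontier:
        s = frontier.popleft()
        succ = successors(s)
        if not succ:
            deadlocks.append(s)
        for t in succ:
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return sorted(seen), deadlocks

states, deadlocks = find_deadlocks(0)
print("reachable:", states, "deadlocks:", deadlocks)
```

Because the state space is finite and fully enumerated, the verdict is exhaustive: no reachable deadlock can be missed, unlike with testing. Real tools add symbolic representations to cope with huge state spaces.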

Some tools can check source code for language-related defects. They generally rely on an abstract interpretation approach, using sound approximations to cover all possible behaviours, at the price of raising false positives. Consider for instance Polyspace, Astrée or the Global C Surveyor. Static analysis will probably be more exhaustive than software testing: formal validation approaches are designed to cover each and every possible behaviour within their search bounds, which must be carefully examined in a validation approach.

Unfortunately, such approaches seldom offer the possibility to verify the exploration itself. Abstract interpretation is an exception in this field, in that one can sometimes browse through the source code annotated with the results of the analysis. By contrast, if one considers the exhaustive exploration of the behaviours exhibited by a set of state machines as performed by NuSMV, one can only trust the tool for the proven claim.

B) Limited Static Exploration
Some tools can perform a bounded validation. For instance, Alloy searches for counterexamples to a first-order claim. Finding no counterexample is not a proof that none exists beyond the verification bound that has been set. Another example is the NuSMV model checker, which can be used to search for counterexample traces of limited length.
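Bounded search for counterexamples can be sketched as follows. This is a hypothetical toy, not the algorithm of Alloy or NuSMV: all input sequences up to a fixed length k are enumerated against a claim, and a bug that only manifests beyond the bound is silently missed at that bound.

```python
from itertools import product

def run(seq):
    """Hypothetical toy system: a counter that the spec says never exceeds 3.
    Bug: it actually saturates at 4."""
    x = 0
    for step in seq:
        x = min(x + step, 4)
    return x

def bounded_check(k):
    """Search for a counterexample among all step sequences of length <= k."""
    for length in range(1, k + 1):
        for seq in product([0, 1, 2], repeat=length):
            if run(seq) > 3:           # claim violated
                return list(seq)       # counterexample found
    return None                        # none found WITHIN the bound

print(bounded_check(1))  # -> None: bound too shallow to expose the bug
print(bounded_check(2))  # -> a two-step counterexample is found
```

The first call illustrates the caveat above: "no counterexample within the bound" is not "no counterexample".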

There are two possible rationales behind the use of bounded verification; they are generally not present at the same time:
 * To make a verification that would otherwise require human guidance amenable to automated validation. Alloy is an example of this.
 * To make the verification tractable when it would otherwise be intractable, or at least too time-consuming. Typically, one can also use limited validation as a cheap first step before initiating the more time-consuming exhaustive validation.

Of course, limited verification delivers more limited assurance. Yet, one should compare the depth of the verification to the average length of test cases, and consider that the verification is exhaustive within this bound.

C) Proofs
Proof-based approaches will always at some point require human guidance, due to the inherent limitations of such technologies. The amount of guidance needed depends on the size and provability of the model. On the other hand, these approaches provide an even greater level of assurance than the state-space exploration techniques, because the constructed proof can be reviewed by a human, or cross-checked by a redundant prover.

Conclusions
This section recapitulated some examples of validation technologies and their targets, together with their advantages and limitations.

For more information on how testing and formal methods can be combined, see.