How is the productivity of the various stages of the system development cycle affected when formal methods are used?

(as compared to when more traditional engineering methods are used)



 * Theme: Quality Improvement (QI)
 * Role: PQAM

Answer
It is generally agreed that using formal methods throughout the development process affects the productivity of the various development stages, shifting workload from the late testing phases to the earlier modelling phases. The figure below illustrates this transfer of the time spent in each phase of development. It compares the initial development programme of a satellite control system with the (very successful) formal-methods development of a rework of that software.



This transfer can be explained as follows:


 * Developing a formal model takes time, so more effort is required for the earlier phases than when following a traditional development process. The quality of the produced design artefacts is, however, not comparable: formal methods force one to develop precise models that are quite valuable when passed on to the later development stages, as they contain far fewer ambiguities and omissions. This has been consistently reported by the deployment associates of the DEPLOY project, and by a large survey on formal methods. This phenomenon is, however, hard to quantify.


 * Overall, productivity is strongly related to the error detection rate. It is well known that the later an error is detected, the more expensive it is to correct. To illustrate this, the figure just above presents the cost of correcting a requirements defect according to the stage at which it is discovered. Avoiding errors, or detecting them at an early stage, therefore has a large positive impact and explains the sharp decrease in the effort required by the later testing phases. Moreover, some formal methods can even guarantee the absence of certain errors in components, entirely removing the need to perform specific classes of tests (e.g. unit tests). See also the FAQ QI-PQAM-1 for additional information on error detection rates.
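The cost-escalation argument above can be made concrete with a small back-of-the-envelope calculation. The phase cost multipliers and defect counts below are hypothetical placeholders chosen for illustration, not figures from the study cited above:

```python
# Illustrative model of defect-correction cost by detection phase.
# The multipliers and defect profiles are hypothetical, for illustration only.
PHASE_COST = {          # relative cost of fixing one requirements defect
    "requirements": 1,
    "design": 5,
    "coding": 10,
    "testing": 50,
    "operation": 150,
}

def total_cost(defects_found):
    """Sum the correction cost, given how many defects surface in each phase."""
    return sum(PHASE_COST[phase] * n for phase, n in defects_found.items())

# The same 100 defects under two detection profiles:
late  = {"requirements": 10, "design": 10, "coding": 20, "testing": 50, "operation": 10}
early = {"requirements": 50, "design": 30, "coding": 15, "testing": 5, "operation": 0}

print(total_cost(late))   # -> 4260 with these illustrative figures
print(total_cost(early))  # -> 600: early detection is far cheaper overall
```

Whatever the exact multipliers, the shape of the result is the same: moving defect detection toward the earlier phases cuts the total correction cost by a large factor.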



On a finer-grained scale, the productivity of using formal methods also depends on the formal method in use. The following aspects must be taken into account:
 * The degree of automation of the formal method: some formal methods require proofs to be elaborated interactively, and may need a lot of guidance to demonstrate the properties of interest on the proof target. Other systems, such as model checkers, can be run non-interactively. There is also a range of tools with intermediate degrees of automation. For the Event-B method, proofs must be developed, but, depending on the care taken in elaborating the model with the proving process in mind, the prover may be able to discharge 90% or more of the proof obligations automatically.
 * The degree of integration of the formal method into the development process: a formal method that directly consumes artefacts of the development process will deliver better productivity results than a formal method requiring a specific model to be developed.

Finally, formal methods can entail an important change in the process (depending on the way they are introduced; see the FAQ "Control Impact of Formalism" for more details). Hence they can have an important "first project" effect, where the productivity of the first project is noticeably lower than the average productivity to be expected in the long run. A few projects should therefore be considered before drawing conclusions about global productivity changes.

Some representative examples
We discuss here some formal methods used at various stages of the development process.
 * Polyspace: Polyspace is a fully automated tool for finding bugs in source code. It detects errors related to the use of the programming language (null pointer dereferences, overflows, etc.). Polyspace is a well-known commercial product that is widely used in industry, notably in the space and aeronautical sectors.
 * ProB success story: this reports on the use of a business-specific tool to detect errors in an artefact produced during the development process. The tool was fully automated.
 * Event-B: Event-B requires one to develop a specific model, including state machines, and to specify the behavioural requirements formally.
 * Model-checking technology: model checking is quite intuitive to users, as one represents a state machine in some executable language and then exhaustively tests this state machine against behavioural requirements expressed in some logic.
 * SCR: the SCR method was developed by the U.S. Navy as an answer to the general observation that detecting errors earlier cuts down the total development cost. SCR is therefore specifically devoted to the proper identification of requirements through simulation and validation of state machines by end users; requirements engineering is the earliest phase of any development process. SCR is furthermore able to perform simple checks on the formal models, such as determinism and coverage (something is said for each possible combination of inputs).
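The determinism and coverage checks mentioned for SCR can be sketched in a few lines. The table encoding and the example rule set below are hypothetical illustrations, not SCR's actual notation:

```python
from itertools import product

# A condition table maps guards over boolean inputs to an output value.
# Each row is (guard over the input state, output).  This encoding is a
# hypothetical sketch for illustration, not SCR's actual table format.
INPUTS = ["high_pressure", "valve_open"]

table = [
    (lambda s: s["high_pressure"] and not s["valve_open"], "open_valve"),
    (lambda s: not s["high_pressure"] and s["valve_open"], "close_valve"),
    (lambda s: s["high_pressure"] and s["valve_open"], "hold"),
]

def check(table, inputs):
    """Enumerate every input combination; report coverage and determinism gaps."""
    problems = []
    for values in product([False, True], repeat=len(inputs)):
        state = dict(zip(inputs, values))
        matches = [out for guard, out in table if guard(state)]
        if not matches:                      # coverage: something must be said
            problems.append(f"uncovered: {state}")
        elif len(set(matches)) > 1:          # determinism: at most one answer
            problems.append(f"ambiguous: {state} -> {matches}")
    return problems

# The all-False combination matches no row, so the check flags it:
print(check(table, INPUTS))
```

Because the input domain is finite and small, this kind of check is exhaustive and fully automatic, which is exactly what makes it usable during requirements validation with end users.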

Also, the success story "Productivity Improvement of Data Consistency in Transportation Models" presents a case where an ad-hoc tool could be developed to automate a domain-specific verification process that was previously carried out by hand. By hand, the process required one person-month; the tool brought this down to one minute of processing. This example shows that developing an ad-hoc tool can dramatically cut implementation delays, although it requires very specific expert knowledge, different from the competences of a formal method user.
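To give a flavour of what automating such a domain-specific consistency check might look like, here is a minimal sketch. The data layout and the consistency rule (every segment endpoint must be a declared station) are invented for illustration and are not taken from the success story:

```python
# Hypothetical sketch of automating a domain-specific consistency check:
# every segment in a transportation model must reference declared stations.
# Both the data layout and the rule are invented for illustration.
stations = {"A", "B", "C"}
segments = [("A", "B"), ("B", "C"), ("C", "D")]  # ("C", "D") is inconsistent

def undeclared_endpoints(stations, segments):
    """Return endpoints used by segments but missing from the station list."""
    used = {end for seg in segments for end in seg}
    return sorted(used - stations)

print(undeclared_endpoints(stations, segments))  # -> ['D']
```

A check like this is trivial to run on every version of the model, which is what turns a one-person-month manual review into a one-minute automated pass.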

A few comparative studies
An experimental project has been conducted at NASA to compare several V&V techniques on the same project. It compared classical testing, runtime analysis, static analysis through abstract interpretation, and model checking with the JavaPathFinder tool. The observations were as follows:
 * The advanced tools performed very well on concurrency-related errors, where traditional testing often performs badly due to the non-reproducibility of errors and the difficulty of directing a test toward one synchronization or another.
 * Runtime analysis is a light-weight technique that produces very few spurious errors, and should always be one of the first techniques to be used on new code.
 * Model checking requires some initial start-up costs, for example in coming up with suitable abstractions, but once this has been done it can be reused (thus amortizing the cost over the rest of the verification phase).
 * Using abstractions during model checking forces one to achieve a very good understanding of the code to determine whether errors are spurious or not.
 * A major weakness of black-box testing is that it typically does not provide enough information to diagnose the cause of an error. Runtime monitoring/analysis also suffers from this problem, but to a lesser extent, since monitoring the events makes a partial trace to the error observable. Model checking gives a precise trace, but it might be spurious if abstractions were used.
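The exhaustive exploration and precise counterexample trace mentioned above are the essence of explicit-state model checking, and can be sketched as a breadth-first search. The toy transition system below (two counters advanced by interleaved actions) is an assumption for illustration:

```python
from collections import deque

# Minimal explicit-state model checker: breadth-first search over all
# reachable states, returning a counterexample trace when an invariant
# is violated.  The toy two-counter system below is illustrative.
def successors(state):
    x, y = state
    return [((x + 1) % 4, y), (x, (y + 1) % 3)]  # two interleaved actions

def invariant(state):
    return state != (2, 2)          # the "bad" state we hope is unreachable

def model_check(init, successors, invariant):
    parent = {init: None}           # remembers how each state was reached
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):    # violation: rebuild the trace leading to it
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return list(reversed(trace))
        for nxt in successors(state):
            if nxt not in parent:   # visit each state at most once
                parent[nxt] = state
                queue.append(nxt)
    return None                     # invariant holds in every reachable state

# Prints a shortest trace from (0, 0) to the bad state (2, 2):
print(model_check((0, 0), successors, invariant))
```

Because the search is breadth-first, the returned trace is a shortest path to the violation, which is what makes model-checking counterexamples so useful for diagnosis; on an abstracted model, however, such a trace must still be checked against the concrete system, since it may be spurious.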

A comparison has been made between two projects of similar size aimed at developing automatic train protection (ATP) systems for metro applications. One followed a classical development process involving manual documentation and coding, while the other involved the use of Simulink, code generation, and Polyspace. This is not a fully formal approach, but it already involves some use of formalism and modelling. The outcome was that the cost of formal modelling was 30% higher than that of manual coding, a workload increase partly due to the fact that graphical editing is inherently slower than textual editing. Nevertheless, this greater effort was compensated by the cost reduction in the verification process, which was 70% cheaper than the verification of the manually coded system, and by the increased confidence in the product's safety and quality.

In a survey on formal methods, reports of development time improving outnumber reports of it worsening by four to one. It is also reported that estimating the time saved is difficult, because the gain is spread over the whole development process, as discussed earlier in this FAQ.