Reference: Berleant, D. and B. Liu, Is Sensitivity Analysis More Fault
Tolerant than Point Prediction?, 1997 Society for Computer Simulation
Western Multiconference - Medical Sciences Simulation Conference,
Jan. 12-15, Phoenix, AZ.
Is Sensitivity Analysis More Fault Tolerant than Point Prediction?
Daniel Berleant
Byron Liu
Electrical and Computer Engineering
3215 Coover Hall
Iowa State University
Ames, IA 50011
berleant@iastate.edu
Keywords:
software engineering, testing,
fault tolerance, mutation, verification.
A fault ("bug") can cause software to crash catastrophically. Or it
can have no apparent effect at all. Here, we concentrate on an
intermediate possibility: a fault may degrade the performance of a
software system to a quantitatively measurable extent. In particular,
we address software for which the intended output is numerical, as is
the case with simulation models. For these systems, faults may degrade
output by making it different from its nominal value. Clearly, the
less severe this degradation is for a given output of a given program,
the more trustworthy the output, since almost any significant software
system has faults. This paper investigates the severity of output
degradation due to faults.
Our findings suggest that when fault-induced degradation is present,
point prediction degradation tends to be more severe than sensitivity
analysis degradation. Our findings also suggest that the sign of a
sensitivity analysis prediction is usually not reversed by faults.
Our findings have important practical implications for the
interpretation of simulation outputs since simulation programs, like
software systems in general, usually contain faults.
BRIEF SYNOPSIS OF THE RESEARCH AREA
Software mutation. To investigate the
effects of faults, we mutated the software system under
investigation, that is, we modified it by creating new faults
in it. Software mutation work has also been described in
the software testability literature, where it is used to find input
data with good fault detection coverage (cf. Friedman and Voas 1995).
Software fault tolerance. Most work in the area of software
fault tolerance has categorized program operation dichotomously as
acceptable or unacceptable. As an illustration, fault tolerance is
typically associated with the concept of reliability, which refers to
the probability that a working software system will continue working
throughout a given time period, where the software system is
considered to be either working or not working at any given time. As
another illustration, software performability work (e.g. Goseva et
al. 1995) deals, like the present work, with degrees of software
failure, but these degrees are then classified into the two categories
"acceptable" and "unacceptable". In contrast, here an output is not
simply acceptable or unacceptable: its degradation can be more severe
than a trivial malfunction yet less severe than a catastrophic failure.
This work. The present work deals with software mutation, like
the mutation based testing literature, and with characterizing
degraded system performance, like the software performability
literature. However, this work distinguishes itself in that mutations
are used, not to determine software testability, but to determine
software fault tolerance, and this fault tolerance is measured by
characterizing system performance along a continuous scale rather than
dichotomously.
OBJECT OF THE STUDY
This study addresses two related questions concerning the fault
tolerance of results produced by simulation programs with faults.
- Are sensitivity analysis predictions more fault tolerant than
point predictions? And,
- Are sensitivity analysis prediction signs (plus or minus)
fault tolerant?
EXPERIMENTAL TECHNIQUE
We experimented on IMAP3 (Interactive Model for AIDS
Prediction 3, Goforth and Berleant 1994), an epidemiological
simulation program that predicts US HIV infections, AIDS cases, and
cumulative deaths from AIDS on a yearly basis ending in the year
2016. The entire interactive program, with numerous screens of input
numbers describing various epidemiological parameters and populations,
is fairly large, consisting of 782 kilobytes of executable code. The
source code for the simulation core contains 504 individual runnable C
statements. We mutated the system by deleting statements from the
simulation core. We deleted each statement in turn, running the
program for each deletion. This required running the program 505
times, once for each deletion condition plus once for the unmutated
system. A program was written to automate much of this process (Liu
1996).
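The statement-deletion procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the automation program described by Liu (1996): it assumes the simulation core can be modeled as an ordered list of statements (here, hypothetical Python functions that update a shared state dictionary), and it treats any exception as the mutant failing to run. The four-statement toy core is invented and merely stands in for the 504 C statements of IMAP3.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical representation: the simulation core is an ordered list of
# "statements", each a function that updates a shared state dictionary.
State = Dict[str, float]
Statement = Callable[[State], None]


def run_core(statements: List[Statement], params: State) -> Optional[float]:
    """Run the statements in order and return state["output"].

    Any exception is treated as the mutant failing to run (an assumption).
    """
    state = dict(params)
    try:
        for stmt in statements:
            stmt(state)
        return state["output"]
    except Exception:
        return None


def deletion_mutants(statements: List[Statement]) -> Iterator[Tuple[int, List[Statement]]]:
    """Yield (i, mutant), where mutant is the core with statement i deleted."""
    for i in range(len(statements)):
        yield i, statements[:i] + statements[i + 1:]


def run_experiment(statements: List[Statement], params: State):
    """One run of the unmutated core plus one run per deletion: n + 1 runs."""
    nominal = run_core(statements, params)
    results = {i: run_core(m, params) for i, m in deletion_mutants(statements)}
    return nominal, results


# A toy four-statement core (hypothetical, not IMAP3).
def s_init(st: State) -> None:
    st["infected"] = st["initial"]


def s_grow(st: State) -> None:
    st["infected"] *= 1.0 + st["rate"] * (1.0 - st["condom_use"])


def s_dead(st: State) -> None:
    st["unused"] = 0.0  # deleting this statement has no effect on the output


def s_out(st: State) -> None:
    st["output"] = st["infected"]


core = [s_init, s_grow, s_dead, s_out]
params = {"initial": 1000.0, "rate": 0.5, "condom_use": 0.2}
nominal, results = run_experiment(core, params)
print(nominal, results)
```

With n statements, run_experiment performs n + 1 runs, matching the 505 runs for 504 statements. In the toy core, deleting s_init or s_out makes the run fail, deleting s_grow changes the output, and deleting s_dead has no effect.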
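For each deleted statement, we tabulated the effect of the deletion
on:
- the point prediction (HIV cases in the year 2016);
- the sensitivity analysis prediction (percent change in HIV cases in
2016 induced by a 9% increase in male condom usage); and
- the sign of the sensitivity analysis prediction.
Note that the sensitivity analysis code was not itself mutated. Rather,
the model core, which was mutated, is called by the sensitivity
analysis module.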
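Because the sensitivity analysis module calls the (possibly mutated) model core once for a base condition and once for a perturbed condition, a sensitivity prediction can be sketched as below. The function and parameter names are hypothetical, toy_core is an invented stand-in for the simulation core, and the 9% increase in male condom usage is assumed to be a relative increase (the text does not say whether it is relative or in percentage points).

```python
from typing import Callable, Dict

Params = Dict[str, float]


def sensitivity(core: Callable[[Params], float], params: Params,
                name: str, rel_increase: float = 0.09) -> float:
    """Percent change in the core's output when params[name] is raised by
    rel_increase (assumed relative: 0.09 means a 9% increase)."""
    base = core(params)                              # base condition
    perturbed = dict(params)
    perturbed[name] = params[name] * (1.0 + rel_increase)
    return 100.0 * (core(perturbed) - base) / base   # perturbed vs. base


def toy_core(p: Params) -> float:
    """Hypothetical one-line stand-in for a simulation core (not IMAP3)."""
    return p["initial"] * (1.0 + p["rate"] * (1.0 - p["condom_use"]))


params = {"initial": 1000.0, "rate": 0.5, "condom_use": 0.2}
s = sensitivity(toy_core, params, "condom_use")
print(f"{s:.4f}%")  # prints -0.6429%
```

Since the sensitivity code itself is unchanged, any fault in the core enters this calculation twice, once through each call.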
RESULTS
First the main results are described. Then the findings are presented
and discussed, followed by the conclusions.
Description of Main Results
- Point prediction. For the mutated versions, the median
deviation from the nominal prediction for the chosen output (HIV cases
in the year 2016) was 1.85%.
This median was taken over the 147 mutated versions that both ran and
produced a nonzero deviation (in other words, for which the mutation
affected the output prediction). The 87 mutations that ran but had no
apparent effect were omitted from the median calculation, as were the
mutations that failed to run.
- Sensitivity analysis prediction. The median deviation
from the nominal prediction for the chosen sensitivity analysis output
(percent change in HIV cases in 2016 that would be induced by a 9%
increase in male condom usage) was 1.34%.
This result was based on the same 147 mutated versions, and is less
than the aforementioned median deviation in point predictions of
1.85%.
- Sensitivity analysis prediction sign. The unmutated
program predicted that increased condom use would decrease the number
of future HIV cases. Of the same 147 mutated versions as above, only
two (1.36%) incorrectly predicted an increase. Another 140 of them
(95.24%) correctly predicted a decrease, and the remaining five
(3.40%) incorrectly predicted no change (that is, they predicted zero
sensitivity). A prediction of no change is a less dangerous error
than a reversed sign, since zero sensitivity to a parameter that a
program is explicitly expected to handle is a strong clue (one that
could be detected automatically) that a serious fault exists.
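The three summary statistics above can be computed as sketched below. This is a hypothetical reconstruction: the text does not define "deviation" precisely, so pct_dev assumes the absolute deviation from the nominal value relative to that value, in percent; the function names and toy data are invented and are not the experiment's data. Following the text, both medians and the sign counts are taken over the same set of mutants (those that ran and changed the point prediction), and the nominal sensitivity is assumed to be nonzero.

```python
import statistics
from typing import List, Optional, Tuple

# (point prediction, sensitivity prediction), or None if the mutant failed to run
Mutant = Optional[Tuple[float, float]]


def pct_dev(nominal: float, value: float) -> float:
    """Relative deviation from nominal, in percent (an assumed definition)."""
    return 100.0 * abs(value - nominal) / abs(nominal)


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def summarize(nom_point: float, nom_sens: float, mutants: List[Mutant]) -> dict:
    ran = [m for m in mutants if m is not None]
    # Keep only mutants that changed the point prediction (the "147" set).
    affected = [(p, s) for p, s in ran if p != nom_point]
    signs = [sign(s) for _, s in affected]
    nom_sign = sign(nom_sens)  # assumed nonzero
    return {
        "failed": len(mutants) - len(ran),
        "unaffected": len(ran) - len(affected),
        "affected": len(affected),
        "median_point_dev": statistics.median([pct_dev(nom_point, p) for p, _ in affected]),
        "median_sens_dev": statistics.median([pct_dev(nom_sens, s) for _, s in affected]),
        "sign_same": sum(1 for g in signs if g == nom_sign),
        "sign_zero": sum(1 for g in signs if g == 0),
        "sign_reversed": sum(1 for g in signs if g == -nom_sign),
    }


# Invented toy data for illustration only.
nominal_point, nominal_sens = 100.0, -4.0
toy_mutants = [None, (100.0, -4.0), (110.0, -3.0), (95.0, -5.0), (102.0, 0.0), (80.0, 2.0)]
summary = summarize(nominal_point, nominal_sens, toy_mutants)
print(summary)
```

Whether the paper's percentages are relative deviations or differences in percentage points is not stated; changing pct_dev is the only edit needed to try the other reading.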
We compared the point prediction deviation to the corresponding
sensitivity prediction deviation for each mutated version that ran.
The point prediction deviation was greater than the sensitivity
prediction deviation for 91 versions and less for 56, and both were
zero (hence equal) for 87.
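The per-version comparison can be tallied as below. This is again a sketch that assumes the same relative percent deviation and uses invented toy values rather than the experiment's data. Each mutant that ran falls into exactly one category; the counts above contain no ties with nonzero deviations, but the sketch counts such ties separately in case they occur.

```python
from typing import Dict, List, Optional, Tuple


def pct_dev(nominal: float, value: float) -> float:
    """Relative deviation from nominal, in percent (an assumed definition)."""
    return 100.0 * abs(value - nominal) / abs(nominal)


def compare_deviations(nom_point: float, nom_sens: float,
                       mutants: List[Optional[Tuple[float, float]]]) -> Dict[str, int]:
    """Tally, per mutant that ran, which deviation is larger."""
    tally = {"point_greater": 0, "sens_greater": 0, "both_zero": 0, "equal_nonzero": 0}
    for m in mutants:
        if m is None:  # mutant failed to run; not compared
            continue
        dp = pct_dev(nom_point, m[0])
        ds = pct_dev(nom_sens, m[1])
        if dp > ds:
            tally["point_greater"] += 1
        elif dp < ds:
            tally["sens_greater"] += 1
        elif dp == 0.0:
            tally["both_zero"] += 1
        else:
            tally["equal_nonzero"] += 1
    return tally


# Invented toy data for illustration only.
toy_mutants = [None, (200.0, -8.0), (220.0, -8.0), (210.0, -9.0), (260.0, -7.0)]
tally = compare_deviations(200.0, -8.0, toy_mutants)
print(tally)
```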
Findings, Discussion & Conclusions
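Findings. For this experiment, the output of the mutated programs
deviated noticeably less from nominal for sensitivity analysis
prediction than for point prediction (median deviations of 1.34% and
1.85%, respectively), and the sign of the sensitivity analysis
prediction was fairly robust, being reversed in only 2 of the 147
affected versions.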
Discussion.
The IMAP software system is intended for use in testing hypotheses
related to the US HIV epidemic. Uses of this type include
interactively helping the user to investigate sensitivity analysis
prediction signs and the relative sensitivities of different model
outputs to different parameters and combinations of parameters
(e.g. Zhang 1994). It was not intended to replace existing point predictions about
the epidemic (cf. Brookmeyer and Gail 1994). An important issue in the
performance of an interactive hypothesis testing system is the fault
tolerance of its outputs, because almost all software systems
of significant size have faults. This fault tolerance is
the issue investigated in this paper.
The suggestion that sensitivity analysis is more fault tolerant than
point prediction has intuitive appeal: one would expect many faults that
affect point prediction calculations to affect them similarly under both
conditions of a sensitivity analysis, the base condition and the
perturbed condition, so that much of a fault's effect cancels when the
two are compared. However, intuitive appeal is not proof, nor does it
eliminate the need for experimental results. We would like the
findings reported here to apply broadly to simulation systems. To
generalize them, however, we must at present make certain
assumptions:
- Assumption #1: the findings stated above are insensitive to
the type of fault. In particular, real faults will act similarly to
statement deletion mutations with respect to the
findings. (Another implication of this assumption is that
analyses based on other types of mutation will also provide similar
results. One particularly useful type of mutation would be random
modification of executable files, because this would be easy to do on
commercially sold software for which only the executable code is
available.)
- Assumption #2: the findings are qualitatively typical of those
that would be reached for other outputs and other simulation programs.
- Assumption #3: the findings hold not just for single faults,
which we tested here, but for multiple faults as well (since programs
are likely to have more than one fault).
- Assumption #4: augmenting the experiment to test for faults in
the sensitivity analysis module itself, rather than just in the
simulation model core, would not change the findings.
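The intuition stated at the start of this discussion, that many faults affect the base and perturbed runs of a sensitivity analysis similarly, can be checked on a toy example. The sketch below is hypothetical throughout: toy_core and the three faults are invented, the 9% increase is assumed to be relative, and deviations are assumed to be relative percent deviations from nominal.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Core = Callable[[Params], float]


def toy_core(p: Params) -> float:
    """Hypothetical stand-in for an unmutated simulation core (not IMAP3)."""
    return p["initial"] * (1.0 + p["rate"] * (1.0 - p["condom_use"]))


def sensitivity(core: Core, p: Params, name: str = "condom_use",
                rel_increase: float = 0.09) -> float:
    """Percent change in output for an assumed relative 9% parameter increase."""
    base = core(p)
    q = dict(p)
    q[name] = p[name] * (1.0 + rel_increase)
    return 100.0 * (core(q) - base) / base


def pct_dev(nominal: float, value: float) -> float:
    """Relative deviation from nominal, in percent (an assumed definition)."""
    return 100.0 * abs(value - nominal) / abs(nominal)


PARAMS = {"initial": 1000.0, "rate": 0.5, "condom_use": 0.2}

# Three hypothetical faults, each written as a faulty version of the core.
FAULTS: Dict[str, Core] = {
    "scaled output": lambda p: 0.9 * toy_core(p),                      # same effect on both runs
    "offset output": lambda p: toy_core(p) + 50.0,                     # partly cancels
    "condom_use ignored": lambda p: p["initial"] * (1.0 + p["rate"]),  # breaks the pathway
}


def deviations(faulty: Core) -> Tuple[float, float]:
    """(point prediction deviation, sensitivity deviation) for one faulty core."""
    dp = pct_dev(toy_core(PARAMS), faulty(PARAMS))
    ds = pct_dev(sensitivity(toy_core, PARAMS), sensitivity(faulty, PARAMS))
    return dp, ds


for label, faulty in FAULTS.items():
    dp, ds = deviations(faulty)
    print(f"{label:20s} point {dp:7.3f}%  sensitivity {ds:7.3f}%")
```

In this toy, a fault that scales the output leaves the sensitivity unchanged, and a constant offset c shifts the point prediction by c/B but the sensitivity by only c/(B + c), where B is the nominal base output. A fault that disconnects the perturbed parameter, however, yields zero sensitivity and a 100% sensitivity deviation, which is why the intuition needs experimental support.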
Conclusions. Additional research is required
to test the validity of the assumptions and thus establish the degree
of generality of the findings. If the assumptions are valid, we would
then conclude that, in the presence of faults:
- Sensitivity analysis predictions tend to be more fault
tolerant, and hence more dependable, than point predictions.
- Sensitivity analysis prediction signs tend to be fault
tolerant.
ACKNOWLEDGEMENTS
The IMAP project was supported in part by funding from the Center for
Devices and Radiological Health (CDRH), Food and Drug Administration,
Department of Health and Human Services, Rockville, Maryland.
The authors wish to thank Harry F. Bushar (CDRH) and R. Ron Goforth
(Suranaree Institute of Technology, Thailand) for their comments on
the manuscript.
REFERENCES
- Berleant, D., H. Cheng, P. Hoang, M. Ibrahim, S.
Jamil, and P. Krovvidi, 1994, Robustness measurement: an approach to
assessing simulation program reliability, in M. J. Chinni, ed.,
Proceedings of the Military, Government and Aerospace Simulation
Conference, The Society for Computer Simulation (ISBN 1-56555-072-2),
pp. 165-170.
- Brookmeyer, R. and M. H. Gail, 1994, AIDS Epidemiology, Oxford
University Press.
- Friedman, M. and J. Voas, 1995, Software Assessment:
Reliability, Safety, Testability, John Wiley & Sons.
- Goforth, R. R. and D. Berleant, 1994, A simulation model to
assist in managing the HIV epidemic: IMAP2, Simulation 63
(2) (August) 128-136. (IMAP3, used for this paper, is a
revised version of IMAP2.)
- Goseva, K., P. Grnarov and A. Grnarov, 1995, Performability
modeling of N version programming technique, Proceedings of the
Sixth International Symposium on Software Reliability Engineering
(ISSRE '95). Abstract:
http://www.computer.org/conferen/proceed/issre95/abstract.htm#209.
- Liu, B., 1996, A mutation based approach to software fault
tolerance assessment, master's thesis, University of Arkansas,
Fayetteville, AR.
- Zhang, P., 1994, Extension and maintenance of the
Interactive Model for Aids Prediction (IMAP2) with emphasis on
automated sensitivity analysis, master's thesis, University of
Arkansas, Fayetteville, AR.
Biographical note: Daniel Berleant is an associate professor. Byron Liu received his master's
degree in 1996.