Debugging Software: From Verification to Repair

108
Dissertation Debugging Software: From Verification to Repair Andreas Griesmayer February 2007 durchgef¨ uhrt am IST - Institut f ¨ ur Softwaretechnologie Technische Universit¨ at Graz Betreuer: Univ.Ass. Roderick Bloem, PhD 1. Begutachter: Univ.Prof. Dipl.-Ing. Dr.Techn. Franz Wotawa 2. Begutachter: Univ.Prof. Dipl.-Inform. Dr.-Ing. Andreas Zeller

Transcript of Debugging Software: From Verification to Repair

Debugging Software: From Verification to RepairAndreas Griesmayer
February 2007
durchgefuhrt am
Betreuer: Univ.Ass. Roderick Bloem, PhD
1. Begutachter: Univ.Prof. Dipl.-Ing. Dr.Techn. Franz Wotawa 2. Begutachter: Univ.Prof. Dipl.-Inform. Dr.-Ing. Andreas Zeller
Preface
This thesis was written in partial fulfillment of the requirements for the degree of Doktor der technischen Wissenschaftenat Graz University of Technology, Austria.
The thesis was carried out at theIST-Institute for Software Technology, Inffeldgasse 16b/2, 8010 Graz, Austria. The work was performed during theperiod of February 2004 to January 2007 in which both research and development,took place.
The subject of this thesis is to research and develop novel approaches for local- ization and repair of faults in software by using model checking techniques. Parts of this thesis were published at scientific conferences and workshops.
This work was supervised by Roderick Bloem and was partiallyfunded by the European Union under the EU funded project PROSYD.
i
Preface
ii
Abstract
In development of software, debugging is a very time consuming and tedious task with only limited support of automated tools. Debugging consists of three steps: checkingif a program fulfills its specification, and, if not,localizing the fault and repairing the program to comply with the specification.
This work shows novel methods to assist a programmer in all ofthese steps. Model checking, a technique to exhaustively check the state spaceof a system, is success- fully used for hardware verification and increasingly used for checking abstract soft- ware designs or even code. We show existing tools and techniques necessary to cope with the vast state space generated by usual software systems, as well as a case study to show the applicability to real world problems. If a program does not fulfill its specification, a model checker gives a counterexample, aexemplary run of the system that shows a behavior that violates the specification. Even with this coun- terexample, it is hard to identify the statement responsible for the error. We present an automatic approach for fault localization that is based on model checking and reports only components that can be changed such that the difference between ac- tual and intended behavior of the example is removed. In principle, the approach is applicable for all languages for that model checkers exist. For our experiments, we used the approach to localize faults in C programs, and usethe bounded model checker CBMC (C bounded model checker) on an instrumented version of the pro- gram. Experimental data supports the applicability of the approach and its superiority compared to previous approaches.
To help the user with the third step of debugging,repair, we use CEGAR (coun- terexample guided abstraction refinement) to generate a Boolean program from a C program, which is then extended to a game. In the game, the protagonist can select an alternative implementation for an incorrect statement.If the protagonist can do so successfully using a memoryless strategy that does not depend on the stack contents, we have found a correction for the Boolean program, which is used as suggestion for a repair of the C program. This yields a correct but incomplete approach to repairing C programs. We present a symbolic algorithm to localize possibly faulty statements and provide corrections. The implementation was successfully applied to Boolean programs that were produced as abstractions from faulty Windows device drivers by the SLAM model checker from Microsoft.
iii
Abstract
iv
Zusammenfassung
Eine der aufwandigsten Tatigkeiten in der Softwareentwicklung is das“Debugging”, dasUberprufen ob das Programm der Spezifikation entspricht und gegebenenfalls das anschliessende Auffinden (“Lokalisierung” ) und Ausbessern (“Reparatur” ) des Fehlers. Die Unterstutzung durch automatisierte Tools f¨ur diese Tatigkeiten ist leider sehr beschrankt.
Die vorliegende Arbeit zeigt neue Ansatze um dem Programmierer in allen diesen Schritten zu helfen. Model checking, ein Verfahren um den gesamten Zustandsraum eines Systems zu uberprufen, wird bereits erfolgreich inder Verifikation von Hard- ware, zunehmend auch fur Softwaredesigns und teilweise sogar Quellcode benutzt. Wir zeigen existierende Tools und notwendige Techniken um mit dem enormen Zustandsraum umzugehen zu konnen, der durch Software aufgespannt wird. Eine Fallstudie zeigt die Anwendbarkeit auf reale Software zur Steuerung von Industrie- robotern. Wenn ein Programm seine Spezifikation nicht erfullt, dann liefert ein Model Checker ein“Counterexample”, ein Beispiel das einen Ablauf des Programms zeigt in dem die Spezifikation verletzt wird. Allerdings istes auch mit einem solchen Gegenbeispiel immer noch schwer, die exakte Position des fehlerhaften Statements herauszufinden, es koennte ein Beliebiges innerhalb des Laufes sein. Wir zeigen eine automatisierte Methode um die Anzahl der moglichen Fehlerstellen zu reduzieren. Sie basiert auf Model Checking und meldet nur Komponenten die so geandert wer- den konnen, dass das gefundene Beispiel keinen Fehler mehrerzeugt. Prinzipiell kann der Ansatz direkt fur alle Sprachen angewendet werden, die auch uber einen Model Checker verfugen. Um die Anwendbarkeit zu zeigen, haben wir den Ansatz verwendet um Fehler in C Programmen aufzuspuren. Dazu instrumentierten wir C Code und benutzten den Model Checker “CBMC” um direkt auf demQuellcode ar- beiten zu konnen. Die Ergebnisse unserer Experimente belegen die Anwendbarkeit des Ansatzes und die erhohte Genauigkeit im Vergleich zu vorherigen Ansatzen.
Um dem Benutzer mit der Reparatur, dem dritten Schritt des Debugging Prozesses, zu helfen, benutzen wirCEGAR(Counterexample Guided Abstraction Refinement), um Boolsche Programme aus den C Programmen zu erzeugen. Ein Boolsches Pro- gramm wird dann zu einem Spiel erweitert, in dem sich der Protagonist eine alterna- tive Implementierung fur ein Statement aussuchen kann. Wenn das moglich ist ohne zusatzlichen Speicher oder den Inhalt des Stacks zu benutzen, dann ist eine Reparatur fur das Boolsche Programm gefunden die verwendet wird um eine Reparatur fur das C Programm abzuleiten. Wir prasentieren einen symbolischen Algorithmus um moglicherwiese fehlerhafte Statements aufzufinden und zureparieren. Durch die Im-
v
Zusammenfassung
plementierung war es moglich Reparaturen fur Windows Device Treiber zu finden, die vom SLAM-Tool auf Boolsche Programme abstrahiert wurden.
vi
Acknowledgments
This work was partially funded by the European Union projectPROSYD under con- tract FP6-IST-507219. I am grateful to the EU-project for partially funding my re- search and attendance at theSpring School on Infinite Games and Their Applications, and the conferences IEA/AIE in Bari 2005, CAV in Edinburgh 2005 and Seattle 2006 and the V&D workshop in Seattle 2006.
I also want to thank Byron Cook for inviting me twice to come toCambridge and work on the repair approach for C programs. He made it possible to use the SLAM model checker with repair and to evaluate the results on realworld problems.
Many thanks to all members of the Institute of Software Technology. First of all to my supervisors Franz Wotawa and Roderick Bloem for all his ideas and suggestions, my colleagues and “office mates” Barbara Jobstmann, Ingo Pill and Stefan Staber who helped both scientifically and morally to achieve the goal of writing this thesis, and to Daniel Kob for sharing his experience on MBD and all the tips & tricks for LATEX. Without all the discussions and collaborations this workwould have been impossible.
Last but not least I want to express my deep gratitude to my family and my friends that accompanied me in the last years through the ups and downs of the process of writing a thesis.
Thanks,
Andreas
vii
Acknowledgments
viii
Contents
Preface i
Abstract iii
Zusammenfassung v
Acknowledgments vii
1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Introducing Software Model Checking 7 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 The Finite State Model . . . . . . . . . . . . . . . . . . . . . . . . 8 2.3 Specification using Properties . . . . . . . . . . . . . . . . . . . . .10 2.4 Approaches in Model Checking . . . . . . . . . . . . . . . . . . . 11 2.5 Concurrent Programs . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.1 Partial Order Reduction . . . . . . . . . . . . . . . . . . . 17 2.6 Existing Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6.1 Explicit Model Checkers . . . . . . . . . . . . . . . . . . . 18 2.6.2 Symbolic . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.6.3 Toolkits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Preliminaries 25 3.1 Stating Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.1 Temporal Logic LTL . . . . . . . . . . . . . . . . . . . . . 25 3.1.2 Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Techniques for Software Model Checking . . . . . . . . . . . . . .28 3.2.1 Pushdown Systems . . . . . . . . . . . . . . . . . . . . . . 28 3.2.2 Counterexample Guided Abstraction Refinement (CEGAR) 29
3.3 Fault Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 DACS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.2.1 Translating the Handling System . . . . . . . . . . . . . . . 36 4.2.2 Modeling Properties . . . . . . . . . . . . . . . . . . . . . 36 4.2.3 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5 Fault Localization 39 5.1 Fault Localization by Model Checking . . . . . . . . . . . . . . . .39
5.1.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.1.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.2 Fault Localization for C Programs . . . . . . . . . . . . . . . . . . 42 5.2.1 CBMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.2.2 Fault Localization . . . . . . . . . . . . . . . . . . . . . . 44 5.2.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.2.4 Dual fault Diagnosis . . . . . . . . . . . . . . . . . . . . . 46 5.2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.3.1 TCAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 5.3.2 Red-Black Tree . . . . . . . . . . . . . . . . . . . . . . . . 50
6 Repair 55 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 6.2 Boolean Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.2.1 Syntax and Semantics . . . . . . . . . . . . . . . . . . . . 56 6.3 Model Checking of Boolean Programs . . . . . . . . . . . . . . . . 59 6.4 Repair of Boolean Programs . . . . . . . . . . . . . . . . . . . . . 61
6.4.1 Building the Game . . . . . . . . . . . . . . . . . . . . . . 62 6.4.2 Computing the Strategy . . . . . . . . . . . . . . . . . . . 62 6.4.3 Extracting a Repair . . . . . . . . . . . . . . . . . . . . . . 65 6.4.4 Mapping Boolean Repairs to C . . . . . . . . . . . . . . . . 66 6.4.5 Localizing Faults . . . . . . . . . . . . . . . . . . . . . . . 66
6.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.5.1 Adjustments for Checking Windows Device Drivers . . . .69 6.5.2 Case Study: Windows Parallel Port Device Driver . . . . .. 69
7 Conclusions 73 7.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.1.1 Improvements and Extensions . . . . . . . . . . . . . . . . 74 7.1.2 Quality of a Repair . . . . . . . . . . . . . . . . . . . . . . 75
Index 77
List of Tables
2.1 Transitions for the “vending machine” . . . . . . . . . . . . . . .. 12 2.2 SAT encoding for the transitions of the “vending machine” . . . . . 15 2.3 Comparison of current tools for Software Model Checking. . . . . 18
3.1 LTL examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 results of model checking the control software . . . . . . . .. . . . 37
5.1 Results of the TCAS task of the Siemens Test Suite. . . . . . .. . . 48 5.2 Results of the TCAS task for dual fault diagnosis. . . . . . .. . . . 50
6.1 BDD for symbolic representation ofτ for v1 := ∗sys . . . . . . . . 64 6.2 Results for repair of Windows device drivers . . . . . . . . . .. . . 67 6.3 Informal summary of properties listed in Table 6.2 . . . . .. . . . . 68
xiii
List of Figures
2.1 Model of theVending Machine. . . . . . . . . . . . . . . . . . . . 10 2.2 Decision tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 Binary Decision Diagram . . . . . . . . . . . . . . . . . . . . . . . 14 2.4 Multi-threaded program and its Model . . . . . . . . . . . . . . . .16
3.1 Buchi automaton for LTL formulapU q . . . . . . . . . . . . . . . 28 3.2 CEGAR - loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1 The handling system . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1 Localization Approach . . . . . . . . . . . . . . . . . . . . . . . . 41
6.1 Boolean program and corresponding execution and game graph . . . 60 6.2 Faulty C program . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 6.3 Predicates used for abstraction. . . . . . . . . . . . . . . . . . . .. 60 6.4 Abbreviations for the conditions on the implementationof ∗sys . . . 60 6.5 Tool for viewing the repairs . . . . . . . . . . . . . . . . . . . . . . 72
xv
List of Listings
2.1 Vending Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1 simple C program . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2 abstract program . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.1 DACS example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 5.1 Loop Unwind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 5.2 A simple C programP = Pk . . . . . . . . . . . . . . . . . . . . . 45 5.3 Modified unwound C programP ′
k . . . . . . . . . . . . . . . . . . 46 5.4 TCAS v2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 6.1 Temporal propertyDoubleCompletion. . . . . . . . . . . . . . . . . 70 6.2 Code from dispatchRedirect.c in Parport device driver .. . . . . . . 70 6.3 Source from fdoClose.c in Parport device driver . . . . . . .. . . . 70 6.4 Source from util.c in Parport device driver . . . . . . . . . . .. . . 70 6.5 Abstraction for P4CompleteRequest . . . . . . . . . . . . . . . . .71 6.6 Replacement for IoCompleteRequest . . . . . . . . . . . . . . . . .71 7.1 Simple C Program . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 7.2 Abstraction of Listing 7.1 . . . . . . . . . . . . . . . . . . . . . . . 75
xvii
1 Introduction
1.1 Motivation
Software errors can cause large amounts of damage, not only in safety-critical sys- tems, but also in industrial applications. An error in the control program for a robot, for example, may cause damage to products and to the robot itself. In such areas as automotive engineering, both the robots and the products are very expensive. More- over, the followup costs can be a multiple of the direct costs: the production line may have to be halted while a technician travels to the site to repair the problem. There- fore, reliable quality control for software is mandatory, but hard to realize: Software rapidly grows very complex and code reuse introduces code with side-effects un- known to the programmer. In concurrent software, the problems are further inten- sified because the environment strongly influences the orderin which parts of the program are executed, which introduces a source of variation that makes it hard to find a failure. And moreover: it is even hard to reproduce a failure once one is found. Therefore, it is necessary to find automated methods to verify if software fulfills its specification and, if not, to help the user by reporting how the specification was violated.
However hard it is to find out if a program behaves as expected,finding out that it does not is only the first step in debugging. Once a violation of the specifica- tion is discovered, the engineer has to find the source of the wrong behavior, the fault. If tools for model checkingor testingfind an error, they usually can provide a counterexample, a run of the program demonstrating a path trough the program from the invocation to the position where the error becomes apparent, e.g., an assert statement. Because of this, we use the termscounterexampleanderror tracesynony- mously. While it is obvious that the fault responsible for the undesired behavior can be found among the locations visited by the error trace, the exact location still has to be found. In that process, current industrial debugging tools only help the program- mer to observe a run and visualize the data, the main task of finding a position in the program that is responsible for the faulty behavior is left to the programmer. As error traces may consist of hundreds or thousands of steps, this is still a very tedious task that ought to be assisted by automated tools. Localization methods provide the programmer with a subset of locations within the counterexample, which helps to reduce the fraction of the program that needs to be searched for a fault. This thesis shows a novel approach that can be used for a wider variety of programs and shows better results than previous work.
1
1 Introduction
For the actualrepair of the program, the removal of erroneous behavior, the pro- grammer has to think about the effects a change of a statementcauses in the overall program. Thus, it is not enough to think only about the error trace that unveiled the error, but it is also necessary to keep in mind all other validruns that might be trig- gered by valid input values. To provide the programmer with amaximum of help, it is desirable to not only give a list of fault locations, but even report a list of possible changes to the program to remove the faulty behavior. To reach this goal, the whole state space of the program and the consequences of changes tothe program have to be examined.
This thesis shows new approaches for the entire debugging cycle in software de- velopment:verification, localizationandrepair of faults in software.
1.2 Structure of the Thesis
The following section will cover some important related work in the area of verifi- cation, localization and repair of software. Then, Chapter2 gives a short overview of software model checking (SMC). Simple examples will be used to introduce es- tablished techniques and approaches in SMC that will be usedthroughout the thesis. We will use that knowledge in Section 2.6 for a comparison of anumber of impor- tant model checking tools for software. Some of this tools will be used throughout this thesis. Chapters 4 to 6 are based on papers published during the work on the thesis and present formal techniques for the debugging process. Related work for these chapters is summarized in Section 1.3, necessary preliminaries are given in Chapter 3.
Chapter 4 gives a case study of model checking and is based on the paperFor- mal Verification of Control Software: A Case Studyappeared in18th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems(IEA/AIE ’05, joint work with R. Bloem and M. Hautzendorfer and F. Wotawa [GBHW05]). Subject of the case study is model checking of a control software written in the DACS programming language by FESTO.The model was created by syntactical rewriting of the original sources tothe Java programming lan- guage. We used the model checker “Java Path Finder” (JPF) to check for safety as well as liveness properties.
Chapter 5 presents a novel approach on localizing a fault within a program. The method was first published inFirst Workshop on Verification and Debugging(V&D ’06, joint work with S. Staber and R. Bloem [GSB06]) asAutomated Fault Local- ization for C Programs. The chapter presents a general method of localization that in principle is applicable to all languages for which model checking tools exist. To demonstrate the practicability, the approach was implemented using the bounded model checker CBMC and applied to a number of C programs (see also symbolic model checking in Section 2.4).
The idea of repair of finite state programs was introduced in [JGB05] and de- scribed in more detail in [JSGB07]. Chapter 6 shows a novel approach to auto-
2
1.3 Related Work
matically provide replacements for faulty expressions to satisfy the specification in software with infinite state space. (For more details on how to model software with infinite state space seepushdown systems, Section 3.2.1). The chapter is based on two publications: Because computation of a valid repair is computationally much harder than localization (while the localization approach considers one run of the program at a time, we have to include all possible runs for repair), the idea of repair was first shown on Boolean programs, where the state space generated by the variables is rel- atively small (Repair of Boolean Programs Using Gamesat2nd Workshop on Games in Design and Verification, GDV ’05, joint work with R. Bloem [GB05]). Usual pro- gramming languages with dynamic memory allocation and infinite loops or recursion may use infinite memory, checking such software is not decidable. Thus, we intro- duce abstraction to the approach to reduce the state space from programs written in C to an abstract Boolean program. This allows us to perform repair on real problems written in C (cf. CEGAR in Section 3.2.2). The application ofthe approach to C programs by using abstraction was presented inRepair of Boolean Programs with an Application to Cat the18th Conference on Computer Aided Verification(CAV’06, joint work with R. Bloem and B. Cook [GSB06]). Its experimental section contains a case study on a number of device drivers for the Windows operating system.
1.3 Related Work
The following section gives related work, grouped by the three main topics of this thesis. InVerification, a comparison to testing and references to techniques for mak- ing software model checking feasible are given.Localizationrelates the work pre- sented in Chapter 5 to model based diagnosis, techniques that compare successful and failing runs, and approaches to give a better understanding for error traces rather than trying to directly pinpoint the fault.Repairgives the related work on games on pushdown automata (cf. Section 3.2.1), which is the theoretical background to Chap- ter 6, as well as on abstraction, which is used in order to use repair on C programs.
Verification
Currently, mainly testing techniques are used to check if a program adheres to a spec- ification. Although there are methods of automatic test casegeneration that ensure a good coverage while minimizing the efforts for the programmer [HLSC01, LKS04], testing does not cover the full behavior of a system because user input and processed data seem nondeterministically within a program. This makes it hard to estimate the coverage of the test cases, or to reproduce a failure. This problem is intensified with concurrent programs, which contain task switches which are hard to control by the testing engineer, this makes it hard to find reliable test-cases and reproduce found bugs [LHS03]. To make sure that all faults are found, formal techniques, as presented in this thesis, are required. Model checking [CGP99] in particular is a technique to prove adherence of a system to a given property, regardless of the behavior of the en-
3
1 Introduction
vironment. The drawback of using model checking of programsis that it has to cope with the huge state space that is spanned by software. Therefore, techniques like abstraction [BMMR01] and partial order reduction [ABH+01] are used to reduce the state space that has to be searched. There are different waysto build a model for a system to check. Model checkers like JPF [BHPV00] directly work on the program- ming language. If no such model checker is available, a modelhas to be created from the sources manually. In chapter 4, we show an automatedtool to create a Java model from properties and programs written in DACS, a programming language for controller software for industrial robots. For the check weutilize the model checker JPF.
Localization
A novel approach for localization is presented in Chapter 5.The work is closely re- lated to the theory of model based diagnosis (MBD) [Rei87, KW87] which has its ori- gins in diagnosing physical systems. Console et al. [CFTD93] show the applicability of MBD to fault localization in logic programs. Much work hasbeen done in apply- ing MBD to hardware designs on the source code level [FSW96, Wot02, PW06]. In MBD, a model is derived from the source code. The model, together with a given failure trace, describes the actual, faulty behavior of theprogram. The correct output for the trace must be provided by an oracle (typically by the verification engineer). Model based diagnosis uses decision procedures to get a set of components that ex- plains the discrepancy between the model and the desired behavior. This does not yield in a replacement for those components. The theory has been successfully ap- plied to debugging programs written in a subset of Java [MS03, KW06]. They gener- atevalue based models, in which components are generated that capture the original behavior of the statement they represent. In particular, values of the variables or predicates are computed and propagated through the model and observations from an oracle (failure trace) are applied. The diagnostic engine locates conflicts between the model and the observations and selects components that are suspended in order to remove the conflict. In MBD, a suspended component stops topropagate its vari- ables. This leads to the problem that information about thisvariable is lost, which in turn may remove the conflict although no valid replacementto remove the actual fault exists for the component. In [KCW05], this problem is tackled by using model checking as second stage to find out if a fault location is feasible. Therefore an extra model for the model checking stage has to be generated.
In our approach, we use model checking to get feasible error locations in the first place. Therefore no second stage to check the locations is necessary. Nether do we require a certain failure trace and an oracle that delivers the correct output values for that trace. Instead we assume that a specification is given and use model checking techniques to obtain an arbitrary finite failure traceand to provide locations within the program that are possible locations of the fault. Furthermore, the approach only reports statements that can be changed such that the fault isremoved.
A related theory to model-based diagnosis was developed by Veneris, Smith, et al.
4
1.3 Related Work
See, for instance, [FAVS+04]. They use a SAT solver to diagnose sequential circuits but do not consider specifications: the user must provide theproper behavior of the circuit for a given counterexample.
One application of diagnosis is design rectification. Huangand Cheng [HC98] develop a theory to locate sites at which a circuit can be modified to satisfy a new specification. In their case, the specification is given as another circuit.
Other existing work uses the difference between traces for fault localization. Shary- gina and Peled use a testing tool to explore theneighborhoodof an error trace to understand the error and find its cause [SP01]. Renieris and Reiss [RR03] assume the existence of a faulty trace and a number of successful traces. The difference be- tween the traces points to suspicious parts of the program. Experimental data show that the quality of the fault localization strongly dependson the given set of traces. In the work of Groce [Gro04] a specification is used to generate a successful trace that is as close as possible to a counterexample. The difference between failing and successful traces serves as a basis to explain the fault. Experimental data shows quite good results of their approach in the cases where correct traces could be produced, although for some examples manual assumptions had to be added in order to obtain insightful explanations. Comparison to our work can be found in Section 5.3. In [Zel02], Zeller’s delta debugging approach is used to obtain a cause-effect chain of a fault. The differences between states in heuristically chosen points of interest in a failing and a successful program run deliver an explanation for the fault. Ball et al. [BNR03] use multiple calls to a model checker and comparethe obtained coun- terexamples to a successful trace. Transitions that do not appear in a correct trace are reported as a possible cause for the fault.
All approaches that use the difference between traces need asuccessful run that is either generated or given by the user. Such a successful run might be hard to produce, very different to the failing run, or non existent at all. In such cases the approaches give bad results or fail completely. The reported statements are fragments of the code present in the failing run but not in a successful runs. It is not guaranteed that the real fault location is reported or that the fragment of the code isminimal. It is possible that different failing runs lead to different fault candidates without a shared candidate. Contrary to these approaches, we do not compare runs on different inputs, but start with a counterexample and look for a minimal change in the program in order to generate a successful run with the same inputs. Thus, every reported location is a potential fault candidate and can be used to fix the fault for the used counterexample. It is guaranteed that the real fault is in the set of found candidates (assuming that the fault is in the considered fault model, cf. 3.3). Therefore,we rerun the process on multiple counterexamples and use the intersection of the results to narrow down the number of fault candidates.
Other existing work does not report possible fault locations, but rather gives a deeper understanding of the nature of a failure. In [JRS04] segments of the coun- terexample are identified that force the error to occur. Zeller and Hildebrandt [ZH02] point out the significant difference between a failure inducing program input and a successful input. Groce and Visser [GV03] generate and analyze similar versions of
5
1 Introduction
a program run (successful and faulty runs) in order to obtainmore information about the cause of the failure.
Repair
The work described in Chapter 6 extends work done in [JGB05, SJB05b, SJB05a] on locating and correcting faults in finite-state systems and shows a novel approach for repair of C programs utilizing abstraction to Boolean programs, which have an infinite state space due to recursion.
Alur, La Torre, and Madhusudan [ATM03] give a fixed-point computation algo- rithm for solving modular pushdown games (which are equivalent to Boolean pro- grams) that is similar to the algorithm we present in Chapter6. They do not apply it to repair and do not show an implementation. They further focus on complex- ity analysis for reachability in different settings of visibility: global memory, local memory and local but persistent memory. Work by Walukiewicz[Wal00] focuses on computing strategies for more generalµ-calculus properties on pushdown systems. This work is not in the setting of repair and the strategies that are found are not in general memoryless. Basing a repair on them would significantly alter the program by adding a second concurrent thread. Bouajjani, Esparza, and Maler [BEM97] give algorithms for reachability analysis in alternating pushdown systems. Their algo- rithm is polynomial, but the strategy it produces is an alternating automaton and may depend on the contents of the stack.
In [CGS04], abstraction is used in the context of fault localization. The authors argue that explanations of abstract counterexamples are more informative than ex- planations of concrete counterexamples, because the predicates used capture the im- portant information in the program, but not more: it is an automatically generated high-level description of the program. For instance, the information thatx should be greater thany may be much more informative than the information thatx should be 8239 and not 4.
The notion ofrepair in the context of source code is new. Previous work from Demsky and Rinard [DR03] and Khurshid, Garca, and Suen [KGS05] usesrepair in the context of corrupt data structures without terminatingthe program. Their work is on recovering from a failure, not on fixing faults in the system.
6
2.1 Introduction
This chapter will give a brief overview of software model checking. The first section introduces a small example to show the modeling process. After an introduction to specification using properties, the example will be used to introduce the main approaches of model checking in Section 2.4. The problems generated by concurrent programming are discussed in Section 2.5, before we comparea number of important model checking tools in Section 2.6.
In general, model checking is the technique of verifying that a system satisfies a specification by exhaustively checking all possible behavior of a system. It is ap- plicable for all kinds of applications for which useful models can be generated. In industry, model checking is still mainly employed in hardware [KG99], whereas model checking for software is a vivid field of research. [BR01a, CDH+00, God97, VHB+03, CKL04]. Hardware can always be represented in finite state models. That is, we have a finite set of states and transitions between them. Model checking soft- ware, on the other hand, in general is undecidable: Recursive functions calls, un- bounded loops and dynamic memory allocation induce an infinite state space that in general cannot be checked exhaustively.
So, in general, some amount of abstraction is necessary to check software. In this chapter we will only cover finite state systems. Such models do not allow in- finite recursion, and limit the used memory. This may reduce the applicability of finite state models for software model checking. In practice, however, we see that software often either requires only a finite number of states, or is easily abstracted to such a model. Models for software written in the DACS programming language covered in Chapter 4 are generally finite state because neither recursion nor dynamic memory allocation are possible. In Chapter 5, we use the model checker CBMC, which creates a finite state system by limiting the number of loop iterations and the recursion depth. If a finite state system is not sufficient to express the program,push- down systemsare used. A pushdown system consists of a finite set of controlstates and a stack whose topmost element is accessible. This allowsus to model functions of software in a natural way while checking pushdown automata is still decidable [ATM03]. Chapter 6 covers repair on Boolean programs, whichare equivalent to pushdown systems. More information on Boolean programs andpushdown systems
7
2 Introducing Software Model Checking
is given in Section 3.2.1.
2.2 The Finite State Model
The first step of automated verification is the creation of a formal model of the sys- tem to check. This section shows how to create and to express finite state models. We already mentioned that most programming languages can express programs with infinite state space. However, to reducing the number of states, in practice it often suffices toin-line the functions. That is, to copy the body of a called function in place of its call and make the necessary assignments to mal the formal arguments to the local variables of the called function, and the returnvalues to the variables of the callee. If there are recursive functions that call themselves directly or indirectly, a bound is needed after which in-lining is stopped to avoid aninfinite loop. Note that the model still is complete if the program never reachesa recursion depth ex- ceeding the selected bound. The model checker CBMC, which isused in Chapter 5, usesunwindingto remove loops. Here, the body of loops is replicated a finitenum- ber of times, whereby in some cases CBMC is able to find the bound automatically. Section 5.2.1 gives some examples.
The model deals withstates, transitionsandatomic propositions. A statesrepre- sents the status of the program at a given point in the execution. All facts that are interesting in a state, like values of variables or the line number of the next statement to execute, are encoded inatomic propositions. Transitionsconnect states and are given by the control structure of a program. They thus define which changes in the variables are allowed at a given point in the program. Formally, we represent such a model as Kripke structure:
Definition 2.1 (Kripke Structure). A Kripke structure is a tupleK = S, δ, S0, AP,L where
S is the (finite) set of states,
δ : S → 2S is the transition relation,
S0 ⊆ S is the set of initial states,
AP is the set of atomic propositions, and
L : S → 2AP is the labeling function.
The behavior of the model is given by the possibleruns through the model, which are (infinite) sequences of states following the transitionrelation: σ = s0s1s2 . . . with (si+1 ∈ δ(si)). Facts about the system are encoded as atomic propositions,the labelingof a state gives the propositions that are true for a given state.
Example: In Listing 2.1, we have a small program with two variableschoiceand money, and corresponding get-functions whose implementation isomitted here
8
Listing 2.1: Vending Machine
1 enum {No,Yes} money; 2 enum {Nothing, Chips, Nuts} choice ; 3
4 money = No; 5 choice = Nothing; 6
7 while(1){ 8 money = getMoney(); 9 if (money){
10 choice = getChoice (); 11 switch(choice){ 12 caseChips: dispense (”Chips”); 13 break; 14 caseNuts: dispense (”Nuts” ); 15 break; 16 } 17 choice = Nothing; 18 money = No; 19 } 20 }
(they return a nondeterministic value of the respective type to emulate an input from the user,Nothingrepresents the case where the user aborted the transaction). First, the routine checks if money was inserted. If so, it asks which kind of snack it should dispense and does so within the switch block. Afterthat, both variables are set to their initial values. For the model, we use propositions Moneyand Nomoneyfor variablemoney, andChips, NutsandNothing for variablechoice. The fact that a snack is dispensed is stated with propositionDispense. Note that if the user aborts a purchase after inserting some money, themachine “forgets” that money was inserted and neither gives money back nor letsthe user choose a second time without inserting new money. We see the corresponding model in Figure 2.1: States are given as boxes, we name them by a numberin the upper left corner. Labels are stated within the box, propositionsnot shown in a state are supposed to be false. The transitions are given as arrowsbetween states. The initial state is marked by an incoming arrow without starting state, in our case we haveS0 = {0}. For simplicity, we omitted the line number of the next statement to execute.
State 0 corresponds to the moment in the program where the variables are ini- tialized, and the statement in Line 8 is not yet executed. We have two successors: a self loop to State 0, which represents that the input function returnedNo, and State 1, which represents the fact that money was inserted and we entered the body of the if statement in Line 9. Execution of the second input function in Line 10 again gives us several successors for State 3. Because thereis no restriction on
9
NoMoney Nothing
Figure 2.1: Model of theVending Machine

2.3 Specification using Properties
A specification of a system can be stated using properties, which represent the set of sequences that adhere to them. They are divided into two classes, safety and liveness, which can be checked using different techniques.
Deadlock freedom, invariants and properties stated by assert-statements belong to the class ofsafety properties. Informally, a safety properties states that “some bad thing does not happen”. In other words, ifP is a safety property, andP does not hold for an execution, then there is some point in the execution atwhich the “bad thing” does happen. Thus, a safety property can be proven by checking if such a point in the execution is reachable within the model.
Starvation freedom and termination are examples forliveness properties, which state that “a good thing happens during execution”. In the first case, starvation free-
10
2.4 Approaches in Model Checking
dom, the “good thing” is that the program infinitely often makes progress. In the case of termination the “good thing” is that at some point thefinal statement of a program will be executed. More formally, ifP is a liveness property, any finite se- quence can be extended such thatP is fulfilled. Such a property cannot be proven by checking if a point within the model is reachable. Instead, we have to find an infinite sequence showing that the ’good thing’ never happens [Lam77, AS85]. Although a specification can have both safety and liveness parts, it is always possible to separate them.
There are different notations to formally state properties. The definitions of the commonly used “linear time temporal logic” (LTL) [Pnu77] and finite state automata are given in Section 3.1.1 and Section 3.1.2 respectively.
2.4 Approaches in Model Checking
Model Checking is the search of arun in the model, that is not in the set of sequences allowed by the specification. We call such a run anerror trace or counterexample because the run is an example that shows that the claim that the specification is correct is not satisfied.
For safety properties, we can mark states that violate the property in the model asbad states and search for a path that reaches one of them1. This process is the reachability problem, and the result can easily be transformed to acounterexample, a demonstration of the violation of the property. Checking ifa bad run exists for gen- eral properties is more complicated. For properties as usedin this thesis, a common approach is to build an automatonA for thenegated specification, which will accept all runs that violate the specification. The productM′ of modelM andA allows all runs that are possible in the model and are not allowed by the specification. If any run accepted byM′ exists, it is acounterexampleand demonstrates that the specification is violated.
The remainder of the section covers the main approaches in current model check- ing tools: explicit and symbolic model checking.
Explicit State
Explicit model checking tools build graphs whose states areexplored one by one. In general, the optimal search order depends on the application. Breadth First Search (BFS), e.g., guarantees to find the shortest counterexamplefor safety properties, but it is very costly to find loops to check for liveness. Depth First Search (DFS), on the other hand, in general does not guarantee to give the shortest counterexample, but often can be implemented more efficient than BFS algorithms. Especially loops, which have to be found for liveness properties, are harder tofind with BFS.
1For more complicated safety properties, like a sequence of events that may not happen, it might be necessary to use some memory. This is accomplished by extending the model accordingly.
11
Table 2.1: Transitions for the “vending machine”
0 → 0 000 000 0 → 1 000 001 1 → 2 001 010 1 → 3 001 011
1→ 4 001 100 2→ 7 010 111 3→ 5 011 101 4→ 6 100 110
5→ 7 101 111 6→ 7 110 111 7→ 0 111 000
There are a number of optimizations to improve explicit state model checking: Groce [GV04] uses heuristic to improve the search order and speed up the search for “bad states” in the model. Techniques likepartial order reduction(see also Section 2.5.1) are used to ignore parts of the model that do not carry information necessary for the checked property. The exact data structure depends on the purpose of the model checker. Visser et al. show a highly elaborate model checker for Java programs in [VHB+03]. They present optimizations to reduce the number of states and compress their representation. Further interesting explicit state model checker are discussed in Subsection 2.6.1.
Symbolic

x∈X δ(x). E.g., the set of all states reachable within two steps by the transition relation in Table 2.1 is computed asδ(δ(0)) = δ({0, 1}) = {0, 1, 2, 3}.
In the following, we have a look at the two main methods to encode the transition relation.
BDD Based Symbolic Model Checking A Binary Decision Diagram (BDD) is an effective method to store Boolean relations by building a decision structure. To explain this further, we first have a look at the binary form ofthe transition relation δ in table Table 2.1: Each transition is represented by a binary number. We also can view this relation as truth table by assuming that the lines in the transition relation yield true, while missing combinations yield zero.
Figure 2.2 shows thedecision treefor this relation: Each layer of nodes corre- sponds to a bit in the relation, starting with the most significant bitx2 in the upper- most layer. To check if a transition is in the relation, we start in the topmost node. If x2 is 1, we advance along the solid line (thethenedge) to its right child, otherwise we proceed along the dotted line (theelseedge) to the left child. We repeat this for
12
x2
x1
x0
y2
y1
y0
Figure 2.2: Decision tree for the function in Table 2.1.then-edged are represented as solid lines,else-edges by dashed lines. Black (white) leaves represent accepting (rejecting) paths.
each bit of the input word to check. If the trace leads to a accepting node (black), the word is accepted and therefore the transition is in the relation. Otherwise it is not a valid transition.
Decision trees need a lot of memory, to encode a function forn digits, 2n+1 − 1 nodes are required, which makes for 127 nodes in our example.A closer look at Figure 2.2 shows that a lot of the nodes are redundant, e.g., the last layer contains 64 nodes only to state whether the corresponding path is accepted or not. If we drop the requirement of having a tree structure, we get adecision diagramthat only needs two states to cover the same facts. Furthermore, we may drop decisions for single variables if the result of the acceptance of the word does notdepend on it. E.g., in our example, if the variablesx0, . . . , y1 are 0, there is no need to evaluatey2 because we know that both000000 and000001 are in the transition relation. A decision diagram is orderedif the decisions for its variables are taken in the same order, that is, if we consider the variables to be orderedx2 < x1 < · · · < y1 < y2, every path from the root node to one of the leaves encounters the variables in ascending order. A decision diagram is furthermorereducedif:
1. it does not contain any two nodesn1, n2 such that the subgraphs rooted byn1
andn2 are equivalent (represent the same functions)
2. it contains no node whose then- and else-edge lead to the same node
Such decision diagrams were introduced by Bryant [Bry86] under the name “re- duced function graphs”. In the following we will refer to them using the more fre- quently used term ROBDD (reduced ordered binary decision diagram) or BDD for short (we will always assume that it isorderedandreduced.)
13
x2
x1
x0
y2
y1
y0
0 1
Figure 2.3: Binary Decision Diagram for the function in Table 2.1. then-edged are represented as solid lines,else-edges by dashed lines.
Figure 2.3 gives the (reduced, ordered) BDD for the transition relation of the “vending machine”. It contains 23 nodes, a fifth of the original decision tree. Al- though the size of a BDD is exponential in the number of variables in the worst case, it often shows good behavior in practice allows a spacesaving representation of the transition relations of huge models [BCM+90]. The size of BDDs is highly dependent on the ordering of variables — while one ordering might show exponen- tial size in the number of variables, a different order couldbe linear. A BDD is, like a decision tree, a canonical form of representation: Two BDDs describing the same function will have the same form if they have the same ordering. This allows for an effective way of equivalence checking. Besides that, BDDs also allow a effective implementation of Boolean operators. For a Boolean operator , the expressionx y is computed in a total running time ofO(|x||y|).
BDDs are used in a wide range of model checking tools. The Repair approach in Chapter 6 uses BDDs to represent the transition relationsof the statements of Boolean programs.
Satisfiability Based Symbolic Model Checking A SAT solvertakes a for- mula in propositional logic and searches for an assignment to the variables to make it true, i.e., it checks if the formula issatisfiable(SAT). Although SAT is one of the most famous problems in NP — in fact, it was the first one proven to be in NP [Coo71] — a number of SAT solvers exist that usually are very effective on problems in practice .
SAT basedBounded Model Checking (BMC)[BCCZ99] encodes the transition re-
14
2.4 Approaches in Model Checking
Table 2.2: SAT encoding for the transitions of the “vending machine”. The exponent of the variables give the time step, while the index gives thesignificance of the bit.
0→ 0 x0 2x
0 1x
0 0x
1 2x
1 1x
1 0
lation as satisfiability problem by translating it to a Boolean formula that is satisfiable if and only if the property is violated. A SAT solver is used tofind an assignment, which in turn can be transformed back to a counterexample.
Example: Table 2.2 shows the transitions of the vending machine encoded as conjunction of Boolean variables that correspond to the binary representation of source and destination node. The index gives the significance of the bit (LSB = 0) while the exponent gives the time step. E.g., if a run starts in State 0 in the first time step, it can proceed to State 1 in the next time step (x0
2x 0 1x
1 0). The
disjunction of all these terms gives the encoding of the transition relation for one time step.
To generate a formula that covers more steps, we add the conjunction of the transition relation with all exponents incremented by 1 foreach time step. To solve the question if there is a path of length two from the initial state to State 2, the solver has to find an assignment for the free variables in the following formula (we omit the terms that do not comply with start and end node):
start in State 0 in time step 0
(x0 2x
0 1x
0 0) ∧
(x2 2x
2 1x
2 0) ∧
(x0 2x
0 1x
0 0x
1 2x
1 1x
∧ (x1 2x
1 1x
1 0x
2 2x
2 1x
2 0)
transitions ending at State 2 in time step 2
This formula is only be satisfied ifx1, the intermediate state, is set tox1 2, x
1 1, x
1 0.
This means that the intermediate state of the path between State 0 and State 2 is State 1.
The approach is calledboundedbecause the number of steps explored in the model is limited in order to generate the propositional formula. This implies that the approach is incomplete if the bound is not high enough. On the other hand, the size of the SAT problem does not suffer from the space explosion problemin the same way than BDDs do. The representation of the transition relationcan be very compact, but to compute several time steps requires to introduce extra variables, which makes is harder to find a satisfying assignment. So, the approach allows to check problems that would exceed memory limits on BDD based model checking and vice versa.
15
0/0
main: x=y=0; fork(assign1 ); fork(assign2 ); join (); assert (x==y);
fun assign1 : α1 : x=0; α2 : y=1;
fun assign2 : β1 : y=0; β2 : x=1;
Figure 2.4: Multi-threaded program and its Model
In general it is hard to tell whether a SAT- or a BDD approach performs better on a given problem [Str04]. Biere et al. [BCC+03] give an introduction to bounded model checking and show methods to ensure completeness. Bounded Model checking is used in Chapter 5 by the model checker CBMC to check properties of C programs. Generation of the model for a C program and the checked properties as done by this checker is more elaborate than shown here and is explained in more detail in [CKL04].
2.5 Concurrent Programs
The example above models a single process. What happens if wehave two or more processes run in parallel? Usually there is a scheduler thatdetermines which process is allowed to execute a statement next, or they are executed on different CPU-cores in parallel and the time needed for the execution of a statement determines when the next statement of a process is executed. In both cases, the programmer only has lim- ited possibilities to influence the order in which statements of different processes are executed. If processes operate on shared memory or resources, the order of execution influences the outcome of a computation.
To check if the execution of the process always gives us the intended behavior, we have to check all possible interleavings of the processes. Figure 2.4 shows a small multi-threaded program in pseudocode and its model. The program consists of a main routine that forks two small treads that assign to variablesx andy. The main routine waits until both threads finished their execution and asserts afterwards that
16
2.6 Existing Tools
x == y. The assignment statements are labeled withα and β. We see the same labels
in the model to show which statements is responsible for which change in state. Note that the labels are not part of the Kripke structure but only shown in the figure for better understanding. A state in the Kripke structure isdefined by the value of the variables and the position of both threads within their programs. The examples clearly demonstrates the problem in checking concurrent programs: Although each of the concurrent threads only consists of two statements, our model already shows six different possible runs in thirteen states. In general,the size of the state space shows an exponential blowup in the number of statements of the threads.
2.5.1 Partial Order Reduction
We have seen the increase in the number of states due to parallel programs in Sec- tion 2.5. Close examination of the respective model in Figure 2.4 shows that some of the paths partially traverse the same states. Because in that example we are only interested in the reachability of states (i.e. if there is a path whose last state is not accepting), the order of the execution ofα1 andβ1 does not matter as long as they are executed subsequently. This allows for the removal of one of the orderings from the graph.
Partial Order Reduction (POR) exploits such facts and avoids to build the full model, without dropping any information that is necessary to check the property. To this end, it is necessary to analyze if the statements of the program depend on each other. Statements that do not share variables are good examples for independent statements. But also statements that share variables can beindependent. E.g., if they access the shared variables only by reading.
A more detailed introduction to POR can be found in [CGP99], POR for symbolic model checking is discussed in [ABH+01].
2.6 Existing Tools
The first sections of this chapter have given the basic concepts in software model checking. In the remainder, we will have a short look at important existing tools for model checking software that emerged in the last ten years. Some of the tools will be used in the later chapters or could be used for the approaches presented there. Localization in Chapter 5, e.g., could directly applied to Java programs by using the JPF model checker described below instead of CBMC,which was used in the experiments given in Section 5.3. An overview of the tools presented below is given in Table 2.3, which compares some key features of the tools: the inputlan- guage, which is either C, Boolean programs as described in Section6.2, or one of the modeling languages Promela and Remopla. Themodeldetermines if the checker can cope with recursion (see finite state model Section 2.2),and pushdown systems Subsection 2.5.1).Approachshows if the tools performs explicit model checking
17
Table 2.3: Comparison of current tools for Software Model Checking
Tool Language Model Approach Abstr. Spec Threads
BEBOP Bool Pushdown Mixed2 No Safety No BLAST ANSI-C Finite Mixed2 Yes Safety Yes BOPPO Bool3 Finite SAT No Safety Yes CBMC ANSI-C Finite SAT No Safety No JPF Java Finite Explicit No Safety Yes Magic ANSI-C Finite SAT Yes Liveness Yes MOPED1 Bool Pushdown BDD No Liveness No MOPED2 Remopla Pushdown BDD Yes Safety Yes SPIN Promela Finite Explicit No Liveness Yes
2 control flow explicit, data symbolic3without recursion
(Section 2.4) or a symbolic approach (Section 2.4). Furthermore, the table contains if the tools supports abstraction and multiple threads, andthe supported specification.
2.6.1 Explicit Model Checkers
A mentioned before, explicit model checkers search the state space one by one using an explicit data structure. This allows for search heuristics and optimizations like partial order reduction, which are hard to implement for symbolic model checkers [ABH+01]. POR is a powerful tool for checking concurrent threads,which is an important reason that this is one of the strengths of tools using explicit models (see also Table 2.3).
BEBOP
BEBOP is a model checker for Boolean programs [BR00]. It creates an explicit rep- resentations of the control structure of the program, data is stored in BDDs. Boolean programs syntactically are very close to C programs. They support functions with call by value and return values as well as the control statements known from C. The domain of all variables is Boolean, parallel assignment andreturn values are sup- ported, but there is no form of dynamical allocation of memory. The expressibility of Boolean programs is equal to pushdown automata (see also Section 3.2.1) and therefore can handle (possibly infinite) recursive calls.
BEBOP can be used within the abstraction refinement loop of SLAM, which is dis- cussed below. It allows specification in form of the assertion- and assume statements that are defined in Boolean programs, see Section 6.2 for moredetails.
18
BLAST
BLAST (Berkeley Lazy Abstraction Software Verification Tool) is an abstracting model checker for C [HJMS03]4. In contrast to SLAM or SATABS, which are pre- sented below, BLAST creates the abstractionon-the-fly. That is, it does not create an abstraction of the full model, but only of the parts needed for verification. Thus, if a fault is found early, no abstraction of the full program is required.
BLAST is able to check safety properties, which are stated byassert statements in the simplest case. For more involved properties, the user may add variables to store certain events that can be checked afterwards. That way, theuser can state all safety properties. (All properties that can be expressed by a finiteautomaton.) To write properties that can be used for different programs, and to avoid editing the original source, BLAST comes with a simple specification language that allows to state such properties. An dedicated tool is used to do the actual instrumentation before starting the checking process.
The control flow of the program is stored explicitly, while the data, which is repre- sented by abstract predicates, is stored in BDDs. The current version uses an explicit stack for calls and does not handle recursive functions.An extension allows to check for data races in concurrent threads [HJMQ03].
Java Path Finder
The first version Java Path Finder (JPF), initiated from a group around Havelund at NASA [HP00], extracting a Promela model from Java source code and subsequently checked it using SPIN. Version two of JPF [VHB+03] is a new implementation from Visser at al., again at NASA. JPF2 does not create a Promela model but directly works on Java byte code using a backtracking Java virtual machine.This saves the tedious task of modeling, but creates the need for state-compressing mechanisms to cope with the large states. Such a state of a Java program containsall the variable values, the contents of the heap, position in the program and other information necessary to proceed at a given time step. JPF directly checks all Java programs that do not rely on native code, which allows to check Java programs “out of the box”, as long as they do not use libraries that access hardware. In order to check programs that do use native functions, like I/O, those libraries have to be replaced by abstractions.
JPF check for deadlocks, unhandled exceptions and safety properties, which can be expressed by assert statements. More recently, a plug-ininterface was introduced to allow extensions, e.g., for race conditions. JPF was released as open source in 2005 and is hosted by sourceforge where it is actively maintained5.
4http://mtc.epfl.ch/software-tools/blast/ 5http://javapathfinder.sourceforge.net (November 2006)
SPIN
Holzmann’s SPIN model checker [Hol97] is one of the most popular model check- ing tools for software. The acronym SPIN stands forSimple PromelaInterpreter. Promelais the input language of SPIN, itself an acronym forProcessMeta-Language. Besides processes, Promela provides data objects, messagechannels and constructs usual to high level languages. The aim of SPIN is to check designs of distributed soft- ware, communication protocols and the like. It allows a number of powerful methods for specifications. The simplest case are assertions that can be stated within a pro- cess. For more complex specifications, SPIN provides the so called never claim, a mechanism to stateomega regularproperties. They are equivalent to Buchi au- tomata, and more powerful then specifications written in thepopular temporal logic LTL. (Refer to Section 3.1.2 for more details on automata, and Section 3.1.1 for LTL respectively.)
To describe a program in Promela , most control statements ofusual programming languages can be expressed directly with equivalent statements, whereas communi- cation between processes can be represented in a more abstract way using events sent through channels. Promela comes with a natural representation of processes but lacks a direct equivalent for functions, which have to bein-lined or emulated using processes and locking mechanisms to pause the caller while the callee is executing. The dynamic SPIN(dSPIN) model checker [DIS99] eases modeling programs by adding additional features like functions and dynamic allocation of memory while maintaining backward compatibility and a finite state space.
SPIN searches its state spaceon the fly, it builds its internal data structures during the search process. It furthermore uses some elaborate approaches for state space compression and partial order reduction (see also Section 2.5.1) to make checking real problems feasible.
2.6.2 Symbolic
As mentioned above, symbolic model checking tools use a symbolic representation for the transition relation. This especially is useful to represent models with a high amount of nondeterminism and typically allows models with alarger state space than possible for explicit state model checkers.
BOPPO
BoPPo is a symbolic model checker for Boolean programs that features asynchronous threads. It combines advantages of explicit and symbolic model checking by intro- ducing a new approach of POR for symbolic models [CKS05].
Although BOPPO uses Boolean programs as input, it does not allow recursion. It features specifications by assert- and assume statementsand can be used as the model checker within the SLAM framework described below.
20
CBMC
CBMC, the C Bounded Model Checker by Daniel Kroning, takes ANSI-C programs as input and creates a model suitable for a SAT solver [CKL04]. It does so by creating a “GOTO-program”, a finite state version of the original, possible infinite, program. Loops are unwound and functions are in-lined; control flow statements are replace by constructs withgoto statements preserving the original behavior. To determinethe number of copies of the loop- and function-bodies to be inserted, CBMC performs symbolic execution and tries to figure out an adequate bound.If this is not possible, an user defined limit is used and an “unwind-assertion” is added to be sure that the limit is not exceeded without notification. This GOTO-program then is encoded and sent to a SAT solver.
CBMC allows specification by assert- and assume statements and generates a num- ber of checks for typical faults in C programs automatically. Thus, properties like “out of bounds” of arrays or usage of invalid pointers can be checked without manual interference of the user.
MAGIC (Modular Analysis of programs in C) is a SAT-based symbolic model checker with emphasis on verification of modular systems6 [CCG+04]. A program is split to several modules. The module to check is given in C and is abstracted using the CEGAR approach. It is possible to define access to different modules by state ma- chines. This is useful reduce the state space, and furthermore allows to check single modules before the whole system is implemented. The combination of the abstracted module and its environment is then checked for conformance with the specification (again given as state machine). MAGIC supports functions byin-lining, creating a finite system.
MOPED
MOPED is a symbolic model checker for pushdown systems (see also Section 3.2.1), which are equivalent to (possibly recursive) programs on data with finite domains7. It supports both reachability and full LTL model checking and takes pushdown sys- tems, which are expressed in its own modeling language, or Boolean programs as input.
A number of interesting extensions and tools for MOPED exist. jMoped [SSE05] translates Java byte code to a pushdown system suitable for MOPED. Although a number of limitations exists (e.g., no negative integers, multi-threading, or libraries that include native code), the tool can be interesting for small programs or designs written in Java. A plugin for the Java IDE Eclipse exists to ease the usage.
Since mid 2005, there is a version 2 of MOPED that introduces anew modeling language Remopla, which is in the style of Promela and features explicit procedures. A further important feature of the new version is its implementation of abstraction
6http://www.cs.cmu.edu/ chaki/magic/ 7http://www.fmi.uni-stuttgart.de/szs/tools/moped/ (November 2006)
(CEGAR) using BDDs [EKS06]. MOPED2 currently only supportsreachability specifications.
2.6.3 Toolkits
In the following, we introduce a number of toolkits for modelchecking. They in- corporate tools for modeling, specifying, abstracting andchecking of software and provide a common user interface while the underlying dedicated tools can be varied.
Bandera
Bandera is a framework for model checking Java programs8. It consists of mod- ules for slicing, abstraction, different back-end model checkers and a GUI. Bandera works by compiling the program to the intermediate languageJIMPLE on which optimizations are performed. From this representation, a model in BIR (Bandera In- termediate Representation), and subsequently a model for the used model checker is built. If a counterexample is found, it is translated back toJava and presented in the GUI.
Bandera allows powerful specifications by using assert statements, preconditions and postconditions. Labels can be stated directly in the source code and are used in properties written in the temporal logic BSL (Bandera Specification Logic). The annotations are done in the Java sources using JavaDoc [CDH02]. BSL supports the Spec Pattern Project9, which defines patterns for commonly occurring requirements to simplify the creation of specifications for the user.
Bogor
Bogor is an extensible explicit-state model checking framework that uses a revised version of Bandera’s BIR (slightly renamed to Bogor Input Representation) as mod- eling language10. The project aims to build a modular infrastructure that (a)makes it easier for domain experts (programmers, software designers, . . . ) to apply model checking, and (b) to test new techniques without requiring awhole model checker to be written from scratch. To this end, Bogor comes with an Eclipse plugin to vi- sualize the model and counterexamples, and features a extension mechanism to add new modules. The modeling language directly features constructs of modern pro- gramming languages like unbound runtime allocation of memory and new threads and garbage collection.
Bogor can be customized to some application (e.g., BanderaVM for checking Java programs11. That is, a user can write a converter from any language of interest to BIR, and furthermore provide listeners that are called while traversal of the model.
8http://bandera.projects.cis.ksu.edu/ (November 2006) 9http://patterns.projects.cis.ksu.edu/ (November 2006)
10http://bogor.projects.cis.ksu.edu/ (November 2006) 11http://bogor.projects.cis.ksu.edu/content/view/88/54/ (November 2006)
2.6 Existing Tools
Thus, both the type of the model (finite, or pushdown), and thetype of specification (safety, liveness) depend on the actual implementation.
SATABS
SATABS is a toolkit to check ANSI-C programs and supports similar methods for specification like CBMC, but in contrary to that model checker, it does not directly transform the program to a satisfiability problem and send itto a SAT solver. Instead, it creates an abstracted version of the program and sends it to a model checker, the result is used to refine the abstraction until a valid counterexample is found, or the program is shown to satisfy the specification (CEGAR loop, see Section 3.2.2). SA- TABS supports various model checkers as back end, includingthe tools SPIN and BOPPO mentioned above.
SDV / SLAM
SLAM is a framework that checks C programs by abstracting them to a Boolean pro- gram, which consists of the same control statements, but does only contain Boolean variables [BR01b]. Abstraction is done using a CEGAR loop (see also Section 3.2.2) and a back-end model checker (BOPPO or BEBOP). SLAM uses “C2bp” [BMMR01] to create the abstract model “Newton” to check feasibility of counterexamples and extract additional predicates.
The Static Driver Verifier (SDV) [BBC+06] is a tool to systematically check the source code of Windows device drivers. For efficiency reasons, most device drivers run within the kernel. With the drawback that a poorly written driver can cause the Windows kernel (and with that the entire operating system) to crash. Therefore, a set of rules was created to define how drivers properly interact with the Windows operating system kernel [BCLR04].
SDV rules are encoded as state machines, the behavior of the Windows kernel is provided by an abstract model, theharness. To check that a driver complies with a rule, the rule is instrumented into the driver code and passed to the SLAM model checker. SDV was used to find a number of hard to find and reproduce faults within device drivers. It evolved to a stong tool that will be included in future versions of the Windows Driver Kit (WDK)12.
12http://www.microsoft.com/whdc/devtools/tools/sdv-case.mspx (January 2006)
24
3 Preliminaries
This chapter gives a short introduction of some prerequisites used in later chapters. First we will have a look at common methods to state properties, linear time temporal logic (LTL) and automata, which are commonly used in model checking. We will continue in Section 3.2 with techniques that are important in model checking of software to represent the models and reduce the state space.
3.1 Stating Properties
Section 2.3 gave a short introduction to properties withoutgiving a definition of how to formally state them. This section introduces LTL and automata on infinite words, two widely used methods to state properties for model checking.
3.1.1 Temporal Logic LTL
LTL is a temporal logic on infinite paths. It allows to specifytemporal connections between events that are stated as atomic propositions. LTL formulas are evaluated on an infinite sequence of discrete time steps and features the following basic temporal operators:
• Xφ, the next time operator:φ holds in the next time step
• φUψ, the until operator: There is a step in the path in thatψ holds. Before that,φ holds in every time step.
• φRψ, the release operator:ψ holds untillφ is true at the same time of forever.
We use abbreviations for two often used properties:
• Fφ = true Uφ, there is a time whenφ will be true (eventually)
• Gφ = false Rφ, φ has to be true now and forever (always)
Boolean operators are used to connect atomic propositions and LTL formulas in their usual meaning. To define the semantics of LTL formulas more formally, we look at an infinite pathπ = π0, π1, . . . over the alphabetΣ = 2AP , with AP the set of propositions. We sayp is true inπi if p ∈ πi, p is false otherwise. A suffix ofπ
25
3 Preliminaries
Table 3.1: LTL examples. The last state of the path in the top is repeated forever. We writeX at the intersection of a formula and a state of the path if the property holds at the corresponding state. For better readability, positions where the formula does not hold are left empty
q p q
p q q
p . . .
p X X X X X . . . X p X X X X X X . . . qU p X X X X X X X . . . pR q X X X X . . . G p X X . . . F q X X X X X X X X . . .
that starts atπi is denoted byπi, andπ |= φ states thatπ modelsφ i.e. π is a path that satisfies the LTL formulaφ. We can now give the semantics of the operators:
π |= true
π |= ¬φ iff π 6|= φ
π |= φ ∧ ψ iff π |= φ and π |= ψ
π |= φ ∨ ψ iff π |= φ or π |= ψ
π |= Xφ iff π1 |= φ
π |= φUψ iff ∃i ≥ 0 : πi |= φ and ∀j : 0 ≤ j < i : πi |= φ
π |= φRψ iff ∀i ≥ 0 : πi |= ψ, or ∃j ≥ 0 : πj |= φ ∧ ∀i : 0 ≤ i ≤ j, πi |= φ
This gives us the semantics for the abbreviations:
π |= Fφ iff ∃i : πi |= φ
π |= Gφ iff ∀i : πi |= φ
Example: Table 3.1 gives a number of examples. In the first line, we are given a sequence over the alphabet{p, q}. Because LTL works on infinite sequences, we assume that the last state,{p}, is repeated forever. We have a number of LTL formulas on the left hand side whose truth value on each pointof the sequence is stated on the point of intersection. The first two lines arequite obvious:p is always true when the current state contains the predicatep and X p is true whenever thenextstate contains the predicate. The formulaqU p results intrue
in states where sequences start that have the predicateq in the labelinguntil we have a state that is followed by a state withp, or p is in the label of the current state. That is the difference to the next example,pR q. Here we need a sequence
26
3.1 Stating Properties
with q in the label until there is ap in the same state. The last but one example, G p, extends tofalse R p, which means that there is nothing thatreleasesp, it has to be present forever. This is only true from the last state on. F q extends to trueU q and states that the formula istrue as long as thereeventuallywill be a state that containsq. Because there is aq in the last but one state, this is true in all states before that, but false afterwards because the path repeatsp forever without q beingtrue again.
3.1.2 Automata
Like Kripke structures, an automaton consists of states andnodes, but rather than modeling a system, it is used to process an input string. We define an automaton asA = S, δ, S0, AP, Sacc with S being the (finite) set of states.S0 ∈ S is the set of initial states, andSacc the set of acceptance states. The transition relation, with labels from a subset ofAP , is given byδ = S × 2AP × S. A run is defined as traversal through the automaton that starts at an initialstate inS0 and continues to next states as long as there is a transition that meets the next input symbol of the input string. More formally, we define a run for input string π and s0 ∈ S0
as: s0 π0−→ s1
π1−→ s2 . . . where∀i ≥ 0 : (si, πi, si+1) ∈ δ. If a run fulfills the acceptance condition, the input string is accepted. If a state does not have a successor state for a given input symbol, the respective run is non accepting (regardless of the acceptance condition).
The acceptance condition determines the kind of input string an automaton can process. Finite automata use a subset of the statesS asaccepting statesSacc and can process finite words. A run is accepting if it ends at an accepting state. To process infinite input sequences, we have to adjust the acceptance condition: There are a number of different notions [Tho90]. A widely used acceptance condition for infinite automata is the Buchi condition: a run is accepted if it visits a set of accepting states infinitely often.
We furthermore classify automata depending on the structure of the transition re- lation δ.
Deterministic: for each input string, there is exactly one run. In terms of the tran- sition relation:∀(s, π, s′) ∈ δ : (s, π, s′′) ∈ δ → s′ = s′′
Nondeterministic: for each input string, there may be several runs. I.e., thereare states on which a run can proceed to one out of several successor states for the same input symbol. An input string is accepted from the automaton, if there existsa run that accepts it.
In the following, we will use a three letter abbreviation forthe classification of automata where the first letter gives the structure of the transition relation, and the second letter the acceptance condition. We will useD for deterministic andN non- deterministic automata.F is used for finite automata, andB for Buchi acceptance condition. E.g., we write DFA for deterministic finite automata, and NBA for non- deterministic Buchi automata.
27
p
q
Figure 3.1: Buchi automaton for LTL formulapU q in a compact representation: the edge with labelp is an abbreviation for the edges labeled with{p} and {p, q}, likewise for labelq. Empty edges allows transitions for any input.
DFA and NFA are equivalent in their expressibility. That is,they can be used to describe the same languages, but differ on the size of the automaton. This is not true for infinite automata: DBA automata are strictly less expressible than NBA automata [Rog02]. The latter are capable of expressing all omega regular languages, while it is not possible to express ,e.g.,F G p as an NBA automaton. NBAs are strictly more expressive than LTL so it is possible to express any formula given in LTL as Buchi automaton (but not vice versa).
Example: The LTL formula pU q is expressed by the NBA in Figure 3.1. A transition is enabled if the label corresponds with input letter. So, if the automaton reads{p, q} in State 1 the next state is either State 1 or State 2. The shown automaton accepts all words fulfillingpU q by looping in State 1 until aq is read and continuing in the accepting State 2 (marked by double outline). If there is a input letter for which no output transition exists, the automaton rejects the input word (e.g.,pq in State 1).
3.2 Techniques for Software Model Checking
Model checking suffers from thestate explosion problem, the state space necessary for the model may grow exponentially with the size of the problem. For software model checking, this problem arises in particular because of the enormous ranges of used variables, flexible data structures and concurrent programs. This section gives some details on the problems typical for software model checking, and approaches to ease them.
3.2.1 Pushdown Systems
A function call in software systems is handled by storing thecurrent position and local variables on a stack, passing control to the called function and restoring the local variables after the callee has finished. The call stackis part of the state of the model to allow an exhaustive check of the system. A function can call itself recursively, either directly or indirectly through another function. If the recursion depth is not bound, this leads to an infinite state space that cannot be represented in
28
3.2 Techniques for Software Model Checking
the models we have seen so far. Therefore, we introduce pushdown systems, which allow to model such systems.
Instead of enumerating all possible states, a pushdown system distinguishes con- trol locations and a stack. The transition relation is givenby rules that operate on the control locations and the topmost stack element. Both the number of control loca- tions and the stack alphabet are finite. Therefore, althoughthe size of the stack and thus the number of states is unbound, we have a finite representation [BEM97].
Definition 3.1 (Pushdown System). A Pushdown SystemP is a tupleP,Γ,, c0, withP the finite set of control locations andΓ the stack alphabet. A configuration is a tuplep, γ with p ∈ P andγ ∈ Γ∗, c0 is the initial configuration. The set of tran- sition rules is given as ⊆ (P ×Γ)× (P ×Γ∗). The infinite set of transitions is de- fined by transition rules((q, γ), (q′, ω1 . . . ωn)) ∈ as((q, γω′), (q′, ω1 . . . ωnω
′)) for anyω′ ∈ Γ∗.
Thus, we see that a transition rule inΓ defines those transitions from one control location to another that can be performed by replacing the topmost (leftmost) stack element by an arbitrary sequence of stack elements. The remainder of the stack is not accessible and stays untouched.
Pushdown automata are equivalent to Boolean programs that are described in more detail in Section 6.2. In a rule of the form((q, γω′), (q′, ω1 . . . ωnω
′)), we identifyq andq′ with statements,γ andω1 . . . ωn correspond to valuations of variables andω′
to the call stack. We intuitively see that this allows us to model transitions of state- ments as well as function calls including recursion. This allows a finite representation even for infinite recursion because we only access the topmost entry of the stack and can represent the rest in form of a finite automaton. Lately, approaches have been published that do not only allow model checking of LTL formulas over propositions that depend on the visible variables, but even allow a limited set of properties that depend on the stack [EKS03].
3.2.2 Counterexample Guided Abstraction Refinement (CEGAR )
Software usually contains statements that are of no interest for the validation of a property, e.g., print-statements that do not alter any variables, or statements on vari- ables that do not interfere with the property at hand. Slicing [Wei84] is a simple method to remove code that is not involved in the computations of variables used for the specification, but a program slice still may contain moreinformation than neces- sary. If, for instance, a property simply states that two variables should be equal, the actual value does not matter.
CEGAR is an automated method to iteratively build an abstraction of a program that is detailed enough to either find a counterexample for a property, or prove that the program is correct (Figure 3.2). The abstraction phase uses the original program and a number of predicates to create an abstraction, a program in terms of the predi- cates that reflects the original program. Here, a valuation of the predicates represents anabstract statethat corresponds to a set of concrete states in the original program.
29
Figure 3.2: CEGAR - loop
As the predicates may depend on each other, not all combinations of predicates cor- respond to concrete state. E.g., if we havep1 : x == 0 andp2 : x > 2, there is no concrete state that corresponds to the abstract statep1 = true, p2 = true. A dependency predicateallows us to consider only non-spurious abstract states. (Cf. enforcefor Boolean programs in Section 6.2.)
To compute the abstraction, the effects of the statement on the predicates have to be considered. To this end, the statements of the original program are considered as “predicate transformers”, and the new values of the predicates (with respect to the current abstract state) are computed using a theorem prover[GS97]. The abstraction is conservative, which means that all behavior from the original program is also pos- sible in the abstract program. The converse does not hold. Therefore, the checking phase may return a counterexample in the abstract program that is not possible in the original program (if no counterexample is found in the abstract model, then the concrete model complies with the specification too). The feasibility check ensures that only counterexamples are reported that are possible inthe concrete program. To this end, the trace is simulated in the concrete program. If acontradiction is found, the feasibility checker returns new predicates that demonstrate why the path is not possible in the concrete program. A new abstraction is performed that considers the new predicates. If the feasibility checker finds a way to follow the abstract coun- terexample in the concrete program, a counterexample for the concrete program is found and reported.
In the abstract programA, the values of the predicates are set according to the values of the variables at the corresponding position inP. Because there is less in- formation in the abstract system, it is not always possible to determine the truth value of a predicate. In such cases, nondeterminism is introduced. We use the expression choose(a, b) to express this nondeterminism. Ifa is true, it evaluates to true. Otherwise it evaluates to false ifb is true, or nondeterministically otherwise (see also
30
Listing 3.1: simple C program
1 x = y+1; 2 x++; 3 if (x==y) 4 skip ;
Listing 3.2: abstract program
1 p = false; 2 p = choose(false , p ); 3 if (p) 4 skip ;
semantics of Boolean programs in Section 6.2). For instance, if we have a predicatep = {x == y} andP is of the form of
Listing 3.1, the corresponding abstract programA is as given in Listing 3.2. We use choose(false, p) because if p is false, we havex 6= y and it might be that x = y is true afterx was incremented, so p is set nondeterministically. Otherwise, if p is true, thenx = y and an increment of x will make them different. Thus, th