Reverse Compilation Techniques

by

Cristina Cifuentes


Submitted to the School of Computing Science
in partial fulfilment of the requirements for the degree of

Doctor of Philosophy

at the

QUEENSLAND UNIVERSITY OF TECHNOLOGY

July 1994

© Cristina Cifuentes, 1994

The author hereby grants to QUT permission to reproduce and
to distribute copies of this thesis document in whole or in part.
Statement of Original Authorship

The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signed

Date
QUEENSLAND UNIVERSITY OF TECHNOLOGY
DOCTOR OF PHILOSOPHY THESIS EXAMINATION

CANDIDATE NAME Cristina Nicole Cifuentes
CENTRE/RESEARCH CONCENTRATION Programming Languages and Systems
PRINCIPAL SUPERVISOR Professor K J Gough
ASSOCIATE SUPERVISOR Professor W J Caelli
THESIS TITLE Reverse Compilation Techniques

Under the requirements of PhD regulation 9.2, the above candidate was examined orally by the Faculty. The members of the panel set up for this examination recommend that the thesis be accepted by the University and forwarded to the appointed Committee for examination.

Name........................................ Signature........................................
Panel Chairperson (Principal Supervisor)

Name........................................ Signature........................................
Panel Member

Name........................................ Signature........................................
Panel Member

**********

Under the requirements of PhD regulation 9.15, it is hereby certified that the thesis of the above-named candidate has been examined. I recommend on behalf of the Examination Committee that the thesis be accepted in fulfilment of the conditions for the award of the degree of Doctor of Philosophy.

Name........................................ Signature........................................
Examination Committee Chairperson
Date........................................
Reverse Compilation Techniques
by
Cristina Cifuentes

Abstract
Techniques for writing reverse compilers, or decompilers, are presented in this thesis. These techniques are based on compiler and optimization theory and are applied to decompilation in a novel way; they have not been published before.

A decompiler is composed of several phases, which are grouped into modules according to their dependence on language or machine features. The front-end is a machine-dependent module that parses the binary program, analyzes the semantics of its instructions, and generates a low-level intermediate representation of the program, as well as a control flow graph of each subroutine. The universal decompiling machine is a language- and machine-independent module with two tasks: it analyzes the low-level intermediate code and transforms it into a high-level representation available in any high-level language, and it analyzes the structure of the control flow graph(s) and transforms them into graphs that make use of high-level control structures. Finally, the back-end is a target-language-dependent module that generates code for the target language.

Decompilation is a process that involves the use of tools to load the binary program into memory, parse or disassemble it, and decompile or analyze it to generate a high-level language program. This process benefits from compiler and library signatures, which are used to recognize particular compilers and library subroutines. Whenever a compiler signature is recognized in the binary program, neither the compiler start-up routines nor the library subroutines are decompiled: the start-up routines are eliminated from the final target program and the entry point to the main program is used for the decompiler analysis, while the library subroutines are replaced by their library names.

The presented techniques were implemented in dcc, a prototype decompiler for the Intel i80286 architecture running under the DOS operating system, which produces target C programs from source .exe or .com files. Sample decompiled programs, comparisons against the initial high-level language programs, and an analysis of the results are presented in Chapter 9.

Chapter 1 gives an introduction to decompilation from a compiler point of view; Chapter 2 gives an overview of the history of decompilation since its appearance in the early 1960s; Chapter 3 presents the relations between the static binary code of the source binary program and the actions performed at run-time to implement the program; Chapter 4 describes the phases of the front-end module; Chapter 5 defines data optimization techniques to analyze the intermediate code and transform it into a higher-level representation; Chapter 6 defines control structure transformation techniques to analyze the structure of the control flow graph and transform it into a graph of high-level control structures; Chapter 7 describes the back-end module; Chapter 8 presents the decompilation tools; Chapter 9 gives an overview of the implementation of dcc and the results obtained; and Chapter 10 presents the conclusions of this research and future work.

Parts of this thesis have been published in, or submitted to, international conferences and journals. Two papers were presented at the XIX Conferencia Latinoamericana de Informática in 1993: “A Methodology for Decompilation”[CG93] and “A Structuring Algorithm for Decompilation”[Cif93]. The former paper presented the phases of the decompiler as described in Chapter 1, Section 1.3, the front-end (Chapter 4), initial work on the control flow analysis phase (Chapter 6), and comments on the work done with dcc. The latter paper presented the structuring algorithms used in the control flow analysis phase (Chapter 6). One journal paper, “Decompilation of Binary Programs”[CG94], has been accepted for publication by Software – Practice & Experience; this paper gives an overview of the techniques used to build a decompiler (summaries of Chapters 4, 5, 6, and 7), explains how a signature generator tool can help in the decompilation process (Chapter 8, Section 8.2), and presents a sample program decompiled by dcc (Chapter 9). Two papers are currently under consideration for publication in international journals. “Interprocedural Data Flow Decompilation”[Cif94a] was submitted to the Journal of Programming Languages and describes in full the optimizations performed by the data flow analyzer to transform the low-level intermediate code into a high-level representation. “Structuring Decompiled Graphs”[Cif94b] was submitted to The Computer Journal and gives the final, improved method of structuring control flow graphs (Chapter 6), and a sample program decompiled by dcc (Chapter 9).

The techniques presented in this thesis expand on earlier work described in the literature. Previous work in decompilation did not document the interprocedural register analysis required to determine register arguments and register return values, the analysis required to eliminate stack-related instructions (i.e. push and pop), or the structuring of a generic set of control structures. Innovative work done for this research is described in Chapters 5, 6, and 8. Chapter 5, Sections 5.2 and 5.4, illustrate and describe nine different types of optimizations that transform the low-level intermediate code into a high-level representation. These optimizations take into account condition codes, subroutine calls (i.e. interprocedural analysis) and register spilling; they eliminate all low-level features of the intermediate instructions (such as condition codes and registers) and introduce the high-level concept of expressions into the intermediate representation. Chapter 6, Sections 6.2 and 6.6, illustrate and describe algorithms to structure different types of loops and conditionals, including multi-way branch conditionals (e.g. case statements). Previous work in this area has concentrated on the structuring of loops; few papers attempt to structure 2-way conditional branches, and no work on multi-way conditional branches is described in the literature. This thesis presents a complete method for structuring all types of structures based on a predetermined, generic set of high-level control structures. A criterion for determining the generic set of control structures is given in Chapter 6, Section 6.4. Chapter 8 describes all the tools used to decompile programs; the most important of these is the signature generator (Section 8.2), which is used to determine compiler and library signatures on architectures whose operating system does not share libraries, such as DOS.
Acknowledgments

The feasibility of writing a decompiler for a contemporary machine architecture was raised by Professors John Gough and Bill Caelli in the early 1990s. Since this problem appeared to provide a challenge in the areas of graph and data flow theory, I decided to pursue a PhD with the aim of determining techniques for the reverse compilation of binary programs. This thesis is the answer to the many questions asked about how to do it; and yes, it is feasible to write a decompiler.

I would like to acknowledge the time and resources provided by a number of people in the computing community. Professor John Gough provided many discussions on data flow analysis, and commented on each draft chapter of this thesis. Sylvia Willie lent me a PC and an office in her lab in the initial stages of this degree. Pete French provided me with an account on a Vax BSD 4.2 machine in England to test a Vax decompiler available on the network. Jeff Ledermann rewrote the disassembler. Michael Van Emmerik wrote the library signature generator program, generated compiler and library signatures for several PC compilers, ported dcc to the DOS environment, and wrote the interactive user interface for dcc. Jinli Cao translated a Chinese article on decompilation into English while studying at QUT. Geoff Olney proof-read each chapter, pointed out inconsistencies, and suggested the layout of the thesis. I was supported by an Australian Postgraduate Research Award (APRA) scholarship for the duration of this degree.

Jeff Ledermann and Michael Van Emmerik were employed under Australian Research Council ARC grant No. A49130261.

This thesis was written with the LaTeX document preparation system. All figures were produced with the xfig facility for interactive generation of figures under X11.

Cristina Cifuentes
June 1994

The author acknowledges that any Trade Marks, Registered Names, or Proprietary Terms used in this thesis are the legal property of their respective owners.
# Contents

1 Introduction to Decompiling

1.1 Decompilers ................................................. 1
1.2 Problems ................................................... 1
   1.2.1 Recursive Undecidability .............................. 2
   1.2.2 The von Neumann Architecture ....................... 3
   1.2.3 Self-modifying code .................................. 3
   1.2.4 Idioms ................................................ 3
   1.2.5 Virus and Trojan “tricks” ............................ 4
   1.2.6 Architecture-dependent Restrictions .................. 6
   1.2.7 Subroutines included by the compiler and linker ...... 6
1.3 The Phases of a Decompiler ................................ 7
   1.3.1 Syntax Analysis ....................................... 8
   1.3.2 Semantic Analysis ................................. 9
   1.3.3 Intermediate Code Generation ......................... 10
   1.3.4 Control Flow Graph Generation .................... 10
   1.3.5 Data Flow Analysis ................................. 10
   1.3.6 Control Flow Analysis ............................... 11
   1.3.7 Code Generation ..................................... 11
1.4 The Grouping of Phases ..................................... 12
1.5 The Context of a Decompiler ................................ 13
1.6 Uses of Decompilation ..................................... 15
   1.6.1 Legal Aspects ........................................ 15

2 Decompilation – What has been done? ........................ 17
2.1 Previous Work ............................................. 17

3 Run-time Environment ......................................... 31
3.1 Storage Organization ....................................... 31
   3.1.1 The Stack Frame ................................... 33
3.2 Data Types ............................................... 33
   3.2.1 Data Handling in High-level Languages ............... 34
3.3 High-Level Language Interface ............................ 36
   3.3.1 The Stack Frame ................................... 36
   3.3.2 Parameter Passing ................................... 38
3.4 Symbol Table ................................................ 39
   3.4.1 Data Structures ..................................... 40
4 The Front-end

4.1 Syntax Analysis
4.1.1 Finite State Automaton
4.1.2 Finite State Automatons and Parsers
4.1.3 Separation of Code and Data

4.2 Semantic Analysis
4.2.1 Idioms
4.2.2 Simple Type Propagation

4.3 Intermediate Code Generation
4.3.1 Low-level Intermediate Code
4.3.2 High-level Intermediate Code

4.4 Control Flow Graph Generation
4.4.1 Basic Concepts
4.4.2 Basic Blocks
4.4.3 Control Flow Graphs

5 Data Flow Analysis

5.1 Previous Work
5.1.1 Elimination of Condition Codes
5.1.2 Elimination of Redundant Loads and Stores

5.2 Types of Optimizations
5.2.1 Dead-Register Elimination
5.2.2 Dead-Condition Code Elimination
5.2.3 Condition Code Propagation
5.2.4 Register Arguments
5.2.5 Function Return Register(s)
5.2.6 Register Copy Propagation
5.2.7 Actual Parameters
5.2.8 Data Type Propagation Across Procedure Calls
5.2.9 Register Variable Elimination

5.3 Global Data Flow Analysis
5.3.1 Data Flow Analysis Definitions
5.3.2 Taxonomy of Data Flow Problems
5.3.3 Solving Data Flow Equations

5.4 Code-improving Optimizations
5.4.1 Dead-Register Elimination
5.4.2 Dead-Condition Code Elimination
5.4.3 Condition Code Propagation
5.4.4 Register Arguments
5.4.5 Function Return Register(s)
5.4.6 Register Copy Propagation
5.4.7 Actual Parameters
5.4.8 Data Type Propagation Across Procedure Calls
5.4.9 Register Variable Elimination
5.4.10 An Extended Register Copy Propagation Algorithm

5.5 Further Data Type Propagation
6 Control Flow Analysis

6.1 Previous Work .... 123
6.1.1 Introduction of Boolean Variables .... 123
6.1.2 Code Replication .... 124
6.1.3 Multilevel Exit Loops and Other Structures .... 125
6.1.4 Graph Transformation System .... 126

6.2 Graph Structuring .... 126
6.2.1 Structuring Loops .... 127
6.2.2 Structuring Conditionals .... 128

6.3 Control Flow Analysis .... 130
6.3.1 Control Flow Analysis Definitions .... 130
6.3.2 Relations .... 131
6.3.3 Interval Theory .... 131
6.3.4 Irreducible Flow Graphs .... 135

6.4 High-Level Language Control Structures .... 135
6.4.1 Control Structures - Classification .... 135
6.4.2 Control Structures in 3rd Generation Languages .... 138
6.4.3 Generic Set of Control Structures .... 140

6.5 Structured and Unstructured Graphs .... 141
6.5.1 Loops .... 141
6.5.2 Conditionals .... 142
6.5.3 Structured Graphs and Reducibility .... 144

6.6 Structuring Algorithms .... 144
6.6.1 Structuring Loops .... 145
6.6.2 Structuring 2-way Conditionals .... 151
6.6.3 Structuring n-way Conditionals .... 156
6.6.4 Application Order .... 157

7 The Back-end .... 163
7.1 Code Generation .... 163
7.1.1 Generating Code for a Basic Block .... 163
7.1.2 Generating Code from Control Flow Graphs .... 167
7.1.3 The Case of Irreducible Graphs .... 178

8 Decompilation Tools .... 181
8.1 The Loader .... 182
8.2 Signature Generator .... 183
8.2.1 Library Subroutine Signatures .... 184
8.2.2 Compiler Signature .... 186
8.2.3 Manual Generation of Signatures .... 187
8.3 Library Prototype Generator .... 188
8.4 Disassembler .... 189
8.5 Language Independent Bindings .... 190
8.6 Postprocessor .... 191
9 dcc
  9.1 The Loader .................................................. 197
  9.2 Compiler and Library Signatures .............................. 198
    9.2.1 Library Prototypes ..................................... 199
  9.3 The Front-end .................................................. 199
    9.3.1 The Parser ............................................... 199
    9.3.2 The Intermediate Code .................................... 202
    9.3.3 The Control Flow Graph Generator ......................... 204
    9.3.4 The Semantic Analyzer .................................... 204
  9.4 The Disassembler .................................................. 210
  9.5 The Universal Decompiling Machine ............................ 210
    9.5.1 Data Flow Analysis ....................................... 210
    9.5.2 Control Flow Analysis ..................................... 212
  9.6 The Back-end .................................................... 212
    9.6.1 Code Generation .......................................... 213
  9.7 Results .......................................................... 214
    9.7.1 Intops.exe ............................................... 214
    9.7.2 Byteops.exe .............................................. 219
    9.7.3 Longops.exe ............................................... 224
    9.7.4 Benchsho.exe .............................................. 235
    9.7.5 Benchlng.exe .............................................. 242
    9.7.6 Benchmul.exe .............................................. 251
    9.7.7 Benchfn.exe .............................................. 256
    9.7.8 Fibo.exe .................................................. 263
    9.7.9 Crc.exe .................................................. 268
    9.7.10 Matrixmu .................................................. 279
    9.7.11 Overall Results ........................................... 284

10 Conclusions ....................................................... 285

A i8086 – i80286 Architecture ........................................ 289
  A.1 Instruction Format ............................................. 290
  A.2 Instruction Set ................................................ 292

B Program Segment Prefix ............................................. 303

C Executable File Format .............................................. 305
  C.1 .exe Files ..................................................... 305
  C.2 .com Files ..................................................... 306

D Low-level to High-level Icode Mapping ............................ 307

E Comments and Error Messages displayed by dcc ..................... 311

F DOS Interrupts ..................................................... 313

Bibliography .......................................................... 317
List of Figures

<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>1-1</td>
<td>A Decompiler</td>
<td>1</td>
</tr>
<tr>
<td>1-2</td>
<td>Turing Machine Representation</td>
<td>2</td>
</tr>
<tr>
<td>1-3</td>
<td>Sample Self-modifying Code</td>
<td>3</td>
</tr>
<tr>
<td>1-4</td>
<td>Sample Idioms</td>
<td>4</td>
</tr>
<tr>
<td>1-5</td>
<td>Modify the Return Address</td>
<td>4</td>
</tr>
<tr>
<td>1-6</td>
<td>Self-modifying Code Virus</td>
<td>5</td>
</tr>
<tr>
<td>1-7</td>
<td>Self-encrypting Virus</td>
<td>5</td>
</tr>
<tr>
<td>1-8</td>
<td>Self-generating Virus</td>
<td>6</td>
</tr>
<tr>
<td>1-9</td>
<td>Architecture-dependent Problem</td>
<td>7</td>
</tr>
<tr>
<td>1-10</td>
<td>Phases of a Decompiler</td>
<td>8</td>
</tr>
<tr>
<td>1-11</td>
<td>Parse Tree for cx := cx - 50</td>
<td>8</td>
</tr>
<tr>
<td>1-12</td>
<td>Generic Constructs</td>
<td>11</td>
</tr>
<tr>
<td>1-13</td>
<td>Decompiler Modules</td>
<td>12</td>
</tr>
<tr>
<td>1-14</td>
<td>A Decompilation System</td>
<td>13</td>
</tr>
<tr>
<td>3-1</td>
<td>General Format of a Binary Program</td>
<td>31</td>
</tr>
<tr>
<td>3-2</td>
<td>Skeleton Code for a “hello world” Program</td>
<td>32</td>
</tr>
<tr>
<td>3-3</td>
<td>The Stack Frame</td>
<td>33</td>
</tr>
<tr>
<td>3-4</td>
<td>The Stack Frame</td>
<td>34</td>
</tr>
<tr>
<td>3-5</td>
<td>Size of Different Data Types in the i80286</td>
<td>34</td>
</tr>
<tr>
<td>3-6</td>
<td>Register Conventions for Return Values</td>
<td>37</td>
</tr>
<tr>
<td>3-7</td>
<td>Return Value Convention</td>
<td>38</td>
</tr>
<tr>
<td>3-8</td>
<td>Register Parameter Passing Convention</td>
<td>39</td>
</tr>
<tr>
<td>3-9</td>
<td>Unordered List Representation</td>
<td>40</td>
</tr>
<tr>
<td>3-10</td>
<td>Ordered List Representation</td>
<td>41</td>
</tr>
<tr>
<td>3-11</td>
<td>Hash Table Representation</td>
<td>41</td>
</tr>
<tr>
<td>3-12</td>
<td>Symbol Table Representation</td>
<td>42</td>
</tr>
<tr>
<td>4-1</td>
<td>Phases of the Front-end</td>
<td>43</td>
</tr>
<tr>
<td>4-2</td>
<td>Interaction between the Parser and Semantic Analyzer</td>
<td>44</td>
</tr>
<tr>
<td>4-3</td>
<td>Components of a FSA Transition Diagram</td>
<td>45</td>
</tr>
<tr>
<td>4-4</td>
<td>FSA Example</td>
<td>46</td>
</tr>
<tr>
<td>4-5</td>
<td>Sample Code for a “hello world” Program</td>
<td>47</td>
</tr>
<tr>
<td>4-6</td>
<td>Counter-example</td>
<td>48</td>
</tr>
<tr>
<td>4-7</td>
<td>Initial Parser Algorithm</td>
<td>51</td>
</tr>
<tr>
<td>4-8</td>
<td>Final Parser Algorithm</td>
<td>57</td>
</tr>
<tr>
<td>4-9</td>
<td>Interaction of the Semantic Analyzer</td>
<td>58</td>
</tr>
<tr>
<td>4-10</td>
<td>High-level Subroutine Prologue</td>
<td>58</td>
</tr>
<tr>
<td>4-11</td>
<td>Register Variables</td>
<td>58</td>
</tr>
<tr>
<td>4-12</td>
<td>Subroutine Trailer Code</td>
<td>59</td>
</tr>
<tr>
<td>4-13</td>
<td>C Calling Convention - Uses pop</td>
<td>59</td>
</tr>
<tr>
<td>4-14</td>
<td>C Calling Convention - Uses add</td>
<td>60</td>
</tr>
<tr>
<td>4-15</td>
<td>Pascal Calling Convention</td>
<td>60</td>
</tr>
<tr>
<td>4-16</td>
<td>Long Addition</td>
<td>60</td>
</tr>
<tr>
<td>4-17</td>
<td>Long Subtraction</td>
<td>61</td>
</tr>
<tr>
<td>4-18</td>
<td>Long Negation</td>
<td>61</td>
</tr>
<tr>
<td>4-19</td>
<td>Shift Long Variable Left by 1</td>
<td>62</td>
</tr>
<tr>
<td>4-20</td>
<td>Shift Signed Long Variable Right by 1</td>
<td>62</td>
</tr>
<tr>
<td>4-21</td>
<td>Shift Unsigned Long Variable Right by 1</td>
<td>62</td>
</tr>
<tr>
<td>4-22</td>
<td>Assign Zero</td>
<td>63</td>
</tr>
<tr>
<td>4-23</td>
<td>Shift Left by n</td>
<td>63</td>
</tr>
<tr>
<td>4-24</td>
<td>Bitwise Negation</td>
<td>63</td>
</tr>
<tr>
<td>4-25</td>
<td>Sign Determination According to Conditional Jump</td>
<td>65</td>
</tr>
<tr>
<td>4-26</td>
<td>Long Conditional Graphs</td>
<td>67</td>
</tr>
<tr>
<td>4-27</td>
<td>Long Equality Boolean Conditional Code</td>
<td>67</td>
</tr>
<tr>
<td>4-28</td>
<td>Long Non-Equality Boolean Conditional Code</td>
<td>68</td>
</tr>
<tr>
<td>4-29</td>
<td>Interaction of the Intermediate Code Generator</td>
<td>68</td>
</tr>
<tr>
<td>4-30</td>
<td>Low-level Intermediate Instructions - Example</td>
<td>69</td>
</tr>
<tr>
<td>4-31</td>
<td>General Representation of a Quadruple</td>
<td>69</td>
</tr>
<tr>
<td>4-32</td>
<td>General Representation of a Triplet</td>
<td>71</td>
</tr>
<tr>
<td>4-33</td>
<td>Interaction of the Control Flow Graph Generator</td>
<td>73</td>
</tr>
<tr>
<td>4-34</td>
<td>Sample Directed, Connected Graph</td>
<td>74</td>
</tr>
<tr>
<td>4-35</td>
<td>Node Representation of Different Types of Basic Blocks</td>
<td>77</td>
</tr>
<tr>
<td>4-36</td>
<td>Control Flow Graph for Example 10</td>
<td>79</td>
</tr>
<tr>
<td>4-37</td>
<td>Basic Block Definition in C</td>
<td>79</td>
</tr>
<tr>
<td>5-1</td>
<td>Context of the Data Flow Analysis Phase</td>
<td>83</td>
</tr>
<tr>
<td>5-2</td>
<td>Sample Flow Graph</td>
<td>86</td>
</tr>
<tr>
<td>5-3</td>
<td>Flow Graph After Code Optimization</td>
<td>93</td>
</tr>
<tr>
<td>5-4</td>
<td>Data Flow Analysis Equations</td>
<td>95</td>
</tr>
<tr>
<td>5-5</td>
<td>Data Flow Problems - Summary</td>
<td>97</td>
</tr>
<tr>
<td>5-6</td>
<td>Live Register Example Graph</td>
<td>99</td>
</tr>
<tr>
<td>5-7</td>
<td>Flow Graph Before Optimization</td>
<td>103</td>
</tr>
<tr>
<td>5-8</td>
<td>Dead Register Elimination Algorithm</td>
<td>104</td>
</tr>
<tr>
<td>5-9</td>
<td>Update of du-chains</td>
<td>105</td>
</tr>
<tr>
<td>5-10</td>
<td>Dead Condition Code Elimination Algorithm</td>
<td>106</td>
</tr>
<tr>
<td>5-11</td>
<td>Condition Code Propagation Algorithm</td>
<td>108</td>
</tr>
<tr>
<td>5-12</td>
<td>BNF for Conditional Expressions</td>
<td>108</td>
</tr>
<tr>
<td>5-13</td>
<td>Register Argument Algorithm</td>
<td>110</td>
</tr>
<tr>
<td>5-14</td>
<td>Function Return Register(s)</td>
<td>112</td>
</tr>
<tr>
<td>5-15</td>
<td>Register Copy Propagation Algorithm</td>
<td>115</td>
</tr>
<tr>
<td>5-16</td>
<td>Expression Stack</td>
<td>116</td>
</tr>
<tr>
<td>5-17</td>
<td>Potential High-Level Instructions that Define and Use Registers</td>
<td>118</td>
</tr>
<tr>
<td>5-18</td>
<td>Extended Register Copy Propagation Algorithm</td>
<td>121</td>
</tr>
<tr>
<td>5-19</td>
<td>Matrix Addition Subroutine</td>
<td>122</td>
</tr>
<tr>
<td>6-1</td>
<td>Context of the Control Flow Analysis Phase</td>
<td>123</td>
</tr>
<tr>
<td>6-2</td>
<td>Sample Control Flow Graph</td>
<td>127</td>
</tr>
<tr>
<td>6-3</td>
<td>Post-tested Loop</td>
<td>128</td>
</tr>
<tr>
<td>6-4</td>
<td>Pre-tested Loop</td>
<td>128</td>
</tr>
<tr>
<td>6-5</td>
<td>2-way Conditional Branching</td>
<td>129</td>
</tr>
<tr>
<td>6-6</td>
<td>Single Branch Conditional</td>
<td>129</td>
</tr>
<tr>
<td>6-7</td>
<td>Compound Conditional Branch</td>
<td>130</td>
</tr>
<tr>
<td>6-8</td>
<td>Interval Algorithm</td>
<td>132</td>
</tr>
<tr>
<td>6-9</td>
<td>Intervals of a Graph</td>
<td>133</td>
</tr>
<tr>
<td>6-10</td>
<td>Derived Sequence Algorithm</td>
<td>134</td>
</tr>
<tr>
<td>6-11</td>
<td>Derived Sequence of a Graph</td>
<td>134</td>
</tr>
<tr>
<td>6-12</td>
<td>Canonical Irreducible Graph</td>
<td>135</td>
</tr>
<tr>
<td>6-13</td>
<td>High-level Control Structures</td>
<td>136</td>
</tr>
<tr>
<td>6-14</td>
<td>Control Structures Classes Hierarchy</td>
<td>138</td>
</tr>
<tr>
<td>6-15</td>
<td>Classes of Control Structures in High-Level Languages</td>
<td>140</td>
</tr>
<tr>
<td>6-16</td>
<td>Structured Loops</td>
<td>141</td>
</tr>
<tr>
<td>6-17</td>
<td>Sample Unstructured Loops</td>
<td>142</td>
</tr>
<tr>
<td>6-18</td>
<td>Structured 2-way Conditionals</td>
<td>143</td>
</tr>
<tr>
<td>6-19</td>
<td>Structured 4-way Conditional</td>
<td>143</td>
</tr>
<tr>
<td>6-20</td>
<td>Abnormal Selection Path</td>
<td>143</td>
</tr>
<tr>
<td>6-21</td>
<td>Unstructured 3-way Conditionals</td>
<td>144</td>
</tr>
<tr>
<td>6-22</td>
<td>Graph Grammar for the Class of Structures DRECn</td>
<td>145</td>
</tr>
<tr>
<td>6-23</td>
<td>Intervals of the Control Flow Graph of Figure 6-2</td>
<td>146</td>
</tr>
<tr>
<td>6-24</td>
<td>Derived Sequence of Graphs $G_2 \ldots G_4$</td>
<td>147</td>
</tr>
<tr>
<td>6-25</td>
<td>Loop Structuring Algorithm</td>
<td>148</td>
</tr>
<tr>
<td>6-26</td>
<td>Multiexit Loops - 4 Cases</td>
<td>149</td>
</tr>
<tr>
<td>6-27</td>
<td>Algorithm to Mark all Nodes that belong to a Loop induced by $(y, x)$</td>
<td>149</td>
</tr>
<tr>
<td>6-28</td>
<td>Algorithm to Determine the Type of Loop</td>
<td>151</td>
</tr>
<tr>
<td>6-29</td>
<td>Algorithm to Determine the Follow of a Loop</td>
<td>152</td>
</tr>
<tr>
<td>6-30</td>
<td>Control Flow Graph with Immediate Dominator Information</td>
<td>153</td>
</tr>
<tr>
<td>6-31</td>
<td>2-way Conditional Structuring Algorithm</td>
<td>154</td>
</tr>
<tr>
<td>6-32</td>
<td>Compound Conditional Graphs</td>
<td>155</td>
</tr>
<tr>
<td>6-33</td>
<td>Subgraph of Figure 6-2 with Intermediate Instruction Information</td>
<td>156</td>
</tr>
<tr>
<td>6-34</td>
<td>Compound Condition Structuring Algorithm</td>
<td>157</td>
</tr>
<tr>
<td>6-35</td>
<td>Unstructured n-way Subgraph with Abnormal Exit</td>
<td>158</td>
</tr>
<tr>
<td>6-36</td>
<td>Unstructured n-way Subgraph with Abnormal Entry</td>
<td>158</td>
</tr>
<tr>
<td>6-37</td>
<td>n-way Conditional Structuring Algorithm</td>
<td>159</td>
</tr>
<tr>
<td>6-38</td>
<td>Unstructured Graph</td>
<td>159</td>
</tr>
<tr>
<td>6-39</td>
<td>Multientry Loops - 4 Cases</td>
<td>160</td>
</tr>
<tr>
<td>6-40</td>
<td>Canonical Irreducible Graph with Immediate Dominator Information</td>
<td>161</td>
</tr>
<tr>
<td>7-1</td>
<td>Relation of the Code Generator with the UDM</td>
<td>163</td>
</tr>
<tr>
<td>7-2</td>
<td>Sample Control Flow Graph After Data and Control Flow Analyses</td>
<td>164</td>
</tr>
<tr>
<td>7-3</td>
<td>Abstract Syntax Tree for First Instruction of B9</td>
<td>165</td>
</tr>
<tr>
<td>7-4</td>
<td>Algorithm to Generate Code from an Expression Tree</td>
<td>166</td>
</tr>
<tr>
<td>7-5</td>
<td>Algorithm to Generate Code from a Basic Block</td>
<td>167</td>
</tr>
<tr>
<td>7-6</td>
<td>Control Flow Graph with Structuring Information</td>
<td>168</td>
</tr>
<tr>
<td>7-7</td>
<td>Algorithm to Generate Code for a Loop Header Rooted Graph</td>
<td>171</td>
</tr>
<tr>
<td>7-8</td>
<td>Algorithm to Generate Code for a 2-way Rooted Graph</td>
<td>174</td>
</tr>
<tr>
<td>7-9</td>
<td>Algorithm to Generate Code for an n-way Rooted Graph</td>
<td>175</td>
</tr>
<tr>
<td>7-10</td>
<td>Algorithm to Generate Code for 1-way, Call, and Fall Rooted Graph</td>
<td>176</td>
</tr>
<tr>
<td>7-11</td>
<td>Algorithm to Generate Code from a Control Flow Graph</td>
<td>176</td>
</tr>
<tr>
<td>7-12</td>
<td>Algorithm to Generate Code from a Call Graph</td>
<td>177</td>
</tr>
<tr>
<td>7-13</td>
<td>Final Code for the Graph of Figure 7-2</td>
<td>178</td>
</tr>
<tr>
<td>7-14</td>
<td>Canonical Irreducible Graph with Structuring Information</td>
<td>179</td>
</tr>
<tr>
<td>8-1</td>
<td>Decompilation System</td>
<td>181</td>
</tr>
<tr>
<td>8-2</td>
<td>General Format of a Binary Program</td>
<td>182</td>
</tr>
<tr>
<td>8-3</td>
<td>Loader Algorithm</td>
<td>183</td>
</tr>
<tr>
<td>8-4</td>
<td>Partial Disassembly of Library Function fseek()</td>
<td>185</td>
</tr>
<tr>
<td>8-5</td>
<td>Signature for Library Function fseek()</td>
<td>185</td>
</tr>
<tr>
<td>8-6</td>
<td>Signature Algorithm</td>
<td>186</td>
</tr>
<tr>
<td>8-7</td>
<td>Disassembler as part of the Decompiler</td>
<td>191</td>
</tr>
<tr>
<td>9-1</td>
<td>Structure of the dcc Decompiler</td>
<td>195</td>
</tr>
<tr>
<td>9-2</td>
<td>Main Decompiler Program</td>
<td>196</td>
</tr>
<tr>
<td>9-3</td>
<td>Program Information Record</td>
<td>198</td>
</tr>
<tr>
<td>9-4</td>
<td>Front-end Procedure</td>
<td>200</td>
</tr>
<tr>
<td>9-5</td>
<td>Procedure Record</td>
<td>201</td>
</tr>
<tr>
<td>9-6</td>
<td>Machine Instructions that Represent more than One Icode Instruction</td>
<td>202</td>
</tr>
<tr>
<td>9-7</td>
<td>Low-level Intermediate Code for the i80286</td>
<td>205</td>
</tr>
<tr>
<td>9-7</td>
<td>Low-level Intermediate Code for the i80286 - Continued</td>
<td>206</td>
</tr>
<tr>
<td>9-7</td>
<td>Low-level Intermediate Code for the i80286 - Continued</td>
<td>207</td>
</tr>
<tr>
<td>9-8</td>
<td>Basic Block Record</td>
<td>208</td>
</tr>
<tr>
<td>9-9</td>
<td>Post-increment or Post-decrement in a Conditional Jump</td>
<td>209</td>
</tr>
<tr>
<td>9-10</td>
<td>Pre Increment/Decrement in Conditional Jump</td>
<td>209</td>
</tr>
<tr>
<td>9-11</td>
<td>Procedure for the Universal Decompiling Machine</td>
<td>211</td>
</tr>
<tr>
<td>9-12</td>
<td>Back-end Procedure</td>
<td>213</td>
</tr>
<tr>
<td>9-13</td>
<td>Bundle Data Structure Definition</td>
<td>214</td>
</tr>
<tr>
<td>9-14</td>
<td>Intops.a2</td>
<td>216</td>
</tr>
<tr>
<td>9-15</td>
<td>Intops.b</td>
<td>217</td>
</tr>
<tr>
<td>9-16</td>
<td>Intops.c</td>
<td>218</td>
</tr>
<tr>
<td>9-17</td>
<td>Intops Statistics</td>
<td>218</td>
</tr>
<tr>
<td>9-18</td>
<td>Byteops.a2</td>
<td>220</td>
</tr>
<tr>
<td>9-18</td>
<td>Byteops.a2 – Continued</td>
<td>221</td>
</tr>
<tr>
<td>9-19</td>
<td>Byteops.b</td>
<td>222</td>
</tr>
<tr>
<td>9-20</td>
<td>Byteops.c</td>
<td>223</td>
</tr>
<tr>
<td>9-21</td>
<td>Byteops Statistics</td>
<td>223</td>
</tr>
<tr>
<td>9-22</td>
<td>Longops.a2</td>
<td>225</td>
</tr>
<tr>
<td>9-22</td>
<td>Longops.a2 – Continued</td>
<td>226</td>
</tr>
<tr>
<td>9-22</td>
<td>Longops.a2 – Continued</td>
<td>227</td>
</tr>
<tr>
<td>9-22</td>
<td>Longops.a2 – Continued</td>
<td>228</td>
</tr>
<tr>
<td>9-22</td>
<td>Longops.a2 – Continued</td>
<td>229</td>
</tr>
<tr>
<td>9-22</td>
<td>Longops.a2 – Continued</td>
<td>230</td>
</tr>
<tr>
<td>9-23</td>
<td>Longops.b</td>
<td>231</td>
</tr>
<tr>
<td>9-23</td>
<td>Longops.b – Continued</td>
<td>232</td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr><td>9-23</td><td>Longops.b – Continued</td><td>233</td></tr>
<tr><td>9-24</td><td>Longops.c</td><td>234</td></tr>
<tr><td>9-25</td><td>Longops Statistics</td><td>234</td></tr>
<tr><td>9-26</td><td>Control Flow Graph for Boolean Assignment</td><td>235</td></tr>
<tr><td>9-27</td><td>Benchsho.a2</td><td>237</td></tr>
<tr><td>9-27</td><td>Benchsho.a2 – Continued</td><td>238</td></tr>
<tr><td>9-27</td><td>Benchsho.a2 – Continued</td><td>239</td></tr>
<tr><td>9-28</td><td>Benchsho.b</td><td>240</td></tr>
<tr><td>9-29</td><td>Benchsho.c</td><td>241</td></tr>
<tr><td>9-30</td><td>Benchsho Statistics</td><td>241</td></tr>
<tr><td>9-31</td><td>Benchlng.a2</td><td>243</td></tr>
<tr><td>9-31</td><td>Benchlng.a2 – Continued</td><td>244</td></tr>
<tr><td>9-31</td><td>Benchlng.a2 – Continued</td><td>245</td></tr>
<tr><td>9-31</td><td>Benchlng.a2 – Continued</td><td>246</td></tr>
<tr><td>9-31</td><td>Benchlng.a2 – Continued</td><td>247</td></tr>
<tr><td>9-32</td><td>Benchlng.b</td><td>248</td></tr>
<tr><td>9-32</td><td>Benchlng.b – Continued</td><td>249</td></tr>
<tr><td>9-33</td><td>Benchlng.c</td><td>250</td></tr>
<tr><td>9-34</td><td>Benchlng Statistics</td><td>250</td></tr>
<tr><td>9-35</td><td>Benchmul.a2</td><td>252</td></tr>
<tr><td>9-35</td><td>Benchmul.a2 – Continued</td><td>253</td></tr>
<tr><td>9-36</td><td>Benchmul.b</td><td>254</td></tr>
<tr><td>9-37</td><td>Benchmul.c</td><td>255</td></tr>
<tr><td>9-38</td><td>Benchmul Statistics</td><td>255</td></tr>
<tr><td>9-39</td><td>Benchfn.a2</td><td>257</td></tr>
<tr><td>9-39</td><td>Benchfn.a2 – Continued</td><td>258</td></tr>
<tr><td>9-39</td><td>Benchfn.a2 – Continued</td><td>259</td></tr>
<tr><td>9-40</td><td>Benchfn.b</td><td>260</td></tr>
<tr><td>9-40</td><td>Benchfn.b – Continued</td><td>261</td></tr>
<tr><td>9-41</td><td>Benchfn.c</td><td>262</td></tr>
<tr><td>9-42</td><td>Benchfn Statistics</td><td>262</td></tr>
<tr><td>9-43</td><td>Fibo.a2</td><td>264</td></tr>
<tr><td>9-43</td><td>Fibo.a2 – Continued</td><td>265</td></tr>
<tr><td>9-44</td><td>Fibo.b</td><td>266</td></tr>
<tr><td>9-45</td><td>Fibo.c</td><td>267</td></tr>
<tr><td>9-46</td><td>Fibo Statistics</td><td>267</td></tr>
<tr><td>9-47</td><td>Crc.a2</td><td>269</td></tr>
<tr><td>9-47</td><td>Crc.a2 – Continued</td><td>270</td></tr>
<tr><td>9-47</td><td>Crc.a2 – Continued</td><td>271</td></tr>
<tr><td>9-47</td><td>Crc.a2 – Continued</td><td>272</td></tr>
<tr><td>9-48</td><td>Crc.b</td><td>273</td></tr>
<tr><td>9-48</td><td>Crc.b – Continued</td><td>274</td></tr>
<tr><td>9-48</td><td>Crc.b – Continued</td><td>275</td></tr>
<tr><td>9-49</td><td>Crc.c</td><td>276</td></tr>
<tr><td>9-49</td><td>Crc.c – Continued</td><td>277</td></tr>
<tr><td>9-49</td><td>Crc.c – Continued</td><td>278</td></tr>
<tr><td>9-50</td><td>Crc Statistics</td><td>278</td></tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th>Figure</th>
<th>Description</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>9-51</td>
<td>Matrixmu.a2</td>
<td>280</td>
</tr>
<tr>
<td>9-51</td>
<td>Matrixmu.a2 – Continued</td>
<td>281</td>
</tr>
<tr>
<td>9-52</td>
<td>Matrixmu.b</td>
<td>282</td>
</tr>
<tr>
<td>9-53</td>
<td>Matrixmu.c</td>
<td>283</td>
</tr>
<tr>
<td>9-54</td>
<td>Matrixmu Statistics</td>
<td>283</td>
</tr>
<tr>
<td>9-55</td>
<td>Results for Tested Programs</td>
<td>284</td>
</tr>
<tr>
<td>A-1</td>
<td>Register Classification</td>
<td>289</td>
</tr>
<tr>
<td>A-2</td>
<td>Structure of the Flags Register</td>
<td>290</td>
</tr>
<tr>
<td>A-3</td>
<td>Compound Opcodes' Second Byte</td>
<td>290</td>
</tr>
<tr>
<td>A-4</td>
<td>The Fields Byte</td>
<td>291</td>
</tr>
<tr>
<td>A-5</td>
<td>Algorithm to Interpret the Fields Byte</td>
<td>291</td>
</tr>
<tr>
<td>A-6</td>
<td>Mapping of r/m field</td>
<td>291</td>
</tr>
<tr>
<td>A-7</td>
<td>Default Segments</td>
<td>292</td>
</tr>
<tr>
<td>A-8</td>
<td>Segment Override Prefix</td>
<td>292</td>
</tr>
<tr>
<td>A-9</td>
<td>1-byte Opcodes</td>
<td>294</td>
</tr>
<tr>
<td>A-9</td>
<td>1-byte Opcodes – Continued</td>
<td>295</td>
</tr>
<tr>
<td>A-9</td>
<td>1-byte Opcodes – Continued</td>
<td>296</td>
</tr>
<tr>
<td>A-9</td>
<td>1-byte Opcodes – Continued</td>
<td>297</td>
</tr>
<tr>
<td>A-9</td>
<td>1-byte Opcodes – Continued</td>
<td>298</td>
</tr>
<tr>
<td>A-9</td>
<td>1-byte Opcodes – Continued</td>
<td>299</td>
</tr>
<tr>
<td>A-9</td>
<td>1-byte Opcodes – Continued</td>
<td>300</td>
</tr>
<tr>
<td>A-10</td>
<td>Table1 Opcodes</td>
<td>300</td>
</tr>
<tr>
<td>A-11</td>
<td>Table2 Opcodes</td>
<td>301</td>
</tr>
<tr>
<td>A-12</td>
<td>Table3 Opcodes</td>
<td>301</td>
</tr>
<tr>
<td>A-13</td>
<td>Table4 Opcodes</td>
<td>301</td>
</tr>
<tr>
<td>B-1</td>
<td>PSP Fields</td>
<td>303</td>
</tr>
<tr>
<td>C-1</td>
<td>Structure of an .exe File</td>
<td>305</td>
</tr>
<tr>
<td>C-2</td>
<td>Fixed Formatted Area</td>
<td>306</td>
</tr>
<tr>
<td>D-1</td>
<td>Icode Opcodes</td>
<td>308</td>
</tr>
<tr>
<td>D-1</td>
<td>Icode Opcodes – Continued</td>
<td>309</td>
</tr>
<tr>
<td>D-1</td>
<td>Icode Opcodes – Continued</td>
<td>310</td>
</tr>
<tr>
<td>F-1</td>
<td>DOS Interrupts</td>
<td>314</td>
</tr>
<tr>
<td>F-1</td>
<td>DOS Interrupts – Continued</td>
<td>315</td>
</tr>
<tr>
<td>F-1</td>
<td>DOS Interrupts – Continued</td>
<td>316</td>
</tr>
</tbody>
</table>
Chapter 1

Introduction to Decompiling

Compiler-writing techniques are well known in the computer community; decompiler-writing techniques are not yet as well known. Interestingly enough, decompiler-writing techniques are based on compiler-writing techniques, as explained in this thesis. This chapter introduces the subject of decompiling by describing the components of a decompiler and the environment in which the decompilation of a binary program is done.

## 1.1 Decompilers

A decompiler is a program that reads a program written in a machine language – the source language – and translates it into an equivalent program in a high-level language – the target language (see Figure 1-1). A decompiler, or reverse compiler, attempts to reverse the process of a compiler, which translates a high-level language program into a binary or executable program.

![Figure 1-1: A Decompiler](image)

Basic decompiler techniques are used to decompile binary programs from a wide variety of machine languages to a diversity of high-level languages. The structure of decompilers is based on the structure of compilers; similar principles and techniques are used to perform the analysis of programs. The first decompilers appeared in the early 1960s, a decade after their compiler counterparts. As with the first compilers, much of the early work on decompilation dealt with the translation of scientific programs. Chapter 2 describes the history of decompilation.

## 1.2 Problems

A decompiler writer has to face several theoretical and practical problems when writing a decompiler. Some of these problems can be solved by the use of heuristic methods; others cannot be solved completely. Due to these limitations, a decompiler performs automatic program translation of some source programs, and semi-automatic program translation of other source programs. This differs from a compiler, which performs an automatic program translation of all source programs. This section looks at some of the problems involved.

### 1.2.1 Recursive Undecidability

The general theory of computability tries to solve decision problems, that is, problems which inquire into the existence of an algorithm for deciding the truth or falsity of a whole class of statements. If there is a positive solution, an algorithm must be given; otherwise, a proof of the non-existence of such an algorithm is needed. In this latter case we say that the problem is unsolvable, undecidable, or non-computable. Unsolvable problems can be partially computable if an algorithm can be given that answers yes whenever the program halts, but otherwise loops forever.

In the mathematical world, an abstract concept has to be described and modelled in terms of mathematical definitions. The abstraction of the algorithm has to be described in terms of what is called a Turing machine. A Turing machine is a computing machine that prints symbols on a linear tape of infinite length in both directions, possesses a finite number of states, and performs actions specified by means of quadruples based upon its current internal configuration and current symbol on the tape. Figure 1-2 shows a representation of a Turing machine.

![Turing Machine Representation](image)

**Figure 1-2: Turing Machine Representation**

The **halting problem** for a Turing machine $Z$ consists of determining, for a given instantaneous description $\alpha$, whether or not there exists a computation of $Z$ that begins with $\alpha$. In other words, we are trying to determine whether or not $Z$ will halt if placed in an initial state. It has been proved that this problem is recursively unsolvable and partially computable[Dec58, GL82].

Given a binary program, the separation of data from code, even in programs that do not allow such practices as self-modifying code, is equivalent to the halting problem, since it is unknown in general whether a particular instruction will be executed or not (e.g. consider the code following a loop). This implies that the problem is unsolvable in general but partially computable: an algorithm can be written to separate data from code in some cases, but not in all.
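
The partial computability can be illustrated with a recursive-traversal sketch in C. It uses a hypothetical four-opcode toy instruction set rather than real i80286 encodings, and the names mark_code, CODE and UNKNOWN are illustrative only: bytes are classified as code only when they can be reached by following control flow from the entry point, and every byte that is never reached is left unclassified rather than guessed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set (NOT i80286 encodings):
 *   0x01    NOP    1 byte, falls through
 *   0x02 t  JMP t  2 bytes, jumps to absolute address t
 *   0x03 t  JZ t   2 bytes, may jump to t or fall through
 *   0x04    RET    1 byte, ends the path                        */
enum { T_NOP = 0x01, T_JMP = 0x02, T_JZ = 0x03, T_RET = 0x04 };
enum { UNKNOWN = 0, CODE = 1 };
#define MAX_IMAGE 256

/* Marks as CODE every byte reachable from `entry` by following control
 * flow. Bytes never reached stay UNKNOWN: they may be data, or code that
 * only a computed jump (not expressible in this toy set) would reach.
 * `kind` must have `size` entries, all set to UNKNOWN by the caller. */
void mark_code(const unsigned char *image, size_t size, size_t entry,
               unsigned char *kind)
{
    size_t work[MAX_IMAGE];   /* worklist of addresses still to visit */
    size_t top = 0;

    assert(size <= MAX_IMAGE);
    if (entry < size)
        work[top++] = entry;

    while (top > 0) {
        size_t pc = work[--top];
        while (pc < size && kind[pc] == UNKNOWN) {
            unsigned char op = image[pc];
            if (op == T_NOP) {
                kind[pc] = CODE;
                pc += 1;
            } else if (op == T_RET) {
                kind[pc] = CODE;
                break;
            } else if ((op == T_JMP || op == T_JZ) && pc + 1 < size) {
                size_t target = image[pc + 1];
                kind[pc] = kind[pc + 1] = CODE;
                if (target < size && kind[target] == UNKNOWN && top < MAX_IMAGE)
                    work[top++] = target;
                if (op == T_JMP)
                    break;            /* unconditional: no fall-through */
                pc += 2;              /* conditional: also falls through */
            } else {
                break;                /* not a valid instruction: stop this path */
            }
        }
    }
}
```

For the image 03h 05h 02h 06h AAh 01h 04h BBh, the traversal marks every byte except AAh and BBh, which no control path reaches. In a real binary, an indirect jump whose target is computed at run time gives the traversal nothing to follow, which is where an algorithm of this kind must stop short of a complete answer.
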

### 1.2.2 The von Neumann Architecture

In von Neumann machines, both data and instructions are represented in the same way in memory. This means that a given byte located in memory is not known to be data or instruction (or both) until that byte is fetched from memory, placed in a register, and used as data or instruction. Even on segmented architectures, where data segments hold only data and code segments hold only instructions, data can still be stored in a code segment in the form of a table (e.g. case tables in the Intel architecture), and instructions can still be stored in the form of data and later executed by interpreting them. This latter method was used as part of a Modula-2 compiler for the PC that interprets an intermediate code for an abstract stack machine. The intermediate code was stored as data, and the offset for a particular procedure was pointed to by es:di[GCC+92].

### 1.2.3 Self-modifying code

Self-modifying code refers to instructions or preset data that are modified during execution of the program. A memory byte location for an instruction can be modified during program execution to represent another instruction or data. This method has been used throughout the years for different purposes. In the 60s and 70s, computers did not have much memory, and thus it was difficult to run large programs; computers with a maximum of 32Kb or 64Kb of memory were available at the time. Since space was a constraint, it had to be utilized in the best way. One way to achieve this was by saving bytes in the executable program by reusing data locations as instructions or vice versa. In this way, a memory cell held an instruction at one time, and data or another instruction at another time. Also, instructions modified other instructions once these were no longer needed, so that different code was executed the next time the program reached that section of code.

Nowadays there are few memory limitations on computers, and therefore self-modifying code is not used as often. It is still used, though, when writing encrypting programs or virus code (see Section 1.2.5). A sample self-modifying code for the Intel architecture is given in Figure 1-3. The inst definition is modified by the mov instruction to hold the data bytes EBh 20h; because words are stored low byte first, the mov writes the word 20EBh. After the move, inst is treated as yet another instruction, which is now EBh 20h; that is, a short unconditional jump with offset 20h. Before the mov, the inst memory location held the bytes 90h 90h, which would have been executed as two nop instructions.

```assembly
    ...                         ; other code
    mov word ptr [inst], 20EBh  ; stored as EBh 20h: EB == jmp short, 20 == offset
inst db 90h, 90h                ; 90 == nop
```

Figure 1-3: Sample self-modifying Code
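
Because the Intel architecture stores words low byte first, the first byte decoded at inst after the mov is the low byte of the stored word. A small C model of this, under the assumption of a hypothetical decoder that recognizes only 90h (nop) and EBh (jmp short), shows the same two bytes of memory decoding differently before and after the store; store_word_le, decode and jmp_short_target are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_NOP, OP_JMP_SHORT, OP_UNKNOWN };

/* Stores a 16-bit word the way the Intel architecture does: low byte first. */
void store_word_le(uint8_t *mem, uint16_t value)
{
    mem[0] = (uint8_t)(value & 0xFF);
    mem[1] = (uint8_t)(value >> 8);
}

/* Classifies the instruction starting at mem[0]; this hypothetical decoder
 * only recognizes 90h (nop) and EBh (jmp short rel8). */
int decode(const uint8_t *mem)
{
    switch (mem[0]) {
    case 0x90: return OP_NOP;
    case 0xEB: return OP_JMP_SHORT;
    default:   return OP_UNKNOWN;
    }
}

/* Target of a jmp short located at `addr`: the signed 8-bit displacement
 * is relative to the address of the next instruction (addr + 2). */
uint16_t jmp_short_target(uint16_t addr, const uint8_t *mem)
{
    assert(mem[0] == 0xEB);
    return (uint16_t)(addr + 2 + (int8_t)mem[1]);
}
```

A static disassembler that decodes inst before running the mov sees two nops; only by modelling the store does it see a jump to the address 22h bytes past inst.
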

### 1.2.4 Idioms

An idiom or idiomatic expression is a sequence of instructions which form a logical entity, and which taken together have a meaning that cannot be derived by considering the primary meanings of the instructions[Gai65].

For example, multiplication or division by powers of 2 is a commonly known idiom: multiplication is performed by shifting to the left, while division of unsigned values is performed by shifting to the right. Another idiom is the way long variables are added. If the machine has a word size of 2 bytes, a long variable has 4 bytes. To add two long variables, the low two bytes are added first, followed by the high two bytes, taking into account the carry from the first addition. These idioms and their meaning are illustrated in Figure 1-4. Most idioms are known in the computer community, but unfortunately, not all of them are widely known.

\[
\begin{array}{ll}
\texttt{shl ax, 2} & \texttt{add ax, [bp-4]} \\
                   & \texttt{adc dx, [bp-2]} \\
\downarrow         & \downarrow \\
\texttt{mul ax, 4} & \texttt{add dx:ax, [bp-2]:[bp-4]}
\end{array}
\]

Figure 1-4: Sample Idioms
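
Both idioms of Figure 1-4 can be checked on 16-bit values with the C sketch below; shl2 and add_long are hypothetical helper names, and add_long mirrors an add followed by an adc by computing the carry flag explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* shl ax, 2: a left shift by 2 multiplies a 16-bit word by 4 (mod 2^16). */
uint16_t shl2(uint16_t ax)
{
    return (uint16_t)(ax << 2);
}

/* add ax, lo / adc dx, hi: a 32-bit addition performed on 16-bit halves.
 * dx:ax holds the first long operand and hi:lo the second. */
void add_long(uint16_t *dx, uint16_t *ax, uint16_t hi, uint16_t lo)
{
    uint32_t low_sum = (uint32_t)*ax + lo;     /* add ax, lo            */
    unsigned carry   = low_sum > 0xFFFFu;       /* carry flag set by add */
    *ax = (uint16_t)low_sum;
    *dx = (uint16_t)(*dx + hi + carry);         /* adc dx, hi            */
}
```

Because the carry out of the low word is added into the high word, dx:ax ends up holding the full 32-bit sum, which is the meaning the decompiler assigns to the add/adc pair.
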

### 1.2.5 Virus and Trojan “tricks”

Virus programs have been written not only to trigger malicious code, but also to hide this code by means of tricks. Different methods are used in viruses to hide their malicious code, including self-modifying and encrypting techniques.

Figure 1-5 illustrates code for the Azusa virus, which stores a new return address for a procedure on the stack. As can be seen, the segment and offset addresses of the virus code are pushed onto the stack, followed by a far return instruction, which transfers control to the virus code. When disassembling code, most disassemblers would stop at the far return instruction, believing that the end of a procedure has been reached, which is not the case.

\[
\begin{align*}
&\ldots \quad ; \text{other code, ax holds segment SEG value} \\
&\text{SEG:00C4 push ax} \quad ; \text{set up segment} \\
&\text{SEG:00C5 mov ax, 0CAh} \quad ; \text{ax holds an offset} \\
&\text{SEG:00C8 push ax} \quad ; \text{set up offset} \\
&\text{SEG:00C9 retf} \quad ; \text{jump to virus code at SEG:00CA} \\
&\text{SEG:00CA \ldots} \quad ; \text{virus code is here}
\end{align*}
\]

Figure 1-5: Modify the return address

One frequently used trick is the use of self-modifying code to modify the target address offset of an unconditional jump which has been defined as data. Figure 1-6 illustrates the relevant code of the Cia virus before execution. As can be seen, cont and conta define the data items 0E9h and 0h respectively. During execution of this program, procX modifies the contents of conta with the offset of the virus code, and after the procedure returns, the instruction jmp virusOffset (0E9h virusOffset) is executed, treating data as instructions.

```assembly
start:
    call procX ; invoke procedure
cont db 0E9h ; opcode for jmp
conta dw 0
procX:
    mov cs:[conta],virusOffset
    ret
virus:
    ... ; virus code
end.
```

Figure 1-6: Self-modifying Code Virus

Virus code can be present in an encrypted form, and decryption of this code is only performed when needed. A simple encryption/decryption mechanism is provided by the xor function, since xoring a byte twice with the same constant yields the original byte. In this way, encryption applies one xor with a constant to every byte of the code, and decryption xors the code again with the same constant value. This routine, illustrated in Figure 1-7, was part of the LeprosyB virus.

```assembly
encrypt_decrypt:
    mov bx, offset virus_code ; get address of start encrypt/decrypt
xor_loop:
    mov ah, [bx] ; get the current byte
    xor ah, encrypt_val ; encrypt/decrypt with xor
    mov [bx], ah ; put it back where we got it from
    inc bx ; bx points to the next byte
    cmp bx, offset virus_code+virus_size ; are we at the end?
    jle xor_loop ; if not, do another cycle
    ret
```

Figure 1-7: Self-encrypting Virus
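
The property the routine relies on can be sketched in C; xor_crypt is an illustrative name. Note that the loop in Figure 1-7 tests its bound with jle, so it also processes the byte at virus_code+virus_size, whereas this sketch uses an exclusive bound.

```c
#include <assert.h>
#include <stddef.h>

/* Xors every byte in [code, code + size) with `key`. Because
 * (b ^ k) ^ k == b, the same routine both encrypts and decrypts. */
void xor_crypt(unsigned char *code, size_t size, unsigned char key)
{
    assert(code != NULL || size == 0);
    for (size_t i = 0; i < size; i++)
        code[i] ^= key;
}
```

Calling xor_crypt twice with the same key leaves the buffer unchanged, which is why a single routine serves as both the encryptor and the decryptor.
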

Recently, polymorphic mutation has been used to encrypt viruses. The idea of this technique is to self-generate sections of code based on the regularity of the instruction set. Figure 1-8 illustrates the encryption engine of the Nuke virus. Here, a different key (held in ax) is used each time around the encryption loop, and the encryption is done by means of an xor instruction.

```assembly
Encryption_Engine:
07AB       mov     cx,770h
07AE       mov     ax,7E2Ch
07B1       encryption_loop:
07B1       xor     cs:[si],ax
07B4       inc     si
07B5       dec     ah
07B7       inc     ax
07B8       loop    encryption_loop
07BA       retn
```

Figure 1-8: Self-generating Virus
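
A C model of the loop in Figure 1-8 makes the changing key visible. It assumes, since the listing does not show it, that si already points at the first byte to encrypt, and it replaces cx by the count parameter; nuke_xor is an illustrative name. Because si advances by one byte while a 16-bit word is xored, consecutive words overlap, but the sequence of keys does not depend on the data, so running the same loop again with the same initial ax restores the original bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the loop of Figure 1-8 on a byte buffer. Each iteration xors the
 * 16-bit word at buf[si] (little-endian) with ax, then advances si by ONE
 * byte, so consecutive words overlap; ax changes every iteration
 * (dec ah, then inc ax). Returns the key value left in ax. */
uint16_t nuke_xor(uint8_t *buf, size_t len, size_t count, uint16_t ax)
{
    assert(count + 1 <= len);     /* last bytes touched: buf[count - 1], buf[count] */
    for (size_t si = 0; si < count; si++) {
        buf[si]     ^= (uint8_t)(ax & 0xFF);   /* xor cs:[si], ax (low byte)  */
        buf[si + 1] ^= (uint8_t)(ax >> 8);     /*                 (high byte) */
        ax = (uint16_t)(ax - 0x0100);          /* dec ah: al is unaffected    */
        ax = (uint16_t)(ax + 1);               /* inc ax                      */
    }
    return ax;
}
```

Starting from 7E2Ch, the keys used are 7E2Ch, 7D2Dh, 7C2Eh and so on, so no single constant can be searched for in the encrypted body.
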

In general, virus programs make use of any flaw in the instruction set, self-modifying code, self-encrypting code, and undocumented operating system functions. This type of code is hard to disassemble automatically, given that most of the modifications to instructions and data are made during program execution. In these cases, human intervention is required.

### 1.2.6 Architecture-dependent Restrictions

Most contemporary machine architectures make use of a prefetch buffer to fetch instructions while the processor is executing earlier instructions. This means that prefetched instructions are held in a different location from the instructions in main memory. When a program uses self-modifying code to modify an instruction in memory that has already been prefetched, the instruction is modified in memory but not in the prefetch buffer; therefore, the initial, unmodified instruction is executed. This example can be seen in Figure 1-9. In this case, the jmpDef data definition is really an instruction, jmp codeExecuted. This definition appears to be modified by the previous instruction, mov [jmpDef],ax, which places two nop instructions in the definition of jmpDef. This would mean that the code at codeNotExecuted is executed, displaying “Hello world!” and exiting. When running this program on an i80386 machine, however, “Share and Enjoy!” is displayed. The i80386 has a prefetch buffer of 4 bytes, so the prefetched copy of jmpDef is not modified, and the jump to codeExecuted is taken. The behaviour of this type of code cannot be determined with ordinary step-by-step debuggers unless a complete emulation of the machine is done.

Figure 1-9: Architecture-dependent Problem

### 1.2.7 Subroutines included by the compiler and linker

Another problem with decompilation is the great number of subroutines introduced by the compiler and the number of routines linked in by the linker. The compiler will always include start-up subroutines that set up its environment, and runtime support routines whenever required. These routines are normally written in assembler and in most cases are untranslatable into a higher-level representation. Also, most operating systems do not provide a mechanism for sharing libraries; consequently, binary programs are self-contained and library routines are bound into each binary image. Library routines are written either in the language the compiler was written in or in assembler. This means that a binary program contains not only the routines written by the programmer, but a great number of other routines linked in by the linker. For example, a program written in C to display “hello world” and compiled on a PC has over 25 different subroutines in the binary program. A similar program written in Pascal and compiled on the PC generates more than 40 subroutines in the executable program. Out of all these routines, the reverse engineer is normally interested in just one initial subroutine: the main program.

## 1.3 The Phases of a Decompiler

Conceptually, a decompiler is structured in a similar way to a compiler, by a series of phases that transform the source machine program from one representation to another. The typical phases of a decompiler are shown in Figure 1-10. These phases represent the logical organization of a decompiler. In practice, some of the phases will be grouped together, as seen in Section 1.4.

A point to note is that there is no lexical analysis or scanning phase in the decompiler. This is due to the simplicity of machine languages: all tokens are represented by bytes or bits of a byte. Moreover, a byte cannot be classified in isolation: given a byte, it is not possible to determine whether that byte forms the start of a new token or not. For example, the byte 50h could represent the opcode for a push ax instruction, an immediate constant, or an offset to a data location.

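
The byte 50h example can be made concrete with a C sketch of a hypothetical decoder that knows only two i80286 forms: push r16 (opcodes 50h to 57h) and sub r16, imm8 (opcode 83h with a register-mode ModRM byte whose reg field is 5). The bytes 83h E9h 50h decode as sub cx, 80 (that is, 50h) when decoding starts at the first byte, but as push ax when it starts at the third; nothing in the bytes themselves says which starting point is correct. The function name decode_one is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char *const reg16[8] = {"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"};

/* Decodes one instruction at code[pc] into `out`. Returns its length in
 * bytes, or 0 if the bytes are not one of the two supported forms. */
size_t decode_one(const unsigned char *code, size_t size, size_t pc,
                  char *out, size_t outsz)
{
    assert(pc < size && outsz > 0);
    unsigned char op = code[pc];
    if (op >= 0x50 && op <= 0x57) {                        /* push r16 */
        snprintf(out, outsz, "push %s", reg16[op - 0x50]);
        return 1;
    }
    if (op == 0x83 && pc + 2 < size) {
        unsigned char modrm = code[pc + 1];
        if ((modrm >> 6) == 3 && ((modrm >> 3) & 7) == 5) {  /* mod=11, reg=5: sub */
            signed char imm = (signed char)code[pc + 2];     /* sign-extended imm8 */
            snprintf(out, outsz, "sub %s, %d", reg16[modrm & 7], imm);
            return 3;
        }
    }
    snprintf(out, outsz, "??");
    return 0;
}
```

Starting at the second byte is no better: E9h is not one of the two supported forms, so this sketch rejects it, while a full decoder would read it as the opcode of a near jmp.
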
### 1.3.1 Syntax Analysis

The parser or syntax analyzer groups bytes of the source program into grammatical phrases (or sentences) of the source machine language. These phrases can be represented in a parse tree. The expression `sub cx, 50` is semantically equivalent to `cx := cx - 50`. This latter expression can be represented in a parse tree as shown in Figure 1-11. There are two phrases in this expression: `cx - 50` and `cx := <exp>`. These phrases form a hierarchy, but due to the nature of machine language, the hierarchy will always have a maximum of two levels.

The main problem encountered by the syntax analyzer is determining what is data and what is an instruction. For example, a `case` table can be located in the code segment, and, due to the architecture of the von Neumann machine, it is unknown to the decompiler that this table is data rather than instructions. In this case, instructions cannot be parsed sequentially on the assumption that the next byte will always hold an instruction. Machine-dependent heuristics are required in order to determine the correct set of instructions. Syntax analysis is covered in Chapter 4.

### 1.3.2 Semantic Analysis

The semantic analysis phase checks the source program for the semantic meaning of groups of instructions, gathers type information, and propagates this type information across the subroutine. Since binary programs were produced by a compiler, the semantics of the machine language must be correct for the program to execute; it is rare for a binary program not to run because of errors in the code generated by a compiler. Thus, semantic errors are not present in the source program unless the syntax analyzer has parsed an instruction incorrectly or has parsed data as instructions.

In order to check the semantic meaning of a group of instructions, idioms are looked for. The idioms from Figure 1-4 can be transformed into semantically equivalent instructions: the multiplication of ax by 4 in the first case, and the addition of long variables in the second case. [bp-2]:[bp-4] represents a long variable for that particular subroutine, and dx:ax temporarily holds the value of a long variable in this subroutine. These latter registers do not have to be used as a long register throughout the subroutine, only when needed.

Types newly found through idiomatic expressions are propagated throughout the graph. For example, in Figure 1-4, two stack locations of a subroutine were found to be used as a long variable. Therefore, wherever these two locations are used or defined independently, the use or definition must be converted into a use or definition of the long variable. If the following two statements are part of the code for that subroutine

```plaintext
asgn [bp-2], 0
asgn [bp-4], 14h
```

the propagation of the long type on [bp-2] and [bp-4] would merge these two statements into one that represents the identifiers as longs, thus

```plaintext
asgn [bp-2]:[bp-4], 14h
```
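
The merge itself is simple arithmetic once the two locations are known to form one long. A C sketch, assuming the little-endian layout of the i80286 in which the low word of a long is at the lower address ([bp-4]) and the high word 2 bytes above it ([bp-2]); is_long_pair and merge_long are illustrative names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The words at bp+hi_off and bp+lo_off form one little-endian long only
 * when the high word sits exactly 2 bytes above the low word. */
bool is_long_pair(int hi_off, int lo_off)
{
    return hi_off == lo_off + 2;
}

/* Merges the constants assigned to the high and low words into the single
 * 32-bit constant assigned to the long variable hi:lo. */
uint32_t merge_long(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | low;
}
```

With high = 0 and low = 14h the merged constant is 14h, which is the value assigned to [bp-2]:[bp-4] above.
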

Finally, semantic errors are normally not produced by the compiler when generating code, but can be found in executable programs that run on a more advanced architecture than the one that is under consideration. For example, say we are to decompile binaries of the i80286 architecture. The new i80386 and i80486 architectures are based on this i80286 architecture, and their binary programs are stored in the same way. What is different in these new architectures, with respect to the machine language, is the use of more registers and instructions. If we are presented with an instruction

```plaintext
add ebx, 20
```

the register identifier ebx is a 32-bit register not present in the old architecture. Therefore, although the instruction is syntactically correct, it is not semantically correct for the machine language we are decompiling, and thus an error needs to be reported. Chapter 4 covers some of the analysis done in this phase.
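
A sketch of this check at the level of register names, assuming that the syntax analyzer has already produced symbolic operands; the function name and the table are illustrative. The table holds the i80286 general, pointer, index and segment registers; the 32-bit registers such as ebx, and the segment registers fs and gs, only appeared with the i80386.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Register names that exist on the i80286. */
static const char *const i80286_regs[] = {
    "ax", "bx", "cx", "dx", "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh",
    "sp", "bp", "si", "di", "cs", "ds", "es", "ss"
};

/* Returns false for any register the target machine does not have, so the
 * instruction using it can be reported as a semantic error. */
bool valid_i80286_register(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof i80286_regs / sizeof i80286_regs[0]; i++)
        if (strcmp(name, i80286_regs[i]) == 0)
            return true;
    return false;
}
```

For add ebx, 20, the operand ebx fails this check even though the instruction is well formed for the i80386.
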
### 1.3.3 Intermediate Code Generation

An explicit intermediate representation of the source program is necessary for the decompiler to analyse the program. This representation must be easy to generate from the source program, and must also be a suitable representation for the target language. The semantically equivalent representation illustrated in Section 1.3.1 is ideal for this purpose: it is a three-address code representation in which an instruction can have at most three operands. These operands are all identifiers in machine language, but can easily be extended to expressions to represent high-level language expressions (i.e. an identifier is an expression). In this way, a three-address representation is used, in which an instruction can have at most three expressions. Chapter 4 describes the intermediate code used by the decompiler.
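
One possible C representation, with hypothetical type and function names: because an identifier is itself an expression, the same node type serves for machine-level operands now and for whole high-level expressions after later analyses. An assignment whose source is a binary expression names exactly three expressions, the destination and the two operands.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { E_ID, E_CONST, E_BINOP } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *name;         /* E_ID: register or memory identifier */
    long value;               /* E_CONST                             */
    char op;                  /* E_BINOP: '+', '-', '*', ...         */
    struct Expr *lhs, *rhs;   /* E_BINOP operands                    */
} Expr;

/* asgn dst, src: at most three expressions (dst, src->lhs, src->rhs). */
typedef struct { Expr *dst; Expr *src; } Asgn;

static Expr *new_expr(ExprKind kind)
{
    Expr *e = calloc(1, sizeof *e);
    assert(e != NULL);
    e->kind = kind;
    return e;
}

Expr *mk_id(const char *name) { Expr *e = new_expr(E_ID); e->name = name; return e; }
Expr *mk_const(long value)    { Expr *e = new_expr(E_CONST); e->value = value; return e; }

Expr *mk_bin(char op, Expr *lhs, Expr *rhs)
{
    Expr *e = new_expr(E_BINOP);
    e->op = op;
    e->lhs = lhs;
    e->rhs = rhs;
    return e;
}
```

For sub cx, 50, a front-end could build cx := cx - 50 as an Asgn whose dst is mk_id("cx") and whose src is mk_bin('-', mk_id("cx"), mk_const(50)).
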

### 1.3.4 Control Flow Graph Generation

A control flow graph of each subroutine in the source program is also necessary for the decompiler to analyse the program. This representation is suited for determining the high-level control structures used in the program. It is also used to eliminate intermediate jumps that the compiler generated due to the offset limitations of a conditional jump in machine language. In the following code

```plaintext
... ; other code
jne x ; x <= maximum offset allowed for jne
... ; other code
x: jmp y ; intermediate jump
... ; other code
y: ... ; final target address
```

label x is the target address of the conditional jump jne x. This instruction is limited by the maximum offset allowed in the machine architecture, and therefore cannot perform a conditional jump to y in a single instruction; it has to use an intermediate jump instruction. In the control flow graph, the conditional jump to x is replaced with a conditional jump to the final target, y.
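
Replacing the target amounts to following chains of blocks that contain nothing but an unconditional jump. A C sketch, assuming a hypothetical array representation of the control flow graph (jmp_to, NO_JUMP and final_target are illustrative names):

```c
#include <assert.h>
#include <stddef.h>

#define NO_JUMP (-1)

/* jmp_to[b] is the target block of b when b contains nothing but an
 * unconditional jump, and NO_JUMP otherwise. Starting from `target`, the
 * walk follows such intermediate jumps to the first block that does real
 * work. At most `nblocks` steps are taken, so a cycle made only of jumps
 * (an infinite loop) ends the walk at some block on that cycle. */
int final_target(const int *jmp_to, size_t nblocks, int target)
{
    for (size_t steps = 0; steps < nblocks; steps++) {
        assert(target >= 0 && (size_t)target < nblocks);
        if (jmp_to[target] == NO_JUMP)
            return target;
        target = jmp_to[target];
    }
    return target;
}
```

In the example above, the block at x holds only jmp y, so the edge jne x is redirected to final_target(jmp_to, n, x), which is y.
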

### 1.3.5 Data Flow Analysis

The data flow analysis phase attempts to improve the intermediate code, so that high-level language expressions can be found. The use of temporary registers and condition flags is eliminated during this analysis, as these concepts are not available in high-level languages. For a series of intermediate language instructions

```plaintext
asgn ax, [bp-0Eh]
asgn bx, [bp-0Ch]
asgn bx, bx * 2
asgn ax, ax + bx
asgn [bp-0Eh], ax
```

the final output should be in terms of a high-level expression

```plaintext
asgn [bp-0Eh], [bp-0Eh] + [bp-0Ch] * 2
```
The first set of instructions makes use of registers, stack variables and constants; expressions are in terms of identifiers, with a maximum tree level of 2. After the analysis, the final instruction makes use of the stack variable identifiers [bp-0Eh] and [bp-0Ch], and an expression tree of 3 levels, [bp-0Eh] := [bp-0Eh] + [bp-0Ch] * 2. The temporary registers used by the machine language to calculate the high-level expression, ax and bx, along with the loading and storing of these registers, have been eliminated. Chapter 5 presents an algorithm to perform this analysis, and to eliminate other intermediate language instructions such as push and pop.
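
The effect of this analysis on the example above can be reproduced with a simple forward substitution in C. This is a sketch, not the algorithm of Chapter 5: it assumes straight-line code in which each register definition reaches its use unchanged and no store alters a memory operand between its load and the substituted use, and the tree type, the MK_ helpers and the convention that bracketed names denote memory are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expression node: kind 'v' = identifier, 'c' = constant, '+' or '*' = operator. */
typedef struct Expr {
    char kind;
    const char *name;
    long value;
    const struct Expr *lhs, *rhs;
} Expr;

static const Expr *node(char kind, const char *name, long value,
                        const Expr *lhs, const Expr *rhs)
{
    Expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = kind; e->name = name; e->value = value; e->lhs = lhs; e->rhs = rhs;
    return e;
}
#define MK_ID(n)      node('v', (n), 0, NULL, NULL)
#define MK_CONST(v)   node('c', NULL, (v), NULL, NULL)
#define MK_BIN(o,a,b) node((o), NULL, 0, (a), (b))

/* Assumption: identifiers not written as [..] memory operands are registers. */
static int is_register(const char *name) { return name[0] != '['; }

#define NREGS 8
typedef struct { const char *reg[NREGS]; const Expr *val[NREGS]; int n; } Env;

/* Replaces every register with the expression last assigned to it. */
static const Expr *subst(const Env *env, const Expr *e)
{
    if (e->kind == 'v' && is_register(e->name)) {
        for (int i = 0; i < env->n; i++)
            if (strcmp(env->reg[i], e->name) == 0)
                return env->val[i];
        return e;                             /* not defined in this block */
    }
    if (e->kind == '+' || e->kind == '*')
        return MK_BIN(e->kind, subst(env, e->lhs), subst(env, e->rhs));
    return e;
}

static void define_reg(Env *env, const char *reg, const Expr *val)
{
    for (int i = 0; i < env->n; i++)
        if (strcmp(env->reg[i], reg) == 0) { env->val[i] = val; return; }
    assert(env->n < NREGS);
    env->reg[env->n] = reg; env->val[env->n] = val; env->n++;
}

static void cat(char *buf, size_t n, const char *s) { strncat(buf, s, n - strlen(buf) - 1); }
static int prec(const Expr *e) { return e->kind == '+' ? 1 : e->kind == '*' ? 2 : 3; }

/* Prints e, adding parentheses only where precedence requires them. */
static void print_expr(const Expr *e, int parent_prec, char *buf, size_t n)
{
    char tmp[32];
    int paren = prec(e) < parent_prec;
    if (paren) cat(buf, n, "(");
    if (e->kind == 'v') {
        cat(buf, n, e->name);
    } else if (e->kind == 'c') {
        snprintf(tmp, sizeof tmp, "%ld", e->value);
        cat(buf, n, tmp);
    } else {
        print_expr(e->lhs, prec(e), buf, n);
        snprintf(tmp, sizeof tmp, " %c ", e->kind);
        cat(buf, n, tmp);
        print_expr(e->rhs, prec(e), buf, n);
    }
    if (paren) cat(buf, n, ")");
}

typedef struct { const char *dst; const Expr *src; } Ins;   /* asgn dst, src */

/* Substitutes register definitions forward through straight-line code and
 * writes the last store to a non-register location into `out`. */
void forward_substitute(const Ins *code, int count, char *out, size_t n)
{
    Env env = { .n = 0 };
    assert(n > 0);
    out[0] = '\0';
    for (int i = 0; i < count; i++) {
        const Expr *rhs = subst(&env, code[i].src);
        if (is_register(code[i].dst)) {
            define_reg(&env, code[i].dst, rhs);   /* definition: record, do not emit */
        } else {
            out[0] = '\0';
            cat(out, n, code[i].dst);
            cat(out, n, " := ");
            print_expr(rhs, 0, out, n);
        }
    }
}
```

Fed the five asgn instructions above, forward_substitute produces [bp-0Eh] := [bp-0Eh] + [bp-0Ch] * 2, and the register definitions of ax and bx never reach the output.
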

### 1.3.6 Control Flow Analysis

The control flow analyzer phase attempts to structure the control flow graph of each subroutine of the program into a generic set of high-level language constructs. This generic set must contain control instructions available in most languages, such as looping and conditional transfers of control; language-specific constructs should not be allowed. Figure 1-12 shows two sample control flow graphs: an if..then..else and a while(). Chapter 6 presents an algorithm for structuring arbitrary control flow graphs.

![Generic Constructs](image)

**Figure 1-12: Generic Constructs**
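
Structuring loops usually starts by finding the edges that return to an earlier node. The C sketch below, with illustrative names, is only that first step, not the structuring algorithm of Chapter 6: a depth-first search that reports back edges, that is, edges into a node that is still on the search stack. In a reducible graph each such edge closes a loop whose header is the edge's target, as in the while() graph of Figure 1-12.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16

/* Adjacency-matrix control flow graph; succ[u][v] means an edge u -> v. */
typedef struct { int n; bool succ[MAXN][MAXN]; } Cfg;

enum { WHITE, GREY, BLACK };   /* unvisited, on the DFS stack, finished */

static int dfs(const Cfg *g, int u, int *colour, int (*back)[2], int nback)
{
    colour[u] = GREY;
    for (int v = 0; v < g->n; v++) {
        if (!g->succ[u][v])
            continue;
        if (colour[v] == GREY) {            /* edge to an ancestor: back edge */
            back[nback][0] = u;
            back[nback][1] = v;
            nback++;
        } else if (colour[v] == WHITE) {
            nback = dfs(g, v, colour, back, nback);
        }
    }
    colour[u] = BLACK;
    return nback;
}

/* Stores the back edges reachable from `entry` in back[] and returns how
 * many there are; back[] must have room for every edge of the graph. */
int find_back_edges(const Cfg *g, int entry, int back[][2])
{
    int colour[MAXN] = {0};
    assert(g->n <= MAXN && entry >= 0 && entry < g->n);
    return dfs(g, entry, colour, back, 0);
}
```

For a while() loop with header 1 and body 2, the search reports the single back edge 2 -> 1; an if..then..else graph has none.
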

### 1.3.7 Code Generation

The final phase of the decompiler is the generation of target high-level language code, based on the control flow graph and intermediate code of each subroutine. Variable names are selected for all local stack, argument, and register-variable identifiers. Subroutine names are also selected for the different routines found in the program. Control structures and intermediate instructions are translated into high-level language statements.

For the example in Section 1.3.5, the local stack identifiers [bp-0Eh] and [bp-0Ch] are given the arbitrary names loc2 and loc1 respectively, and the instruction is translated into, say, the C language as

\[
\texttt{loc2 = loc2 + (loc1 * 2);}
\]

Code generation is covered in Chapter 7.
## 1.4 The Grouping of Phases

The decompiler phases presented in Section 1.3 are normally grouped in the implementation of the decompiler. As shown in Figure 1-13, three different modules are distinguished: front-end, udm, and back-end.

![Diagram of decompiler modules](image)

**Figure 1-13: Decompiler Modules**

The **front-end** consists of those phases that are machine and machine-language dependent. These phases include syntax and semantic analyses, and intermediate code and control flow graph generation. As a whole, these phases produce an intermediate, machine-independent representation of the program.

The **udm** is the universal decompiling machine; an intermediate module that is completely machine and language independent, and that performs the core of the decompiling analysis. Two phases are included in this module, the data flow and the control flow analyzers.

Finally, the **back-end** consists of those phases that are high-level or target language dependent. This module is the code generator.

In compiler theory, the grouping of phases is a mechanism used by compiler writers to generate compilers for different machines and different languages. If the back-end of the compiler is rewritten for a different machine, a new compiler for that machine is constructed by using the original front-end. In a similar way, a new front-end for another high-level language definition can be written and used with the original back-end. In practice there are some limitations to this method, inherent to the choice of intermediate code representation.

In theory, grouping the phases of a decompiler makes it easy to write different decompilers for different machines and languages, by writing different front-ends for different machines and different back-ends for different target languages. In practical applications, this result is always limited by the generality of the intermediate language used.
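
The grouping can be pictured as a set of interchangeable modules. The following C sketch is illustrative only. The types `IntermediateCode`, `FrontEnd` and `BackEnd` and the functions `decompile`, `udm_analyze` and `decompiler_count` are hypothetical names, and the udm is reduced to a placeholder. The sketch shows how one shared udm lets any machine-dependent front-end be paired with any language-dependent back-end. With m front-ends and n back-ends, this gives m × n decompilers from only m + n modules.

```c
#include <stdio.h>

/* Hypothetical machine-independent intermediate representation. */
typedef struct {
    const char *text;
} IntermediateCode;

/* Front-end: machine dependent, binary program -> intermediate code. */
typedef struct {
    const char *machine;
    IntermediateCode (*decode)(const unsigned char *binary, size_t size);
} FrontEnd;

/* Back-end: target-language dependent, intermediate code -> HLL text. */
typedef struct {
    const char *language;
    void (*emit)(const IntermediateCode *ic, FILE *out);
} BackEnd;

/* The udm is shared by every pairing; here it is only a placeholder for
   the data flow and control flow analyses. */
static void udm_analyze(IntermediateCode *ic) {
    (void)ic;
}

/* A decompiler is any pairing of one front-end with one back-end. */
void decompile(const FrontEnd *fe, const BackEnd *be,
               const unsigned char *binary, size_t size, FILE *out) {
    IntermediateCode ic = fe->decode(binary, size);
    udm_analyze(&ic);
    be->emit(&ic, out);
}

/* m front-ends and n back-ends (m + n modules) give m * n decompilers. */
size_t decompiler_count(size_t front_ends, size_t back_ends) {
    return front_ends * back_ends;
}
```

In this arrangement, supporting a new machine means writing one new `FrontEnd`. How far this holds in practice depends on how general `IntermediateCode` is.
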

## 1.5 The Context of a Decompiler

In practice, several programs can be used together with the decompiler to create the target high-level language program. In general, source binary programs have a relocation table of addresses that are to be relocated when the program is loaded into memory; this task is accomplished by the loader. The relocated or absolute machine code is then disassembled to produce an assembly representation of the program. The disassembler can use compiler and library signatures to avoid disassembling compiler start-up code and library routines. The assembler program is then input to the decompiler, and a high-level target program is generated. Any further processing required on the target program, such as converting while() loops into for loops, can be done by a postprocessor. Figure 1-14 shows the steps involved in a typical “decompilation”. The user can also be a source of information, particularly when determining library routines and separating data from instructions; whenever possible, however, it is more reliable to use automatic tools. Decompiler helper tools are covered in Chapter 8; this section briefly explains their tasks.

![Figure 1-14: A Decompilation System](image)

**Loader**

The loader is a program that loads a binary program into memory, and relocates the machine code if it is relocatable. During relocation, instructions are altered and placed back in memory.
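
Relocation can be sketched as follows. The sketch assumes a relocation table modelled loosely on MS-DOS .exe files. In that format, each entry gives the segment:offset of a 16-bit word holding a segment value, and the loader adds the segment at which the program was loaded to each such word. The names `RelocEntry` and `relocate` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One relocation entry: the segment:offset of a 16-bit word in the image
   that holds a segment value (layout modelled loosely on MS-DOS .exe). */
typedef struct {
    uint16_t offset;
    uint16_t segment;
} RelocEntry;

/* Add the load segment to every word named in the relocation table.
   Words are little-endian. Returns the number of entries applied, or
   stops and returns -1 at the first entry that points outside the image. */
int relocate(uint8_t *image, size_t image_size,
             const RelocEntry *table, size_t entries, uint16_t load_segment) {
    for (size_t i = 0; i < entries; i++) {
        size_t pos = (size_t)table[i].segment * 16 + table[i].offset;
        if (pos + 1 >= image_size) return -1;
        uint16_t word = (uint16_t)(image[pos] | (image[pos + 1] << 8));
        word = (uint16_t)(word + load_segment);   /* wraps modulo 2^16 */
        image[pos] = (uint8_t)(word & 0xFF);      /* alter the word and */
        image[pos + 1] = (uint8_t)(word >> 8);    /* place it back      */
    }
    return (int)entries;
}
```

The test `pos + 1 >= image_size` rejects any entry whose word would extend past the end of the image. Entries processed before a bad one have already been applied.
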

**Signature Generator**

A signature generator is a program that automatically determines compiler and library signatures: binary patterns that uniquely identify each compiler and library subroutine. The use of these signatures attempts to reverse the task performed by the linker, which links library and compiler start-up code into the program. In this way, the analyzed program consists only of user subroutines: the ones that the user compiled in the initial high-level language program.

For example, a compiled C program that displays “hello world” contains over 25 different subroutines in the binary program: 16 subroutines were added by the compiler to set up its environment, 9 routines that form part of `printf()` were added by the linker, and only 1 subroutine formed part of the initial C program.

The use of a signature generator not only reduces the number of subroutines to analyze, but also improves the documentation of the target programs, since library names are used rather than arbitrary subroutine names.
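
Matching a signature against the start of a subroutine can be sketched as follows. This sketch assumes that a signature is a byte pattern with a mask. Zero entries in the mask mark wildcard bytes, such as addresses fixed up by the linker, which vary between programs. The types and functions shown are hypothetical, as is the fallback name `proc_1` mentioned in a comment.

```c
#include <stddef.h>
#include <stdint.h>

/* A signature: a byte pattern plus a mask. mask[i] == 0 marks a wildcard
   byte that is ignored during matching. */
typedef struct {
    const char *name;
    const uint8_t *pattern;
    const uint8_t *mask;
    size_t length;
} Signature;

/* Return 1 if the code at the start of a subroutine matches the signature. */
int matches(const Signature *sig, const uint8_t *code, size_t code_len) {
    if (code_len < sig->length) return 0;
    for (size_t i = 0; i < sig->length; i++)
        if (sig->mask[i] && code[i] != sig->pattern[i]) return 0;
    return 1;
}

/* Return the library name of the first matching signature, or NULL so
   that the decompiler falls back to an arbitrary name such as "proc_1". */
const char *identify(const Signature *sigs, size_t count,
                     const uint8_t *code, size_t code_len) {
    for (size_t i = 0; i < count; i++)
        if (matches(&sigs[i], code, code_len)) return sigs[i].name;
    return NULL;
}
```

A routine recognized in this way is neither decompiled nor given an arbitrary name. Its library name is used instead.
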

**Prototype Generator**

The prototype generator is a program that automatically determines the types of the arguments of library subroutines, and the type of the return value in the case of functions. These prototypes are derived from the library header files, and are used by the decompiler to determine the type of the arguments to library subroutines and the number of such arguments.
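
One small part of this task is counting the fixed arguments declared in a header prototype. The C sketch below is a hypothetical helper (`count_params`), not the prototype generator itself. It handles only simple one-line prototypes: no macros, no K&R-style declarations, and no functions that return function pointers. It reports variadic functions such as `printf` separately.

```c
#include <string.h>

/* Hypothetical helper: count the fixed parameters declared by a one-line
   C prototype such as "size_t strlen(const char *s);". Sets *variadic to
   1 if the list contains "...". Returns -1 if no parameter list is found. */
int count_params(const char *proto, int *variadic) {
    const char *open = strchr(proto, '(');
    const char *close = strrchr(proto, ')');
    char params[256];
    size_t len;
    const char *p;
    int count = 1, depth = 0;

    *variadic = 0;
    if (open == NULL || close == NULL || close < open) return -1;
    len = (size_t)(close - open - 1);
    if (len >= sizeof params) return -1;
    memcpy(params, open + 1, len);   /* text between the outer parentheses */
    params[len] = '\0';
    while (len > 0 && params[len - 1] == ' ')   /* trim trailing blanks */
        params[--len] = '\0';

    p = params;
    while (*p == ' ') p++;                      /* skip leading blanks */
    if (*p == '\0' || strcmp(p, "void") == 0) return 0;

    for (p = params; *p != '\0'; p++) {
        if (*p == '(') depth++;
        else if (*p == ')') depth--;
        else if (*p == ',' && depth == 0) count++;  /* top-level separator */
    }
    if (strstr(params, "...") != NULL) {
        *variadic = 1;
        count--;                        /* "..." is not a fixed parameter */
    }
    return count;
}
```

The depth counter keeps commas inside a function-pointer parameter, such as the one in `atexit`, from being counted as separators.
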

**Disassembler**

A disassembler is a program that transforms machine language into assembly language. Some decompilers transform assembler programs into a higher-level representation (see Chapter 2). In these cases, the assembler program was produced by a disassembler, written directly in assembler, or emitted by a compiler that compiles to assembler.
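
The core of a disassembler is a decoder from bytes to mnemonics. The table-driven C sketch below is purely illustrative and recognizes only five fixed Intel 8086 encodings: `55` push bp, `8B EC` mov bp, sp, `5D` pop bp, `C3` ret, and `90` nop. The names `Opcode` and `disassemble_one` are hypothetical. A real disassembler must decode the full opcode map, ModR/M bytes, prefixes and operands, and must also separate data from instructions.

```c
#include <stddef.h>
#include <stdio.h>

/* One table entry: the exact bytes of an instruction and its text. */
typedef struct {
    unsigned char bytes[2];
    size_t length;
    const char *mnemonic;
} Opcode;

static const Opcode table[] = {
    { { 0x55, 0x00 }, 1, "push bp" },
    { { 0x8B, 0xEC }, 2, "mov bp, sp" },
    { { 0x5D, 0x00 }, 1, "pop bp" },
    { { 0xC3, 0x00 }, 1, "ret" },
    { { 0x90, 0x00 }, 1, "nop" },
};

/* Decode one instruction at code[pos]; copy its text into out and return
   its length in bytes, or 0 if the bytes are not in the table. */
size_t disassemble_one(const unsigned char *code, size_t size, size_t pos,
                       char *out, size_t out_size) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const Opcode *op = &table[i];
        if (pos + op->length > size) continue;
        size_t k = 0;
        while (k < op->length && code[pos + k] == op->bytes[k]) k++;
        if (k == op->length) {
            snprintf(out, out_size, "%s", op->mnemonic);
            return op->length;
        }
    }
    return 0;
}
```

Calling `disassemble_one` repeatedly and advancing `pos` by the returned length walks a straight-line sequence of instructions. A return value of 0 signals bytes that the table does not know.
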

**Library Bindings**

Suppose the target language of the decompiler differs from the original language used to compile the binary source program, and the generated target code makes use of library names (i.e. library signatures were detected). The program is then correct, but it cannot be recompiled in the target language, because it uses the library routines of the original language rather than those of the target language. Library bindings solve this problem by binding the subroutines of one language to those of the other.
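
A library binding can be as simple as a table from source-language routine names to target-language ones. The C sketch below assumes Pascal as the original language and C as the target. The table entries and the name `bind_library_name` are hypothetical. A real binding would also have to reconcile argument order, types and calling conventions.

```c
#include <stddef.h>
#include <string.h>

/* One binding: a library routine of the source language and the
   target-language routine that replaces it. */
typedef struct {
    const char *source_name;
    const char *target_name;
} Binding;

/* Illustrative Pascal-to-C entries only. */
static const Binding pascal_to_c[] = {
    { "Sqrt", "sqrt" },
    { "Sin",  "sin"  },
    { "Halt", "exit" },
};

/* Return the target-language name bound to a detected library routine,
   or NULL if there is no binding and the call cannot be recompiled as is. */
const char *bind_library_name(const char *source_name) {
    for (size_t i = 0; i < sizeof pascal_to_c / sizeof pascal_to_c[0]; i++)
        if (strcmp(pascal_to_c[i].source_name, source_name) == 0)
            return pascal_to_c[i].target_name;
    return NULL;
}
```
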

**Postprocessor**

A postprocessor is a program that transforms a high-level language program into a semantically equivalent high-level program written in the same language. For example, if the target language is C, the following code

```c
loc1 = 1;
while (loc1 < 50) {
    /* some code in C */
    loc1 = loc1 + 1;
}
```

would be converted by a postprocessor into

```c
for (loc1 = 1; loc1 < 50; loc1++) {
    /* some code in C */
}
```

which is a semantically equivalent program that makes use of control structures available in the C language, but not present in the generic set of structures decompiled by the decompiler.

## 1.6 Uses of Decompilation

Decompilation is a tool for a computer professional. There are two major areas where decompilation is used: software maintenance and security. In the former area, decompilation is used to recover lost or inaccessible source code, translate code written in an obsolete language into a newer language, structure old code written in an unstructured way (i.e. spaghetti code) into a structured program, migrate applications to a new hardware platform, and debug binary programs that are known to have bugs but for which the source code is unavailable. In the latter area, decompilation is used as a tool to verify the object code produced by a compiler in software-critical systems, since the compiler cannot be trusted in these systems, and to check for the existence of malicious code such as viruses.

### 1.6.1 Legal Aspects

Several questions have been raised in recent years regarding the legality of decompilation. A debate is currently being held between supporters of decompilation, who claim that fair competition is possible with the use of decompilation tools, and opponents of decompilation, who claim that decompilation infringes copyright. The law in different countries is being modified to determine in which cases decompilation is lawful. At present, commercial software is being sold with software agreements that ban the user from disassembling or decompiling the product. For example, part of the Lotus software agreement reads as follows:

> You may not alter, merge, modify or adapt this Software in any way including disassembling or decompiling.

It is not the purpose of this thesis to debate the legal implications of decompilation, and this topic is not covered further.

# Chapter 2: Decompilation – What has been done?

Different attempts at writing decompilers have been made in the last 20 years. Because of the amount of information lost in the compilation process, all of these experimental decompilers are limited in one way or another in their ability to regenerate high-level language code. These limitations include restricting the input to assembly files [Hou73, Fri74, Wor78, Hop78, Bri81] or to object files with or without symbolic debugging information [Reu88, PW93], producing a simplified high-level language [Hou73], and requiring the compiler’s specification [BB91, BB93]. Assembly programs carry helpful data information in the form of symbolic text, such as data segments, data and type declarations, subroutine names, subroutine entry points, and subroutine exit statements. All this information can be collected in a symbol table, and the decompiler then does not need to address the problem of separating data from instructions, or the naming of variables and subroutines. Object files with debugging information contain the program’s symbol table as constructed by the compiler. Given the symbol table, it is easy to determine which memory locations are instructions, as it is known with certainty which memory locations represent data. In general, object files contain more information than binary files. Finally, requiring knowledge of the compiler’s specifications is impractical, as these specifications are not normally disclosed by compiler manufacturers.

## 2.1 Previous Work

Decompilers have been considered a useful software tool since they were first used in the 1960s. At that time, decompilers were used to aid in the program conversion process from second to third generation computers; in this way, manpower would not be spent on the time-consuming task of rewriting programs for the third generation machines. During the 1970s and 1980s, decompilers were used for the portability of programs, documentation, debugging, re-creation of lost source code, and the modification of existing binaries. In the 1990s, decompilers have become a reverse engineering tool capable of helping the user with such tasks as checking software for the existence of illegal code, checking that a compiler generates the right code, and translating binary programs from one machine to another. It is noted that decompilation is not being used for software piracy or breach of copyright, as the process is in general incomplete and can be used only as a tool to help carry out a task.

The following descriptions illustrate the best-known decompilers and/or research performed into decompiler topics by individual researchers or companies:

D-Neliac decompiler, 1960. As reported by Halstead in [Hal62], the Donnelly-Neliac (D-Neliac) decompiler was produced by J.K. Donnelly and H. England at the Navy Electronics Laboratory (NEL) in 1960. Neliac is an Algol-type language developed at the NEL in 1955. The D-Neliac decompiler produced Neliac code from machine code programs; different versions were written for the Remington Rand Univac M-460 Countess computer and the Control Data Corporation 1604 computer.

*D-Neliac proved useful for converting non-Neliac compiled programs into Neliac, and for detecting logic errors in the original high-level program. This decompiler proved the feasibility of writing decompilers.*

W. Sassaman, 1966. Sassaman developed a decompiler at TRW Inc., to aid in the conversion process of programs from 2nd to 3rd generation computers. This decompiler took as input symbolic assembler programs for the IBM 7000 series and produced Fortran programs. Binary code was not chosen as input language because the information in the symbolic assembler was more useful. Fortran was a standard language in the 1960s and ran on both 2nd and 3rd generation computers. Engineering applications which involved algebraic algorithms were the type of programs decompiled. The user was required to define rules for the recognition of subroutines. The decompiler was 90% accurate, and some manual intervention was required [Sas66].

*This is the first decompiler that makes use of assembler input programs rather than pure binary code. Assembler programs contain useful information in the form of names, macros, data and instructions, which are not available in binary or executable programs, and therefore eliminate the problem of separating data from instructions in the parsing phase of a decompiler.*

M. Halstead, 1967. The Lockheed Missiles and Space Company (LMSC) added some enhancements to the Neliac compiler developed at the Navy Electronics Laboratory, to cater for decompilation [Hal67]. The LMSC Neliac decompiler took as input machine code for the IBM 7094 and produced Neliac code for the Univac 1108. It proved successful by decompiling over 90% of instructions and leaving the programmer to decompile the other 10%. This decompiler was used at LMSC and under contract for customers in the U.S.A. and Canada [Hal70].

*Halstead analyzed the implementation effort required to raise the percentage of correctly decompiled instructions halfway to 100%, and found that it was approximately equal to the effort already spent [Hal70]. This was because decompilers of that time handled straightforward cases, but left the harder cases for the programmer to consider. In order to handle more cases, more time was required to code these special cases into the decompiler, and this time was proportionately greater than the time required to code simple cases.*

Autocoder to Cobol Conversion Aid Program, 1967. Housel reported on a set of commercial decompilers developed by IBM to translate Autocoder programs, which were oriented to business data processing, to Cobol. The translation was a one-to-one mapping, and therefore manual optimization was required. The final programs occupied 2.1 times the core storage of the original programs [Hou73].

*This decompiler is really a tool for translating one language into another. No attempt is made to analyze the program and reduce the number of instructions generated; in general, inefficient code was produced.*

C.R. Hollander, 1973. Hollander’s PhD dissertation [Hol73] describes a decompiler designed around a formal syntax-oriented metalanguage and consisting of 5 cooperating sequential processes (initializer, scanner, parser, constructor, and generator), each implemented as an interpreter of sets of metarules. The decompiler was a metasystem that defined its operations by implementing interpreters.

The initializer loads the program and converts it into an internal representation. The scanner interacts with the initializer when finding the fields of an instruction, and interacts with the parser when matching source code templates against instructions. The parser establishes the correspondence between syntactic phrases in the source language and their semantic equivalents in the target language. Finally, the constructor and generator generate code for the final program.

An experimental decompiler was implemented to translate a subset of IBM’s System/360 assembler into an Algol-like target language. This decompiler was written in Algol-W, using a compiler developed at Stanford University, and worked correctly on the 10 programs it was tested against.

This work presents a novel approach to decompilation by means of a formal syntax-oriented metalanguage, but its main drawback is precisely this methodology, which is equivalent to pattern-matching assembler instructions into high-level instructions. This limits the number of assembler instructions that can be decompiled, as instructions that belong to a pattern need to be in a particular order to be recognized; intermediate instructions, different control flow patterns, and optimized code are not allowed. For syntax-oriented decompilers to work, the set of all possible patterns would need to be enumerated for each high-level instruction of each different compiler. Another approach would be to write a decompiler for a specific compiler and make use of the specifications of that compiler; this approach is only possible if the compiler writer is willing to reveal the specifications of his compiler. It appears that Hollander’s decompiler worked because the specifications of the Algol-W compiler he was using were known, as this compiler was written at the university where he was doing this research. In this case, the set of assembler instructions generated for a particular Algol-W instruction was known.


The partial assembly phase separates data from instructions, builds a control flow graph, and generates an intermediate representation of the program. The analyzer analyzes the program in order to detect program loops and eliminate unnecessary intermediate instructions. Finally, the code generator optimizes the translation of arithmetic expressions, and generates code for the target language.

An experimental decompiler was written for Knuth’s MIX assembler (MIXAL), producing PL/1 code for the IBM 370 machines. Six programs were tested: 88% of the instructions were correct, and the remaining 12% required manual intervention [HH73].

This decompiler proved that, by using known compiler and graph methods, a decompiler could be written that produced good high-level code. The use of an intermediate representation made the analysis completely machine independent. The main objection to this methodology is the choice of source language, MIX assembler. This is not only because of the greater amount of information available in assembler programs, but also because MIX is a simplified, non-real-life assembler language.

The Piler System, 1974. Barbe’s Piler system attempts to be a general decompiler that translates a large class of source–target language pairs to help in the automatic translation of computer programs. The Piler system was composed of three phases: interpretation, analysis, and conversion. In this way, different interpreters could be written for different source machine languages, and different converters could be written for different target high-level languages, making it simple to write decompilers for different source–target language pairs. Other uses for this decompiler included documentation, debugging aid, and evaluation of the code generated by a compiler.

During interpretation, the source machine program was loaded into memory, parsed and converted into a 3-address microform representation, so that each machine instruction required one or more microform instructions. The analyzer determined the logical structure of the program by means of data flow analysis, and modified the microform representation into an intermediate representation. A flowchart of the program after this analysis was made available to users, who could even correct any errors in it on behalf of the decompiler. Finally, the converter generated code for the target high-level language [Bar74].

Although the Piler system attempted to be a general decompiler, only an interpreter for machine language of the GE/Honeywell 600 computer was written, and skeletal converters for Univac 1108’s Fortran and Cobol were developed. The main effort of this project concentrated on the analyzer.

The Piler system was a first attempt at a general decompiler for a large class of source and target languages. Its main problem was that it sought generality through a microform representation, which was even lower-level than an assembler-type representation.

F.L. Friedman, 1974. Friedman’s PhD dissertation describes a decompiler used for the transfer of mini-computer operating systems within the same architectural class [Fri74]. Four main phases are described: pre-processor, decompiler, code generator, and compiler.

The pre-processor converts assembler code into a standard form (descriptive assembler language). The decompiler takes the standard assembler form, analyzes it, and decompiles it into an internal representation, from which FRECL code is then generated by the code generator. Finally, a FRECL compiler compiles this program into machine code for another machine. FRECL is a high-level language for program transport and development; it was developed by Friedman, who also wrote a compiler for it. The decompiler used in this project was an adaptation of Housel’s decompiler [Hou73].

Two experiments were performed. The first involved the transport of a small but self-contained portion of the IBM 1130 Disk Monitor System to Microdata 1600/21; up to 33% manual intervention was required on the input assembler programs. Overall, the amount of effort required to prepare the code for input to the transport system was too great to be completed in a reasonable amount of time; therefore, a second experiment was conducted. The second experiment decompiled Microdata 1621 operating system programs into FRECL and compiled them back again into Microdata 1621 machine code. Some of the resultant programs were re-inserted into the operating system and tested. On average, only 2% of the input assembler instructions required manual intervention, but the final machine program had a 194% increase in the number of machine instructions.

*This dissertation is a first attempt at decompiling operating system code, and it illustrates the difficulties a decompiler faces when decompiling machine-dependent code. Input programs to this transport system require a large amount of effort to be presented in the format required by the system. The programs finally produced also appear to be inefficient, both in size and in the time needed to execute many more machine instructions.*

**Ultrasystems, 1974.** Hopwood reported on a decompilation project at Ultrasystems, Inc., in which he was a consultant for the design of the system [Hop78]. This decompiler was to be used as a documentation tool for the Trident submarine fire control software system. It took as input Trident assembler programs, and produced programs in the Trident High-Level Language (THLL) that was being developed at this company. Four main stages were distinguished: normalization, analysis, expression condensation, and code generation.

The input assembler programs were normalized so that data areas were distinguished with pseudo-instructions. An intermediate representation was generated, and the data analyzed. Arithmetic and logical expressions were built during a process of expression condensation, and finally, the output high-level language program was generated by matching control structures to those available in THLL.

*This project attempts to document assembler programs by converting them into a high-level language. However, given the time constraints of the project, the expression condensation phase was not coded. The output programs were therefore hard to read, as several instructions were required for a single expression.*

**V. Schneider and G. Winiger, 1974.** Schneider and Winiger presented a notation for specifying the compilation and decompilation of high-level languages. The paper defines a context-free grammar for the compilation process (i.e. one describing all possible 2-address object code produced from expressions and assignments), and shows how this grammar can be inverted to decompile the object code into the original source program [SW74]. Moreover, an ambiguous compilation grammar will produce optimal object code and will generate an unambiguous decompilation grammar. A case study showed that the object code produced by the Algol 60 constructs could not be decompiled deterministically. This work was intended as part of a future decompiler, but no further references to it were found in the literature.

*This work presents, in a different way, a syntax-oriented decompiler [Hol73]; that is, a decompiler that uses pattern matching of a series of object instructions to reconstruct the original source program. In this case, the compilation grammar needs to be known in order to invert the grammar and generate a decompilation grammar. Note that no optimization is possible if it is not defined as part of the compilation grammar.*

Decompilation of Polish code, 1977, 1981, 1988. Two papers in the area of decompilation of Polish code into Basic code are found in the literature. The problem arises in connection with highly interactive systems, where a fast response is required to every input from the user. The user’s program is kept in an intermediate form, and then “decompiled” each time a command is issued. An algorithm for the translation of reverse Polish notation to expressions is given in [BP79].

The second paper presents decompilation as a two-step problem: the conversion of machine code to Polish representation, and the conversion of Polish code to source form. The paper concentrates on the second step, yet claims to be decompiling Polish code to Basic code by means of a context-free grammar for Polish notation and a left-to-right or right-to-left parsing scheme [BP81].

This technique was recently used in a decompiler that converted reverse Polish code into spreadsheet expressions [May88]. In this case, the programmers of a product that included a spreadsheet-like component wanted to speed up the product. They did so by storing users’ expressions in a compiled form (reverse Polish notation) and decompiling these expressions whenever the user wanted to see or modify them. Parentheses were left as part of the reverse Polish notation so that the exact expression the user had input to the system could be reconstructed.
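
The core of these deparsers is the classic stack-based translation of reverse Polish notation into infix form. The C sketch below is a generic version of that technique, not the specific algorithm of [BP79]. It handles only the binary operators `+ - * /` and expects space-separated tokens. Unlike [May88], it parenthesizes every subexpression instead of relying on parentheses stored in the Polish code. The name `rpn_to_infix` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32
#define MAX_EXPR 256

/* Convert a space-separated reverse Polish expression with binary operators
   + - * / into a fully parenthesized infix expression. Returns 0 on success
   and -1 on malformed input or overflow. */
int rpn_to_infix(const char *rpn, char *out, size_t out_size) {
    char stack[MAX_DEPTH][MAX_EXPR];
    int top = 0;
    char buf[MAX_EXPR];
    char *token;

    if (strlen(rpn) >= sizeof buf) return -1;
    strcpy(buf, rpn);                        /* strtok modifies its input */
    for (token = strtok(buf, " "); token != NULL; token = strtok(NULL, " ")) {
        int is_op = strlen(token) == 1 && strchr("+-*/", token[0]) != NULL;
        if (!is_op) {                        /* operand: push it */
            if (top == MAX_DEPTH) return -1;
            snprintf(stack[top++], MAX_EXPR, "%s", token);
        } else {                             /* operator: pop two, push one */
            char combined[MAX_EXPR];
            int n;
            if (top < 2) return -1;
            n = snprintf(combined, sizeof combined, "(%s %c %s)",
                         stack[top - 2], token[0], stack[top - 1]);
            if (n < 0 || n >= (int)sizeof combined) return -1;
            top -= 2;
            strcpy(stack[top++], combined);
        }
    }
    if (top != 1 || strlen(stack[0]) >= out_size) return -1;
    strcpy(out, stack[0]);
    return 0;
}
```

For example, `a b c * +` becomes `(a + (b * c))`. The sketch never inspects machine code, which is exactly the gap pointed out in the next paragraph.
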

The use of the word decompilation in this sense is a misuse of the term. All that these papers present is a method for reconstructing, or deparsing, the original expression (written in Basic or as a spreadsheet expression) from an intermediate Polish representation of a program. In the case of the Polish to Basic translators, no explanation is given as to how to arrive at such an intermediate representation from a machine program.

G.L. Hopwood, 1978. Hopwood’s PhD dissertation [Hop78] describes a 7-step decompiler designed for the purposes of transferability and documentation. It is stated that the decompilation process can be aided by manual intervention or other external information.

The input program to the decompiler is formatted by a preprocessor, then loaded into memory, and a control flow graph of the program is built. Each node of this graph represents one instruction. After the graph is constructed, control patterns are recognized, and instructions that generate a goto statement are eliminated by either node splitting or the introduction of synthetic variables. The source program is then translated into an intermediate machine-independent code. Variable usage is analyzed on this representation in order to find expressions and to eliminate unnecessary variables by a method of forward substitution. Finally, code is generated for each intermediate instruction, functions are implemented to represent operations not supported by the target language, and comments are provided. Manual intervention was required to prepare the input data, provide additional information that the decompiler needed during the translation process, and make modifications to the target program.

An experimental decompiler was written for the Varian Data machines 620/i. It decompiled assembler into MOL620, a machine-oriented language developed at the University of California at Irvine by M.D. Hopwood and the author. The decompiler was tested with a large debugger program, Isadora, which was written in assembler. The decompiled program was manually modified so that it could be recompiled into machine code, as it contained calls to interrupt service routines, self-modifying code, and extra registers used for subroutine calls. The final program was better documented than the original assembler program.

The main drawbacks of this research are the granularity of the control flow graph and the use of registers in the final target program. In the former case, Hopwood chose to build control flow graphs with one node per instruction. This makes the graph quite large for large programs, and nothing is gained compared with using basic blocks as nodes (i.e., nodes whose size depends on the number of changes in the flow of control). In the latter case, the MOL620 language allows the use of machine registers, and sample code in Hopwood’s dissertation shows that registers were used in expressions and as arguments to subroutine calls. Registers are not a concept available in high-level languages, and they should not be used when the aim is to generate high-level code.
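
The standard alternative to one node per instruction is to partition the code into basic blocks by finding their leaders. A leader is the first instruction, any branch target, or any instruction that follows a transfer of control. The C sketch below shows this textbook method. The `Instr` summary and `find_leaders` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical instruction summary: only the control-flow facts matter. */
typedef struct {
    int is_transfer;   /* 1 for jumps, conditional branches and returns */
    int target;        /* index of the branch target, or -1 if none */
} Instr;

/* Mark the leaders of basic blocks: the first instruction, every branch
   target, and every instruction that follows a transfer of control.
   Returns the number of basic blocks (the number of leaders). */
size_t find_leaders(const Instr *code, size_t n, int *is_leader) {
    size_t blocks = 0;
    for (size_t i = 0; i < n; i++) is_leader[i] = 0;
    if (n > 0) is_leader[0] = 1;
    for (size_t i = 0; i < n; i++) {
        if (!code[i].is_transfer) continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            is_leader[code[i].target] = 1;
        if (i + 1 < n) is_leader[i + 1] = 1;
    }
    for (size_t i = 0; i < n; i++) blocks += (size_t)is_leader[i];
    return blocks;
}
```

Consider a six-instruction counting loop: an initialization, a comparison, a conditional exit, an increment, a backward jump and a return. The sketch finds four leaders for it, so the graph has four nodes instead of six.
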

D.A. Workman, 1978. This work describes the use of decompilation in the design of a high-level language suitable for real-time training device systems, in particular the F4 trainer aircraft [Wor78]. The operating system of the F4 was written in assembler, which was therefore the input language to this decompiler. The output language was not determined, as the aim of the project was to design one; thus code generation was not implemented.

Two phases of the decompiler were implemented. The first phase mapped the assembler to an intermediate language and gathered statistics about the source program. The second phase generated a control flow graph of basic blocks, classified the instructions according to their probable type, and analyzed the flow of control in order to determine high-level control structures. The results indicated the need for a high-level language that handled bit strings, supported looping and conditional control structures, and did not require dynamic data structures or recursion.

This work presents a novel use of decompilation techniques, although the input language was not machine code but assembler. A simple data analysis was done by classifying instructions, but no attempt was made to analyze them completely, as there was no need to generate high-level code. The analysis of the control flow is complete and considers 8 different categories of loops and 2-way conditional statements.

Zebra, 1981. The Zebra prototype was developed at the Naval Underwater Systems Centre in an attempt to achieve portability of assembler programs. Zebra took as input a subset of the ULTRA/32 assembler, called AN/UYK-7, and produced assembler for the PDP11/70. The project was described by D.L. Brinkley in [Bri81].

The Zebra decompiler was composed of 3 passes. The first, a lexical and flow analysis pass, parsed the program and performed control flow analysis on the graph of basic blocks. The second pass translated the program into an intermediate form. The third pass simplified the intermediate representation by eliminating extraneous loads and stores, in much the same way as described by Housel [Hou73, HH73].

It was concluded that it was hard to capture the semantics of the program and that decompilation was economically impractical, but that it could aid in the transportation process.

This project made use of known technology to develop a decompiler of assembler programs. No new concepts were introduced by this research. It did, however, raise the point that decompilation should be used as a tool to aid in solving a problem, not as a tool that will solve the whole problem, since a 100% correct decompiler cannot be built.

Decompilation of DML programs, 1982. A decompiler of database code was designed to convert a subset of Codasyl DML programs, written with procedural operations, into a relational system with a nonprocedural query specification. An Access Path Model is introduced to interpret the semantic accesses performed by the program. In order to determine how FIND operations implement semantic accesses, a global data flow reaching analysis is performed on the control flow graph, and operations are matched to templates. The final graph structures are remapped into a relational structure. This method depends on the logical order of the objects and a standard ordering of the DML statements [KW82].

Another decompiler of database code is described in [DS82]. It was proposed to decompile well-coded application programs into a proposed semantic representation. This work was prompted by changes in the use requirements of a Database Management System (DBMS) whose application programs were written in Cobol-DML. A decompiler of Cobol-DML programs was written to analyze and convert application programs into a model- and schema-independent representation. This representation was later modified or restructured to account for database changes. Language templates were used to match against key instructions of the Cobol-DML programs.

In the context of databases, decompilation is viewed as the process of grouping a sequence of statements which represent a query into another (nonprocedural) specification. Data flow analysis is required, but all other stages of a decompiler are not implemented for this type of application.

Forth Decompiler, 1982, 1984. A recursive Forth decompiler is a tool that scans through a compiled dictionary entry and decompiles words into primitives and addresses [Dud82]. Such a decompiler is considered one of the most useful tools in the Forth toolbox [HM84]. The decompiler implements a recursive descent parser so that decompiled words can be decompiled in a recursive fashion.

These works present a deparsing tool rather than a decompiler. The tool recursively scans through a dictionary table and returns the primitives or addresses associated with a given word.

**Software Transport System, 1985.** C.W. Yoo describes an automatic Software Transport System (STS) that moves assembler code from one machine to another. The process involves the decompilation of an assembler program for machine $m_1$ into a high-level language, and the compilation of this program on a machine $m_2$ into assembler. An experimental decompiler was developed on the Intel 8080 architecture; it took assembler programs as input and produced PL/M programs. The recompiled PL/M programs were up to 23% more efficient than their assembler counterparts. An experimental STS was also built to develop a C cross-compiler for the Z-80 processor. The project encountered problems due to the lack of data types in the STS [Yoo85].

The STS took as input an assembler program for machine $m_1$ and an assembler grammar for machine $m_2$, and produced an assembler program for machine $m_2$. The input grammar was parsed to produce tables, which the abstract syntax tree parser used to parse the input assembler program and generate an abstract syntax tree (AST) of the program. This AST was the input to the decompiler, which then performed control and data flow analyses, in much the same way as described by Hollander [Hol73], Friedman [Fri74], and Barbe [Bar74], and finally generated high-level code. The high-level program was then compiled for machine $m_2$.

This work does not present any new research into the decompilation area, but it does present a novel approach to the transportation of assembler programs by means of a grammar describing the assembler instructions of the target architecture.

**Decomp, 1988.** J. Reuter wrote decomp, a decompiler for Vax BSD 4.2 that took as input object files with symbolic information and produced C-like programs. The purpose of this decompiler was to port the Empire game to the VMS environment, given that its source code was not available. The decompiler is freely available on the Internet [Reu88].

Decomp made use of the symbol table to find the entry points to functions, to determine the data used in the program, and to find the names of that data. Subroutines were decompiled one at a time, in the following way: a control flow graph of basic blocks was built and optimised by removing arcs that lead to intermediate unconditional branches. Control flow analysis was performed on the graph to find high-level control constructs, converting the control flow graph into a tree of generic constructs. The algorithm used by this analysis was taken from the struct program, a program that structures graphs produced by Fortran programs, which was based on the structuring algorithm described by B. Baker in [Bak77]. Finally, the generic constructs in the tree were converted to C-specific constructs, and code was generated. The final output programs required manual modifications to place the arguments on the procedure’s argument list, and to determine whether a subroutine returned a value (i.e. was a function). This decompiler was written in about 5 man-months [Reu91].

Sample programs were written in C and compiled on a Vax BSD 4.2 machine, thanks to Pete French [Fre91], who provided me with an account on such a machine. The resulting C programs are not compilable; they require some hand editing. The programs have the correct control structures, due to the structuring algorithm implemented, and the right data types for variables, due to the symbol table embedded in the object code. The names of library routines and procedures, and the user’s program entry point, are also known from the symbol table; therefore, no extraneous procedures (e.g. compiler start-up code, library routines) are decompiled. A data flow analysis stage is vital, though, as neither expressions, actual arguments, nor function return values are determined. An interprocedural data flow analysis would eliminate much of the hand editing required to recompile the output programs.

**exe2c, 1990.** The Austin Code Works sponsored the development of the exe2c decompiler, targeted at the PC-compatible family of computers running the DOS operating system.

The project was announced in April 1990 [Gut90] and tested by about 20 people, and it was decided that it needed more work to decompile into C. A year later, the project reached a beta operational level [Gut91a], but it was never finished [Gut91b]. I was a beta tester of this release.

exe2c is a multipass decompiler that consists of 3 programs: e2a, a2aparse, and e2c. e2a is the disassembler; it converts executable files to assembler and also produces a commented assembler listing. a2aparse is the assembler-to-C front-end processor, which analyzes the assembler file produced by e2a and generates .cod and .glb files. Finally, the e2c program translates the files prepared by a2aparse and generates pseudo-C. An integrated environment, envmnu, is also provided.

Programs decompiled by exe2c make use of a header file that defines registers, types and macros. The output C programs are hard to understand because they rely on registers and condition codes (represented by Boolean variables). Normally, one machine instruction is decompiled into one or more C instructions that perform the required operation on registers and set up condition codes if the instruction requires it. Expressions and arguments to subroutines are not determined, and a local stack is used in the final C programs. It is evident from this output code that data flow analysis was not implemented in exe2c. The decompiler does implement a control flow analysis stage: looping and conditional constructs are available, and the choice of control constructs is generally adequate. Case tables, though, are not detected correctly. The number and type of procedures decompiled show that all library routines, compiler start-up code and runtime support routines found in the program are decompiled. These routines are usually low-level, as they are normally written in assembler, and they are hard to decompile because, in most cases, they have no high-level counterpart (unless they are low-level, C-like code).

This decompiler is a first effort in many years to decompile executable files. The results show that a data flow analysis and heuristics are required to produce better C code. Also, a mechanism to skip all extraneous code introduced by the compiler and to detect library subroutines would be beneficial.

**PLM-80 Decompiler, 1991.** The Information Technology Division of the Australian Department of Defence conducted research into decompilation for defence applications, such as maintenance of obsolete code, production of scientific and technical intelligence, and assessment of systems for hazards to safety or security. This work was described by S.T. Hood in [Hoo91].

The report describes techniques for constructing decompilers with definite-clause grammars, an extension of context-free grammars, in a Prolog environment. A Prolog database is used to store the initial assembler code and the recognised syntactic structures of the grammar. A prototype decompiler for Intel 8085 assembler programs compiled by a PLM-80 compiler was written in Prolog. The decompiler produced target programs in Small-C, a subset of the C language. The definite-clause grammar given in the report was capable of recognizing if...then structures and while() loops, as well as static (global) and automatic (local) variables of simple types (i.e. characters, integers, and longs). A graphical user interface was written to display the assembler and pseudo-C programs, and to enable the user to assign variable names and comments. This interface also asked the user for the entry point to the main program, and allowed the user to select the control constructs to be recognized.

The analysis performed by this decompiler is limited to the recognition of control structures and simple data types. No analysis of the use of registers is done or mentioned. Automatic variables are represented by an indexed variable that represents the stack. The graphical interface helps the user document the decompiled program by means of comments and meaningful variable names. This analysis does not support optimized code.

**Decompiler compiler, 1991–1994.** A decompiler compiler is a tool that takes as input a compiler specification and the corresponding portions of object code, and returns the code for a decompiler; i.e. it is an automatic way of generating decompilers, much in the same way that yacc is used to generate compilers [BBL91, BB91, BB94].

Two approaches to generating such a decompiler compiler are described: a logic programming approach and a functional programming approach. The former makes use of the bidirectionality of logic programming languages such as Prolog, and runs the specification of the compiler backwards to obtain a decompiler [BBL91, BB91, BBL93]. In theory this is correct, but in practice this approach is limited by the implementation of the Prolog interpreter, and therefore problems of strictness and reversibility are encountered [BB92, BB93]. The latter approach is based on the logic approach but makes use of lazy functional programming languages like Haskell to generate a more efficient decompiler [BBL91, BB91, BBL93]. Even if a non-lazy functional language is used, laziness can be simulated in the form of objects rather than lists.

The decompiler produced by a decompiler compiler takes object code as input and returns a list of source programs that can be compiled to the given object code. To achieve this, an enumeration of all possible source programs would be required, given a description of an arbitrary inherited attribute grammar. It was proved that such an enumeration is equivalent to the Halting Problem [BB92, BB93], and is therefore non-computable. Furthermore, there is no computable method that takes an attribute grammar description and decides whether or not the compiled code will give a terminating enumeration for a given value of the attribute [BB92, BB93], so it is not straightforward to determine which grammars can be used. Therefore, the class of grammars acceptable to this method needs to be restricted to those that produce a complete enumeration, such as non-left-recursive grammars.

This method was first implemented for a subset of an Occam-like language, using a functional programming language. The decompiler grammar was an inherited attribute grammar that took the intended object code as an argument [BB92, BB93]. A Prolog decompiler based on the compiler specification was also described. This decompiler applied the clauses of the compiler in a selective and ordered way, so that the problem of non-termination was avoided and only a subset of the source programs was returned (rather than an infinite list) [Bow91, Bow93]. More recently, the method has been implemented in an imperative programming language, C++, due to the inefficiencies of the functional and logic approaches. In this prototype, C++ objects were used as lazy lists, and a set of library functions was written to implement the operators of the intermediate representation used [BB94]. Problems with optimized code have been detected.
As illustrated by this research, decompiler compilers can be constructed automatically if the set of compiler specifications and object code produced for each clause of the specification is known. In general, this is not the case as compiler writers do not disclose their compiler specifications. Only customized compilers and decompilers can be built by this method. It is also noted that optimizations produced by the optimization stage of a compiler are not handled by this method, and that real executable programs cannot be decompiled by the decompilers generated by the method described. The problem of separating instructions from data is not addressed, nor is the problem of determining the data types of variables used in the executable program. In conclusion, decompiler compilers can be generated automatically if the object code produced by a compiler is known, but the generated decompilers cannot decompile arbitrary executable programs.

**8086 C Decompiling System, 1991–1993.** This decompiler takes executable files from a DOS environment as input and produces C programs. The input files need to be compiled with Microsoft C version 5.0 in the small memory model [FZL93]. Five phases were described: recognition of library functions, symbolic execution, recognition of data types, program transformation, and C code generation. The recognition of library functions and the intermediate language were further described in [FZ91, HZY91].

The recognition of library functions for Microsoft C was done to eliminate subroutines that were part of a library, and therefore to produce C code for the user routines only. A table of C library functions is built into the decompiling system. For each library function, the table stored its name, its characteristic code (a sequence of instructions that distinguishes this function from any other function), the number of instructions in the characteristic code, and the method used to recognize the function. This table was built manually by the decompiler writer. The symbolic execution phase translated machine instructions into intermediate instructions, and represented each instruction in terms of its symbolic contents. The recognition of data types is done by a set of rules for collecting information on different data types, together with analysis rules that determine the data type in use. The program transformation phase transforms storage calculations into address expressions, e.g. array addressing. Finally, the C code generator transforms the program structure by finding control structures, and generates C code.

This decompiling system makes use of library function recognition to generate more readable C programs. The method of library recognition is hand-crafted, and therefore inefficient if other versions of the compiler, other memory models, or other compilers were used to compile the original programs. The recognition of data types is a first attempt to recognize types of arrays, pointers and structures, but not much detail is given in the paper. No description is given as to how an address expression is reached in the intermediate code, and no examples are given to show the quality of the final C programs.

**Alpha AXP Migration Tools, 1993.** When Digital Equipment Corporation designed the Alpha AXP architecture, the AXP team became involved in a project to run existing VAX and MIPS code on the new Alpha AXP computers. They opted for a binary translator that would convert a sequence of instructions of the old architecture into a sequence of instructions of the new architecture. The process needed to be fully automatic and to cater for code created or modified during execution. Two parts to the migration process were defined: a binary translation, and a runtime environment [SCK+93].

The binary translation phase took binary programs and translated them into AXP opcodes. It made use of decompilation techniques to understand the underlying meaning of the machine instructions. Condition code usage analysis was performed, as condition codes do not exist on the Alpha architecture. The code was also analyzed to determine function return values and to find bugs (e.g. uninitialized variables). MIPS has standard library routines that are embedded in the binary program; in this case, a pattern matching algorithm was used to detect library routines, and such routines were not analysed but replaced by their name. Idioms were also found and replaced by an optimal instruction sequence. Finally, code was generated in the form of AXP opcodes. The new binary file contained both the new code and the old code.

The runtime environment executes the translated code and acts as a bridge between the new and old operating systems (e.g. different calling standards, exception handling). It had a built-in interpreter of old code, to run old code that was not discovered, or did not exist, at translation time. This was possible because the old code was also saved as part of the new binary file.

Two binary translators were written: VEST, to translate from the OpenVMS VAX system to the OpenVMS AXP system, and mx, to translate ULTRIX MIPS images to DEC OSF/1 AXP images. The runtime environments for these translators were TIE and mxr respectively.

This project illustrates the use of decompilation techniques in a modern translation system. It proved successful for a large class of binary programs. Some of the programs that could not be translated were programs that were technically infeasible to translate, such as programs that use privileged opcodes, or run with superuser privileges.

**Source/PROM Comparator, 1993.** A tool to demonstrate the equivalence of source code and PROM contents was developed at Nuclear Electric plc, UK, to verify the correct translation of PL/M-86 programs into PROM programs executed by safety-critical computer-controlled systems [PW93].

Three stages are identified: the reconstitution of object code files from the PROM files; the disassembly of object code to an assembler-like form, with help from a name table built up from the source code; and the decompilation of assembler programs and their comparison with the original source code. In the decompiling stage, it was found necessary to eliminate intermediate jumps, registers and stack operations, identify procedure arguments, resolve indexes of structures, arrays and pointers, and convert the expressions to a normal form. To compare the original program and the decompiled program, an intermediate language was used: the source program was translated into this language with a commercial product, and the output of the decompilation stage was written in the same language. The project proved to be a practical way of verifying the correctness of translated code, and of demonstrating that the tools used to create the programs (compiler, linker, optimizer) behave reliably for the particular safety system analyzed.

This project describes a use of decompilation techniques to help demonstrate the equivalence of high-level and low-level code in a safety-critical system. The decompilation stage performs much of the analysis, with help from a symbol table constructed from the original source program. The task is simplified by knowledge of the compiler used to compile the high-level programs.

In recent years, vendor-specific commercial decompilers have been produced. These decompilers are targeted at the decompilation of binary files produced by database languages, such as Clipper and FoxPro. Their manufacturers give no information on the techniques used to decompile these programs. The following list mentions some of these commercial decompilers:

**Valkyrie, 1993.** Visual decompiler for Clipper Summer ’87, manufactured by CodeWorks [Val93].

**OutFox, 1993.** Decompiler for encrypted FoxBASE+ programs [Out93].

**ReFox, 1993.** Decompiles encrypted FoxPro files, manufactured by Xitech Inc [HHB+93].

**DOC, 1993.** COBOL decompiler for AS/400 and System/38. Converts object programs into COBOL source programs which can be modified by the programmer. Manufactured by Harman Resources [Cob93].

**Uniclip, 1993.** Decompiler for Clipper Summer ’87 EXE files, manufactured by Stro Ware [Unc93].

**Clipback, 1993.** Decompiler for Summer ’87 executables, manufactured by Intelligent Information Systems [Unc93].

**Brillig, 1993.** Decompiler for Clipper 5.X .exe and .obj files, manufactured by APTware [Bri93].
Chapter 3

Run-time Environment

Before considering decompilation, the relations between the static binary code of the program and the actions performed at run-time to implement the program are presented. The representation of objects in a binary program differs between compilers. Elementary data types such as integers, characters, and reals are often represented by an equivalent data object in the machine (i.e. a fixed number of bytes), whereas aggregate objects such as arrays, strings, and structures are represented in various ways.

Throughout this thesis, the word subroutine is used as a generic word to denote a procedure or a function; the latter two words are used only when there is certainty as to what the subroutine really is, that is, a subroutine that returns a value is a function, and a subroutine that does not return a value is a procedure.

3.1 Storage Organization

A high-level language program is composed of one or more subroutines, called the user subroutines. The corresponding binary program is composed of the user subroutines, library routines that were invoked by the user program, and other subroutines linked in by the linker to provide support for the compiler at run-time. The general format of the binary code of a program is shown in Figure 3-1. The program starts by invoking compiler start-up subroutines that set up the environment for the compiler; this is followed by the user’s main program subroutine, which invokes library routines linked in by the linker; and is finalized by a series of compiler subroutines that restore the state of the machine before program termination.

![Figure 3-1: General Format of a Binary Program](image-url)
For example, a “hello world” C program compiled with Borland Turbo C v2.01 has over 25 different subroutines. The start-up code invokes up to 16 different subroutines to set up the compiler’s environment. The user’s main program is composed of one procedure. This procedure invokes the printf() procedure which then invokes up to 8 different subroutines to display the formatted string. Finally, the exit code invokes 3 subroutines to restore the environment and exit back to DOS. Sample skeleton code for this program is shown in Figure 3-2.

```assembly
helloc proc far
    mov dx, DGROUP ; dx == DGROUP segment addr
    mov cs:DGROUP@@, dx
    ; save several vectors and install default divide by zero handler
    call SaveVectors
    ; calculate environment size, determine amount of memory needed,
    ; check size of the stack, return to DOS memory allocated in excess,
    ; set far heap and program stack, reset uninitialized data area,
    ; install floating point emulator
    push cs
    call ds:[__emu1st]
    ; prepare main arguments
    call _setargv@
    call _setenvp@
    ; initialize window sizes
    call ds:[__crt1st]
    ; invoke main(argc, argv, envp)
    push word ptr environ@
    push word ptr _argv@
    push word ptr _argc@
    call main@ ; user’s main() program
    ; exit(main's return value in ax): flush and close streams and files
    push ax
    call exit@
helloc endp
```

Figure 3-2: Skeleton Code for a “hello world” Program

In a binary program, subroutines are identified by their entry address. No names are associated with subroutines, and whether a subroutine is a procedure or a function remains unknown until a data flow analysis has been performed on the registers defined and used by these subroutines. A subroutine that invokes another subroutine is called the caller, and the invoked subroutine is called the callee.

3.1.1 The Stack Frame

Each subroutine is associated with a stack frame during run-time. The stack frame is the set of parameters, local variables, and return address of the caller subroutine, as shown in Figure 3-3. The parameters in the stack frame represent the actual parameters of a particular invocation of the subroutine; information on the formal parameters of the subroutine is not stored anywhere else in the binary file. The stack mark holds the return address of the caller (so that control can be transferred to the caller once the callee is finished) and the caller’s frame pointer (register `bp` in the Intel architecture), which is a reference point for offsets into the stack frame. The local variables represent the space allocated by the subroutine once control has been transferred to it; this space is available to the subroutine only while it is active (i.e. not terminated).

![Figure 3-3: The Stack Frame](image)

Once the frame pointer (register `bp`) has been set, positive offsets from the frame pointer access the stack mark and the parameters, and negative offsets access local variables. In diagrams of the stack frame, the stack grows downwards from high to low memory, as in the Intel architecture.

The stack frame may also contain other fields, as shown in Figure 3-4. These fields are not used by all languages or all compilers [ASU86a]. In some languages, the callee uses the return value field to return the value of a function to the caller; these values are more often returned in registers for efficiency. The control link points to the stack frame of the caller, and the access link points to the stack frame of an enclosing subroutine that holds non-local data accessible from this subroutine.

3.2 Data Types

Data objects are normally stored in contiguous memory locations. Elementary data types such as characters, integers, and longs can be held in registers while an operation is performed on them. Aggregate data types such as arrays, strings, and records cannot be held in registers in their entirety, because their size normally exceeds the size of a register. It is therefore easier to access them through a pointer to their starting address.
The sizes of different data types for the i80286 architecture are shown in Figure 3-5. This machine has a word size of 16 bits. Sizes are given in 8-bit bytes.

<table>
<thead>
<tr>
<th>Data Type</th>
<th>Size (bytes)</th>
</tr>
</thead>
<tbody>
<tr>
<td>character</td>
<td>1</td>
</tr>
<tr>
<td>integer</td>
<td>2</td>
</tr>
<tr>
<td>long</td>
<td>4</td>
</tr>
<tr>
<td>real</td>
<td>4</td>
</tr>
<tr>
<td>long real</td>
<td>8</td>
</tr>
<tr>
<td>near pointer</td>
<td>2</td>
</tr>
<tr>
<td>far pointer</td>
<td>4</td>
</tr>
<tr>
<td>other types</td>
<td>≥ 1</td>
</tr>
</tbody>
</table>

Figure 3-5: Size of Different Data Types in the i80286

### 3.2.1 Data Handling in High-level Languages

Aggregate data types are handled in several different ways by different compilers. This section describes different formats used by C, Pascal, Fortran, and Basic compilers, according to [Mic87].

**Array**

An array is a contiguous piece of memory that holds one or more items of a certain type. Arrays are implemented in memory as a series of rows or columns, depending on the order used by the language:

- Row-major order: the elements of a multidimensional array are stored by row order; that is, one row after the other. This order is used by C and Pascal compilers.
- Column-major order: the elements of a multidimensional array are stored in column order rather than row order. This order is used by Fortran and Basic compilers. Some Basic compilers have a compile option to use row-major order.

In most languages, the size of an array is known at compile time; this is the case for C, Pascal and Fortran. Basic allows array sizes to be declared at run-time, so such an array needs an array descriptor that holds the size of the array and a pointer to the physical location in memory where the array is stored.

**String**

A string is a sequence of characters. Different languages use different representations for a string, such as the following:

- C format: a string is an array of bytes terminated by a null character (i.e. 0).

- Fortran format: a string is a series of bytes at a fixed memory location, hence no delimiter is used or needed at the end of the string.

- Pascal format: common Pascal compilers have 2 types of strings: STRING and LSTRING. The former is a fixed-length string and is implemented in the Fortran format. The latter is a variable-length string and is implemented as an array of characters that holds the length of the string in the first byte of the array. Standard Pascal does not have a STRING or LSTRING type.

- Basic format: a string is implemented as a 4-byte string-descriptor; the first 2 bytes hold the length of the string, and the next 2 bytes are an offset into the default data area which holds the string. This area is assigned by Basic’s string-space management routines, and therefore is not a fixed location in memory.

**Record**

A record is a contiguous piece of memory that holds related items of one or more data types. Different names are used for records in different languages: `struct` in C, `record` in Pascal, and user-defined type in Basic. By default, C and Pascal store structures in unpacked storage, word-aligned, except for byte-sized objects and arrays of byte-sized objects. Basic and some C and Pascal compilers store structures in packed storage.

**Complex Numbers**

The Fortran COMPLEX data type stores floating point numbers in the following way:

- COMPLEX*8: 4 bytes represent the real part, and the other 4 bytes the imaginary part.

- COMPLEX*16: 8 bytes represent the real part, and the other 8 bytes the imaginary part.

**Boolean**

The Fortran LOGICAL data type stores Boolean information in the following way:

- LOGICAL*2: 1 byte holds the Boolean value (0 or 1), and the other byte is left unused.
- LOGICAL*4: 1 byte holds the Boolean value, and the other 3 bytes are left unused.

3.3 High-Level Language Interface

Compilers of high-level languages use a series of conventions to allow mixed-language programming, so that a program can have some subroutines written in one language, and other subroutines written in a different language, and all these subroutines are linked in together in the same program. The series of conventions relate to the way the stack frame is set up, and the calling conventions used to invoke subroutines.

3.3.1 The Stack Frame

The stack mark contains the caller’s return address and frame pointer. The size of the return address depends on whether the callee is invoked using a near or a far call. Near calls are within the same segment and can therefore be referenced by an offset from the current segment base address. Far calls are to a different segment, so both the segment and the offset of the return address are stored. For a 2-byte machine word architecture, a near call stores 2 bytes for the offset of the return point in the caller, whereas a far call stores 4 bytes for its segment and offset. Register $bp$ is used as the frame pointer; the contents of the caller’s frame pointer are pushed onto the stack at subroutine entry so that they can be restored at subroutine termination.

Entering a Subroutine

Register $bp$ is established as the frame pointer by pushing its current value onto the stack (i.e. storing the frame pointer of the caller on the stack), and copying the current stack pointer register ($sp$) to $bp$. The following code is used in the i80286 architecture:

```assembly
push bp ; save old copy of bp
mov bp, sp ; bp == frame pointer
```

Allocating Local Data

A subroutine may reserve space on the stack for local variables. This is done by decrementing the stack pointer register $sp$ by an even number of bytes. For example, to allocate space for 2 integer variables, 4 bytes are reserved on the stack:

```assembly
sub sp, 4 ; reserve 4 bytes: two 2-byte integer locals
```

Preserving Register Values

The most widely used calling convention for DOS compilers demands that a subroutine should always preserve the values of registers si, di, ss, ds, and bp. If any of these registers is used in the callee subroutine, their values are pushed onto the stack, and restored before subroutine return. For example, if si and di are used by a subroutine, the following code is found after local data allocation:

```assembly
push si ; save caller's si
push di ; save caller's di
```

Accessing Parameters

Parameters are located at positive offsets from the frame pointer register, bp. The offset of a parameter n from bp is the size of the stack mark plus the total size of the parameters stored between the stack mark and parameter n.

Returning a Value

Functions returning a value in registers use different registers according to the size of the returned value. Data values of 1 byte are returned in the al register, 2 bytes are returned in the ax register, and 4 bytes are returned in the dx:ax registers, as shown in Figure 3-6.

<table>
<thead>
<tr>
<th>Data Size (bytes)</th>
<th>Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>AL</td>
</tr>
<tr>
<td>2</td>
<td>AX</td>
</tr>
<tr>
<td>4</td>
<td>DX = high word, AX = low word</td>
</tr>
</tbody>
</table>

Figure 3-6: Register Conventions for Return Values
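A short C sketch of how a 4-byte value is split across dx:ax, and how a decompiler can reassemble it once it recognizes a long return value. Registers are modelled as plain 16-bit integers; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 16-bit return registers. */
typedef struct {
    uint16_t ax;
    uint16_t dx;
} RetRegs;

/* Place a 4-byte (long) return value in dx:ax: dx receives the high
   word and ax the low word. */
RetRegs returnLong(uint32_t value)
{
    RetRegs r;
    r.dx = (uint16_t)(value >> 16);
    r.ax = (uint16_t)(value & 0xFFFFu);
    return r;
}

/* Reassemble the 4-byte value from dx:ax, as the caller (or a decompiler
   recovering a long return value) would. */
uint32_t readLong(RetRegs r)
{
    return ((uint32_t)r.dx << 16) | r.ax;
}
```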

Larger data values are returned using the following conventions:

- Function called by C: the callee must allocate space from the heap for the return value and place its address in dx:ax.
- Function called by Pascal, Fortran or Basic: the caller reserves space in the stack segment for the return value, and pushes the offset address of the allocated space on the stack as the last parameter. Therefore, the offset address of the return value is at bp + 6 for far calls, and bp + 4 for near calls, as shown in Figure 3-7.

Exiting the Subroutine

The stack frame is restored by popping any registers that were saved at subroutine entry, deallocating any space reserved for local variables, restoring the old frame pointer (bp), and returning according to the convention in use.

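- C convention: the caller adjusts the stack for any parameters pushed on the stack. A ret instruction is all that is needed to end the subroutine.
- Pascal convention: the callee adjusts the stack for its own parameters before returning, by means of a ret n instruction, where n is the number of bytes occupied by the parameters.

3.3.2 Parameter Passing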

Three different methods are used to pass parameters on the Intel architecture under the DOS operating system: the C, Pascal, and register calling conventions. Mixtures of these calling conventions are available in other operating systems and architectures. For example, in OS/2, the standard call uses C ordering to pass parameters, but in system calls the callee removes the parameters from the stack.

C Calling Convention

The caller is responsible for pushing the parameters on the stack and for removing them after the callee returns. The parameters are pushed in right to left order, so that a variable number of parameters can be passed to the callee. For example, given the C function prototype void procX (int, char, long) and a caller procedure that invokes procX():

```c
procN()
{ int i; /* bp - 8 */
  char c; /* bp - 6 */
  long l; /* bp - 4 */
  procX (i, c, l);
}
```
the following assembler code is produced:

```assembly
push word ptr [bp-2] ; high word of l
push word ptr [bp-4] ; low word of l
push [bp-6] ; c
push word ptr [bp-8] ; i
call procX ; call function
add sp, 8 ; restore the stack
```

Note that due to word alignment, the character c is stored as 2 bytes on the stack even though its size is one byte only.

### Pascal Calling Convention

The caller is responsible for pushing the arguments on the stack, and the callee adjusts the stack before returning. Arguments are pushed on the stack in left to right order, hence only a fixed number of arguments can be passed in this convention. For the previous example, the call `procX (i, c, l)` produces the following assembler code in Pascal convention:

```assembly
push word ptr [bp-8] ; i
push [bp-6] ; c
push word ptr [bp-2] ; high word of l
push word ptr [bp-4] ; low word of l
call procX ; call procX (procX restores stack)
```
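With Pascal ordering the leftmost argument is pushed first, so the last argument lies closest to the stack mark, and the callee must remove all argument bytes before returning, typically with a `ret n` instruction. The C sketch below computes both quantities, assuming a near call (4-byte stack mark) and word-sized pushes; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NEAR_STACK_MARK 4   /* saved bp + 2-byte return offset */

/* Hypothetical helper: bp offset of parameter n (0-based, leftmost first)
   under Pascal ordering. Parameters to the right of n were pushed later,
   so they lie between bp and parameter n. */
int pascalParamOffset(const int sizes[], size_t nParams, size_t n)
{
    int offset = NEAR_STACK_MARK;
    assert(n < nParams);
    for (size_t k = n + 1; k < nParams; k++)
        offset += sizes[k];
    return offset;
}

/* Number of bytes the callee removes on return (the n in ret n). */
int pascalCleanupBytes(const int sizes[], size_t nParams)
{
    int total = 0;
    for (size_t k = 0; k < nParams; k++)
        total += sizes[k];
    return total;
}
```

For procX (i, c, l), with on-stack sizes 2, 2 and 4, this places l at bp + 4, c at bp + 8 and i at bp + 10, and the callee returns with ret 8.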

### Register Calling Convention

This convention does not push arguments on the stack but passes them in registers, therefore the generated code is faster. Predetermined registers are used to pass arguments between subroutines, and different registers are used for different argument sizes. Figure 3-8 shows the set of registers used by Borland Turbo C++ [Bor92]; a maximum of 3 parameters can be passed in registers. Far pointers, unions, structures, and real numbers are pushed on the stack.

<table>
<thead>
<tr>
<th>Parameter Type</th>
<th>Register</th>
</tr>
</thead>
<tbody>
<tr>
<td>Character</td>
<td>al, dl, bl</td>
</tr>
<tr>
<td>Integer</td>
<td>ax, dx, bx</td>
</tr>
<tr>
<td>Long</td>
<td>dx:ax</td>
</tr>
<tr>
<td>Near pointer</td>
<td>ax, dx, bx</td>
</tr>
</tbody>
</table>

![Figure 3-8: Register Parameter Passing Convention](image)
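A decompiler that recognizes the register convention needs to know which register carries each argument. The following C sketch assigns locations from the Figure 3-8 table under simplifying assumptions of ours: the three register slots (a, d, b) are consumed left to right, a long needs both the a and d slots (as dx:ax), and any other type, or an argument for which no slot is left, is passed on the stack. The exact allocation rules of Turbo C++ may differ, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_CHAR, T_INT, T_LONG, T_NEAR_PTR, T_OTHER } ParamType;
typedef enum { R_STACK, R_AL, R_DL, R_BL, R_AX, R_DX, R_BX, R_DXAX } ParamLoc;

/* Register slots a, d, b in allocation order, for byte and word arguments. */
static const ParamLoc byteRegs[3] = { R_AL, R_DL, R_BL };
static const ParamLoc wordRegs[3] = { R_AX, R_DX, R_BX };

/* Hypothetical allocator: stores the location of each parameter in out[].
   T_OTHER stands for far pointers, unions, structures and reals, which
   always go on the stack. Only 3 slots exist, so at most 3 parameters
   end up in registers. */
void assignRegisters(const ParamType types[], size_t n, ParamLoc out[])
{
    int used[3] = { 0, 0, 0 };               /* slots a, d, b in use? */

    assert(n == 0 || (types != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        out[i] = R_STACK;
        if (types[i] == T_LONG) {
            if (!used[0] && !used[1]) {      /* needs dx:ax together */
                used[0] = used[1] = 1;
                out[i] = R_DXAX;
            }
        } else if (types[i] != T_OTHER) {
            for (int s = 0; s < 3; s++) {
                if (!used[s]) {
                    used[s] = 1;
                    out[i] = (types[i] == T_CHAR) ? byteRegs[s] : wordRegs[s];
                    break;
                }
            }
        }
    }
}
```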

### 3.4 Symbol Table

A decompiler uses a symbol table to store information on variables used throughout the program. In a binary program, variables are identified by their address; there are no names
associated with variables. Variables that have a physical memory address are global variables; their segment and offset are used to access them. Variables that are located at a negative offset from the frame pointer are local variables to the corresponding stack frame’s subroutine, and variables at positive offsets are actual arguments to the subroutine. Since register variables are used by compilers for efficiency purposes, all registers are also considered variables initially; further analysis on registers determines whether they represent register variables or not (see Chapter 5, Section 5.2.9). Variables are assigned unique names during code generation, as explained in Chapter 7.

The symbol table must be able to provide information on an entry efficiently and handle a varying number of variables; hence, a symbol table that grows dynamically when necessary is desirable. The performance of the symbol table is measured in terms of the time taken to access an entry and to insert a new item into the table.

### 3.4.1 Data Structures

Symbol tables are represented by a variety of data structures. Some are more efficient than others, at the expense of more coding. To illustrate the differences between the various data structures, let us assume the following data items are to be placed in the symbol table:

- `cs:01F8` ; global variable
- `bp + 4` ; parameter
- `bp - 6` ; local variable
- `ax` ; register ax
- `bp - 2` ; local variable

**Unordered List**

An unordered list is a linked list or an array of data items. Items are stored in the list on a first-come basis (i.e. in the next available position). An array implementation is limited in size; this limitation is avoided by a linked-list implementation. Accessing an item in this symbol table takes `O(n)` time for a list of `n` items. Figure 3-9 shows the list built for our example.

![Figure 3-9: Unordered List Representation](image)
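A minimal C sketch of the unordered linked-list representation, using the example items above. Each entry records the variable kind and a numeric key (the offset of a global, the bp offset of a local or parameter, or a register number); this key encoding and the names are our assumptions. Both lookup and duplicate-free insertion walk the whole list, which is where the `O(n)` access cost comes from.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variable kinds of a binary program. */
typedef enum { GLOBAL, PARAM, LOCAL, REG } VarKind;

typedef struct Sym {
    VarKind kind;
    long    key;            /* offset, bp offset, or register number */
    struct Sym *next;
} Sym;

/* Linear search: O(n) in the number of entries. */
Sym *lookup(Sym *head, VarKind kind, long key)
{
    for (Sym *s = head; s != NULL; s = s->next)
        if (s->kind == kind && s->key == key)
            return s;
    return NULL;
}

/* Append in first-come order unless the item is already present.
   Returns the entry, or NULL if allocation fails. */
Sym *insert(Sym **head, VarKind kind, long key)
{
    Sym **p = head;
    for (; *p != NULL; p = &(*p)->next)
        if ((*p)->kind == kind && (*p)->key == key)
            return *p;
    Sym *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->kind = kind;
    s->key = key;
    s->next = NULL;
    *p = s;
    return s;
}
```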

**Ordered List**

An ordered list is easier to access, since not all items of the list need to be checked to determine whether an item is already in the list or not. Ordered lists can be searched using a binary search, which provides an access time of `O(log n)`. Insertion of an item is costly, since the list must remain ordered.

Since there are different types of variables in a binary program, and items of each type are identified in a different way, an ordering within a type is possible, but the four different types must be accessed independently. Figure 3-10 shows the ordered list representation of our example: a record that determines the type of the data item is used first, and each of the data types has an ordered list associated with it.

![Figure 3-10: Ordered List Representation](image)
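The per-type ordered list can be sketched as a sorted, growable array of bp offsets searched with binary search. This is an illustrative sketch; the structure and function names are hypothetical, and only the offset is stored since bp is implicit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ordered list of bp offsets for one variable type
   (e.g. locals), kept sorted so that lookups use binary search. */
typedef struct {
    int   *offs;
    size_t n, cap;
} OffsetList;

/* Binary search: index of off if present, otherwise the position where
   it would be inserted. O(log n). */
static size_t position(const OffsetList *l, int off)
{
    size_t lo = 0, hi = l->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (l->offs[mid] < off)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

int hasOffset(const OffsetList *l, int off)
{
    size_t i = position(l, off);
    return i < l->n && l->offs[i] == off;
}

/* Insertion is O(n): later entries are shifted to keep the list sorted.
   Returns 0 on success, -1 on allocation failure. */
int insertOffset(OffsetList *l, int off)
{
    size_t i = position(l, off);
    if (i < l->n && l->offs[i] == off)
        return 0;                               /* already present */
    if (l->n == l->cap) {
        size_t cap = l->cap ? 2 * l->cap : 4;
        int *p = realloc(l->offs, cap * sizeof *p);
        if (p == NULL)
            return -1;
        l->offs = p;
        l->cap = cap;
    }
    memmove(&l->offs[i + 1], &l->offs[i], (l->n - i) * sizeof *l->offs);
    l->offs[i] = off;
    l->n++;
    return 0;
}
```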

**Hash Table**

A hash table is a mapping between a fixed number of positions in a table and a possibly large number of variables. The mapping is done via a hashing function which is defined for all possible variables, can be computed quickly, provides a uniform probability for all variables, and randomizes similar variables to different table locations.

In open hashing, a hash table is represented by an array of a fixed size, and a linked-list attached to each array position (bucket). The linked-list holds different variables that hash to the same bucket. Figure 3-11 shows the hash table built for our example; as for ordered lists, a record which determines the type of the variable is used first, and a hash table is associated with each different variable type.

![Figure 3-11: Hash Table Representation](image)
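An open-hashing table for global variables, keyed by segment and offset, could look as follows. The table size and the hash function (Knuth-style multiplicative hashing of the 32-bit segment:offset key, keeping the top 6 bits) are our assumptions, not part of the original design; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_BITS 6
#define NBUCKETS  (1u << HASH_BITS)       /* 64 buckets (assumed size) */

typedef struct GlobalSym {
    uint16_t seg, off;                    /* address of the global variable */
    struct GlobalSym *next;               /* next entry in the same bucket  */
} GlobalSym;

typedef struct {
    GlobalSym *bucket[NBUCKETS];
} GlobalTable;

/* Multiplicative hashing of the 32-bit seg:off key; the top HASH_BITS
   bits of the 32-bit product select the bucket. */
static unsigned hashAddr(uint16_t seg, uint16_t off)
{
    uint32_t key = ((uint32_t)seg << 16) | off;
    return (unsigned)((uint32_t)(key * 2654435761u) >> (32 - HASH_BITS));
}

GlobalSym *findGlobal(const GlobalTable *t, uint16_t seg, uint16_t off)
{
    for (GlobalSym *s = t->bucket[hashAddr(seg, off)]; s != NULL; s = s->next)
        if (s->seg == seg && s->off == off)
            return s;
    return NULL;
}

/* Add a global unless present; colliding entries share a bucket chain. */
GlobalSym *addGlobal(GlobalTable *t, uint16_t seg, uint16_t off)
{
    GlobalSym *s = findGlobal(t, seg, off);
    if (s != NULL)
        return s;
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    unsigned h = hashAddr(seg, off);
    s->seg = seg;
    s->off = off;
    s->next = t->bucket[h];               /* prepend to the chain */
    t->bucket[h] = s;
    return s;
}
```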
Symbol Table Representation for Decompilation

A combination of the abovementioned methods is used for the purposes of decompilation. The symbol table is defined in terms of the different types of variables: global, local, parameter, and register. Each of these types is implemented in a different way. For global variables, since their address range is large, a hash table implementation is most suited. For local variables and parameters, since these variables are offsets from the frame pointer and are always allocated in an ordered way (i.e. there are no “gaps” in the stack frame), they are implemented by an ordered list on the offset; the register bp does not need to be stored since it is always the same. Finally, for registers, since there is a fixed number of registers, an array indexed by register number can be implemented; array positions that have an associated item represent registers that are defined in the symbol table. This representation is shown in Figure 3-12.

![Figure 3-12: Symbol Table Representation](image)
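Two pieces of this combined representation are simple enough to show directly: classifying a bp-relative variable by the sign of its offset, and the register array. The register numbering below follows the i80286 16-bit register encoding order (ax, cx, dx, bx, sp, bp, si, di); the type and function names are hypothetical.

```c
#include <assert.h>

/* i80286 word registers in encoding order; NUM_REGS sizes the array. */
typedef enum { AX, CX, DX, BX, SP, BP, SI, DI, NUM_REGS } Reg;

typedef enum { KIND_GLOBAL, KIND_LOCAL, KIND_PARAM, KIND_REG } VarKind;

/* Negative bp offsets are locals, positive offsets are parameters.
   Offsets inside the stack mark are assumed never to be passed in. */
VarKind classifyFrameOffset(int bpOffset)
{
    return bpOffset < 0 ? KIND_LOCAL : KIND_PARAM;
}

/* Register part of the symbol table: an array indexed by register
   number; a non-zero entry means the register is in the table. */
typedef struct {
    unsigned char defined[NUM_REGS];
} RegTable;

void defineReg(RegTable *t, Reg r)
{
    t->defined[r] = 1;
}

int isRegDefined(const RegTable *t, Reg r)
{
    return t->defined[r];
}
```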

Symbol tables are widely discussed in the literature; refer to [ASU86a, FJ88a, Gou88] for more information on symbol tables from a compiler point of view.

Chapter 4

The Front-end

The front-end is a machine dependent module which takes as input the binary source program, parses the program, and produces as output the control flow graph and intermediate language representation of the program. The phases of the front-end are shown in Figure 4-1.

Unlike compilers, there is no lexical analysis phase in the front-end of the decompiler. The lexical analyzer or scanner is the phase that groups characters from the input stream into tokens. Specialized tools such as lex and scanGen have been designed to help automate the construction of scanners for compilers [FJ88a]. Given the simplicity of machine language, there is no need to scan the input bytes to recognize different words that belong to the language; all information is stored in terms of bytes or bits from a byte, and it is not possible to determine what a particular byte represents (i.e. opcode, register, offset) out of context. The syntax analyzer determines what a series of bytes represents based on the language description of the machine language.

4.1 Syntax Analysis

The syntax analyzer is the first phase of the decompiler. Its role is to group a sequence of bytes into a phrase or sentence of the language. This sequence of bytes is checked for syntactic structure, that is, whether the string belongs to the language. Valid strings are represented by a parse tree, which is input to the next phase, the semantic analyzer. The relation between the syntax analyzer and the semantic analyzer is shown in Figure 4-2. The syntax analyzer is also known as the parser.

![Figure 4-2: Interaction between the Parser and Semantic Analyzer](image)

The syntax of a machine language can be precisely specified by a grammar. In machine languages, only instructions or statements are specified; there are no control structures as in high-level languages. In general, a grammar provides a precise notation for specifying any language.

The main difficulty of a decompiler parser is the separation of code from data, that is, the determination of which bytes in memory represent code and which ones represent data. This problem is inherent to the von Neumann architecture, and thus, needs to be addressed by means of heuristic methods.

**Syntax Errors**

Syntax errors are seldom found in binary programs, as compilers generate correct code for a compiled program to run on a machine. However, upgrades of a machine architecture result in new machines that support all predecessor machines, so the instruction set of the new architecture is an extension of the instruction set of the old one. This is the case with the i80486, which supports the instruction sets of all its predecessors: the i8086, i80186, i80286, and i80386. Therefore, if a parser is written for the i8086, none of the newer machine instructions are recognized, and each of them results in an error. On the other hand, if the parser is written for the newest machine, the i80486, all instructions should be recognized and no syntax errors are likely to be encountered.

**4.1.1 Finite State Automaton**

A Finite State Automaton (FSA) is a recognizer for a language. It takes as input a string, and answers *yes* if the string belongs to the language and *no* otherwise. A string is a sequence of symbols of a given alphabet. Given an arbitrary string, an FSA can determine whether the string belongs to the language or not.

**Definition 1** *A Finite State Automaton is a mathematical model that consists of:*

- *A finite set of states S*
- *An initial state s₀*
- A set of final or accept states $F$
- An alphabet of input symbols $\Sigma$
- A transition function $T$: state $\times$ symbol $\rightarrow$ state

An FSA can be graphically represented by transition diagrams. The components of these diagrams are shown in Figure 4-3. The symbols from the alphabet label the transitions. Error transitions are not explicitly represented in the diagram, and it is assumed that any non-valid symbol out of a state labels a transition to an error state.

![Figure 4-3: Components of a FSA Transition Diagram](image)

A **wildcard language** is a meta-language used to specify wildcard conditions in a language [Gou88]. Two meta-symbols are used: ‘*’ and ‘%’. The meta-symbol ‘*’ represents any sequence of zero or more symbols from the alphabet $\Sigma$, and ‘%’ represents any single symbol from $\Sigma$.

**Example 1** Let $\Sigma = \{a, b, c\}$. The language that accepts all strings starting with an $a$ is described by the wildcard expression `a*`, and is represented by an FSA in the following way:

![Diagram](image)
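The FSA for Example 1 can be written as a transition table in C, with the error state that the transition diagram leaves implicit made explicit. The state names S0, S1 and SERR and the function names are ours.

```c
#include <assert.h>
#include <stdbool.h>

/* States of the DFSA for the wildcard expression a* over {a, b, c}. */
enum { S0, S1, SERR, NSTATES };

/* Column of the transition table for each alphabet symbol, or -1 for a
   character outside the alphabet. */
static int symbolIndex(char ch)
{
    switch (ch) {
    case 'a': return 0;
    case 'b': return 1;
    case 'c': return 2;
    default:  return -1;
    }
}

/* Transition function T: state x symbol -> state. */
static const int trans[NSTATES][3] = {
    /*  a     b     c                        */
    { S1,   SERR, SERR },   /* S0: initial state        */
    { S1,   S1,   S1   },   /* S1: final (accept) state */
    { SERR, SERR, SERR },   /* SERR: error state        */
};

/* Answer yes (true) if the string belongs to the language. */
bool accepts(const char *input)
{
    int state = S0;
    for (const char *p = input; *p != '\0'; p++) {
        int sym = symbolIndex(*p);
        if (sym < 0)
            return false;               /* symbol not in the alphabet */
        state = trans[state][sym];
    }
    return state == S1;                 /* F = { S1 } */
}
```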

**Non-deterministic Finite State Automaton**

An FSA is said to be non-deterministic (NFSA) whenever there are two or more transitions out of a state labelled with the same symbol, or when the empty string ($\varepsilon$) labels a transition. In these cases, the next state is not uniquely identified by a (state, symbol) tuple.

**Deterministic Finite State Automaton**

A deterministic finite state automaton (DFSA) is an FSA that has no transitions labelled by the $\varepsilon$ string, and whose next state is uniquely determined by each (state, symbol) tuple.

Any NFSA can be converted into an equivalent DFSA by a method of *subset construction*. This method has been explained in the literature, for example, refer to [ASU86b, Gou88, FJ88a] for details. A method to construct a minimum-state DFSA is also described in [ASU86b].
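A compact sketch of subset construction in C, where each DFSA state is a set of NFSA states represented as a bitmask. To keep it short, it assumes an NFSA without ε-transitions (a full implementation would also take ε-closures) and uses a small example NFSA of our own that accepts strings over {a, b} ending in ab. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFA_STATES 3
#define NSYMS      2                  /* symbols: 0 = 'a', 1 = 'b' */
#define MAX_DFA    (1 << NFA_STATES)  /* at most 2^3 subsets        */

/* Example NFSA: 0 -a-> {0,1}, 0 -b-> {0}, 1 -b-> {2}; 2 is final.
   Each entry is the bitmask of successor states. */
static const uint8_t nfa[NFA_STATES][NSYMS] = {
    { 0x3, 0x1 },   /* state 0 */
    { 0x0, 0x4 },   /* state 1 */
    { 0x0, 0x0 },   /* state 2 */
};
static const uint8_t nfaFinal = 0x4;  /* {2} */

typedef struct {
    uint8_t set[MAX_DFA];             /* NFSA states of each DFSA state */
    int     next[MAX_DFA][NSYMS];     /* DFSA transition function       */
    int     count;                    /* number of DFSA states built    */
} Dfa;

/* Set of NFSA states reachable from any state in set on symbol sym. */
static uint8_t moveSet(uint8_t set, int sym)
{
    uint8_t result = 0;
    for (int s = 0; s < NFA_STATES; s++)
        if (set & (1u << s))
            result |= nfa[s][sym];
    return result;
}

/* Index of the DFSA state for set, creating it if it is new. */
static int findOrAdd(Dfa *d, uint8_t set)
{
    for (int i = 0; i < d->count; i++)
        if (d->set[i] == set)
            return i;
    assert(d->count < MAX_DFA);
    d->set[d->count] = set;
    return d->count++;
}

/* Start from {0}; process DFSA states in creation order until no new
   state sets appear. */
void subsetConstruction(Dfa *d)
{
    d->count = 0;
    findOrAdd(d, 0x1);
    for (int i = 0; i < d->count; i++)
        for (int sym = 0; sym < NSYMS; sym++)
            d->next[i][sym] = findOrAdd(d, moveSet(d->set[i], sym));
}

/* Input is assumed to contain only 'a' and 'b'. A DFSA state is final
   if its set contains a final NFSA state. */
bool dfaAccepts(const Dfa *d, const char *input)
{
    int state = 0;
    for (; *input != '\0'; input++)
        state = d->next[state][*input == 'b'];
    return (d->set[state] & nfaFinal) != 0;
}
```

For this NFSA the construction yields three DFSA states, corresponding to the NFSA state sets {0}, {0,1} and {0,2}.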
4.1.2 Finite State Automatons and Parsers

Any machine language can be represented by an FSA that accepts or rejects arbitrary strings. The alphabet $\Sigma$ is the finite set of hexadecimal numbers 00..FF (i.e., numbers represented by a byte), and a string is a sequence of bytes. The strings that belong to this language are those instructions recognized by the particular machine language.

**Example 2** An FSA to recognize the i80286 machine instruction

`83E950 ; sub cx, 50h`

needs to first determine that 83 is an opcode that takes two or more bytes as operands. The second byte selects the operation and encodes the destination operand: its middle three bits select the operation (5 denotes sub), its lower three bits encode the destination register, and its upper two bits determine how many bytes of other information follow this byte. Whenever the upper two bits are 3, the destination is a register and no more bytes follow; if they are 1, one displacement byte follows; if they are 2, two displacement bytes follow; and if they are 0, no displacement follows, except when the lower three bits are 6, in which case a 2-byte address follows. In our example, the second byte is E9: the upper two bits are 3, the middle three bits are 5 (sub), and the lower three bits are 1, which is register cx, so there are no more bytes as part of the destination operand. Finally, the last byte is the immediate constant operand, 50h in this example. The FSA for this example is shown in Figure 4-4.

![Figure 4-4: FSA example](image)
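The decoding decisions of Example 2 can be sketched in C for the register-destination form of opcode 83 only. This is an illustrative decoder, not a complete i80286 disassembler: memory operands, which need the displacement rules above, are rejected, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Operations selected by the middle three bits of the second byte for
   the group opcodes 80h-83h. */
static const char *const grp1[8] = {
    "add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"
};
/* 16-bit registers in encoding order. */
static const char *const reg16[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

typedef struct {
    const char *op;      /* mnemonic                        */
    const char *dest;    /* destination register            */
    int8_t      imm;     /* sign-extended immediate byte    */
    int         length;  /* instruction length in bytes     */
} Decoded;

/* Hypothetical decoder for opcode 83h with a register destination only;
   returns 0 on success, -1 for anything else. */
int decode83(const uint8_t *code, Decoded *out)
{
    if (code[0] != 0x83)
        return -1;
    uint8_t second = code[1];
    int mod = second >> 6;            /* upper two bits                   */
    int sel = (second >> 3) & 7;      /* middle three bits: operation     */
    int rm  = second & 7;             /* lower three bits: register       */
    if (mod != 3)
        return -1;                    /* memory operand: not handled here */
    out->op = grp1[sel];
    out->dest = reg16[rm];
    out->imm = (int8_t)code[2];
    out->length = 3;
    return 0;
}
```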

A machine language can also be described by a context-free grammar (CFG), since regular languages are a subset of context-free languages. An algorithm to mechanically convert an NFSA into a CFG is presented in [ASU86a]. CFGs are used to specify high-level constructs that have an inherent recursive structure; since machine languages do not make use of recursive constructs, it is not necessary to define them in terms of CFGs.

4.1.3 Separation of Code and Data

Given the entry point to a program, it is the function of the syntax analyzer to parse machine instructions following all possible paths of the program. The main problem faced by the parser is that data and code are represented in the same way in a von Neumann machine, so it is not easy to determine whether the byte(s) following an instruction belong to another instruction or represent data. Heuristic methods are therefore needed to distinguish code from data, as explained in this section.

Once the source binary program has been loaded into memory, the loader returns the initial start address of the program in memory. This address is the starting address for the complete binary program, and thus must be the address of an instruction in order for the program to run. Furthermore, if the binary program has been checked against compiler signatures, the initial starting address is the entry point to the `main` of the program, i.e. the start address of the user-written program, skipping all compiler start-up code. Figure 4-5 illustrates sample code for a “hello world” program. The entry point returned by the loader is CS:0000, which is the entry point to the complete program (including compiler start-up code). The entry point given by the compiler signature analyzer is CS:01FA, which is the start address of the main program. Throughout this thesis we will assume, without loss of generality, that the entry point is the one given by the compiler signature analyzer. The explained methods are applicable to both cases, but more interesting examples are found in the latter case. The technique for generating compiler signatures and detecting them is given in Chapter 8.

```
hello proc far
CS:0000 start: mov dx,*
CS:0003 mov cs:*,dx
... ... ; start-up code
CS:011A call _main
CS:011D ... ; exit code
    hello endp
... ...
_main proc near
CS:01FA push bp
CS:01FB mov bp,sp
CS:01FD mov ax,194h
CS:0200 push ax
CS:0201 call _printf
CS:0204 pop cx
CS:0205 pop bp
CS:0206 ret
    _main endp
```

Figure 4-5: Sample Code for a “hello world” Program

A paper by R.N. Horspool and N. Marovac focused on the problem of separating code from data. It pointed out that this problem is equivalent to the halting problem, as it is impossible to separate data from instructions in a von Neumann architecture that computes both data addresses and branch destination addresses at execution time [HM79]. The paper instead gave an algorithm to find the maximum set of locations holding instructions. This modified problem is equivalent to the combinatorial problem of searching for a maximal set of trees out of all the candidate trees, to which a branch-and-bound method is applied; the problem is shown to be NP-complete.

As it turns out, the given algorithm does not work for dense machine instruction sets (such as that of the Intel architecture): almost any byte combination is a valid machine instruction, so it is hard to know when data has been reached and therefore hard to determine the bounds of the code. A simple counter-example to this algorithm is given by a case table stored in the code segment (see Figure 4-6). After the indexed jump instruction at CS:0DDB, which indexes into the case table, the table itself is defined starting at CS:0DE0, yet it is treated as code by the algorithm because it consists of valid instruction bytes. In this i80286 code example, 0E is equivalent to `push CS`, 2B is equivalent to `sub` which takes one other byte as argument, 0E in this case, to result in `sub ax,[bp]`, and so on. The produced code is therefore wrong.

```
CS:0DDB jmp CS:0DE0[bx]
CS:0DE0 0E2B ; push CS
CS:0DE2 0E13 ; sub ax,[bp]
...  
```

Figure 4-6: Counter-example

This section presents a different method to separate code from data in a binary program that has been loaded into memory. It also provides heuristic methods to detect special cases of data found in between sections of code.

**The Process**

As previously mentioned, the process of separating code from data is based on the knowledge that the initial entry point to the program is an instruction. From this instruction onwards, instructions are parsed sequentially along this path until a change in the flow of control or an end of path is reached. In the former case, the target address(es) act as new entry points into memory, as each such address must hold a valid instruction in order for the program to continue execution. In the latter case, the end of the current path is reached and no more instructions are scanned along this path, as we cannot determine whether the next bytes are code or data.

Changes in the flow of control are due to jumps and procedure calls. A *conditional jump* branches the flow of control in two: the target branch address is followed whenever the condition is true, otherwise the address following the conditional branch is followed. Both paths are followed by the parser in order to get all possibly executable code. An *unconditional jump* transfers the flow of control to the target address; this unique path is followed by the parser. A *procedure call* transfers control to the invoked procedure, and once it returns, the instructions following the procedure call are parsed. In the case that the procedure does not return, the bytes following the procedure call are not parsed as it is not certain what these bytes are (code or data).

An end of path is reached whenever a procedure return instruction or an end of program is met. The end of program is normally specified by a series of instructions that make the operating system terminate the current process (i.e. the program). This sequence of instructions varies between operating systems, so they need to be coded for the specific source machine. Determining whether the end of program is met (i.e. the program finishes
or halts) is not equivalent to solving the halting problem though, as the path that is being followed is not necessarily a path that the executable program will follow, i.e. the condition that branches onto this path might never become true during program execution; for example, programs in an infinite loop.

**Example 3** On the Intel architecture, the end of a program is specified via interrupt instructions. There are different methods to terminate a program, some of these methods make use of the program segment prefix, commonly referred to as the PSP; refer to Appendix B for more information on the PSP. There are 7 different ways of terminating a program under DOS:

1. **Terminate process with return code:** `int 21h, function 4Ch`. This is the most commonly used method in `.exe` files.

2. **Terminate process:** `int 20h`. The code segment, `cs`, needs to be pointing to the PSP. This method is normally used in `.com` files as `cs` already points to the PSP segment.

3. **Warm boot/Terminate vector:** offset 00h in the PSP contains an `int 20h` instruction. Register `cs` must be pointing to the PSP segment.

4. **Return instruction:** the return address is placed on the stack before the program starts. When the program is to be finished, it returns to this address on the stack. This method was used in the CP/M operating system as the address of the warm boot vector was on the stack. Initial DOS `.com` programs made use of this technique.

5. **Terminate process function:** `int 21h, function 00h`. Register `cs` must point to the PSP.

6. **Terminate and stay resident:** `int 27h`. Register `cs` must point to the PSP.

7. **Terminate and stay resident function:** `int 21h, function 31h`.
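A sketch of how the interrupt-based termination methods above might be recognized from emulated register contents. It assumes that the decompiler tracks ah and cs and knows the PSP segment; methods 3 and 4 are not interrupt instructions and are left out. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Emulated register contents relevant to program termination. */
typedef struct {
    uint8_t  ah;
    uint16_t cs;
    uint16_t psp;     /* segment of the program segment prefix */
} TermState;

/* Hypothetical check: does "int intNum" end the program in this state?
   Covers methods 1, 2, 5, 6 and 7 of Example 3. */
bool isEndOfProgram(uint8_t intNum, const TermState *st)
{
    bool csIsPsp = (st->cs == st->psp);
    switch (intNum) {
    case 0x20:                          /* method 2: terminate process  */
    case 0x27:                          /* method 6: terminate and stay */
        return csIsPsp;                 /* resident; cs must be the PSP */
    case 0x21:
        switch (st->ah) {
        case 0x4C: return true;         /* method 1: with return code   */
        case 0x31: return true;         /* method 7: TSR function       */
        case 0x00: return csIsPsp;      /* method 5: needs cs == PSP    */
        default:   return false;
        }
    default:
        return false;
    }
}
```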

Determining whether a procedure returns (i.e. finishes or halts) or not is difficult, as the procedure could make use of self-modifying code or execute data as code and terminate in an instruction within this data. In general, we are interested in a solution for normal cases, as aberrant cases require a step debugger tool and user input to solve the problem. A procedure does not return if it reaches the end of program or invokes a procedure that reaches the end of program (e.g. a procedure that invokes `exit()` in C). Determining whether a procedure has reached the end of program is possible by emulating the contents of the registers that are involved in the sequence of instructions that terminate the program. In the case of Example 3, this means keeping track of registers `ah` and `cs` in most cases.

This initial algorithm for separation of code from data is shown in Figure 4-7. In order to keep track of registers, a `machState` record of register values is used. A `state` variable of this type holds the current values of the registers (i.e. the current state of the machine). A bitmap of 2 bits per memory byte is used to store information regarding each byte that was loaded into memory:

- **0:** represents an unknown value (i.e. the memory location has not been analyzed).
- **1:** represents a code byte.
- **2:** represents a data byte.
- **3:** represents a byte that is used as both data and code.
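The 2-bit-per-byte bitmap can be sketched in C as follows, packing four memory bytes into each bitmap byte. Because the code and data flags are OR-ed together, a byte recorded as both ends up with value 3. The function names are hypothetical (they take an address, unlike setBitmap in Figure 4-7, which takes an instruction).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two bits per memory byte: 0 unknown, 1 code, 2 data, 3 both. */
enum { UNKNOWN = 0, CODE = 1, DATA = 2, CODE_AND_DATA = 3 };

/* Each bitmap byte describes four memory bytes. */
#define BITMAP_BYTES(memSize) (((memSize) + 3) / 4)

/* Return the 2-bit value recorded for memory address addr. */
int byteKind(const uint8_t *bitmap, size_t addr)
{
    return (bitmap[addr / 4] >> (2 * (addr % 4))) & 3;
}

/* Record a use of addr as code or data; flags accumulate by OR. */
void markByte(uint8_t *bitmap, size_t addr, int flag)
{
    assert(flag == CODE || flag == DATA);
    bitmap[addr / 4] |= (uint8_t)(flag << (2 * (addr % 4)));
}
```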

The algorithm is implemented recursively. Each time a non fall-through path needs to be followed, a copy of the current state is made, and the path is followed by a recursive call to the parse procedure with the copy of the state.
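The recursive structure can be tried out on a toy instruction set of our own (not the i80286): one-byte opcodes, with the branch or call target stored in the following byte. The sketch mirrors the parse procedure of Figure 4-7 in simplified form; it records one code flag per byte instead of the 2-bit bitmap, emulates no registers, and assumes every called procedure returns. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy opcodes; JCC, JMP and CALL take a 1-byte target. */
enum { NOP = 0x00, JCC = 0x01, JMP = 0x02, CALL = 0x03, RET = 0x04, HALT = 0x05 };

#define MEM_SIZE 256

typedef struct {
    const uint8_t *mem;          /* MEM_SIZE bytes of loaded program */
    bool isCode[MEM_SIZE];       /* one flag per byte                */
} Program;

static int instLength(uint8_t op)
{
    return (op == JCC || op == JMP || op == CALL) ? 2 : 1;
}

/* Follow every path from ip, marking each instruction's bytes as code. */
void parse(Program *p, uint8_t ip)
{
    for (;;) {
        if (p->isCode[ip])
            return;                          /* already parsed */
        uint8_t op = p->mem[ip];
        int len = instLength(op);
        for (int k = 0; k < len; k++)
            p->isCode[(uint8_t)(ip + k)] = true;
        uint8_t next = (uint8_t)(ip + len);
        switch (op) {
        case JCC:
            parse(p, next);                  /* fall-through path */
            ip = p->mem[(uint8_t)(ip + 1)];  /* then the target   */
            break;
        case JMP:
            ip = p->mem[(uint8_t)(ip + 1)];
            break;
        case CALL:
            parse(p, p->mem[(uint8_t)(ip + 1)]);  /* callee           */
            ip = next;                            /* assumed to return */
            break;
        case RET:
        case HALT:
            return;                          /* end of path */
        default:
            ip = next;                       /* NOP and other opcodes */
            break;
        }
    }
}
```

Bytes that no path reaches, such as a data byte placed after an unconditional jump, are left unmarked.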

**Indirect Addressing Mode**

The indirect addressing mode makes use of the contents of a register or memory location to determine the target address of an instruction that uses this addressing mode. Indirect addressing mode can be used with the unconditional jump (e.g. to implement indexed case tables) and the procedure call instructions. The main problem with this addressing mode is that the contents of memory can change during program execution; a static analysis of the program therefore cannot guarantee the right value, as it cannot determine whether the memory location has been modified. The same applies to register contents, unless the contents of registers are emulated; even then, if the register is used within a loop, its emulated contents will most likely be wrong (unless loops are emulated as well).

In the i80286, an indirect instruction can be intra-segment or inter-segment. In the former case, the contents of the register or memory location holds a 16-bit offset address, in the latter case, a 32-bit address (i.e. segment and offset) is given.

Indirect procedure calls are used in high-level languages like C to implement calls through pointers to functions. Consider the following C program:

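```c
typedef char (*tfunc)(void);

/* The functions must be declared before the array that points to them. */
char func1(void) { /* some code here */ return 0; }
char func2(void) { /* some code here */ return 0; }

tfunc func[2] = {func1, func2};   /* array of pointers to functions */

int main(void)
{
    func[0]();   /* indirect call through func[0] */
    func[1]();   /* indirect call through func[1] */
    return 0;
}
```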

In the main program, functions func1() and func2() are invoked by means of a function pointer and an index into the array of such functions. The disassembled code of this program looks like this:

```
CS:0094 B604 ; address of proc1 (04B6)
CS:0098 C704 ; address of proc2 (04C7)
...
proc_1 PROC FAR
CS:04B6 55 push bp
...
CS:04C6 CB retf
```
```
procedure parse (machState *state)
    machState stateCopy; /* local copy of the state for recursive calls */
    done = FALSE;
    while (! done)
        getNextInst (state, &inst);
        if (alreadyParsed (inst)) /* check if instruction already parsed */
            done = TRUE;
            break;
        end if
        setBitmap (CODE, inst);
        case (inst.opcode) of
            conditional jump:
                stateCopy = *state;
                parse (&stateCopy); /* fall-through */
                state->ip = targetAdr (inst); /* target branch address */
                if (hasBeenParsed (state->ip)) /* check if code already parsed */
                    done = TRUE;
                end if
            unconditional jump:
                state->ip = targetAdr (inst); /* target branch address */
                if (hasBeenParsed (state->ip)) /* check if code already parsed */
                    done = TRUE;
                end if
            procedure call:
                /* Process non-library procedures only */
                if (! isLibrary (targetAdr (inst)))
                    stateCopy = *state;
                    stateCopy.ip = targetAdr (inst);
                    parse (&stateCopy); /* process target procedure */
                end if
            procedure return:
                done = TRUE; /* end of procedure */
            move:
                if (destination operand is a register)
                    updateState (state, inst.sourceOp, inst.destOp);
                end if
            interrupt:
                if (end of program via interrupt)
                    done = TRUE; /* end of program */
                end if
        end case
    end while
end procedure
```

Figure 4-7: Initial Parser Algorithm
In the disassembled code, each function pointer has been replaced by the memory offset (0094 and 0098) of the location that holds the address of the corresponding procedure (04B6 and 04C7 respectively). If these locations have not been modified during program execution, checking their contents provides us with the target address of the function(s). This is the implementation that we use. The target address of the function is replaced in the procedure call instruction, and an invocation to a normal procedure is done in our decompiled C program, as follows:

```c
void proc_1() {/* some code */}
void proc_2() {/* some code */}

void main()
{
    proc_1();
    proc_2();
}
```
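Resolving such an indirect call reduces to reading the 16-bit word stored at the table location, which the i80286 keeps in little-endian order (the bytes B6 04 encode 04B6). A minimal C sketch, valid only under the assumption stated above that the table entries are never modified at run time; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit word (low byte first) from a loaded image. */
uint16_t readWord(const uint8_t *image, size_t off)
{
    return (uint16_t)(image[off] | (image[off + 1] << 8));
}

/* Hypothetical resolution of an indirect call through the word at
   tableOff, assuming that word is never modified at run time. */
uint16_t resolveIndirectCall(const uint8_t *image, size_t tableOff)
{
    return readWord(image, tableOff);
}
```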

### Case Statements

High-level languages implement multiway (or n-way) branches via a high-level construct known as a case statement. In this construct, there are n different possible paths to be executed (i.e. n different branches). There is no low-level machine instruction to represent this construct, therefore different methods are used by compiler writers to define a case table.

If the number of cases is not too great (i.e. fewer than 10), a case is implemented by a sequence of conditional jumps, each of which tests for an individual value and transfers control to the code for the corresponding statement. Consider the following fragment of code in assembler:

```assembly
cmp al, 8       ; start of case
je  lab1
cmp al, 7Fh
```

In this code fragment, register al is compared against 5 different byte values; if the result is equal, a conditional jump (je) transfers control to the label that handles that case. If the register is not equal to any of the 5 options, the program unconditionally jumps to the end of the case.

A more compact way to implement a case statement is to use an indexed table that holds n target label addresses, one for each of the corresponding n statements. The table is indexed into by an indexed jump instruction. Before indexing into the table, the index is checked against the bounds of the table, so that no erroneous indexing is done. Once it has been determined that the index is within bounds, the indexed jump instruction is performed. Consider the following fragment of code:

```
cs:0DCF         cmp  ax, 17h                ; 17h == 23
cs:0DD2         jbe  startCase
cs:0DD4         jmp  endCase
cs:0DD7 startCase:
                mov  bx, ax
cs:0DD9         shl  bx, 1
cs:0DDB         jmp  word ptr cs:0DE0[bx]   ; indexed jump
cs:0DE0         0E13                        ; dw lab1  ; start of indexed table
cs:0DE2         0E1F                        ; dw lab2
...
cs:0E0E         11F4                        ; dw lab24 ; end of indexed table
cs:0E10 lab1:
...
cs:11C7 lab24:
...
cs:11F4 endCase:                            ; end of case
```

The case table is defined in the code segment as data, and is located immediately after the indexed jump and before any target branch labels. Register ax holds the index into the table. This register is compared against the upper bound, 17h (23). If the register is greater than 23, the rest of the sequence is not executed and control is transferred to endCase, the first instruction after the end of the case. Because jbe is an unsigned comparison, a negative index is treated as a large unsigned value and is rejected by the same test, so this single comparison checks both the lower and the upper bound. On the other hand, if the register is less than or equal to 23, startCase is reached and register bx is set up as the offset into the table. Since the word size is 2 bytes, each entry of the case table is 2 bytes long, so the index is multiplied by two (shl bx, 1) to obtain the correct byte offset into the table. Once this is done, the indexed jump instruction fetches the target address from the case table in segment cs at offset 0DE0 (in this case, the byte immediately following the jump instruction). Therefore, the target jump address is any of the 24 entries (indices 0 to 23) in this table.
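The bound check and table lookup can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `code` is a hypothetical byte image of the code segment, the names `read_jump_table` and `dispatch` are illustrative, and the example table is shortened to three entries instead of the 24 of the fragment above.

```python
import struct


def read_jump_table(code, table_off, upper_bound):
    """Read the targets of a case table guarded by 'cmp reg, upper_bound / jbe'.

    jbe admits indices 0..upper_bound inclusive, so the table holds
    upper_bound + 1 little-endian 16-bit words starting at table_off.
    """
    count = upper_bound + 1
    return list(struct.unpack_from(f"<{count}H", code, table_off))


def dispatch(index, table, upper_bound, end_case):
    """Model the run-time dispatch performed by the fragment."""
    index &= 0xFFFF                  # the 16-bit register, compared unsigned by jbe
    if index <= upper_bound:
        return table[index]          # shl bx, 1 turns this index into a byte offset
    return end_case                  # negative indices wrap to large values and fail too


segment = bytes(0x0DE0) + struct.pack("<3H", 0x0E13, 0x0E1F, 0x11F4)
table = read_jump_table(segment, 0x0DE0, 2)
```

Modelling the index as an unsigned 16-bit value is what lets one comparison reject both out-of-range directions.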

A very similar implementation of case statements places the case table after the end of the procedure, and uses the same register both as the index into the table and as the offset into the table (register bx in the following fragment of code):

```
cs:0BE7    cmp   bx, 17h                 ; 17h == 23
cs:0BEE    jbe   startCase
cs:0BEF    jmp   jumpEnd

startCase:
    shl   bx, 1
    jmp   word ptr cs:0FB8[bx]           ; indexed jump

jumpEnd:
    jmp   endCase

lab1:
...

lab24:
...

endCase:                                 ; end of case
...

ret                                      ; end of procedure

cs:0FB8    dw    lab1                    ; start of indexed table
           dw    lab2
...
           dw    lab24                   ; end of indexed table
```

A third way to implement a case statement is to place the case table after all the target branches. In this implementation, the code first jumps over all the branch targets, checks the index against the bound of the indexed table (1Fh, i.e. 31, in the following fragment of code; since jae is used, only indices 0 to 30 are accepted), adjusts the register that indexes into the table, and branches to the selected location:

```
cs:0C65    jmp   startCase
...

lab5:
...

lab31:
...

lab2:
...

cs:13B8    1403                          ; dw endCase ; start of indexed table
           ...                           ; dw lab2
...
cs:13F4    ...                           ; dw lab31   ; end of indexed table

startCase:
           cmp   ax, 1Fh                 ; 1Fh == 31
cs:13F9    jae   endCase
cs:13FB    xchg  ax, bx
cs:13FC    shl   bx, 1
cs:13FE    jmp   word ptr cs:13B8[bx]    ; indexed jump
cs:1403 endCase:
...
cs:1444    ret
```

A different implementation of case statements uses a string of character options, as opposed to numbers. Consider the following code fragment:

```
cs:246A    4C6C68464E6F785875646973      ; db 'LlhFNoxFudispncpeEfgG%'
cs:2476    63706E6566756C6525            ; dw lab1 ; start of table
cs:2481    2573                          ; dw lab2
...
cs:24A7    24DF                          ; dw lab21
...
cs:24C4 procStart:
           push  bp
...
cs:2555    mov   di, cs
cs:2557    mov   es, di                  ; es = cs
cs:2559    mov   di, 246Ah               ; di = start of string
cs:255C    mov   cx, 15h                 ; cx = upper bound (15h == 21)
cs:255F    repne scasb
cs:2561    sub   di, 246Bh
cs:2565    shl   di, 1
cs:2567    jmp   word ptr cs:247F[di]    ; indexed jump
cs:256C lab1:
...
cs:26FF lab12:
...
cs:2714    ret
```

The string of character options is located at cs:246A. Register al holds the current character option to be checked, es:di points to the string in memory to be compared against, and the repne scasb instruction finds the first match of register al in the string pointed to by es:di, scanning at most cx (15h, i.e. 21) characters. Register di is left pointing to the character after the match. The string's initial address plus one (246Bh) is then subtracted from di, which leaves the zero-based position of the matching character; after being multiplied by two, this value indexes into an indexed jump table located before the procedure in the code segment. This method is compact and elegant.
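The scan-and-index step can be emulated in a few lines of Python. The sketch assumes the direction flag is clear (so di counts upward), models es:di as an offset into a byte string, and uses a hypothetical option string `b"dxs%"` and hypothetical labels rather than the option string of the fragment above; the function names are illustrative.

```python
def scan_string_case(options, ch):
    """Emulate 'mov cx, len / repne scasb / sub di, start+1' for the byte in al.

    options: the bytes of the option string (es:di starts at its first byte).
    Returns the zero-based jump-table index computed by the sub instruction.
    """
    di = 0                          # offset of es:di from the start of the string
    cx = len(options)               # mov cx, <string length>
    while cx != 0:                  # repne stops when cx reaches 0 ...
        matched = options[di] == ch # scasb compares al with es:[di]
        di += 1                     # ... scasb always advances di
        cx -= 1
        if matched:                 # ... or when the bytes are equal (ZF = 1)
            break
    return di - 1                   # sub di, start + 1


def string_case_target(options, ch, table):
    # shl di, 1 / jmp word ptr cs:table[di]: one word per entry
    return table[scan_string_case(options, ch)]


labels = ["labD", "labX", "labS", "labPercent"]
target = string_case_target(b"dxs%", ord("s"), labels)   # "labS"
```

Note that, in this emulation, a character that matches none of the options yields the same index as the last option, because repne scasb then stops with di one past the end of the string.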

Unfortunately, there is no fixed representation of a case statement, and thus, the binary code needs to be manually examined in the first instance to determine how the case statement was implemented. Different compilers use different implementations, but normally a specific vendor’s compiler uses only one or two different representations of case tables.
The determination of a case table is a heuristic method that handles a predefined set of generalized implementations. The more implementation methods the decompiler handles, the better the output it can generate. As heuristic methods are used, the right preconditions need to be satisfied before a method is applied; e.g. if an indexed table is encountered and the bounds of the indexed table cannot be determined, the proposed heuristic method cannot be applied.

**Final Algorithm**

The final algorithm used for data/code separation is shown in Figure 4-8. The algorithm is based on the algorithm of Figure 4-7, but expands on the cases of indexed jumps and indirect jumps and calls.
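As a runnable illustration of the same idea, the following Python sketch marks as code every address reachable from an entry point, following both paths of conditional jumps, the targets of direct jumps and calls, and every entry of a case table whose bounds are known. It is a sketch under assumptions, not the algorithm of Figure 4-8 verbatim: it uses an explicit worklist instead of recursion, a toy instruction encoding `(opcode, target, size)`, pre-computed case tables, and it omits the library-procedure check; all names are hypothetical.

```python
def separate_code(program, entry, case_tables=None):
    """Return the set of addresses reachable as code from entry.

    program:     dict address -> (opcode, target, size) for a toy instruction set
    case_tables: dict address of an indexed jump -> list of target addresses,
                 i.e. case tables whose bounds have already been determined
    Addresses of program that are never reached are left unmarked (data).
    """
    case_tables = case_tables or {}
    code = set()
    worklist = [entry]                       # explicit stack instead of recursion
    while worklist:
        ip = worklist.pop()
        while ip in program and ip not in code:
            code.add(ip)                     # setBitmap(CODE, inst)
            opcode, target, size = program[ip]
            if opcode == "jcond":
                worklist.append(ip + size)   # fall-through path, parsed later
                ip = target                  # follow the branch now
            elif opcode == "jmp":
                ip = target
            elif opcode == "jmp_indexed":
                # bounds known: parse every entry; unknown: stop along this path
                worklist.extend(case_tables.get(ip, []))
                break
            elif opcode == "call":
                worklist.append(target)      # parse the callee as well
                ip += size
            elif opcode == "ret":
                break
            else:
                ip += size
    return code
```

With an indexed jump whose table is unknown, the sketch simply stops along that path, mirroring the `done = TRUE` branch of the pseudocode.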

### 4.2 Semantic Analysis

The semantic analysis phase determines the meaning of a group of machine instructions, collects information on the individual instructions of a subroutine, and propagates this information across the instructions of the subroutine. In this way, base data types such as integers and long integers are propagated across the subroutine. The relation of this phase with the syntax analyzer and intermediate code generator is shown in Figure 4-9.

**Definition 2** An identifier (<ident>) is either a register, local variable (negative offset from the stack), parameter (positive offset from the stack), or a global variable (location in memory).

#### 4.2.1 Idioms

The semantic meaning of a series of instructions is sometimes given by an idiom. These are sequences of instructions that represent a high-level instruction.

**Definition 3** An idiom is a sequence of instructions that has a logical meaning which cannot be derived from the individual instructions.

Most idioms are widely known to the compiler community, as they are sequences of instructions that perform an operation in a unique or more efficient way than a more obvious sequence of instructions would. The following sections illustrate some of the best known idioms.

**Subroutine Idioms**

When entering a subroutine, the base register, \( bp \), is established as the frame pointer by copying the value of the stack pointer (\( sp \)) into \( bp \). The frame pointer is used to access parameters and local data on the stack within that subroutine. This sequence of instructions is shown in Figure 4-10. The high-level language subroutine prologue sets up register \( bp \) to point to the current stack pointer, and optionally allocates space on the stack for local variables by decreasing the contents of the stack pointer \( sp \) by the required number of bytes. This idiom is represented by an enter instruction (the equivalent single instruction available from the i80186 onwards), which takes the number of bytes reserved for local storage.

~~~
procedure parse (machState *state)
  done = FALSE;
  while (! done)
    getNextInst (state, &inst);
    if (alreadyParsed (inst))            /* check if instruction already parsed */
      done = TRUE; break;
    end if
    setBitmap (CODE, inst);
    case (inst.opcode) of
    conditional jump:
      *stateCopy = *state;
      parse (stateCopy);                 /* fall-through */
      state->ip = targetAdr (inst);      /* target branch address */
      if (hasBeenParsed (state->ip))     /* check if code already parsed */
        done = TRUE;
      end if
    unconditional jump:
      if (direct jump)
        state->ip = targetAdr (inst);    /* target branch address */
        if (hasBeenParsed (state->ip))   /* check if code already parsed */
          done = TRUE;
        end if
      else                               /* indirect jump */
        check for case table, if found, determine bounds of the table.
        if (bounds determined)
          for (all entries i in the table)
            *stateCopy = *state;
            stateCopy->ip = targetAdr (targetAdr (table[i]));
            parse (stateCopy);
          end for
        else                             /* cannot continue along this path */
          done = TRUE;
        end if
      end if
    procedure call:                      /* process non-library procedures only */
      if (! isLibrary (targetAdr (inst)))
        *stateCopy = *state;
        if (direct call)
          stateCopy->ip = targetAdr (inst);
        else                             /* indirect call */
          stateCopy->ip = targetAdr (targetAdr (inst));
        end if
        parse (stateCopy);               /* process target procedure */
      end if
    end case
  end while
end procedure
~~~

Figure 4-8: Final Parser Algorithm
Once the subroutine prologue is encountered, any *pushes* on the stack represent registers whose values are to be preserved during this subroutine. These registers could act as register variables (i.e. local variables) in the current subroutine, and thus are flagged as possible register variables. Figure 4-11 shows registers `si` and `di` being pushed on the stack.
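A prologue matcher for the idioms just described might look like the Python sketch below. The tuple encoding of instructions and the name `match_prologue` are hypothetical; the sketch accepts either `push bp / mov bp, sp` with an optional `sub sp, n`, or `enter n, 0`, and then collects any pushes of `si` and `di` as candidate register variables.

```python
def match_prologue(insts):
    """Match a subroutine prologue at the start of insts.

    Returns (local_bytes, saved_registers, instructions_matched), or None.
    """
    i = 0
    local_bytes = 0
    if insts[:2] == [("push", "bp"), ("mov", "bp", "sp")]:
        i = 2
        if i < len(insts) and insts[i][:2] == ("sub", "sp"):   # optional local space
            local_bytes = insts[i][2]
            i += 1
    elif insts and insts[0][0] == "enter" and insts[0][2] == 0:  # enter n, 0
        local_bytes = insts[0][1]
        i = 1
    else:
        return None
    saved = []
    while i < len(insts) and insts[i][0] == "push" and insts[i][1] in ("si", "di"):
        saved.append(insts[i][1])          # candidate register variable
        i += 1
    return local_bytes, saved, i
```

Returning the number of matched instructions lets the caller remove the prologue from further analysis.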

Finally, to exit a subroutine, any registers saved on the stack need to be popped, any data space that was allocated needs to be freed, `bp` needs to be restored to point to the old frame pointer, and the subroutine then returns with a near or far return instruction. Figure 4-12 shows sample trailer code.

Figure 4-12: Subroutine Trailer Code

**Calling Conventions**

The C calling convention is also known as the C parameter-passing sequence. In this convention, the caller pushes the parameters on the stack in the reverse order in which they appear in the source code (i.e. right-to-left order), and then invokes the procedure. After the procedure returns, the caller restores the stack either by *popping* the parameters from the stack or by adding the number of parameter bytes to the stack pointer. In either case, the total number of bytes used for arguments is known, and is stored for later use. The instruction(s) involved in restoring the stack are eliminated from the code. The C calling convention is used when passing a variable number of arguments, as the callee does not need to restore the stack (and therefore does not need to know how many arguments were passed). Figure 4-13 shows the case in which pop instructions are used to restore the stack. The total number of bytes is computed by multiplying the number of pops by 2.
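The stack-restoring step can be sketched in Python as follows. It assumes that the instructions immediately after a call that restore the stack are either a single `add sp, n` or a run of `pop` instructions, and it uses a hypothetical tuple encoding; `caller_cleanup` is an illustrative name. It returns the argument byte count together with the number of instructions to eliminate.

```python
def caller_cleanup(after_call):
    """Return (argument_bytes, instructions_to_remove) for the code after a call.

    after_call: tuples such as ("pop", "cx") or ("add", "sp", 6) that follow
    the call instruction.
    """
    if after_call and after_call[0][:2] == ("add", "sp"):
        return after_call[0][2], 1        # add sp, immed: immed bytes of arguments
    pops = 0
    while pops < len(after_call) and after_call[pops][0] == "pop":
        pops += 1
    return 2 * pops, pops                 # each pop removes one 2-byte word
```

The heuristic only looks at the instructions directly following the call, which is where the C convention places the stack restore.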

Figure 4-13: C Calling Convention - Uses pop

Figure 4-14 shows the case in which the stack is restored by adding the number of argument bytes to the stack pointer. This value is stored for later use, and the instruction that restores the stack is eliminated from further analysis. It has been found that when 2 or 4 bytes are used for arguments, the stack is restored by popping these bytes from the stack. This is due to the size of the two alternatives: each pop reg instruction takes 1 byte, whereas an add sp,immed instruction takes 3 bytes, so one or two pops are shorter than the add. Most likely, the binary code had been optimized for space rather than speed, because a pop reg instruction on the i8086 takes 8 cycles, whereas an add sp,immed instruction takes 4 cycles.

The Pascal calling convention is also known as the Pascal parameter-passing sequence. In this convention, the caller pushes the parameters on the stack in the same order as they appear in the source code (i.e. left-to-right order), the callee procedure is invoked, and the callee is responsible for adjusting the stack before returning (on the i8086, typically by means of a ret n instruction, where n is the number of argument bytes). It is therefore necessary for the callee to know how many parameters are passed, and thus this convention cannot be used for procedures that take a variable number of arguments. Figure 4-15 shows this convention.
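The difference between the two conventions, as seen by a decompiler, is sketched below in Python: the order in which arguments were pushed has to be reversed for the C convention but not for the Pascal convention, and under the Pascal convention the argument byte count can be read from the callee's `ret n`. Function names and the tuple encoding are hypothetical.

```python
def args_in_source_order(pushed, convention):
    """Map arguments, listed in the order they were pushed, back to source order."""
    if convention == "C":
        return list(reversed(pushed))    # C pushes right to left
    if convention == "Pascal":
        return list(pushed)              # Pascal pushes left to right
    raise ValueError(f"unknown convention: {convention}")


def pascal_argument_bytes(ret_inst):
    """Bytes of arguments removed by the callee, e.g. ("ret", 6) -> 6."""
    opcode, *operands = ret_inst
    if opcode in ("ret", "retf") and operands:
        return operands[0]
    return 0
```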
**Long Variable Operations**

Long variables are stored in memory as two consecutive memory or stack locations. These variables are normally identified when simple addition or subtraction operations are performed on them. The idioms used for these operations are widely used because they require only a few instructions.

Figure 4-16 shows the instructions involved in long addition. The low parts of the long variable(s) are added with an `add` instruction, which sets the `carry` flag if there is an overflow. The high parts are then added taking the `carry` flag into account: if the addition of the low parts overflowed, the resulting carry of 1 needs to be added to the high part. Thus, an `adc` (add with carry) instruction is used to add the high parts.
In a similar way, long subtraction is performed. The low parts are first subtracted with a `sub` instruction. If there is a borrow, the carry flag is set. This underflow is taken into consideration when subtracting the high parts: if the subtraction of the low parts underflowed, a borrow of 1 needs to be subtracted from the high part as well. Thus, an `sbb` (subtract with borrow) instruction is used. Figure 4-17 shows this case.

![Figure 4-17: Long Subtraction](image)
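Both idioms can be checked with a small Python emulation on 16-bit halves. This is a sketch, assuming the usual i8086 semantics in which `add` and `sub` leave the carry (or borrow) in the carry flag for `adc` and `sbb` to consume; the function names are illustrative.

```python
MASK16 = 0xFFFF


def long_add(a_hi, a_lo, b_hi, b_lo):
    """add regL, ... / adc regH, ... on 16-bit halves; returns (hi, lo)."""
    lo = a_lo + b_lo
    carry = lo >> 16                        # add sets CF when the low part overflows
    hi = (a_hi + b_hi + carry) & MASK16     # adc adds that carry into the high part
    return hi, lo & MASK16


def long_sub(a_hi, a_lo, b_hi, b_lo):
    """sub regL, ... / sbb regH, ... on 16-bit halves; returns (hi, lo)."""
    lo = a_lo - b_lo
    borrow = 1 if lo < 0 else 0             # sub sets CF when the low part needs a borrow
    hi = (a_hi - b_hi - borrow) & MASK16    # sbb subtracts that borrow from the high part
    return hi, lo & MASK16
```

For any pair of 32-bit values split into high and low words, these functions agree with ordinary 32-bit addition and subtraction modulo 2^32, which is exactly why the idiom pair can be merged into one long operation.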

The negation of a long variable is done by a sequence of 3 instructions: the high part is negated, then the low part is negated, and finally zero is subtracted with borrow from the high part to account for the borrow produced by negating the low part (neg sets the carry flag whenever its operand is nonzero). This idiom is shown in Figure 4-18.

![Figure 4-18: Long Negation](image)
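A Python emulation of the three-instruction negation idiom, under the assumption that `neg` sets the carry flag exactly when its operand is nonzero (as on the i8086), shows why the final `sbb regH, 0` is needed; `long_neg` is a hypothetical name.

```python
MASK16 = 0xFFFF


def long_neg(hi, lo):
    """neg regH / neg regL / sbb regH, 0 on 16-bit halves; returns (hi, lo)."""
    hi = (-hi) & MASK16             # neg regH
    carry = 1 if lo != 0 else 0     # neg regL sets CF unless regL is zero
    lo = (-lo) & MASK16             # neg regL
    hi = (hi - 0 - carry) & MASK16  # sbb regH, 0 propagates the borrow
    return hi, lo
```

For example, negating 00000001h gives (0FFFFh, 0FFFFh), i.e. -1; without the `sbb` the high part would wrongly stay 0.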

Long shifts by 1 are normally performed using the carry flag, rotating that flag into the high or low part of the answer. A left shift is independent of the sign of the long operand: the low part is shifted left (shl), which leaves the high bit of the low part in the carry flag. The high part is then shifted left using the carry flag, which contains the bit to be placed in the lowest bit of the high part of the answer; thus, an rcl (rotate through carry left) instruction is used. This idiom is shown in Figure 4-19.


A long right shift by 1 needs to retain the sign of a signed long operand, so two different idioms are used for signed and unsigned long operands. Figure 4-20 shows the idiom for signed long operands. The high part of the long operand is shifted right by 1 with an arithmetic shift right (sar) instruction, so that the number is treated as a signed number. The lowest bit of the high part is placed in the carry flag. The low part of the operand is then shifted right, taking into account the bit in the carry flag, so a rotate through carry right (rcr) instruction is used.

\[
\begin{array}{l}
\text{shl regL, 1} \\
\text{rcl regH, 1} \\
\qquad \Downarrow \\
\text{regH:regL} = \text{regH:regL} \ll 1
\end{array}
\]

Figure 4-19: Shift Long Variable Left by 1

In a similar way, a long shift right by 1 of an unsigned long operand is done. In this case, the high part is shifted right, moving the lower bit into the carry flag. This bit is then shifted into the low part by a rotate carry right instruction. See Figure 4-21.

\[
\begin{array}{l}
\text{shr regH, 1} \\
\text{rcr regL, 1} \\
\qquad \Downarrow \\
\text{regH:regL} = \text{regH:regL} \gg 1 \quad (\text{regH:regL is unsigned long})
\end{array}
\]

Figure 4-21: Shift Unsigned Long Variable Right by 1
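The three shift idioms can be emulated in Python as below, treating each long value as a pair of 16-bit words. The function names are hypothetical; `rcl` and `rcr` are modelled only for a shift count of 1, as used in these idioms.

```python
MASK16 = 0xFFFF


def long_shl1(hi, lo):
    """shl regL, 1 / rcl regH, 1."""
    carry = lo >> 15                      # shl moves the top bit of regL into CF
    lo = (lo << 1) & MASK16
    hi = ((hi << 1) | carry) & MASK16     # rcl moves CF into bit 0 of regH
    return hi, lo


def long_shr1(hi, lo):
    """shr regH, 1 / rcr regL, 1 (unsigned long)."""
    carry = hi & 1                        # shr moves bit 0 of regH into CF
    hi >>= 1
    lo = (lo >> 1) | (carry << 15)        # rcr moves CF into bit 15 of regL
    return hi, lo


def long_sar1(hi, lo):
    """sar regH, 1 / rcr regL, 1 (signed long)."""
    carry = hi & 1
    hi = (hi >> 1) | (hi & 0x8000)        # sar keeps the sign bit
    lo = (lo >> 1) | (carry << 15)
    return hi, lo
```

The only difference between the signed and unsigned right shifts is whether the sign bit of the high word is replicated, which is why the decompiler can use the choice of `sar` or `shr` as type information.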

**Miscellaneous Idioms**

A widely known machine idiom is the assignment of zero to a variable. Rather than using a mov instruction, an xor is used: whenever a variable is xored with itself, the result is zero. This machine idiom uses fewer machine cycles and bytes than its counterpart, and is shown in Figure 4-22.
Different machine architectures restrict the number of bits that can be shifted in a single shift instruction. In the case of the i8086, a shift instruction with an immediate operand can shift by only one bit (other shift counts must first be loaded into register cl), so shifts by a small constant are often coded as several consecutive shift instructions. Figure 4-23 shows this idiom. In general, a shift by a constant $n$ can be done by $n$ consecutive shift-by-1 instructions.

```
shl reg, 1  --\
   ...         |  n times
shl reg, 1  --/
   ↓
reg = reg << n
```

Figure 4-23: Shift Left by $n$

Logical negation of an integer/word variable (the C ! operator, which yields 0 or 1) is done as shown in Figure 4-24. This idiom negates (2's complement) the register, which sets the carry flag unless the register was zero; it then subtracts the register from itself with borrow, leaving 0 if the original value was zero and -1 (0FFFFh) otherwise; finally, it increments the register by one to get a 1 or 0 answer, respectively.

```
  neg reg
  sbb reg, reg
  inc reg
  ↓
  reg = !reg
```

Figure 4-24: Logical Negation
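The idioms of this section can be rewritten mechanically. The Python sketch below replaces `xor reg, reg`, runs of `shl reg, 1`, and the `neg`/`sbb`/`inc` sequence by high-level assignments, and also emulates the last sequence to confirm that it computes `!reg`. The tuple encoding, the `asgn` tuple shape, and the function names are hypothetical.

```python
def rewrite_misc_idioms(insts):
    """Replace three i8086 idioms by ("asgn", reg, expression) tuples."""
    out, i = [], 0
    while i < len(insts):
        inst = insts[i]
        # xor reg, reg  =>  reg = 0
        if inst[0] == "xor" and inst[1] == inst[2]:
            out.append(("asgn", inst[1], "0"))
            i += 1
            continue
        # n identical 'shl reg, 1'  =>  reg = reg << n
        if inst[0] == "shl" and inst[2] == 1:
            n = 0
            while i + n < len(insts) and insts[i + n] == inst:
                n += 1
            out.append(("asgn", inst[1], f"{inst[1]} << {n}"))
            i += n
            continue
        # neg reg / sbb reg, reg / inc reg  =>  reg = !reg
        reg = inst[1] if len(inst) > 1 else None
        if insts[i:i + 3] == [("neg", reg), ("sbb", reg, reg), ("inc", reg)]:
            out.append(("asgn", reg, f"!{reg}"))
            i += 3
            continue
        out.append(inst)
        i += 1
    return out


def neg_sbb_inc(v):
    """Emulate neg/sbb/inc on a 16-bit value to confirm that it computes !v."""
    carry = 1 if v != 0 else 0        # neg sets CF unless the operand is zero
    v = (-v) & 0xFFFF                 # neg reg
    v = (v - v - carry) & 0xFFFF      # sbb reg, reg: 0 or 0FFFFh
    return (v + 1) & 0xFFFF           # inc reg: 1 or 0
```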
#### 4.2.2 Simple Type Propagation

The sign of elementary data types such as byte and integer is easily determined by the type of conditional jump used to compare an operand. Such a technique is also used to determine the sign of more complex elementary data types such as long and real. The following sections illustrate the techniques used to determine whether a word-sized operand is a signed or unsigned integer, and whether a two word-sized operand is a signed or unsigned long. These techniques are easily extended to other elementary data types.

**Propagation of Integers**

A word-sized operand can be a signed integer or an unsigned integer. Most instructions that deal with word-sized operands do not distinguish between signed and unsigned operands; conditional jump instructions are an exception, as most relational operations have separate signed and unsigned conditional jumps. For example, the following code:

```
    cmp [bp-0Ah], 28h
    jg X
```

checks whether the word operand at `bp-0Ah` is greater than `28h`. The following code:

```
    cmp [bp-0Ah], 28h
    ja X
```

checks whether the word operand at `bp-0Ah` is above `28h`. This latter conditional jump tests for unsigned word operands, while the former conditional jump tests for signed word operands; hence, the local variable at `bp-0Ah` is a signed integer in the former case, and an unsigned integer in the latter case. This information is stored as an attribute of the local variable `bp-0Ah` in the symbol table.

In the same way, whenever the operands of a conditional jump are registers, the register is determined to be a signed or unsigned integer register, and this information is propagated backwards within the basic block to which the register belongs, up to the definition of the register. Consider the following code:

```
1  mov ax, [bp-0Ch]
2  cmp ax, 28h
3  ja X
```

By instruction 3 the operands of the conditional jump are determined to be unsigned integers; hence, register `ax` and constant `28h` are unsigned integer operands. Since register `ax` is not a local variable, this information is propagated backwards until the definition of `ax` is found. In this example, instruction 1 defines `ax` in terms of local variable `bp-0Ch`, therefore, this local variable represents an unsigned integer and this attribute is stored in the symbol table entry for `bp-0Ch`.

The set of conditional jumps used to distinguish a signed from an unsigned integer is shown in Figure 4-25. These conditional jumps are for the Intel architecture.
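A minimal Python sketch of this inference for one basic block follows. It assumes the block ends with a `cmp` followed by a conditional jump, uses a hypothetical tuple encoding, and propagates the sign backwards only through a `mov` that defines the compared register; the jump sets include the common Intel synonyms (for example `jnbe` for `ja`), and all names are illustrative.

```python
REGISTERS = {"ax", "bx", "cx", "dx", "si", "di"}
SIGNED_JCC = {"jg", "jge", "jl", "jle", "jnle", "jnl", "jnge", "jng"}
UNSIGNED_JCC = {"ja", "jae", "jb", "jbe", "jnbe", "jnb", "jnae", "jna"}


def infer_sign(block):
    """Return {operand: "signed" | "unsigned"} deduced from the block's final cmp/jcc."""
    if len(block) < 2 or block[-2][0] != "cmp":
        return {}
    jcc = block[-1][0]
    if jcc in SIGNED_JCC:
        sign = "signed"
    elif jcc in UNSIGNED_JCC:
        sign = "unsigned"
    else:
        return {}                      # je/jne say nothing about the sign
    operand = block[-2][1]
    types = {operand: sign}
    if operand in REGISTERS:
        # propagate backwards to the instruction that defines the register
        for inst in reversed(block[:-2]):
            if len(inst) >= 2 and inst[1] == operand:
                if inst[0] == "mov" and len(inst) == 3:
                    types[inst[2]] = sign
                break
    return types
```

Applied to the three-instruction example above, the sketch marks both `ax` and the local variable `[bp-0Ch]` as unsigned.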
**Propagation of Long Variables**

The initial recognition of long variables is done by idiom analysis, as described in Section 4.2.1. Once a pair of identifiers is known to be a long variable, all references to these identifiers must be changed to reflect that they are part of a long variable (i.e. the high or low part of the long variable). Also, pairs of instructions that deal with the high and low parts of the long variable can be merged into the one instruction. Consider the following code:

```
108 mov dx, [bp-12h]
109 mov ax, [bp-14h]
111 add dx:ax, [bp-0Ah]:[bp-0Ch]
112 mov [bp-0Eh], dx
113 mov [bp-10h], ax
```

Instructions 110 and 111 were merged into the one `add` instruction by idiom analysis, which makes the identifiers `bp-0Ah` and `bp-0Ch` a long variable, as well as the registers `dx:ax`. Identifiers other than registers are propagated throughout the intermediate code of the whole subroutine; in this example, no other references to `bp-0Ah` are made. Registers are propagated within the basic block in which they are used, by backward propagation until the register definition is found, and by forward propagation until the register is redefined. In this example, by backward propagation of `dx:ax`, we arrive at the following code:

```
109 mov dx:ax, [bp-12h]:[bp-14h]
111 add dx:ax, [bp-0Ah]:[bp-0Ch]
112 mov [bp-0Eh], dx
113 mov [bp-10h], ax
```

which merges instructions 108 and 109 into the one `mov` instruction. Also, this merge has determined that the local identifiers `bp-12h` and `bp-14h` are a long variable, and hence, this information is stored in the symbol table. By forward propagation of `dx:ax` within the basic block we arrive at the following code:

```
109 mov dx:ax, [bp-12h]:[bp-14h]
111 add dx:ax, [bp-0Ah]:[bp-0Ch]
113 mov [bp-0Eh]:[bp-10h], dx:ax
```

which merges instructions 112 and 113 into the one `mov` instruction. In this case, the local identifiers `bp-0Eh` and `bp-10h` are determined to be a long variable, and this information is also stored in the symbol table and propagated.
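The merging of adjacent moves into long moves can be sketched in Python as follows. This is a simplification under stated assumptions: it only merges adjacent `mov` pairs on a known register pair, uses a hypothetical `(opcode, dest, src)` encoding, and drops the instruction numbers; a full implementation would also check that the registers are not redefined in between.

```python
def merge_long_moves(block, hi, lo):
    """Merge adjacent mov pairs on the long register pair hi:lo."""
    pair = f"{hi}:{lo}"
    out, i = [], 0
    while i < len(block):
        if i + 1 < len(block):
            (op1, d1, s1), (op2, d2, s2) = block[i], block[i + 1]
            if op1 == op2 == "mov" and (d1, d2) == (hi, lo):    # load: backward merge
                out.append(("mov", pair, f"{s1}:{s2}"))
                i += 2
                continue
            if op1 == op2 == "mov" and (s1, s2) == (hi, lo):    # store: forward merge
                out.append(("mov", f"{d1}:{d2}", pair))
                i += 2
                continue
        out.append(block[i])
        i += 1
    return out


block = [("mov", "dx", "[bp-12h]"), ("mov", "ax", "[bp-14h]"),
         ("add", "dx:ax", "[bp-0Ah]:[bp-0Ch]"),
         ("mov", "[bp-0Eh]", "dx"), ("mov", "[bp-10h]", "ax")]
merged = merge_long_moves(block, "dx", "ax")
```

The result corresponds to instructions 109, 111 and 113 of the final listing above.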
Propagation of long variables across conditional jumps is done in two or more steps, because the high and low parts of the long identifier are compared against another identifier in different basic blocks. The notion of a basic block is simple: a sequence of instructions that has one entry point and one exit point; this notion is explained in more detail in Section 4.4.3. Consider the following code:

```
115    mov  dx:ax, [bp-0Eh]:[bp-10h]
116    cmp  dx, [bp-0Ah]
117    jl   L21
118    jg   L22
119    cmp  ax, [bp-0Ch]
120    jbe  L21
```

At instruction 115, registers \texttt{dx:ax} are determined to be a long register; hence, the \texttt{cmp} opcode at instruction 116 only checks the high part of this long register, and a further instruction (119) checks the low part. By analysing the instructions, it can be seen that whenever \texttt{dx:ax} is less than or equal to the identifier \texttt{[bp-0Ah]:[bp-0Ch]}, the label \texttt{L21} is reached (\texttt{jl} is taken when the high part is smaller, \texttt{jg} is taken when it is greater, and otherwise the low parts are compared as unsigned values by \texttt{jbe}); otherwise, the label \texttt{L22} is reached. These three basic blocks can be transformed into a single basic block that contains this condition, as follows:

```
115    mov  dx:ax, [bp-0Eh]:[bp-10h]
116    cmp  dx:ax, [bp-0Ah]:[bp-0Ch]
117    jle  L21
```

This basic block branches to label \texttt{L21} whenever the condition is true, and branches to label \texttt{L22} whenever the condition is false. The presence of label \texttt{L22} is not made explicit in the instructions, but is implicit in the out-edges of this basic block.
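The following Python sketch emulates the three nodes of graph (a) of Figure 4-26, using the node code listed in Figure 4-28, and checks that each jump pattern computes the expected long comparison. It assumes i80286 jump semantics (`jl`/`jg` signed on the high words, `jb`/`ja`/`jbe`/`jae` unsigned on the low words), and the names `graph_a_taken` and `PATTERNS` are illustrative.

```python
import operator

# predicate tested by each conditional jump after "cmp left, right"
JCC = {
    "jl": operator.lt, "jg": operator.gt, "jne": operator.ne,   # on signed high words
    "jb": operator.lt, "ja": operator.gt,                       # on unsigned low words
    "jbe": operator.le, "jae": operator.ge,
}

# (node x jump, node y jump, node z jump) -> long condition, one entry per row
PATTERNS = {
    ("jl", "jg", "jbe"): operator.le,
    ("jl", "jne", "jb"): operator.lt,
    ("jg", "jne", "ja"): operator.gt,
    ("jg", "jl", "jae"): operator.ge,
}


def to_signed16(v):
    return v - 0x10000 if v & 0x8000 else v


def graph_a_taken(x_jcc, y_jcc, z_jcc, a, b):
    """Return True if graph (a) reaches label t when comparing longs a and b."""
    a_hi, a_lo = to_signed16((a >> 16) & 0xFFFF), a & 0xFFFF
    b_hi, b_lo = to_signed16((b >> 16) & 0xFFFF), b & 0xFFFF
    if JCC[x_jcc](a_hi, b_hi):      # node x: cmp dx, offHi / x_jcc t
        return True
    if JCC[y_jcc](a_hi, b_hi):      # node y: y_jcc e, still using the flags of node x
        return False
    return JCC[z_jcc](a_lo, b_lo)   # node z: cmp ax, offLow / z_jcc t, else fall to e
```

Checking every pattern over a handful of positive and negative long values is a cheap way to confirm that a matched graph may safely be replaced by a single long condition.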

In general, long conditional branches are identified by their graph structure. Figure 4-26 shows five graphs. Four of these represent six different conditions. Graphs (a) and (b) represent the same set of conditions, and the condition each one represents depends on the instructions in its nodes: $\leq$, $<$, $>$, or $\geq$. Graphs (c) and (d) represent equality and inequality of long variables. These four graphs are translated into graph (e) when the following conditions are satisfied:

- Graphs (a) and (b):
  - Basic block \texttt{x} is a conditional node that compares the high parts of the long identifiers.
  - Basic block \texttt{y} is a conditional node that has one instruction, a conditional jump, and one in-edge, from basic block \texttt{x}.
  - Basic block \texttt{z} is a conditional node that has two instructions: a compare of the low parts of the long identifiers, and a conditional jump.

- Graphs (c) and (d):
  - Basic block \texttt{x} is a conditional node that compares the high parts of the long identifiers.
  - Basic block \texttt{y} is a conditional node that has two instructions, a compare of the low parts of the long identifiers and a conditional jump, and has one in-edge, from basic block \texttt{x}.

Figure 4-27 shows sample code for graphs (c) and (d) of Figure 4-26, i.e. equality and inequality of long identifiers. This code is for the Intel i80286 architecture.

<table>
<thead>
<tr>
<th>Node x</th>
<th>Node y</th>
<th>Boolean Condition</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>cmp dx, offHi</code><br><code>jne t</code></td>
<td><code>cmp ax, offLow</code><br><code>jne t</code></td>
<td><code>!=</code></td>
</tr>
<tr>
<td><code>cmp dx, offHi</code><br><code>jne e</code></td>
<td><code>cmp ax, offLow</code><br><code>je t</code></td>
<td><code>==</code></td>
</tr>
</tbody>
</table>

Sample code for the nodes of graph (a) of Figure 4-26 is given in Figure 4-28. Each row of the figure corresponds to a different non-equality Boolean condition, namely less than or equal, less than, greater than, and greater than or equal. Similar code is used for the nodes of graph (b), which represent exactly the same Boolean conditions. This code is for the Intel i80286 architecture.

<table>
<thead>
<tr>
<th>Node x</th>
<th>Node y</th>
<th>Node z</th>
<th>Boolean Condition</th>
</tr>
</thead>
<tbody>
<tr>
<td>cmp dx, offHi<br>jl t</td>
<td>jg e</td>
<td>cmp ax, offLow<br>jbe t</td>
<td>&lt;=</td>
</tr>
<tr>
<td>cmp dx, offHi<br>jl t</td>
<td>jne e</td>
<td>cmp ax, offLow<br>jb t</td>
<td>&lt;</td>
</tr>
<tr>
<td>cmp dx, offHi<br>jg t</td>
<td>jne e</td>
<td>cmp ax, offLow<br>ja t</td>
<td>&gt;</td>
</tr>
<tr>
<td>cmp dx, offHi<br>jg t</td>
<td>jl e</td>
<td>cmp ax, offLow<br>jae t</td>
<td>&gt;=</td>
</tr>
</tbody>
</table>

Figure 4-28: Long Non-Equality Boolean Conditional Code

### 4.3 Intermediate Code Generation

In a decompiler, the front-end translates the machine language source code into an intermediate representation that is suitable for analysis by the universal decompiling machine. Figure 4-29 shows the relation of this phase with the semantic analyzer and with the last phase of the front-end, the control flow graph generator. A target-language-independent representation is used, so that retargeting to a different language is feasible by writing a back-end for that language and attaching it to the decompiler.

A two-step approach is taken for decompilation: a low-level intermediate representation is first used to represent the machine language program. Idiom analysis and type propagation can be done in this representation, as well as generating assembler code from it (i.e. it is an intermediate code suitable for a disassembler, which does not perform high-level analysis on the code). This representation is then converted into a high-level intermediate representation that is suitable for high-level language generation. The representation needs to be general enough to generate code for any high-level language.

#### 4.3.1 Low-level Intermediate Code

A low-level intermediate representation that resembles the assembler language of the machine being decompiled is a good choice of low-level intermediate code, as it is possible to perform semantic analysis on the code, as well as to generate assembler programs from it. The intermediate code must have one instruction for each complete instruction of the machine language. Compound machine instructions must also be represented by one intermediate instruction. For example, in Figure 4-30, the machine instruction B720 is represented by the intermediate instruction mov bh,20. The machine instruction 2E followed by FFEFC006 (a jmp with a cs segment override) is replaced by a jmp instruction that makes explicit the use of register cs. Finally, the compound machine instruction F3A4 is equivalent to the assembler instructions rep and movs di,si. These two instructions are represented by the single intermediate instruction rep_movi, which makes explicit the destination and source registers of the move.
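The one-instruction-per-machine-instruction rule can be illustrated with a tiny Python decoder for two of the byte sequences above. It is a sketch only: it handles `B0`-`B7` (mov reg8, imm8) and the `F3 A4` pair, and the opcode name `rep_movsb`, the operand strings, and the function name `decode_one` are hypothetical stand-ins for whatever the intermediate code actually uses.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]   # i8086 8-bit registers 0-7


def decode_one(code, pc=0):
    """Decode one machine instruction at code[pc]; return (instruction, length)."""
    op = code[pc]
    if 0xB0 <= op <= 0xB7:                          # mov reg8, imm8
        return ("mov", REG8[op - 0xB0], code[pc + 1]), 2
    if op == 0xF3 and code[pc + 1] == 0xA4:         # rep prefix + movsb
        return ("rep_movsb", "es:di", "ds:si"), 2   # one intermediate instruction
    raise NotImplementedError(f"opcode {op:02X}h not handled in this sketch")


inst, length = decode_one(bytes([0xB7, 0x20]))      # mov bh, 20h
```

Folding the prefix and the string instruction into one tuple is what keeps the mapping from machine instructions to intermediate instructions one-to-one.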
**Implementation of Low-Level Intermediate Code**

The low-level intermediate representation is implemented in quadruples which make explicit the operands used in the instruction, as shown in Figure 4-31. The opcode field holds the low-level intermediate opcode, the dest field holds the destination operand (i.e., an identifier), and the src1 and src2 fields hold the source operands of the instruction. Some instructions do not use two source operands, so only the src1 field is used.

**Example 4** An add bx,3 machine instruction is represented in a quadruple in the following way:

\[
\text{add} \quad \text{bx} \quad \text{bx} \quad 3
\]

where register bx is source and destination operand, and constant 3 is the second source operand.

**Example 5** A push cx machine instruction is represented in the following way:

\[
\text{push} \quad \text{sp} \quad \text{cx}
\]

where register cx is the source operand and register sp is the destination operand.
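The quadruple layout can be sketched as a Python dataclass. The field names follow the text (opcode, dest, src1, src2), while the class name `Quad` and the use of strings and integers for operands are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union

Operand = Union[str, int]            # a register/identifier name or a constant


@dataclass(frozen=True)
class Quad:
    opcode: str                      # low-level intermediate opcode
    dest: Optional[Operand]          # destination operand
    src1: Optional[Operand]          # first source operand
    src2: Optional[Operand] = None   # left empty by one-source instructions


add_bx_3 = Quad("add", "bx", "bx", 3)    # Example 4: add bx,3
push_cx = Quad("push", "sp", "cx")       # Example 5: push cx (sp is the destination)
```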

#### 4.3.2 High-level Intermediate Code

Three-address code is a generalized form of assembler code for a three-address machine. This intermediate code is most suited for a decompiler, given that the three-address code is a linearized representation of an abstract syntax tree (AST) of the program. In this way, the complete AST of the program can be reconstructed during the data flow analysis. A three-address instruction has the general form:

\[ x := y \ op \ z \]

where \( x, y, \) and \( z \) are identifiers, and \( \mathit{op} \) is an arithmetic or logic operation. The result address is \( x \), and the two operand addresses are \( y \) and \( z \).
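Although a decompiler goes in the opposite direction, the correspondence between three-address code and an abstract syntax tree can be illustrated by linearizing a small tree. In this Python sketch, an expression is either an identifier string or a hypothetical `(op, left, right)` tuple, and the temporaries `t1`, `t2`, ... and the name `linearize` are illustrative.

```python
import itertools


def linearize(expr, out, temps):
    """Emit three-address statements for expr; return the name holding its value."""
    if isinstance(expr, str):
        return expr                       # an identifier is already an address
    op, left, right = expr
    y = linearize(left, out, temps)
    z = linearize(right, out, temps)
    x = f"t{next(temps)}"                 # fresh temporary for the result
    out.append(f"{x} := {y} {op} {z}")    # one x := y op z statement per tree node
    return x


code = []
result = linearize(("*", ("+", "a", "b"), "c"), code, itertools.count(1))
# code == ["t1 := a + b", "t2 := t1 * c"], result == "t2"
```

Each interior node of the tree yields one statement, so data flow analysis can rebuild the tree by substituting temporaries back into their uses.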
**Types of Three-Address Statements**

Three-address statements are similar to high-level language statements. Given that the data flow analysis will reconstruct the AST of the program, a three-address instruction is going to represent not only individual identifiers, but expressions. An identifier can be viewed as the minimal form of an expression. The different types of instructions are:

1. `asgn <exp>, <arithExp>`
   The asgn instruction assigns an arithmetic expression to an identifier or an expression (i.e. an identifier that is represented by an expression, such as indexing into an array). This statement represents three different types of high-level assignment instructions:
   - \( x := y \ \mathit{op} \ z \), where \( x, y, \) and \( z \) are identifiers, and \( \mathit{op} \) is a binary arithmetic operator.
   - \( x := \mathit{op} \ y \), where \( x \) and \( y \) are identifiers, and \( \mathit{op} \) is a unary arithmetic operator.
   - \( x := y \), where \( x \) and \( y \) are identifiers.

   After data flow analysis, the arithmetic expression represents not only a binary operation, but holds a complete parse tree of arithmetic operators and identifiers. This transformation is described in Chapter 5.

   In this context, a subroutine that returns a value (i.e. a function), is also considered an identifier, as its invocation returns a result that is assigned to another identifier (e.g. \( a := \text{sum}(b, c) \)).

2. `jmp`
   The unconditional jump instruction has no associated expression attached to it, other than the target destination address of the jump. This instruction transfers control to the target address. Since the address is coded in the out-edge of the basic block that includes this instruction, it is not explicitly described as part of the instruction. This instruction is equivalent to the high-level instruction:
   
   \[
   \text{goto } L
   \]

   where \( L \) is the target address of the jump.

3. `jcond <boolExp>`
   The conditional jump instruction has a Boolean expression associated with it, which determines whether the branch is taken or not. The Boolean expression is of the form \( x \ \mathit{relop} \ y \), where \( x \) and \( y \) are identifiers, and \( \mathit{relop} \) is a relational operator, such as \( <, \ge, = \). This statement is equivalent to the high-level statement:

   \[
   \text{if } x \ \mathit{relop} \ y \ \text{goto } L
   \]

   In this intermediate instruction, the target branch address (\( L \)) and the fall-through address (i.e. address of the next instruction) are not part of the instruction as these are coded in the out-edges from the basic block that has this instruction in the control flow graph.

4. `call <procId> <actual parameters>`
   The call instruction represents a subroutine call. The procedure identifier (`<procId>`) is a pointer to the flow graph of the invoked procedure. The actual parameter list is constructed during data flow analysis. If the subroutine called is a function, the instruction also defines the registers that hold the returned value. In this case, the instruction is equivalent to `asgn <regs>, <procId> <actual parameters>`.

5. `ret [<arithExp>]`
   The return instruction determines the end of a procedure along a path. If there is nothing to return, the subroutine is a procedure, otherwise it is a function.

There are also two pseudo high-level intermediate instructions that are used as intermediate instructions in the data flow analysis, but are eliminated by the end of the analysis. These instructions are:

1. `push <arithExp>`
   The `push` instruction places the associated arithmetic expression on a temporary stack.

2. `pop <ident>`
   The `pop` instruction takes the expression or identifier at the top of the temporary stack and assigns it to the identifier `<ident>`.
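The behaviour of the two pseudo-instructions can be pictured with the small C sketch below, which keeps the temporary stack as an array of expression strings: `hlPush` stores an expression, and `hlPop` removes the top entry and produces the equivalent assignment. The names, the fixed stack size, and the textual representation of expressions are assumptions made for illustration only.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define TMP_STACK_MAX 32
#define EXPR_LEN 64

/* Temporary stack used only during data flow analysis (illustrative). */
static char tmpStack[TMP_STACK_MAX][EXPR_LEN];
static int tmpTop = 0;

/* push <arithExp>: place the expression on the temporary stack. */
static void hlPush(const char *expr) {
    assert(tmpTop < TMP_STACK_MAX);
    snprintf(tmpStack[tmpTop++], EXPR_LEN, "%s", expr);
}

/* pop <ident>: take the expression at the top of the stack and write
   the equivalent assignment "ident = expr" into out. */
static void hlPop(const char *ident, char *out, size_t outLen) {
    assert(tmpTop > 0);
    snprintf(out, outLen, "%s = %s", ident, tmpStack[--tmpTop]);
}
```

Pushing `ax + 1` and later popping it into `cx` yields `cx = ax + 1`; this is the kind of substitution that allows the analysis to remove the pseudo-instructions altogether.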

**Implementation of High-Level Intermediate Code**

The high-level intermediate representation is implemented by triplets. A triplet makes the instruction opcode and its two expressions explicit, as shown in Figure 4-32. The `result` and `arg` fields are pointers to expressions; an expression, in its minimal form, is an identifier that points to the symbol table.

![Figure 4-32: General Representation of a Triplet](image)
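As a rough illustration of this layout, the C sketch below models a triplet as an opcode plus two expression pointers, and builds the triplet for `x := y op z`. All names (`HlOpcode`, `Expr`, `SymEntry`, `Triplet`, `mkAssign`) are hypothetical and are not taken from the thesis's implementation; expressions are reduced to identifiers and binary operators, and a symbol-table entry holds only a name.

```c
#include <stdlib.h>
#include <assert.h>

/* Opcodes of the high-level intermediate code, including pseudo-instructions. */
typedef enum { HL_ASGN, HL_JMP, HL_JCOND, HL_CALL, HL_RET, HL_PUSH, HL_POP } HlOpcode;

/* Symbol-table entry; only the identifier's name is modelled here. */
typedef struct { const char *name; } SymEntry;

typedef enum { EXPR_IDENT, EXPR_BINARY } ExprKind;

/* Expression tree node: an identifier leaf or a binary operator. */
typedef struct Expr {
    ExprKind kind;
    SymEntry *sym;            /* used when kind == EXPR_IDENT */
    char op;                  /* used when kind == EXPR_BINARY, e.g. '+' */
    struct Expr *lhs, *rhs;   /* operands of a binary expression */
} Expr;

/* Triplet: opcode plus result and arg expressions (NULL when unused). */
typedef struct {
    HlOpcode op;
    Expr *result;
    Expr *arg;
} Triplet;

/* Allocation failures are not handled in this sketch. */
static Expr *mkIdent(SymEntry *sym) {
    Expr *e = calloc(1, sizeof *e);
    e->kind = EXPR_IDENT;
    e->sym = sym;
    return e;
}

static Expr *mkBinary(char op, Expr *lhs, Expr *rhs) {
    Expr *e = calloc(1, sizeof *e);
    e->kind = EXPR_BINARY;
    e->op = op;
    e->lhs = lhs;
    e->rhs = rhs;
    return e;
}

/* x := y op z  ->  op = asgn, result = x, arg = (y op z) */
static Triplet mkAssign(SymEntry *x, char op, SymEntry *y, SymEntry *z) {
    Triplet t = { HL_ASGN, mkIdent(x), mkBinary(op, mkIdent(y), mkIdent(z)) };
    return t;
}
```

Triplets for the other instructions follow the same pattern; a `jmp`, for instance, is `{ HL_JMP, NULL, NULL }`, with its target kept on the out-edge of the basic block.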

An assignment statement `x := y op z` is represented in a triplet in the following way: the `op` field is the `asgn` opcode, the `result` field has a pointer to the identifier `x` (which in turn has a pointer to the identifier in the symbol table), and the `arg` field has a pointer to a binary expression; this expression is represented by an abstract syntax tree with pointers to the symbol table entries of `y` and `z`, as follows:

![Symbol Table](image)

In a similar way, a conditional jump statement `if a relop b` is represented in a triplet in the following way: the `op` field is the `jcond` opcode, the `result` field has a pointer to the abstract syntax tree of the relational test, and the `arg` field is left empty, as follows:

![Symbol Table](image)
An unconditional jump statement \texttt{goto L} does not use the \texttt{result} or \texttt{arg} field. The \texttt{op} field is set to \texttt{jmp}, and the other fields are left empty, as follows:

\[
\begin{array}{|c|c|c|}
\hline
\text{op} & \text{result} & \text{arg} \\
\hline
\texttt{jmp} & & \\
\hline
\end{array}
\]

A procedure call statement \texttt{procX (a,b)} uses the \texttt{op} field for the \texttt{call} opcode, the \texttt{result} field for the procedure's name, which is pointed to in the symbol table, and the \texttt{arg} field for the procedure arguments, which is a list of arguments that point to the symbol table, as follows:

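\[
\begin{array}{|c|c|c|}
\hline
\text{op} & \text{result} & \text{arg} \\
\hline
\texttt{call} & \texttt{procX} & \texttt{(a, b)} \\
\hline
\end{array}
\]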

A procedure return statement \texttt{ret a} uses the \texttt{op} field for the \texttt{ret} opcode, the \texttt{result} field for the identifier/expression that is being returned, and the \texttt{arg} field is left empty, as follows:

\[
\begin{array}{|c|c|c|}
\hline
\text{op} & \text{result} & \text{arg} \\
\hline
\texttt{ret} & \texttt{a} & \\
\hline
\end{array}
\]

The pseudo high-level instruction \texttt{push a} is stored in a triplet by using the \texttt{op} field as the \texttt{push} opcode, the \texttt{arg} field as the identifier that is being pushed, and the \texttt{result} field is left empty, as follows:

\[
\begin{array}{|c|c|c|}
\hline
\text{op} & \text{result} & \text{arg} \\
\hline
\texttt{push} & & \texttt{a} \\
\hline
\end{array}
\]

In a similar way, the \texttt{pop a} instruction is stored in a triplet using the \texttt{op} field for the \texttt{pop} opcode, the \texttt{result} field for the identifier, and the \texttt{arg} field for the expression that is being popped; this last field is initially left empty and is filled in during data flow analysis. The triplet representation is as follows:

\[
\begin{array}{|c|c|c|}
\hline
\text{op} & \text{result} & \text{arg} \\
\hline
\texttt{pop} & \texttt{a} & \text{(initially empty)} \\
\hline
\end{array}
\]

## 4.4 Control Flow Graph Generation

The control flow graph generation phase constructs a call graph of the source program, and a control flow graph of basic blocks for each subroutine of the program. These graphs are used to analyze the program in the universal decompiling machine (UDM) module. The interaction of this phase with the intermediate code generator and the UDM is shown in Figure 4-33.

### 4.4.1 Basic Concepts

This section gives the definitions from mathematics and graph theory that are used throughout the rest of this chapter. They are stated here to eliminate any ambiguity of terminology.

**Definition 4** A graph $G$ is a tuple $(V, E, h)$ where $V$ is a set of nodes, $E$ is a set of edges, and $h$ is the root of the graph. An edge is a pair of nodes $(v, w)$, with $v, w \in V$.

**Definition 5** A directed graph $G = (N, E, h)$ is a graph that has directed edges; i.e. each $(n_i, n_j) \in E$ has a direction, and is represented by $n_i \rightarrow n_j$.

**Definition 6** A path from $n_1$ to $n_m$ in graph $G = (N, E, h)$, represented $n_1 \rightarrow^* n_m$, is a sequence of edges $(n_1, n_2), (n_2, n_3), \ldots, (n_{m-1}, n_m) \in E$, $m \geq 1$.

**Definition 7** If $G = (V, E, h)$ is a graph, $V = \{h\}$, and $E = \emptyset$, then $G$ is called a trivial graph.

**Definition 8** If $G = (N, E, h)$ is a graph, and $\forall n \in N, h \rightarrow^* n$, then $G$ is a connected graph.

A connected graph is a graph in which all nodes can be reached from the header node. A sample directed, connected graph is shown in Figure 4-34.
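Definition 8 can be checked mechanically with a depth-first search from the header. The sketch below assumes a small graph stored as a boolean successor matrix (`Graph`, `isConnected` and `MAX_NODES` are illustrative names, not from the thesis); the graph is connected exactly when the search marks every node.

```c
#include <stdbool.h>
#include <assert.h>

#define MAX_NODES 16

/* Directed graph as a successor matrix: succ[i][j] is true if i -> j. */
typedef struct {
    int numNodes;
    bool succ[MAX_NODES][MAX_NODES];
} Graph;

/* Depth-first search marking every node reachable from n. */
static void markReachable(const Graph *g, int n, bool visited[]) {
    visited[n] = true;
    for (int m = 0; m < g->numNodes; m++)
        if (g->succ[n][m] && !visited[m])
            markReachable(g, m, visited);
}

/* Definition 8: G is connected if every node is reachable from header h. */
static bool isConnected(const Graph *g, int h) {
    bool visited[MAX_NODES] = { false };
    markReachable(g, h, visited);
    for (int n = 0; n < g->numNodes; n++)
        if (!visited[n])
            return false;
    return true;
}
```

Using the edges of Figure 4-34 as given by the predecessor-successor table of Example 6, the graph is connected from $v_1$ but not from $v_4$, since $v_1$ has no predecessors.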

**Graph Representation**

A graph \( G = (V, E, h) \) can be represented in several different ways, including incidence matrices, adjacency matrices, and predecessor-successor tables.

**Definition 9** The incidence matrix for a graph \( G = (V, E, h) \) is the \( v \times e \) matrix \( M(G) = [m_{ij}] \), where \( m_{ij} \) is the number of times (0, 1 or 2) that vertex \( v_i \) and edge \( e_j \) are incident.

**Definition 10** The adjacency matrix for a graph \( G = (V, E, h) \) is the \( v \times v \) matrix \( A(G) = [a_{ij}] \), where \( a_{ij} \) is the number of edges joining vertices \( v_i \) and \( v_j \).

**Definition 11** The predecessor-successor table for a graph \( G = (V, E, h) \) is the \( v \times 2 \) table \( T(G) = [t_{i1}, t_{i2}] \), where \( t_{i1} \) is the list of predecessor vertices of vertex \( v_i \), and \( t_{i2} \) is the list of successor vertices of vertex \( v_i \).

**Example 6** The graph in Figure 4-34 is represented by the following matrices:

- **Incidence matrix:**

  \[
  \begin{array}{c|cccccccc}
  & e_1 & e_2 & e_3 & e_4 & e_5 & e_6 & e_7 & e_8 \\
  \hline
  v_1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
  v_2 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \\
  v_3 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 2 \\
  v_4 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\
  \end{array}
  \]

- **Adjacency matrix:**

  \[
  \begin{array}{c|cccc}
  & v_1 & v_2 & v_3 & v_4 \\
  \hline
  v_1 & 0 & 1 & 1 & 1 \\
  v_2 & 1 & 0 & 1 & 2 \\
  v_3 & 1 & 1 & 1 & 1 \\
  v_4 & 1 & 2 & 1 & 0 \\
  \end{array}
  \]

- **Predecessor-successor table:**

  \[
  \begin{array}{ccc}
  & \text{Predecessor} & \text{Successor} \\
  v_1 & \emptyset & \{v_2, v_3, v_4\} \\
  v_2 & \{v_1, v_4\} & \{v_3, v_4\} \\
  v_3 & \{v_1, v_2, v_3\} & \{v_3, v_4\} \\
  v_4 & \{v_1, v_2, v_3\} & \{v_2\} \\
  \end{array}
  \]
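All three representations can be derived from the same edge list. The following C sketch does so for a directed edge list; since Figure 4-34 is not reproduced here, the edge numbering and directions used in the test (e1 = (v1, v2), e2 = (v1, v3), e3 = (v1, v4), e4 = (v2, v3), e5 = (v2, v4), e6 = (v4, v2), e7 = (v3, v4), e8 = (v3, v3)) are inferred from the matrices above and should be treated as an assumption. Following Example 6, the adjacency matrix counts each edge in both rows, and a self-loop counts once in the adjacency matrix and twice in the incidence matrix. The type and function names are illustrative.

```c
#include <string.h>
#include <assert.h>

#define MAXV 8
#define MAXE 16

typedef struct { int from, to; } Edge;   /* directed edge from -> to */

typedef struct {
    int incidence[MAXV][MAXE];             /* M(G): 0, 1 or 2 (self-loop) */
    int adjacency[MAXV][MAXV];             /* A(G): edges joining vi and vj */
    int pred[MAXV][MAXV], numPred[MAXV];   /* predecessor lists */
    int succ[MAXV][MAXV], numSucc[MAXV];   /* successor lists */
} GraphTables;

/* Append w to list[0..*n) unless it is already present. */
static void addUnique(int list[], int *n, int w) {
    for (int i = 0; i < *n; i++)
        if (list[i] == w)
            return;
    list[(*n)++] = w;
}

static void buildTables(const Edge edges[], int numEdges, GraphTables *t) {
    memset(t, 0, sizeof *t);
    for (int e = 0; e < numEdges; e++) {
        int u = edges[e].from, v = edges[e].to;
        t->incidence[u][e]++;
        t->incidence[v][e]++;          /* a self-loop (u == v) counts twice */
        t->adjacency[u][v]++;
        if (u != v)
            t->adjacency[v][u]++;      /* undirected count, as in Example 6 */
        addUnique(t->succ[u], &t->numSucc[u], v);
        addUnique(t->pred[v], &t->numPred[v], u);
    }
}
```

With the vertices numbered 0 to 3 and the edges 0 to 7, `buildTables` reproduces the three tables of Example 6.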

### 4.4.2 Basic Blocks

In this section we formalize the definition of basic blocks. In order to characterize it, we need some more definitions of program structure. We start by defining the components of a program, that is, data and instructions. Note that this definition does not associate instructions or data with memory locations.
**Definition 12** Let

- \( P \) be a program
- \( I = \{i_1, \ldots, i_n\} \) be the instructions of \( P \)
- \( D = \{d_1, \ldots, d_m\} \) be the data of \( P \)

Then \( P = I \cup D \)

For the purpose of this research, programs are restricted to containing no self-modifying code and to making no use of data as instructions or vice versa (\( I(P) \cap D(P) = \emptyset \)). An instruction sequence is a sequence of instructions physically located one after the other in memory.

**Definition 13** Let

- \( P \) be a program
- \( I = \{i_1, \ldots, i_n\} \) be the instructions of \( P \)

Then \( S \) is an instruction sequence if and only if

\[ S = [i_j, i_{j+1}, \ldots, i_{j+k}] \ni 1 \leq j \leq j + k \leq n \wedge i_{l+1} \text{ is in a memory location consecutive to } i_l, \forall\, j \leq l < j + k. \]

Intermediate instructions are classified in two sets for the purposes of control flow graph generation:

- Transfer Instructions (TI): the set of instructions that transfer flow of control to an address in memory different from the address of the next instruction. These instructions are:
  - Unconditional jumps: the flow of control is transferred to the target jump address.
  - Conditional jumps: the flow of control is transferred to the target jump address if the condition is true, otherwise the control is transferred to the next instruction in the sequence.
  - Indexed jumps: the flow of control is transferred to one of many target addresses.
  - Subroutine call: the flow of control is transferred to the invoked subroutine.
  - Subroutine return: the flow of control is transferred to the subroutine that invoked the subroutine with the return instruction.
  - End of program: the program ends.

- Non-transfer instructions (NTI): the set of instructions that transfer control to the next instruction in the sequence, i.e. all instructions that do not belong to the TI set.

Having classified the intermediate instructions, a basic block is defined in terms of its instructions:

**Definition 14** A basic block $b = [i_1, \ldots, i_{n-1}, i_n], n \geq 1$ is an instruction sequence that satisfies either of the following sets of conditions:

1. $i_1, \ldots, i_{n-1} \in NTI$
2. $i_n \in TI$

or

1. $i_1, \ldots, i_{n-1}, i_n \in NTI$

2. $i_{n+1}$ is the first instruction of another basic block.

A basic block is a sequence of instructions that has one entry point and one exit point. If one instruction of the basic block is executed, all other instructions are executed as well.

The set of instructions of a program can be uniquely partitioned into a set of non-overlapping basic blocks, starting from the program’s entry point.

**Definition 15** Let

- $I$ be the instructions of program $P$
- $h$ be $P$’s entry point

Then $\exists B = \{b_1, \ldots, b_n\}$ such that $b_i \cap b_j = \emptyset$ for all $i \neq j$, $I = b_1 \cup b_2 \cup \cdots \cup b_n$, and the entry point of $b_1$ is $h$.
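A common way to compute this partition, sketched here in C, is to mark the *leaders* (the entry point, every jump target, and every instruction that follows a transfer instruction) and to start a new basic block at each one. The `Insn`/`InsnClass` encoding and the `partition` function are assumptions for illustration; the targets of indexed jumps (case tables) would also be leaders but are not modelled.

```c
#include <stdbool.h>
#include <assert.h>

#define MAX_INSNS 64

/* Instruction classes used for control flow graph generation. */
typedef enum { I_NTI, I_JMP, I_JCOND, I_INDEXED, I_CALL, I_RET, I_END } InsnClass;

typedef struct {
    InsnClass cls;
    int target;     /* jump target index for I_JMP / I_JCOND, else -1 */
} Insn;

static bool isTransfer(InsnClass c) { return c != I_NTI; }

/* Writes the index of the first instruction of each basic block to
   starts[] (in address order) and returns the number of blocks. */
static int partition(const Insn code[], int n, int starts[]) {
    bool leader[MAX_INSNS] = { false };
    assert(n > 0 && n <= MAX_INSNS);
    leader[0] = true;                          /* program entry point */
    for (int i = 0; i < n; i++) {
        if ((code[i].cls == I_JMP || code[i].cls == I_JCOND) &&
            code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = true;     /* branch target */
        if (isTransfer(code[i].cls) && i + 1 < n)
            leader[i + 1] = true;              /* follows a transfer */
    }
    int numBlocks = 0;
    for (int i = 0; i < n; i++)
        if (leader[i])
            starts[numBlocks++] = i;
    return numBlocks;
}
```

For the eight-instruction program `nti; call; nti; nti; jcond 7; nti; jmp 3; ret`, the leaders are instructions 0, 2, 3, 5 and 7, giving a call, fall, 2-way, 1-way and return block respectively. Each block extends from its leader up to the instruction before the next leader, so the blocks are disjoint and cover all instructions, as Definition 15 requires.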

### 4.4.3 Control Flow Graphs

A control flow graph is a directed graph that represents the flow of control of a program; thus, it represents only the instructions of the program, not its data. The nodes of this graph represent basic blocks of the program, and the edges represent the flow of control between nodes. More formally,

**Definition 16** A control flow graph $G = (N, E, h)$ for a program $P$ is a connected, directed graph, that satisfies the following conditions:

- $\forall n \in N$, $n$ represents a basic block of $P$.
- $\forall e = (n_i, n_j) \in E$, $e$ represents flow of control from one basic block to another, and $n_i, n_j \in N$.
- $\exists f : B \rightarrow N$ such that $\forall b_i \in B$, $f(b_i) = n_k$ for some $n_k \in N$, and $\nexists b_j \in B$, $b_j \neq b_i$, with $f(b_j) = n_k$ (i.e. each basic block is represented by exactly one node, and no node represents more than one basic block).

For the purpose of control flow graph (cfg) generation, basic blocks are classified into different types, according to the last instruction of the basic block. The available types of basic blocks are:

- 1-way basic block: the last instruction in the basic block is an unconditional jump. The block has one out-edge.

- 2-way basic block: the last instruction is a conditional jump, thus, the block has two out-edges.

- $n$-way basic block: the last instruction is an indexed jump. The $n$ branches located in the case table become the $n$ out-edges of this node.
- call basic block: the last instruction is a call to a subroutine. There are two out-edges from this block: one to the instruction following the subroutine call (if the subroutine returns), and the other to the subroutine that is called.

- return basic block: the last instruction is a procedure return or an end of program. There are no out-edges from this basic block.

- fall basic block: the next instruction is the target address of a branching instruction (i.e. the next instruction has a label). This node falls through to the next one; thus, there is only one out-edge.

The different types of basic blocks are represented in a control flow graph by named nodes, as shown in Figure 4-35. Whenever a node is not named in a graph, the type of the basic block is either irrelevant or obvious from the context (i.e. the exact number of out-edges is specified in the graph).
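Since the classification depends only on the last instruction of each block, it reduces to a switch. In this sketch the enumerations and function names are illustrative, and a block ending in a non-transfer instruction is assumed to end there only because the next instruction is a leader, which makes it a fall block.

```c
#include <assert.h>

typedef enum { I_NTI, I_JMP, I_JCOND, I_INDEXED, I_CALL, I_RET, I_END } InsnClass;
typedef enum { BB_ONE_WAY, BB_TWO_WAY, BB_N_WAY, BB_CALL, BB_RETURN, BB_FALL } BBType;

/* Basic block type, determined by the block's last instruction. */
static BBType classify(InsnClass last) {
    switch (last) {
    case I_JMP:     return BB_ONE_WAY;
    case I_JCOND:   return BB_TWO_WAY;
    case I_INDEXED: return BB_N_WAY;
    case I_CALL:    return BB_CALL;
    case I_RET:
    case I_END:     return BB_RETURN;
    default:        return BB_FALL;   /* next instruction carries a label */
    }
}

/* Number of out-edges; caseEntries (size of the case table) is only
   used for n-way blocks. */
static int numOutEdges(BBType t, int caseEntries) {
    switch (t) {
    case BB_ONE_WAY: return 1;
    case BB_TWO_WAY: return 2;
    case BB_N_WAY:   return caseEntries;
    case BB_CALL:    return 2;        /* fall-through successor and callee */
    case BB_RETURN:  return 0;
    case BB_FALL:    return 1;
    }
    return 0;
}
```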

**Example 7** Consider the following fragment of code:

```
0      PUSH   bp
1      MOV    bp, sp
2      SUB    sp, 4
3      MOV    ax, 0Ah
4      MOV    [bp-2], ax
5      MOV    [bp-4], ax
6      LEA    ax, [bp-4]
7      PUSH   ax
8      CALL   near ptr proc_1
9      POP    cx
10 L1: MOV    ax, [bp-4]
11     CMP    ax, [bp-2]
12     JNE    L2
13     PUSH   word ptr [bp-4]
14     MOV    ax, 0AAh
15     PUSH   ax
16     CALL   near ptr printf
```
This code has the following basic blocks:

<table>
<thead>
<tr>
<th>Basic Block Type</th>
<th>Instruction Extent</th>
</tr>
</thead>
<tbody>
<tr>
<td>call</td>
<td>0 to 8</td>
</tr>
<tr>
<td>fall</td>
<td>9</td>
</tr>
<tr>
<td>2w</td>
<td>10 to 12</td>
</tr>
<tr>
<td>call</td>
<td>13 to 16</td>
</tr>
<tr>
<td>ret</td>
<td>17 to 21</td>
</tr>
<tr>
<td>2w</td>
<td>22 to 24</td>
</tr>
<tr>
<td>call</td>
<td>25 to 27</td>
</tr>
<tr>
<td>1w</td>
<td>28 to 29</td>
</tr>
</tbody>
</table>

The control flow graph that represents these instructions is shown in Figure 4-36.

From here onwards, the word graph is used to represent a control flow graph, and the word node is used to represent a basic block, unless otherwise stated.

**Implementation**

Control flow graphs have, on average, close to two out-edges per node; thus, a matrix representation (e.g. incidence and adjacency matrices) is very sparse and memory inefficient (i.e. most of the matrix is zero). It is therefore better to implement control flow graphs as predecessor-successor tables, so that only the edges that exist in the graph are represented. Note that the successors alone are enough to represent the complete graph; the predecessors are also stored to make access to the graph easier during the different traversals of the graph.

If the size of the graph is unknown (i.e. the number of nodes is not fixed), the graph can be constructed dynamically as a pointer to a basic block, which has a list of predecessors and a list of successors attached to it. The predecessors and successors are pointers to basic block nodes as well; in this way, each basic block is represented only once. This representation is feasible in any high-level language that allows dynamic allocation of memory. Consider the C definition of a basic block in Figure 4-37. The BB structure defines a basic block node; numInEdges and numOutEdges hold the number of predecessor and successor nodes, respectively; **inEdges** is a dynamically allocated array of pointers to predecessor basic blocks; and **outEdges** is a dynamically allocated array of pointers to successor basic blocks. In this representation, a graph is a pointer to the header basic block (i.e. a PBB).

```c
typedef unsigned char byte;  /* Assumed 8-bit unsigned type; not defined in the original figure */

typedef struct _BB{
    byte    nodeType;    /* Type of node */
    int     numInEdges;  /* Number of in edges */
    struct _BB **inEdges; /* Array of pointers to predecessors */
    int     numOutEdges; /* Number of out-edges */
    struct _BB **outEdges; /* Array of pointers to successors */
    /* other fields go here */
} BB;

typedef BB *PBB;   /* Pointer to a basic block */
```

Figure 4-37: Basic Block Definition in C
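To show how the dynamically allocated edge arrays of Figure 4-37 might be maintained, the sketch below repeats the structure (with an assumed definition of `byte`) and adds an edge between two nodes, growing both arrays with `realloc` so that the successor list of one node and the predecessor list of the other stay consistent. `newBB`, `appendPBB` and `addEdge` are hypothetical helpers, not part of the thesis's code.

```c
#include <stdlib.h>
#include <assert.h>

typedef unsigned char byte;   /* assumed: an 8-bit unsigned type */

typedef struct _BB {
    byte    nodeType;          /* Type of node */
    int     numInEdges;        /* Number of in-edges */
    struct _BB **inEdges;      /* Array of pointers to predecessors */
    int     numOutEdges;       /* Number of out-edges */
    struct _BB **outEdges;     /* Array of pointers to successors */
} BB;

typedef BB *PBB;

/* Allocate a node with empty edge arrays (NULL on failure). */
static PBB newBB(byte nodeType) {
    PBB b = malloc(sizeof *b);
    if (b != NULL) {
        b->nodeType = nodeType;
        b->numInEdges = b->numOutEdges = 0;
        b->inEdges = b->outEdges = NULL;
    }
    return b;
}

/* Append p to the dynamic array *arr of length *n; 0 on success. */
static int appendPBB(PBB **arr, int *n, PBB p) {
    PBB *grown = realloc(*arr, (size_t)(*n + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;             /* *arr is left unchanged */
    grown[*n] = p;
    *arr = grown;
    (*n)++;
    return 0;
}

/* Add the edge from -> to in both the successor list of `from`
   and the predecessor list of `to`. */
static int addEdge(PBB from, PBB to) {
    if (appendPBB(&from->outEdges, &from->numOutEdges, to) != 0)
        return -1;
    if (appendPBB(&to->inEdges, &to->numInEdges, from) != 0) {
        from->numOutEdges--;   /* undo the successor entry */
        return -1;
    }
    return 0;
}
```

Because both lists hold pointers to the same `BB` objects, each basic block is still stored only once, however many edges refer to it.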
**Graph Optimization**

One-pass compilers generate machine code that contains redundant or unnecessary jumps in the form of jumps to jumps, conditional jumps to jumps, and jumps to conditional jumps. These unnecessary jumps can be eliminated by a peephole optimization on flow of control; however, this optimization is not always performed.

Peephole optimization is a method for improving the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence of instructions. The peephole is a small, moving window of target code; the code in the peephole does not need to be contiguous. Each improvement made through a peephole optimization may spawn opportunities for additional improvements; thus, repeated passes over the code are necessary.

Flow-of-control optimization is the method by which redundant jumps are eliminated. For decompilation, we are interested in eliminating all jumps to jumps and conditional jumps to jumps, as the intermediate jump only holds the address of the final target branch, and the intermediate basic block that contains it can be bypassed and possibly removed from the graph. The removal of jumps to conditional jumps is not desired, as it involves the rearrangement of several instructions, not just the modification of one target branch address.

In the following sequence, control jumps to label Lx only to jump to label Ly immediately afterwards, without any other instruction being executed between the two jumps:

```
jmp Lx
...
Lx: jmp Ly
```

This sequence is replaced by the sequence

```
jmp Ly
...
Lx: jmp Ly
```

where the first jump branches directly to the target label Ly, rather than to the intermediate label Lx. The number of predecessors of the basic block starting at Lx is decremented by one, and the number of predecessors of the block starting at Ly is incremented by one, to reflect the change of edges in the graph. If at any point the number of predecessors of the basic block starting at label Lx becomes zero, the node is removed from the graph, because it is unreachable and thus was unnecessary in the first place.

In a similar way, a conditional jump to a jump, as in the following sequence

```
jZ Lx
...
Lx: jmp Ly
```

is replaced by the code sequence

```
jZ Ly
...
Lx: jmp Ly
```
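The retargeting and predecessor bookkeeping for both cases can be sketched over a simplified node array, as below. The `Node` layout and `collapseJumpChains` are assumptions for illustration: only branch targets are modelled (not fall-through edges), a jump whose chain leads back to itself is conservatively left alone, and the number of passes is capped at the number of nodes, because a cycle of lone jumps could otherwise be retargeted forever.

```c
#include <stdbool.h>
#include <assert.h>

/* Simplified node for the sketch; fall-through edges are not modelled. */
typedef struct {
    bool isJump;    /* block ends in jmp or a conditional jump */
    bool onlyJmp;   /* block consists of a single unconditional jmp */
    int  target;    /* index of the branch-target block, or -1 */
    int  numPreds;  /* number of in-edges */
    bool removed;   /* block found unreachable and deleted */
} Node;

/* Retarget "jmp Lx" and "jcc Lx", where block Lx is just "jmp Ly",
   so that they branch to Ly, updating predecessor counts. */
static void collapseJumpChains(Node g[], int n, int header) {
    bool changed = true;
    for (int pass = 0; changed && pass < n; pass++) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (g[i].removed || !g[i].isJump || g[i].target < 0)
                continue;
            int lx = g[i].target;
            int ly = g[lx].target;
            if (g[lx].removed || !g[lx].onlyJmp || ly < 0 || ly == lx || ly == i)
                continue;
            g[i].target = ly;               /* jmp Lx becomes jmp Ly */
            g[ly].numPreds++;
            if (--g[lx].numPreds == 0 && lx != header) {
                g[lx].removed = true;       /* Lx is now unreachable */
                g[ly].numPreds--;           /* its edge Lx -> Ly disappears */
            }
            changed = true;
        }
    }
}
```

In the test, block 0 ends in `jmp Lx`, block 3 ends in `jZ Lx`, block 1 is `Lx: jmp Ly`, and block 2 is `Ly`. After the pass, both jumps branch to `Ly`, `Lx` is removed, and `Ly` is left with two predecessors.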
**The Call Graph**

The call graph is a mathematical representation of the subroutines of a program. Each node represents a subroutine, and each edge represents a call to another subroutine. More formally,

**Definition 17** Let $\mathcal{P} = \{p_1, p_2, \ldots\}$ be the finite set of procedures of a program. A call graph $C$ is a tuple $(N, E, h)$, where $N$ is the set of procedures and $n_i \in N$ represents one and only one $p_i \in \mathcal{P}$, $E$ is the set of edges and $(n_i, n_j) \in E$ represents one or more references of $p_i$ to $p_j$, and $h$ is the main procedure.

The construction of the call graph is simple if the invoked subroutines are statically bound to subroutine constants, that is, if the program does not make use of procedure parameters or procedure variables. The presence of recursion introduces cycles in the call graph. An efficient algorithm to construct the call graph in the presence of procedure parameters, for languages that do not have recursion, is given in [Ryd79]. This method was later extended to support recursion, and is explained in [CCHK90]. Finally, a method that handles procedure parameters and a limited type of procedure variables is described in [HK92].

It should be noted that call graph construction is not able to reconstruct all graphs to their original form, as a compiler optimisation could have changed a call instruction into an unconditional jump. In these cases, the call graph (and hence the decompiler) will treat the code of both subroutines as one, unless the invoked subroutine is also called elsewhere via a call instruction.
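For the simple, statically bound case, the call graph can be accumulated from the call sites of each procedure, with one edge per caller/callee pair as in Definition 17; recursion then shows up as a cycle. This C sketch (with hypothetical names `CallGraph`, `addCall` and `hasRecursion`) handles only that case; procedure parameters and procedure variables need the methods cited above.

```c
#include <stdbool.h>
#include <assert.h>

#define MAX_PROCS 16

/* Call graph per Definition 17: a single edge (i, j) however many times
   procedure i references procedure j. Node 0 is the main procedure. */
typedef struct {
    int numProcs;
    bool calls[MAX_PROCS][MAX_PROCS];
} CallGraph;

static void addCall(CallGraph *cg, int caller, int callee) {
    cg->calls[caller][callee] = true;   /* repeated calls add no new edge */
}

static int numEdges(const CallGraph *cg) {
    int e = 0;
    for (int i = 0; i < cg->numProcs; i++)
        for (int j = 0; j < cg->numProcs; j++)
            e += cg->calls[i][j];
    return e;
}

/* Three-colour DFS: an edge to a GREY node (still on the DFS stack)
   closes a cycle, i.e. the program is recursive. */
enum { WHITE, GREY, BLACK };

static bool dfsCycle(const CallGraph *cg, int p, int colour[]) {
    colour[p] = GREY;
    for (int q = 0; q < cg->numProcs; q++) {
        if (!cg->calls[p][q])
            continue;
        if (colour[q] == GREY)
            return true;
        if (colour[q] == WHITE && dfsCycle(cg, q, colour))
            return true;
    }
    colour[p] = BLACK;
    return false;
}

static bool hasRecursion(const CallGraph *cg) {
    int colour[MAX_PROCS] = { WHITE };
    for (int p = 0; p < cg->numProcs; p++)
        if (colour[p] == WHITE && dfsCycle(cg, p, colour))
            return true;
    return false;
}
```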
# Chapter 5: Data Flow Analysis

The low-level intermediate code generated by the front-end is an assembler-type representation that makes use of registers and condition codes. This representation can be transformed into a higher level representation that does not make use of such low-level concepts, and that regenerates the high-level concept of expression. The transformation of low-level to high-level intermediate code is done by means of program transformations, traditionally referred to as optimizations. These transformations are applied to the low-level intermediate code, to transform it into the high-level intermediate code described in Chapter 4, Section 4.3.2. The relation of this phase with the front-end and the control flow analysis phase is shown in Figure 5-1.

![Diagram of the Data Flow Analysis Phase](image)

Figure 5-1: Context of the Data Flow Analysis Phase

The types of transformations that are required by the data flow analysis phase include the elimination of useless instructions, the elimination of condition codes, the determination of register arguments and function return register(s), the elimination of registers and intermediate instructions by the regeneration of expressions, the determination of actual parameters, and the propagation of data types across subroutine calls. Most of these transformations are required to improve the quality of the low-level intermediate code, and to reconstruct some of the information lost during the compilation process. The elimination of useless instructions is required even for code generated by optimising compilers, whenever the machine has instructions that perform more than one function at a time (for an example, refer to Section 5.2.1).

Conventional data flow analysis is the process of collecting information about the way variables are used in a program, and summarizing it in the form of sets. This information is used by the decompiler to transform and improve the quality of the intermediate code. Several properties are required by code-improving transformations, including [ASU86b]:

1. A transformation must preserve the meaning of programs.
2. A transformation must be worth the effort.

Techniques for decompilation optimization of the intermediate code are presented in this chapter. The transformations are first illustrated by means of examples, and algorithms are later provided for each optimization.

## 5.1 Previous Work

Not much work has been done in the area of data flow analysis for decompilers, mainly due to the restrictions placed on many of the decompilers available in the literature: decompilation of assembler source files\cite{Hou73,Fri74,Wor78,Bri81}, decompilation of object files with symbolic debugging information\cite{Reu88}, and the requirement of a compiler specification to build a decompiler\cite{BB91,Bow93,BB93}. Data flow analysis is essential when decompiling pure binary files, as there is no extra information on the way data is used or on its type. The following sections summarize all the work that has been done in this area.

### 5.1.1 Elimination of Condition Codes

A program which translates microprocessor object code (i8085) into a behaviorally equivalent PL/1 program was described by Marshall and Zobrist\cite{MZBR85}, and was used for electronic system simulation. The final PL/1 programs contained a large number of statements that defined flags, even if these flags were not used or referenced later in the program. This prompted DeJean and Zobrist to formulate an optimization of flag definitions by means of a reach algorithm\cite{DZ89}. This method eliminated over 50% of the flag definitions in the translation process, generating PL/1 programs that defined only the flags needed by a later condition.

*The method presented in this thesis goes beyond this optimization of flag definitions: it not only determines which flag definitions are extraneous and therefore unnecessary, but also determines which Boolean conditional expression is represented by the combined set of instructions that define and use the flag. In this way, the target HLL program does not rely on the concept of flags, just as real HLL programs do not.*

### 5.1.2 Elimination of Redundant Loads and Stores

A method of *text compression* was presented by Housel\cite{Hou73} for the elimination of intermediate loads and stores. This method works on a 3-address intermediate representation of the program, and consists of two stages: forward-substitution and backward-substitution. The former stage substitutes the source operand of an assignment instruction into a subsequent instruction that uses the same result operand, if the result is found to be not busy within the same basic block. The latter stage substitutes the result operand of an assignment instruction into a previous instruction (other than an assignment instruction) that defines as result operand the source operand of the assignment instruction under consideration. This method provided a reduction in the number of instructions of up to 40% in assembly code compiled by Knuth’s MIXAL compiler.
A method of expression condensation was described by Hopwood [Hop78] to combine 2 or more intermediate instructions into an equivalent expression by means of forward substitution. This method specifies 5 necessary conditions and 6 sufficient conditions under which forward substitution of a variable or register can be performed. This method was based on variable usage analysis. The great number of conditions is inherent to the choice of control flow graph: one node per intermediate instruction, rather than basic blocks. This meant that variables were forward substituted across node boundaries, making the whole process much more complex than required.

The interprocedural data flow analyses presented in this thesis define two sufficient conditions under which a register can be substituted or replaced into another instruction, including such intermediate instructions as push and pop. This method not only finds expressions by eliminating intermediate registers and instruction definitions, but also determines actual parameters of subroutines, values returned from functions, and eliminates pseudo high-level instructions. The method is based on the initial high-level intermediate representation of the binary program, which is semantically equivalent to the low-level intermediate representation, and transforms it into a HLL representation.

## 5.2 Types of Optimizations

This section presents the code-improving transformations used by a decompiler. The techniques used to implement these transformations are explained in Sections 5.3 and 5.4. The optimizations presented in this section make use of the example flow graph in Figure 5-2, where basic blocks B1 ... B4 belong to the main program, and blocks B5 ... B7 belong to the subroutine _aNlshl (a runtime support routine). In the main program, registers si and di are used as register variables, and have been flagged by the parser as possibly being so (see Chapter 4, Section 4.2.1).

The aim of these optimizations is to eliminate the low-level language concepts of condition codes, registers, and intermediate instructions, and to introduce the high-level concept of expressions of more than two operands. For this purpose, it is noted that push instructions are used in a variety of ways by today’s compilers. Parameter passing is the most common use of this instruction: parameters are pushed before the subroutine call, in the order specified by the calling convention in use. Pushes are also used for register spilling whenever the compiler runs out of registers to compute an expression. `push` and `pop` are also used to preserve the contents of registers across procedure calls, and to copy values into registers.

### 5.2.1 Dead-Register Elimination

An identifier is dead at a point in a program if its value is not used following its definition at that point (i.e. it is not used before being redefined). An instruction that defines only dead identifiers is said to be useless, and thus can be eliminated from the code. Consider the following code from basic block B1, Figure 5-2:

```
6  ax = tmp / di
7  dx = tmp % di
8  dx = 3
9  dx:ax = ax * dx
10 si = ax
```

Instruction 6 defines register \texttt{ax}, instruction 7 defines register \texttt{dx}, and instruction 8 redefines register \texttt{dx}. There is no use of register \texttt{dx} between the definition at instruction 7 and instruction 8, thus, the definition of register \texttt{dx} at instruction 7 is dead, and this instruction becomes useless since it defines only register \texttt{dx}. The previous sequence of instructions is replaced by the following code:

```
6  ax = tmp / di
8  dx = 3
9  dx:ax = ax * dx
10 si = ax
```

The definition of register \texttt{dx} at instruction 8 is used in the multiplication of instruction 9, where the register is redefined, as well as register \texttt{ax}. Instruction 10 uses register \texttt{ax}, and there are no further uses of register \texttt{dx} before it is redefined at instruction 13; thus, the definition of \texttt{dx} at instruction 9 is dead and must be eliminated. Since instruction 9 defines not only \texttt{dx} but also \texttt{ax}, and \texttt{ax} is not dead, the instruction is not useless, as it still defines a live register; therefore, the instruction is modified to reflect the fact that only register \texttt{ax} is defined, as follows:

```
6  ax = tmp / di
8  dx = 3
9  ax = ax * dx
10 si = ax
```
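The reasoning above is a backward liveness pass over the basic block. The C sketch below encodes registers as bits (`R_AX`, `R_DX` and so on, plus `R_TMP` for the `tmp` pseudo-register) and marks an instruction useless when none of its definitions is live; otherwise it trims the definition set to the live registers. The encoding and names are illustrative, and the test assumes that only `si` is live on exit from the fragment.

```c
#include <stdbool.h>
#include <assert.h>

/* Registers as bits of a mask; only those of the example are modelled. */
enum { R_AX = 1 << 0, R_DX = 1 << 1, R_SI = 1 << 2, R_DI = 1 << 3, R_TMP = 1 << 4 };

typedef struct {
    unsigned def;    /* registers defined */
    unsigned use;    /* registers used */
    bool useless;    /* set when every definition is dead */
} RegInsn;

/* Backward pass over one basic block, given the registers live on exit.
   Returns the registers live on entry to the block. */
static unsigned eliminateDeadRegs(RegInsn b[], int n, unsigned liveOut) {
    unsigned live = liveOut;
    for (int i = n - 1; i >= 0; i--) {
        if (b[i].def != 0 && (b[i].def & live) == 0) {
            b[i].useless = true;      /* e.g. 7  dx = tmp % di */
            continue;                 /* its operands do not become live */
        }
        b[i].def &= live;             /* e.g. dx:ax = ... becomes ax = ... */
        live = (live & ~b[i].def) | b[i].use;
    }
    return live;
}
```

Applied to instructions 6 to 10 with `si` live on exit, the pass marks instruction 7 useless and reduces the definition set of instruction 9 from `dx:ax` to `ax`, matching the result derived above.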

### 5.2.2 Dead-Condition Code Elimination

In a similar way to dead-register elimination, a condition code is dead at a point in a program if its value is not used before redefinition. In this case, the definition of the condition code is useless; however, the instruction that defines this condition code is still useful if the identifiers that it defines are not dead, so the instruction itself is not necessarily eliminated. Consider the following code from basic block B1, Figure 5-2:

```
14  cmp [bp-6]:[bp-8], dx:ax ; cc-def = ZF, CF, SF
15  jg B2                     ; cc-use = SF
```

Instruction 14 defines the condition codes zero (ZF), carry (CF) and sign (SF). Instruction 15 uses the sign condition code. Neither of the two successor basic blocks makes use of the carry or zero condition codes before redefinition; thus, the definition of these condition codes in instruction 14 is useless and can be eliminated. Instruction 14 is updated to hold the following information:

```
14  cmp [bp-6]:[bp-8], dx:ax ; cc-def = SF
```
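The same backward pass works for condition codes, with two differences: the instruction itself is never deleted (only its flag definitions are trimmed), and the flags live on exit are taken as the union of those live on entry to the successor blocks. In this sketch the flag masks and function names are assumptions, and the test assumes that neither successor of B1 uses any flag before redefining it.

```c
#include <assert.h>

/* Condition codes as bits of a mask (only the flags of the example). */
enum { CC_CF = 1 << 0, CC_ZF = 1 << 1, CC_SF = 1 << 2 };

typedef struct {
    unsigned ccDef;   /* condition codes defined by the instruction */
    unsigned ccUse;   /* condition codes used by the instruction */
} CcInsn;

/* Flags live on exit: union of those live on entry to each successor. */
static unsigned liveOutFromSuccs(const unsigned succLiveIn[], int numSuccs) {
    unsigned live = 0;
    for (int s = 0; s < numSuccs; s++)
        live |= succLiveIn[s];
    return live;
}

/* Drop every flag definition that is not used before redefinition.
   Returns the flags live on entry to the block. */
static unsigned eliminateDeadCc(CcInsn b[], int n, unsigned liveOut) {
    unsigned live = liveOut;
    for (int i = n - 1; i >= 0; i--) {
        unsigned killed = b[i].ccDef;   /* a definition kills the old value */
        b[i].ccDef &= live;             /* keep only definitions still needed */
        live = (live & ~killed) | b[i].ccUse;
    }
    return live;
}
```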

### 5.2.3 Condition Code Propagation

Condition codes are flags used by the machine to signal the occurrence of a condition. In general, many machine instructions set these flags, each one setting from 1 to 3 different flags, while fewer instructions make use of those flags, each using only 1 or 2 of them. After dead-condition code elimination, the excess definitions of condition codes have been eliminated; thus, all remaining flag definitions are used by subsequent instructions. Consider the following code from basic block B1, Figure 5-2, after dead-condition code elimination:

```
14  cmp [bp-6]:[bp-8], dx:ax ; cc-def = SF
15  jg B2                     ; cc-use = SF
```

Instruction 14 defines the sign flag by comparing two operands, and instruction 15 uses this flag to determine whether the first operand of the previous instruction was greater than the second operand. These two instructions are functionally equivalent to a high-level conditional jump instruction that checks for an operand being greater than a second operand. The instructions can be replaced by:

```
15  jcond ([bp-6]:[bp-8] > dx:ax) B2
```

eliminating instruction 14 and all references to the condition codes.
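Once a flag definition and its single use have been paired, the replacement amounts to choosing the relational operator that the conditional jump tests. The sketch below does this with a small lookup table for a few signed x86 conditional jumps; the table, the textual output format and the function names are illustrative assumptions, and a full decompiler would also handle unsigned jumps and flag-setting instructions other than `cmp`.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Relational operator tested by a conditional jump that follows
   "cmp a, b" (signed comparisons only, for illustration). */
static const char *relopFor(const char *jcc) {
    static const struct { const char *jcc, *relop; } table[] = {
        { "je", "=" },   { "jne", "<>" }, { "jl", "<" },
        { "jle", "<=" }, { "jg", ">" },   { "jge", ">=" },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].jcc, jcc) == 0)
            return table[i].relop;
    return NULL;
}

/* Merge "cmp a, b" and "jcc target" into "jcond (a relop b) target".
   Returns 0 on success, -1 if the jump is not in the table. */
static int propagateCc(const char *a, const char *b, const char *jcc,
                       const char *target, char *out, size_t len) {
    const char *relop = relopFor(jcc);
    if (relop == NULL)
        return -1;
    snprintf(out, len, "jcond (%s %s %s) %s", a, relop, b, target);
    return 0;
}
```

For instructions 14 and 15, `propagateCc("[bp-6]:[bp-8]", "dx:ax", "jg", "B2", ...)` produces `jcond ([bp-6]:[bp-8] > dx:ax) B2`.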
### 5.2.4 Register Arguments

Subroutines use register arguments to speed up access to those arguments and to avoid the overhead of pushing the arguments on the stack before subroutine invocation. Register arguments are used by many runtime support routines, and by user routines compiled with the register calling convention (available in some compilers). Consider the following code of basic block B2, Figure 5-2:

```
19  cx = 4 ; def = {cx}
20  dx:ax = [bp-6]:[bp-8] ; def = {dx, ax}
21  call _aNlshl
```

Instruction 19 defines register cx, instruction 20 defines registers dx:ax, and instruction 21 invokes the subroutine _aNlshl. The first basic block of the subroutine _aNlshl, B5 in Figure 5-2, uses register cx after defining the high part of this register (i.e. register ch); thus, the low part of this register (i.e. register cl) contains whatever value the register had before the subroutine was invoked. In a similar way, basic block B6 uses registers dx:ax before they are defined within the subroutine; thus, the values these registers had before subroutine invocation are used. These three registers (cl, dx and ax) are used before being defined in the subroutine and are defined by the caller; thus, they are register arguments to the _aNlshl subroutine. The formal argument list of this subroutine is modified to reflect this fact:

```
formal_arguments(_aNlshl) = {arg1 = dx:ax, arg2 = cl}
```

Within the subroutine, these registers are replaced by their formal argument name.
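Determining register arguments amounts to intersecting the registers that the callee reads before writing them with the registers that the caller defines before the call. The sketch below follows a single path through the callee and splits `cx` into `cl` and `ch`, so that the partial definition of `ch` is visible. The instruction list in the test is a hypothetical stand-in for the code of B5 and B6, the names are illustrative, and a real implementation would need data flow equations over the whole graph rather than one path.

```c
#include <assert.h>

/* Registers as bits; cx is split into its halves cl and ch. */
enum { R_AX = 1 << 0, R_DX = 1 << 1, R_CL = 1 << 2, R_CH = 1 << 3 };
#define R_CX (R_CL | R_CH)

typedef struct {
    unsigned use;   /* registers read */
    unsigned def;   /* registers written */
} RegUse;

/* Registers read before being written along one path through the
   callee; within an instruction, uses are considered before defs. */
static unsigned usedBeforeDefined(const RegUse path[], int n) {
    unsigned defined = 0, used = 0;
    for (int i = 0; i < n; i++) {
        used |= path[i].use & ~defined;
        defined |= path[i].def;
    }
    return used;
}

/* Register arguments: read before written in the callee and
   defined by the caller before the call. */
static unsigned registerArgs(const RegUse callee[], int n, unsigned callerDefs) {
    return usedBeforeDefined(callee, n) & callerDefs;
}
```

With a path that first defines `ch`, then shifts `dx:ax` and decrements `cx`, and a caller that defines `cx` and `dx:ax` (instructions 19 and 20), the result is `cl`, `dx` and `ax`, the same set as `formal_arguments(_aNlshl)`.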

### 5.2.5 Function Return Register(s)

Subroutines that return a value are called functions. Functions usually return values in registers, and these registers are then used by the caller subroutine. Consider the following code from basic blocks B2 and B3, Figure 5-2:

```
20  dx:ax = [bp-6]:[bp-8] ; def = {dx, ax} use = {}
21  call _aNlshl ; def = {} use = {dx, ax, cl}
22  [bp-6]:[bp-8] = dx:ax ; def = {} use = {dx, ax}
```

Instruction 21 invokes the subroutine _aNlshl. After the subroutine returns, instruction 22 uses registers dx:ax. These registers were defined in the previous basic block by instruction 20. However, there is a subroutine invocation between these two instructions, so the subroutine must be checked for any modification of registers dx:ax. Consider the code of basic block B6, Figure 5-2, after dead-register elimination:

```
35  dx:ax = dx:ax << 1
36  cx = cx - 1
37  jcond (cx <> 0) B6
```

Recall from Section 5.2.4 that dx:ax are register arguments. Instruction 35 modifies these registers by a shift left. It is also part of a loop: instruction 37 jumps back to instruction 35 while register cx is not equal to zero. When the loop finishes, control is transferred to basic block B7, which returns from the subroutine. The references to registers dx:ax in instruction 22 therefore refer to the modified values of these registers. Subroutine _aNlshl can thus be regarded as a function that returns both registers, and the call in instruction 21 is replaced by:

```
21  dx:ax = call _aNlshl ; def = {dx, ax} use = {dx, ax, cl}
```

Instruction 22 uses the two registers defined in instruction 21, so register copy propagation gives the following code:

```
22  [bp-6]:[bp-8] = call _aNlshl ; use = {dx, ax, cl}
```

The return instruction of function _aNlshl (instruction 38) is modified to return the registers dx:ax, leading to the following code:

```
38  ret dx:ax
```
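A return register can be characterised as a register that the callee may modify and that the caller reads after the call, before redefining it. The Python sketch below simplifies this under two assumptions. First, the callee is summarised by the set of registers it may define. Second, only straight-line caller code after the call is examined, whereas the decompiler uses the global data flow information of Section 5.3. The function names are hypothetical.

```python
def upwards_exposed(code):
    """Registers read by straight-line code before it writes them.
    code: list of (defs, uses); each instruction reads before it writes."""
    defined, exposed = set(), set()
    for defs, uses in code:
        exposed |= set(uses) - defined
        defined |= set(defs)
    return exposed

def return_registers(callee_defs, code_after_call):
    """Registers the callee may modify that the caller reads after the call
    before redefining them: candidate function return registers."""
    return upwards_exposed(code_after_call) & set(callee_defs)

# _aNlshl may define dx, ax and cx (block B6). After the call, instruction 22
# ([bp-6]:[bp-8] = dx:ax) reads dx and ax.
callee_defs = {"dx", "ax", "cx"}
after_call = [(set(), {"dx", "ax"})]
print(sorted(return_registers(callee_defs, after_call)))   # ['ax', 'dx']
```

Register cx is also modified by _aNlshl, but the caller does not read it after the call, so it is not treated as a return register.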

### 5.2.6 Register Copy Propagation

An instruction is intermediate if it defines a register value that is used by exactly one subsequent instruction. In machine language, intermediate instructions serve three purposes. They load the contents of operands into registers. They move the operands of an instruction into the registers that the instruction requires. They also store a result computed in registers into a local variable. Consider the following code from basic block B2, Figure 5-2:

```
16  dx:ax = [bp-6]:[bp-8] ; def = {dx, ax} use = {}
17  dx:ax = dx:ax - [bp-2]:[bp-4] ; def = {dx, ax} use = {dx, ax}
18  [bp-6]:[bp-8] = dx:ax ; def = {} use = {dx, ax}
```

Instruction 16 defines the long register dx:ax by copying the contents of the long local variable at bp-6. Instruction 17 then uses this long register as an operand of a subtraction and places the result in the same long register. Instruction 18 copies that result back to the long local variable at bp-6. In other words, instruction 16 defines the temporary long register dx:ax for use in instruction 17. Instruction 17 redefines the register, and instruction 18 copies it to the final local variable. These intermediate registers can be eliminated by replacing each of them with the expression that defined it. Thus, in instruction 17, registers dx:ax are replaced by the long local variable at bp-6, which defined these registers in the previous instruction:

```
17  dx:ax = [bp-6]:[bp-8] - [bp-2]:[bp-4]
```

and instruction 16 is removed. Similarly, the long register dx:ax that results from instruction 17 is replaced in instruction 18, leading to the following code:

```
18  [bp-6]:[bp-8] = [bp-6]:[bp-8] - [bp-2]:[bp-4]
```

and instruction 17 is eliminated. The final instruction 18 is a reconstruction of the original high-level expression.
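The folding of intermediate instructions can be sketched as repeated substitution of single-use register definitions into expression trees. In this hypothetical Python sketch, an instruction is a (destination, expression) pair and the register names are listed in `REGS`. No aliasing between dx:ax and its halves is modelled. Registers are assumed dead at the end of the code, as they would be after dead-register elimination.

```python
# Expressions: a str is a leaf (register, memory operand or constant);
# a tuple (op, left, right) is a binary operation.
REGS = {"ax", "dx", "dx:ax", "tmp"}   # hypothetical register names; dx:ax is one name

def count_uses(expr, name):
    if isinstance(expr, tuple):
        return sum(count_uses(e, name) for e in expr[1:])
    return int(expr == name)

def substitute(expr, name, value):
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(e, name, value) for e in expr[1:])
    return value if expr == name else expr

def leaves(expr):
    if isinstance(expr, tuple):
        return set().union(*(leaves(e) for e in expr[1:]))
    return {expr}

def copy_propagate(code):
    """Fold each register definition that is used exactly once before the
    register is redefined into that use, then delete the definition."""
    code = list(code)
    changed = True
    while changed:
        changed = False
        for i, (dest, expr) in enumerate(code):
            if dest not in REGS:
                continue
            users, total = [], 0
            for k in range(i + 1, len(code)):
                d, e = code[k]
                n = count_uses(e, dest)
                if n:
                    users.append(k)
                    total += n
                if d == dest:            # redefined: later uses see a new value
                    break
            if total != 1:
                continue
            j = users[0]
            operands = leaves(expr)
            # The operands of expr must not be overwritten between i and j.
            if any(code[k][0] in operands for k in range(i + 1, j)):
                continue
            code[j] = (code[j][0], substitute(code[j][1], dest, expr))
            del code[i]
            changed = True
            break
    return code

# Instructions 16-18 of basic block B2.
code = [
    ("dx:ax", "[bp-6]:[bp-8]"),                      # 16
    ("dx:ax", ("-", "dx:ax", "[bp-2]:[bp-4]")),      # 17
    ("[bp-6]:[bp-8]", "dx:ax"),                      # 18
]
print(copy_propagate(code))
# [('[bp-6]:[bp-8]', ('-', '[bp-6]:[bp-8]', '[bp-2]:[bp-4]'))]
```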

High-level language expressions are represented by parse trees of one or more operands, whereas machine language instructions allow at most two operands. In most cases, one of these operands must be in a register (or register pair), and the result is also placed in a register. The final result is then copied to the appropriate identifier, such as a local variable, an argument or a global variable. Consider the following code from basic block B1, Figure 5-2, after dead-register elimination:
Instruction 3 defines register ax by copying the contents of the integer register variable si. In this context, register variables are treated as local variables rather than as registers. Instruction 4 uses register ax to define registers dx:ax by sign extension of ax. Instruction 5 then copies these sign-extended registers to register tmp, which instruction 6 uses as the dividend of a division. The integer register variable di is the divisor, and the result is placed in register ax. The multiplication in instruction 9 uses this result, also uses register dx, and redefines register ax. Finally, instruction 10 places the result in the register variable si. Most of these instructions can be folded into a subsequent instruction and then eliminated, as follows. Instruction 3 is substituted into instruction 4, leading to:

```
4  dx:ax = si
```

and instruction 3 is eliminated. Instruction 4 is substituted into instruction 5, leading to:

```
5  tmp = si
```

and instruction 4 is eliminated. Instruction 5 is substituted into instruction 6, leading to:

```
6  ax = si / di
```

and instruction 5 is eliminated. Instruction 6 is substituted into instruction 9, leading to:

```
9  ax = (si / di) * dx
```

and instruction 6 is eliminated. Instruction 7 is substituted into instruction 9, leading to:

```
9  ax = (si / di) * 3
```

and instruction 7 is eliminated. Finally, instruction 9 is substituted into instruction 10, leading to the following final code:

```
10 si = (si / di) * 3
```

This final instruction 10 replaces all of the instructions 3 to 10.

### 5.2.7 Actual Parameters

Actual parameters to a subroutine call are either pushed on the stack or placed in registers (for register arguments) before the subroutine is invoked. These arguments can be matched against the formal argument list of the subroutine and placed in the actual parameter list of the call instruction. Consider the following code from basic block B4, Figure 5-2, after register copy propagation:

```
24 push [bp-6]:[bp-8]
28 push (si * 5)
30 push 66
31 call printf
```
After parsing, the formal argument list of `printf` has one fixed argument of size 2 bytes, followed by a variable number of other arguments. The calling convention of this procedure has been set to C. Instruction 31 also records the number of bytes popped from the stack after the call, which is 8 bytes in this case. There are therefore 8 bytes of actual arguments to this subroutine, of which the first 2 bytes are fixed. Instruction 24 pushes 4 bytes on the stack, instruction 28 pushes 2 bytes, and instruction 30 pushes another 2 bytes. This gives the total of 8 bytes required by `printf` in this instance. These expressions are placed in the actual argument list of `printf` at instruction 31 in reverse order of pushing, as the C calling convention requires (i.e. the last argument pushed is the first one in the argument list). The modifications lead to the following code:

```
31 call printf (66, si * 5, [bp-6]:[bp-8])
```

and instructions 24, 28 and 30 are eliminated.
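Matching pushed values to an actual argument list can be sketched as a walk backwards from the call until the bytes popped after the call are accounted for. The function below is a hypothetical illustration. It handles only the C calling convention and takes the pushes and their sizes as given.

```python
def actual_arguments(pushes, bytes_popped):
    """pushes: (expression, size_in_bytes) pairs in execution order, ending
    just before the call. bytes_popped: stack adjustment after the call.
    Under the C convention the last value pushed is the first argument, so
    walking back from the call yields the arguments in source order."""
    args, total = [], 0
    for expr, size in reversed(pushes):
        if total >= bytes_popped:
            break
        args.append(expr)
        total += size
    if total != bytes_popped:
        raise ValueError("pushed bytes do not match the bytes popped after the call")
    return args

# Instructions 24, 28 and 30 of basic block B4; 8 bytes are popped after the call.
pushes = [("[bp-6]:[bp-8]", 4), ("si * 5", 2), ("66", 2)]
print(actual_arguments(pushes, 8))   # ['66', 'si * 5', '[bp-6]:[bp-8]']
```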

In a similar way, register arguments are placed on the actual argument list of the invoked subroutine. Consider the following code of basic blocks B2 and B3, Figure 5-2 after register argument and function return register detection and dead-register elimination:

```
19 cl = 4 ; def = {cl}
20 dx:ax = [bp-6]:[bp-8] ; def = {dx, ax}
22 [bp-6]:[bp-8] = call _aNlshl ; use = {dx, ax, cl}
```

Instructions 19 and 20 define the register arguments used by function `_aNlshl`. These register definitions are placed in the function's actual argument list as follows:

```
22 [bp-6]:[bp-8] = call _aNlshl ([bp-6]:[bp-8], 4)
```

eliminating instructions 19 and 20, and intermediate registers `dx`, `ax`, and `cl`.

### 5.2.8 Data Type Propagation Across Procedure Calls

The types of the actual arguments of a subroutine need to be the same as the types of the formal arguments. For library subroutines, the formal argument types are known with certainty, so they are matched against the types of the actual arguments. If there are any differences, the formal type is propagated to the actual argument. Consider the following code from basic block B4, Figure 5-2, after register copy propagation and the detection of actual parameters:

```
31 call printf (66, si * 5, [bp-6]:[bp-8])
```

The first formal argument of `printf` is a string (i.e. a `char *` in C). In machine language, strings are stored as data constants in the data or code segment. They are referenced through a segment and an offset within that segment. In our example, 66 is a constant, but because it is the first argument to `printf` it is really an offset to a string located in the data segment. The string type is propagated to this first argument. The string is then retrieved from memory and replaces the constant in the actual argument list, leading to the following code:

```
31 call printf ("c * 5 = %d, a = %ld\n", si * 5, [bp-6]:[bp-8])
```

The formal argument list of `printf` specifies no type for any of its other arguments, because they belong to the variable part of the list. The types of these actual arguments (i.e. the types used in the caller) are therefore trusted and not modified.
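This string type propagation can be sketched in a few lines of Python. The sketch assumes that the data segment is available as a byte string and that the string is NUL-terminated, as in C. The helper names are hypothetical, and the data segment below is made up so that the string starts at offset 66.

```python
def read_c_string(segment: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in `segment`."""
    end = segment.index(b"\x00", offset)
    return segment[offset:end].decode("latin-1")

def propagate_string_type(args, data_segment):
    """The first formal argument of printf is a char *: if the first actual
    argument is an integer constant, treat it as an offset into the data
    segment and replace it by the string stored there."""
    if args and isinstance(args[0], int):
        return [read_c_string(data_segment, args[0])] + list(args[1:])
    return list(args)

# Hypothetical data segment: 66 bytes of padding, then the format string.
data = bytes(66) + b"c * 5 = %d, a = %ld\n\x00"
print(propagate_string_type([66, "si * 5", "[bp-6]:[bp-8]"], data))
# ['c * 5 = %d, a = %ld\n', 'si * 5', '[bp-6]:[bp-8]']
```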
### 5.2.9 Register Variable Elimination

The register copy propagation optimization finds high-level expressions and eliminates intermediate instructions by eliminating most of the intermediate registers used in the computation of the expression, as seen in Section 5.2.6. After this optimization has been applied, there are only a few registers left (if any) in the intermediate code. These remaining registers represent register variables or common subexpressions, used by the compiler or the optimizer to speed up access time. These registers are equivalent to local variables in a high-level program, and are therefore replaced by new local variables in the corresponding subroutine that uses them. Consider the following code from basic block B1, Figure 5-2 after register copy propagation:

```plaintext
1  si = 20
2  di = 80
10 si = si / di * 3
```

Registers si and di are used as register variables in this procedure. They are initialized in instructions 1 and 2 and later used in the expression of instruction 10. If we rename register si to loc1 and register di to loc2, the previous code becomes:

```plaintext
1  loc1 = 20
2  loc2 = 80
10 loc1 = loc1 / loc2 * 3
```

and all references to registers have been eliminated.
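Register variable elimination is essentially a consistent renaming. For illustration only, the sketch below operates on textual instructions, whereas the decompiler works on its intermediate representation. The function name is hypothetical, as is the choice to number locals loc1, loc2, ... in order of first appearance.

```python
import re

def eliminate_register_variables(code, reg_vars=("si", "di")):
    """Rename register variables to fresh locals loc1, loc2, ... in order of
    first appearance; whole-word matching avoids touching other names."""
    names = {}
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, reg_vars)) + r")\b")

    def rename(match):
        reg = match.group(1)
        if reg not in names:
            names[reg] = f"loc{len(names) + 1}"
        return names[reg]

    return [pattern.sub(rename, line) for line in code], names

code = ["si = 20", "di = 80", "si = si / di * 3"]
new_code, names = eliminate_register_variables(code)
print("\n".join(new_code))
# loc1 = 20
# loc2 = 80
# loc1 = loc1 / loc2 * 3
```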

After applying all of the previously explained transformations, the final intermediate code for Figure 5-2 is shown in Figure 5-3.

## 5.3 Global Data Flow Analysis

To perform code-improving transformations on the intermediate code, the decompiler needs to collect information about registers and condition codes for the whole program. It must then propagate this information across the different basic blocks. The information is collected by a data flow analysis process, which solves systems of equations that relate information at various points in the program. This section defines the data flow problems and equations available in the literature; refer to [All72, AC76, ASU86b, FJ88b] for more information.

### 5.3.1 Data Flow Analysis Definitions

**Definition 18** A register is defined if the content of the register is modified (i.e. it is assigned a new value). In a similar way, a flag is defined if it is modified by an instruction.

**Definition 19** A register is used if the register is referenced (i.e. the value of the register is used). In a similar way, a flag is used if it is referenced by an instruction.

**Definition 20** A locally available definition \( d \) in a basic block \( B_i \) is the last definition of \( d \) in \( B_i \).
**Definition 21** A locally upwards exposed use $u$ in a basic block $B_i$ is a use of a register/flag that is not preceded by a definition of that register/flag in $B_i$.

**Definition 22** A definition $d$ in basic block $B_i$ reaches basic block $B_j$ if

1. $d$ is a locally available definition from $B_i$.
2. $\exists B_i \rightarrow B_j$.
3. $\exists B_i \rightarrow B_j \land \forall B_k \in (B_i \rightarrow B_j), k \neq i \land k \neq j, B_k$ does not redefine $d$.

**Definition 23** Any definition of a register/flag in a basic block $B_i$ is said to kill all definitions of the same register/flag that reach $B_i$.

**Definition 24** A definition $d$ in a basic block $B_i$ is preserved if $d$ is not redefined in $B_i$.

**Definition 25** The definitions available at the exit of a basic block $B_i$ are either:

1. The locally available definitions of the register/flag.
2. The definitions of the register/flag that reach $B_i$ and are preserved in $B_i$.

**Definition 26** A use $u$ of a register/flag is upwards exposed in a basic block $B_i$ if either:

1. $u$ is locally upwards exposed from $B_i$.
2. $\exists B_i \rightarrow B_k \land u$ is locally upwards exposed from $B_k \land \nexists B_j, i \leq j < k$, which contains a definition of $u$.

---

¹ The symbol $\rightarrow$ is used in this chapter to represent a path; it is defined in Chapter 6, Section 6.3.1.
**Definition 27** A definition $d$ is live or active at basic block $B_i$ if:

1. $d$ reaches $B_i$
2. There is an upwards exposed use of $d$ at $B_i$.

**Definition 28** A definition $d$ in basic block $B_i$ is busy (sometimes called very busy) if $d$ is used before being redefined along all paths from $B_i$.

**Definition 29** A definition $d$ in basic block $B_i$ is dead if $d$ is not used before being redefined along any path from $B_i$ (i.e. $d$ is neither busy nor live).

**Definition 30** A definition-use chain (du-chain) for a definition $d$ at instruction $i$ is the set of instructions $j$ where $d$ could be used before being redefined (i.e. the instructions which can be affected by $d$).

**Definition 31** A use-definition chain (ud-chain) for a use $u$ at instruction $j$ is the set of instructions $i$ where $u$ could have been defined (i.e. the instructions which can affect $u$).

**Definition 32** A path is $d$-clear if there is no definition of $d$ along that path.
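Within a single basic block, du-chains and ud-chains can be built in one pass by remembering the instruction that last defined each register. Across basic blocks, the reaching register analysis described later in this section is needed. The following hypothetical Python sketch applies the one-pass approach to instructions 16 to 18 of Section 5.2.6, with dx:ax split into dx and ax.

```python
def du_chains(code):
    """code: (number, defs, uses) triples for straight-line code; each
    instruction reads its uses before writing its defs. Returns a dict that
    maps (definition number, register) to the numbers of the instructions
    that use that definition before it is redefined."""
    chains, current = {}, {}
    for num, defs, uses in code:
        for r in uses:
            if r in current:                  # definition reaching this use
                chains[(current[r], r)].append(num)
        for r in defs:                        # the new definition kills the old one
            current[r] = num
            chains[(num, r)] = []
    return chains

def ud_chains(du):
    """Invert du-chains: (use number, register) -> defining instruction numbers."""
    ud = {}
    for (d, r), users in du.items():
        for u in users:
            ud.setdefault((u, r), []).append(d)
    return ud

# Instructions 16-18 of basic block B2, with dx:ax split into dx and ax.
code = [(16, {"dx", "ax"}, set()),
        (17, {"dx", "ax"}, {"dx", "ax"}),
        (18, set(), {"dx", "ax"})]
du = du_chains(code)
print(du[(16, "ax")], du[(17, "ax")], ud_chains(du)[(18, "dx")])   # [17] [18] [17]
```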

### 5.3.2 Taxonomy of Data Flow Problems

Data flow problems are solved by a series of equations that use information collected in each basic block and propagate it across the complete control flow graph. Propagating information within the flow graph of a single procedure is called intraprocedural data flow analysis. Propagating information across procedure calls is called interprocedural data flow analysis.

Information on the registers defined or killed is collected within each basic block in the form of sets (e.g. gen() and kill()). It is then summarized at basic block entry and exit in the form of further sets (e.g. in() and out()). A typical data flow equation for basic block $B_i$ has the following form:

$$ out(B_i) = gen(B_i) \cup (in(B_i) - kill(B_i)) $$

and stands for “the information at the end of basic block $B_i$ is either the information generated in $B_i$, or the information that entered the basic block and was not killed within it”. The summary $in()$ information is collected from the predecessor nodes of the graph by an equation of the form:

$$ in(B_i) = \bigcup_{p \in \text{Pred}(B_i)} out(p) $$

which collects the information that is available at the exit of any predecessor node. This data flow problem is classified as an any-path problem, because the information collected from predecessors may be derived from any path (i.e. not all paths need to carry the same information). Any-path problems are expressed in equations by a union over predecessors or successors, depending on the problem.

Similarly, an all-paths problem is a data flow problem whose equation collects only the information available along all paths from the current basic block to its successors or predecessors, depending on the type of problem. Such equations use an intersection instead of a union.
**Definition 33** A data flow problem is said to be **forward-flow** if

1. The out() set is computed in terms of the in() set within the same basic block.
2. The in() set is computed from the out() set of predecessor basic blocks.

**Definition 34** A data flow problem is said to be **backward-flow** if

1. The in() set is computed in terms of the out() set within the same basic block.
2. The out() set is computed from the in() set of successor basic blocks.

This classification of data flow problems gives rise to the taxonomy shown in Figure 5-4. For each of the forward- and backward-flow problems, all-path and any-path equations are defined in terms of predecessors and successors. This table is taken from [FJ88b].

<table>
<thead>
<tr>
<th></th>
<th>Forward-Flow</th>
<th>Backward-Flow</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Any path</strong></td>
<td>Out($B_i$) = Gen($B_i$) $\cup$ (In($B_i$) $-$ Kill($B_i$))</td>
<td>In($B_i$) = Gen($B_i$) $\cup$ (Out($B_i$) $-$ Kill($B_i$))</td>
</tr>
<tr>
<td></td>
<td>In($B_i$) = $\cup_{p \in \text{Pred}(B_i)}$ Out($p$)</td>
<td>Out($B_i$) = $\cup_{s \in \text{Succ}(B_i)}$ In($s$)</td>
</tr>
<tr>
<td><strong>All paths</strong></td>
<td>Out($B_i$) = Gen($B_i$) $\cup$ (In($B_i$) $-$ Kill($B_i$))</td>
<td>In($B_i$) = Gen($B_i$) $\cup$ (Out($B_i$) $-$ Kill($B_i$))</td>
</tr>
<tr>
<td></td>
<td>In($B_i$) = $\cap_{p \in \text{Pred}(B_i)}$ Out($p$)</td>
<td>Out($B_i$) = $\cap_{s \in \text{Succ}(B_i)}$ In($s$)</td>
</tr>
</tbody>
</table>

Figure 5-4: Data Flow Analysis Equations

Data Flow Equations

Data flow equations do not, in general, have unique solutions. In data flow problems, the solution of interest is either the minimum or the maximum fixed point that satisfies the equations. This solution is found by setting a boundary condition on the initial value of the in($B$) set of the header basic block for forward-flow problems. For backward-flow problems, the boundary condition is set on the value of the out($B$) set of the exit basic block. Depending on the interpretation of the problem, these boundary condition sets are initialized to the empty or the universal set (i.e. all possible values).

Intraprocedural data flow problems solve equations for a subroutine without taking into account the values used or defined by other subroutines. As these problems are flow insensitive, the boundary conditions are set for all initial (for forward-flow problems) or all exit (for backward-flow problems) nodes. Interprocedural data flow problems solve equations for the subroutines of a program while taking into account the values used or defined by invoked subroutines. Information therefore flows between the subroutines of the call graph. These flow-sensitive problems set the boundary condition only for the main subroutine of the program's call graph. All other subroutines summarize information from all predecessor (for forward-flow problems) or all successor (for backward-flow problems) nodes in the call graph (i.e. the caller nodes). This section presents the data flow equations used to solve the reaching, live, available, and busy register problems.

Reaching register definition analysis determines which registers reach a particular basic block along some path. It is therefore solved by the following forward-flow, any-path equations:
**Definition 35** Let

• $B_i$ be a basic block

• $\text{ReachIn}(B_i)$ be the set of registers that reach the entrance to $B_i$

• $\text{ReachOut}(B_i)$ be the set of registers that reach the exit from $B_i$

• $\text{Kill}(B_i)$ be the set of registers killed in $B_i$

• $\text{Def}(B_i)$ be the set of registers defined in $B_i$

Then

$$\text{ReachIn}(B_i) = \begin{cases} \bigcup_{p \in \text{Pred}(B_i)} \text{ReachOut}(p) & \text{if } B_i \text{ is not the header node} \\ \emptyset & \text{otherwise} \end{cases}$$

$$\text{ReachOut}(B_i) = \text{Def}(B_i) \cup (\text{ReachIn}(B_i) - \text{Kill}(B_i))$$
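These equations can be solved by round-robin iteration until no set changes. The following Python sketch is an illustration, not the decompiler's code. Definitions are represented as (register, block) pairs so that kills are visible, and Kill(B) is derived from the registers that B redefines. The control flow graph is a hypothetical diamond.

```python
def reaching_definitions(succ, header, defined):
    """Forward-flow, any-path iteration of Definition 35.
    succ: block -> list of successors; defined: block -> registers with a
    locally available definition in the block. A definition is the pair
    (register, block)."""
    blocks = list(succ)
    pred = {b: [p for p in blocks if b in succ[p]] for b in blocks}
    gen = {b: {(r, b) for r in defined[b]} for b in blocks}
    all_defs = set().union(*gen.values())
    # Kill(B): definitions in other blocks of the registers that B redefines.
    kill = {b: {(r, d) for (r, d) in all_defs if r in defined[b] and d != b}
            for b in blocks}
    reach_in = {b: set() for b in blocks}
    reach_out = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set() if b == header else set().union(
                *(reach_out[p] for p in pred[b]))
            new_out = gen[b] | (new_in - kill[b])
            if (new_in, new_out) != (reach_in[b], reach_out[b]):
                reach_in[b], reach_out[b] = new_in, new_out
                changed = True
    return reach_in, reach_out

# Hypothetical diamond: B1 defines ax and bx, B2 redefines ax, B3 defines cx.
succ = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
defined = {"B1": {"ax", "bx"}, "B2": {"ax"}, "B3": {"cx"}, "B4": set()}
reach_in, reach_out = reaching_definitions(succ, "B1", defined)
print(sorted(reach_in["B4"]))
# [('ax', 'B1'), ('ax', 'B2'), ('bx', 'B1'), ('cx', 'B3')]
```

Both definitions of ax reach B4, each along a different path, which is exactly what an any-path (union) problem records.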

Live register analysis determines whether a register will be used along some path. It is therefore solved by the following backward-flow, any-path equations:

**Definition 36** Let

• $B_i$ be a basic block

• $\text{LiveIn}(B_i)$ be the set of registers that are live on entrance to $B_i$

• $\text{LiveOut}(B_i)$ be the set of registers that are live on exit from $B_i$

• $\text{Use}(B_i)$ be the set of registers used in $B_i$

• $\text{Def}(B_i)$ be the set of registers defined in $B_i$

Then

$$\text{LiveOut}(B_i) = \begin{cases} \bigcup_{s \in \text{Succ}(B_i)} \text{LiveIn}(s) & \text{if } B_i \text{ is not a return node} \\ \emptyset & \text{otherwise} \end{cases}$$

$$\text{LiveIn}(B_i) = \text{Use}(B_i) \cup (\text{LiveOut}(B_i) - \text{Def}(B_i))$$
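A corresponding sketch for live registers iterates the backward equations until a fixed point is reached. The example models blocks B6 and B7 of _aNlshl at the level of whole registers. It assumes that B7 has already been rewritten as ret dx:ax. The function name is hypothetical.

```python
def live_registers(succ, use, defs):
    """Backward-flow, any-path iteration of Definition 36. A block with no
    successors is a return node, so its LiveOut is the empty set."""
    blocks = list(succ)
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        # Visiting later blocks first speeds up a backward problem;
        # any order reaches the same fixed point.
        for b in reversed(blocks):
            out = set().union(*(live_in[s] for s in succ[b]))
            inn = use[b] | (out - defs[b])
            if (inn, out) != (live_in[b], live_out[b]):
                live_in[b], live_out[b] = inn, out
                changed = True
    return live_in, live_out

# B6 and B7 of _aNlshl, assuming B7 has become "ret dx:ax".
succ = {"B6": ["B6", "B7"], "B7": []}
use = {"B6": {"dx", "ax", "cx"}, "B7": {"dx", "ax"}}
defs = {"B6": {"dx", "ax", "cx"}, "B7": set()}
live_in, live_out = live_registers(succ, use, defs)
print(sorted(live_out["B6"]), sorted(live_in["B7"]))   # ['ax', 'cx', 'dx'] ['ax', 'dx']
```

Register cx is live on exit from B6 only because of the back edge to B6 itself.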

Available register analysis determines which registers are available along all paths of the graph. It is therefore solved by the following forward-flow, all-paths equations:

**Definition 37** Let

• $B_i$ be a basic block

• $\text{AvailIn}(B_i)$ be the set of the registers that are available on entrance to $B_i$

• $\text{AvailOut}(B_i)$ be the set of the registers that are available on exit from $B_i$

• $\text{Compute}(B_i)$ be the set of the registers in $B_i$ computed and not killed

• $\text{Kill}(B_i)$ be the set of the registers in $B_i$ that are killed due to an assignment
Then

\[
\begin{align*}
AvailIn(B_i) &= \begin{cases} \bigcap_{p \in \text{Pred}(B_i)} AvailOut(p) & \text{if } B_i \text{ is not the header node} \\
\emptyset & \text{otherwise} \end{cases} \\
AvailOut(B_i) &= \text{Compute}(B_i) \cup (AvailIn(B_i) - \text{Kill}(B_i))
\end{align*}
\]
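Because the meet operation is an intersection, the iteration starts non-header blocks at the universal set. In graphs with loops, starting from the empty set could yield a smaller solution than the intended maximum fixed point. The following Python sketch makes that choice explicit. It uses a hypothetical diamond graph in which B2 kills bx, and Compute and Kill are given directly rather than derived from instructions.

```python
def available_registers(succ, header, compute, kill):
    """Forward-flow, all-paths iteration of Definition 37."""
    blocks = list(succ)
    pred = {b: [p for p in blocks if b in succ[p]] for b in blocks}
    universe = set().union(*compute.values(), *kill.values())
    # Intersection-based problem: start every non-header block at the
    # universal set so the solution shrinks to the maximum fixed point.
    avail_in = {b: set() if b == header else set(universe) for b in blocks}
    avail_out = {b: compute[b] | (avail_in[b] - kill[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set() if b == header else set(universe)
            if b != header:
                for p in pred[b]:
                    new_in &= avail_out[p]
            new_out = compute[b] | (new_in - kill[b])
            if (new_in, new_out) != (avail_in[b], avail_out[b]):
                avail_in[b], avail_out[b] = new_in, new_out
                changed = True
    return avail_in, avail_out

# Hypothetical diamond: B2 kills bx, so bx is not available along all paths.
succ = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
compute = {"B1": {"ax", "bx"}, "B2": {"cx"}, "B3": {"cx"}, "B4": set()}
kill = {"B1": set(), "B2": {"bx"}, "B3": set(), "B4": set()}
avail_in, avail_out = available_registers(succ, "B1", compute, kill)
print(sorted(avail_in["B4"]))   # ['ax', 'cx']
```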

Busy register analysis determines which registers are busy along all paths of the graph. It is therefore solved by the following backward-flow, all-paths equations:

**Definition 38** Let

- \(B_i\) be a basic block
- \(\text{BusyIn}(B_i)\) be the set of the registers that are busy on entrance to \(B_i\)
- \(\text{BusyOut}(B_i)\) be the set of the registers that are busy on exit from \(B_i\)
- \(\text{Use}(B_i)\) be the set of the registers that are used before killed in \(B_i\)
- \(\text{Kill}(B_i)\) be the set of the registers that are killed before used in \(B_i\)

Then

\[
\begin{align*}
\text{BusyOut}(B_i) &= \begin{cases} \bigcap_{s \in \text{Succ}(B_i)} \text{BusyIn}(s) & \text{if } B_i \text{ is not a return node} \\
\emptyset & \text{otherwise} \end{cases} \\
\text{BusyIn}(B_i) &= \text{Use}(B_i) \cup (\text{BusyOut}(B_i) - \text{Kill}(B_i))
\end{align*}
\]
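Busy registers can be solved in the same way, but backwards, starting non-return blocks at the universal set. In the hypothetical example below, bx is used along only one of the two paths leaving B1, so it is not busy on exit from B1. The function name and the graph are assumptions for illustration.

```python
def busy_registers(succ, use, kill):
    """Backward-flow, all-paths iteration of Definition 38. Blocks with no
    successors are return nodes and get the empty BusyOut set."""
    blocks = list(succ)
    universe = set().union(*use.values(), *kill.values())
    # Intersection-based problem: start non-return blocks at the universal set.
    busy_out = {b: set(universe) if succ[b] else set() for b in blocks}
    busy_in = {b: use[b] | (busy_out[b] - kill[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):
            new_out = set(universe) if succ[b] else set()
            for s in succ[b]:
                new_out &= busy_in[s]
            new_in = use[b] | (new_out - kill[b])
            if (new_in, new_out) != (busy_in[b], busy_out[b]):
                busy_in[b], busy_out[b] = new_in, new_out
                changed = True
    return busy_in, busy_out

# Hypothetical: B1 branches to B2 (uses ax) and to B3 (uses ax and bx).
succ = {"B1": ["B2", "B3"], "B2": [], "B3": []}
use = {"B1": set(), "B2": {"ax"}, "B3": {"ax", "bx"}}
kill = {"B1": set(), "B2": set(), "B3": set()}
busy_in, busy_out = busy_registers(succ, use, kill)
print(sorted(busy_out["B1"]))   # ['ax']
```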

The problem of finding the uses of a register definition, i.e. a du-chain problem, is solved by a backward-flow, any-path problem. Similarly, the problem of finding all definitions for a use of a register, i.e. a ud-chain problem, is solved by a forward-flow, any-path problem. The previous data flow problems are summarized in the table in Figure 5-5.

<table>
<thead>
<tr>
<th></th>
<th>Forward-Flow</th>
<th>Backward-Flow</th>
</tr>
</thead>
<tbody>
<tr>
<td>Any-path</td>
<td>Reach</td>
<td>Live</td>
</tr>
<tr>
<td></td>
<td>ud-chains</td>
<td>du-chains</td>
</tr>
<tr>
<td>All-path</td>
<td>Available</td>
<td>Busy</td>
</tr>
<tr>
<td></td>
<td>Copy propagation</td>
<td>Dead</td>
</tr>
</tbody>
</table>

Figure 5-5: Data Flow Problems - Summary

Recently, precise interprocedural live variable equations were presented as part of a link-time code optimization system [SW93]. A two-phase approach is used to avoid propagating information between unrelated subroutines that call the same subroutine. The call graph has two nodes for each call site. The call node proper has an out-edge to the header node of the callee subroutine. The ret\_call node has an in-edge from the return node of the callee subroutine. In the first phase, information flows across normal nodes and call edges only; return edges are removed from the call graph. In the second phase, information flows across normal nodes and return edges only; call edges are removed from the call graph. This phase makes use of the summary information calculated in the first phase. Because information flows from the caller to the callee and vice versa, this method provides more precise information than other methods presented in the literature.

Definition 39 presents the equations used for precise interprocedural register analysis. In the first phase, live and dead register equations are solved, and their results are summarized for each subroutine of the call graph in the PUse() and PDef() sets. Live register equations are solved again in the second phase, so they are tagged with the phase number to tell them apart (e.g. LiveIn1() for the first phase and LiveIn2() for the second phase). Separate equations are given for call and ret_call basic blocks. The initial boundary condition for both the live and the dead equations is the empty set.

**Definition 39** Let

- \( B_i \) be a basic block other than call and ret_call
- \( \text{LiveIn1}(B_i) \) be the set of registers that are live on entrance to \( B_i \) during phase one
- \( \text{LiveOut1}(B_i) \) be the set of registers that are live on exit from \( B_i \) during phase one
- \( \text{DeadIn}(B_i) \) be the set of registers that have been killed on entrance to \( B_i \)
- \( \text{DeadOut}(B_i) \) be the set of registers that have been killed on exit from \( B_i \)
- \( \text{Use}(B_i) \) be the set of registers used in \( B_i \)
- \( \text{Def}(B_i) \) be the set of registers defined in \( B_i \)
- \( \text{LiveIn2}(B_i) \) be the set of registers that are live on entrance to \( B_i \) during phase two
- \( \text{LiveOut2}(B_i) \) be the set of registers that are live on exit from \( B_i \) during phase two

Then precise interprocedural live register analysis is calculated as follows:

- **Phase 1:**
  
  \[
  \text{LiveOut1}(B_i) = \begin{cases} 
  \bigcup_{s \in \text{Succ}(B_i)} \text{LiveIn1}(s) & \text{if } B_i \text{ is not a return node} \\
  \emptyset & \text{otherwise}
  \end{cases}
  \]
  
  \[
  \text{LiveIn1}(B_i) = \text{Use}(B_i) \cup (\text{LiveOut1}(B_i) - \text{Def}(B_i))
  \]
  
  \[
  \text{DeadOut}(B_i) = \begin{cases} 
  \bigcap_{s \in \text{Succ}(B_i)} \text{DeadIn}(s) & \text{if } B_i \text{ is not a return node} \\
  \emptyset & \text{otherwise}
  \end{cases}
  \]
  
  \[
  \text{DeadIn}(B_i) = \text{Def}(B_i) \cup (\text{DeadOut}(B_i) - \text{Use}(B_i))
  \]

  \[
  \text{LiveOut1}(\text{ret\_call}) = \bigcup_{s \in \text{Succ}(\text{ret\_call})} \text{LiveIn1}(s)
  \]
  
  \[
  \text{LiveIn1}(\text{ret\_call}) = \text{LiveOut1}(\text{ret\_call})
  \]
  
  \[
  \text{LiveOut1}(\text{call}) = \text{LiveIn1}(\text{entry}) \cup (\text{LiveOut1}(\text{ret\_call}) - \text{DeadIn}(\text{entry}))
  \]
  
  \[
  \text{LiveIn1}(\text{call}) = \text{LiveOut1}(\text{call})
  \]

  \[
  \text{DeadOut}(\text{ret\_call}) = \bigcap_{s \in \text{Succ}(\text{ret\_call})} \text{DeadIn}(s)
  \]
  
  \[
  \text{DeadIn}(\text{ret\_call}) = \text{DeadOut}(\text{ret\_call})
  \]
  
  \[
  \text{DeadOut}(\text{call}) = \text{DeadIn}(\text{entry}) \cup (\text{DeadOut}(\text{ret\_call}) - \text{LiveIn1}(\text{entry}))
  \]
  
  \[
  \text{DeadIn}(\text{call}) = \text{DeadOut}(\text{call})
  \]

- **Subroutine summary:** \( \forall p \) subroutine, where entry denotes the header node of \( p \),
  
  \[
  \begin{align*}
  P\text{Use}(p) &= \text{LiveIn1}(\text{entry}) \\
  P\text{Def}(p) &= \text{DeadIn}(\text{entry})
  \end{align*}
  \]

- **Phase 2:**
  
  \[
  \begin{align*}
  \text{LiveOut2}(B_i) & = \begin{cases} 
  \bigcup_{s \in \text{Succ}(B_i)} \text{LiveIn2}(s) & \text{if } B_i \text{ is not the return node of main} \\
  \emptyset & \text{otherwise}
  \end{cases} \\
  \text{LiveIn2}(B_i) & = \text{Use}(B_i) \cup (\text{LiveOut2}(B_i) - \text{Def}(B_i)) \\
  \text{LiveOut2}(\text{ret\_call}) & = \bigcup_{s \in \text{Succ}(\text{ret\_call})} \text{LiveIn2}(s) \\
  \text{LiveIn2}(\text{ret\_call}) & = \text{LiveOut2}(\text{ret\_call}) \\
  \text{LiveOut2}(\text{call}) & = P\text{Use}(p) \cup (\text{LiveOut2}(\text{ret\_call}) - P\text{Def}(p)) \\
  \text{LiveIn2}(\text{call}) & = \text{LiveOut2}(\text{call})
  \end{align*}
  \]

Figure 5-6: Live Register Example Graph

**Example 8** Consider the call graph of Figure 5-6. This program has a main procedure and two subroutines. Interprocedural live register analysis, as explained in Definition 39, provides the following summary information for its nodes (in the tables, 0 denotes the empty set):
- **Phase 1:**

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Node</th>
<th>Def</th>
<th>Use</th>
<th>LiveIn1</th>
<th>LiveOut1</th>
<th>DeadIn</th>
<th>DeadOut</th>
</tr>
</thead>
<tbody>
<tr>
<td>P1</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>0</td>
<td>{cx}</td>
<td>{cx}</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>10</td>
<td>{ax,cx}</td>
<td>0</td>
<td>0</td>
<td>{cx}</td>
<td>{ax,cx}</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>9</td>
<td>{ax}</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>{ax}</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>8</td>
<td>0</td>
<td>{dx}</td>
<td>{dx}</td>
<td>0</td>
<td>{ax}</td>
<td>{ax}</td>
</tr>
<tr>
<td>P2</td>
<td>16</td>
<td>0</td>
<td>{ax}</td>
<td>{ax}</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>15</td>
<td>0</td>
<td>0</td>
<td>{ax}</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>14</td>
<td>0</td>
<td>{dx}</td>
<td>{dx}</td>
<td>{ax}</td>
<td>{ax}</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>13</td>
<td>{dx}</td>
<td>0</td>
<td>0</td>
<td>{dx}</td>
<td>{ax,dx}</td>
<td>{ax}</td>
</tr>
<tr>
<td>main</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>0</td>
<td>{ax,bx}</td>
<td>{ax,bx}</td>
<td>0</td>
<td>{dx}</td>
<td>{ax,dx}</td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>0</td>
<td>0</td>
<td>{ax,bx}</td>
<td>{ax,bx}</td>
<td>{dx}</td>
<td>{dx}</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>{bx,dx}</td>
<td>{bx,dx}</td>
<td>{ax}</td>
<td>{ax}</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>{bx,dx}</td>
<td>0</td>
<td>0</td>
<td>{bx,dx}</td>
<td>{bx,dx}</td>
<td>{ax,bx,dx}</td>
</tr>
</tbody>
</table>

- **Subroutine summary:**

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>PUse</th>
<th>PDef</th>
</tr>
</thead>
<tbody>
<tr>
<td>P1</td>
<td>{dx}</td>
<td>{ax}</td>
</tr>
<tr>
<td>P2</td>
<td>0</td>
<td>{ax,dx}</td>
</tr>
<tr>
<td>main</td>
<td>0</td>
<td>{ax,bx,dx}</td>
</tr>
</tbody>
</table>

- **Phase 2:**

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Node</th>
<th>Def</th>
<th>Use</th>
<th>LiveIn2</th>
<th>LiveOut2</th>
</tr>
</thead>
<tbody>
<tr>
<td>P1</td>
<td>12</td>
<td>0</td>
<td>0</td>
<td>{ax,bx}</td>
<td>{ax,bx}</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>0</td>
<td>{cx}</td>
<td>{ax,bx,cx}</td>
<td>{ax,bx}</td>
</tr>
<tr>
<td></td>
<td>10</td>
<td>{ax,cx}</td>
<td>0</td>
<td>{bx}</td>
<td>{ax,bx,cx}</td>
</tr>
<tr>
<td></td>
<td>9</td>
<td>{ax}</td>
<td>0</td>
<td>{bx}</td>
<td>{ax,bx}</td>
</tr>
<tr>
<td></td>
<td>8</td>
<td>0</td>
<td>{dx}</td>
<td>{bx,dx}</td>
<td>{bx}</td>
</tr>
<tr>
<td>P2</td>
<td>16</td>
<td>0</td>
<td>{ax}</td>
<td>{ax}</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>15</td>
<td>0</td>
<td>0</td>
<td>{ax}</td>
<td>{ax}</td>
</tr>
<tr>
<td></td>
<td>14</td>
<td>0</td>
<td>0</td>
<td>{dx}</td>
<td>{dx}</td>
</tr>
<tr>
<td></td>
<td>13</td>
<td>{dx}</td>
<td>0</td>
<td>0</td>
<td>{dx}</td>
</tr>
<tr>
<td>main</td>
<td>7</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>0</td>
<td>{ax,bx}</td>
<td>{ax,bx}</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>0</td>
<td>0</td>
<td>{ax,bx}</td>
<td>{ax,bx}</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>{bx,dx}</td>
<td>{bx,dx}</td>
</tr>
<tr>
<td></td>
<td>1</td>
<td>{bx,dx}</td>
<td>0</td>
<td>0</td>
<td>{bx,dx}</td>
</tr>
</tbody>
</table>

Other types of data flow equations are also used to solve data flow problems. Consider the problem of finding all register definitions that reach a basic block $B_i$ according to Definition 22. In this definition, the reaching problem is defined in terms of the available problem: a register reaches a basic block if that register is available along some path from a predecessor node to the current node. This problem is equivalent to finding the set $\text{ReachIn}()$, and is solved with the following equation:

**Definition 40** Let

1. $B_i$ be a basic block
2. $\text{Reach}(B_i)$ be the set of reaching registers to $B_i$
3. $\text{Avail}(B_i)$ be the set of available registers from $B_i$

Then

$$\text{Reach}(B_i) = \bigcup_{p \in \text{Pred}(B_i)} \text{Avail}(p)$$

The problem of finding available registers out of a basic block is defined in terms of locally available and reaching definitions (see Definition 25). This problem is equivalent to finding the set $\text{AvailOut}()$. The following equation is used:

**Definition 41** Let

1. $B_i$ be a basic block
2. $\text{Avail}(B_i)$ be the set of available registers from $B_i$
3. $\text{Reach}(B_i)$ be the set of reaching registers to $B_i$
4. $\text{Propagate}(B_i)$ be the set of the registers that are propagated across $B_i$
5. $\text{Def}(B_i)$ be the set of locally available definitions in $B_i$

Then

$$\text{Avail}(B_i) = \text{Def}(B_i) \cup (\text{Reach}(B_i) \cap \text{Propagate}(B_i))$$

Finally, Definition 27 defines the live register problem in terms of reaching definitions and upwards exposed uses. This problem is equivalent to finding the set $\text{LiveIn}()$, and is solved with the following equation:

**Definition 42** Let

1. $B_i$ be a basic block
2. $\text{Live}(B_i)$ be the set of live registers on entrance to $B_i$
3. $\text{Reach}(B_i)$ be the set of reaching registers to $B_i$
4. $\text{UpwardExp}(B_i)$ be the set of the registers that are upwards exposed in $B_i$

Then

$$\text{Live}(B_i) = \text{Reach}(B_i) \cap \text{UpwardExp}(B_i)$$
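As an illustration, Definitions 40 to 42 can be solved by simple round-robin iteration. The following Python sketch is not the thesis's implementation; it assumes a control flow graph given as a hypothetical `preds` mapping from each block to its predecessors, per-block `Def`, `Propagate`, and `UpwardExp` sets, and takes `Propagate(B)` to be every register not defined in `B`. The example graph (a loop on B2) is hypothetical.

```python
def solve_reach_avail(preds, defs, propagate):
    """Iterate Definitions 40 and 41 from empty sets until nothing changes."""
    reach = {b: set() for b in preds}
    avail = {b: set() for b in preds}
    changed = True
    while changed:
        changed = False
        for b in preds:
            # Definition 40: Reach(B) = union of Avail(p) over predecessors p
            new_reach = set().union(*(avail[p] for p in preds[b]))
            # Definition 41: Avail(B) = Def(B) | (Reach(B) & Propagate(B))
            new_avail = defs[b] | (new_reach & propagate[b])
            if (new_reach, new_avail) != (reach[b], avail[b]):
                reach[b], avail[b] = new_reach, new_avail
                changed = True
    return reach, avail


def live_registers(reach, upward_exp):
    """Definition 42: Live(B) = Reach(B) & UpwardExp(B)."""
    return {b: reach[b] & upward_exp[b] for b in reach}


ALL = {"ax", "bx", "cx", "dx"}
preds = {"B1": [], "B2": ["B1", "B2"], "B3": ["B2"]}
defs = {"B1": {"ax", "cx"}, "B2": {"cx"}, "B3": set()}
propagate = {b: ALL - defs[b] for b in preds}   # assumption: not redefined in b
upward_exp = {"B1": set(), "B2": {"cx", "dx"}, "B3": {"ax"}}

reach, avail = solve_reach_avail(preds, defs, propagate)
live = live_registers(reach, upward_exp)
```

In this example `dx` is upwards exposed in B2, but no definition of `dx` reaches B2, so Definition 42 does not report it as live there.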

### 5.3.3 Solving Data Flow Equations

Given the control flow graph of a subroutine, data flow equations can be solved by two different methods: the iterative method, where a solution is recomputed until a fixed point is reached, and the interval method, where a solution is found for an interval and then propagated across the nodes in that interval. These equations do not necessarily have a unique solution; the minimal solution is taken as the answer. Iterative algorithms are explained in [ASU86b], and interval algorithms are given in [All72, AC76].
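A minimal sketch of the iterative method, using the standard backward live register equations $\text{LiveOut}(B) = \bigcup_{s \in \text{Succ}(B)} \text{LiveIn}(s)$ and $\text{LiveIn}(B) = \text{Use}(B) \cup (\text{LiveOut}(B) - \text{Def}(B))$, where `Use(B)` holds the upwards exposed uses of `B`. Starting from empty sets yields the minimal solution. The graph and the function name `live_in_out` are hypothetical, and the interval method is not shown.

```python
def live_in_out(succs, use, defs):
    """Round-robin iteration of the live register equations to a fixed point."""
    live_in = {b: set() for b in succs}
    live_out = {b: set() for b in succs}
    changed = True
    while changed:
        changed = False
        for b in reversed(list(succs)):      # backward problem: later blocks first
            out = set().union(*(live_in[s] for s in succs[b]))
            inn = use[b] | (out - defs[b])
            if (inn, out) != (live_in[b], live_out[b]):
                live_in[b], live_out[b] = inn, out
                changed = True
    return live_in, live_out


succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}   # B2 is a loop
use = {"B1": set(), "B2": {"cx", "ax"}, "B3": {"ax"}}   # upwards exposed uses
defs = {"B1": {"cx"}, "B2": {"ax"}, "B3": set()}
live_in, live_out = live_in_out(succs, use, defs)
```

Visiting blocks in reverse order only speeds up convergence; any visiting order reaches the same fixed point.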

## 5.4 Code-improving Optimizations

This section describes how data flow information is used to perform code-improving optimizations in a decompiler. The aim of these optimizations is to eliminate all references to condition codes and registers, as they do not exist in high-level languages, and to regenerate the high-level expressions of the decompiled program. This section refers to Figure 5-2, which is reproduced here as Figure 5-7 for convenience.

### 5.4.1 Dead-Register Elimination

A register is dead if it is defined by an instruction and is not used before being redefined by a subsequent instruction. If the instruction that defines a dead register defines only that register, the instruction is useless and is eliminated. If, on the other hand, the instruction also defines other registers, it is still useful but should no longer define the dead register; in this case, the instruction is modified to reflect this fact. Dead-register analysis is solved with definition-use chains on registers: the du-chain of a definition states which instructions use the defined register, so if the chain is empty, the register is dead. Consider the following code from basic block B1, Figure 5-7, with definition-use (du) chains for all registers defined. Note that register variables do not have a du-chain, as they represent local variables rather than temporary registers.

```plaintext
6 ax = tmp / di ; du(ax) = {9}
7 dx = tmp % di ; du(dx) = {}
8 dx = 3 ; du(dx) = {9}
9 dx:ax = ax * dx ; du(ax) = {10} du(dx) = {}
10 si = ax
```

From inspection, register `dx` is defined at both instructions 7 and 9 but is not used before being redefined, so it is dead in both instructions. Instruction 7 defines only this register; it is therefore redundant and can be eliminated. Instruction 9 also defines register `ax`, so the instruction is modified so that it no longer defines `dx`. The resulting code is:

```plaintext
6 ax = tmp / di ; du(ax) = {9}
8 dx = 3 ; du(dx) = {9}
9 ax = ax * dx ; du(ax) = {10}
10 si = ax
```

The algorithm in Figure 5-8 finds all dead registers and removes them from the code.

Since du-chains are used again by later decompilation optimizations, they need to be updated to reflect the elimination of instructions. If an instruction $i$ is eliminated because of a dead register definition $r$ that is computed from other registers (i.e. $r = f(r_1, \ldots, r_n), n \geq 1$), the uses of those registers at instruction $i$ no longer exist; hence the du-chains of the instructions that define the registers used at $i$ must be modified so that they no longer reference $i$. This problem is solved by checking the use-definition (ud) chain of $i$, which states which instructions $j$ define the registers used in $i$. Consider again the piece of code from basic block B1, now with du and ud chains on registers:
```plaintext
procedure DeadRegElim
/* Pre: du-chains on registers have been computed for all instructions.*/
/* Post: dead registers and instructions are eliminated */

for (each basic block b) do
    for (each instruction i in b) do
        for (each register r defined in i) do
            if (du(r) = {}) then
                if (i defines only register r) then
                    eliminate instruction i
                else
                    modify instruction i not to define register r
                    def(i) = def(i) - {r}
                end if
            end if
        end for
    end for
end for
end procedure
```

Figure 5-8: Dead Register Elimination Algorithm

```plaintext
5  tmp = dx:ax ; du(tmp) = {6,7} ; ud(dx) = {4} ud(ax) = {4}
6  ax = tmp / di ; du(ax) = {9} ; ud(tmp) = {5}
7  dx = tmp % di ; du(dx) = {} ; ud(tmp) = {5}
8  dx = 3 ; du(dx) = {9}
9  dx:ax = ax * dx ; du(ax) = {10} du(dx) = {} ; ud(ax) = {6} ud(dx) = {8}
10 si = ax ; ud(ax) = {9}
```

When instruction 7 is detected to be redundant, its ud-chain is checked for the instruction(s) that defined the register(s) used to compute the dead register `dx`. Register `tmp` is used at instruction 7 and was defined at instruction 5 (ud(tmp) = {5}), whose du-chain contains instructions 6 and 7. Since instruction 7 is going to be eliminated, the du-chain of instruction 5 must be updated to reference only instruction 6, leading to the following code after dead-register elimination and du-chain update:

```plaintext
5  tmp = dx:ax ; du(tmp) = {6} ; ud(dx) = {4} ud(ax) = {4}
6  ax = tmp / di ; du(ax) = {9} ; ud(tmp) = {5}
8  dx = 3 ; du(dx) = {9}
9  ax = ax * dx ; du(ax) = {10} ; ud(ax) = {6} ud(dx) = {8}
10 si = ax ; ud(ax) = {9}
```

The algorithm in Figure 5-9 updates du-chains during dead-register elimination. It should be invoked by the DeadRegElim procedure once an instruction is detected to be redundant, and before the instruction is removed. Note that the du-chain of a particular register might become empty, leading to further dead registers that are recursively eliminated from the code.

```plaintext
procedure UpdateDuChain (i: instructionNumber)
/* Pre: ud and du-chains on registers have been computed for all instructions.
 * instruction i is to be eliminated.
 * Post: no du-chain references instruction i any more */

for (each register r used in instruction i) do
  for (each instruction j in ud(r)) do
    if (i in du(r) at instruction j) then
      du(r) = du(r) - {i}
      if (du(r) = {}) then
        if (j defines only register r) then
          UpdateDuChain (j)
          eliminate instruction j
        else
          modify instruction j not to define register r
          def(j) = def(j) - {r}
        end if
      end if
    end if
  end for
end for
end procedure
```

Figure 5-9: Update of du-chains
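The following Python sketch combines Figures 5-8 and 5-9 and replays the basic block B1 example above. It is an illustration under stated assumptions, not the thesis's implementation: instructions are kept in a dictionary keyed by instruction number, chains are hypothetical `du`/`ud` dictionaries from register to instruction numbers, register variables such as `si` and `di` carry no chains, and instructions outside the fragment (such as instruction 4) are simply skipped. The helper `kill_definition` is a hypothetical name for the shared "eliminate or modify" step of both figures.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    text: str
    defs: set                               # registers defined (register variables excluded)
    du: dict = field(default_factory=dict)  # reg -> instructions that use this definition
    ud: dict = field(default_factory=dict)  # reg -> instructions whose definition reaches this use


def kill_definition(code, j, r):
    """Eliminate instruction j if it defines only r; otherwise stop it defining r."""
    if code[j].defs == {r}:
        update_du_chain(code, j)            # Figure 5-9, before j disappears
        del code[j]
    else:
        code[j].defs.discard(r)             # keep the instruction; it no longer defines r
        code[j].du.pop(r, None)


def update_du_chain(code, i):
    """Figure 5-9: remove instruction i from the du-chains of its reaching definitions."""
    for r, reaching in code[i].ud.items():
        for j in reaching:
            if j not in code:               # definition outside this fragment
                continue
            chain = code[j].du.get(r, set())
            if i in chain:
                chain.discard(i)
                if not chain:               # r became dead at j: eliminate recursively
                    kill_definition(code, j, r)


def dead_reg_elim(code):
    """Figure 5-8, invoking the du-chain update before each elimination."""
    for i in sorted(code):
        if i not in code:                   # already removed by a recursive update
            continue
        for r in sorted(code[i].defs):      # copy: defs may shrink while looping
            if i in code and not code[i].du.get(r):
                kill_definition(code, i, r)
    return code


code = {
    5: Instr("tmp = dx:ax", {"tmp"}, du={"tmp": {6, 7}}, ud={"dx": {4}, "ax": {4}}),
    6: Instr("ax = tmp / di", {"ax"}, du={"ax": {9}}, ud={"tmp": {5}}),
    7: Instr("dx = tmp % di", {"dx"}, du={"dx": set()}, ud={"tmp": {5}}),
    8: Instr("dx = 3", {"dx"}, du={"dx": {9}}),
    9: Instr("dx:ax = ax * dx", {"dx", "ax"}, du={"ax": {10}, "dx": set()},
             ud={"ax": {6}, "dx": {8}}),
    10: Instr("si = ax", set(), ud={"ax": {9}}),
}
dead_reg_elim(code)
```

After the call, instruction 7 is gone, the du-chain of `tmp` at instruction 5 is `{6}`, and instruction 9 defines only `ax`, matching the listing above.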

### 5.4.2 Dead-Condition Code Elimination

A condition code (or flag) is dead if it is defined by an instruction and is not used before redefinition. Since the definition of a condition code is a side effect of an instruction (i.e. the instruction has another function), eliminating dead flags never makes an instruction redundant; therefore, no instructions are eliminated by dead-flag elimination. Instead, once a condition code has been determined to be dead, it no longer needs to be defined by the instruction, so this information is removed from the instruction. Condition-code information is kept in each instruction as two sets (i.e. bitsets): the set of defined conditions and the set of used conditions. The analysis that finds dead condition codes is similar to dead-register analysis in that it uses du-chains; ud-chains are not needed, since no instruction is eliminated. Consider the following code from basic block B1, Figure 5-7, with du-chains on condition codes:

```plaintext
14 cmp [bp-6]:[bp-8], dx:ax ; def={ZF,CF,SF} ; du(SF)={15} du(CF,ZF)={}
15 jg B2 ; use={SF}
```

Instruction 14 defines condition codes ZF (zero), CF (carry), and SF (sign). Checking the du-chains of these flags shows that only SF is used later on; ZF and CF are not used after this definition and are therefore dead. Their definitions are removed from instruction 14, leading to the following code:

```
14  cmp [bp-6]:[bp-8], dx:ax ; def = {SF} ; du(SF)={15}
15  jg B2 ; use = {SF}
```

The algorithm in Figure 5-10 finds all condition codes that are dead and eliminates them.

```
procedure DeadCCElim
/* Pre: du-chains on condition codes have been computed for all instructions.
 * Post: dead condition codes are eliminated */

for (each basic block b) do
    for (each instruction i in b) do
        for (each condition code c in def(i)) do
            if (du(c) = {}) then
                def(i) = def(i) - {c}
            end if
        end for
    end for
end for
end procedure
```

Figure 5-10: Dead Condition Code Elimination Algorithm
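Because condition codes are kept as bitsets, Figure 5-10 reduces to clearing bits. A minimal Python sketch, assuming hypothetical dictionary-based instructions whose `def` field is an integer bitset and whose `du` field maps a flag bit to the instructions that use it; the bit positions follow the x86 FLAGS register, although any distinct bits would do.

```python
# Condition codes as bits (positions mirror the x86 FLAGS register).
CF, ZF, SF = 1 << 0, 1 << 6, 1 << 7
FLAGS = (CF, ZF, SF)


def dead_cc_elim(instrs):
    """Figure 5-10: clear every defined flag whose du-chain is empty."""
    for ins in instrs:
        for c in FLAGS:
            if ins["def"] & c and not ins["du"].get(c):
                ins["def"] &= ~c        # only the definition goes; the instruction stays
    return instrs


code = [
    {"text": "cmp [bp-6]:[bp-8], dx:ax", "def": ZF | CF | SF, "use": 0, "du": {SF: {15}}},
    {"text": "jg B2", "def": 0, "use": SF, "du": {}},
]
dead_cc_elim(code)
```

After the call, instruction 14 (`code[0]`) defines only SF, as in the listing above.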

### 5.4.3 Condition Code Propagation

Dead-condition code elimination removes all definitions of condition codes that are not used in the program. Every remaining condition code definition has a use in a subsequent instruction; these definitions are eliminated once the essence of the condition has been captured. The problem can be solved by means of either du-chains or ud-chains on condition codes; both provide equivalent solutions. Consider the following code from basic block B1, Figure 5-7, with ud-chains on condition codes:

```
14  cmp [bp-6]:[bp-8], dx:ax ; def = {SF}
15  jg B2 ; use = {SF} ; ud(SF) = {14}
```

For each use of a flag (or flags), we find the instruction that defined the flag(s) and merge the two instructions according to the Boolean condition implicit in the instruction that uses the flag. Instruction 15 uses flag SF and implicitly checks for a greater-than Boolean condition. Instruction 14 defines the flag used in instruction 15: it compares the first identifier ([bp-6]:[bp-8]) against the second identifier (dx:ax), and SF records the outcome of this comparison that instruction 15 tests. Other flags originally set by this instruction have been removed by dead-condition code elimination, so they are not considered. Propagating the condition that sets SF (i.e. the comparison of two identifiers) to the instruction that uses it therefore eliminates the instruction that defines the condition, and generates a Boolean condition for the instruction that uses it. In our example, the propagation of SF leads to the following code:

```plaintext
15 jcond ([bp-6]:[bp-8] > dx:ax) B2
```

thus eliminating all flag references.

**Condition Code Uses within Extended Basic Blocks**

**Definition 43** An extended basic block is a sequence of basic blocks $B_1, \ldots, B_n$ such that for $1 \leq i < n$, $B_i$ is the only predecessor of $B_{i+1}$, and for $1 < i \leq n$, $B_i$ contains only a conditional jump instruction.

In most programs, flag definitions and their uses occur in the same basic block. In some standard cases, however, the flag definition is not in the same basic block as the flag use, but is within the same extended basic block, as in the following code:

```plaintext
1 cmp ax, dx ; def = {SF,ZF} ; du(SF) = {2} du(ZF) = {3} 
2 jg Bx ; use = {SF} ; ud(SF) = {1} 
3 je By ; use = {ZF} ; ud(ZF) = {1} 
```

In this case, instruction 1 defines two flags: SF and ZF. The sign flag is used by instruction 2 (within the same basic block), and the zero flag is used by instruction 3 (in a different basic block, but within the same extended basic block). The sign condition from instruction 1 is propagated to instruction 2, which checks for a greater-than Boolean condition, leading to the following code:

```plaintext
1 cmp ax, dx ; def = {ZF} ; du(ZF) = {3} 
2 jcond (ax > dx) Bx 
3 je By ; use = {ZF} ; ud(ZF) = {1} 
```

Since instruction 1 also defines the zero flag, which is used at instruction 3, the instruction is not removed yet: the identifiers that form part of the Boolean condition are still needed. When instruction 3 is analyzed, the definition of the zero flag in instruction 1 is propagated to the use of this flag in instruction 3, generating a Boolean condition that checks the two registers for equality. Since instruction 1 defines no other condition codes, it can now be safely eliminated, leading to the following code:

```plaintext
2 jcond (ax > dx) Bx 
3 jcond (ax = dx) By 
```

The algorithm can be extended to propagate condition codes that are defined in two or more basic blocks (i.e. by taking the conjunction of the individual Boolean conditions), but this has not been required in practice, since even optimizing compilers almost never attempt to track flag definitions across basic block boundaries [Gou93]. The algorithm in Figure 5-11 propagates the condition codes within an extended basic block.

The Boolean conditional expressions derived from this analysis have the form described by the BNF in Figure 5-12. These expressions are saved as parse trees in the intermediate high-level representation.
```plaintext
procedure CondCodeProp
    /* Pre: dead-condition code elimination has been performed.
     * the sets of defined and used flags have been computed for all
     * instructions.
     * ud-chains on condition codes have been computed for all instructions.
     * Post: all references to condition codes have been eliminated */
    for (all basic blocks b in postorder) do
        for (all instructions i in b in last to first order) do
            if (use(i) <> {}) then /* check for a flag use */
                for (all flags f in use(i)) do
                    j = ud(f)
                    def(j) = def(j) - {f} /* remove it from the set */
                    propagate identifiers from instruction j to the Boolean
                    condition in instruction i (do not store repetitions).
                    if (def(j) = {}) then
                        eliminate instruction j.
                    end if
                end for
            end if
        end for
    end for
end procedure
```

Figure 5-11: Condition Code Propagation Algorithm
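A Python sketch of Figure 5-11 on the extended basic block example above (instructions 1 to 3). It assumes a single extended basic block whose instructions are processed last to first, exactly one defining instruction per flag use (a ud-chain of length one), and a hypothetical `REL_OP` table mapping each conditional jump to the relational operator it tests, following this section's simplified flag model.

```python
# Hypothetical mapping from conditional jumps to the operators of Figure 5-12.
REL_OP = {"jg": ">", "jge": "≥", "jl": "<", "jle": "≤", "je": "=", "jne": "<>"}


def cond_code_prop(code):
    """Figure 5-11 for one extended basic block; code maps number -> instruction."""
    for i in sorted(code, reverse=True):        # last to first
        ins = code.get(i)
        if ins is None or not ins["use"]:       # removed already, or no flag use
            continue
        for f in sorted(ins["use"]):
            j = ins["ud"][f]                    # the unique defining instruction
            d = code[j]
            d["def"].discard(f)
            if "cond" not in ins:               # do not store repetitions
                left, right = d["operands"]
                ins["cond"] = f"({left} {REL_OP[ins['op']]} {right})"
            if not d["def"]:
                del code[j]                     # no flags left: drop the compare
        ins["op"], ins["use"] = "jcond", set()
    return code


code = {
    1: {"op": "cmp", "operands": ("ax", "dx"), "def": {"SF", "ZF"}, "use": set(), "ud": {}},
    2: {"op": "jg", "target": "Bx", "def": set(), "use": {"SF"}, "ud": {"SF": 1}},
    3: {"op": "je", "target": "By", "def": set(), "use": {"ZF"}, "ud": {"ZF": 1}},
}
cond_code_prop(code)
```

Instruction 3 is processed first and leaves SF in `def(1)`, so the compare survives until instruction 2 consumes the last flag, reproducing the order of events described in the text.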

```plaintext
Cond    ::= (Cond ∧ RelTerm) | (Cond ∨ RelTerm) | RelTerm
RelTerm ::= Factor op Factor
Factor  ::= register | localVar | literal | parameter | global
op      ::= ≤ | < | = | > | ≥ | <>
```

Figure 5-12: BNF for Conditional Expressions
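One way to hold these conditions as parse trees is sketched below in Python. The class names `RelTerm` and `BoolOp` and the function `show` are hypothetical, and factors are kept as plain strings rather than being distinguished into registers, local variables, literals, parameters, and globals.

```python
from dataclasses import dataclass
from typing import Union

REL_OPS = {"≤", "<", "=", ">", "≥", "<>"}


@dataclass
class RelTerm:                  # RelTerm ::= Factor op Factor
    left: str                   # Factor kept as text (register, localVar, literal, ...)
    op: str
    right: str

    def __post_init__(self):
        if self.op not in REL_OPS:
            raise ValueError(f"not a relational operator: {self.op}")


@dataclass
class BoolOp:                   # (Cond ∧ RelTerm) or (Cond ∨ RelTerm)
    left: "Cond"
    conn: str                   # "∧" or "∨"
    right: RelTerm


Cond = Union[BoolOp, RelTerm]


def show(c: Cond) -> str:
    """Print a condition in the concrete syntax of Figure 5-12."""
    if isinstance(c, RelTerm):
        return f"{c.left} {c.op} {c.right}"
    return f"({show(c.left)} {c.conn} {show(c.right)})"


cond = BoolOp(RelTerm("[bp-6]:[bp-8]", ">", "dx:ax"), "∧", RelTerm("cx", "<>", "0"))
```

Because `BoolOp.left` is itself a `Cond`, conjunctions and disjunctions nest to the left exactly as the recursive `Cond` rule allows.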

### 5.4.4 Register Arguments

The register calling convention is used by compilers to speed up the invocation of a subroutine. It is an option available in most contemporary compilers, and is also used by the compiler runtime support routines. Given a subroutine, register arguments translate to registers that are used by the subroutine before being defined in it; i.e. upwards exposed uses of registers over the whole subroutine. Consider the following code from basic blocks B5 and B6, Figure 5-7, subroutine `_aNlshl`, after condition code elimination:

```plaintext
33 ch = 0
34 jcond (cx = 0) B7 ; ud(ch) = {33} ud(cl) = {}
35 dx:ax = dx:ax << 1 ; ud(dx:ax) = {}
```

Instruction 34 uses register `cx`, which has not been completely defined in this subroutine: the high part, register `ch`, is defined in instruction 33, but the low part, register `cl`, is not defined at all. A similar problem occurs in instruction 35: registers `dx:ax` are not defined in the subroutine before being used. Information on registers used before being defined is summarized by an intraprocedural live register analysis: such a register is live on entrance to the basic block that uses it. This analysis is done by solving the intraprocedural live register equations of Definition 36, or the equations for the first phase of precise interprocedural live register analysis (Definition 39). Performing live register analysis on subroutine `_aNlshl` leads to the following LiveIn and LiveOut sets:

<table>
<thead>
<tr>
<th>Basic Block</th>
<th>LiveIn</th>
<th>LiveOut</th>
</tr>
</thead>
<tbody>
<tr>
<td>B5</td>
<td><code>{dx, ax, cl}</code></td>
<td><code>{dx, ax}</code></td>
</tr>
<tr>
<td>B6</td>
<td><code>{dx, ax}</code></td>
<td><code>{}</code></td>
</tr>
<tr>
<td>B7</td>
<td><code>{}</code></td>
<td><code>{}</code></td>
</tr>
</tbody>
</table>

The set of LiveIn registers computed for the header basic block B5 is the set of register arguments used by the subroutine: `dx`, `ax`, and `cl` in this example. These registers form two arguments, the register pair `dx:ax` and the byte register `cl`, and the formal argument list of the subroutine is updated accordingly:

```plaintext
formal_arguments(_aNlshl) = (arg1 = dx:ax, arg2 = cl)
```

The `_aNlshl` subroutine is said to *use* these registers. In general, a subroutine that takes register arguments uses those registers; thus an invocation of such a subroutine (i.e. a `call` instruction) is also said to use them, as in the following instruction:

```plaintext
21 call _aNlshl ; use = {dx, ax, cl}
```

The algorithm in Figure 5-13 finds the set of register arguments (if any) of a subroutine.

### 5.4.5 Function Return Register(s)

Functions return results in registers, and no machine instruction states which registers are returned by a function. After the function returns, the caller uses the returned registers before they are redefined (i.e. these registers are live on entrance to the basic block that follows the function call). This register information must be propagated across subroutine boundaries, and is computed with reaching and live register analyses. Consider the following code from basic blocks B2 and B3, Figure 5-7:

```plaintext
20 dx:ax = [bp-6]:[bp-8] ; def = {dx, ax} use = {}
21 call _aNlshl ; def = {} use = {dx, ax, cl}
22 [bp-6]:[bp-8] = dx:ax ; def = {} use = {dx, ax}
```

```plaintext
procedure FindRegArgs (s: subroutineRecord)
/* Pre: intraprocedural live register analysis has been performed on
 * subroutine s.
 * Post: uses(s) is the set of register arguments of subroutine s. */

if (LiveIn(headerNode(s)) <> {}) then
    uses(s) = LiveIn(headerNode(s))
else
    uses(s) = {}
end if
end procedure
```
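Figure 5-13 reads the register arguments directly off the header node's LiveIn set. The Python sketch below does the same for subroutine `_aNlshl` and then groups the registers into formal arguments. The function names are hypothetical, and the grouping rule (`dx` and `ax` together forming one long argument) is an assumption made for this example, chosen to reproduce the argument list given in Section 5.4.4.

```python
# Assumption: dx:ax is the only register pair that forms a single (long) argument.
REGISTER_PAIRS = [("dx", "ax")]


def find_reg_args(live_in, header):
    """Figure 5-13: uses(s) = LiveIn(headerNode(s)), or {} if that set is empty."""
    return set(live_in.get(header, set()))


def formal_arguments(uses):
    """Group register arguments into a formal argument list (hypothetical rule)."""
    regs, args = set(uses), []
    for hi, lo in REGISTER_PAIRS:
        if {hi, lo} <= regs:
            args.append(f"{hi}:{lo}")
            regs -= {hi, lo}
    args.extend(sorted(regs))
    return [f"arg{n} = {a}" for n, a in enumerate(args, start=1)]


live_in = {"B5": {"dx", "ax", "cl"}, "B6": {"dx", "ax"}, "B7": set()}
uses = find_reg_args(live_in, "B5")
print(formal_arguments(uses))   # ['arg1 = dx:ax', 'arg2 = cl']
```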

Instruction 22 uses registers `dx:ax`; these registers are defined in instruction 20, but a subroutine call occurs between this definition and the use. Since it is not known whether the called subroutine is a procedure or a function, it is not safe to assume that the definition in instruction 20 is the one reaching the use in instruction 22. Summary information is needed to determine which definition reaches instruction 22. Performing an intraprocedural reaching register analysis on subroutine `_aNlshl` leads to the following ReachIn and ReachOut sets:

<table>
<thead>
<tr>
<th>Basic Block</th>
<th>ReachIn</th>
<th>ReachOut</th>
</tr>
</thead>
<tbody>
<tr>
<td>B5</td>
<td>{}</td>
<td>{ch}</td>
</tr>
<tr>
<td>B6</td>
<td>{ch}</td>
<td>{cx,dx,ax}</td>
</tr>
<tr>
<td>B7</td>
<td>{cx,dx,ax}</td>
<td>{cx,dx,ax}</td>
</tr>
</tbody>
</table>

This analysis states that the last definitions of registers `cx`, `dx`, and `ax` reach the end of the subroutine (i.e. the ReachOut set of basic block B7). The caller uses only some of these reaching registers, so it is necessary to determine which registers are upwards exposed in the successor basic block(s) of the subroutine invocation. This information is calculated by solving the interprocedural live register equations of Definition 36, or the second phase of precise interprocedural live register analysis (Definition 39). Since the information needs to be accurate, the live register analysis equations are solved in an optimistic way; i.e. a register is live if a use of that register is seen in a subsequent node. The following LiveIn and LiveOut sets are calculated for the example of Figure 5-7:

<table>
<thead>
<tr>
<th>Basic Block</th>
<th>LiveIn</th>
<th>LiveOut</th>
</tr>
</thead>
<tbody>
<tr>
<td>B1</td>
<td>{}</td>
<td>{}</td>
</tr>
<tr>
<td>B2</td>
<td>{}</td>
<td>{dx,ax}</td>
</tr>
<tr>
<td>B3</td>
<td>{dx,ax}</td>
<td>{}</td>
</tr>
<tr>
<td>B4</td>
<td>{}</td>
<td>{}</td>
</tr>
<tr>
<td>B5</td>
<td>{dx,ax,cl}</td>
<td>{dx,ax}</td>
</tr>
<tr>
<td>B6</td>
<td>{dx,ax}</td>
<td>{dx,ax}</td>
</tr>
<tr>
<td>B7</td>
<td>{dx,ax}</td>
<td>{dx,ax}</td>
</tr>
</tbody>
</table>

Of the three registers that reach basic block B3 (`cx`, `dx`, and `ax`), only two are used (i.e. belong to LiveIn of B3): `dx:ax`. These are therefore the only registers of interest once the called subroutine has finished, and they are the registers returned by the function. The condition that checks for returned registers is:

\[
\text{ReachOut}(B7) \cap \text{LiveIn}(B3) = \{dx,ax\}
\]

In general, a subroutine can have one or more return nodes; therefore, the ReachOut() set of the subroutine must contain only the registers that reach every exit. The following equation summarizes the ReachOut information for a subroutine $s$:

$$\text{ReachOut}(s) = \bigcap_{B_i \text{ a return node of } s} \text{ReachOut}(B_i)$$
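The two set operations used so far can be sketched in Python as follows. The function names are hypothetical; the first call reproduces the `_aNlshl` example, and the second adds a hypothetical extra return block B8 to show why the intersection is taken over all return nodes.

```python
def subroutine_reach_out(reach_out, return_blocks):
    """ReachOut(s): registers whose definitions reach every return block of s."""
    sets = [reach_out[b] for b in return_blocks]
    return set.intersection(*sets) if sets else set()


def returned_registers(reach_out_s, live_in_after_call):
    """Registers that reach the exit of s and are used after the call."""
    return reach_out_s & live_in_after_call


reach_out = {"B5": {"ch"}, "B6": {"cx", "dx", "ax"}, "B7": {"cx", "dx", "ax"}}
ret = returned_registers(subroutine_reach_out(reach_out, ["B7"]), {"dx", "ax"})
print(sorted(ret))   # ['ax', 'dx']

# A hypothetical second exit B8 that only defines ax narrows ReachOut(s) to {ax}.
narrowed = subroutine_reach_out({**reach_out, "B8": {"ax"}}, ["B7", "B8"])
```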

Once a subroutine has been determined to be a function, and the register(s) that it returns have been determined, this information is propagated to two different places: the return instruction(s) of the function, and the instructions that call the function. In the former case, every return basic block has a `ret` instruction, which is modified to return the registers that the function returns. In our example, instruction 38 of basic block B7, Figure 5-7, is modified to the following code:

```plaintext
38 ret dx:ax
```

In the latter case, any function invocation instruction (i.e. `call` instruction) is replaced by an `asgn` instruction that takes the defined register(s) as its left-hand side and the function call as its right-hand side, as in the following code:

```plaintext
21 dx:ax = call _aNlshl ; def = {dx,ax} use = {dx, ax, cl}
```

The instruction is transformed into an `asgn` instruction, and defines the registers on the left-hand side (lhs).

The algorithm in Figure 5-14 determines which subroutines are functions (i.e. return a value in one or more registers). Note that in the case of library functions whose return register(s) are not used, the call is not transformed into an `asgn` instruction but remains a `call` instruction.
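A Python sketch of the transformation performed by Figure 5-14, assuming hypothetical dictionary-based instructions and a hypothetical `Subroutine` record whose `reach_out` field holds ReachOut(s). Calls are visited in the order the figure visits them; the first call whose successor uses a returned register marks the callee as a function, and later calls to it are converted unconditionally, as in the figure's `else` branch.

```python
from dataclasses import dataclass, field


@dataclass
class Subroutine:
    name: str
    reach_out: set                             # ReachOut(s)
    rets: list = field(default_factory=list)   # ret instructions of s
    is_function: bool = False                  # function(f) in Figure 5-14
    returns: set = field(default_factory=set)  # def(f) in Figure 5-14


def find_ret_regs(calls, subs):
    """calls: (call instruction, LiveIn of the successor block) pairs, in postorder."""
    for ins, live_after in calls:
        f = subs[ins["target"]]
        if not f.is_function:
            returned = live_after & f.reach_out
            if not returned:
                continue                       # no returned register used: stays a call
            f.is_function, f.returns = True, returned
            for ret in f.rets:
                ret["exp"] = returned          # e.g. ret dx:ax
        # convert the call into an asgn: lhs = returned registers, rhs = the call
        ins.update(opcode="asgn", lhs=f.returns, rhs=("call", f.name), defs=f.returns)
    return calls


ret38 = {"opcode": "ret"}
subs = {"_aNlshl": Subroutine("_aNlshl", {"cx", "dx", "ax"}, rets=[ret38])}
call21 = {"opcode": "call", "target": "_aNlshl"}
find_ret_regs([(call21, {"dx", "ax"})], subs)
```

After the call, instruction 21 is an assignment to `dx:ax` and instruction 38 returns `dx:ax`, matching the two listings above.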

### 5.4.6 Register Copy Propagation

Register copy propagation is the method by which a register defined in an assignment instruction, say `ax = cx`, is replaced in subsequent instructions that use this register, provided that neither register is modified (i.e. redefined) after the assignment (i.e. neither `ax` nor `cx` is modified). If this is the case, references to register `ax` are replaced by references to register `cx`, and if all uses of `ax` are replaced by `cx`, then `ax` becomes dead and the assignment instruction is eliminated. A use of `ax` can be replaced with a use of `cx` if `ax = cx` is the only definition of `ax` that reaches the use of `ax`, and if no assignments to `cx` have occurred after the instruction `ax = cx`. The former condition is checked with ud-chains on registers. The latter condition is checked with an $r$-clear condition (i.e. a forward-flow, all-paths analysis). Consider the following code from basic block B2, Figure 5-7, with ud-chains and du-chains:
```plaintext
procedure FindRetRegs
    /* Pre: interprocedural live register analysis has been performed.
     * intraprocedural reaching register definition has been performed.
     * Post: def(f) is the set of registers returned by a function f.
     * call instructions to functions are modified to asgn instructions.
     * ret instructions of functions return the function return registers. */
    for (all subroutines s) do
        for (all basic blocks b in postorder) do
            for (all instructions i in b) do
                if (i is a call instruction to subroutine f) then
                    if (function(f) == False) then /* f is not a function so far */
                        def(i) = LiveIn(succ(b)) intersect ReachOut(f)
                        if (def(i) <> {}) then /* it is a function */
                            def(f) = def(i)
                            function(f) = True
                            rhs(i) = i /* convert i into an asgn inst */
                            lhs(i) = def(f)
                            opcode(i) = asgn
                            for (all ret instructions j of function f) do
                                exp(j) = def(f) /* propagate return register(s) */
                            end for
                        end if
                    else /* f is a function */
                        rhs(i) = i /* convert i into an asgn inst */
                        lhs(i) = def(f)
                        opcode(i) = asgn
                        def(i) = def(f) /* registers defined by i */
                    end if
                end if
            end for
        end for
    end for
end procedure
```

Figure 5-14: Function Return Register(s)

```plaintext
16 dx:ax = [bp-6]:[bp-8] ; du(dx:ax) = {17}
17 dx:ax = dx:ax - [bp-2]:[bp-4] ; ud(dx:ax) = {16} du(dx:ax) = {18}
18 [bp-6]:[bp-8] = dx:ax ; ud(dx:ax) = {17}
```

Following the ud-chains of these instructions, instruction 17 uses registers `dx:ax`, which were defined in instruction 16. Since these registers have not been redefined between instructions 16 and 17, the right-hand side of instruction 16 is substituted for the use of the registers in instruction 17, as follows:

```plaintext
17 dx:ax = [bp-6]:[bp-8] - [bp-2]:[bp-4] ; du(dx:ax)={18}
```

Since the registers defined at instruction 16 had only one use (i.e. du(dx:ax) = {17}), they are now dead and the instruction is eliminated. In a similar way, instruction 18 uses registers `dx:ax`, which are defined in instruction 17. Since these registers have not been redefined between those two instructions, the right-hand side of instruction 17 is substituted for the use of the registers in instruction 18, leading to:

```plaintext
18 [bp-6]:[bp-8] = [bp-6]:[bp-8] - [bp-2]:[bp-4]
```

Since the registers defined at instruction 17 also had only one use, they become dead and the instruction is eliminated. As this example shows, the right-hand side of an instruction $i$ can be substituted into a later use of the left-hand side of $i$, building expressions on the right-hand side of an assignment instruction.

Consider another example from basic block B1, Figure 5-7, after dead-register elimination, and with ud-chains and du-chains on registers (excluding register variables):

```plaintext
3 ax = si ; du(ax) = {4}
4 dx:ax = ax ; ud(ax) = {3} du(dx:ax) = {5}
5 tmp = dx:ax ; ud(dx:ax) = {4} du(tmp) = {6}
6 ax = tmp / di ; ud(tmp) = {5} du(ax) = {9}
8 dx = 3 ; du(dx) = {9}
9 ax = ax * dx ; ud(ax) = {6} ud(dx) = {8} du(ax) = {10}
10 si = ax ; ud(ax) = {9}
```

The use of register `ax` in instruction 4 is replaced with a use of the register variable `si`, making the definition of `ax` at 3 dead. The use of `dx:ax` in instruction 5 is replaced with a use of `si` (from instruction 4), making the definition of `dx:ax` at 4 dead. The use of `tmp` in instruction 6 is replaced with a use of `si` (from instruction 5), making the definition of `tmp` at 5 dead. The use of `ax` at instruction 9 is replaced with a use of `(si / di)` from instruction 6, making the definition of `ax` at 6 dead. In the same instruction, the use of `dx` is replaced with a use of the constant 3 from instruction 8, making the definition of `dx` at 8 dead. Finally, the use of `ax` at instruction 10 is replaced with a use of `(si / di) * 3` from instruction 9, making the definition of `ax` at 9 dead. Since the registers defined in instructions 3 to 9 were each used only once, and all of them became dead, these instructions are eliminated, leading to the final code:

```plaintext
10 si = (si / di) * 3
```

When propagating registers across assignment instructions, a register is in general defined in terms of an expression of other registers, local variables, arguments, and constants. Since any of these identifiers (except constants) can be redefined, it is necessary to check that none of them is redefined along the path from the instruction that defines the register to the instruction that uses it. Thus, the following necessary conditions are checked for register copy propagation:

1. Uniqueness of register definition for a register use: the register use must be reached by a single definition. Such registers, defined once and used before being redefined, translate to temporary registers that hold an intermediate result for the machine. This condition is checked by means of ud-chains on the registers used in an instruction.
2. rhs-clear path: the identifiers $x$ in an expression that defines a register $r$ (i.e. the rhs of the instruction) that satisfies condition 1 are checked to have an $x$-clear path to the instruction that uses the register $r$. The rhs-clear condition for an instruction $j$ that uses a register $r$ which is uniquely defined at instruction $i$ is formally defined as:

$$\text{rhs-clear}_{i\rightarrow j} = \bigcap_{x \in \text{rhs}(i)} x\text{-clear}_{i\rightarrow j}$$

where $\text{rhs}(i)$ is the right-hand side of instruction $i$, $x$ is an identifier that belongs to $\text{rhs}(i)$, and

$$x\text{-clear}_{i\rightarrow j} = \begin{cases} \text{True} & \text{if there is no definition of } x \text{ along the path } i \rightarrow j \\ \text{False} & \text{otherwise} \end{cases}$$
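The rhs-clear condition can be illustrated with a minimal sketch. It assumes straight-line code (a single basic block), so the path $i \rightarrow j$ is simply the instructions strictly between $i$ and $j$; each instruction is modelled, hypothetically, as a pair (identifier defined, set of identifiers used by its rhs), with constants omitted. The function names `x_clear` and `rhs_clear` are illustrative only.

```python
# Hypothetical model: instrs[k] = (identifier defined at k, identifiers used in rhs(k)).
# Straight-line code only: the path i -> j is the instructions strictly between i and j.

def x_clear(instrs, i, j, x):
    """True if there is no definition of x along the path i -> j."""
    return all(instrs[k][0] != x for k in range(i + 1, j))

def rhs_clear(instrs, i, j):
    """Conjunction of x-clear(i, j) over every identifier x in rhs(i)."""
    _, rhs_ids = instrs[i]
    return all(x_clear(instrs, i, j, x) for x in rhs_ids)

code = [
    ("ax", {"si"}),        # 0: ax = si
    ("dx", set()),         # 1: dx = 5   (constants are not identifiers)
    ("ax", {"ax", "dx"}),  # 2: ax = ax * dx
]
print(rhs_clear(code, 0, 2))      # True: si is not redefined between 0 and 2

code_bad = [("ax", {"si"}), ("si", {"di"}), ("bx", {"ax"})]
print(rhs_clear(code_bad, 0, 2))  # False: si is redefined at 1
```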

The algorithm in Figure 5-15 performs register copy propagation on assignment instructions. For this analysis, registers that can be used as both word and byte registers (e.g. $ax$, $ah$, $al$) are treated as different registers in the live register analysis. Whenever register $ax$ is defined, registers $ah$ and $al$ are also defined; but if register $al$ is defined, only registers $al$ and $ax$ are defined, not register $ah$. This is needed so that uses of part of a register (e.g. its high or low part) can be detected and treated as byte operands rather than integer operands.
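The word/byte aliasing rule can be written down as a small lookup, shown here as a sketch. The table lists the 8086 general-purpose registers and their byte halves; the helper name `regs_defined_by` is hypothetical.

```python
# 8086 word registers and their (high, low) byte halves.
BYTE_HALVES = {"ax": ("ah", "al"), "bx": ("bh", "bl"), "cx": ("ch", "cl"), "dx": ("dh", "dl")}
WORD_OF = {half: word for word, halves in BYTE_HALVES.items() for half in halves}

def regs_defined_by(reg):
    """Registers treated as defined when reg is written."""
    if reg in BYTE_HALVES:                 # writing ax defines ax, ah and al
        return {reg, *BYTE_HALVES[reg]}
    if reg in WORD_OF:                     # writing al defines al and ax, but not ah
        return {reg, WORD_OF[reg]}
    return {reg}                           # si, di, ...: no byte halves

print(sorted(regs_defined_by("ax")))   # ['ah', 'al', 'ax']
print(sorted(regs_defined_by("al")))   # ['al', 'ax']
```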

**Extension to Non-Assignment Register Usage Instructions**

The algorithm given in Figure 5-15 is general enough to propagate registers that are used in instructions other than assignments, such as `push`, `call`, and `jcond` instructions. Consider the following code from basic block B1, Figure 5-7 after condition code propagation:

```plaintext
13 dx:ax = [bp-2]:[bp-4] ; du(dx:ax) = {15}
15 jcond ([bp-6]:[bp-8] > dx:ax) B2 ; ud(dx:ax) = {13}
```

Instruction 15 uses registers $dx:ax$, which are uniquely defined in instruction 13. The rhs of instruction 13 is propagated to the use of these registers, leading to the elimination of instruction 13. The final code looks as follows:

```plaintext
15 jcond ([bp-6]:[bp-8] > [bp-2]:[bp-4]) B2
```

In a similar way, a use of a register in a `push` instruction is replaced by a use of the rhs of the instruction that defines the register, as in the following code from basic block B4, Figure 5-7 after dead-register elimination:

```plaintext
25 ax = si ; du(ax) = {27}
26 dx = 5 ; du(dx) = {27}
27 ax = ax * dx ; ud(ax) = {25} ud(dx) = {26} du(ax) = {28}
28 push ax ; ud(ax) = {27}
```

Applying the register copy propagation algorithm we arrive at the following code:

```plaintext
28 push (si * 5)
```

and instructions 25, 26, and 27 are eliminated.

A `call` instruction that has been modified into an `asgn` instruction due to a function being invoked rather than a procedure is also a candidate for register copy propagation. Consider the following code after function return register determination:
```plaintext
procedure RegCopyProp
/* Pre: dead-register elimination has been performed.
 *      ud-chains and du-chains have been computed for all instructions.
 * Post: most references to registers have been eliminated.
 *       high-level language expressions have been found. */

for (all basic blocks b in postorder) do
  for (all instructions j in basic block b) do
    for (all registers r used by instruction j) do
      if (ud(r) = {i}) then      /* r is uniquely defined at instruction i */
        prop = True
        for (all identifiers x in rhs(i)) do      /* compute rhs-clear */
          if (not x-clear(i, j)) then
            prop = False
          end if
        end for
        if (prop == True) then   /* propagate rhs(i) */
          replace the use of r in instruction j with rhs(i)
          du(r) = du(r) - {j}    /* du-chain of r at instruction i */
          if (du(r) = {}) then
            if (i defines only register r) then
              eliminate i
            else
              modify instruction i not to define register r
            end if
          end if
        end if
      end if
    end for
  end for
end for
end procedure
```

Figure 5-15: Register Copy Propagation Algorithm
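The following is a minimal executable sketch of the algorithm in Figure 5-15, restricted to one basic block of straight-line code, where the unique reaching definition of a register is simply its last earlier definition. It assumes, hypothetically, that only `ax`, `bx`, `cx` and `dx` are temporaries (register variables such as `si` are left alone) and that expressions are tuples `(operator, left, right)`. Running it on the code of basic block B4 reproduces `push (si * 5)`.

```python
from dataclasses import dataclass
from typing import Optional, Union

Expr = Union[str, int, tuple]          # register/variable name, constant, or (op, left, right)
TEMPS = {"ax", "bx", "cx", "dx"}       # assumption: registers treated as temporaries

@dataclass
class Instr:
    num: int                 # instruction number, as in the listings
    op: str                  # "asgn" or "push"
    lhs: Optional[str]       # register defined (None for push)
    rhs: Expr                # expression used by the instruction
    dead: bool = False

def uses(e: Expr) -> set:
    """Identifiers (not constants) used in expression e."""
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        return uses(e[1]) | uses(e[2])
    return set()

def subst(e: Expr, r: str, new: Expr) -> Expr:
    """Replace every use of register r in e with the expression new."""
    if isinstance(e, tuple):
        return (e[0], subst(e[1], r, new), subst(e[2], r, new))
    return new if e == r else e

def show(e: Expr) -> str:
    if isinstance(e, tuple):
        return f"({show(e[1])} {e[0]} {show(e[2])})"
    return str(e)

def reg_copy_prop(block: list) -> list:
    """Register copy propagation over one basic block of straight-line code."""
    for j, ins in enumerate(block):
        for r in sorted(uses(ins.rhs) & TEMPS):
            defs = [k for k in range(j) if block[k].lhs == r]
            if not defs:
                continue                          # no local definition: leave the use alone
            i = defs[-1]                          # unique reaching definition, ud(r) = {i}
            if any(b.lhs in uses(block[i].rhs) for b in block[i + 1:j]):
                continue                          # rhs-clear fails
            ins.rhs = subst(ins.rhs, r, block[i].rhs)
            # du(r) at i is empty once no later instruction (before r is redefined) uses r
            still_used = False
            for k in range(i + 1, len(block)):
                if r in uses(block[k].rhs):
                    still_used = True
                    break
                if block[k].lhs == r:
                    break
            if not still_used:
                block[i].dead = True              # i defines only r here, so eliminate it
    return [b for b in block if not b.dead]

block = [
    Instr(25, "asgn", "ax", "si"),
    Instr(26, "asgn", "dx", 5),
    Instr(27, "asgn", "ax", ("*", "ax", "dx")),
    Instr(28, "push", None, "ax"),
]
result = reg_copy_prop(block)
for ins in result:
    print(ins.num, ins.op, show(ins.rhs))   # prints: 28 push (si * 5)
```

Only the "eliminate i" branch of Figure 5-15 is modelled, since each instruction in this toy representation defines at most one register.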

```plaintext
21 dx:ax = call _aNlshl ; ud(dx:ax) = {20}  ud(cl) = {19}
                        ; du(dx:ax) = {22}
22 [bp-6]:[bp-8] = dx:ax ; ud(dx:ax) = {21}
```

The function _aNlshl returns a value in registers dx:ax. These registers are used in the first instruction of the following basic block, where they are copied to the long local variable at offset -6 ([bp-6]:[bp-8]). Performing copy propagation leads to the following code:

```plaintext
22 [bp-6]:[bp-8] = call _aNlshl
```

eliminating instruction 21, since registers dx:ax become dead.

5.4.7 Actual Parameters

Actual parameters to a subroutine are normally pushed on the stack before the subroutine is invoked. Since most languages allow nested subroutine calls, the arguments on the stack may belong to two or more subroutines, so it is necessary to determine which arguments belong to which subroutine. To do this, an expression stack is used to store the expressions associated with push instructions. Whenever a call instruction is reached, the necessary number of arguments is popped from the stack. Consider the following code from basic block B4, Figure 5-7, after dead-register elimination and register copy propagation:

```
24 push [bp-6]:[bp-8]
28 push (si * 5)
30 push 66
31 call printf
```

Instructions 24, 28, and 30 push the expressions associated with each instruction onto a stack, as shown in Figure 5-16. When the call to printf is reached, information on this function is checked to determine how many bytes of arguments the call takes; in this case it takes 8 bytes. Expressions are then popped from the stack, and their types are checked to determine how many bytes each one uses. The first expression is an integer constant, which takes 2 bytes; the second is an integer expression, which takes 2 bytes; and the third is a long variable, which takes 4 bytes; a total of 8 bytes, as needed by this function call. The popped expressions are placed on the actual argument list of the invoked subroutine according to the calling convention used by the subroutine. In our example, the library function printf uses the C calling convention, leading to the following code:

```
31 call printf (66, si * 5, [bp-6]:[bp-8])
```

Instructions 24, 28, and 30 are eliminated from the intermediate code once their expressions are placed on the stack.

![Expression Stack](image)

Figure 5-16: Expression Stack
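The byte-counting step can be sketched as follows. The type sizes are assumed from the example (2 bytes for integers, 4 for longs), the expressions are kept as plain strings, and `collect_args` is a hypothetical helper. Under the C calling convention arguments are pushed right to left, so the first expression popped is the first actual argument.

```python
# Assumed sizes (in bytes) of the expression types that appear in the example.
TYPE_SIZE = {"int_const": 2, "int": 2, "long": 4}

def collect_args(exp_stack, arg_bytes):
    """Pop expressions until arg_bytes bytes of arguments have been consumed."""
    args, total = [], 0
    while total < arg_bytes:
        expr, typ = exp_stack.pop()          # top of the expression stack
        args.append(expr)                    # C convention: popped order = argument order
        total += TYPE_SIZE[typ]
    if total != arg_bytes:
        raise ValueError("argument sizes do not match the callee's signature")
    return args

exp_stack = []
exp_stack.append(("[bp-6]:[bp-8]", "long"))  # 24 push [bp-6]:[bp-8]
exp_stack.append(("si * 5", "int"))          # 28 push (si * 5)
exp_stack.append(("66", "int_const"))        # 30 push 66
args = collect_args(exp_stack, 8)            # printf takes 8 bytes of arguments here
print(f"31 call printf ({', '.join(args)})")
# prints: 31 call printf (66, si * 5, [bp-6]:[bp-8])
```

For a subroutine using a left-to-right (Pascal-style) convention, the popped list would simply be reversed before being placed on the actual argument list.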

Register arguments are not pushed on the stack; instead, they have been recorded in the use set of the subroutine that uses them. In this case, placing the actual arguments in the subroutine's actual argument list is an extension of the register copy propagation algorithm. Consider the following code from basic blocks B2 and B3, Figure 5-7, after dead-register elimination and register argument detection:

```
19 cl = 4 ; du(cl) = {21}
20 dx:ax = [bp-6]:[bp-8] ; du(dx:ax) = {21}
21 dx:ax = call _aNlshl ; ud(dx:ax) = {20} ud(cl) = {19}
```

Instruction 21 uses registers dx:ax, defined in instruction 20, and register cl, defined in instruction 19. These uses are replaced with uses of the rhs of the corresponding instructions, and placed on the actual argument list of _aNlshl in the order defined by the formal argument list, leading to the following code:

\[ 21 \ \text{dx:ax = call \_aNlshl ([bp-6]:[bp-8], 4)} \]

Instructions 19 and 20 are eliminated since they now define dead registers.

5.4.8 Data Type Propagation Across Procedure Calls

During the instantiation of actual arguments to formal arguments, the data types of each actual and formal argument pair need to be verified; if they differ, one of the two data types needs to be modified. Consider the following code from basic block B4, Figure 5-7, after all previous optimizations:

\[ 31 \ \text{call printf (66, si * 5, [bp-6]:[bp-8])} \]

where the actual argument list has the following data types: integer constant, integer, and long variable. The formal argument list of printf has a pointer to a character string as the first argument, followed by a variable number of arguments of unknown data type. Since there is information on the first argument only, only the first actual argument is checked, and its data type is found to differ from the formal one. Given that the data types used by library subroutines must be right (i.e. they are trusted), it is safe to conclude that the actual integer constant must be an offset into memory pointing to a character string. By checking memory, a string is found at location DS:0066; thus, the integer constant is replaced by the string itself. The next two arguments have unknown formal types, so the types given by the caller are trusted, leading to the following code:

\[ 31 \ \text{call printf ("c * 5 = \%d, a = \%ld\textbackslash n", si * 5, [bp-6]:[bp-8])} \]

Other cases of type propagation include the conversion of two integers into one long variable (i.e. the callee has determined that one of the arguments is a long variable, but the caller has so far used the actual argument as two separate integers).
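The trusted-formal-type rule for the first printf argument can be sketched as below. The data segment is a hypothetical byte image with the format string placed at offset 66 (the constant is read as a decimal offset purely for illustration), and the helper names are illustrative, not part of any real decompiler API.

```python
def read_c_string(data_segment, offset):
    """Read a NUL-terminated string starting at offset in the data segment image."""
    end = data_segment.index(b"\0", offset)
    return data_segment[offset:end].decode("ascii")

def c_literal(s):
    """Render a Python string as a C string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'

def propagate_arg_types(actuals, formals, data_segment):
    """actuals: (expression, type) pairs; formals: known formal types of the callee,
    which may be shorter than actuals (e.g. printf's variable argument list)."""
    result = []
    for k, (expr, typ) in enumerate(actuals):
        formal = formals[k] if k < len(formals) else None   # None: unknown formal type
        if formal == "char *" and typ == "int_const":
            # the trusted library type says this constant is an offset to a string
            text = read_c_string(data_segment, int(expr))
            result.append((c_literal(text), "char *"))
        else:
            result.append((expr, typ))      # unknown formal type: trust the caller
    return result

# Hypothetical data segment image with the format string at offset 66.
ds = bytes(66) + b"c * 5 = %d, a = %ld\n\0"
actuals = [("66", "int_const"), ("si * 5", "int"), ("[bp-6]:[bp-8]", "long")]
typed = propagate_arg_types(actuals, ["char *"], ds)
print("31 call printf (" + ", ".join(e for e, _ in typed) + ")")
```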

5.4.9 Register Variable Elimination

Register variables translate to local variables in a high-level language program. These registers are replaced by new local variable names. This name replacement can be done during data flow analysis, or by the code generator. In our example, if registers si and di are replaced by the local names loc1 and loc2, the following code fragment will be derived for part of basic block B1, Figure 5-7:

\[
\begin{array}{l}
1 \ \text{loc1} = 20 \\
2 \ \text{loc2} = 80 \\
10 \ \text{loc1} = (\text{loc1} / \text{loc2}) \ast 3
\end{array}
\]

### 5.4.10 An Extended Register Copy Propagation Algorithm

The optimizations of register copy propagation, actual parameter detection, and data type propagation across procedure calls can be performed in the single pass that propagates register information to other instructions, including arguments. Figure 5-17 lists the different high-level instructions that define and use registers. Only three instructions can define registers: an `asgn`, which is eliminated via register copy propagation as explained in Section 5.2.6; a function `call`, which is translated into an equivalent `asgn` instruction and eliminated by the register copy propagation method; and a `pop` instruction, which has not been addressed yet.

<table>
<thead>
<tr>
<th>Define</th>
<th>Use</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>asgn (lhs)</code></td>
<td><code>asgn (rhs)</code></td>
</tr>
<tr>
<td><code>call (function)</code></td>
<td><code>call (register arguments)</code></td>
</tr>
<tr>
<td><code>pop</code></td>
<td><code>jcond</code></td>
</tr>
<tr>
<td></td>
<td><code>ret (function return registers)</code></td>
</tr>
<tr>
<td></td>
<td><code>push</code></td>
</tr>
</tbody>
</table>

Figure 5-17: Potential High-Level Instructions that Define and Use Registers

A `pop` instruction defines the associated register with whatever value is found on the top of the stack. Given that the `pop` instructions used to restore the stack after a subroutine call or during subroutine return have already been eliminated from the intermediate code during idiom analysis (see Chapter 4, Sections 4.2.1 and 4.2.1), the only remaining use of a `pop` instruction is to retrieve the last value pushed onto the stack by a previous `push` instruction (i.e. a spilled value). Since the expressions associated with `push` instructions are pushed onto an expression stack for the detection of actual arguments (see Section 5.4.7), whenever a `pop` instruction is reached, the expression on the top of the stack is associated with the register of the `pop` instruction, converting the instruction into an `asgn` instruction. Consider the following code from a matrix addition procedure that spills the partially computed answer onto the stack at instructions 27 and 38, after dead-register elimination. In this example, three arrays have been passed as arguments to the procedure: the arrays pointed to by `bp+4` and `bp+6` are the two array operands, and the array pointed to by `bp+8` is the resultant array. All three are arrays of integers (i.e. each element takes 2 bytes):

```plaintext
18  ax = si ; du(ax) = {20}
19  dx = 14h ; du(dx) = {20}
20  ax = ax * dx ; du(ax) = {21}
21  bx = ax ; du(bx) = {22}
22  bx = bx + [bp+4] ; du(bx) = {25}
23  ax = di ; du(ax) = {24}
24  ax = ax << 1 ; du(ax) = {25}
25  bx = bx + ax ; du(bx) = {26}
26  ax = [bx] ; du(ax) = {27}
27  push ax ; spill ax
28  ax = si ; du(ax) = {30}
29  dx = 14h ; du(dx) = {30}
```
After register copy propagation on instructions 18 → 27, instruction 27 holds the contents of the array pointed to by bp+4 offset by si and di (row and column offsets), represented by the following expression:

\[
27 \quad \text{push } [(si*20) + [bp+4] + (di*2)]
\]

This expression is pushed onto the expression stack, and register ax is redefined in the next instruction. Following extended register copy propagation, instruction 36 pops the expression from the stack and is modified into the following asgn instruction:

\[
36 \quad \text{ax} = [(si*20) + [bp+4] + (di*2)] ; \text{du(ax)} = \{37\}
\]

This expression is propagated into instruction 37, and register ax is spilled at instruction 38, holding the sum of the contents of the two arrays at offsets si and di, represented by the following expression:

\[
38 \quad \text{push } [(si*20) + [bp+4] + (di*2)] + [(si*20) + [bp+6] + (di*2)]
\]

Finally, this expression is popped in instruction 47, replacing the pop by the following asgn instruction:

\[
47 \quad \text{ax} = [(si*20) + [bp+4] + (di*2)] + [(si*20) + [bp+6] + (di*2)]
\]

and register bx holds the address of the element of the result array at offsets si and di. The registers in instruction 48 are replaced by the expressions calculated in instructions 46 and 47, leading to the following code:

\[
48 \quad [(si*20) + [bp+8] + (di*2)] = [(si*20) + [bp+4] + (di*2)] + [(si*20) + [bp+6] + (di*2)]
\]

Note that this instruction does not *define* any registers; it only uses them. Therefore, this instruction is final in the sense that it cannot be propagated into any subsequent instruction. As seen, both the rhs and the lhs hold expressions that calculate the address of an array element. These expressions can be further analyzed to determine that they calculate an array offset, and thus that the arguments passed to this subroutine are pointers to arrays; this information can then be propagated to the caller subroutine.

Figure 5-18 is a description of the final algorithm used for extended register copy propagation.
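The push/pop handling described above can be sketched on string expressions. This is a simplification, not Figure 5-18 itself: it assumes a single block, treats `ax`, `bx`, `cx` and `dx` as the only temporaries, and substitutes registers textually instead of walking ud-chains. The listing below is a hypothetical condensation of the matrix addition code, since instructions 30 to 47 are not shown in full.

```python
import re

# Assumption: ax, bx, cx and dx are the temporaries to be propagated away.
TEMP = re.compile(r"\b(ax|bx|cx|dx)\b")

def expand(expr, env):
    """Replace each temporary register in expr by the expression it currently holds."""
    return TEMP.sub(lambda m: env[m.group(1)], expr)

def extended_prop(instrs):
    """instrs: ("asgn", lhs, rhs) | ("push", reg) | ("pop", reg), in program order."""
    env, exp_stack, final = {}, [], []
    for op, target, *rest in instrs:
        if op == "push":                 # spill: save the expression, not the register
            exp_stack.append(expand(target, env))
        elif op == "pop":                # pop r  ==>  asgn r = <top of expression stack>
            env[target] = exp_stack.pop()
        elif TEMP.fullmatch(target):     # asgn to a temporary: keep propagating
            env[target] = expand(rest[0], env)
        else:                            # asgn to memory defines no register: final
            final.append(f"{expand(target, env)} = {expand(rest[0], env)}")
    return final

A = "[(si*20) + [bp+4] + (di*2)]"
B = "[(si*20) + [bp+6] + (di*2)]"
code = [
    ("asgn", "ax", A),                             # ax after propagating 18-26
    ("push", "ax"),                                # 27: spill ax
    ("asgn", "ax", "si"),                          # 28: ax is reused
    ("pop", "ax"),                                 # 36
    ("asgn", "ax", "ax + " + B),                   # 37 (after propagation)
    ("push", "ax"),                                # 38: spill ax
    ("asgn", "bx", "(si*20) + [bp+8] + (di*2)"),   # bx = address in the result array
    ("pop", "ax"),                                 # 47
    ("asgn", "[bx]", "ax"),                        # 48
]
res = extended_prop(code)
for line in res:
    print("48", line)
```

Because each `pop` takes whatever expression is on top of the stack, the sketch relies on pushes and pops being properly nested, which holds for spills generated by a compiler's register allocator.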

### 5.5 Further Data Type Propagation

Further data type determination can be done once all program expressions have been found, since data types such as arrays use address computation to reference an object in the array. This address computation is represented by an expression that needs to be simplified in order to arrive at a high-level language expression. Consider the array expression of Section 5.4.10:

\[
48 \quad \text{[(si*20) + [bp+8] + (di*2)]} = \text{[(si*20) + [bp+4] + (di*2)]} + \text{[(si*20) + [bp+6] + (di*2)]}
\]

A heuristic method can be used to determine that the integer pointer at bp+8 is a 2-dimensional array, given that two offset expressions are used to compute an address. The offset di*2 adjusts the index di by the size of the array element type (2 bytes for an integer), and the offset si*20 adjusts the index si by the size of a row, i.e. the number of elements in a row times the size of an element (\(20 / 2 = 10\) elements in a row, which is the number of columns in the array); therefore, the expression can be modified to the following code:

\[
48 \quad \text{[bp+8][si][di]} = \text{[bp+4][si][di]} + \text{[bp+6][si][di]}
\]

and the types of the arguments are modified to array (i.e. a pointer to an integer array). In order to determine the bounds of the array, more heuristic intervention is needed. The number of elements in one row was determined by the previous heuristic; the number of rows can be determined if the array is indexed within a loop or any other structure that gives information on the number of rows. Consider the matrix addition subroutine in Figure 5-19.
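The stride heuristic can be sketched as follows, assuming a row-major layout and that the address expression has already been split into a base operand and `(index, scale)` terms; `recover_subscripts` is a hypothetical helper. The largest stride is taken as the outermost subscript, and each ratio of consecutive strides gives the size of the next dimension.

```python
def recover_subscripts(base, terms, elem_size):
    """base: pointer operand such as "[bp+8]"; terms: (index, scale) pairs taken
    from an address expression such as (si*20) + [bp+8] + (di*2).
    Returns the subscripted form and the sizes of every dimension except the
    outermost one (row-major layout assumed)."""
    terms = sorted(terms, key=lambda t: t[1], reverse=True)   # largest stride = outermost
    if terms[-1][1] != elem_size:
        raise ValueError("innermost stride is not the element size")
    dims = []
    for (_, outer), (_, inner) in zip(terms, terms[1:]):
        if outer % inner:
            raise ValueError("strides are not multiples of each other")
        dims.append(outer // inner)
    subscripts = "".join(f"[{index}]" for index, _ in terms)
    return base + subscripts, dims

print(recover_subscripts("[bp+8]", [("si", 20), ("di", 2)], 2))
# prints: ('[bp+8][si][di]', [10])
```

As the text notes, this only yields the column count; the number of rows is not encoded in the address expression and must come from elsewhere.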

This subroutine has two loops, one for the rows and one for the columns. By checking all conditional jumps for references to the index si, the upper bound on the number of rows can be determined. In basic block B2, si is compared against 5; if si is greater than or equal to 5, the loop is not executed (i.e. the array is not indexed into); therefore, we can assume that 5 is the upper bound on the number of rows. The number of columns can also be checked by finding conditional jump instructions that use register di. In this case, basic block B5 compares this register against 10; if the register is greater than or equal to this constant, the inner loop is not executed (i.e. the array is not indexed into). Therefore, this constant can be used as the upper bound on the number of columns. Note that this number is the same as the one already obtained by the array address computation heuristic; therefore, we assume the number is right. This leads to the following formal argument declaration:
```plaintext
procedure ExtRegCopyProp (p: subroutineRecord)
/* Pre: dead-register analysis has been performed.
 *      dead-condition code analysis has been performed.
 *      register arguments have been detected.
 *      function return registers have been detected.
 * Post: temporary registers are removed from the intermediate code. */
initExpStk().
for (all basic blocks b of subroutine p in postorder) do
  for (all instructions j in b) do
    for (all registers r used by instruction j) do
      if (ud(r) = {i}) then    /* uniquely defined at instruction i */
        case (opcode(i))
          asgn: if (rhsClear (i, j)) then
                  case (opcode(j))
                    asgn: propagate (r, rhs(i), rhs(j)).
                    jcond, push, ret: propagate (r, rhs(i), exp(j)).
                    call: newRegArg (rhs(i), actArgList(j)).
                  end case
                end if
          pop:  exp = popExpStk().
                case (opcode(j))
                  asgn: propagate (r, exp, rhs(j)).
                  jcond, push, ret: propagate (r, exp, exp(j)).
                  call: newRegArg (exp, actArgList(j)).
                end case
          call: case (opcode(j))
                  asgn: rhs(j) = i.
                  push, ret, jcond: exp(j) = i.
                  call: newRegArg (i, actArgList(j)).
                end case
        end case
      end if
    end for
    if (opcode(j) == push) then
      pushExpStk (exp(j)).
    elsif (opcode(j) == call) and (invoked routine uses stack arguments) then
      pop arguments from the stack.
      place arguments on actual argument list.
      propagate argument type.
    end if
  end for
end for
end procedure
```

Figure 5-18: Extended Register Copy Propagation Algorithm
```plaintext
formal_arguments (arg1: array[5][10] = [bp+4],
                  arg2: array[5][10] = [bp+6],
                  arg3: array[5][10] = [bp+8])
```

and the information is propagated to the caller subroutine.
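The bound-detection heuristic can be sketched as a scan over loop guards. The guards are assumed to have been extracted already as `(register, relational operator, constant)` triples from the conditional jumps that skip the loops (basic blocks B2 and B5 here); `upper_bound` is a hypothetical helper, and a disagreement with the stride heuristic leaves the column bound unknown, as the text suggests.

```python
def upper_bound(guards, index):
    """guards: (register, relop, constant) triples from conditional jumps that leave
    a loop; returns an exclusive upper bound for index, or None if unknown."""
    for reg, relop, const in guards:
        if reg == index and relop == ">=":   # loop exits when index >= const
            return const
        if reg == index and relop == ">":    # loop exits when index > const
            return const + 1
    return None

guards = [("si", ">=", 5), ("di", ">=", 10)]   # from basic blocks B2 and B5
rows = upper_bound(guards, "si")
cols = upper_bound(guards, "di")
stride_cols = 20 // 2          # columns implied by the stride heuristic
if cols != stride_cols:
    cols = None                # heuristics disagree: leave the bound unknown (or ask the user)
for k, off in enumerate(["[bp+4]", "[bp+6]", "[bp+8]"], start=1):
    print(f"arg{k}: array[{rows}][{cols}] = {off}")
```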

It is in general hard to determine the bounds of an array if the code was optimised. For example, if strength reduction had been applied to the subscript calculation, or code motion had moved part of the subscript calculation out of the loop, or if induction variable elimination had replaced the loop indexes, then the previous heuristic method could not be applied. In this case, the decompiler would either leave the bounds of the array unknown, or ask the user for a solution via an interactive session.

Chapter 6

Control Flow Analysis

The control flow graph constructed by the front-end has no information on high-level language control structures, such as if..then..else and while() loops. Such a graph can be converted into a structured high-level language graph by means of a structuring algorithm. High-level control structures are detected in the graph, and subgraphs of control structures are tagged in the graph. The relation of this phase with the data flow analysis phase and the back-end is shown in Figure 6-1.

A generic set of high-level control structures is used to structure the graph. This set should be general enough to cater for different control structures available in commonly used languages such as C, Pascal, Modula-2, and Fortran. Such structures should include different types of loops and conditionals. Since the underlying structure of the graph is not modified, functional and semantical equivalence is preserved by this method.

6.1 Previous Work

Most structuring algorithms have concentrated on the removal of goto statements from control flow graphs at the expense of introducing new Boolean variables, replicating code, using multilevel exit loops, or using a set of high-level structures not available in commonly used languages. A graph transformation system has also been presented; it aims at recognizing the underlying control structures without removing all goto statements. The following sections summarize the work done in this area.

6.1.1 Introduction of Boolean Variables

Böhm and Jacopini [BJ66] proved that any program flowgraph can be represented by another flowgraph which is decomposable into $\pi$ (sequence of nodes), $\phi$ (post-tested loop), and $\Delta$ (2-way conditional node) with the introduction of new Boolean variables and assignments to these variables. Cooper [Coo67] pointed out that if new variables may be introduced to the original program, any program can be represented in one node with at most one $\phi$; therefore, from a practical point of view, the theorem is meaningless [Knu74].

Ashcroft and Manna [AM71] demonstrated that goto programs cannot be converted into while() programs without the introduction of new variables, and presented an algorithm for the conversion of these programs with the introduction of new Boolean variables. The conversion preserves the topology of the original flowchart program, but performs computations in different order.

Williams and Ossher [WO78] presented an iterative algorithm to convert a multiexit loop into a single exit loop, with the introduction of one Boolean variable and a counter integer variable for each loop.

Baker and Zweben [BZ80] reported on the structuring of multiexit loops with the introduction of new Boolean variables. The structuring of multiple exit loops is considered a control flow complexity issue, and is measured in this paper.

Williams and Chen [WG85] presented transformations to eliminate goto statements from Pascal programs. Gotos were classified according to the positioning of the target label: at the same level as the corresponding label, branch out of a structure, transferral of a label out of a structure, and abnormal exits from subroutines. All these transformations required the introduction of one or more Boolean variables, along with the necessary assignment and test statements to check on the value of a Boolean. The algorithm was implemented in Prolog on a PDP11/34.

Erosa and Hendren [EH93] present an algorithm to remove all goto statements from C programs. The method makes use of goto-elimination and goto-movement transformations, and introduces one new Boolean variable per goto. On average, three new instructions are introduced to test each new Boolean, and different loop and if conditionals are modified to include the new Boolean. This method was implemented as part of the McCAT parallelizing compiler.

The introduction of new (Boolean) variables modifies the semantics of the underlying program, as these variables do not form part of the original program. The resultant program is functionally equivalent to the original program, thus it produces the same results.

6.1.2 Code Replication

Knuth and Floyd [KF71] presented different methods to avoid the use of goto statements without the introduction of new variables. Four methods were given: the introduction of recursion, the introduction of new procedures, node splitting, and the use of a repeat..until() construct. The use of the node splitting technique replicates code in the final program. It was also proved that there exist programs whose goto statements cannot be eliminated without the introduction of new procedure calls.

Williams [Wil77] presents five subgraphs which lead to unstructured graphs: abnormal selection paths, multiple exit loops, multiple entry loops, overlapping loops, and parallel loops. In order to transform these subgraphs into structured graphs, code duplication is performed.
Williams and Ossher [WO78] presented an algorithm to replace multiple entry loops by single entry \texttt{while()} loops. The method made use of code duplication of all nodes that could be reached from abnormal entries into the loop.

Baker and Zweben [BZ80] reported on the use of the node splitting technique to generate executionally equivalent flowgraphs by replicating one or more nodes of the graph. Node splitting was considered a control flow complexity issue, and was measured.

Oulsnam [Oul82] presented transformations to convert six types of unstructured graphs to structured equivalent graphs. The methodology made use of node duplication, but no function duplication. It was demonstrated that the time overhead produced by the duplication of nodes was an increased time factor of 3 for at least one path.

\emph{Code replication modifies the original program/graph by replicating code/node one or more times, therefore, the final program/graph is functionally equivalent to the original program/graph, but its semantics and structure have been modified.}

6.1.3 Multilevel Exit Loops and Other Structures

Baker [Bak77] presented an algorithm to structure flowgraphs into equivalent flowgraphs that made use of the following control structures: \texttt{if..then..else}, multilevel \texttt{break}, multilevel \texttt{next}, and endless loops. \texttt{Gotos} were used whenever the graph could not be structured using these structures. The algorithm was extended to irreducible graphs as well. It was demonstrated that the algorithm generated well-formed and properly nested programs, and that any \texttt{goto} statements in the final graph jumped forward. This algorithm was implemented in the \texttt{struct} program on a PDP11/54 running under Unix. It was used to rewrite Fortran programs into Ratfor, an extended Fortran language that made use of control structures. The \texttt{struct} program was later used by J. Reuter in the \texttt{decomp} decompiler to structure graphs built from object files with symbol information.

Sharir [Sha80] presented an algorithm to find the underlying control structures in a flow graph. This algorithm detected normal conditional and looping constructs, but also detected proper and improper strongly-connected intervals, and proper and improper outermost intervals. The final flow graph was represented by a hierarchical flow structure.

Ramshaw [Ram88] presented a method to eliminate all \texttt{goto} statements from programs by means of forward and backward elimination rules. The resultant program was a structurally equivalent program that made use of multilevel exits from endless-type, named loops. This algorithm was used to port the Pascal version of Knuth's \texttt{TeX} program to the PARC/CSL environment, which uses Mesa. Both these languages allow the use of \texttt{goto} statements, but outward \texttt{gotos} are not allowed in Mesa.

\emph{The use of multilevel exits or high-level constructs not available in most languages restricts the generality of the structuring method and the number of languages in which the structured version of the program can be written. Currently, most 3rd generation languages (e.g. Pascal, Modula-2, C) do not make use of multilevel exits; only Ada allows them.}

6.1.4 Graph Transformation System

Lichtblau [Lic85] presented a series of transformation rules to transform a control flow graph into a trivial graph by identifying subgraphs that represent high-level control structures, such as 2-way conditionals, sequences, loops, and multiexit loops. Whenever no rules were applicable to the graph, an edge was removed from the graph and a goto was generated in its place. This transformation system was proved to be finite Church-Rosser; thus, the transformations can be applied in any order and the same final answer is reached.

Lichtblau formalized the transformation system by introducing context-free flowgraph grammars, which are context-free grammars defined by production rules that transform one graph into another [Lic91]. He proved that given a rooted context-free flowgraph grammar $GG$, it is possible to determine whether a flowgraph $g$ can be derived from $GG$, and he provided an algorithm that solves this problem in polynomial time.

The detection of control structures by means of graph transformations does not modify the semantics or functionality of the underlying program; thus, a transformation system provides a method to generate a semantically equivalent graph. Lichtblau's method applies a series of graph transformations to convert the graph into an equivalent structured graph (if possible). These transformations do not take into account graphs generated by languages with short-circuit evaluation, where not all operands of a compound Boolean condition are necessarily evaluated; such graphs are therefore considered unstructured under this methodology.

In contrast, the structuring algorithms presented in this thesis transform an arbitrary control flow graph into a functionally and semantically equivalent flow graph that is structured under a set of generic control structures available in most commonly used high-level languages, and that makes use of goto jumps whenever the graph cannot be structured with the generic structures. These algorithms take into account graphs generated by short-circuit evaluation, and thus do not generate unnecessary goto jumps for these graphs.

6.2 Graph Structuring

The structuring of a sample control flow graph is presented in an informal way. The algorithms used to structure graphs are explained in Section 6.6. The control flow graph of Figure 6-2 is a sample program that contains several control structures. The intermediate code has been analyzed by the data flow analysis phase, and all variables have been given names.

The aim of a structuring algorithm for decompilation is to determine all underlying control structures of a control flow graph, based upon a predetermined set of high-level control structures. If the graph cannot be structured with the predefined set of structures, goto jumps are used. These conditions ensure functional and semantical equivalence between the original and final graph.

In graphs, loops are detected by the presence of a back-edge; that is, an edge from a "lower" node to a "higher" node. The notions of lower and higher are not yet formally defined, but can be thought of as the nodes that are lower and higher up in the diagram (for a graph that is drawn starting at the top). In the graph of Figure 6-2 there are 2 back-edges: (B14,B13) and (B15,B6). These back-edges represent the extent of 2 different loops.

The type of the loop is detected by checking the header and the last node of the loop. The loop (B14,B13) has no conditional check on its header node, but the last node of the loop tests whether the loop should be executed again or not; thus, this is a post-tested loop, such as a do..while() in C, or a repeat..until() in Modula-2. The subgraph that represents this loop can be logically transformed into the subgraph of Figure 6-3, where the loop subgraph was replaced by one node that holds all the intermediate code instructions, as well as information on the type of loop.
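Since "lower" and "higher" are only defined informally at this point, the sketch below swaps in a common substitute: an edge is treated as a back-edge when its target is still on the depth-first search stack (for reducible graphs this matches the usual dominator-based definition). The loop classification by header and last (latch) node is a simplification of the rule described in the text, and the graph is hypothetical, not Figure 6-2.

```python
def back_edges(succ, entry):
    """Edges (u, v) whose target v is still on the DFS stack when (u, v) is explored."""
    visited, on_stack, found = set(), set(), []

    def dfs(u):
        visited.add(u)
        on_stack.add(u)
        for v in succ.get(u, []):
            if v in on_stack:
                found.append((u, v))       # edge back to an ancestor: closes a loop
            elif v not in visited:
                dfs(v)
        on_stack.discard(u)

    dfs(entry)
    return found

def loop_kind(succ, latch, header):
    """Simplified classification by the header and the last (latch) node of the loop."""
    if len(succ[latch]) == 2:
        return "post-tested"               # e.g. do..while() / repeat..until()
    if len(succ[header]) == 2:
        return "pre-tested"                # e.g. while()
    return "endless"

# Hypothetical graph, not Figure 6-2.
succ = {"B1": ["B2"], "B2": ["B3"], "B3": ["B2", "B4"], "B4": ["B5", "B7"],
        "B5": ["B6"], "B6": ["B4"], "B7": []}
for latch, header in back_edges(succ, "B1"):
    print((latch, header), loop_kind(succ, latch, header))
```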

The loop (B15,B6) has a conditional header node that determines whether the loop is executed or not. The last node of this loop is a 1-way node that transfers control back

6.2.2 Structuring Conditionals

The 2-way conditional node B2 branches control to node B4 if the condition \((\text{loc3} * 2) \leq \text{loc4}\) is true; otherwise it branches to node B3. Both these nodes are followed by node B5; in other words, the conditional branch that starts at node B2 finishes at node B5. This subgraph is clearly an \texttt{if()then..else} structure, and can be logically transformed into the subgraph of Figure 6-5, where a single node represents basic blocks B2, B3, and B4. Note that all instructions before the conditional jump that belong to the same basic block are not modified.

The 2-way conditional node B1 transfers control to node B5 if the condition \(\text{loc3} \geq \text{loc4}\) is true, otherwise it transfers control to node B2. From our previous example, node B2 has been merged with nodes B3 and B4, and transformed into an equivalent node with an out-edge to node B5; thus, there is a path from node B2 \(\rightarrow\) B5. Since B5 is one of the target branch nodes of the conditional at node B1, and it is reached by the other branch of the conditional, this 2-way node represents a single branch conditional (i.e. an \texttt{if()then}). This subgraph can be transformed into the node of Figure 6-6, where the condition at node B1 has been negated since the false branch is the single branch that forms part of the \texttt{if}. 
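The two cases just walked through can be captured by a deliberately simplified rule set, sketched below; it is not the general structuring algorithm of Section 6.6. It assumes `succ[n]` lists the true target first, and returns the structure, its follow node, and whether the condition must be negated. The graphs are hypothetical reductions of the example.

```python
def classify_2way(succ, n):
    """succ[n] = [true_target, false_target]. Returns (structure, follow, negate)
    using the simplified rules of the walkthrough, or None if neither applies."""
    t, e = succ[n]
    if succ.get(t) == [e]:
        return ("if..then", e, False)        # then-clause is the true branch
    if succ.get(e) == [t]:
        return ("if..then", t, True)         # then-clause is the false branch: negate
    st, se = succ.get(t, []), succ.get(e, [])
    if len(st) == 1 and st == se:
        return ("if..then..else", st[0], False)   # both clauses rejoin at one node
    return None

g = {"B1": ["B5", "B2"], "B2": ["B4", "B3"], "B3": ["B5"], "B4": ["B5"], "B5": []}
print(classify_2way(g, "B2"))           # ('if..then..else', 'B5', False)
collapsed = {"B1": ["B5", "B2"], "B2": ["B5"], "B5": []}   # B2, B3, B4 merged into B2
print(classify_2way(collapsed, "B1"))   # ('if..then', 'B5', True)
```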
The 2-way conditional nodes B7 and B8 are not trivially structured. If node B8 is considered the head of an `if..then..else` finishing at node B10, and node B7 is considered the head of an `if..then`, the subgraph headed by B8 is not entered at its entry point, but in one of the clauses of the conditional branch. If node B7 is structured first as an `if..then`, then node B8 branches out of the subgraph headed at B7 through a node other than the exit node B9; thus, the graph cannot be structured with the `if..then` and `if..then..else` structures alone. However, since both B7 and B8 contain only a conditional branch instruction, these two conditions can be merged into a compound conditional in the following way: node B9 is reached whenever the condition at node B7 is true, or when the condition at B7 is false and the condition at B8 is false as well. Node B10 is reached whenever the condition at node B7 is false and the one at B8 is true, or by a path from node B9. This means that node B9 is reached whenever the condition at node B7 is true or the condition at node B8 is false, and the final end node is basic block B10. The final compound condition is shown in Figure 6-7, along with the transformed subgraph.
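The merge rests on the Boolean identity c7 ∨ (¬c7 ∧ ¬c8) ≡ c7 ∨ ¬c8. A minimal Python check over all truth assignments makes this explicit. The names `c7` and `c8` are hypothetical Booleans standing for the conditions at nodes B7 and B8.

```python
from itertools import product


def reaches_b9(c7: bool, c8: bool) -> bool:
    """Path condition read off the graph: B7 true, or B7 false and B8 false."""
    return c7 or (not c7 and not c8)


def compound(c7: bool, c8: bool) -> bool:
    """Simplified compound condition guarding B9: c7 || !c8."""
    return c7 or not c8


# Exhaustively compare both forms over all four truth assignments.
for c7, c8 in product([False, True], repeat=2):
    assert reaches_b9(c7, c8) == compound(c7, c8)

# B10 is reached directly only when B7 is false and B8 is true.
print([(c7, c8) for c7, c8 in product([False, True], repeat=2) if not compound(c7, c8)])
# [(False, True)]
```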
6.3 Control Flow Analysis

Information on the control structures of a program is available through control flow analysis of the program’s graph. For each node of the graph, this analysis collects information on whether the node belongs to a loop and/or a conditional, or is not part of any structure. This section defines control flow terminology available in the literature; for more information, refer to [All72, Tar72, Tar74, HU75, Hec77, ASU86b].

6.3.1 Control Flow Analysis Definitions

The following definitions introduce basic concepts used in control flow analysis. These definitions make use of a directed graph $G = (N, E, h)$, where $h$ is the header (entry) node of the graph.

**Definition 44** A path from $n_1$ to $n_v$; $n_1, n_v \in N$, represented $n_1 \rightarrow n_v$, is a sequence of edges $(n_1, n_2), (n_2, n_3), \ldots, (n_{v-1}, n_v)$ such that $(n_i, n_{i+1}) \in E, \forall 1 \leq i < v, v \geq 1$.

**Definition 45** A closed path or cycle is a path $n_1 \rightarrow n_v$ where $n_1 = n_v$.

**Definition 46** The successors of $n_i \in N$ are $\{n_j \in N \mid n_i \rightarrow n_j\}$ (i.e. all nodes reachable from $n_i$).

The immediate successors of $n_i \in N$ are $\{n_j \in N \mid (n_i, n_j) \in E\}$.

**Definition 47** The predecessors of $n_j \in N$ are $\{n_i \in N \mid n_i \rightarrow n_j\}$ (i.e. all nodes that reach $n_j$).

The immediate predecessors of $n_j \in N$ are $\{n_i \in N \mid (n_i, n_j) \in E\}$.
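The four sets of Definitions 46 and 47 can be computed by a simple graph search. The following is a minimal Python sketch that assumes a hypothetical adjacency-list representation (a dict from each node to its immediate successors). It counts only paths of at least one edge, so a node is its own successor only if it lies on a cycle. Under Definition 44 with $v = 1$, one would also include the node itself.

```python
from collections import deque

# Hypothetical adjacency-list representation: node -> immediate successors.
Graph = dict[int, list[int]]


def immed_preds(g: Graph) -> dict[int, set[int]]:
    """Immediate predecessors of every node: {n_i | (n_i, n_j) in E}."""
    preds: dict[int, set[int]] = {n: set() for n in g}
    for n, succs in g.items():
        for s in succs:
            preds[s].add(n)
    return preds


def successors(g: Graph, n: int) -> set[int]:
    """All nodes reachable from n along a path of at least one edge."""
    seen: set[int] = set()
    work = deque(g[n])
    while work:
        m = work.popleft()
        if m not in seen:
            seen.add(m)
            work.extend(g[m])
    return seen


def predecessors(g: Graph, n: int) -> set[int]:
    """All nodes that reach n: the successors of n in the reversed graph."""
    reverse = {m: sorted(ps) for m, ps in immed_preds(g).items()}
    return successors(reverse, n)


g = {1: [2], 2: [3], 3: [2, 4], 4: []}
print(successors(g, 2), predecessors(g, 4))  # {2, 3, 4} {1, 2, 3}
```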

**Definition 48** A node $n_i \in N$ back dominates or predominates a node $n_k \in N$ if $n_i$ is on every path $h \rightarrow n_k$. It is said that $n_i$ dominates $n_k$.

**Definition 49** A node $n_i \in N$ immediately back dominates $n_k \in N$ if $n_i \neq n_k$, $n_i$ back dominates $n_k$, and $\nexists n_j \in N \setminus \{n_i, n_k\} \bullet n_j$ back dominates $n_k \land n_i$ back dominates $n_j$ (i.e. $n_i$ is the closest back dominator to $n_k$). It is said that $n_i$ is the immediate dominator of $n_k$.
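Definitions 48 and 49 can be computed with the classic iterative data-flow formulation $dom(n) = \{n\} \cup \bigcap_{p \in immedPred(n)} dom(p)$, with $dom(h) = \{h\}$. This is a standard technique, not necessarily the one used in this thesis. A minimal Python sketch follows, assuming every node is reachable from $h$. The immediate dominator is taken as the strict dominator whose own dominator set equals the strict dominator set of the node, since strict dominators form a chain.

```python
Graph = dict[int, list[int]]


def dominators(g: Graph, h: int) -> dict[int, set[int]]:
    """Back dominators of each node (Definition 48); assumes all nodes reachable from h."""
    preds: dict[int, list[int]] = {n: [] for n in g}
    for n, succs in g.items():
        for s in succs:
            preds[s].append(n)
    dom = {n: set(g) for n in g}   # start from "every node dominates n"
    dom[h] = {h}
    changed = True
    while changed:
        changed = False
        for n in g:
            if n == h:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom: dict[int, set[int]], h: int) -> dict[int, int]:
    """Closest strict back dominator of each node other than h (Definition 49)."""
    idom = {}
    for n, ds in dom.items():
        if n == h:
            continue
        strict = ds - {n}
        # The closest strict dominator is dominated by all the other ones.
        for d in strict:
            if dom[d] == strict:
                idom[n] = d
                break
    return idom


g = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(immediate_dominators(dominators(g, 1), 1))  # {2: 1, 3: 1, 4: 1}
```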

**Definition 50** A strongly connected region (SCR) is a subgraph $S = (N_S, E_S, h_S)$ such that $\forall n_i, n_j \in N_S \bullet \exists n_i \rightarrow n_j \land n_j \rightarrow n_i$. 
Definition 51 A strongly connected component of $G$ is a subgraph $S = (N_S, E_S, h_S)$ such that

- $S$ is a strongly connected region.
- $\not\exists S_2$ strongly connected region of $G \cdot S \subset S_2$.
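The strongly connected components of Definition 51 can be found in linear time with Tarjan's well-known depth-first algorithm, sketched below in Python. The graph representation is an assumption. Single nodes that lie on no cycle are returned as trivial components.

```python
Graph = dict[int, list[int]]


def strongly_connected_components(g: Graph) -> list[set[int]]:
    """Tarjan's algorithm: the maximal strongly connected regions of g."""
    index: dict[int, int] = {}
    low: dict[int, int] = {}
    stack: list[int] = []
    on_stack: set[int] = set()
    components: list[set[int]] = []
    counter = 0

    def visit(v: int) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in g[v]:
            if w not in index:          # tree edge: recurse
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:         # edge back into the current component
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in g:
        if v not in index:
            visit(v)
    return components


g = {1: [2], 2: [3], 3: [2, 4], 4: []}
print(strongly_connected_components(g))  # [{4}, {2, 3}, {1}]
```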

Definition 52 Depth first search (DFS) is a traversal method that selects edges to traverse emanating from the most recently visited node which still has unvisited edges.

A DFS algorithm defines a partial ordering of the nodes of $G$. The reverse postorder is the numbering of nodes during their last visit; the numbering starts at the number of nodes in the graph, $|N|$, and finishes at 1. Throughout this chapter, all numbered graphs use the reverse postorder numbering scheme.

Definition 53 A depth first spanning tree (DFST) of a flow graph $G$ is a directed, rooted, ordered spanning tree of $G$ grown by a DFS algorithm. A DFST $T$ can partition the edges in $G$ into three sets:

1. Back edges = $\{(v, w) : w \rightarrow v \in T\}$.
2. Forward edges = $\{(v, w) : v \rightarrow w \in T\}$.
3. Cross edges = $\{(v, w) : \not\exists (v \rightarrow w \text{ or } w \rightarrow v) \text{ and } w \leq v \text{ in preorder}\}$.
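The reverse postorder numbering and the edge partition of Definition 53 can both be read off one DFS. The Python sketch below records preorder and postorder times on a shared clock, so that $a$ is an ancestor of $b$ in the DFST exactly when $b$'s visit nests inside $a$'s. The representation and function names are assumptions, and every node is assumed reachable from $h$. As in the definition, tree edges are counted among the forward edges.

```python
Graph = dict[int, list[int]]


def dfs_numbering(g: Graph, h: int):
    """Preorder/postorder times and reverse postorder numbers from a DFS rooted at h.

    Assumes every node is reachable from h, so numbers run from |N| down to 1.
    """
    pre: dict[int, int] = {}
    post: dict[int, int] = {}
    rpo: dict[int, int] = {}
    clock = 0
    next_rpo = len(g)

    def visit(v: int) -> None:
        nonlocal clock, next_rpo
        pre[v] = clock
        clock += 1
        for w in g[v]:
            if w not in pre:
                visit(w)
        post[v] = clock          # last visit of v
        clock += 1
        rpo[v] = next_rpo
        next_rpo -= 1

    visit(h)
    return pre, post, rpo


def classify_edges(g: Graph, h: int) -> dict[tuple[int, int], str]:
    pre, post, _ = dfs_numbering(g, h)

    def is_ancestor(a: int, b: int) -> bool:
        # a ->* b in the DFST (a == b included): b's visit nests inside a's.
        return pre[a] <= pre[b] and post[b] <= post[a]

    kinds = {}
    for v, succs in g.items():
        for w in succs:
            if is_ancestor(w, v):
                kinds[(v, w)] = "back"
            elif is_ancestor(v, w):
                kinds[(v, w)] = "forward"   # includes tree edges
            else:
                kinds[(v, w)] = "cross"
    return kinds


g = {1: [2, 3], 2: [4], 3: [4], 4: [2]}
print(dfs_numbering(g, 1)[2])  # {4: 4, 2: 3, 3: 2, 1: 1}
print(classify_edges(g, 1))
```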

6.3.2 Relations

Definition 54 Let $R$ be a relation on a set $S$. Then $xRy$ denotes $(x, y) \in R$.

Definition 55 Let $R$ be a relation on a set $S$. Then

- the reflexive closure of $R$ is $R^\alpha = R \cup \{(x, x) | x \in S\}$
- the transitive closure of $R$ is $R^\beta = R^1 \cup R^2 \cup \ldots$, where $R^1 = R$ and $R^i = RR^{i-1}$ for $i \geq 2$
- the reflexive transitive closure of $R$ is $R^* = R^\alpha \cup R^\beta$
- the completion of $R$ is $\hat{R} = \{(x, y) \in S \times S | xR^*y \land \not\exists z \in S \cdot yRz\}$.

Definition 56 Let $R$ be a relation on a set $S$. Then $(S, R)$ is finite Church-Rosser (fcr) if and only if:

1. $R$ is finite, i.e. $\forall p \in S \cdot \exists k_p \cdot pR^i q \Rightarrow i \leq k_p$.
2. $\hat{R}$ is a function, i.e. $p\hat{R}q \land p\hat{R}r \Rightarrow q = r$.
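For a finite set $S$, Definitions 55 and 56 can be checked mechanically. The Python sketch below represents a relation as a set of pairs, which is an assumption. It relies on one observation: on a finite $S$, condition 1 holds exactly when $R$ has no cycle, i.e. no pair $(x, x)$ in $R^\beta$, because any chain longer than $|S|$ must repeat an element.

```python
Relation = set[tuple[int, int]]


def transitive_closure(r: Relation) -> Relation:
    """R^beta = R^1 ∪ R^2 ∪ ..., by composing with R until nothing new appears."""
    closure = set(r)
    while True:
        step = {(x, z) for (x, y) in closure for (y2, z) in r if y == y2}
        if step <= closure:
            return closure
        closure |= step


def reflexive_transitive_closure(r: Relation, s: set[int]) -> Relation:
    """R* = R^alpha ∪ R^beta."""
    return transitive_closure(r) | {(x, x) for x in s}


def completion(r: Relation, s: set[int]) -> Relation:
    """R-hat: pairs (x, y) with x R* y where y has no R-successor."""
    has_successor = {x for (x, _) in r}
    return {(x, y) for (x, y) in reflexive_transitive_closure(r, s) if y not in has_successor}


def is_fcr(r: Relation, s: set[int]) -> bool:
    # Condition 1 on a finite S: no cycle, i.e. no (x, x) in the transitive closure.
    if any(x == y for (x, y) in transitive_closure(r)):
        return False
    # Condition 2: R-hat is a function (at most one normal form per element).
    image: dict[int, int] = {}
    for x, y in completion(r, s):
        if image.setdefault(x, y) != y:
            return False
    return True


s = {1, 2, 3}
print(is_fcr({(1, 2), (2, 3)}, s), is_fcr({(1, 2), (1, 3)}, s))  # True False
```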

6.3.3 Interval Theory

An interval is a graph theoretic construct defined by J. Cocke in [Coc70], and widely used by F. Allen for control flow analysis [All70, AC72] and data flow analysis [All72, All74, AC76]. The following sections summarize interval theory concepts.
Intervals

Definition 57 Given a node $h$, an interval $I(h)$ is the maximal, single-entry subgraph in which $h$ is the only entry node and in which all closed paths contain $h$. The unique interval node $h$ is called the interval head or simply the header node.

By selecting the correct set of header nodes, $G$ can be partitioned into a unique set of disjoint intervals $\mathcal{I} = \{I(h_1), I(h_2), \ldots, I(h_n)\}$, for some $n \geq 1$. The algorithm to find the unique set of intervals of a graph is described in Figure 6-8. This algorithm makes use of the following variables: $H$ (set of header nodes), $I(i)$ (set of nodes of interval $i$), and $\mathcal{I}$ (list of intervals of the graph $G$), as well as the function $\text{immedPred}(n)$, which returns the set of immediate predecessors of $n$.

```
procedure intervals (G = (N, E, h))
/* Pre:  G is a graph.
 * Post: the intervals of G are contained in the list ℐ. */

ℐ := {}.
H := {h}.
for (all unprocessed n ∈ H) do
  I(n) := {n}.
  repeat
    I(n) := I(n) + {m ∈ N | m ∉ H ∧ ∀ p ∈ immedPred(m) • p ∈ I(n)}.
  until no more nodes can be added to I(n).
  H := H + {m ∈ N | m ∉ H ∧ m ∉ I(n) ∧ (∃ p ∈ immedPred(m) • p ∈ I(n))}.
  ℐ := ℐ + I(n).
end for
end procedure
```

Figure 6-8: Interval Algorithm
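The interval algorithm of Figure 6-8 translates almost line for line into Python. The following is a minimal sketch under stated assumptions: the graph is a dict of successor lists, every node is reachable from $h$, and headers are processed in the order they are found. Besides the conditions in the figure, this sketch also skips nodes that are already headers and nodes with no predecessors, so that a vacuous "all predecessors are in $I(n)$" test over an empty predecessor set cannot pull the entry node or a header into another interval.

```python
Graph = dict[int, list[int]]


def intervals(g: Graph, h: int) -> list[list[int]]:
    """Partition g into intervals; each list is in interval order and starts with its header."""
    preds: dict[int, set[int]] = {n: set() for n in g}
    for n, succs in g.items():
        for s in succs:
            preds[s].add(n)

    headers = [h]              # H, processed in the order headers are found
    header_set = {h}
    result: list[list[int]] = []
    for n in headers:          # the list may grow while we iterate over it
        interval, members = [n], {n}
        changed = True
        while changed:         # repeat ... until no more nodes can be added
            changed = False
            for m in g:
                # Add m when all of its immediate predecessors are already in I(n).
                if (m not in header_set and m not in members
                        and preds[m] and preds[m] <= members):
                    interval.append(m)
                    members.add(m)
                    changed = True
        for m in g:
            # New headers: outside H and I(n), with some predecessor in I(n).
            if m not in header_set and m not in members and preds[m] & members:
                headers.append(m)
                header_set.add(m)
        result.append(interval)
    return result


g = {1: [2], 2: [3, 4], 3: [4], 4: [2, 5], 5: []}
print(intervals(g, 1))  # [[1], [2, 3, 4, 5]]
```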

The example in Figure 6-9 shows a graph $G$ with its intervals in dotted boxes. This graph has two intervals, $I(1)$ and $I(2)$. Interval $I(2)$ contains a loop; the extent of this loop is given by the back-edge $(4, 2)$.

Definition 58 The interval order is defined as the order of nodes in an interval list, given by the intervals algorithm of Figure 6-8.

Some interval properties:

1. The header node back dominates each node in the interval.
2. Each strongly connected region in the interval must contain the header node.
3. The interval order is such that if all nodes are processed in the order given, then all interval predecessors of a node reachable along loop free paths from the header will have been processed before the given node.

Definition 59 A latching node is any node in the interval which has the header node as an immediate successor.

Derived Sequence Construction

The derived sequence of graphs, \( G^1 \ldots G^n \), was described by F. Allen [All70, All72] based on the intervals of graph \( G \). The construction of graphs is an iterative method that collapses intervals into nodes. \( G \) is the first order graph, represented \( G^1 \). The second order graph, \( G^2 \), is derived from \( G^1 \) by collapsing each interval in \( G^1 \) into a node. The immediate predecessors of the collapsed node are the immediate predecessors of the original header node which are not part of the interval. The immediate successors are all the immediate, non-interval successors of the original exit nodes. Intervals for \( G^2 \) are computed with the interval algorithm, and the graph construction process is repeated until a limit flow graph \( G^n \) is reached. \( G^n \) has the property of being a trivial graph (i.e. single node) or an irreducible graph. Figure 6-10 describes this algorithm.

Definition 60 The n-th order graph or limit flow graph, \( G^n \), of a graph \( G \) is defined as the graph \( G^{i-1} \), \( i \geq 2 \), constructed by the derivedSequence algorithm of Figure 6-10, such that \( G^{i-1} = G^i \).

Definition 61 A graph \( G \) is reducible if its n-th order graph \( G^n \) is trivial.

```
procedure derivedSequence (G = (N, E, h))
/* Pre:  G is a graph.
 * Post: the derived sequence of G, G^1 ... G^n, n ≥ 1, has been constructed. */

G^1 := G.
ℐ^1 := intervals(G^1).
i := 2.
repeat  /* construction of G^i */
  /* one node n^i per interval I^{i-1}(n^{i-1}) of G^{i-1} */
  N^i := {n^i | I^{i-1}(n^{i-1}) ∈ ℐ^{i-1}}.
  /* predecessors of n^i: the intervals containing the immediate
     predecessors of the header n^{i-1} that lie outside its interval */
  ∀ n^i ∈ N^i • p^i ∈ immedPred(n^i) ⟺ (∃ q ∈ immedPred(n^{i-1}) • q ∉ I^{i-1}(n^{i-1}) ∧ q ∈ I^{i-1}(p^{i-1})).
  /* an edge joins two collapsed nodes whenever an edge of G^{i-1} joins their intervals */
  (h^i_j, h^i_k) ∈ E^i ⟺ (∃ m ∈ I^{i-1}(h^{i-1}_j), n ∈ I^{i-1}(h^{i-1}_k) • (m, n) ∈ E^{i-1} ∧ j ≠ k).
  ℐ^i := intervals(G^i).
  i := i + 1.
until G^{i-1} == G^{i-2}.
end procedure
```

Figure 6-10: Derived Sequence Algorithm
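A runnable version of the derived sequence, together with the reducibility test of Definition 61, is sketched below in Python. It is a sketch under stated assumptions, not the thesis's implementation: collapsed nodes are named after their interval headers, every node is assumed reachable from $h$, and edges inside an interval (including back-edges to its own header) are dropped, keeping only non-interval successors as described in the text. The interval function is repeated so that the block is self-contained.

```python
Graph = dict[int, list[int]]


def intervals(g: Graph, h: int) -> list[list[int]]:
    """Interval partition of g (Figure 6-8); each interval list starts with its header."""
    preds = {n: {p for p in g if n in g[p]} for n in g}
    headers, header_set, result = [h], {h}, []
    for n in headers:
        members, order = {n}, [n]
        changed = True
        while changed:
            changed = False
            for m in g:
                if (m not in header_set and m not in members
                        and preds[m] and preds[m] <= members):
                    members.add(m)
                    order.append(m)
                    changed = True
        for m in g:
            if m not in header_set and m not in members and preds[m] & members:
                headers.append(m)
                header_set.add(m)
        result.append(order)
    return result


def same_graph(a: Graph, b: Graph) -> bool:
    return {k: set(v) for k, v in a.items()} == {k: set(v) for k, v in b.items()}


def derived_sequence(g: Graph, h: int) -> list[Graph]:
    """G^1 ... G^n: collapse every interval into a node named after its header."""
    seq = [g]
    while True:
        owner = {m: iv[0] for iv in intervals(g, h) for m in iv}
        collapsed: Graph = {hdr: [] for hdr in dict.fromkeys(owner.values())}
        for m, succs in g.items():
            for s in succs:
                a, b = owner[m], owner[s]
                # Keep only edges between different intervals.
                if a != b and b not in collapsed[a]:
                    collapsed[a].append(b)
        if same_graph(collapsed, g):   # G^i == G^{i-1}: limit flow graph reached
            return seq
        g = collapsed                  # h stays the header of the first interval
        seq.append(g)


def is_reducible(g: Graph, h: int) -> bool:
    """Definition 61: reducible iff the limit flow graph is trivial."""
    return len(derived_sequence(g, h)[-1]) == 1


g = {1: [2], 2: [3, 4], 3: [4], 4: [2, 5], 5: []}
print(len(derived_sequence(g, 1)), is_reducible(g, 1))     # 3 True
print(is_reducible({0: [1, 2], 1: [2], 2: [1]}, 0))        # False
```

The second call uses the shape of the canonical irreducible graph: every node forms its own interval, so the limit flow graph keeps three nodes.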

The construction of the derived sequence is illustrated in Figure 6-11. The graph \(G^1\) is the initial control flow graph \(G\). \(G^1\) has 2 intervals, previously described in Figure 6-9. Graph \(G^2\) represents the intervals of \(G^1\) as nodes. \(G^2\) has a loop in its unique interval; this loop represents the loop induced by the back-edge (5,1). Finally, \(G^3\) has no loops and is a trivial graph.

Figure 6-11: Derived Sequence of a Graph
Implementation Considerations

To compute the intervals of a graph $G$, $G$ needs to be represented in terms of the predecessors and successors of each node (i.e. an adjacency-type graph representation). With the aid of extra data structures, Hecht [Hec77] presented an optimized algorithm that finds intervals in $O(e)$ time, where $e = |E|$.

6.3.4 Irreducible Flow Graphs

An irreducible flow graph is a graph whose n-th order graph is not a trivial graph (by interval reduction). Irreducible flow graphs are characterized by the presence of a forbidden subgraph, the canonical irreducible graph [HU72, HU74, Hec77]. The absence of this subgraph from a flow graph is enough for the graph to be reducible. The canonical irreducible graph is shown in Figure 6-12.


Figure 6-12: Canonical Irreducible Graph

**Theorem 1** A flow graph is irreducible if and only if it has a subgraph of the form of the canonical irreducible graph.

6.4 High-Level Language Control Structures

Different high-level languages provide different control structures, but in general, no single high-level language provides all available control structures. This section illustrates different control structures, gives a classification, and analyses the structures available in commonly used high-level languages such as C, Pascal, and Modula-2.

6.4.1 Control Structures - Classification

Control structures have been classified into different classes according to the complexity of the class. An initial classification was provided by Kosaraju in [Kos74], and was used to determine which classes were reducible to which other classes. This classification was expanded by Ledgard and Marcotty in [LM75], and was used to present a hierarchy of classes of control structures under semantical reducibility.

Figure 6-13 shows all the different control structures that are under consideration in this classification; these structures are:

1. Action: a single basic block node is an action.

2. Composition: a sequence of 2 structures is a composition.

3. Conditional: a structure of the form \( \text{if } p \text{ then } s_1 \text{ else } s_2 \), where \( p \) is a predicate and \( s_1, s_2 \) are structures, is a conditional structure.

4. Pre-tested loop: a loop of the form \( \text{while } p \text{ do } s \), where \( p \) is a predicate and \( s \) is a structure, is a pre-tested loop structure.

5. Single branch conditional: a conditional of the form \( \text{if } p \text{ then } s \), where \( p \) is a predicate and \( s \) is a structure, is a single branch conditional structure.

6. n-way conditional: a conditional of the form

\[
\text{case } p \text{ of } \\
1 : s_1 \\
2 : s_2 \\
\ldots \\
n : s_n
\]
where \( p \) is a predicate and \( s_1 \ldots s_n \) are structures, is an \( n \)-way conditional structure.

7. Post-tested loop: a loop of the form `repeat s until p`, where \( s \) is a structure and \( p \) is a predicate, is a post-tested loop structure.

8. Multiexit loop: a loop of the form

\[
\text{while } p_1 \text{ do } \\
\quad s_1 \\
\quad \text{if } p_2 \text{ then exit } \\
\quad s_2 \\
\quad \text{if } p_3 \text{ then exit } \\
\quad \ldots \\
\quad \text{if } p_n \text{ then exit } \\
\quad s_n \\
\text{end while}
\]

where \( s_1 \ldots s_n \) are structures and \( p_1 \ldots p_n \) are predicates, is a multiexit loop structure. Each `exit` statement branches out of the loop to the first statement/basic block after the loop.

9. Endless loop: a loop of the form `loop s end`, where \( s \) is a structure, is an endless loop.

10. Multilevel exit: an `exit(i)` statement causes the termination of \( i \) enclosing endless loops.

11. Multilevel cycle: a `cycle(i)` statement causes the \( i \)-th enclosing endless loop to be re-executed.

12. Goto: a `goto` statement transfers control to any other basic block, regardless of unique entrance conditions.

Based on these 12 different structures, control structures are classified into the following classes:

- **D structures**: D for Dijkstra. \( D = \{1,2,3,4\} \)
- **D’ structures**: extension of D structures. \( D' = \{1,2,3,4,5,6,7\} \)
- **BJn structures**: BJ for Böhm and Jacopini, \( n \) for the maximum number of predicates in a multiexit loop. \( BJn = \{1,2,3,8\} \)
- **REn structures**: RE for Repeat-End, \( n \) for the maximum number of exit levels. \( REn = \{1,2,3,9,10\} \)
- **RECn structures**: REC for Repeat-End with `cycle(i)` structures, \( n \) for the number of levels. \( RECn = \{1,2,3,9,10,11\} \)
- **DREn structures**: DRE for Repeat-End and Do-while loops, \( n \) for the maximum number of enclosing levels to exit. \( DREn = \{1,2,3,4,9,10\} \)
- **DRECn structures**: DREC for Repeat-End, Do-while, and `cycle(i)` structures, \( n \) for the maximum number of enclosing endless loops. \( DRECn = \{1,2,3,4,9,10,11\} \)
- **GPn structures**: any structure that has one-in, one-out substructures that have at most \( n \) different predicates. \( GPn = \{1..7,9\} \)
- **L structures**: any well-formed structure. There are no restrictions on the number of predicates, actions, and transfers of control; therefore, `goto` statements are allowed. \( L = \{1..12\} \)

**Definition 62** Let s1 and s2 be two structures, then s1 is a **semantical conversion** of s2 if and only if

- For every input, s2 computes the same function as s1.
- The primitive actions and predicates of s2 are precisely those of s1.

In other words, no new semantics such as variables, actions, or predicates, are allowed by this conversion.

Based on semantical conversion, the classes of control structures form a hierarchy, as shown in Figure 6-14. The classes higher up in the hierarchy are a semantical conversion of the lower classes.

\[
\begin{array}{c}
\text{RE}_\infty = \text{REC}_\infty = \text{DREC}_\infty = \text{GP}_\infty = \text{L} \\
\text{RE}_n = \text{REC}_n \\
\text{DRE}_n = \text{DREC}_n \\
\text{RE}_1 = \text{REC}_1 \\
\text{BJ}_\infty \\
\text{BJ}_2 \\
\text{D} = \text{D}' = \text{BJ}_1 = \text{GP}_1
\end{array}
\]

**Figure 6-14: Control Structures Classes Hierarchy**

6.4.2 Control Structures in 3rd Generation Languages

In this section, different high-level languages are analysed and classified in terms of their control structures. The selected languages are used in a variety of applications, including systems programming, numerical or scientific applications, and multipurpose applications; these languages are: Modula-2, Pascal, C, Fortran, and Ada.
Modula-2 [Wir85, PLA91] does not allow the use of goto statements; therefore, the control flow graphs generated by this language are structured and reducible. Modula-2 has all D'-type structures: 2-way conditionals (IF p THEN s1 {ELSE s2}), n-way conditional (CASE p OF ... END), pre-tested loop (WHILE p DO), and post-tested loop (REPEAT s UNTIL p), as well as an endless loop (LOOP s END). An endless loop can be terminated by one or more EXIT statements within the statement sequence of the loop body. This construct can be used to simulate other loop structures, such as a multiexit loop with n predicates (BJn structure). An EXIT statement terminates the execution of the immediately enclosing endless loop statement, and execution resumes at the statement following the end of the loop. If an EXIT occurs within a pre-tested or post-tested loop nested within an endless loop, both the inner loop and the enclosing endless loop are terminated; therefore, an EXIT statement is equivalent to an exit(1) statement, and belongs to the RE1 class of structures.

Pascal [Coo83] is not as strict as Modula-2, in the sense that it allows goto statements. All D'-type structures are available: 2-way conditionals (if p then s1 [else s2]), n-way conditional (case p of ... end), pre-tested loop (while p do), and post-tested loop (repeat s until p); the endless loop is simulated by a while loop with a true condition (while (True) do). Goto statements can be used to simulate multiexit and multilevel loops, but can also be used in an unstructured way, to jump into the middle of a structure; therefore, L class structures are permitted in this language.

C [KR88] allows both structured and unstructured transfers of control. D' structures are represented by the following statements: 2-way conditional (if (p) s1 [else s2]), n-way conditional (switch (p) {...}), pre-tested loop (while (p) {s}), and post-tested loop (do s while (p)); the endless loop is written as for (;;) {s} or while (1) {s}. Single-level exits are provided by the break statement, and single-level cycles by the continue statement; therefore, C contains structures of the RE1 and REC1 classes. Goto statements can model any structure from the DRECn class, but can also produce unstructured graphs; therefore, C allows for L class structures.

Fortran [Col81] has different types of conditionals which include: 2-way conditional (IF (p) s1,s2), arithmetic if or 3-way conditional (IF (p) s1,s2,s3), and computed goto statements or n-way conditionals (GOTO (s1,s2,...,sn) p). Pre-tested, post-tested and endless loops are all simulated by use of the DO statement; therefore, all D'-type structures are allowed in Fortran. Finally, goto statements are allowed, producing structured or unstructured transfers of control, allowing for L type structures.

Ada [DoD83] allows most D'-type structures, including: 2-way conditionals (if p then s1 [else s2]), n-way conditional (case p is ... end), pre-tested loops (while and for loops), and endless loop (loop s end loop). Ada also allows exit statements to leave named endless loops; therefore, several nested loops can be terminated with a single statement (i.e. an REn class structure). Goto statements are allowed in a restricted way: they can transfer control only to a statement of an enclosing sequence of statements, but not the reverse. It is also prohibited to transfer control into the alternatives of a case statement or an if..then..else statement. These restrictions allow gotos to simulate multilevel exits and multilevel continues, but do not permit unstructured transfers of control; therefore, up to DRECn-type structures can be built in this language.

Figure 6-15 summarizes the classes of control structures available in the selected languages. All of these languages make use of D'-type structures, plus one or more structures that belong to other classes. Unstructured languages allow the unstructured use of goto, which is the case for Pascal, C, and Fortran. Structured uses of goto, such as in Ada, permit only the construction of structured control flow graphs, since these gotos can simulate at most DRECn-type structures.

<table>
<thead>
<tr>
<th>Language</th>
<th>Control Structure Classification</th>
</tr>
</thead>
<tbody>
<tr>
<td>Modula-2</td>
<td>D' + BJn + RE1</td>
</tr>
<tr>
<td>Pascal</td>
<td>D' + L</td>
</tr>
<tr>
<td>C</td>
<td>D' + BJn + DREC1 + L</td>
</tr>
<tr>
<td>Fortran</td>
<td>D' + L</td>
</tr>
<tr>
<td>Ada</td>
<td>D' + DRECn</td>
</tr>
</tbody>
</table>

Figure 6-15: Classes of Control Structures in High-Level Languages

6.4.3 Generic Set of Control Structures

In order to structure a graph, a set of generic control structures needs to be selected. This set must be general enough to cater for commonly used structures in a variety of languages. From the review of some 3rd generation languages in the previous section, it is clear that most languages have D' class structures, plus some type of structured or unstructured transfer of control (i.e. multilevel exits or gotos). Structures from the REn, RECn, DREn, and DRECn classes can all be simulated by the use of structured transfers of control via a goto statement. Since most of the languages allow the use of goto, and not all languages have the same multilevel exit or multilevel continue structures, goto is a better choice of a generic construct than exit(i) or cycle(i). It is therefore desirable to structure a control flow graph using the following set of generic structures:

- Action
- Composition
- Conditional
- Pre-tested loop
- Single branch conditional
- n-way conditional
- Post-tested loop
- Endless loop
- Goto

In other words, the generic set of control structures has all D' and L class structures.
6.5 Structured and Unstructured Graphs

A structured control flow graph is a graph generated from programs that use structures of up to the DRECn class, i.e. a graph that is decomposable into subgraphs with one entry and one or more exits. Languages that allow the use of `goto` can still generate structured graphs if the `goto` statements are used to transfer control in a structured way (i.e. to transfer control to the start or the end of a structure). Unstructured graphs are generated by unstructured transfers of control through `goto` statements, that is, transfers of control into the middle of a structured graph, which break the previously structured graph into an unstructured one since there is more than one entry into this subgraph. Unstructuredness can also be introduced by the compiler's optimization phase when code motion is performed.

6.5.1 Loops

A loop is a strongly connected region in which there is a path between any two nodes of the directed subgraph. This means that there must be at least one back-edge to the loop’s header node.

A **structured loop** is a subgraph that has one entry point, one back-edge, and possibly one or more exit points that transfer control to the same node. Structured loops include all natural loops (pre-tested and post-tested loops), endless loops, and multiexit loops. These loops are shown in Figure 6-16.


Figure 6-16: Structured Loops

An **unstructured loop** is a subgraph that has one or more back-edges, one or more entry points, and one or more exit points to different nodes. Figure 6-17 illustrates four different types of unstructured loops:

- **Multientry loop**: a loop with two or more entry points.
- **Parallel loop**: a loop with two or more back-edges to the same header node.
- **Overlapping loops**: two loops that overlap in the same strongly connected region.
- **Multiexit loop**: a loop with two or more exits to different nodes.

The **follow** node of a structured or unstructured loop is the first node that is reached from the exit of the loop. In the case of unstructured loops, one node is considered the loop exit node, and the first node that follows it is the follow node of the loop.

6.5.2 Conditionals

A **structured 2-way conditional** is a directed subgraph with a 2-way conditional header node, one entry point, two branch nodes, and a common end node that is reached by both branch nodes. This final common end node is referred to as the follow node, and has the property of being immediately dominated by the header node.

In an `if..then` conditional, one of the two branch nodes of the header node is the follow node of the subgraph. In an `if..then..else` conditional, neither branch node is the follow node, but they both converge to a common end node. Figure 6-18 shows these two generic constructs, with the out-edges of the header node labelled true or false. In the case of an `if..then`, either the true or the false edge leads to the follow node; thus, there are two different graphs to represent such a structure, whereas in the case of an `if..then..else`, the graph representation is unique.
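Once the follow node of a structured 2-way header is known (for example, from the dominator information mentioned above), the kind of conditional follows from which edge reaches it. The Python sketch below assumes the follow node is already available and uses hypothetical names. It also records whether the condition must be negated, as happens for node B1 in Section 6.2.2, where the single branch lies on the false edge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoWay:
    kind: str      # "if..then" or "if..then..else"
    negate: bool   # True when the then-clause lies on the false edge


def classify_two_way(true_succ: str, false_succ: str, follow: str) -> TwoWay:
    """Classify a structured 2-way header from its branch nodes and its follow node."""
    if false_succ == follow:
        return TwoWay("if..then", negate=False)    # then-clause on the true edge
    if true_succ == follow:
        return TwoWay("if..then", negate=True)     # then-clause on the false edge
    return TwoWay("if..then..else", negate=False)  # neither branch is the follow


# Node B1: true edge to B5 (the follow node), false edge to B2.
print(classify_two_way("B5", "B2", follow="B5"))
# TwoWay(kind='if..then', negate=True)
```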

In a similar way, a **structured n-way conditional** is a directed subgraph with one n-way entry header node (i.e. n successor nodes from the header node), and a common end node that is reached by the n successor nodes. This common end node is referred to as the follow node, and has the property of being dominated by the header node of the structure. A sample 4-way conditional is shown in Figure 6-19.

**Unstructured 2-way conditionals** are 2-way node header subgraphs, with two or more entries into the branches of the header node, or two or more exits from branches of the header node. These graphs are represented in the abnormal selection path graph, shown in Figure 6-20 (a). It is known from the graph structure that an `if..then..else` subgraph can start at nodes 1 and 2, generating the two subgraphs in Figure 6-20 (b) and (c). The

Figure 6-18: Structured 2-way Conditionals

Figure 6-19: Structured 4-way Conditional

Figure 6-20: Abnormal Selection Path

In a similar way, **unstructured n-way conditionals** allow for two or more entries or exits to/from one or more branches of the n-way header node. Figure 6-21 shows four different cases of unstructured 3-way graphs: graph (a) has an abnormal forward out-edge from one of the branches, graph (b) has an abnormal backward out-edge from one of the branches, graph (c) has an abnormal forward in-edge into one of the branches, and graph (d) has an abnormal backward in-edge into one of the branches.

Figure 6-21: Unstructured 3-way Conditionals

6.5.3 Structured Graphs and Reducibility

A structured graph is one that is composed of structured subgraphs that belong to the class of graphs generated by DRECn structures. An informal demonstration is given to show that all structured graphs of the class DRECn are reducible. Consider the informal graph grammar given in Figure 6-22. There are 11 production rules, each defining a different structured subgraph $S$. Each production rule indicates that a structured subgraph can be generated by replacing a node $S$ with the associated right-hand side subgraph of the production.

**Theorem 2** The class DRECn of graphs is reducible.

**Demonstration:** The class DRECn of graphs is defined by the informal graph grammar of Figure 6-22. All the subgraphs in the right-hand side of the productions have the common property of having one entry point, and one or more exit points to a common target end point; in this way, transfers of control are done in a structured way. By Theorem 1, it is known that a graph is irreducible if and only if it has a subgraph of the form of the canonical irreducible graph (see Figure 6-12). This canonical irreducible graph is composed of two subgraphs: a conditional branching graph, and a loop. The former subgraph has one entry point, and the latter has two (or more) entry points, which is what makes the graph irreducible. Since none of the productions of the graph grammar generate subgraphs that have more than one entry point, this graph grammar cannot generate an irreducible graph; thus, the graphs that belong to the DRECn class are reducible.

6.6 Structuring Algorithms

In decompilation, the aim of a structuring algorithm is to determine the underlying control structures of an arbitrary graph, thus converting it into a functionally and semantically equivalent graph. Arbitrary graph stands for any control flow graph, reducible or irreducible, from a structured or unstructured language. Since it is not known what language the initial program was written in, nor what compiler was used (e.g. which optimizations were turned on), the use of goto jumps must be allowed in case the graph cannot be structured into a set of generic high-level structures. The set of generic control structures of Section 6.4.3 is the one chosen for the structuring algorithm presented in this section.

6.6.1 Structuring Loops

In order to structure loops, a loop must be defined in terms of a graph representation. This representation must not only determine the extent of a loop, but also provide a nesting order for the loops. As pointed out by Hecht in [Hec77], the representation of a loop by means of cycles is too fine a representation since loops are not necessarily properly nested or disjoint. The use of strongly connected components as loops is too coarse a representation as there is no nesting order. The use of strongly connected regions does not provide a unique cover of the graph, and does not cover the entire graph. Finally, the use of intervals provides a representation that satisfies the above-mentioned conditions: one loop per interval, and a nesting order provided by the derived sequence of graphs.

Given an interval $I(h_j)$ with header $h_j$, there is a loop rooted at $h_j$ if there is a back-edge to the header node $h_j$ from a latching node $n_k \in I(h_j)$. Consider the graph in Figure 6-23, which is the same graph from Figure 6-2 without intermediate instruction information, and with intervals delimited by dotted lines. There are 3 intervals: $I_1$ rooted at basic block B1, $I_2$ rooted at node B6, and $I_3$ rooted at node B13.

In this graph, interval $I_3$ contains the loop (B14,B13) in its entirety, and interval $I_2$ contains the header of the loop (B15,B6), but its latching node is in interval $I_3$. If each of the intervals is collapsed into an individual node, and the intervals of that new graph are found, the loop that spanned intervals $I_3$ and $I_2$ must now belong to a single interval. Consider the derived sequence of graphs $G^2 \ldots G^4$ in Figure 6-24. In graph $G^2$, the loop between nodes $I_3$ and $I_2$ is entirely contained in interval $I_5$. This loop represents the corresponding loop of nodes (B15,B6) in the initial graph. There are no more loops in these graphs, and the initial graph is reducible since the trivial graph $G^4$ was derived by this process. Note that the length of the derived sequence is proportional to the maximum depth of nested loops in the initial graph.

Once a loop has been found, the type of loop (e.g. pre-tested, post-tested, endless) is determined according to the type of header and latching nodes. Also, the nodes that belong to the loop are flagged as such, in order to prevent nodes from belonging to two different loops, as in overlapping or multientry loops. These methods are explained in the following sections; for now we assume there are two procedures that determine the type of the loop and mark the nodes that belong to that loop.

Given a control flow graph $G = G^1$ with interval information, the derived sequence of graphs $G^1, \ldots, G^n$ of $G$, and the sets of intervals of these graphs, $I^1 \ldots I^n$, an algorithm to find loops is as follows. Each header of an interval in $G^1$ is checked for a back-edge from a latching node that belongs to the same interval. If such an edge exists, a loop has been found: its type is determined, and the nodes that belong to it are marked. Next, the intervals of $G^2$, $I^2$, are checked for loops, and the process is repeated until the intervals in $I^n$ have been checked. Whenever a potential loop (i.e. a header of an interval that has a predecessor with a back-edge) has its header or latching node marked as belonging to another loop, the loop is disregarded, as it is part of an unstructured loop; such loops always generate goto jumps during code generation. This algorithm does not determine goto jumps or their target labels. The complete algorithm is given in Figure 6-25. It finds the loops at the appropriate nesting level, from the innermost to the outermost loop.
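The level-by-level search of Figure 6-25 can be sketched as follows. The sketch only detects the loops; it omits the inLoop bookkeeping and the calls to findLoopType and findLoopFollow. It assumes that each level of the derived sequence is given as a successor map together with its intervals, that nodes are numbered in reverse postorder, and that, when several members of an interval have a back-edge to the header, the highest-numbered one is taken as the latching node. `find_loops` is a hypothetical name.

```python
def find_loops(levels):
    """levels[i] = (succ, ints) for G^(i+1): its successor map and its intervals
    ({header: [member nodes]}).  Returns (level, latch, header) triples,
    innermost loops first."""
    loops = []
    for level, (succ, ints) in enumerate(levels, start=1):
        for h, members in ints.items():
            # candidate latching nodes: members of I(h) with a back-edge to h
            latches = [x for x in members if h in succ[x]]
            if latches:
                # at most one loop per interval
                loops.append((level, max(latches), h))
    return loops


levels = [
    ({1: [2], 2: [3], 3: [4], 4: [3, 5], 5: [2, 6], 6: []}, {1: [1], 2: [2], 3: [3, 4, 5, 6]}),
    ({1: [2], 2: [3], 3: [2]}, {1: [1], 2: [2, 3]}),
    ({1: [2], 2: []}, {1: [1, 2]}),
    ({1: []}, {1: [1]}),
]
print(find_loops(levels))
```

Because the levels are scanned in order, the inner loop (found in $G^1$) is reported before the outer one (found in $G^2$, where derived node 3 stands for the interval of $G^1$ that contains the original latching node 5).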

**Finding the Nodes that Belong to a Loop**

Given a loop induced by $(y, x), y \in I(x)$, the two loops of the sample program in Figure 6-23 satisfy the following condition:

$$\forall n \in \text{loop}(y, x) \bullet n \in \{x \ldots y\}$$

procedure loopStruct (G = (N, E, h))

/* Pre: G^1 ... G^n has been constructed.
 * I^1 ... I^n has been determined.
 * Post: all nodes of G that belong to a loop are marked.
 * all loop header nodes have information on the type of loop and the latching node. */

for (G^i := G^1 ... G^n)
  for (I^i(h_j) := I^i(h_1) ... I^i(h_m))
    if ((∃ x ∈ I^i(h_j) • (x, h_j) ∈ E^i) ∧ (inLoop(x) == False))
      for (all n ∈ loop(x, h_j))
        inLoop(n) = True
      end for
      loopType(h_j) = findLoopType((x, h_j)).
      loopFollow(h_j) = findLoopFollow((x, h_j)).
    end if
  end for
end for
end procedure

Figure 6-25: Loop Structuring Algorithm

In other words, the loop consists of all nodes numbered between \(x\) and \(y\). Unfortunately, determining the nodes that belong to a loop is not that simple. Consider the multiexit graphs in Figure 6-26, where each loop has one abnormal exit, and each graph uses a different type of edge of the underlying DFST. As can be seen, loops with forward edges, back edges, or cross edges satisfy the above condition. The graph with the tree edge, however, includes extra nodes: nodes 4 and 5 are not part of the loop, but are numbered between nodes 2 and 6 (the bounds of the loop). In this case an extra condition must be satisfied: the nodes must belong to the same interval. The interval header (i.e. \(x\)) dominates all nodes of the interval, and in a loop the loop header node dominates all nodes of the loop; a node that belongs to a different interval is not dominated by the loop header node, and therefore cannot belong to the same loop. In other words, the following condition needs to be satisfied:

\[
\forall n \in \text{loop}(y, x) \bullet n \in I(x)
\]

Given an interval \(I(x)\) with a loop induced by \((y, x), y \in I(x)\), the nodes that belong to this loop therefore satisfy two conditions: a node \(n\) belongs to the loop induced by \((y, x)\) if it belongs to the same interval (i.e. it is dominated by \(x\)), and its order (i.e. its reverse postorder number) lies between that of the header node and that of the latching node (i.e. it is a node from the “middle” of the loop). These conditions can be combined into the following expression:

\[
n \in \text{loop}(y, x) \Leftrightarrow n \in (I(x) \cap \{x \ldots y\})
\]
The loops from Figure 6-23 have the following nodes: loop (9,8) has only those two nodes, and loop (10,6) has all nodes between 6 and 10 that belong to the interval I5 (Figure 6-24) in \( G^2 \). These nodes are as follows:

- Loop (9,8) = \{8,9\}
- Loop (10,6) = \{6..10\}

The algorithm in Figure 6-27 finds all nodes that belong to a loop induced by a back-edge. These nodes are marked by setting their loop head to the header of the loop. If a node has already been marked, it also belongs to a more deeply nested loop, and its loopHead field is not modified. In this way, every node that belongs to one or more loops is marked with the header node of the most nested loop it belongs to.
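A direct transcription of Figure 6-27 into Python is shown below as a sketch. It assumes that nodes are integers numbered in reverse postorder, that the interval $I(x)$ is supplied as a set of original nodes (for an outer loop, the nodes covered by the corresponding interval of the derived graph), and that `loop_head` is a dictionary in which a missing entry plays the role of No_Node.

```python
def mark_nodes_in_loop(latch, header, interval, loop_head):
    """Mark the nodes of the loop induced by the back-edge (latch, header).

    interval is the set of original nodes in I(header); loop_head maps a node
    to the header of the innermost loop it belongs to.  Returns nodesInLoop."""
    nodes_in_loop = {header}
    loop_head[header] = header
    for n in range(header + 1, latch + 1):   # nodes numbered between x and y...
        if n in interval:                    # ...that are also dominated by x
            nodes_in_loop.add(n)
            if n not in loop_head:           # keep the mark of a nested loop
                loop_head[n] = header
    return nodes_in_loop


# Inner loop (4,3) is marked first, then the outer loop (5,2).
heads = {}
print(mark_nodes_in_loop(4, 3, {3, 4, 5, 6}, heads))
print(mark_nodes_in_loop(5, 2, {2, 3, 4, 5, 6}, heads), heads)
```

Marking the inner loop first and the outer loop second leaves nodes 3 and 4 with loop head 3, the header of the most nested loop they belong to.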

**Determining the Type of Loop**

The type of a loop is determined by the header and latching nodes of the loop. In a pre-tested loop, the 2-way header node determines whether the loop is executed or not, and the 1-way latching node transfers control back to the header node. A post-tested loop is characterized by a 2-way latching node that branches back to the header of the loop or out of the loop, and any type of header node. Finally, an endless loop has a 1-way latching node that transfers control back to the header node, and any type of header node.

The types of the two loops of Figure 6-23 are as follows: the loop (9,8) has a 2-way latching node and a call header node, thus, the loop is a post-tested loop (i.e. a \texttt{repeat..until()} loop). The loop (10,6) has a 1-way latching node and a 2-way header node, thus, the loop is a pre-tested loop (i.e. a \texttt{while()} loop).

In this example, the \texttt{repeat..until()} loop had a call header node, so there was no doubt that this loop really is a post-tested loop. A problem arises when both the header and latching nodes are 2-way conditional nodes, since it is not known whether one or both branches of the header node lead into the loop: in the former case the loop is an abnormal loop, and in the latter case it is a post-tested loop. It is therefore necessary to check whether the successors of the header node belong to the loop; if one of them does not, the loop can be coded as a \texttt{while()} loop with an abnormal exit from the latching node. Figure 6-28 gives an algorithm to determine the type of loop based on the nodesInLoop set constructed by the algorithm of Figure 6-27.

procedure markNodesInLoop (G = (N, E, h), (y, x))
/* Pre: (y, x) is a back-edge.
 * Post: the nodes that belong to the loop (y, x) are marked. */

nodesInLoop = {x}
loopHead(x) = x
for (all nodes n ∈ {x + 1 ... y})
    if (n ∈ I(x))
        nodesInLoop = nodesInLoop ∪ {n}
        if (loopHead(n) == No_Node)
            loopHead(n) = x
        end if
    end if
end for
end procedure

Figure 6-27: Algorithm to Mark all Nodes that belong to a Loop induced by \((y, x)\)
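The classification rules of Figure 6-28 can be sketched in Python as below. The sketch assumes that the node type is inferred from the number of out-edges (two out-edges meaning a 2-way node, anything else treated as 1-way, so n-way nodes are ignored); the function name is hypothetical.

```python
def loop_type(succ, latch, header, nodes_in_loop):
    """Classify the loop induced by the back-edge (latch, header)."""
    def two_way(n):
        return len(succ[n]) == 2

    if two_way(latch):
        if two_way(header) and not all(s in nodes_in_loop for s in succ[header]):
            # the header exits the loop; the 2-way latch becomes an abnormal exit
            return "PreTested"
        return "PostTested"
    # 1-way latching node
    return "PreTested" if two_way(header) else "Endless"


g = {1: [2], 2: [3, 5], 3: [4], 4: [2], 5: []}
print(loop_type(g, 4, 2, {2, 3, 4}))
```

The only case that inspects `nodes_in_loop` is the one discussed above, where both the header and the latching node are 2-way.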

**Finding the Loop Follow Node**

The loop follow node is the first node that is reached after the loop terminates. In the case of natural loops, only one node is reached after loop termination, but multiexit and multilevel exit loops can have more than one exit, and thus more than one node can be reached after the loop. Since the structuring algorithm structures only natural loops, every multiexit loop is structured with one “real” exit and one or more abnormal exits. In the case of endless loops that have exits in the middle of the loop, several nodes can be reached after the different exits. The purpose of this algorithm is to find a single follow node.

In a pre-tested loop, the follow node is the successor of the loop header that does not belong to the loop. Similarly, the follow node of a post-tested loop is the successor of the loop latching node that does not belong to the loop. Endless loops initially have no follow node, as neither the header nor the latching node jumps out of the loop. But since an endless loop can have a jump out of the loop in its middle (e.g. a \texttt{break} in C), it can also have a follow node. Since the follow node is the first node that is reached after the loop ends, it is desirable to find the closest node that is reached from the loop after an exit is performed. The closest node is the one with the smallest reverse postorder number, i.e. the one that is closest to the loop in numbering order. Any other node that is also reached from the loop can be reached from the closest node (because it must have a greater reverse postorder number); thus, the closest node is considered the follow node of an endless loop.

procedure loopType (G = (N, E, h), (y, x), nodesInLoop)
/* Pre: (y, x) induces a loop.
 * nodesInLoop is the set of all nodes that belong to the loop (y, x).
 * Post: loopType(x) has the type of loop induced by (y, x). */
if (nodeType(y) == 2-way)
    if (nodeType(x) == 2-way)
        if (outEdge(x,1) ∈ nodesInLoop ∧ outEdge(x,2) ∈ nodesInLoop)
            loopType(x) = PostTested.
        else
            loopType(x) = PreTested.
        end if
    else
        loopType(x) = PostTested.
    end if
else /* 1-way latching node */
    if (nodeType(x) == 2-way)
        loopType(x) = PreTested.
    else
        loopType(x) = Endless.
    end if
end if
end procedure

Figure 6-28: Algorithm to Determine the Type of Loop

**Example 9** The loops of Figure 6-23 have the following follow nodes:

- Follow (loop (9,8)) = 10
- Follow (loop (10,6)) = 11

Figure 6-29 gives an algorithm to determine the follow node of a loop induced by (y, x), based on the nodesInLoop set determined in the algorithm of Figure 6-27.
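A Python sketch of the follow-node computation of Figure 6-29 follows. It assumes that a 2-way node has exactly two out-edges in `succ`, that nodes are numbered in reverse postorder (so `min` selects the closest exit target), and that an endless loop without exits has no follow node, returned as `None`. Unlike the figure, it considers every exit edge of a 2-way node rather than stopping at the first one.

```python
def loop_follow(succ, latch, header, kind, nodes_in_loop):
    """Return the follow node of the loop induced by (latch, header), or None."""
    if kind == "PreTested":                  # header successor outside the loop
        first, second = succ[header]
        return second if first in nodes_in_loop else first
    if kind == "PostTested":                 # latch successor outside the loop
        first, second = succ[latch]
        return second if first in nodes_in_loop else first
    # endless loop: the exit target of a 2-way node with the smallest number
    exits = [s for n in nodes_in_loop if len(succ[n]) == 2
             for s in succ[n] if s not in nodes_in_loop]
    return min(exits) if exits else None


# Endless loop {2,3,4,5} with break-like exits from node 3 (to 6) and node 4 (to 7).
g = {1: [2], 2: [3], 3: [6, 4], 4: [7, 5], 5: [2], 6: [7], 7: []}
print(loop_follow(g, 5, 2, "Endless", {2, 3, 4, 5}))
```

In the endless example, node 7 is also reached from the loop, but node 6 has the smaller reverse postorder number and node 7 is reachable from it.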

procedure loopFollow (G = (N, E, h), (y, x), nodesInLoop)

/* Pre: (y, x) induces a loop.
 * nodesInLoop is the set of all nodes that belong to the loop (y, x).
 * Post: loopFollow(x) is the follow node to the loop induced by (y, x). */

if (loopType(x) == PreTested)
    if (outEdges(x,1) ∈ nodesInLoop)
        loopFollow(x) = outEdges(x,2).
    else
        loopFollow(x) = outEdges(x,1).
    end if
else if (loopType(x) == PostTested)
    if (outEdges(y,1) ∈ nodesInLoop)
        loopFollow(x) = outEdges(y,2).
    else
        loopFollow(x) = outEdges(y,1).
    end if
else /* endless loop */
    fol = Max /* a large constant */
    for (all 2-way nodes n ∈ nodesInLoop)
        if ((outEdges(n,1) ∉ nodesInLoop) ∧ (outEdges(n,1) < fol))
            fol = outEdges(n,1).
        else if ((outEdges(n,2) ∉ nodesInLoop) ∧ (outEdges(n,2) < fol))
            fol = outEdges(n,2).
        end if
    end for
    if (fol ≠ Max)
        loopFollow(x) = fol.
    end if
end if
end procedure

Figure 6-29: Algorithm to Determine the Follow of a Loop

6.6.2 Structuring 2-way Conditionals

Both a single-branch conditional (i.e. if..then) and a conditional (i.e. if..then..else) subgraph have a common end node, from here onwards referred to as the follow node, that has the property of being immediately dominated by the 2-way header node. Whenever these subgraphs are nested, they can have different follow nodes or share the same common follow node. Consider the graph in Figure 6-30, which is the same graph from Figure 6-2 without intermediate instruction information, and with immediate dominator information. The nodes are numbered in reverse postorder.

In this graph there are six 2-way nodes, namely, nodes 1, 2, 6, 9, 11, and 12. As seen during loop structuring (Section 6.6.1), a 2-way node that is the header or the latching node of a loop is marked as such, and must not be processed during 2-way conditional structuring, given that it already belongs to another structure. Hence, nodes 6 and 9 in Figure 6-30 are not considered in this analysis. Whenever two or more conditionals are nested, the most nested conditional should be analyzed first, and then the outer ones. In the case of the conditionals at nodes 1 and 2, node 2 must be analyzed before node 1, since it is nested in the subgraph headed by 1; in other words, the node with the greater reverse postorder number is analyzed first, as it is visited later in the depth-first traversal. In this example, both subgraphs share the common follow node 5; therefore, there is no candidate follow node immediately dominated by node 2 (i.e. the inner conditional), but node 5 is immediately dominated by node 1 (i.e. the outer conditional), and this node is the follow node for both conditionals. Once the follow node has been determined, the type of the conditional is known by checking whether one of the branches of the 2-way header node is the follow node: if so, the subgraph is a single-branch conditional; otherwise it is an if..then..else. In the case of nodes 11 and 12, node 12 is analyzed first, and no follow node is determined since no node takes it as immediate dominator. This node is left in a list of unresolved nodes, because it may be nested in another conditional structure. When node 11 is analyzed, nodes 12, 13, and 14 are possible candidates for the follow node; since nodes 12 and 13 reach node 14, this last node is taken as the follow (i.e. the node that encloses the largest number of nodes in the subgraph, which is the one with the largest number). Node 12, which is in the list of unresolved follow nodes, is also marked as having follow node 14. The graph shows that these two conditionals are not properly nested, and a goto jump can be used during code generation.

A generalization of this example provides the algorithm to structure conditionals. The idea of the algorithm is to determine which nodes are header nodes of conditionals, and which nodes are the follows of such conditionals. The type of the conditional can be determined after finding the follow node, by checking whether one of the branches of the header node is the follow node. Inner conditionals are traversed first, then outer ones, so a descending reverse postorder traversal is performed (i.e. from greater to smaller node number). A set of unresolved conditional follow nodes is kept throughout the process; this set holds all 2-way header nodes for which a follow has not yet been found. For each 2-way node that is not the header or latching node of a loop, the follow node is calculated as the node that takes it as immediate dominator and has two or more in-edges (since it must be reached by at least two different paths from the header). If there is more than one such node, the one that encloses the maximum number of nodes is selected (i.e. the one with the largest number). If no such node is found, the 2-way header node is placed in the unresolved set. Whenever a follow node is found, all nodes in the unresolved set are given the same follow node as the one just found (i.e. they are nested or unstructured conditionals that reach this node) and are removed from the set. The complete algorithm is shown in Figure 6-31.

```
procedure struct2Way (G = (N, E, h))
/* Pre: G is a graph.
 * Post: 2-way conditionals are marked in G.
 *       the follow node for all 2-way conditionals is determined. */
unresolved = {}
for (all nodes m in descending order)
    if ((nodeType(m) == 2-way) ∧ (inHeadLatch(m) == False))
        if (∃ n • n = max{i | immedDom(i) = m ∧ #inEdges(i) ≥ 2})
            follow(m) = n
            for (all x ∈ unresolved)
                follow(x) = n
                unresolved = unresolved - {x}
            end for
        else
            unresolved = unresolved ∪ {m}
        end if
    end if
end for
end procedure
```

Figure 6-31: 2-way Conditional Structuring Algorithm
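The 2-way structuring algorithm of Figure 6-31 can be sketched in Python as follows. The sketch assumes nodes numbered in reverse postorder, immediate dominators supplied as a dictionary `idom`, loop header and latching nodes supplied as a set, and a 2-way node recognized by having two out-edges; `struct_2way` is a hypothetical name.

```python
def struct_2way(succ, idom, head_latch):
    """Return {2-way header: follow node}, following Figure 6-31."""
    in_edges = {n: 0 for n in succ}
    for targets in succ.values():
        for s in targets:
            in_edges[s] += 1
    follow, unresolved = {}, set()
    for m in sorted(succ, reverse=True):          # descending reverse postorder
        if len(succ[m]) == 2 and m not in head_latch:
            candidates = [i for i in succ if idom.get(i) == m and in_edges[i] >= 2]
            if candidates:
                n = max(candidates)               # encloses the most nodes
                follow[m] = n
                for x in unresolved:              # nested or unstructured headers
                    follow[x] = n
                unresolved.clear()
            else:
                unresolved.add(m)
    return follow


g = {1: [4, 2], 2: [3, 5], 3: [5], 4: [5], 5: []}
idom = {2: 1, 3: 2, 4: 1, 5: 1}
print(struct_2way(g, idom, set()))
```

In this graph node 2 finds no candidate (the only node it immediately dominates, node 3, has a single in-edge) and is left unresolved; node 1 then selects node 5, which also becomes the follow of node 2, mirroring the nodes 1 and 2 of Figure 6-30.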

**Compound Conditions**

When structuring graphs in decompilation, not only the structure of the underlying constructs must be considered, but also the information in the underlying intermediate instructions. Most high-level languages allow for short-circuit evaluation of compound Boolean conditions (i.e. conditions that include \texttt{and} and \texttt{or}). In these languages, the control flow graphs generated for these conditional expressions become unstructured, since an exit can be performed as soon as enough conditions have been checked to determine whether the expression as a whole is true or false. For example, if the expression \texttt{x and y} is compiled with short-circuit evaluation and expression \( x \) is false, the whole expression is false, and therefore expression \( y \) is not evaluated. In a similar way, an \texttt{x or y} expression is only partially evaluated if expression \( x \) is true. Figure 6-32 shows the four different subgraph sets that arise from compound conditions. The top graphs represent the logical condition under consideration, and the bottom graphs represent the corresponding short-circuit evaluated graphs.

Figure 6-32: Compound Conditional Graphs

During decompilation, whenever a subgraph of the form of the short-circuit evaluated graphs is found, it is checked for the following properties:

1. Nodes \( x \) and \( y \) are 2-way nodes.
2. Node \( y \) has 1 in-edge.
3. Node \( y \) has a single instruction: a conditional jump (\( j_{\text{cond}} \)) high-level instruction.
4. Nodes \( x \) and \( y \) must branch to a common \( t \) or \( e \) node.

The first, second, and fourth properties are required for the subgraph to be isomorphic to one of the bottom graphs of Figure 6-32, and the third property is required to determine that the graph represents a compound condition, rather than an abnormal conditional graph. Consider the subgraph of Figure 6-2 shown, with intermediate instruction information, in Figure 6-33. Nodes 11 and 12 are 2-way nodes, node 12 has 1 in-edge, node 12 has a single instruction (a \( j_{\text{cond}} \)), and both the true branch of node 11 and the false branch of node 12 reach node 13; i.e. this subgraph is of the form \( \neg x \land y \) in Figure 6-32.
The algorithm to structure compound conditionals traverses the graph from top to bottom, as the first condition in a compound conditional expression is higher up in the graph (i.e. it is tested first). For all 2-way nodes, the then and else nodes are checked for a 2-way condition. If either of these nodes consists of a single high-level conditional instruction (\texttt{jcond}), has no other entries (i.e. its only in-edge comes from the header 2-way node), and forms one of the 4 subgraphs illustrated in Figure 6-32, the two nodes are merged into a single node that has the semantic meaning of the compound condition (which depends on the structure of the subgraph), and the second node is removed from the graph. This process is repeated until no more compound conditions are found (there can be 3 or more compound \texttt{and}s and \texttt{or}s, so the process is repeated with the same header node until no more conditionals are found). The final algorithm is shown in Figure 6-34.

6.6.3 Structuring n-way Conditionals

N-way conditionals are structured in a similar way to 2-way conditionals. Nodes are traversed from the bottom to the top of the graph in order to find nested n-way conditionals first, followed by the outer ones. For each n-way node, a follow node is determined. Ideally, this node has \( n \) in-edges coming from the \( n \) successor nodes of the \( n \)-way header node, and is immediately dominated by that header node.

The determination of the follow node in an unstructured n-way conditional subgraph makes use of modified properties of the abovementioned follow node. Consider the unstructured graph in Figure 6-35, which has an abnormal exit from the n-way conditional subgraph. Candidate follow nodes are all nodes that have the header node 1 as immediate dominator, and that are not successors of this node, thus, nodes 5 and 6 are candidate follow nodes. Node 5 has 3 in-edges that come from paths from the header node, and node 6 has 2 in-edges from paths from the header node. Since node 5 has more paths from the header node that reach it, this node is considered the follow of the complete subgraph.

procedure structCompConds (G = (N, E, h))
/* Pre: G is a graph.
 * 2-way, n-way, and loops have been structured in G.
 * Post: compound conditionals are structured in G. */

change = True
while (change)
    change = False
    for (all nodes n in reverse postorder)
        if (nodeType(n) = 2-way)
            t = succ[n, 1]
            e = succ[n, 2]
            if ((nodeType(t) = 2-way) ∧ (numInst(t) = 1) ∧ (numInEdges(t) = 1))
                if (succ[t, 1] = e)
                    modifyGraph (¬n ∧ t)
                    change = True
                else if (succ[t, 2] = e)
                    modifyGraph (n ∨ t)
                    change = True
                end if
            else if ((nodeType(e) = 2-way) ∧ (numInst(e) = 1) ∧ (numInEdges(e) = 1))
                if (succ[e, 1] = t)
                    modifyGraph (n ∧ e)
                    change = True
                else if (succ[e, 2] = t)
                    modifyGraph (¬n ∨ e)
                    change = True
                end if
            end if
        end if
    end for
end while
end procedure

Figure 6-34: Compound Condition Structuring Algorithm
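The following Python sketch applies the four patterns of Figure 6-34 until no more merges are possible. It assumes that each 2-way node `n` stores its out-edges as `succ[n] = [target when the condition is false, target when it is true]`; with this reading of `succ[n, 1]` and `succ[n, 2]`, each formula of Figure 6-34 is the condition under which the true target of the merged node is reached, and the \( \neg x \land y \) example of nodes 11 and 12 comes out as stated. Node numbers are assumed to be in reverse postorder, conditions are kept as C strings, and all names are hypothetical.

```python
def struct_comp_conds(succ, cond, single_jcond):
    """Merge short-circuit subgraphs into compound conditions (updates in place).

    succ[n] is [false target, true target] for a 2-way node n, cond[n] is its C
    condition, and single_jcond holds the nodes whose only instruction is a jcond."""
    def in_edges(v):
        return sum(targets.count(v) for targets in succ.values())

    def mergeable(v):
        return len(succ[v]) == 2 and v in single_jcond and in_edges(v) == 1

    change = True
    while change:
        change = False
        for n in sorted(succ):                    # top to bottom
            if n not in succ or len(succ[n]) != 2:
                continue                          # removed earlier, or not 2-way
            t, e = succ[n]
            if mergeable(t) and e in succ[t]:
                if succ[t][0] == e:               # pattern ¬n ∧ t
                    expr, new_succ = f"!({cond[n]}) && {cond[t]}", [e, succ[t][1]]
                else:                             # pattern n ∨ t
                    expr, new_succ = f"{cond[n]} || {cond[t]}", [succ[t][0], e]
                gone = t
            elif mergeable(e) and t in succ[e]:
                if succ[e][0] == t:               # pattern n ∧ e
                    expr, new_succ = f"{cond[n]} && {cond[e]}", [t, succ[e][1]]
                else:                             # pattern ¬n ∨ e
                    expr, new_succ = f"!({cond[n]}) || {cond[e]}", [succ[e][0], t]
                gone = e
            else:
                continue
            cond[n], succ[n] = f"({expr})", new_succ
            del succ[gone], cond[gone]            # the merged node disappears
            change = True
    return succ, cond


print(struct_comp_conds({11: [12, 13], 12: [13, 14], 13: [], 14: []},
                        {11: "x", 12: "y"}, {12}))
```

A chain of three conditions such as `x && y && z` is folded in two passes, which is why the process restarts with the same header after every merge.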

Unfortunately, abnormal entries into an n-way subgraph are not covered by the above method. Consider the graph in Figure 6-36, which has an abnormal entry into one of the branches of the header n-way node. In this case, node 6 takes node 1 as immediate dominator, due to the abnormal entry (1,2), instead of 2 (the n-way header node). In other words, the follow node takes as immediate dominator the common dominator of all in-edges to node 3; i.e. node 1. In this case, the node that performs an abnormal entry into the subgraph needs to be determined, in order to find a follow node that takes it as immediate dominator. The complete algorithm is shown in Figure 6-37.
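The n-way structuring of Figure 6-37 can be sketched as follows. Assumptions of this sketch: nodes are numbered in reverse postorder (so a descending scan is a postorder traversal), a node with more than two out-edges is an n-way node, commonImmedDom is computed as the nearest common dominator of the successors' immediate dominators, and ties between candidate follow nodes with equal in-degree favour the larger node number. As in Figure 6-37, successors of the header are not excluded from the candidates.

```python
def common_dominator(idom, nodes):
    """Nearest common dominator of nodes; idom maps node -> immediate dominator
    (the entry node has no entry in idom)."""
    def chain(v):                          # v, idom(v), idom(idom(v)), ..., entry
        out = [v]
        while v in idom:
            v = idom[v]
            out.append(v)
        return out
    common = set(chain(nodes[0]))
    for v in nodes[1:]:
        common &= set(chain(v))
    return next(v for v in chain(nodes[0]) if v in common)


def struct_nway(succ, idom):
    """Return {n-way header: follow node}, following Figure 6-37."""
    in_count = {v: 0 for v in succ}
    for targets in succ.values():
        for s in targets:
            in_count[s] += 1
    follow, unresolved = {}, set()
    for m in sorted(succ, reverse=True):   # postorder
        if len(succ[m]) <= 2:
            continue
        if any(idom.get(s) != m for s in succ[m]):   # abnormal entry into a branch
            n = common_dominator(idom, [idom[s] for s in succ[m]])
        else:
            n = m
        cands = [i for i in succ if idom.get(i) == n and in_count[i] >= 2]
        if cands:
            j = max(cands, key=lambda i: (in_count[i], i))   # most in-edges
            follow[m] = j
            for i in unresolved:
                follow[i] = j
            unresolved.clear()
        else:
            unresolved.add(m)
    return follow


g = {1: [2, 3, 4], 2: [5], 3: [5], 4: [6, 5], 5: [6], 6: []}
print(struct_nway(g, {2: 1, 3: 1, 4: 1, 5: 1, 6: 1}))
```

The example mirrors Figure 6-35: node 5, with three in-edges, is preferred over node 6, which has two.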
6.6.4 Application Order

The structuring algorithms presented in the previous three sections determine the entry and exit (i.e. header and follow) nodes of subgraphs that represent high-level loops, n-way, and 2-way structures. These algorithms cannot be applied in an arbitrary order, since they do not form a finite Church-Rosser system. Consider the graphs in Figure 6-38, which contain loops with abnormal entries and exits. Graph (a) has an abnormal exit from an n-way subgraph, and the complete graph belongs to the same loop. If this graph is structured by loops first, the back-edge (3,1) is found, leading to the loop \{1,2,3\}. When n-way conditionals are then structured, node 2 is found to be the header node of an n-way subgraph, but since only 2 nodes of the subgraph rooted at 2 belong to the loop, the subgraph cannot be structured as an n-way subgraph; instead, it has several abnormal exits from the loop. On the other hand, if the graph is structured by n-way subgraphs first, the subgraph \{2,3,4,5,6\} is structured as an n-way subgraph with follow node 6. When the loop algorithm is then applied, the nodes of the back-edge (3,1) are found to belong to different structures (node 3 belongs to a structure headed by node 2, and node 1 does not yet belong to any structure); therefore, there is an abnormal exit from one structure to the other, and the loop is not structured as such. Graph (b) is irreducible; therefore, if it is structured by loops first, a multiexit loop is found, with abnormal exits coming from the nodes of the n-way subgraph (which is not structured as such due to the abnormal exits). On the other hand, if this graph is structured as an n-way subgraph first, the loop is not structured as such, but as a goto jump.

procedure structNWay (G = (N, E, h))
/* Pre: G is a graph.
 * Post: n-way conditionals are structured in G.
 *       the follow node is determined for all n-way subgraphs. */
unresolved = {}
for (all nodes m ∈ N in postorder)
    if (nodeType(m) == n-way)
        if (∃ s ∈ succ(m) • immedDom(s) ≠ m)
            n = commonImmedDom({s | s ∈ succ(m)})
        else
            n = m
        end if
        if (∃ j • immedDom(j) = n ∧ #inEdges(j) ≥ 2 ∧
                #inEdges(j) = max{#inEdges(i) | immedDom(i) = n ∧ #inEdges(i) ≥ 2})
            follow(m) = j
            for (all i ∈ unresolved)
                follow(i) = j
                unresolved = unresolved - {i}
            end for
        else
            unresolved = unresolved ∪ {m}
        end if
    end if
end for
end procedure

Figure 6-37: n-way Conditional Structuring Algorithm

These examples illustrate that the series of structuring algorithms presented in the previous sections is not finite Church-Rosser. This implies that an order must be followed: n-way conditionals are structured first, followed by loops, and 2-way conditionals last. Loops are structured before 2-way conditionals to ensure that the Boolean condition that forms part of a pre-tested or post-tested loop becomes part of the loop, rather than the header of a 2-way conditional subgraph. Once a 2-way conditional has been marked as being the header or latching node of a loop, it is not considered for further structuring.

**The Case of Irreducible Graphs**

The examples presented so far in this Chapter deal with reducible graphs. Recall from Section 6.3.4 that a graph is irreducible if it contains a subgraph of the form of the canonical irreducible flowgraph. In essence, a graph is irreducible if it has a loop with 2 or more entries (i.e. a multientry loop), at least 2 of which are dominated by a common node. Consider the multientry graphs in Figure 6-39. These graphs represent different classes of multientry graphs, according to the type of the edge into the loop in a depth-first tree of the graph. As can be seen, the graphs with a tree edge, a cross edge, or a forward edge into the loop are irreducible, but the graph with the back edge coming into the loop is not, since there is no common node that dominates all entries into the loop. This latter loop is equivalent to an overlapping loop, much in the same way as a multiexit loop with a back edge out of the loop (Figure 6-26, graph (d)).
Since a decompiler structuring algorithm must not modify the semantics and functionality of the control flow graph, node splitting is not used to structure irreducible graphs, because the addition of new nodes modifies the semantics of the program. It is therefore desirable to structure the graph without node replication, i.e. to leave the graph as an irreducible graph that uses goto jumps. Consider the graph in Figure 6-40 with immediate dominator information. Since the graph is irreducible, no loop is contained entirely in an interval; therefore, the loop structuring algorithm determines that there are no natural loops as such. When 2-way conditionals are structured, the conditional at node 1 is determined to have the follow node 3, since this node is reached from both paths from the header node and has a greater number than node 2. This means that the graph is structured as a 2-way subgraph with follow node 3, and no natural loop. During code generation, goto jumps are used to simulate the loop and its multiple entries (see Chapter 7, Section 7.1.3).

Figure 6-40: Canonical Irreducible Graph with Immediate Dominator Information
Chapter 7

The Back-end

The high-level intermediate code generated by the data flow analyzer and the structured control flow graph generated by the control flow analyzer are the input to the back-end. This module consists entirely of the code generator, which generates code for the target high-level language. This relationship is shown in Figure 7-1.

Figure 7-1: Relation of the Code Generator with the UDM

7.1 Code Generation

The code generator generates code for a predefined target high-level language. The following examples make use of the C language as target language, and the examples are based on the sample control flow graph of Chapter 6, Figure 6-2 after structuring information has been summarized on the graph.

7.1.1 Generating Code for a Basic Block

After data flow analysis, all intermediate instructions in a basic block are high-level instructions; pseudo high-level instructions must have been eliminated from the code before this point. Consider the control flow graph in Figure 7-2 after data and control flow analyses. For each basic block, the instructions in the basic block are mapped to equivalent instructions of the target language. Transfer of control instructions (i.e. jcond and jmp instructions) depend on the structure of the graph (i.e. they belong to a loop or to a 2-way or n-way conditional, or are equivalent to a goto); hence, code is generated for them according to the control flow information, as described in Section 7.1.2. This section illustrates how code is generated for all other instructions of a basic block.
Generating Code for `asgn` Instructions

The `asgn` instruction assigns to an identifier an arithmetic expression or another identifier. Expressions are stored by the decompiler in abstract syntax trees, therefore, a tree walker is used to generate code for them. Consider the first instruction of basic block B1, Figure 7-2:

```
asgn loc3, 5
```

The left hand side is the local identifier `loc3` and the right hand side is the constant identifier 5. Since both expressions are identifiers, the code is trivially translated to:

```
loc3 = 5;
```

The first instruction of basic block B9, Figure 7-2 uses an expression in its right hand side:

```
asgn loc3, (loc3 + loc4) - 10
```

This instruction is represented by the abstract syntax tree of Figure 7-3; only the right hand side of the instruction is stored in the abstract syntax tree format (field `arg` of the intermediate triplet; see Figure 4-32, Chapter 4). From the tree, the right hand side is equivalent to the expression `(loc3 + loc4) - 10`, and the C code for this instruction is:

```
loc3 = (loc3 + loc4) - 10;
```
Code is generated from an abstract syntax tree recursively, according to the type of operator: binary or unary. For binary operators, the left branch of the tree is traversed first, followed by the operator and the traversal of the right branch. For unary operators, the operator is written first, followed by its subtree expression. In both cases, the recursion ends when an identifier (i.e. a leaf of the tree) is met.

**Example 10** Expressions are defined in an intermediate language using the following types of expressions:

- **Binary expressions**: all expressions that use a binary operator. The binary operators and their C counterparts are:
  - Less than or equal to (`<=`).
  - Less than (`<`).
  - Equal (`==`).
  - Not equal (`!=`).
  - Greater than (`>`).
  - Greater than or equal to (`>=`).
  - Bitwise and (`&`).
  - Bitwise or (`|`).
  - Bitwise xor (`^`).
  - Not (1’s complement) (`~`).
  - Add (`+`).
  - Subtract (`-`).
  - Multiply (`*`).
  - Divide (`/`).
  - Modulus (`%`).
  - Shift right (`>>`).
  - Shift left (`<<`).
  - Compound and (`&&`).
  - Compound or (`||`).
- **Unary expressions**: all expressions that use a unary operator. The unary operators and their C counterparts are:
  - Expression negation (`!`).
  - Address of (`&`).
  - Dereference (`*`).
  - Post and pre increment (`++`).
  - Post and pre decrement (`--`).
- **Identifiers**: an identifier is the simplest type of expression. Identifiers are classified according to their location in memory and/or in registers, in the following way:
  - Global variable.
  - Local variable (negative offset from the stack frame).
  - Formal parameter (positive offset from the stack frame).
  - Constant.
  - Register.
  - Function (function name and actual argument list).

The algorithm of Figure 7-4 generates code for an expression that uses the above operator types, by walking the tree recursively.

```plaintext
procedure walkCondExp (e: expression)
/* Pre: e points to an expression tree (abstract syntax tree).
 * Post: the code for the expression tree pointed to by e is written. */
case (expressionType(e))
    Binary:     write ("(%s %s %s)", walkCondExp (lhs(e)), operatorType(e),
                       walkCondExp (rhs(e))).
    Unary:      write ("%s (%s)", operatorType(e), walkCondExp (exp(e))).
    Identifier: write ("%s", identifierName(e)).
end case
end procedure
```

Figure 7-4: Algorithm to Generate Code from an Expression Tree

The identifierName(e) function returns the name of the identifier in the identifier node e; this name is taken from the appropriate symbol table (i.e. global, local or argument). Whenever the identifier is a register, the register is uniquely named by generating a new local variable, the next one in the sequence of local variables. The new variable is placed at the end of the subroutine's local variable definitions.
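The register-naming rule can be sketched as follows, assuming registers are identified by small integers and that the subroutine already has `numLocals` stack locals. `LocalNames`, `regToLocal()` and `MAX_REGS` are hypothetical names. The first time a register is met it receives the next local number, and later references reuse that number.

```c
#include <assert.h>

#define MAX_REGS 16   /* hypothetical number of machine registers */

/* Per-subroutine naming state; regLocal[r] == 0 means register r has no
   local variable yet. */
typedef struct {
    int numLocals;            /* locals defined so far (loc1 .. locN) */
    int regLocal[MAX_REGS];
} LocalNames;

/* Returns n such that register reg is written as "loc<n>" in the C code,
   appending a new local after the existing ones on first use. */
static int regToLocal(LocalNames *ln, int reg)
{
    assert(reg >= 0 && reg < MAX_REGS);
    if (ln->regLocal[reg] == 0)
        ln->regLocal[reg] = ++ln->numLocals;
    return ln->regLocal[reg];
}
```

With loc1 to loc4 already defined on the stack frame, the first register met becomes loc5 and a second one loc6, which matches the rule of placing new variables at the end of the local variable definitions.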

Generating Code for call Instructions

The `call` instruction invokes a procedure with the list of actual arguments. This list is stored in the `arg` field and is a sequential list of expressions (i.e., arithmetic expressions and/or identifiers). The name of the procedure is displayed followed by the actual arguments, which are displayed using the tree walker algorithm of Figure 7-4.

Generating Code for ret Instructions

The `ret` instruction returns from a subroutine, optionally returning an expression or identifier (in the case of a function). If the return instruction does not take any arguments, the procedure simply finishes at that statement.

The complete algorithm to generate code for a basic block (excluding transfer instructions) is shown in Figure 7-5. In this algorithm the function `indent()` is used; this function returns one or more spaces depending on the indentation level (3 spaces per indentation level).

```plaintext
procedure writeBB (BB: basicBlock, indLevel: integer)
    /* Pre: BB is a basic block.
     * indLevel is the indentation level to be used in this basic block.
     * Post: the code for all instructions, except transfer instructions, is displayed. */
    for (all high-level instructions i of BB) do
        case (instType(i))
            asgn: write ("%s%s = %s;\n", indent(indLevel), walkCondExp (lhs(i)), walkCondExp (rhs(i))).
            call: fa = "".
                  for (all actual arguments f ∈ arg(i)) do
                      if (fa ≠ "") then
                          append (fa, ", ").
                      end if
                      append (fa, walkCondExp (f)).
                  end for
                  write ("%s%s (%s);\n", indent(indLevel), invokedProc(i), fa).
            ret:  if (exp(i) == nil) then          /* the returned expression is optional */
                      write ("%sreturn;\n", indent(indLevel)).
                  else
                      write ("%sreturn (%s);\n", indent(indLevel), walkCondExp (exp(i))).
                  end if
        end case
    end for
end procedure
```

Figure 7-5: Algorithm to Generate Code from a Basic Block
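The algorithm of Figure 7-5 can be sketched in C as below. The instruction record `Inst`, its fields, and the helpers `emit()` and `emitIndent()` are hypothetical, and expressions are assumed to be rendered to text already (for instance by the expression walker of Figure 7-4). The sketch separates actual arguments with commas, uses 3 spaces per indentation level, and writes a bare `return;` when `ret` carries no expression.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical high-level instruction; expressions are assumed to be
   rendered to C text already. */
typedef enum { ASGN, CALL, RET } InstType;

typedef struct {
    InstType type;
    const char *lhs;      /* asgn: destination */
    const char *rhs;      /* asgn: source; ret: returned expression or NULL */
    const char *proc;     /* call: name of the invoked procedure */
    const char *args[8];  /* call: actual arguments, NULL-terminated */
} Inst;

/* Appends s to the NUL-terminated string in buf (capacity cap). */
static void emit(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf), n = strlen(s);
    assert(len + n < cap);
    memcpy(buf + len, s, n + 1);
}

/* indent(): 3 spaces per indentation level. */
static void emitIndent(char *buf, size_t cap, int level)
{
    for (int k = 0; k < 3 * level; k++)
        emit(buf, cap, " ");
}

/* Appends the C code of the n instructions of a basic block to buf
   (transfer instructions are assumed to have been left out). */
static void writeBB(const Inst *bb, int n, int level, char *buf, size_t cap)
{
    for (int k = 0; k < n; k++) {
        const Inst *inst = &bb[k];
        emitIndent(buf, cap, level);
        switch (inst->type) {
        case ASGN:                                  /* lhs = rhs; */
            emit(buf, cap, inst->lhs);
            emit(buf, cap, " = ");
            emit(buf, cap, inst->rhs);
            break;
        case CALL:                                  /* proc (a1, a2, ...); */
            emit(buf, cap, inst->proc);
            emit(buf, cap, " (");
            for (int a = 0; a < 8 && inst->args[a] != NULL; a++) {
                if (a > 0)
                    emit(buf, cap, ", ");           /* separator only between arguments */
                emit(buf, cap, inst->args[a]);
            }
            emit(buf, cap, ")");
            break;
        case RET:                                   /* return; or return (e); */
            emit(buf, cap, "return");
            if (inst->rhs != NULL) {
                emit(buf, cap, " (");
                emit(buf, cap, inst->rhs);
                emit(buf, cap, ")");
            }
            break;
        }
        emit(buf, cap, ";\n");
    }
}
```

The output buffer must start as an empty string, since `writeBB()` appends to it.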

7.1.2 Generating Code from Control Flow Graphs

The information collected during control flow analysis of the graph is used in code generation to determine the order in which code should be generated for the graph. Consider the graph in Figure 7-6 with structuring information. This graph is the same as the graph of Figure 7-2 without the intermediate instruction information; its nodes are numbered in reverse postorder.

The generation of code from a graph can be viewed as the problem of generating code for the root node, recursing on the successor nodes that belong to the structure rooted at the root node (if any), and continuing code generation with the follow node of the structure. Recall from Chapter 6 that the follow node is the first node that is reached from a structure (i.e. the first node that is executed once the structure is finished). Follow nodes for loops, 2-way and n-way conditionals are calculated during the control flow analysis phase. Other transfer of control nodes (i.e. 1-way, fall-through, call) transfer control to their unique successor node, hence the follow is that successor. Termination nodes (i.e. return) are leaves of the underlying depth-first search tree of the graph, and hence terminate the generation of code along that path.

This section describes the component algorithms of the algorithm to generate code for a procedure, `writeCode()`. To make the explanation easier, we will assume that this routine exists; therefore, we concentrate only on the generation of code for a particular structure and let the `writeCode()` routine generate code for the components of the structure. After enough algorithms have been explained, the algorithm for `writeCode()` is given.

**Generating Code for Loops**

Given a subgraph rooted at a loop header node, code for the loop is generated based on the type of loop. Regardless of its type, every loop has the same structure: loop header, loop body, and loop trailer. The loop header and trailer are generated according to the type of loop, and the loop body is generated by generating code for the subgraph rooted at the first node of the loop body. Consider the loops in the graph of Figure 7-6: the loop rooted at node 6 is a pre-tested loop, and the loop rooted at node 8 is a post-tested loop.
In the case of the pre-tested loop, when the loop condition is True (i.e. the jcond Boolean conditional in node 6), the loop body is executed. If the branch into the loop was the False branch, the loop condition has to be negated since the loop is executed when the condition is False. The loop body is generated by the writeCode() routine, and the loop trailer consists only of an end of loop bracket (in C). Once this code has been generated, code for the loop follow node is generated by invoking the writeCode() routine. The following skeleton is used:

```c
write ("%s while (loc1 < 10) {\n", indent(indLevel))
writeCode (7, indLevel + 1, 10, ifFollow, nFollow)
write ("%s }\n", indent(indLevel))
writeCode (11, indLevel, latchNode, ifFollow, nFollow)
```

where the first instruction generates code for the loop header, the second generates code for the loop body (rooted at node 7, with latching node 10), the third generates code for the loop trailer, and the fourth generates code for the rest of the graph, rooted at node 11.

In the post-tested loop, the loop condition is true when the branch is made to the loop header node. The following skeleton is used:

```c
write ("%s do {\n", indent(indLevel))
writeBB (8, indLevel + 1)
writeCode (9, indLevel + 1, 9, ifFollow, nFollow)
write ("%s } while (loc2 < 5); \n", indent(indLevel))
writeCode (10, indLevel, latchNode, ifFollow, nFollow)
```

where the first instruction generates code for the loop header, the second instruction generates code for the instruction in the root node, the third instruction generates code for the loop body rooted at node 9 and ended at the loop latching node 9, the fourth instruction generates the loop trailer, and the fifth instruction generates code for the remainder of the graph rooted at node 10. Code is generated in a similar way for endless loops, with the distinction that there may or may not be a loop follow node.

Normally, pre-tested loop header nodes have only one instruction associated with them. In languages that allow several logical instructions to be coded in one physical instruction, such as C, all these instructions end up in the header node, but not all of them form part of the loop condition. For example, in the following C loop:

```c
while ((a += 11) > 50)
{
    printf ("greater than 50\n");
    a = a - b;
}
```

the while() statement has two purposes: to add 11 to variable a, and to check that after this assignment a is greater than 50. Since our choice of intermediate code allows for only one instruction to be stored in an intermediate instruction, the assignment and the comparison form part of two different instructions, as shown in the following intermediate code:
```plaintext
B3:
    asgn a, a + 11
    jcond (a <= 50) B5
B4:
    call printf ("greater than 50\n")
    asgn a, a - b
    jmp B3
B5:
    /* other code */
```

Two solutions are considered for this case: preserve the while() loop structure by repeating the extra instructions in the header basic block at the end of the loop, or transform the while() loop into an endless for (;; ) loop that breaks out of the loop whenever the Boolean condition associated with the while() is False. In our example, the former case leads to the following code in C:

```c
a = a + 11;
while (a > 50) {
    printf ("greater than 50\n");
    a = a - b;
    a = a + 11;
}
```

and the latter case leads to the following C code:

```c
for (;; ) {
    a = a + 11;
    if (a <= 50)
        break;
    printf ("greater than 50\n");
    a = a - b;
}
```

Either approach generates correct code for the graph; the former method replicates code (normally a few instructions, if any) and preserves the while() structure, the latter method does not replicate code but modifies the structure of the original loop. In this thesis the former method is used in preference to the latter, since this solution provides code that is easier to understand than the latter solution.
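The replication strategy chosen above can be sketched in C as a plain text transformation. The function `writeReplicatedWhile()` and its parameters are hypothetical. It assumes that the header block's extra instructions, the loop condition (already in the sense under which the body executes) and the body statements have been generated as strings, and it uses a fixed four-space indentation. Applied to the example above, it reproduces the first of the two C versions.

```c
#include <assert.h>
#include <string.h>

/* Appends s to the NUL-terminated string in buf (capacity cap). */
static void append(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf), n = strlen(s);
    assert(len + n < cap);
    memcpy(buf + len, s, n + 1);
}

/* Emits a pre-tested loop whose header block holds extra instructions
   (hdr[0..nHdr)) besides the loop test: they are written once before the
   loop and repeated at the end of the body, preserving the while(). */
static void writeReplicatedWhile(const char *hdr[], int nHdr, const char *cond,
                                 const char *body[], int nBody,
                                 char *buf, size_t cap)
{
    assert(cap > 0);
    buf[0] = '\0';
    for (int k = 0; k < nHdr; k++) {      /* header instructions, first time */
        append(buf, cap, hdr[k]);
        append(buf, cap, "\n");
    }
    append(buf, cap, "while (");
    append(buf, cap, cond);
    append(buf, cap, ") {\n");
    for (int k = 0; k < nBody; k++) {     /* loop body */
        append(buf, cap, "    ");
        append(buf, cap, body[k]);
        append(buf, cap, "\n");
    }
    for (int k = 0; k < nHdr; k++) {      /* replicated header instructions */
        append(buf, cap, "    ");
        append(buf, cap, hdr[k]);
        append(buf, cap, "\n");
    }
    append(buf, cap, "}\n");
}
```

The replicated instructions are normally few, which is why the extra code size is considered an acceptable price for keeping the while() structure.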

When generating code for the loop body or the loop follow node, if the target node has already been traversed by the code generator, it means that the node has already been reached along another path, therefore, a goto label needs to be generated to transfer control to the target code. The algorithm in Figure 7-7 generates code for a graph rooted at a loop header node. This algorithm generates code in C, and assumes the existence of the function invExp() which returns the inverse of an expression (i.e. negates the expression), and the procedure emitGotoLabel() which generates a unique label, generates a goto to that label, and places the label at the appropriate position in the final C code.
```plaintext
procedure writeLoop (BB: basicBlock; i, latchNode, ifFollow, nFollow: Integer)
/* Pre: BB is a pointer to the header basic block of a loop.
 * i is the indentation level used for this basic block.
 * latchNode is the number of the latching node of the enclosing loop (if any).
 * ifFollow is the number of the follow node of the enclosing if structure (if any).
 * nFollow is the number of the follow node of the enclosing n-way structure (if any).
 * Post: code for the graph rooted at BB is generated. */

traversedNode(BB) = True.
case (loopType(BB))                              /* Write loop header */
    Pre_Tested:
        writeBB (BB, i).
        if (succ(BB, Else) == loopFollow(BB)) then
            write ("%swhile (%s) {\n", indent(i), walkCondExp (loopExp(BB))).
        else
            write ("%swhile (%s) {\n", indent(i), walkCondExp (invExp(loopExp(BB)))).
        end if
    Post_Tested:
        write ("%sdo {\n", indent(i)).
        writeBB (BB, i+1).
    Endless:
        write ("%sfor (;;) {\n", indent(i)).
        writeBB (BB, i+1).
end case
if ((nodeType(BB) == Return) ∨ (revPostorder(BB) == latchNode)) then return.
if (latchNode(BB) ≠ revPostorder(BB)) then       /* Loop is several basic blocks */
    for (all successors s of BB) do
        if ((loopType(BB) ≠ Pre_Tested) ∨ (s ≠ loopFollow(BB))) then
            if (traversedNode(s) == False) then
                writeCode (s, i+1, latchNode(BB), ifFollow, nFollow).
            else                                 /* has been traversed */
                emitGotoLabel (firstInst(s)).
            end if
        end if
    end for
end if
case (loopType(BB))                              /* Write loop trailer */
    Pre_Tested:
        writeBB (BB, i+1).                       /* repeat the header's extra instructions */
        write ("%s}\n", indent(i)).
    Post_Tested:
        write ("%s} while (%s);\n", indent(i), walkCondExp (loopExp(BB))).
    Endless:
        write ("%s}\n", indent(i)).
end case
if (traversedNode(loopFollow(BB)) == False) then /* Continue with follow */
    writeCode (loopFollow(BB), i, latchNode, ifFollow, nFollow).
else
    emitGotoLabel (firstInst(loopFollow(BB))).
end if
end procedure
```

Figure 7-7: Algorithm to Generate Code for a Loop Header Rooted Graph
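Figure 7-7 only assumes that invExp() exists. One simple way to realise it is sketched below, assuming the condition is a single comparison already split into its operands (the signatures of `invRelOp()` and `invExp()` here are hypothetical): relational operators are flipped, and anything else falls back to a logical not. Applied to `jcond (a <= 50) B5` it yields the `a > 50` test of the while() loop shown earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the relational operator that negates op, or NULL if op is not
   relational.  Only valid for integer operands (with floating-point NaNs,
   !(x < y) is not the same as x >= y). */
static const char *invRelOp(const char *op)
{
    static const char *const pairs[][2] = {
        { "<", ">=" }, { ">=", "<" }, { ">", "<=" },
        { "<=", ">" }, { "==", "!=" }, { "!=", "==" },
    };
    for (size_t k = 0; k < sizeof pairs / sizeof pairs[0]; k++)
        if (strcmp(op, pairs[k][0]) == 0)
            return pairs[k][1];
    return NULL;
}

/* Writes the negation of the condition "lhs op rhs" into buf. */
static void invExp(const char *lhs, const char *op, const char *rhs,
                   char *buf, size_t cap)
{
    const char *inv = invRelOp(op);
    assert(buf != NULL && cap > 0);
    if (inv != NULL)
        snprintf(buf, cap, "(%s %s %s)", lhs, inv, rhs);   /* flip the comparison */
    else
        snprintf(buf, cap, "!(%s %s %s)", lhs, op, rhs);   /* generic negation */
}
```

Flipping the operator keeps the generated code free of `!` in the common case, which tends to read more naturally than wrapping every negated condition.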
Generating Code for 2-way Rooted Graphs

Given a graph rooted at a 2-way node that does not form part of a loop conditional expression, code for this graph is generated by determining whether the node is the header of an if..then or an if..then..else conditional. In the former case, code is generated for the condition of the if, followed by the code for the then clause, and finalized with the code for the if follow subgraph. In the latter case, code is generated for the if condition, followed by the then and else clauses, and finalized with the code for the follow node. Consider the three 2-way nodes in Figure 7-6 which do not form part of loop expressions: nodes 1, 2 and 11.

Node 1 is the root of an if..then structure since the follow node (node 5) is one of the immediate successors of node 1. The other immediate successor, node 2, is the body of the then clause, which is reached when the condition in node 1 is False; i.e. the condition needs to be negated, as in the following code:

```c
write("%s if (loc3 < loc4) {\n", indent(indLevel))
writeCode (2, indLevel+1, latchNode, 5, nFollow)
write("%s }\n", indent(indLevel))
writeCode (5, indLevel, latchNode, ifFollow, nFollow)
```

where the first instruction generates code for the negated condition of the if, the second instruction generates code for the then clause subgraph which is rooted at node 2 and has 5 as a follow node, the third instruction generates the trailer of the if, and the last instruction generates code for the follow subgraph rooted at node 5.

Node 2 is the root of an if..then..else structure. In this case, neither immediate successor of the header node is the follow node. The True branch is reached when the condition is True, and the False branch when the condition is False, leading to the following code:

```c
write("\%s if ((loc3 * 4) <= loc4) {\n", indent(indLevel))
writeCode (3, indLevel+1, latchNode, 5, nFollow)
write("\%s }\n else {\n", indent(indLevel))
writeCode (4, indLevel+1, latchNode, 5, nFollow)
write("\%s }\n", indent(indLevel))
```

where the first instruction generates code for the if condition, the second generates code for the then clause, the third generates the else, the fourth generates code for the else clause, and the last generates the if trailer. Code for the follow node is not generated in this case, because this conditional is nested in another conditional that also takes node 5 as its follow node. This is easily checked with the ifFollow parameter, which holds the follow node of the enclosing if: when it is the same node, code for the follow is left to the enclosing conditional.

In a similar way, code is generated for the subgraph rooted at node 11. In this case, the True branch leads to the follow node; hence, the Boolean condition associated with this if has to be negated, and the False branch becomes the then clause. The following skeletal code is used:

As with loops, goto jumps are generated when nodes of the graph have already been visited before the current subgraph reaches them: whenever a branch of a 2-way node has already been visited, a goto to that branch is generated. Also, when a 2-way rooted subgraph has no follow node, its two branches do not lead to a common node, because each branch ends (i.e. a return node is reached) before they meet. In this case, code is generated for both branches, and the end of each path ends the recursion. The algorithm in Figure 7-8 generates code for a graph rooted at a 2-way node that does not form part of a loop Boolean expression.
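The choice between the conditional forms depends only on which successor, if any, is the follow node. A minimal C sketch of that decision follows; the enumeration names, `classify2way()` and the `NO_FOLLOW` marker are hypothetical. With the node numbers of Figure 7-6, node 1 (True branch to the follow node 5, False branch to node 2) needs a negated if..then, while node 2 (successors 3 and 4, follow 5) is an if..then..else.

```c
#include <assert.h>

#define NO_FOLLOW (-1)   /* hypothetical marker for "no follow node" */

typedef enum {
    IF_THEN,         /* then clause on the True branch, follow on the False branch */
    IF_NOT_THEN,     /* follow on the True branch: negate, False branch is the then clause */
    IF_THEN_ELSE,    /* neither successor is the follow node */
    IF_NO_FOLLOW     /* both branches end (e.g. in a return) before meeting */
} TwoWayShape;

/* Decides which C conditional to emit for a 2-way node, given the numbers
   of its True and False successors and of its follow node. */
static TwoWayShape classify2way(int trueSucc, int falseSucc, int follow)
{
    assert(trueSucc != falseSucc);      /* a genuine 2-way node */
    if (follow == NO_FOLLOW)
        return IF_NO_FOLLOW;
    if (falseSucc == follow)
        return IF_THEN;
    if (trueSucc == follow)
        return IF_NOT_THEN;
    return IF_THEN_ELSE;
}
```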

Generating Code for n-way Rooted Graphs

Given a graph rooted at an n-way node, code for this graph is generated in the following way: the n-way header code is emitted (a switch() is used in C), and for each successor of the header node the n-way option is emitted (a case is used in C), followed by the generation of code of the subgraph rooted at that successor and ended at the n-way follow node. Once the code for all successors has been generated, the n-way trailer is generated, and code is generated for the rest of the graph by generating code for the graph rooted at the follow node of the n-way header node. Whenever generating code for one of the branches or the follow node of the n-way structure, if the target node has already been traversed, a goto jump is generated to transfer control to the code associated with that node.

The algorithm in Figure 7-9 generates code for a graph rooted at an n-way node.
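A minimal C sketch of the emitted switch() follows, assuming the code of each successor's subgraph has already been generated as text. The `Arm` structure and `writeSwitch()` are hypothetical, and indentation is omitted for brevity. A successor that was already traversed gets a goto to its existing code instead of a copy of it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one arm of an n-way node. */
typedef struct {
    int value;            /* case value selecting this successor */
    const char *code;     /* code generated for the successor's subgraph,
                             or NULL if the successor was traversed before */
    const char *label;    /* label of that earlier code (used when code == NULL) */
} Arm;

/* Appends s to the NUL-terminated string in buf (capacity cap). */
static void append(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf), n = strlen(s);
    assert(len + n < cap);
    memcpy(buf + len, s, n + 1);
}

/* Emits the switch header, one case per successor (its code and a break,
   or a goto to already generated code), and the switch trailer. */
static void writeSwitch(const char *exp, const Arm arms[], int n,
                        char *buf, size_t cap)
{
    char line[64];
    assert(cap > 0);
    buf[0] = '\0';
    append(buf, cap, "switch (");
    append(buf, cap, exp);
    append(buf, cap, ") {\n");
    for (int k = 0; k < n; k++) {
        snprintf(line, sizeof line, "case %d:\n", arms[k].value);
        append(buf, cap, line);
        if (arms[k].code != NULL) {
            append(buf, cap, arms[k].code);
            append(buf, cap, "\nbreak;\n");
        } else {
            append(buf, cap, "goto ");
            append(buf, cap, arms[k].label);
            append(buf, cap, ";\n");
        }
    }
    append(buf, cap, "}\n");
}
```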

Generating Code for 1-way, Fall, and Call Rooted Graphs

Given a graph rooted at a 1-way, fall-through, or call node, the code for the basic block is generated, followed by the code for the unique successor of the node. Even though call nodes have two successors, one of the successor edges points to the subroutine invoked by the call instruction; since code is generated one subroutine at a time, this branch is disregarded for code generation purposes, and the node is treated as having a unique successor.

The algorithm in Figure 7-10 generates code for nodes that have a unique successor node. If code has already been generated for the unique follow node, it means that the graph was reached along another path and hence a goto jump is generated to transfer control to the code associated with that subgraph.
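The goto jumps used throughout these algorithms rely on emitGotoLabel(). A minimal C sketch of its labelling part follows; the `Labels` table, `MAX_BB` and this signature (a basic block number instead of its first instruction) are hypothetical. A block receives a unique label the first time it is the target of a goto, and later gotos reuse that label. Placing `L<n>:` in front of the code already generated for the block is assumed to happen in a later pass over the buffered output.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_BB 64   /* hypothetical bound on basic blocks per subroutine */

/* Label table for one subroutine: label[b] == 0 means block b has no label. */
typedef struct {
    int nextLabel;
    int label[MAX_BB];
} Labels;

/* Gives block b a unique label on first use, reuses it afterwards, and
   writes "goto L<n>;" into buf.  Returns the label number. */
static int emitGotoLabel(Labels *lt, int b, char *buf, size_t cap)
{
    assert(b >= 0 && b < MAX_BB);
    if (lt->label[b] == 0)
        lt->label[b] = ++lt->nextLabel;
    snprintf(buf, cap, "goto L%d;", lt->label[b]);
    return lt->label[b];
}
```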

A Complete Algorithm

The final algorithm to generate C code from a subroutine's graph is shown in Figure 7-11. The writeCode() procedure takes as arguments a pointer to a basic block, the indentation level to be used, the latching node of an enclosing loop (if any), and the follow nodes of enclosing 2-way and n-way conditionals (if any). Initially, the basic block pointer points to the start of the subroutine's graph, the indentation level is 1, and there are no latching or follow nodes to check upon (these values are set to a predetermined value). Whenever a follow node is met, no more code is generated along that path, and the procedure returns to the invoking procedure, which handles the code generation of the follow node. This is done so that the trailer of a conditional is generated before the code that follows the conditional. In the case of loops, the latching node is the last node for which code is generated along a path; after it, the recursion ends and the invoking procedure generates the loop trailer and continues with the follow node of the loop.

```plaintext
procedure write2way (BB: basicBlock; i, latchNode, ifFollow, nFollow: Integer)
/* Pre: BB is a 2-way basic block.
 * i is the indentation level.
 * latchNode is the latching node of the enclosing loop (if any).
 * ifFollow is the follow node of the enclosing 2-way structure (if any).
 * nFollow is the number of the follow node of the enclosing n-way structure (if any).
 * Post: the code for the tree rooted at BB is generated. */
if (ifFollow(BB) ≠ MAX) then
    emptyThen = False.
    if (traversedNode(succ(BB,Then)) == False) then       /* Process then clause */
        if (succ(BB,Then) ≠ ifFollow(BB)) then
            write ("%sif (%s) {\n", indent(i), walkCondExp (ifExp(BB))).
            writeCode (succ(BB,Then), i+1, latchNode, ifFollow(BB), nFollow).
        else              /* empty then clause; negate condition, else becomes then */
            write ("%sif (%s) {\n", indent(i), walkCondExp (invExp(ifExp(BB)))).
            writeCode (succ(BB,Else), i+1, latchNode, ifFollow(BB), nFollow).
            emptyThen = True.
        end if
    else
        write ("%sif (%s) {\n", indent(i), walkCondExp (ifExp(BB))).
        emitGotoLabel (firstInst(succ(BB,Then))).
    end if
    if (traversedNode(succ(BB,Else)) == False) then       /* Process else clause */
        if ((emptyThen == False) ∧ (succ(BB,Else) ≠ ifFollow(BB))) then
            write ("%s}\n%selse {\n", indent(i), indent(i)).
            writeCode (succ(BB,Else), i+1, latchNode, ifFollow(BB), nFollow).
        end if
    else if (emptyThen == False) then
        write ("%s}\n%selse {\n", indent(i), indent(i)).
        emitGotoLabel (firstInst(succ(BB,Else))).
    end if
    write ("%s}\n", indent(i)).                           /* if trailer */
    if (traversedNode(ifFollow(BB)) == False) then
        writeCode (ifFollow(BB), i, latchNode, ifFollow, nFollow).
    end if
else                                      /* No follow, emit if..then..else */
    write ("%sif (%s) {\n", indent(i), walkCondExp (ifExp(BB))).
    writeCode (succ(BB,Then), i+1, latchNode, ifFollow, nFollow).
    write ("%s}\n%selse {\n", indent(i), indent(i)).
    writeCode (succ(BB,Else), i+1, latchNode, ifFollow, nFollow).
    write ("%s}\n", indent(i)).
end if
end procedure
```

Figure 7-8: Algorithm to Generate Code for a 2-way Rooted Graph

```plaintext
procedure writeNway (BB: basicBlock; i, latchNode, ifFollow, nFollow: Integer)
/* Pre: BB is an n-way basic block.
 * i is the indentation level.
 * latchNode is the number of the enclosing loop latching node (if any).
 * ifFollow is the number of the enclosing if terminating node (if any).
 * nFollow is the number of the enclosing n-way terminating node (if any).
 * Post: code is generated for the graph rooted at BB. */

write ("%sswitch (%s) {\n", indent(i), walkCondExp (nwayExp(BB))).
for (all successors s of BB) do                   /* Generate code for each branch */
    write ("%scase %s:\n", indent(i+1), index(s)).
    if (traversedNode(s) == False) then
        writeCode (s, i+2, latchNode, ifFollow, nwayFollow(BB)).
        write ("%sbreak;\n", indent(i+2)).
    else
        emitGotoLabel (firstInst(s)).
    end if
end for
write ("%s}\n", indent(i)).                       /* n-way trailer */
if (traversedNode(nwayFollow(BB)) == False) then  /* Generate code for the follow node */
    writeCode (nwayFollow(BB), i, latchNode, ifFollow, nFollow).
else
    emitGotoLabel (firstInst(nwayFollow(BB))).
end if
end procedure
```

Figure 7-9: Algorithm to Generate Code for an n-way Rooted Graph

```plaintext
procedure write1way (BB: basicBlock; i, latchNode, ifFollow, nFollow: Integer)
/* Pre: BB is a pointer to a 1-way, call, or fall-through basic block.
 * i is the indentation level used for this basic block.
 * latchNode is the number of the latching node of the enclosing loop (if any).
 * ifFollow is the number of the follow node of the enclosing 2-way structure (if any).
 * nFollow is the number of the follow node of the enclosing n-way structure (if any).
 * Post: code for the graph rooted at BB is generated. */
writeBB (BB, i).
if (traversedNode(succ(BB,1)) == False) then
    writeCode (succ(BB,1), i, latchNode, ifFollow, nFollow).
else
    emitGotoLabel (firstInst(succ(BB,1))).
end if
end procedure
```

Figure 7-10: Algorithm to Generate Code for 1-way, Call, and Fall Rooted Graphs

```plaintext
procedure writeCode (BB: basicBlock; i, latchNode, ifFollow, nFollow: Integer)
/* Pre: BB is a pointer to a basic block. Initially it points to the head of the graph.
 * i is the indentation level used for this basic block.
 * latchNode is the number of the latching node of the enclosing loop (if any).
 * ifFollow is the number of the follow node of the enclosing 2-way structure (if any).
 * nFollow is the number of the follow node of the enclosing n-way structure (if any).
 * Post: code for the graph rooted at BB is generated. */
if ((revPostorder(BB) == ifFollow) ∨ (revPostorder(BB) == nFollow) ∨
    (traversedNode(BB) == True)) then
    return.
end if
traversedNode(BB) = True.
if (isLoopHeader(BB)) then                 /* ... for loops */
    writeLoop (BB, i, latchNode, ifFollow, nFollow).
else                                       /* ... for other nodes */
    case (nodeType(BB))
        2-way:   write2way (BB, i, latchNode, ifFollow, nFollow).
        n-way:   writeNway (BB, i, latchNode, ifFollow, nFollow).
        default: write1way (BB, i, latchNode, ifFollow, nFollow).
    end case
end if
end procedure
```

Figure 7-11: Algorithm to Generate Code from a Control Flow Graph

The order in which code is generated for the procedures is determined by the call graph of the program. Code is generated first for the invoked (nested) procedures and then for the procedures that invoke them; hence, a depth-first search ordering of the call graph is followed, marking each subroutine graph as traversed once it has been considered for code generation. The C code for each subroutine is written by generating code for the header of the subroutine, followed by the local variable definitions and the body of the subroutine. The algorithm in Figure 7-12 shows the ordering used and the generation of code for each subroutine. The isLib() function is used in this algorithm to determine whether a subroutine is a library routine; code is not generated for library routines that were detected by the signature method of Chapter 8. The writeComments() procedure writes information collected during the analysis of the subroutine, such as the type of arguments that were used (stack arguments or register arguments), whether a high-level prologue was detected in the subroutine, the number of arguments the subroutine takes, and whether the subroutine generates an irreducible graph, among others.

```plaintext
procedure writeProc (p: procedure)
/* Pre: p is a procedure pointer; initially the start node of the call graph.
* Post: C code is written for the program rooted at p in a depth-first fashion. */

if (traversedProc(p) ∨ isLib(p)) then
  return.
end if
traversedProc(p) = True.
for (all successors s ∈ succ(p)) do /* Dfs on Successors */
  writeProc (s).
end for

/* Generate code for this procedure */
if (isFunction(p)) then /* Generate Subroutine Header */
  write ("%s %s (%s) \n {", returnType(p), funcName(p), formalArgList(p)).
else
  write ("void %s (%s) \n {", procName(p), formalArgList(p)).
end if
writeComments(p). /* Generate Subroutine Comments */
for (all local variables v ∈ localStkFrame(p)) do /* Local Variable Definitions */
  write ("%s %s;\n", varType(v), genUniqueName(v)).
end for
if (isHighLevel(p)) then /* Generate Code for Subroutine */
  writeCode (controlFlowGraph(p), 1, Max, Max, Max).
else /* low-level subroutine, generate assembler */
  disassemble(p).
end if
write ("} \n").
end procedure
```

Figure 7-12: Algorithm to Generate Code from a Call Graph
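The ordering of Figure 7-12 amounts to a post-order depth-first traversal of the call graph. The minimal C sketch below assumes the call graph is stored as a hypothetical adjacency matrix (`CallGraph`) with a per-procedure library flag; `procOrder()` records the order in which subroutine code would be written instead of writing it.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 16   /* hypothetical bound on subroutines */

typedef struct {
    int n;                              /* number of subroutines */
    bool calls[MAX_PROCS][MAX_PROCS];   /* calls[p][q]: p invokes q */
    bool isLib[MAX_PROCS];              /* recognised as a library routine */
} CallGraph;

/* Depth-first walk from p: library routines and already traversed
   subroutines are skipped, callees are handled before their caller, and
   each subroutine is appended to order[] when its code would be written.
   Returns the updated number of entries in order[]. */
static int procOrder(const CallGraph *cg, int p, bool traversed[],
                     int order[], int count)
{
    assert(p >= 0 && p < cg->n);
    if (traversed[p] || cg->isLib[p])
        return count;
    traversed[p] = true;                /* mark before recursing (handles recursion) */
    for (int s = 0; s < cg->n; s++)
        if (cg->calls[p][s])
            count = procOrder(cg, s, traversed, order, count);
    assert(count < MAX_PROCS);
    order[count++] = p;                 /* callees first, then the caller */
    return count;
}
```

For a main (0) that calls A (1), B (2) and the library routine printf (3), where A also calls B and printf, the resulting order is B, A, main, and printf is skipped.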

Using the algorithms described in this section, the C code in Figure 7-13 is generated for the graph of Figure 7-2. Local variables are uniquely named in a sequential order starting from one, and making use of the prefix loc (for local).
```c
void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    int loc1;
    int loc2;
    int loc3;
    int loc4;

    loc3 = 5;
    loc4 = (loc3 * 5);
    if (loc3 < loc4) {
        loc3 = (loc3 * loc4);
        if ((loc3 << 2) > loc4) {
            loc3 = (loc3 << 3);
        }
        else {
            loc4 = (loc4 << 3);
        }
    }

    loc1 = 0;
    while ((loc1 < 10)) {
        loc2 = loc1;
        do {
            loc2 = (loc2 + 1);
            printf("i = %d, j = %d\n", loc1, loc2);
        } while ((loc2 < 5));
        loc1 = (loc1 + 1);
    }

    if ((loc3 >= loc4) && ((loc4 << 1) <= loc3)) {
        loc3 = ((loc3 + loc4) - 10);
        loc4 = (loc4 / 2);
    }
    printf("a = %d, b = %d\n", loc3, loc4);
}
```

Figure 7-13: Final Code for the Graph of Figure 7-2

7.1.3 The Case of Irreducible Graphs

As pointed out in Chapter 6, Section 6.6.4, loops of irreducible graphs are not structured as natural loops since the nodes of the loop do not form part of a complete interval. Consider the canonical irreducible flow graph of Figure 7-14 with structuring information.
During code generation, code for the `if` in node 1 is generated first. Since the True branch leads to the follow node of the `if` (node 3), the False branch of node 1 is the `then` clause, and the Boolean condition associated with node 1 is negated. Code for the `then` clause is generated, and the `if..then` is followed by the code generated for node 3. Since node 3 transfers control to node 2, which has already been visited during code generation, a `goto` jump is generated to the associated code of node 2. This `goto` simulates the loop and provides an unstructured `if..then` structure by transferring control to the `then` clause of the `if`. The skeletal code for this graph in C is as follows:

```c
if (! 1) {
L:  2;
}
3;
goto L;
```

where the numbers represent the basic blocks and code for each basic block is generated by the `writeBB()` procedure.
Chapter 8

Decompilation Tools

The decompiler tools are a series of programs that help the decompiler generate target high-level language programs. Given a binary file, the loader determines where the binary image starts, and whether there is any relocation information for the file. Once the binary image has been loaded into memory (and possibly relocated), a disassembler can be used to parse the binary image and produce an assembler form of the program. The parsing process can benefit from the use of a compiler and library signature recognizer, which determines whether a subroutine is a library subroutine or not, according to a set of predefined signatures generated by another program. In this way, only the original code written by the user is disassembled and decompiled. The disassembler can be considered part of the decompiler, as it parses only the binary image (i.e. it is a phase of the front-end module). Once the program has been parsed, it can be decompiled using the methods of Chapters 4, 5, 6, and 7, generating the target high-level program. Finally, a postprocessor program can improve the quality of the high-level code. Figure 8-1 shows the different stages involved in a decompilation system.


Figure 8-1: Decompilation System
8.1 The Loader

The loader is an operating system program that loads an executable or binary program into memory if there is sufficient free memory for the program to be loaded and run. Most binary programs contain information on the amount of memory that is required to run the program, relocation addresses, and initial segment register values. Once the program is loaded into memory, the loader transfers control to the binary program by setting up the code and instruction segments.

The structure of binary programs differs from one operating system to another; therefore, the loading of a program depends on the operating system and the machine the binary program runs on. The simplest form of binary program contains only the binary image of the program, that is, a fully linked image of the program that is loaded into memory as is, without any changes made to the binary image. The .com files under the DOS operating system use this binary structure. Most binary programs contain not only the binary image, but also header information, used to determine the type of binary program (i.e. there can be different types of executable programs for the same operating system, or for different operating systems that run on the same machine) and the initial register values, and a relocation table that holds word offsets from the start of the binary image which need to be relocated according to the address where the program is loaded into memory. This type of binary file is used in the .exe files under DOS and Windows. The general format of a binary program is shown in Figure 8-2.

Figure 8-2: General Format of a Binary Program

The algorithm used to load a program into memory is as follows. First, the type of binary file is determined (on systems that allow for different types of binary files). If the file is a binary image on its own, the size of memory to be allocated is the size of the file: a block of memory of that size is allocated, the file is loaded into the block as is, without any modifications, and the default segment registers are set. In the case of binary files with header and relocation table information, the header is read to determine how much memory is needed to load the program, where the relocation table is, and other information needed to set up registers. A memory block of the size given in the header is allocated, the binary image of the file is loaded into the memory block, the elements in the relocation table are relocated in memory, and the segment registers are set up according to the information in the header. The algorithm is shown in Figure 8-3.
```plaintext
procedure loader (name: fileName)
/* Pre: name is the name of a binary file.
 * Post: the binary program name has been loaded into memory. */

determine type of binary program.
if (only binary image) then
    S = size of the binary file.
    allocate free block of memory of size S.
    load file into allocated memory block, at a predefined offset.
    set up default segment registers.
else
    read header information.
    S = size of the binary image (from the header information).
    allocate free block of memory of size S.
    load binary image into allocated memory block.
    relocate all items from the relocation table in memory.
    set up segment registers with information from the header.
end if
end procedure
```

Figure 8-3: Loader Algorithm
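The loader algorithm of Figure 8-3 can be sketched in C for the two DOS formats described above. The sketch assumes the whole file has already been read into memory. It treats a file that starts with the bytes `MZ` as an .exe file with a header, using the standard MZ header fields (bytes in last page at 02h, pages at 04h, relocation count at 06h, header size in paragraphs at 08h, initial IP at 14h, initial CS at 16h, relocation table offset at 18h), and anything else as a .com image placed 0100h bytes into the block. The names `load_binary` and `LoadedImage` are hypothetical, and the stack registers (SS:SP) are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *image;            /* allocated memory block */
    size_t   size;             /* size of the block in bytes */
    int      isCom;            /* 1: binary image only, 0: header + image */
    uint16_t initCS, initIP;   /* initial CS:IP after loading */
} LoadedImage;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Loads file[0..len) as if its image starts at paragraph loadSeg
 * (for a .com file, loadSeg is the segment of the 0100h-byte PSP gap).
 * Returns 0 on success, -1 on a malformed file or allocation failure. */
int load_binary(const uint8_t *file, size_t len, uint16_t loadSeg, LoadedImage *out)
{
    if (len >= 0x1C && file[0] == 'M' && file[1] == 'Z') {
        /* Header present: sizes and relocation table come from it. */
        size_t pages = rd16(file + 0x04), lastPage = rd16(file + 0x02);
        if (pages == 0 || lastPage > 512) return -1;
        size_t fileSize = pages * 512 - (lastPage ? 512 - lastPage : 0);
        size_t hdrSize = (size_t)rd16(file + 0x08) * 16;
        size_t nReloc = rd16(file + 0x06), tbl = rd16(file + 0x18);
        if (fileSize > len || hdrSize > fileSize || tbl + 4 * nReloc > len)
            return -1;
        out->size = fileSize - hdrSize;
        out->image = malloc(out->size ? out->size : 1);
        if (out->image == NULL) return -1;
        memcpy(out->image, file + hdrSize, out->size);
        for (size_t i = 0; i < nReloc; i++) {
            const uint8_t *e = file + tbl + 4 * i;        /* offset, segment */
            size_t at = (size_t)rd16(e + 2) * 16 + rd16(e);
            if (at + 2 > out->size) { free(out->image); return -1; }
            uint16_t w = (uint16_t)(rd16(out->image + at) + loadSeg);
            out->image[at] = (uint8_t)(w & 0xFF);
            out->image[at + 1] = (uint8_t)(w >> 8);
        }
        out->isCom = 0;
        out->initCS = (uint16_t)(rd16(file + 0x16) + loadSeg);
        out->initIP = rd16(file + 0x14);
    } else {
        /* Binary image only: copied as is after a 0100h-byte gap (the PSP). */
        out->size = len + 0x100;
        out->image = calloc(out->size, 1);
        if (out->image == NULL) return -1;
        memcpy(out->image + 0x100, file, len);
        out->isCom = 1;
        out->initCS = loadSeg;
        out->initIP = 0x100;
    }
    return 0;
}
```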

### 8.2 Signature Generator

A signature generator is a program that automatically generates signatures for an input file. A signature is a binary pattern used to recognize viruses, compilers, and library subroutines. The aim of signatures in decompilation is to undo the process performed by the linker, that is, to determine which subroutines are library subroutines and which are compiler start-up code, and then to replace the former by their names and eliminate the latter from the target output code. This is needed on operating systems that do not share libraries, and therefore bind the library subroutine's object code into the program's binary image. No information on the subroutine's name or arguments is stored in the binary program; hence, without a method to recognize them, library subroutines cannot be distinguished from user-written subroutines. On operating systems that share library subroutines, the subroutine does not form part of the binary program; instead, the program contains a reference to it, so the subroutine's name is stored as part of the binary file (most likely in the header section). The methods presented in this section are targeted at operating systems that do not share library subroutines, and therefore include them in the binary program.

Once a signature file has been generated for a set of library subroutines, an interface procedure is called to check each subroutine that is about to be parsed by the decompiler/disassembler against the library signatures. If a subroutine matches one of the signatures, it is replaced by its name (i.e. the name of the subroutine in the library, such as printf) and is marked as not needing any further analysis. In this way the number of subroutines to be analyzed is reduced and, more importantly, the quality of the target high-level program improves considerably, since calls to these subroutines use real library names rather than arbitrary names. Also, since some library subroutines are written in assembler for performance reasons or because they need low-level machine access, in most cases these routines have no high-level representation and can only be disassembled rather than decompiled; library signature recognition eliminates the need to analyze such subroutines, producing better target code.

The ideas presented in this and the next section (Sections 8.2 and 8.3) were developed by Michael Van Emmerik while working at the Queensland University of Technology. These ideas are expressed in [Emm94]. Figure 8-4 has been reproduced with permission from the author.

#### 8.2.1 Library Subroutine Signatures

A standard library file is a relocatable object file that implements the subroutines available in a particular language/compiler. A library subroutine signature is a binary pattern that distinguishes a subroutine in the library from any other subroutine in that same library. Since different subroutines perform different functions, a signature that contains the complete binary pattern of a subroutine will distinguish it from every other subroutine. The main problem with this approach is the size of the signature and the overhead that this size creates. Ideally, then, only a minimum number of bytes of the subroutine is checked, so that the signature is as small as possible. Given the large number of subroutines in a library, some subroutines need \( n \) bytes in the signature to be uniquely identified, but \( n \) can be greater than the complete size of small subroutines; therefore, for small subroutines, the remaining bytes are padded with a predetermined value so that no bytes belonging to another library subroutine are included. For example, if \( n \) is 23, the library function `cos` is 10 bytes long, and `cos` is followed by the library function `strcpy`, the signature of `cos` consists of its 10 bytes followed by 13 bytes of the padding value; otherwise, the last 13 bytes of the signature would be the first bytes of `strcpy`.
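The padding rule takes only a few lines of C. The sketch assumes the subroutine's length is known (for instance from the library's module information) and uses 00 as the padding value, as in the example of this section; the name `pad_signature` is hypothetical. With \( n \) = 23 and the 10-byte `cos` of the example, bytes 10 to 22 of the signature are padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_PAD 0x00   /* padding value (00, as in this section's example) */

/* Copies the first n bytes of a subroutine that is len bytes long into
 * sig[0..n), padding with SIG_PAD when the subroutine is shorter than n,
 * so that no bytes of the following subroutine end up in the signature. */
void pad_signature(const uint8_t *code, size_t len, size_t n, uint8_t *sig)
{
    size_t take = len < n ? len : n;
    memcpy(sig, code, take);
    memset(sig + take, SIG_PAD, n - take);
}
```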

Within the first \( n \) bytes of a subroutine, machine instructions whose operands cannot be determined to be constants or offsets, or that depend on the address where the module was loaded, contain variant bytes: bytes that can have one value in the library file and another in a binary program that contains the subroutine. These variant byte locations must therefore be wildcarded in order to generate an address-independent signature. Consider the code for the library routine `fseek()` in Figure 8-4. The call at instruction 108 contains an offset to the subroutine being called. Called subroutines are not always linked at the same position; therefore, this offset is variant and is wildcarded. The `mov` at instruction 110 takes a constant or offset operand; since it is not known whether this location is invariant (i.e. a constant), it is wildcarded as well. The choice of wildcard value depends on the machine's instruction set. A good candidate is an opcode that is hardly ever used, such as `halt` in this example (opcode F4), or a byte value that is not used by the machine's instruction set at all. Similar considerations apply to the padding bytes used when a subroutine is shorter than the signature; in this example, 00 was used.
Note that although the function `fseek()` has more bytes in its image, the signature is cut after 21 bytes because of the unconditional jump at instruction 113. This is done because it is unknown whether the bytes that follow the unconditional jump form part of the same library subroutine. In general, whenever a return or an (un)conditional jump is met, the subroutine is considered finished for the purposes of library signatures, and any remaining bytes are padded. The final signature for this example is given in Figure 8-5. This method has a small probability of error, since different subroutines may have the same starting code up to the first (un)conditional transfer of control.
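The wildcarding and cut-off rules can be combined into a small signature builder. This C sketch assumes a disassembler has already decoded the subroutine into hypothetical `Instr` records that say which bytes of each instruction are variant and whether the instruction is a return or a jump; making those decisions is the hard part and is not shown. The wildcard (F4) and padding (00) values follow the example in the text. Under this scheme, a candidate subroutine in a binary program would be normalized the same way before comparison, so that signatures can be compared by plain byte equality (and therefore hashed); that pairing is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_WILD 0xF4  /* wildcard byte (opcode of halt) */
#define SIG_PAD  0x00  /* padding byte */

/* Hypothetical pre-decoded instruction; the decoder is assumed. */
typedef struct {
    uint8_t len;          /* instruction length in bytes (<= 8) */
    uint8_t bytes[8];     /* raw bytes */
    uint8_t variantMask;  /* bit i set => bytes[i] is variant */
    int     endsPath;     /* 1 for returns and (un)conditional jumps */
} Instr;

/* Builds an n-byte signature: variant bytes are wildcarded, the
 * signature is cut after the first return or jump, and the rest
 * of the n bytes is padded. */
void make_signature(const Instr *ins, size_t nIns, size_t n, uint8_t *sig)
{
    size_t pos = 0;
    for (size_t i = 0; i < nIns && pos < n; i++) {
        for (size_t b = 0; b < ins[i].len && pos < n; b++)
            sig[pos++] = ((ins[i].variantMask >> b) & 1) ? SIG_WILD
                                                         : ins[i].bytes[b];
        if (ins[i].endsPath)
            break;
    }
    memset(sig + pos, SIG_PAD, n - pos);
}
```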

The algorithm to automatically generate library subroutine signatures is shown in Figure 8-6. This algorithm takes as arguments a standard library file, the name of the output signature file, and the size of the signature in bytes, which is determined experimentally in advance.

On machines that support several memory models, the compiler vendor provides a different library file for each memory model; therefore, a different signature file needs to be generated for each memory model. Ideally, a naming convention identifies the compiler vendor and memory model of each signature file, which eliminates any need for extra header information in the signature file.
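As an illustration of such a naming convention, the following C sketch builds a signature file name from a short vendor code, a version string and a one-letter memory model code. The scheme and the names are hypothetical; they are not the file names used by any particular tool.

```c
#include <stdio.h>

typedef enum { MODEL_TINY, MODEL_SMALL, MODEL_MEDIUM,
               MODEL_COMPACT, MODEL_LARGE, MODEL_HUGE } MemModel;

/* Builds a signature file name from a vendor code, a version string and
 * a memory model, e.g. ("tc", "2", MODEL_SMALL) -> "tc2s.sig".
 * Returns the length of the name, or -1 if it does not fit in buf. */
int sig_file_name(char *buf, size_t bufSize, const char *vendor,
                  const char *version, MemModel model)
{
    static const char modelChar[] = "tsmclh";   /* one letter per model */
    int n = snprintf(buf, bufSize, "%s%s%c.sig", vendor, version, modelChar[model]);
    return (n < 0 || (size_t)n >= bufSize) ? -1 : n;
}
```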
**Integration of Library Signatures and the Decompiler**

Given the entry point to a subroutine, the parser disassembles instructions following all paths from the entry point. If it is known that a particular compiler was used to compile the source binary program that is currently being analyzed, the parser can check whether the subroutine is one that belongs to a library (for that particular compiler) or not. If it does, the code does not need to be parsed since it is known which subroutine was invoked, and hence, the name of the subroutine is used.

Due to the large number of subroutines in a library, a linear search through all the signatures in a file is very inefficient. Hashing is a good technique to use in this case; better still, perfect hashing can be used, since the signatures have a fixed size and are unique for each subroutine in a given library. Perfect hashing information can be stored in the header of the library signature file, and used by the parser whenever it needs to determine whether a subroutine belongs to the library.
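Perfect hashing can be illustrated with a deliberately naive construction: try seeds for a seeded hash function until every signature lands in a different slot of a table of m slots, after which a lookup costs one hash and one comparison. This is not the method of any particular tool, and practical perfect-hash generators are far more efficient. The sketch uses a seeded FNV-1a hash with an extra xor-shift, the 23-byte signature length of the earlier example, and hypothetical names. It assumes the signatures are distinct: two identical signatures would collide for every seed, so repeated signatures must be resolved first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_LEN   23     /* signature length, as in the earlier example */
#define MAX_SLOTS 64

/* Seeded FNV-1a hash of one signature, with a final xor-shift so that
 * the low bits used by the modulo depend on all bits of the hash. */
static uint32_t sig_hash(const uint8_t *sig, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < SIG_LEN; i++) {
        h ^= sig[i];
        h *= 16777619u;
    }
    return h ^ (h >> 16);
}

typedef struct {
    uint32_t seed;
    size_t   m;                  /* number of slots, m <= MAX_SLOTS */
    int      slot[MAX_SLOTS];    /* index of the signature in each slot, or -1 */
} PerfectHash;

/* Tries seeds until no two signatures share a slot. Returns 0 on success. */
int build_perfect_hash(PerfectHash *ph, const uint8_t (*sigs)[SIG_LEN],
                       size_t nSigs, size_t m)
{
    if (m == 0 || m > MAX_SLOTS || nSigs > m) return -1;
    ph->m = m;
    for (uint32_t seed = 0; seed < 100000; seed++) {
        int ok = 1;
        for (size_t s = 0; s < m; s++) ph->slot[s] = -1;
        for (size_t i = 0; i < nSigs && ok; i++) {
            size_t s = sig_hash(sigs[i], seed) % m;
            if (ph->slot[s] != -1) ok = 0;
            else ph->slot[s] = (int)i;
        }
        if (ok) { ph->seed = seed; return 0; }
    }
    return -1;
}

/* Returns the index of the matching signature, or -1 if there is none. */
int lookup_signature(const PerfectHash *ph, const uint8_t (*sigs)[SIG_LEN],
                     const uint8_t *candidate)
{
    int i = ph->slot[sig_hash(candidate, ph->seed) % ph->m];
    return (i >= 0 && memcmp(sigs[i], candidate, SIG_LEN) == 0) ? i : -1;
}
```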

#### 8.2.2 Compiler Signature

In order to determine which library signature file to use with a binary program, the compiler that was used to compile the original user program must be determined. Since different compiler vendors use different binary patterns in their compiler start-up code, these patterns can be examined manually and stored in a signature that uses wildcards, in the same way as library subroutine signatures. Different memory models produce different compiler signatures for the same compiler, and different versions of the same compiler most likely have different signatures as well; therefore, a different signature is stored for each (compiler vendor, memory model, compiler version) triplet. Again, a naming scheme can be used to differentiate compiler signatures.
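A compiler signature can be checked directly against the bytes at the loader's entry point, with the wildcard byte matching any value. In this C sketch, the `CompilerSig` record, its fields and the example patterns are all hypothetical. One consequence of reusing F4 as the wildcard is that a genuine `halt` byte in start-up code cannot be matched literally.

```c
#include <stddef.h>
#include <stdint.h>

#define SIG_WILD 0xF4   /* wildcard: matches any byte */

/* Hypothetical record for one (vendor, memory model, version) triplet. */
typedef struct {
    const char    *vendor;
    char           model;      /* e.g. 's', 'm', 'c', 'l' */
    const char    *version;
    const uint8_t *pattern;    /* start-up code pattern with wildcards */
    size_t         len;
} CompilerSig;

/* Returns 1 if the bytes at the program entry point match the pattern. */
int match_compiler_sig(const CompilerSig *cs, const uint8_t *entry, size_t avail)
{
    if (avail < cs->len) return 0;
    for (size_t i = 0; i < cs->len; i++)
        if (cs->pattern[i] != SIG_WILD && cs->pattern[i] != entry[i])
            return 0;
    return 1;
}

/* Returns the first matching signature in the table, or NULL. */
const CompilerSig *identify_compiler(const CompilerSig *table, size_t n,
                                     const uint8_t *entry, size_t avail)
{
    for (size_t i = 0; i < n; i++)
        if (match_compiler_sig(&table[i], entry, avail))
            return &table[i];
    return NULL;
}
```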

**Determining the Main Program**

The entry point given by the loader is the entry to the compiler start-up code, which invokes at least a dozen subroutines to set up its environment before invoking the main subroutine of the program, i.e. `main` in a C program or the `BEGIN` in a Modula-2 program. The main entry point of a program compiled with a known compiler is determined by manual examination of that compiler's start-up code. In all C compilers, the parameters to the `main()` function (i.e. `argc`, `argv`, `envp`) are pushed before the main function is invoked; therefore, it is not very hard to determine the main entry point. Most C compilers also provide the source code of their start-up code in the interest of interoperability, so the main entry point can be determined from that source as well. Once it is known how to determine the main entry point, this method is stored in the compiler signature file for that particular compiler.
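The text does not fix how the method for finding `main` is recorded in the compiler signature. One hypothetical encoding is the offset, within the start-up code, of the near `call` (opcode E8h) that invokes `main`. The target then follows from 8086 near-call arithmetic: the offset of the next instruction plus the signed 16-bit displacement, modulo 64 KB. The sketch assumes `code` holds one code segment indexed by offset, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the offset of a near "call rel16" (E8h) that is known to invoke
 * main(), returns the offset of main(), or -1 if no such call is there.
 * Target = offset of the next instruction + displacement, wrapped to
 * 16 bits, so negative displacements work through two's complement. */
long main_from_call(const uint8_t *code, size_t codeLen, size_t callOff)
{
    if (callOff + 3 > codeLen || code[callOff] != 0xE8)
        return -1;
    uint16_t disp = (uint16_t)(code[callOff + 1] | (code[callOff + 2] << 8));
    uint16_t next = (uint16_t)(callOff + 3);
    return (uint16_t)(next + disp);
}
```

For instance, `E8 20 00` at offset 0150h calls offset 0173h, and `E8 F0 FF` at offset 0160h calls offset 0153h.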

**Integration of Compiler Signatures with the Decompiler**

Before the parser analyzes any instructions at the entry point given by the loader, an interface procedure is invoked to check for different compiler signatures. This procedure determines whether the first bytes of the loaded program are equivalent to a known compiler signature, and if so, the compiler vendor, compiler version, and memory model are determined, and stored in a global structure. Once this is done, the main entry point is determined by the signature, and that entry point is treated as the starting point for the parser. From there onwards, any subroutines called by the program can be checked against the library signature file for the appropriate compiler vendor, compiler version, and memory model.

#### 8.2.3 Manual Generation of Signatures

Automatic generation of signatures is ideal, but its drawback is that a fixed-length pattern does not always uniquely identify every subroutine in a library. Experimental results have shown that the proportion of repeated signatures in a standard library file varies from as low as 5.3% to as high as 29.7% [Emm94]. Most repeated signatures are due to functions that have different names but the same implementation, or to unconditional jumps after a few bytes that cut the signature short.

A manual method for the generation of signatures was described in [FZ91], and used in an 8086 C decompiling system [FZL93]. A library file for Microsoft C version 5.0 was analyzed by manual inspection of each function, and the following information was stored for each function: the function name, the binary pattern of the complete function (including variant bytes), and a matching method that determines whether an arbitrary subroutine matches it. The matching method is a series of instructions that specifies how many fixed bytes there are starting at a given offset in the binary pattern of the function, and which subroutines are called by the function. Whenever an operand cannot be determined to be an offset or a constant, its bytes are skipped (i.e. they are not compared against the bytes in the binary pattern, since they are variant bytes). When a subroutine is called, the offset address of the subroutine is not tested; instead, the call is followed, and the called routine is in turn matched against the patterns in the signature. In this way, all paths of the subroutine are followed and checked against the signature.

The disadvantage of manual generation of signatures is the time it takes: a library typically has over 300 subroutines, and the number increases to over 1300 for object-oriented languages. Manually generating the signatures for a single library can take days, up to a week for large library files. Also, whenever a new version of the compiler becomes available, the signatures have to be reanalyzed manually, so the time overhead is great. An automatic signature generator reduces the time needed to generate the signatures for a complete library to a few seconds (less than a minute), at the cost of repeated signatures for a percentage of the functions. The functions with repeated signatures can be checked manually, and unique signatures generated for them if necessary.

### 8.3 Library Prototype Generator

A library prototype generator is a program that automatically generates information on the prototypes of library subroutines, that is, the types of the arguments used by each subroutine and, for functions, the type of the return value. Prototype information on library subroutines helps the decompiler check for the right type and number of arguments, and correct any type that was wrongly inferred because of lack of information during the analysis. Consider the following code:

```
mov   ax, 42
push  ax
call  printf
```

During data flow analysis, this code is transformed into the following code after extended register copy propagation:

```
call  printf (42)
```

Without knowing the types of the arguments that `printf` takes, the constant 42 is taken to be the actual argument of this function call. But if prototype information exists for this function, its formal argument list consists of a fixed pointer-to-character argument (i.e. a string in C) followed by a variable number of other arguments of unknown type. Hence, the constant 42 can be determined to be an offset into the data segment rather than a constant, and replaced by that offset. This method provides the decompiler with the following improved code:

```
printf ("Hello world\n");
```

and the disassembly version of the program could be improved to:

```
mov   ax, offset szHelloWorld
push  ax
call  printf
```
where `szHelloWorld` names offset 42 in the data segment, which holds the null-terminated string.
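Once a prototype says that the first formal argument of `printf` is a character pointer, the constant can be reinterpreted as a data-segment offset. This C sketch assumes the data segment is available as a byte array, and it only accepts the offset if a null terminator lies inside the segment. The function name is hypothetical, and escaping the string for output (e.g. turning a newline back into `\n`) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets a 16-bit constant argument as an offset into the data
 * segment. Returns a pointer to the null-terminated string stored there,
 * or NULL if the offset is out of range or the string is unterminated. */
const char *constant_as_string(const uint8_t *dataSeg, size_t segLen, uint16_t off)
{
    if (off >= segLen)
        return NULL;
    if (memchr(dataSeg + off, '\0', segLen - off) == NULL)
        return NULL;
    return (const char *)(dataSeg + off);
}
```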

It is therefore useful for the decompiler to use library prototypes. Unlike library signatures, only one library prototype file is needed for each high-level language, since the standard functions of a language have the same prototypes regardless of the compiler; compiler-dependent libraries require extra prototype files. Languages like C and Modula-2 have the advantage of using header files that define all library prototypes. These prototypes can easily be parsed by a program and stored in a file in a predetermined format. Languages such as Pascal store the library prototype information in their libraries; therefore, a special parser is required to read these files.

**Comment on Runtime Support Routines**

Compiler runtime support routines are subroutines used by the compiler to perform a particular task. These subroutines are stored in the library file, but do not have function prototypes available to the user (i.e. they are not in the header file of the library), hence they are used only by the compiler and do not follow high-level calling conventions. Most runtime subroutines have register arguments, and return the result in registers too. Since there is no prototype available for these subroutines, it is in the interest of the decompiler to analyze them in order to determine the register argument(s) that are being used, and the return register(s) (if any).

Runtime support routines are distinguished from any other library routine by checking the library prototypes: a subroutine that forms part of the library but does not have a prototype is a runtime routine. These routines have a name (e.g. `LXMUL`) but the type of the argument(s) and return value is unknown. During decompilation, these subroutines are analyzed, and the name from the library file is used to name the subroutine. Register arguments are mapped to formal arguments.
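The prototype file and the runtime-routine test can be sketched together: a table of prototypes parsed from header files, a lookup by name, and the rule that a library routine without a prototype is a runtime support routine. The record layout, the type codes and the three example entries are hypothetical; they are not the format of any actual prototype file.

```c
#include <stddef.h>
#include <string.h>

typedef enum { T_VOID, T_INT, T_CHAR_PTR } TypeId;

/* Hypothetical in-memory form of one entry of a library prototype file. */
typedef struct {
    const char *name;
    TypeId      ret;        /* T_VOID for procedures */
    int         nFixed;     /* number of fixed formal arguments */
    TypeId      args[4];
    int         varArgs;    /* 1 if "..." follows the fixed arguments */
} Prototype;

static const Prototype protos[] = {
    { "printf", T_INT,  1, { T_CHAR_PTR }, 1 },
    { "puts",   T_INT,  1, { T_CHAR_PTR }, 0 },
    { "exit",   T_VOID, 1, { T_INT },      0 },
};

/* Linear lookup by name; a larger table could be sorted or hashed. */
const Prototype *find_prototype(const char *name)
{
    for (size_t i = 0; i < sizeof protos / sizeof protos[0]; i++)
        if (strcmp(protos[i].name, name) == 0)
            return &protos[i];
    return NULL;
}

/* A routine recognized in the library (through its signature) that has
 * no prototype is taken to be a compiler runtime support routine. */
int is_runtime_routine(const char *libName)
{
    return find_prototype(libName) == NULL;
}
```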

**Integration of the Library Prototypes and the Decompiler**

Whenever the compiler used to compile the original source program is determined by means of compiler signatures, the source language of that program is known; hence, the appropriate library prototype file can be used to obtain more information on the library subroutines used in the program. During parsing, if a subroutine is determined to be one of the subroutines in a library signature file, the prototype file for that language is accessed, and the prototype, along with the subroutine's name, is stored in the subroutine's summary information record. This gives the data flow analyzer certainty about the types of the arguments to library subroutines; therefore, these types can be back-propagated to caller subroutines whenever the types inferred there differ from those in the prototype. Also, if the subroutine has a return type defined, the subroutine is really a function and should be treated as one.

### 8.4 Disassembler

A disassembler is a program that converts a source binary program to a target assembler program. Assembly code uses mnemonics which represent machine opcodes; one or more
machine opcodes are mapped to the same assembly mnemonic (e.g., all machine instructions that add two operands are mapped to the `add` mnemonic).

Disassemblers are used as tools to modify existing binary files for which there is no source available, to clarify undocumented code, to recreate lost source code, and to find out how a program works [Fiu89, Com91]. In recent years, disassemblers have also been used as debugging tools to determine whether a binary file contains virus code, and to disassemble such viruses; selective disassembly techniques are used to detect potentially malicious code [Gar88].

A disassembler is composed of two phases: the parsing of the binary program, and the generation of assembler code. The former phase is identical to the parsing phase of the decompiler (see Chapter 4, Section 4.1), and the code generator produces assembler code on the fly or from an internal representation of the binary program. Symbol table information is also stored, in order to declare all strings and constants in the data segment.

Most public domain DOS disassemblers [Zan85, GD83, Cal, Mak90, Sof88, Chr80] perform a single pass over the binary image without constructing a control flow graph of the program. In most cases, parsing errors are introduced by assumptions made about memory locations, such as treating them as code when they hold data. Some of these disassemblers comment on DOS interrupts, and are able to disassemble not only binary files but also blocks of memory and system files. Commercial disassemblers like Sourcer [Com91] perform several passes over the binary image, refining the symbol table on each pass and thus achieving a better distinction between data and code. Sourcer uses an internal simulator to resolve indexed jumps and calls by keeping track of register contents, and also collects cross-reference information.
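The difference between a single-pass (linear sweep) disassembler and one that follows control flow can be shown on a toy subset of 8086 opcodes: `nop` (90h), `ret` (C3h), `jmp short` (EBh) and `jz short` (74h), the last two with an 8-bit signed displacement. The C sketch follows all paths from the entry point and marks the bytes it reaches as code; a data byte placed after an unconditional jump is never marked, whereas a linear sweep would try to decode it. The opcode subset and the fixed-size work list are simplifications, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 64   /* pending branch targets; fixed for the sketch */

/* Instruction length for the toy subset; 0 for any other byte, which
 * ends the current path. */
static int insn_len(uint8_t op)
{
    switch (op) {
    case 0x90: case 0xC3: return 1;
    case 0xEB: case 0x74: return 2;
    default:              return 0;
    }
}

/* Recursive-traversal parse: follows every path from entry and sets
 * isCode[i] = 1 for each byte reached as part of an instruction. */
void mark_code(const uint8_t *img, size_t len, size_t entry, uint8_t *isCode)
{
    size_t work[MAX_PENDING];
    size_t top = 0;

    memset(isCode, 0, len);
    work[top++] = entry;
    while (top > 0) {
        size_t pc = work[--top];
        while (pc < len && !isCode[pc]) {
            uint8_t op = img[pc];
            int n = insn_len(op);
            if (n == 0 || pc + (size_t)n > len)   /* unknown or truncated */
                break;
            memset(isCode + pc, 1, (size_t)n);
            size_t next = pc + (size_t)n;
            if (op == 0xC3)                       /* ret: path ends */
                break;
            if (op == 0x90) {                     /* nop: fall through */
                pc = next;
                continue;
            }
            /* rel8 displacement, sign-extended, relative to next */
            int d = img[pc + 1] < 0x80 ? img[pc + 1] : img[pc + 1] - 0x100;
            long t = (long)next + d;
            size_t target = t < 0 ? len : (size_t)t;  /* len means "ignore" */
            if (op == 0xEB) {
                pc = target;                      /* jmp: target only */
            } else {
                if (top < MAX_PENDING)            /* jz: both successors */
                    work[top++] = target;
                pc = next;
            }
        }
    }
}
```

For the bytes `EB 01 FF 90 C3` starting at offset 0, offsets 0, 1, 3 and 4 are marked as code, and offset 2 (FFh) is not.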

In decompilation, the disassembler can be considered part of the decompiler by adding an extra assembler code generator phase, such as in Figure 8-7, or can be used as a tool to generate an assembler program that is taken as the source input program to the decompiler, such as in the initial Figure 8-1.

### 8.5 Language Independent Bindings

The decompiler generates code for the particular target language it was written for. Binary programs decompiled with the aid of compiler and library signatures produce target language programs that use the names of the library routines defined in the library signature file. If the language in which the binary program was originally written differs from the target language of the decompiler, the target program cannot be recompiled as is, since it uses library routines defined in another language/compiler. Consider the following fragment of decompiled code in C:

```c
WriteString ("Hello Pascal");
CrLf();
```

These two statements invoke Pascal library routines that implement the original Pascal statement `writeln ('Hello Pascal')`; the first routine displays the string and the second performs a carriage return and line feed. In other words, since the Pascal libraries have no `writeln` routine, the compiler replaces this statement by calls to `WriteString` and `CrLf`. The decompiled code is correct, but since the target language is C, it cannot be recompiled, given that `WriteString` and `CrLf` are not C library routines.

The previous problem can be solved with the use of Pascal to C bindings for libraries. In this way, rather than generating the previous two statements, a call to `printf` is used, as follows:

```c
printf ("Hello Pascal\n");
```

Working Group 11 of the ISO committee SC22 is concerned with the creation of standards for language-independent access to service facilities. This work can be used to define language-independent bindings for languages such as C and Modula-2. Information on library bindings can be placed in a file and used by the code generator of the decompiler to produce target code that uses the target language's library routines.
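A binding file could be as simple as a table that maps each source-library routine to a C template. This sketch is hypothetical in both format and contents. It maps calls one to one, so `WriteString` followed by `CrLf` becomes two `printf` calls; merging them into the single `printf` shown earlier would need a further peephole step.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical binding entry: a source-library routine and a printf-style
 * template for the equivalent C call; "%s" in the template is replaced by
 * the already rendered argument list ("%%" yields a literal '%'). */
typedef struct {
    const char *srcName;
    const char *cTemplate;
} Binding;

static const Binding pascalToC[] = {
    { "WriteString", "printf (\"%%s\", %s);" },
    { "CrLf",        "printf (\"\\n\");" },
};

/* Writes the C rendering of a call into out. Returns 0 if a binding
 * exists, or -1 if the routine is unknown (the call is then kept as is). */
int bind_call(const char *name, const char *args, char *out, size_t outSize)
{
    for (size_t i = 0; i < sizeof pascalToC / sizeof pascalToC[0]; i++)
        if (strcmp(pascalToC[i].srcName, name) == 0) {
            snprintf(out, outSize, pascalToC[i].cTemplate, args);
            return 0;
        }
    return -1;
}
```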

### 8.6 Postprocessor

The quality of the target high-level language program generated by the decompiler can be improved by a postprocessor phase that replaces generic control structures by language-specific structures. Language-specific structures were not considered in the structuring analysis of Chapter 6 because these constructs are not general enough to be used across several languages.

In C, a `for()` loop is compiled as a `while()` loop that checks the terminating condition: the induction variable is initialized before the loop, and is updated each time around the loop in the last statement of the `while()`. Consider the following code in C after decompilation:

```c
loc1 = 0;                     /* 1 */
while (loc1 < 8)              /* 2 */
{
    if (loc1 != 4)            /* 3 */
    {
        printf ("%d", loc1);  /* 4 */
    }
    loc1 = loc1 + 1;          /* 5 */
}
```

The while() loop at statement 2 checks the local variable loc1 against constant 8. This variable was initialized in statement 1, and is also updated in the last statement of the loop (i.e. statement 5); therefore, this variable is an induction variable, and the while() loop can be replaced by a for loop, leading to the following code:

```c
for (loc1 = 0; loc1 < 8; loc1 = loc1 + 1)
{
    if (loc1 != 4)
    {
        printf ("%d", loc1);
    }
}
```

which folds statements 1 and 5 into statement 2. C also provides pre- and post-increment operators; hence, the previous code can be improved to the following:

```c
for (loc1 = 0; loc1 < 8; loc1++)
{
    if (loc1 != 4)
    {
        printf ("%d", loc1);
    }
}
```
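The `for()` rule above can be sketched on a deliberately simplified, string-based view of a loop: the statement just before the `while()`, its condition, and the last statement of its body. The sketch checks that the same variable is assigned before the loop and in the last statement and appears in the condition, then builds the `for()` header, using `++` when the update is `v = v + 1`. The `WhileLoop` type is hypothetical and the substring test on the condition is crude (it would also accept `loc10`). A real postprocessor would also check that the body does not otherwise modify the variable, and would work on the decompiler's internal representation rather than on text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical string-based view of a decompiled while() loop. */
typedef struct {
    const char *before;     /* statement preceding the loop, "loc1 = 0" */
    const char *cond;       /* loop condition, "loc1 < 8" */
    const char *lastStmt;   /* last statement of the body, "loc1 = loc1 + 1" */
} WhileLoop;

/* Copies the variable assigned by "v = ..." into var; returns 0 on success. */
static int assigned_var(const char *stmt, char *var, size_t size)
{
    const char *eq = strstr(stmt, " = ");
    size_t n = eq ? (size_t)(eq - stmt) : 0;
    if (n == 0 || n >= size)
        return -1;
    memcpy(var, stmt, n);
    var[n] = '\0';
    return 0;
}

/* Writes a for() header into out and returns 0 if the loop has an
 * induction variable in the sense above; otherwise returns -1. */
int to_for_header(const WhileLoop *w, char *out, size_t outSize)
{
    char v1[32], v2[32], inc[80];
    if (assigned_var(w->before, v1, sizeof v1) != 0 ||
        assigned_var(w->lastStmt, v2, sizeof v2) != 0 ||
        strcmp(v1, v2) != 0 || strstr(w->cond, v1) == NULL)
        return -1;
    snprintf(inc, sizeof inc, "%s = %s + 1", v1, v1);
    if (strcmp(w->lastStmt, inc) == 0)
        snprintf(out, outSize, "for (%s; %s; %s++)", w->before, w->cond, v1);
    else
        snprintf(out, outSize, "for (%s; %s; %s)", w->before, w->cond, w->lastStmt);
    return 0;
}
```

Applied to the loop above, the sketch produces the header `for (loc1 = 0; loc1 < 8; loc1++)`.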

A break statement in C terminates the execution of the current loop, branching control to the first instruction that follows the loop. Consider the following code after decompilation:

```c
loc1 = 0;                     /* 1 */
while (loc1 < 8)              /* 2 */
{
    printf ("%d", loc1);      /* 3 */
    if (loc1 == 4)            /* 4 */
        goto L1;              /* 5 */
    loc1 = loc1 + 1;          /* 6 */
}
L1:
```
Statement 4 compares the local variable loc1 against 4 and, if they are equal, executes a goto to label L1, the first statement after the loop. This transfer of control is equivalent to a break, which removes the need for both the label and the goto. In addition, the loop is transformed into a for loop, leading to the following code:

```c
for (loc1 = 0; loc1 < 8; loc1++)   /* 2 */
{
    printf ("%d", loc1);           /* 3 */
    if (loc1 == 4)                 /* 4 */
        break;                     /* 5 */
}
```
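The goto-to-break rule can be sketched in the same simplified, string-based style: inside the body of one loop, every `goto` whose target is the label immediately following that loop is replaced by `break`. The representation and the function name are hypothetical, and statements of nested loops must not be passed in, since a `break` there would only leave the inner loop. If no `goto` to the label remains afterwards, the label itself can be dropped.

```c
#include <stdio.h>
#include <string.h>

/* Replaces each "goto L;" in one loop's body whose label L immediately
 * follows that loop by "break;". Returns the number of replacements. */
int gotos_to_breaks(const char **body, size_t n, const char *followLabel)
{
    char target[64];
    int rewritten = 0;

    snprintf(target, sizeof target, "goto %s;", followLabel);
    for (size_t i = 0; i < n; i++)
        if (strcmp(body[i], target) == 0) {
            body[i] = "break;";
            rewritten++;
        }
    return rewritten;
}
```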

In a similar way, continue statements can be found in the code. If the target language were Ada, labelled multiexit loops would be allowed; the decompiler would have structured these as loops with several goto exits. The target of each of these goto jumps can be checked against the labels of the enclosing loops, and the goto replaced by the appropriate exit loopName statement.

In general, any language-specific construct can be represented by the generic set of constructs used in the structuring algorithm of Chapter 6, Section 6.4.3; these constructs can be replaced by a postprocessor, but it is not strictly necessary to do so since the constructs are functionally equivalent.
Chapter 9

dcc

dcc is a prototype decompiler written in C for the DOS operating system. dcc was initially developed on a DecStation 3000 running Ultrix, and was later ported to the PC architecture under DOS. dcc takes as input .exe and .com files for the Intel i80286 architecture, and produces target C and assembler programs. This decompiler was built using the techniques described in this thesis (Chapters 4, 5, 6, 7, and 8), and is composed of the phases shown in Figure 9-1. As can be seen, the decompiler has a built-in loader and disassembler, and there is no postprocessing phase. The following sections describe specific aspects of dcc, and a series of decompiled programs is given in Section 9.7.

Figure 9-1: Structure of the dcc Decompiler
The main decompiler program is shown in Figure 9-2, with five major modules: `initArgs()`, which reads the user options from the command line `argv[]` and places them in a global program options variable; `Loader()`, which reads the binary program and loads it into memory; `FrontEnd()`, which parses the program, generating Icode while building the call graph; `udm()`, which analyzes the control and data flow of the program; and `BackEnd()`, which generates C code for the different routines in the call graph.

```c
int main(int argc, char *argv[])
{
    char *filename; /* Binary file name */
    CALL_GRAPH *callGraph; /* Pointer to the program’s call graph */

    filename = initArgs(argc, argv);

    /* Read a .exe or .com file and load it into memory */
    Loader(filename);

    /* Parse the program, generate Icode while building the call graph */
    FrontEnd(filename, &callGraph);

    /* Universal Decompiling Machine: process the Icode and call graph */
    udm(callGraph);

    /* Generates C for each subroutine in the call graph */
    BackEnd(filename, callGraph);
    return 0;
}
```

Figure 9-2: Main Decompiler Program

The DOS operating system runs on the segmented Intel architecture. Compilers for this architecture use six different memory models: tiny, small, medium, compact, large, and huge. The memory models result from the choice of 16-bit (near) or 32-bit (far) pointers for code and data. Appendix A provides information on the i80286 architecture, and Appendix B provides information on the PSP. This chapter assumes familiarity with this architecture.
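Two facts about the segmented architecture are used throughout this chapter. First, a real-mode address segment:offset denotes the physical address segment * 16 + offset. Second, each memory model fixes the default size of code and data pointers: near pointers are a 16-bit offset, and far pointers are a 16-bit segment plus a 16-bit offset. This C sketch records both; the names are hypothetical. The tiny model differs from the small model in that code, data and stack share a single segment, which the pointer sizes alone do not show.

```c
#include <stdint.h>

/* Real-mode physical address of segment:offset (segment * 16 + offset).
 * The 1 MB wrap-around of the 8086 is not modelled. */
uint32_t linear_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

typedef enum { MODEL_TINY, MODEL_SMALL, MODEL_MEDIUM,
               MODEL_COMPACT, MODEL_LARGE, MODEL_HUGE } MemModel;

/* Default pointer sizes in bytes for a memory model:
 * 2 = near (offset only), 4 = far (segment and offset). */
void pointer_sizes(MemModel m, int *codePtr, int *dataPtr)
{
    *codePtr = (m == MODEL_MEDIUM || m == MODEL_LARGE || m == MODEL_HUGE) ? 4 : 2;
    *dataPtr = (m == MODEL_COMPACT || m == MODEL_LARGE || m == MODEL_HUGE) ? 4 : 2;
}
```

For example, 1234h:0010h denotes physical address 12350h.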

**Decompiler Options**

`dcc` is executed from the command line by specifying the binary file to be decompiled. For example, to decompile the file `test.exe` the following command is entered:

```
dcc test.exe
```

This command produces the `test.b` file, which is the target C file. There are several options available to the user to get more information on the program. These options are:
• **a1**: produces an assembler file after parsing (i.e. before graph optimization).

• **a2**: produces an assembler file after graph optimization.

• **o <fileName>**: uses the fileName as the name for the output assembler file.

• **m**: produces a memory map of the program.

• **s**: produces statistics on the number of basic blocks before and after optimization for each subroutine’s control flow graph.

• **v**: verbose option, displays information on the loaded program (default register values, image size, etc), basic blocks of each subroutine’s graph, defined and used registers of each instruction, and the liveIn, liveOut, liveUse, and defined register sets of each basic block.

• **V**: veryVerbose option, displays the information displayed by the verbose option, plus information on the program’s relocation table (if any), basic blocks of the control flow graph of each subroutine before graph optimization, and the derived sequence of graphs of each subroutine.

• **i**: text user interface for dcc. This interface was written by Michael Van Emmerik using the curses library. It allows the user to step through the program, including subroutine calls. The right arrow follows jumps and subroutine calls, the left arrow returns to the location visited before the last right-arrow step, the up and down arrows move up or down one line at a time, page up and page down scroll a page up or down, and ctrl-X exits the interactive interface.

### 9.1 The Loader

The DOS loader is an operating system program called **exec**. Exec checks for sufficient available memory to load the program, allocates a block of memory, builds the PSP at its base, reads the program into the allocated memory block after the PSP, sets up the segment registers and the stack, and transfers control to the program [Dun88a].

Since the decompiler needs to have control of the program, the **exec** program was not used; instead, a loader that performs a similar task was written. For **.exe** programs, the program header is checked for the amount of memory required and the location of the relocation table, a buffer the size of the image is dynamically allocated, and the program is loaded into memory and relocated. For **.com** programs, the amount of memory required is calculated from the size of the file, memory is dynamically allocated, and the program is loaded into memory as is. The format of these files is given in Appendix C.

Memory is represented in **dcc** by an array of bytes; a large enough array is dynamically allocated once the size of the program's image is determined. For historical reasons, **.com** programs are loaded at offset 0100h, immediately after the 256-byte PSP. The loader also stores information about the program in a **PROG** record, defined in Figure 9-3. This record stores not only the information in the binary file, but also the memory map and the address (segment, offset) where the program was loaded (this address is fixed but depends on the type of binary program).

```c
typedef struct {
    int16 initCS;    /* Initial CS register value */
    int16 initIP;    /* Initial IP register value */
    int16 initSS;    /* Initial SS register value */
    int16 initSP;    /* Initial SP register value */
    boolT fCOM;      /* Flag set if COM program (else EXE) */
    Int cReloc;      /* # of relocation table entries */
    dword *relocTable;    /* Pointer to relocation table */
    Int cProcs;      /* Number of subroutines */
    Int offMain;     /* The offset of the main() proc */
    word segMain;    /* The segment of the main() proc */
    boolT libSigs;   /* True if library signatures loaded */
    Int cbImage;     /* Length of image in bytes */
    byte *Image;     /* Entire program image */
    byte *map;       /* Memory bitmap pointer */
} PROG;
```

Figure 9-3: Program Information Record

### 9.2 Compiler and Library Signatures

The DOS operating system does not provide a method to share libraries; therefore, library routines are bound to the program's image. For this reason, compiler and library signatures were generated for several compilers; Section 9.3.1 explains how they are used in `dcc`.

An automatic signature generator was written to generate library signatures for standard `lib` files, as described in Chapter 8, Section 8.2.1. The length of the signature was set to 23 bytes, which experimental results showed to be a reasonable size. The wildcard byte was F4 (the opcode for `HLT`), since this opcode is rarely used, and the padding byte was set to 00. Library signatures were generated for the following C compilers: Microsoft C 5.1, Microsoft Visual C++ V1.00, Turbo C 2.01, and Borland C V3.0. A separate library signature file was used for each triplet of compiler vendor, memory model, and compiler version. Signatures were generated in a few seconds.
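A simplified sketch of how a stored signature might be compared against code in the image, assuming the 23-byte length, F4 wildcard, and 00 padding given above. `sig_matches` is a hypothetical name, and the sketch leaves out how the generator decides which bytes to wildcard (Chapter 8). It only shows that wildcard positions match any byte and that bytes past the end of the image count as padding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIG_LEN  23     /* signature length */
#define WILDCARD 0xF4   /* HLT opcode: matches any byte */
#define PAD      0x00   /* padding byte for short routines */

/* Does the code at `code` (with `avail` bytes left in the image) match sig? */
bool sig_matches(const uint8_t *code, size_t avail, const uint8_t sig[SIG_LEN])
{
    for (size_t i = 0; i < SIG_LEN; i++) {
        if (sig[i] == WILDCARD)
            continue;                        /* variable byte: ignore */
        uint8_t b = (i < avail) ? code[i] : PAD;
        if (b != sig[i])
            return false;
    }
    return true;
}
```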

Since automatic signature generation was used, repeated signatures were detected. Their proportion varied from as low as 5.3% for Turbo C 2.01 to as high as 29.7% for Microsoft Visual C++ V1.00. In the former case, 19 out of 357 routines had repeated signatures. These were mainly due to identical representations of routines with different names, such as `spawnvp`, `spawnvpe`, `spawnve`, `spawnv`, `spawnlp`, and `spawnl`. A few signatures were identical for similar functions, such as `tolower` and `toupper`. In only one case did unrelated functions have the same signature: `brk` and `atoi`. In the latter case, 440 out of 1327 routines had the same signature. Most of these duplicates were due to internal public names that are not accessible by the user, such as `_Cicosh` and `_Cifabs`. Other duplicates are different names for the same routine, especially due to the naming conventions used by different memory models (i.e. the same routine works in different memory models) [Emm94].

Pascal compilers do not use standard library files. In the case of the Borland Pascal compilers, all library information is stored in a \texttt{.tpl} file, which contains information on library routines and prototypes. A modified signature generator was written for \texttt{.tpl} files, and signatures were generated for Turbo Pascal versions 4.0 and 5.0.

On average, each library signature file occupies 50Kb of disk space, which is moderate given the amount of library routine information stored in it.

Compiler signatures for the above compilers were generated manually and stored as part of \texttt{dcc}. These signatures are checked for when the parser is first invoked.

\textit{The implementation of the signature and prototype generator is due to Michael Van Emmerik while working for the Queensland University of Technology. This work is reported in [Emm94].}

### 9.2.1 Library Prototypes

A program called \texttt{parsehdr} was written to parse C library header files, isolate prototypes, and store the information about the argument types and the return type to a file. Prototypes were generated for the standard libraries used in C.

In the case of Pascal, prototype information is stored as part of the \texttt{.tpl} library file. These prototypes were not generated because the exact structure of the prototype information in these files was not known.

### 9.3 The Front-end

The front-end constructs a call graph of the program while parsing the loaded program in memory. For each subroutine, the intermediate code and control flow graph are attached to the subroutine node in the call graph; hence, parsing, intermediate code generation, and flow graph construction are done in the same pass through the program's image. Data information is stored in global and local symbol tables. If the user requests disassembly, an assembler file with extension \texttt{.a1} is written out, and if the user requests the interactive interface, an interactive window is displayed in which the user can follow the program by stepping through the instructions. Semantic analysis is done last, followed by the display of the bitmap (if requested by the user). Figure 9-4 shows the code for the \texttt{FrontEnd()} procedure.

### 9.3.1 The Parser

The parser determines whether the code reached from the entry point provided by the loader is equivalent to one of the compiler signatures stored in dcc; if so, the program's \texttt{main} is determined and used as the entry point for the analysis. Whenever a compiler signature is recognized, the associated library signature file is loaded. The parsing process is not affected in any way if a compiler signature is not found; in that case, all code reached from the entry point provided by the loader is decompiled, and no library routine recognition is done. It is important to point out that some compilers have set-up routines that are hard to parse since they use indirect jumps; in these cases, the complete code cannot be parsed, and the decompilation is jeopardized.

Given the entry point to a subroutine, the parser implements the data/instruction separation algorithm described in Chapter 4, Figure 4-7. This algorithm recursively follows all paths from the entry point and emulates loads into registers (whenever possible). When a subroutine call is met, the entry address of the subroutine becomes a new entry point, which is analyzed recursively, and the subroutine information is placed in the call graph. Register contents are emulated to detect cases such as program termination via interrupts, which depends on the contents of one or more registers. Programs whose compiler signature was recognized are known to be terminated by a routine that is executed after the main program finishes; hence, emulating the contents of registers is not necessary in this case. This parser does not make any attempt at recognizing uplevel addressing.
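The recursive separation of code from data can be sketched over a handful of real 8086 opcodes. This is an illustrative sketch rather than the algorithm of Figure 4-7: it omits register emulation, covers only NOP (90), RET (C3), JMP short (EB), JZ (74) and CALL near (E8), and ends a path at any other opcode, much as dcc stops at instructions whose target cannot be determined. The names `trace` and `BM_CODE` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BM_CODE 0x1   /* hypothetical bitmap flag: byte belongs to an instruction */

/* Sign-extend an 8-bit jump displacement. */
static int rel8(uint8_t b) { return (int)(int8_t)b; }

/* Follow every path from ip, marking reached instruction bytes in map. */
void trace(const uint8_t *image, size_t cb, uint8_t *map, size_t ip)
{
    while (ip < cb && !(map[ip] & BM_CODE)) {     /* stop at visited code */
        uint8_t op = image[ip];
        size_t len;
        switch (op) {
        case 0x90: len = 1; break;                 /* NOP */
        case 0xEB: case 0x74: len = 2; break;      /* JMP short, JZ */
        case 0xE8: len = 3; break;                 /* CALL near rel16 */
        case 0xC3: map[ip] |= BM_CODE; return;     /* RET ends this path */
        default: return;                           /* unhandled: stop */
        }
        if (ip + len > cb)
            return;                                /* truncated instruction */
        for (size_t i = 0; i < len; i++)
            map[ip + i] |= BM_CODE;
        size_t next = ip + len;
        if (op == 0xEB) {                          /* follow only the target */
            ip = next + rel8(image[ip + 1]);
            continue;
        }
        if (op == 0x74)                            /* target first, then fall through */
            trace(image, cb, map, next + rel8(image[ip + 1]));
        if (op == 0xE8) {                          /* call target: new entry point */
            int16_t d = (int16_t)(image[ip + 1] | (image[ip + 2] << 8));
            trace(image, cb, map, next + d);
        }
        ip = next;
    }
}
```

Bytes that no path reaches stay unmarked and can later be treated as data.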

Figure 9-5 shows the definition of the PROC record, which stores information about a subroutine. Note that not all of the fields are filled in during parsing; some are filled in later by the universal decompiling machine.

```c
typedef struct _proc {
    Int procEntry;  /* label number */
    char name[SYMLEN];  /* Meaningful name for this proc */
    STATE state;  /* Entry state */
    flags32 flg;  /* Combination of Icode & Procedure flags */
    int16 cbParam;  /* Probable no. of bytes of parameters */
    STKFRAME args;  /* Array of formal arguments */
    LOCAL_ID localId;  /* Local symbol table */
    ID retVal;  /* Return value type (for functions) */

    /* Icodes and control flow graph */
    ICODE_REC Icode;  /* Record of ICODE records */
    PBB cfg;  /* Pointer to control flow graph (cfg) */
    PBB *dfsLast;  /* Array of pointers to BBs in revPostorder */
    Int numBBs;  /* Number of basic blocks in the graph cfg */
    boolT hasCase;  /* Boolean: subroutine has an n-way node */

    /* For interprocedural live analysis */
    dword liveIn;  /* Registers used before defined */
    dword liveOut;  /* Registers that may be used in successors */
    boolT liveAnal;  /* Procedure has been analysed already */
} PROC;
```

Figure 9-5: Procedure Record

The parser is followed by a checkDataCode() procedure, which checks whether any byte in the bitmap has both the data and the code flags set; in that case, the byte position is flagged as being data and code, and the corresponding subroutine is flagged as potentially using self-modifying code.
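The check itself is a single scan over the bitmap range of a subroutine. A sketch in the spirit of checkDataCode(), not its actual code: the flag values `BM_CODE`, `BM_DATA` and `BM_DATACODE` and the per-subroutine byte range are assumptions, since the layout of dcc's bitmap is not given here.

```c
#include <stddef.h>
#include <stdint.h>

#define BM_CODE     0x1  /* byte reached as an instruction */
#define BM_DATA     0x2  /* byte accessed as data */
#define BM_DATACODE 0x4  /* byte is both (set by this check) */

/* Flag every byte of [start, end) marked as both code and data, and return
 * how many there are; non-zero suggests possible self-modifying code. */
int check_data_code(uint8_t *map, size_t start, size_t end)
{
    int count = 0;
    for (size_t i = start; i < end; i++) {
        if ((map[i] & BM_CODE) && (map[i] & BM_DATA)) {
            map[i] |= BM_DATACODE;
            count++;
        }
    }
    return count;
}
```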

### 9.3.2 The Intermediate Code

The intermediate code used in dcc is called Icode, of which there are two types: low-level and high-level. The low-level Icode is a mapping of i80286 machine instructions to assembler mnemonics, ensuring that every Icode instruction performs one logical instruction only. For example, the instruction:

```
DIV bx
```

assigns to ax the quotient of dx:ax divided by bx, and assigns to dx the remainder of that division; hence, two logical instructions are performed by the DIV machine instruction. In Icode, DIV is separated into two different instructions: iDIV and iMOD. The former performs the division of the operands, and the latter computes their modulus. Since both instructions read the dividend from registers that are overwritten by the results (dx and ax in this example), the dividend needs to be copied into a temporary register before these instructions are performed. dcc uses register tmp as this temporary register; it is forward substituted into another instruction and eliminated during data flow analysis. The above machine instruction is translated into three Icode instructions as follows:

```
iMOV tmp, dx:ax ; tmp = dx:ax
iDIV bx ; ax = tmp / bx
iMOD bx ; dx = tmp % bx
```

where the dividend of both iDIV and iMOD is set to the tmp register rather than dx:ax. Figure 9-6 shows the different machine instructions that are represented by more than one Icode instruction. An example is given for each instruction.
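The effect of the three Icode instructions can be written out in C to show why tmp is needed: it preserves the original dx:ax dividend, so assigning the quotient to ax does not corrupt the input of the modulus. A sketch with a hypothetical `Regs` structure; divide-by-zero and quotient overflow (which raise a divide-error interrupt on the real processor) are not modelled.

```c
#include <stdint.h>

/* Registers touched by DIV bx (16-bit unsigned divide). */
typedef struct { uint16_t ax, dx, bx; } Regs;

/* iMOV tmp, dx:ax ; iDIV bx ; iMOD bx */
void div_bx(Regs *r)
{
    uint32_t tmp = ((uint32_t)r->dx << 16) | r->ax;  /* iMOV tmp, dx:ax */
    r->ax = (uint16_t)(tmp / r->bx);                 /* iDIV bx: ax = tmp / bx */
    r->dx = (uint16_t)(tmp % r->bx);                 /* iMOD bx: dx = tmp % bx */
}
```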

<table>
<thead>
<tr>
<th>Machine Instruction</th>
<th>Icode Instructions</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>DIV cl</td>
<td>iMOV tmp, ax</td>
<td>tmp = ax</td>
</tr>
<tr>
<td></td>
<td>iDIV cl</td>
<td>al = tmp / cl</td>
</tr>
<tr>
<td></td>
<td>iMOD cl</td>
<td>ah = tmp % cl</td>
</tr>
<tr>
<td>LOOP L</td>
<td>iSUB cx, 1</td>
<td>cx = cx - 1</td>
</tr>
<tr>
<td></td>
<td>iJNCXZ L</td>
<td>if cx &lt;&gt; 0 goto L</td>
</tr>
<tr>
<td>LOOPE L</td>
<td>iSUB cx, 1</td>
<td>cx = cx - 1</td>
</tr>
<tr>
<td></td>
<td>iCMP cx, 0</td>
<td>cx == 0?</td>
</tr>
<tr>
<td></td>
<td>iJZ L</td>
<td>if zeroFlag == 1 goto L</td>
</tr>
<tr>
<td></td>
<td>iJNCXZ L</td>
<td>if cx &lt;&gt; 0 goto L</td>
</tr>
<tr>
<td>LOOPNE L</td>
<td>iSUB cx, 1</td>
<td>cx = cx - 1</td>
</tr>
<tr>
<td></td>
<td>iCMP cx, 0</td>
<td>cx == 0?</td>
</tr>
<tr>
<td></td>
<td>iJNE L</td>
<td>if zeroFlag == 0 goto L</td>
</tr>
<tr>
<td></td>
<td>iJNCXZ L</td>
<td>if cx &lt;&gt; 0 goto L</td>
</tr>
<tr>
<td>XCHG cx, bx</td>
<td>iMOV tmp, cx</td>
<td>tmp = cx</td>
</tr>
<tr>
<td></td>
<td>iMOV cx, bx</td>
<td>cx = bx</td>
</tr>
<tr>
<td></td>
<td>iMOV bx, tmp</td>
<td>bx = tmp</td>
</tr>
</tbody>
</table>

Figure 9-6: Machine Instructions Represented by More than One Icode Instruction

Compound instructions such as \texttt{rep movsb} are encoded as two machine instructions (a repeat prefix followed by a string instruction) but perform one logical string function; in this example, the byte move is repeated until cx reaches zero. These instructions are represented by one Icode instruction; \texttt{iREP\_MOVS} in this example.
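The single Icode stands for the whole loop. A sketch of its meaning for the byte form, assuming a flat memory array (segment registers ignored) and the direction flag clear; `rep_movsb` is a hypothetical name.

```c
#include <stdint.h>

/* Move cx bytes from si to di, one byte at a time, as rep movsb does
 * with the direction flag clear; si, di and cx are updated. */
void rep_movsb(uint8_t *mem, uint16_t *si, uint16_t *di, uint16_t *cx)
{
    while (*cx != 0) {
        mem[*di] = mem[*si];
        (*si)++;
        (*di)++;
        (*cx)--;
    }
}
```

Copying one byte at a time, rather than calling memmove, preserves the instruction's behaviour when source and destination overlap.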

Machine instructions that perform low-level tasks, such as input and output from a port, are most likely never generated by a compiler when compiling high-level language code (embedded assembler code can make use of these instructions, but compiled high-level code does not generate them). These instructions are flagged in the Icode as being non high-level, and the subroutine that makes use of them is flagged as well, so that assembler is generated for that subroutine. The following instructions are considered not to be generated by compilers; the instructions marked with an asterisk are only sometimes non high-level, depending on the register operands used:

\begin{verbatim}
AAA, AAD, AAM, AAS, CLI, DAA, DAS, *DEC, HLT, IN, *INC, INS, 
INT, INTO, IRET, JO, JNO, JP, JNP, LAHF, LOCK, *MOV, OUT, OUTS,
*POP, POPA, POPF, *PUSH, PUSHA, PUSHF, SAHF, STI, *XCHG, XLAT
\end{verbatim}
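A table-driven test for the unconditionally non high-level instructions above can be keyed on the opcode byte. This is a sketch only: the values are the standard 8086/80286 encodings (matching Figure 9-7), the operand-dependent (asterisked) cases are left out, and `is_not_hll` is a hypothetical name; a decoder would call it after consuming segment-override and repeat prefixes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode bytes of the instructions that are always non high-level. */
static const uint8_t notHllOps[] = {
    0x37, 0xD5, 0xD4, 0x3F,       /* AAA AAD AAM AAS */
    0xFA, 0x27, 0x2F, 0xF4,       /* CLI DAA DAS HLT */
    0xE4, 0xE5, 0xEC, 0xED,       /* IN */
    0x6C, 0x6D,                   /* INS */
    0xCC, 0xCD, 0xCE, 0xCF,       /* INT 3, INT n, INTO, IRET */
    0x70, 0x71, 0x7A, 0x7B,       /* JO JNO JP JNP */
    0x9F, 0xF0,                   /* LAHF LOCK */
    0xE6, 0xE7, 0xEE, 0xEF,       /* OUT */
    0x6E, 0x6F,                   /* OUTS */
    0x61, 0x9D, 0x60, 0x9C,       /* POPA POPF PUSHA PUSHF */
    0x9E, 0xFB, 0xD7              /* SAHF STI XLAT */
};

bool is_not_hll(uint8_t opcode)
{
    for (size_t i = 0; i < sizeof notHllOps; i++)
        if (notHllOps[i] == opcode)
            return true;
    return false;
}
```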

Icode instructions have a set of Icode flags associated with them to record properties found during the parsing of the instruction. The following flags are used:

- \texttt{B}: byte operands (default is word operands).
- \texttt{I}: immediate (constant) source operand.
- \texttt{No\_Src}: no source operand.
- \texttt{No\_Ops}: no operands.
- \texttt{Src\_B}: source operand is byte, destination is word.
- \texttt{Im\_Ops}: implicit operands.
- \texttt{Im\_Src}: implicit source operand.
- \texttt{Im\_Dst}: implicit destination operand.
- \texttt{Seg\_Immed}: instruction has a relocated segment value.
- \texttt{Not\_Hll}: non high-level instruction.
- \texttt{Data\_Code}: instruction modifies data.
- \texttt{Word\_Off}: instruction has a word offset (i.e. could be an address).
- \texttt{Terminates}: instruction terminates the program.
- \texttt{Target}: instruction is the target of a jump instruction.
- \texttt{Switch}: current indirect jump determines the start of an n-way statement.
- \texttt{Synthetic}: instruction is synthetic (i.e. it does not exist in the binary file).
- \texttt{Float\_Op}: the next instruction is a floating point instruction.
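These flags fit naturally into one 32-bit word (the `flg` fields of the PROC and BB records use a `flags32` type). The bit positions below are hypothetical, since dcc's values are not given here; the sketch only shows how several properties combine in one word and how they are tested.

```c
#include <stdint.h>

/* Hypothetical bit assignments, one per Icode flag listed above. */
enum IcodeFlag {
    B          = 1 << 0,   /* byte operands */
    I          = 1 << 1,   /* immediate source operand */
    NO_SRC     = 1 << 2,
    NO_OPS     = 1 << 3,
    SRC_B      = 1 << 4,
    IM_OPS     = 1 << 5,
    IM_SRC     = 1 << 6,
    IM_DST     = 1 << 7,
    SEG_IMMED  = 1 << 8,
    NOT_HLL    = 1 << 9,
    DATA_CODE  = 1 << 10,
    WORD_OFF   = 1 << 11,
    TERMINATES = 1 << 12,
    TARGET     = 1 << 13,
    SWITCH     = 1 << 14,
    SYNTHETIC  = 1 << 15,
    FLOAT_OP   = 1 << 16
};

/* True when every flag in mask is set in flg. */
static inline int has_flags(uint32_t flg, uint32_t mask)
{
    return (flg & mask) == mask;
}
```

For instance, an `add ax, 5` that is also a jump target would carry `I | TARGET`.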

The mapping of machine instructions to Icode instructions converts 250 instructions into 108 Icode instructions. This mapping is shown in Figure 9-7.

### 9.3.3 The Control Flow Graph Generator

dcc constructs the control flow graph of each subroutine by placing basic blocks on a list and then converting that list into a proper graph. While parsing, whenever an end-of-basic-block instruction is met, the basic block is constructed, and the indexes of its start and finish instructions in the subroutine's Icode array are stored. Instructions whose transfer of control cannot be determined (e.g. indexed jumps that are not recognized as the header of a known n-way structure, indirect calls, etc.) are said to terminate the basic block, since no more instructions are parsed along the path that contains them. These nodes are called no-where nodes in dcc. The other types of basic blocks are the standard 1-way, 2-way, n-way, fall-through, call, and return nodes. The definition record of a basic block is shown in Figure 9-8. Most of this information is filled in later by the universal decompiling machine.
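dcc builds each basic block on the fly while parsing. The same partition can also be described after the fact by the classic leader rule over an Icode array; the sketch below uses that batch formulation instead, with hypothetical `Kind`, `Instr` and `Block` types, and records each block's start and finish indexes as the BB record does.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { K_SEQ, K_JMP, K_JCOND, K_RET } Kind;  /* hypothetical kinds */
typedef struct { Kind kind; int target; } Instr;     /* target: Icode index */
typedef struct { int start, finish; } Block;

/* Leaders: the first instruction, every jump target, and every instruction
 * following a jump or return. leader[] needs n entries; returns #blocks. */
int build_blocks(const Instr *code, int n, bool *leader, Block *out)
{
    for (int i = 0; i < n; i++)
        leader[i] = (i == 0);
    for (int i = 0; i < n; i++) {
        if ((code[i].kind == K_JMP || code[i].kind == K_JCOND) &&
            code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = true;
        if (code[i].kind != K_SEQ && i + 1 < n)
            leader[i + 1] = true;
    }
    int nb = 0;
    for (int i = 0; i < n; i++) {
        if (leader[i]) {
            if (nb > 0)
                out[nb - 1].finish = i - 1;   /* close the previous block */
            out[nb++].start = i;
        }
    }
    if (nb > 0)
        out[nb - 1].finish = n - 1;
    return nb;
}
```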

The control flow graph of each subroutine is optimized by flow-of-control optimizations which remove redundant jumps to jumps, and conditional jumps to jumps. These optimizations have the potential of removing basic blocks from the graph, therefore the numbering of the graph is left until all possible nodes are removed from the graph. At the same time, the predecessors to each basic block are determined and placed in the *inEdges[]* array.
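Removing jumps to jumps amounts to following chains of blocks that contain nothing but an unconditional jump, and redirecting edges to the end of each chain. A sketch over hypothetical integer block numbers: `onlyJump[b]` is the target of block b if b is a lone unconditional jump, else -1. The step bound keeps a cycle of jump-only blocks (an empty infinite loop) from hanging the traversal. A conditional jump to a jump is handled the same way, by retargeting the conditional's out-edge.

```c
/* Follow a chain of jump-only blocks starting at b. */
int final_target(const int *onlyJump, int nBlocks, int b)
{
    for (int steps = 0; steps < nBlocks && onlyJump[b] >= 0; steps++)
        b = onlyJump[b];
    return b;
}

/* Redirect every out-edge so that it skips intermediate jump-only blocks. */
void retarget_edges(int *succ, int nEdges, const int *onlyJump, int nBlocks)
{
    for (int e = 0; e < nEdges; e++)
        succ[e] = final_target(onlyJump, nBlocks, succ[e]);
}
```

Blocks left with no in-edges after retargeting can then be removed, which is why numbering is deferred until this pass is finished.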

### 9.3.4 The Semantic Analyzer

dcc's semantic analyzer detects idioms and replaces them with other Icode instructions. The idioms checked for in dcc are those described in Chapter 4, Section 4.2.1, grouped into the following categories: subroutine idioms, calling conventions, long variable operations, and miscellaneous idioms.

There is a series of idioms available only in C. In C, a variable can be pre and post incremented, and pre and post decremented. The machine code that represents these instructions makes use of an extra register to hold the value of the pre or post incremented/decremented variable when it is being checked against some value/variable. This extra register can be eliminated by using an idiom to transform the set of instructions into one that uses the pre/post increment/decrement operand.

In the case of a post increment/decrement variable in a conditional jump, the value of the variable is copied to a register, the variable is then incremented or decremented, and finally the register holding the copy of the original value (i.e. before the increment or decrement) is compared against the other identifier. The extra register can be eliminated by using the post increment/decrement operator available in C; therefore, these idioms can be checked for only if code is to be generated in C. Figure 9-9 shows this case.
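The equivalence behind this idiom can be checked directly in C. The first function mirrors the machine code sequence just described (copy, increment, compare the copy); the second is the single expression that replaces it. Both names are hypothetical, and the assembler in the comments is illustrative.

```c
/* As compiled: copy to a spare register, increment, compare the copy. */
int post_inc_low(int *var, int limit)
{
    int reg = *var;      /* mov reg, var   (value before the increment) */
    *var = *var + 1;     /* inc var */
    return reg < limit;  /* cmp reg, limit ; jl ... */
}

/* After idiom recognition: the spare register disappears into var++. */
int post_inc_hll(int *var, int limit)
{
    return (*var)++ < limit;
}
```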
<table>
<thead>
<tr>
<th>Low-level Instruction</th>
<th>Machine Instruction(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>iAAA</td>
<td>37</td>
</tr>
<tr>
<td>iAAD</td>
<td>D5</td>
</tr>
<tr>
<td>iAAM</td>
<td>D4</td>
</tr>
<tr>
<td>iAAS</td>
<td>3F</td>
</tr>
<tr>
<td>iADC</td>
<td>10..15, (80..83)(50..57,90..97,D0..D7)</td>
</tr>
<tr>
<td>iADD</td>
<td>00..05, (80..83)(40..47,80..87,C0..C7)</td>
</tr>
<tr>
<td>iAND</td>
<td>20..25, (80..83)(60..67,A0..A7,E0..E7)</td>
</tr>
<tr>
<td>iBOUND</td>
<td>62</td>
</tr>
<tr>
<td>iCALL</td>
<td>E8, FE50..FE57, FE90..FE97, FED0..FED7, FF50..FF57, FF90..FF97, FFD0..FFD7</td>
</tr>
<tr>
<td>iCALLF</td>
<td>9A, FE58..FE5F, FE98..FE9F, FED8..FEDF, FF58..FF5F, FF98..FF9F, FFD8..FFDF</td>
</tr>
<tr>
<td>iCLC</td>
<td>F8</td>
</tr>
<tr>
<td>iCLD</td>
<td>FC</td>
</tr>
<tr>
<td>iCLI</td>
<td>FA</td>
</tr>
<tr>
<td>iCMC</td>
<td>F5</td>
</tr>
<tr>
<td>iCMP</td>
<td>38..3D, (80..83)(78..7F,B8..BF,F8..FF)</td>
</tr>
<tr>
<td>iCMPS</td>
<td>A6, A7</td>
</tr>
<tr>
<td>iREPNE_CMPS</td>
<td>F2A6, F2A7</td>
</tr>
<tr>
<td>iREPE_CMPS</td>
<td>F3A6, F3A7</td>
</tr>
<tr>
<td>iDAA</td>
<td>27</td>
</tr>
<tr>
<td>iDAS</td>
<td>2F</td>
</tr>
<tr>
<td>iDEC</td>
<td>48..4F, FE48..FE4F, FE88..FE8F, FEC8..FECF, FF48..FF4F, FF88..FF8F, FFC8..FFCF</td>
</tr>
<tr>
<td>iDIV</td>
<td>F670..F677, F6B0..F6B7, F6F0..F6F7, F770..F777, F7B0..F7B7, F7F0..F7F7</td>
</tr>
<tr>
<td>iMOD</td>
<td>F670..F67F, F6B0..F6BF, F6F0..F6FF, F770..F77F, F7B0..F7BF, F7F0..F7FF</td>
</tr>
<tr>
<td>iENTER</td>
<td>C8</td>
</tr>
<tr>
<td>iESC</td>
<td>D8..DF</td>
</tr>
<tr>
<td>iHLT</td>
<td>F4</td>
</tr>
<tr>
<td>iIDIV</td>
<td>F678..F67F, F6B8..F6BF, F6F8..F6FF, F778..F77F, F7B8..F7BF, F7F8..F7FF</td>
</tr>
<tr>
<td>iIMUL</td>
<td>69, 6B, F668..F66F, F6A8..F6AF, F6E8..F6EF, F768..F76F, F7A8..F7AF, F7E8..F7EF</td>
</tr>
<tr>
<td>iIN</td>
<td>E4, E5, EC, ED</td>
</tr>
<tr>
<td>iINC</td>
<td>40..47, FE40..FE47, FE80..FE87, FEC0..FEC7, FF40..FF47, FF80..FF87, FFC0..FFC7</td>
</tr>
<tr>
<td>iINS</td>
<td>6C, 6D</td>
</tr>
<tr>
<td>iREP_INS</td>
<td>F36C, F36D</td>
</tr>
<tr>
<td>iINT</td>
<td>CC, CD</td>
</tr>
</tbody>
</table>

Figure 9-7: Low-level Intermediate Code for the i80286
<table>
<thead>
<tr>
<th>Low-level Instruction</th>
<th>Machine Instruction(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>iINTO</td>
<td>CE</td>
</tr>
<tr>
<td>iIRET</td>
<td>CF</td>
</tr>
<tr>
<td>iJB</td>
<td>72</td>
</tr>
<tr>
<td>iJBE</td>
<td>76</td>
</tr>
<tr>
<td>iJAE</td>
<td>73</td>
</tr>
<tr>
<td>iJA</td>
<td>77</td>
</tr>
<tr>
<td>iJE</td>
<td>74</td>
</tr>
<tr>
<td>iJNE</td>
<td>75</td>
</tr>
<tr>
<td>iJL</td>
<td>7C</td>
</tr>
<tr>
<td>iJGE</td>
<td>7D</td>
</tr>
<tr>
<td>iJLE</td>
<td>7E</td>
</tr>
<tr>
<td>iJG</td>
<td>7F</td>
</tr>
<tr>
<td>iJS</td>
<td>78</td>
</tr>
<tr>
<td>iJNS</td>
<td>79</td>
</tr>
<tr>
<td>iJO</td>
<td>70</td>
</tr>
<tr>
<td>iJNO</td>
<td>71</td>
</tr>
<tr>
<td>iJP</td>
<td>7A</td>
</tr>
<tr>
<td>iJNP</td>
<td>7B</td>
</tr>
<tr>
<td>iJCXZ</td>
<td>E3</td>
</tr>
<tr>
<td>iJNCXZ</td>
<td>E0..E2</td>
</tr>
<tr>
<td>iJMP</td>
<td>E9, EB, FE60..FE67, FEA0..FEA7, FEE0..FEE7, FF60..FF67, FFA0..FFA7, FFE0..FFE7</td>
</tr>
<tr>
<td>iJMPF</td>
<td>EA, FE68..FE6F, FEA8..FEAF, FEE8..FEEF, FF68..FF6F, FFA8..FFAF, FFE8..FFEF</td>
</tr>
<tr>
<td>iLAHF</td>
<td>9F</td>
</tr>
<tr>
<td>iLDS</td>
<td>C5</td>
</tr>
<tr>
<td>iLEA</td>
<td>8D</td>
</tr>
<tr>
<td>iLEAVE</td>
<td>C9</td>
</tr>
<tr>
<td>iLES</td>
<td>C4</td>
</tr>
<tr>
<td>iLOCK</td>
<td>F0</td>
</tr>
<tr>
<td>iLODS</td>
<td>AC, AD</td>
</tr>
<tr>
<td>iREP_LODS</td>
<td>F3AC, F3AD</td>
</tr>
<tr>
<td>iMOV</td>
<td>88..8C, 8E, A0..A3, B0..BF, C6, C7</td>
</tr>
<tr>
<td>iMOVS</td>
<td>A4, A5</td>
</tr>
<tr>
<td>iREP_MOVS</td>
<td>F3A4, F3A5</td>
</tr>
<tr>
<td>iMUL</td>
<td>F660..F667, F6A0..F6A7, F6E0..F6E7, F760..F767, F7A0..F7A7, F7E0..F7E7</td>
</tr>
<tr>
<td>iNEG</td>
<td>F658..F65F, F698..F69F, F6D8..F6DF, F758..F75F, F798..F79F, F7D8..F7DF</td>
</tr>
<tr>
<td>iNOT</td>
<td>F650..F657, F690..F697, F6D0..F6D7, F750..F757, F790..F797, F7D0..F7D7</td>
</tr>
<tr>
<td>iNOP</td>
<td>90</td>
</tr>
<tr>
<td>iOR</td>
<td>08..0D, (80..83)(48..4F,88..8F,C8..CF)</td>
</tr>
</tbody>
</table>

Figure 9-7: Low-level Intermediate Code for the i80286 - Continued
<table>
<thead>
<tr>
<th>Low-level Instruction</th>
<th>Machine Instruction(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>iOUT</td>
<td>E6, E7, EE, EF</td>
</tr>
<tr>
<td>iOUTS</td>
<td>6E, 6F</td>
</tr>
<tr>
<td>iREP_OUTS</td>
<td>F36E, F36F</td>
</tr>
<tr>
<td>iPOP</td>
<td>07, 17, 1F, 58..5F, 8F</td>
</tr>
<tr>
<td>iPOPA</td>
<td>61</td>
</tr>
<tr>
<td>iPOPF</td>
<td>9D</td>
</tr>
<tr>
<td>iPUSH</td>
<td>06, 0E, 16, 1E, 50..57, 68, 6A, FE70..FE77, FEB0..FEB7, FEF0..FEF7, FF70..FF77, FFB0..FFB7, FFF0..FFF7</td>
</tr>
<tr>
<td>iPUSHA</td>
<td>60</td>
</tr>
<tr>
<td>iPUSHF</td>
<td>9C</td>
</tr>
<tr>
<td>iRCL</td>
<td>(C0,C1,D0..D3)(50..57,90..97,D0..D7)</td>
</tr>
<tr>
<td>iRCR</td>
<td>(C0,C1,D0..D3)(58..5F,98..9F,D8..DF)</td>
</tr>
<tr>
<td>iREPE</td>
<td>F3</td>
</tr>
<tr>
<td>iREPNE</td>
<td>F2</td>
</tr>
<tr>
<td>iRET</td>
<td>C2, C3</td>
</tr>
<tr>
<td>iRETF</td>
<td>CA, CB</td>
</tr>
<tr>
<td>iROL</td>
<td>(C0,C1,D0..D3)(40..47,80..87,C0..C7)</td>
</tr>
<tr>
<td>iROR</td>
<td>(C0,C1,D0..D3)(48..4F,88..8F,C8..CF)</td>
</tr>
<tr>
<td>iSAHF</td>
<td>9E</td>
</tr>
<tr>
<td>iSAR</td>
<td>(C0,C1,D0..D3)(78..7F,B8..BF,F8..FF)</td>
</tr>
<tr>
<td>iSHL</td>
<td>(C0,C1,D0..D3)(60..67,A0..A7,E0..E7)</td>
</tr>
<tr>
<td>iSHR</td>
<td>(C0,C1,D0..D3)(68..6F,A8..AF,E8..EF)</td>
</tr>
<tr>
<td>iSBB</td>
<td>18..1D, (80..83)(58..5F,98..9F,D8..DF)</td>
</tr>
<tr>
<td>iSCAS</td>
<td>AE, AF</td>
</tr>
<tr>
<td>iREPE_SCAS</td>
<td>F3AE, F3AF</td>
</tr>
<tr>
<td>iREPNE_SCAS</td>
<td>F2AE, F2AF</td>
</tr>
<tr>
<td>iSIGNEX</td>
<td>98, 99</td>
</tr>
<tr>
<td>iSTC</td>
<td>F9</td>
</tr>
<tr>
<td>iSTD</td>
<td>FD</td>
</tr>
<tr>
<td>iSTI</td>
<td>FB</td>
</tr>
<tr>
<td>iSTOS</td>
<td>AA, AB</td>
</tr>
<tr>
<td>iREP_STOS</td>
<td>F3AA, F3AB</td>
</tr>
<tr>
<td>iSUB</td>
<td>28..2D, (80..83)(68..6F,A8..AF,E8..EF)</td>
</tr>
<tr>
<td>iTEST</td>
<td>84, 85, A8, A9, F640..F647, F680..F687, F6C0..F6C7, F740..F747, F780..F787, F7C0..F7C7</td>
</tr>
<tr>
<td>iWAIT</td>
<td>9B</td>
</tr>
<tr>
<td>iXCHG</td>
<td>86, 87, 91..97</td>
</tr>
<tr>
<td>iXLAT</td>
<td>D7</td>
</tr>
<tr>
<td>iXOR</td>
<td>30..35, (80..83)(70..77,B0..B7,F0..F7)</td>
</tr>
</tbody>
</table>

Figure 9-7: Low-level Intermediate Code for the i80286 - Continued

```c
typedef struct _BB {
    byte nodeType;            /* Type of node */
    Int start;                /* First instruction offset */
    Int finish;               /* Last instruction in this BB */
    flags32 flg;              /* BB flags */
    Int numHlIcodes;          /* # of high-level Icodes */
    Int numInEdges;           /* Number of in edges */
    struct _BB **inEdges;     /* Array of pointers to in-edges */
    Int numOutEdges;          /* Number of out edges */
    union typeAdr {
        dword ip;             /* Out edge Icode address */
        struct _BB *BBptr;    /* Out edge pointer to successor BB */
        interval *intPtr;     /* Out edge pointer to next interval */
    } *outEdges;              /* Array of pointers to out-edges */

    /* For interval and derived sequence construction */
    Int beenOnH;              /* #times been on header list H */
    Int inEdgeCount;          /* # inEdges (to find intervals) */
    struct _BB *reachingInt;  /* Reaching interval header */
    interval *inInterval;     /* Node's interval */
    interval *correspInt;     /* Corresponding interval in Gi-1 */

    /* For live register analysis */
    dword liveUse;            /* LiveUse(b) */
    dword def;                /* Def(b) */
    dword liveIn;             /* LiveIn(b) */
    dword liveOut;            /* LiveOut(b) */

    /* For structuring analysis */
    Int preorder;             /* DFS #: first visit of the node */
    Int revPostorder;         /* DFS #: last visit of the node */
    Int immedDom;             /* Immediate dominator (revPostorder) */
    Int ifFollow;             /* follow node (if node is 2-way) */
    Int loopType;             /* Type of loop (if any) */
    Int latchNode;            /* latching node of the loop */
    Int numBackEdges;         /* # of back edges */
    Int loopFollow;           /* node that follows the loop */
    Int caseFollow;           /* follow node for n-way node */

    /* Other fields */
    Int traversed;            /* Boolean: traversed yet? */
    struct _BB *next;         /* Next (initial list link) */
} BB;
```

Figure 9-8: Basic Block Record

In a similar way, a pre increment/decrement makes use of an intermediate register. The variable is first incremented/decremented, then it is moved into a register, which is compared against another identifier, and then the conditional jump occurs. In this case, the intermediate register is used because identifiers other than a register cannot be used in the compare instruction. This intermediate register can be eliminated by means of a pre increment/decrement operator, as shown in Figure 9-10.
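The pre-decrement variant differs only in the order of the update and the copy. The function names are hypothetical, and the assembler in the comments is illustrative.

```c
/* As compiled: update the variable, load it into a register, compare. */
int pre_dec_low(int *var, int limit)
{
    *var = *var - 1;     /* dec var */
    int reg = *var;      /* mov reg, var */
    return reg > limit;  /* cmp reg, limit ; jg ... */
}

/* After idiom recognition: the register is folded into --var. */
int pre_dec_hll(int *var, int limit)
{
    return --(*var) > limit;
}
```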

C-dependent idioms are implemented in dcc. As seen in the general format of these idioms, a series of low-level Icode instructions is replaced by one high-level \texttt{jcond} instruction. This instruction is flagged as being a high-level instruction so that it is not processed again by the data flow analyzer. Also, all other instructions involved in these idioms are flagged as not representing high-level instructions.

After idiom recognition, simple type propagation is done on signed integers, signed bytes, and long variables. When propagating long variables across conditionals, the propagation modifies the control flow graph by removing a node from the graph, as described in Chapter 4, Section 4.2.2. Since the high-level condition is determined from the shape of the graph, the corresponding high-level \texttt{jcond} instruction is written and flagged as a high-level instruction.

### 9.4 The Disassembler

dcc implements a built-in disassembler that generates assembler files. The assembly file contains only information on the assembler mnemonics of the program (i.e. the code segment) and does not display any information relating to data. All the information used by the disassembler is collected by the parser and intermediate code phases of the decompiler; since there is almost a 1:1 mapping of low-level Icodes to assembler mnemonics, the assembler code generator is mostly concerned with output formatting.

The disassembler handles one subroutine at a time; given a call graph, the graph is traversed in a depth-first search to generate assembler for nested subroutines first. The user has two options for generating assembler files: to generate assembler straight after the parsing phase, and to generate assembler after graph optimization. The former case generates assembler that is as close as possible to the binary image; the latter case may miss certain jump instructions that were considered redundant by the graph optimizer. The disassembler is also used by the decompiler when generating target C code; if a subroutine is flagged as being a non high-level subroutine, assembler code is generated for that subroutine after generating the subroutine’s header and comments in C.
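Emitting nested subroutines first is a post-order depth-first walk of the call graph. A sketch with a hypothetical adjacency-matrix `CallGraph` and a fixed bound `MAX_PROCS`; the `done` array also stops the walk on recursive calls, where "nested first" cannot be fully honoured.

```c
#include <stdbool.h>

#define MAX_PROCS 16   /* hypothetical bound for the sketch */

/* calls[p][q] is true if subroutine p calls subroutine q. */
typedef struct {
    int n;
    bool calls[MAX_PROCS][MAX_PROCS];
} CallGraph;

/* Visit callees before their caller; order[] receives the emission order. */
static void visit(const CallGraph *g, int p, bool *done, int *order, int *k)
{
    done[p] = true;
    for (int q = 0; q < g->n; q++)
        if (g->calls[p][q] && !done[q])
            visit(g, q, done, order, k);
    order[(*k)++] = p;   /* assembler for p would be written here */
}

int emit_order(const CallGraph *g, int root, int *order)
{
    bool done[MAX_PROCS] = { false };
    int k = 0;
    visit(g, root, done, order, &k);
    return k;
}
```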

### 9.5 The Universal Decompiling Machine

The universal decompiling machine (udm) is composed of two phases; the data flow analysis phase which transforms the low-level Icode to an optimal high-level Icode representation, and the control flow analysis phase which traverses the graph of each subroutine to determine the bounds of loops and conditionals; these bounds are later used by the code generator. Figure 9-11 shows the code for the udm() procedure.

### 9.5.1 Data Flow Analysis

The first part of the data flow analysis is the removal of condition codes. Condition codes are classified into two sets: the set of condition codes that are likely to have been generated by a compiler (the HLCC set), and the set of condition codes that are likely to have been hand-crafted in assembler (the NHLCC set). Of the 9 condition codes available in the Intel i80286 [Int86] (overflow, direction, interrupt enable, trap, sign, zero, auxiliary carry, parity, and carry), only 4 are likely to be high-level: carry, direction, zero, and sign. These flags are modified by instructions that are likely to be high-level (i.e. the ones that were not flagged as non high-level), and thus this set is the one analyzed for condition code removal. Among the probable high-level instructions, 30 define flags in the HLCC set (each defining from 1 to 3 flags), and 25 use flags (normally one or two flags per instruction). dcc implements dead-condition code elimination and condition code propagation, as described in Chapter 5, Sections 5.4.2 and 5.4.3. These optimizations remove all references to condition codes and create jcond instructions that have an associated Boolean conditional expression. This analysis is overlapped with the initial mapping of all other low-level Icodes to high-level Icodes in terms of registers; this initial mapping is explained in Appendix D.

```c
void udm (CALL_GRAPH *callGraph)
{
    derSeq *derivedG;

    /* Data flow analysis - optimizations on Icode */
    dataFlow (callGraph);

    /* Control flow analysis -- structure the graphs */
    /* Build derived sequences for each subroutine */
    buildDerivedSeq (callGraph, &derivedG);

    if (option.VeryVerbose)  /* display derived sequence for each subroutine */
        displayDerivedSeq (derivedG);

    /* Graph structuring */
    structure (callGraph, derivedG);
}
```

Figure 9-11: Procedure for the Universal Decompiling Machine
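Dead condition-code elimination within a basic block can be sketched as a backward scan that tracks which HLCC flags are still needed. The bit values and the `FlagInfo` type are hypothetical; flags live across block boundaries enter only through `liveOut`, and the subsequent condition code propagation into a jcond is not shown.

```c
#include <stdint.h>

enum { F_CF = 1, F_DF = 2, F_ZF = 4, F_SF = 8 };   /* HLCC set */

typedef struct { uint8_t def, use; } FlagInfo;      /* flags defined / used */

/* Scan backwards; dead[i] receives the flags instruction i defines
 * that no later instruction (or successor block) uses. */
void dead_flags(const FlagInfo *ins, int n, uint8_t liveOut, uint8_t *dead)
{
    uint8_t live = liveOut;
    for (int i = n - 1; i >= 0; i--) {
        dead[i] = ins[i].def & (uint8_t)~live;
        live = (uint8_t)((live & ~ins[i].def) | ins[i].use);
    }
}
```

For a `sub`, `cmp`, `jz` sequence, every flag set by `sub` is dead and only the zero flag set by `cmp` survives, which is what allows the compare and the conditional jump to be merged into one jcond.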

The second part of the analysis is the generation of summary information on the operands of the Icode instructions and basic blocks in the graph. For each subroutine, a definition-use and use-definition analysis is done, and the associated chains are constructed for each instruction. While constructing these chains, dead-register elimination is performed, as described in Chapter 5, Section 5.4.1. Next, an intraprocedural live register analysis is performed for each subroutine to determine any register arguments used by the subroutine; this analysis is described in Chapter 5, Section 5.4.4. Finally, an interprocedural live register analysis is done to determine registers that are returned by functions; this analysis is described in Chapter 5, Section 5.4.5.

Dead-register elimination determines the purpose of the DIV machine instruction, as this instruction is used for both the quotient and the remainder of its operands. Consider the following intermediate code:

```
1  asgn tmp, dx:ax    ; ud(tmp) = {2,3}
2  asgn ax, tmp / bx  ; ud(ax) = {}
3  asgn dx, tmp % bx  ; ud(dx) = {4}
4  asgn [bp-2], dx    /* no further use of ax before redefinition */
```

Dead-register elimination determines that register ax is not used before redefinition, as its use-definition chain in instruction 2 is empty. Since this definition is dead, the instruction is eliminated, which eliminates the division of the operands and leads to the following code:

```
1  asgn tmp, dx:ax    ; ud(tmp) = {2,3}
3  asgn dx, tmp % bx  ; ud(dx) = {4}
4  asgn [bp-2], dx
```

All instructions that had instruction 2 in their use-definition chain need to be updated to reflect the fact that the register is no longer used, since it was used to define a dead register; hence, the ud() chain in instruction 1 is updated in this example, leading to the final code:

```
1  asgn tmp, dx:ax    ; ud(tmp) = {3}
3  asgn dx, tmp % bx  ; ud(dx) = {4}
4  asgn [bp-2], dx
```
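The example can be replayed with a small iterative sketch: a register definition whose use set is empty is deleted, and deleting it shrinks the use sets of the definitions it read, which may make those dead in turn. The `Inst` layout (which keeps only the size of each use set) and the name `dead_reg_elim` are hypothetical.

```c
#include <stdbool.h>

#define MAXI 8   /* hypothetical bound on operands read per instruction */

typedef struct {
    bool defsReg;    /* defines a register (stores to memory are never dead) */
    int nUses;       /* size of the definition's use set, e.g. |ud(tmp)| */
    int nSrc;        /* number of earlier definitions this instruction reads */
    int src[MAXI];   /* indices of those defining instructions */
    bool dead;
} Inst;

/* Delete dead register definitions until nothing changes. */
void dead_reg_elim(Inst *ins, int n)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (ins[i].dead || !ins[i].defsReg || ins[i].nUses > 0)
                continue;
            ins[i].dead = true;
            for (int k = 0; k < ins[i].nSrc; k++)
                ins[ins[i].src[k]].nUses--;   /* drop i from their use sets */
            changed = true;
        }
    }
}
```

Encoding instructions 1 to 4 of the example at indexes 0 to 3, with the store to [bp-2] marked as not defining a register, the sketch deletes only instruction 2 and leaves tmp with one use, matching ud(tmp) = {3}.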

The third and last part of the analysis uses the use-definition chains on registers to perform extended register copy propagation, as described in Chapter 5, Section 5.4.10. This analysis removes redundant register references, determines high-level expressions, places actual parameters on the subroutine's list, and propagates argument types across subroutine calls. A temporary expression stack is used throughout the analysis to eliminate the intermediate pseudo high-level instructions `push` and `pop`.

In the previous example, forward substitution determines that the initial `DIV` instruction was used to determine the modulus between two operands (which are placed in registers `dx:ax` and `bx` in this case):

```
4  asgn [bp-2], dx:ax % bx
```
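Forward substitution of a single-use register definition into its use can be sketched on expression trees. `Expr`, `subst` and `show` are hypothetical, and the sketch assumes the caller has already checked, through the chains, that each definition has exactly one use and that its operands are not redefined in between.

```c
#include <stdio.h>
#include <string.h>

/* Minimal expression tree: a leaf has op == 0 and a name. */
typedef struct Expr {
    char op;               /* '/', '%', ... or 0 for a leaf */
    const char *name;      /* register or variable name for leaves */
    struct Expr *l, *r;
} Expr;

/* Replace each leaf naming reg by def, the expression reg was assigned. */
Expr *subst(Expr *e, const char *reg, Expr *def)
{
    if (e->op == 0)
        return strcmp(e->name, reg) == 0 ? def : e;
    e->l = subst(e->l, reg, def);
    e->r = subst(e->r, reg, def);
    return e;
}

/* Print an expression into buf (no parentheses; enough for this example). */
int show(const Expr *e, char *buf, size_t n)
{
    if (e->op == 0)
        return snprintf(buf, n, "%s", e->name);
    char a[64], b[64];
    show(e->l, a, sizeof a);
    show(e->r, b, sizeof b);
    return snprintf(buf, n, "%s %c %s", a, e->op, b);
}
```

Substituting `tmp` (defined as dx:ax by instruction 1) into `tmp % bx`, and the result into the right-hand side `dx` of instruction 4, yields `dx:ax % bx`.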

### 9.5.2 Control Flow Analysis

There are two parts to the control flow analyzer. The first part constructs a derived sequence of graphs for each subroutine in the call graph and calculates intervals. This sequence is used by the structuring algorithm to determine the bounds of loops and the nesting level of such loops. Once the derived sequence of graphs has been built for a subroutine, the graph is tested for reducibility: if the limit n-th order graph of the sequence is not a trivial graph (a single node), the subroutine is irreducible.
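The interval partition and the reducibility test can be sketched in C. This is a minimal sketch over an adjacency matrix with hypothetical names (`Graph`, `intervals`, `derive`, `isReducible`). It assumes that node 0 is the entry and that every node is reachable from it. It applies the classical interval construction:

- an interval I(h) grows by adding any node all of whose predecessors are already in I(h);
- nodes entered from I(h) but not absorbed by it become new headers.

Collapsing each interval into one node gives the next graph in the derived sequence. The sequence stops when a graph can be reduced no further.

```c
#include <stdbool.h>
#include <string.h>

#define MAXN 16

/* Hypothetical flow graph: node 0 is the entry node. */
typedef struct {
    int  n;                 /* number of nodes        */
    bool edge[MAXN][MAXN];  /* edge[a][b]: arc a -> b */
} Graph;

/* Partition g into intervals, storing the interval of node v in
 * interval[v]; returns the number of intervals. */
static int intervals(const Graph *g, int interval[MAXN])
{
    int headers[MAXN], nHeaders = 1, count = 0;
    bool isHeader[MAXN] = { false };

    headers[0] = 0;
    isHeader[0] = true;
    for (int v = 0; v < g->n; v++)
        interval[v] = -1;

    for (int k = 0; k < nHeaders; k++) {
        int h = headers[k], id = count++;
        interval[h] = id;

        /* Grow I(h) with nodes all of whose predecessors are in I(h). */
        bool grown = true;
        while (grown) {
            grown = false;
            for (int m = 1; m < g->n; m++) {
                if (interval[m] != -1 || isHeader[m])
                    continue;
                bool hasPred = false, allIn = true;
                for (int p = 0; p < g->n; p++)
                    if (g->edge[p][m]) {
                        hasPred = true;
                        if (interval[p] != id)
                            allIn = false;
                    }
                if (hasPred && allIn) {
                    interval[m] = id;
                    grown = true;
                }
            }
        }

        /* Unplaced nodes entered from I(h) become new interval headers. */
        for (int m = 1; m < g->n; m++) {
            if (interval[m] != -1 || isHeader[m])
                continue;
            for (int p = 0; p < g->n; p++)
                if (g->edge[p][m] && interval[p] == id) {
                    headers[nHeaders++] = m;
                    isHeader[m] = true;
                    break;
                }
        }
    }
    return count;
}

/* Collapse each interval into one node to obtain the next derived graph. */
static void derive(const Graph *g, const int interval[MAXN], int count,
                   Graph *out)
{
    memset(out, 0, sizeof *out);
    out->n = count;
    for (int a = 0; a < g->n; a++)
        for (int b = 0; b < g->n; b++)
            if (g->edge[a][b] && interval[a] != interval[b])
                out->edge[interval[a]][interval[b]] = true;
}

/* Build the derived sequence until a graph no longer shrinks; the
 * graph is reducible iff that limit graph is trivial. */
bool isReducible(const Graph *g0)
{
    Graph cur = *g0, next;
    int interval[MAXN];

    for (;;) {
        int count = intervals(&cur, interval);
        if (count == cur.n)
            break;
        derive(&cur, interval, count, &next);
        cur = next;
    }
    return cur.n == 1;
}
```

Two examples show the difference:

- A loop 0→1→2→1 with exit 2→3 reduces to a single node.
- The graph 0→1, 0→2, 1→2, 2→1 (a loop with two entries) is its own limit graph with three nodes, so it is irreducible.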

The second part of the analysis is the structuring of the control flow graphs of the program. The structuring algorithm determines the bounds of loops and conditionals (2-way and n-way structures); these bounds are later used during code generation. Loops are structured by means of intervals, and their nesting level is determined by the order in which they are found in the derived sequence of graphs, as described in Chapter 6, Section 6.6.1. Pre-tested, post-tested and endless loops are determined by this algorithm. Conditionals are structured by means of a reverse traversal of the depth-first search tree of the graph; in this way nested conditionals are found first. The method for structuring 2-way and n-way conditionals is described in Chapter 6, Sections 6.6.2 and 6.6.3. This method takes into account compound Boolean conditions, and removes some nodes from the graph by storing the Boolean conditional information of two or more nodes in the one node.
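Merging the nodes of a compound condition can be sketched in C for one pattern, the short-circuit `x && y`. The pattern requires three things:

- node x branches to y when its condition is true;
- y is reached only from x and contains nothing but its own test;
- x and y both branch to the same node when false.

The `Node` record and `mergeAnd` are hypothetical, and conditions are kept as text for brevity. The other combinations of the two branches would be handled by analogous checks.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define COND_LEN 128

/* Hypothetical summary of a node of the control flow graph. */
typedef struct {
    char cond[COND_LEN]; /* branch condition of a 2-way node       */
    int  thenSucc;       /* successor when cond is true            */
    int  elseSucc;       /* successor when cond is false           */
    int  numPreds;       /* number of in-edges                     */
    bool onlyBranch;     /* node holds nothing but its conditional */
    bool isTwoWay;
    bool removed;
} Node;

/* Merge 2-way node x with its true successor y when together they
 * implement "x && y"; returns true if the nodes were merged. */
bool mergeAnd(Node *g, int x)
{
    Node *nx = &g[x];
    if (!nx->isTwoWay || nx->removed)
        return false;

    int y = nx->thenSucc;
    Node *ny = &g[y];
    if (!ny->isTwoWay || ny->removed || !ny->onlyBranch ||
        ny->numPreds != 1 || ny->elseSucc != nx->elseSucc)
        return false;

    char merged[COND_LEN];
    snprintf(merged, sizeof merged, "(%s) && (%s)", nx->cond, ny->cond);
    strcpy(nx->cond, merged);

    nx->thenSucc = ny->thenSucc;  /* x now branches to y's true target */
    ny->removed = true;           /* y's condition now lives in x      */
    g[nx->elseSucc].numPreds--;   /* the arc y -> else has disappeared */
    return true;
}
```

After the merge, node x carries the whole Boolean condition, and node y disappears from the graph. This matches the description above of storing the conditional information of two nodes in one.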

### 9.6 The Back-end

The back-end consists entirely of the C code generator. This module opens the output file, gives it the extension `.b` (b for beta), writes the program header to it, and then invokes the code generator. Once code has been generated for the complete graph, the file is closed. Figure 9-12 shows code for the back-end procedure.

```c
void BackEnd (char *fileName, CALL_GRAPH *callGraph)
{ FILE *fp; /* Output C file */

    /* Open output file with extension .b */
    openFile (fp, fileName, ".b", "wt");
    printf ("dcc: Writing C beta file %s.b\n", fileName);

    /* Header information */
    writeHeader (fp, fileName);

    /* Process each procedure at a time */
    writeCallGraph (fileName, callGraph, fp);

    /* Close output file */
    fclose (fp);
    printf ("dcc: Finished writing C beta file\n");
}
```

Figure 9-12: Back-end Procedure

### 9.6.1 Code Generation

dcc implements the C code generator described in Chapter 7, Section 7.1. The program’s call graph is traversed in a depth-first fashion to generate C code for the leaf subroutines first (i.e. in reverse invocation order if the graph is reducible). For each subroutine, code for the control flow graph is generated according to the structures in the graph; the bounds of loops and conditional structures have been marked in the graph by the structuring phase. Code is generated in a recursive way, so if a node is reached twice along the recursion, a goto jump is used to transfer control to the code associated with such a node.
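A sketch of this recursive generation in C follows. It is restricted, for brevity, to nodes with a single successor, so a node can only be reached twice through a back edge that no structured loop has claimed. The names (`Node`, `writeCode`, the `lines` buffer) are hypothetical. The need for a label on a node only becomes known at the second visit. The label is therefore added to a line that has already been generated, which requires the generated code to still be in memory.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINES 32
#define LINE_LEN  80

/* Hypothetical node: one C statement and one successor (-1 = none). */
typedef struct {
    const char *stmt;
    int  succ;
    int  firstLine;  /* buffered line holding the node's code, -1 if none */
    bool labelled;   /* a label has been attached to firstLine            */
} Node;

static char lines[MAX_LINES][LINE_LEN];  /* code buffered per subroutine */
static int  numLines;

/* Generate code for node v and its successors. A node reached a second
 * time is not regenerated: a goto is emitted instead, and a label is
 * back-patched onto the line already generated for that node. */
void writeCode(Node *g, int v)
{
    if (v < 0 || numLines >= MAX_LINES)
        return;
    if (g[v].firstLine >= 0) {
        if (!g[v].labelled) {
            char tmp[LINE_LEN];
            snprintf(tmp, sizeof tmp, "L%d: %s", v, lines[g[v].firstLine]);
            strcpy(lines[g[v].firstLine], tmp);
            g[v].labelled = true;
        }
        snprintf(lines[numLines++], LINE_LEN, "goto L%d;", v);
        return;
    }
    g[v].firstLine = numLines;
    snprintf(lines[numLines++], LINE_LEN, "%s", g[v].stmt);
    writeCode(g, g[v].succ);
}
```

Consider nodes 0 → 1 → 2 → 1 with the statements `a = 1;`, `b = 2;` and `c = 3;`. The buffer ends up holding `a = 1;`, `L1: b = 2;`, `c = 3;` and `goto L1;`.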

Code cannot be generated directly to a file, for two reasons:

- registers found in the leaves of an expression are given a name only during code generation (i.e. after all local variables have been defined in the local variable definition section);
- an instruction for which code has already been generated may later need a label, if a goto jump to it is generated.

Code is therefore stored in an intermediate data structure until the code for the complete subroutine has been generated. It is then copied to the target output file, and the structure is reused for the next subroutine in the call graph.

The data structure used by dcc to handle subroutine declarations and code is called a bundle. A bundle is composed of two arrays of lines, one for subroutine declarations and the other for the subroutine code. Subroutine declarations include not only the subroutine header, but also the comments and the local variable definitions. The array of lines grows dynamically if the initially allocated size is too small. The definition of the bundle data structure is shown in Figure 9-13.

```c
typedef struct {
    Int numLines;    /* Number of lines in the table */
    Int allocLines;  /* Number of lines allocated in the table */
    char **str;      /* Table of strings */
} strTable;

typedef struct {
    strTable decl;   /* Declarations */
    strTable code;   /* C code */
} bundle;
```

Figure 9-13: Bundle Data Structure Definition
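The dynamic growth of a line table can be sketched in C. This is an illustration under assumptions, not dcc's code. `StrTable` mirrors the layout of `strTable`, with plain `int` in place of dcc's `Int`. The function names, the initial size of 16 lines and the doubling policy are hypothetical choices.

```c
#include <stdlib.h>
#include <string.h>

/* Same layout as strTable, with int in place of dcc's Int. */
typedef struct {
    int    numLines;    /* number of lines in the table   */
    int    allocLines;  /* number of lines allocated      */
    char **str;         /* table of strings               */
} StrTable;

/* Append a copy of line, doubling the allocation when the table is full.
 * Returns 0 on success, -1 if memory is exhausted. */
int appendLine(StrTable *t, const char *line)
{
    if (t->numLines == t->allocLines) {
        int newAlloc = t->allocLines ? 2 * t->allocLines : 16;
        char **p = realloc(t->str, (size_t)newAlloc * sizeof *p);
        if (p == NULL)
            return -1;
        t->str = p;
        t->allocLines = newAlloc;
    }
    size_t len = strlen(line) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, line, len);
    t->str[t->numLines++] = copy;
    return 0;
}

/* Free the lines but keep the array, so the table can be reused for
 * the next subroutine. */
void clearTable(StrTable *t)
{
    for (int i = 0; i < t->numLines; i++)
        free(t->str[i]);
    t->numLines = 0;
}
```

Doubling keeps the amortized cost of each append constant. Keeping the allocated array in `clearTable` lets the same table be reused from one subroutine to the next, as the bundle is.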

The comments and error messages displayed by dcc are listed in Appendix E.

### 9.7 Results

This section presents a series of programs decompiled by dcc. The original programs were written in C and compiled with Borland Turbo C under DOS. These programs make use of base type variables (i.e. byte, integer and long) and illustrate different aspects of the decompilation process. The programs were run in batch mode, which generated four outputs for each:

- the disassembly file `.a2`;
- the C file `.b`;
- the call graph of the program;
- statistics on the intermediate code instructions.

The statistics give the percentage reduction in intermediate instructions over all subroutines for which C is generated; subroutines that are translated to assembler are not considered. For each program, the total counts of low-level and high-level instructions and the total percentage reduction are given.
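The percentages in the statistics tables that follow are consistent with the formula reduction = 100 × (low - high) / low. For example, intops has 45 low-level and 10 high-level instructions, giving 100 × 35 / 45 ≈ 77.78. A one-line C helper reproduces the figures; the name `reductionPercent` is hypothetical.

```c
/* Percentage reduction from the number of low-level (Icode) instructions
 * to the number of high-level instructions. */
double reductionPercent(int lowLevel, int highLevel)
{
    if (lowLevel <= 0)
        return 0.0;
    return 100.0 * (double)(lowLevel - highLevel) / (double)lowLevel;
}
```

The same helper gives 58.97 for the longops total (117 and 48) and 75.25 for benchsho (101 and 25).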

The first three programs illustrate operations on the three base types: the original C programs have the same code, but their variables are defined with a different type. The next four programs are benchmark programs from the Plum-Hall benchmark suite. These programs were written by Eric S. Raymond and are freely available on the network [Ray89]. They were modified to read their arguments with scanf() rather than from the argv[] command-line array, since arrays are not supported by dcc. The last three programs calculate Fibonacci numbers, compute the cyclic redundancy check (CRC) for a character, and multiply two matrices. The matrix program is included to show how array expressions are derived from the low-level intermediate code.

### 9.7.1 Intops.exe

Intops is a program that performs different operations on two integer variables and displays the final values of these variables. The disassembly program is shown in Figure 9-14, the decompiled C program in Figure 9-15, and the initial C program in Figure 9-16. The program's call graph consists of `main`, which calls only `printf`.

As can be seen in the disassembly of the program, the second variable was placed in register `si`, and the first variable was placed on the stack at offset `-2`. The parser generated synthetic instructions for the `IDIV` machine instruction, which was used as a division in one case and as a modulus in the other. The intermediate code makes use of the temporary register `tmp`, as previously explained in Section 9.3.2; this register is eliminated during data flow analysis. For each operation, the operands of the instruction are moved to registers, the operation is performed on registers, and the result is stored back into the variables. There are no control structures in the program. The idiom and data flow analyses reduce the number of intermediate instructions by 77.78%, as shown in Figure 9-17.
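A single `IDIV` produces both the quotient and the remainder, so the parser emits the synthetic `MOD` alongside it, and data flow analysis later keeps whichever result is used. A minimal C model of the `CWD`/`IDIV` pair illustrates this; the names are hypothetical. For 16-bit operands, the quotient and remainder coincide with C's `/` and `%`, assuming C99 truncation toward zero. Translating the pair to these operators therefore preserves the results.

```c
#include <stdint.h>

/* Hypothetical model of CWD followed by IDIV with a 16-bit operand. */
typedef struct {
    int16_t ax;  /* quotient  */
    int16_t dx;  /* remainder */
} IdivResult;

/* CWD sign-extends ax into dx:ax; IDIV then divides the 32-bit dividend,
 * leaving the quotient in ax and the remainder in dx. Both truncate
 * toward zero, as C99's / and % do. Assumes divisor != 0 and a quotient
 * that fits in 16 bits (the processor raises a divide error otherwise). */
IdivResult cwdIdiv(int16_t ax, int16_t divisor)
{
    int32_t dividend = ax;  /* CWD */
    IdivResult r;
    r.ax = (int16_t)(dividend / divisor);
    r.dx = (int16_t)(dividend % divisor);
    return r;
}
```

For a negative dividend such as -7 divided by 2, both the processor and C give quotient -3 and remainder -1.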

```
main PROC NEAR
000 0002FA 55         PUSH  bp
001 0002FB 8BEC       MOV   bp, sp
002 0002FD 83EC02     SUB   sp, 2
003 000300 56         PUSH  si
004 000301 C746FEFF00 MOV   word ptr [bp-2], 0FFh
005 000306 BE8F00     MOV   si, 8Fh
006 000309 8B46FE     MOV   ax, [bp-2]
007 00030C 03C6       ADD   ax, si
008 00030E 8BF0       MOV   si, ax
009 000310 8B46FE     MOV   ax, [bp-2]
010 000313 2BC6       SUB   ax, si
011 000315 8946FE     MOV   [bp-2], ax
012 000318 8B46FE     MOV   ax, [bp-2]
013 00031B F7E6       MUL   si
014 00031D 8946FE     MOV   [bp-2], ax
015 000320 8BC6       MOV   ax, si
016 000322 99         CWD
017                   MOV   tmp, dx:ax           ;Synthetic inst
018 000323 F77EFE     IDIV  word ptr [bp-2]
019                   MOD   word ptr [bp-2]      ;Synthetic inst
020 000326 8BF0       MOV   si, ax
021 000328 8BC6       MOV   ax, si
022 00032A 99         CWD
023                   MOV   tmp, dx:ax           ;Synthetic inst
024 00032B F77EFE     IDIV  word ptr [bp-2]
025                   MOD   word ptr [bp-2]      ;Synthetic inst
026 00032E 8BF2       MOV   si, dx
027 000330 8B46FE     MOV   ax, [bp-2]
028 000333 B105       MOV   cl, 5
029 000335 D3E0       SHL   ax, cl
030 000337 8946FE     MOV   [bp-2], ax
031 00033A 8BC6       MOV   ax, si
032 00033C 8A4EFE     MOV   cl, [bp-2]
033 00033F D3F8       SAR   ax, cl
034 000341 8BF0       MOV   si, ax
035 000343 56         PUSH  si
036 000344 FF76FE     PUSH  word ptr [bp-2]
037 000347 B89401     MOV   ax, 194h
038 00034A 50         PUSH  ax
039 00034B E8AC06     CALL  near ptr printf
040 00034E 83C406     ADD   sp, 6
041 000351 5E         POP   si
042 000352 8BE5       MOV   sp, bp
043 000354 5D         POP   bp
044 000355 C3         RET
main ENDP
```

Figure 9-14: Intops.a2
```c
/*
 * Input file : intops.exe
 * File type  : EXE
 */

#include "dcc.h"

void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
  int loc1;
  int loc2;

  loc1 = 255;
  loc2 = 143;
  loc2 = (loc1 + loc2);
  loc1 = (loc1 - loc2);
  loc1 = (loc1 * loc2);
  loc2 = (loc2 / loc1);
  loc2 = (loc2 % loc1);
  loc1 = (loc1 << 5);
  loc2 = (loc2 >> loc1);
  printf ("a = %d, b = %d\n", loc1, loc2);
}
```

Figure 9-15: Intops.b

```c
#define TYPE int

main()
{ TYPE a, b;
    a = 255;
    b = 143;
    b = a + b;
    a = a - b;
    a = a * b;
    b = b / a;
    b = b % a;
    a = a << 5;
    b = b >> a;
    printf("a = %d, b = %d\n", a, b);
}
```

Figure 9-16: Intops.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>main</td>
<td>45</td>
<td>10</td>
<td>77.78</td>
</tr>
<tr>
<td>total</td>
<td>45</td>
<td>10</td>
<td>77.78</td>
</tr>
</tbody>
</table>

Figure 9-17: Intops Statistics

### 9.7.2 Byteops.exe

Byteops is a similar program to intops, with the difference that the two variables are bytes rather than integers. The disassembly program is shown in Figure 9-18, the decompiled C version in Figure 9-19, and the initial C program in Figure 9-20. The program has the following call graph:

```
main
   printf
```

As can be seen in the disassembly of the program, the local variables are placed on the stack at offsets -1 and -2. This program has 58 low-level instructions, 13 more than intops.a2 (22.41% of 58). The extra instructions are needed because some machine instructions, such as IDIV, take word registers rather than byte registers as operands. The byte registers are therefore either zero-extended (padded with zeros) or sign-extended to form a word register. The final number of high-level instructions is the same in both programs, so the reduction in the number of intermediate instructions is greater in this program: 82.76%, as shown in Figure 9-21.

```
main PROC NEAR
000 0002FA 55       PUSH    bp
001 0002FB 8BEC     MOV     bp, sp
002 0002FD 83EC02   SUB     sp, 2
003 000300 C646FEFF MOV     byte ptr [bp-2], 0FFh
004 000304 C646FF8F MOV     byte ptr [bp-1], 08Fh
005 000308 8A46FE   MOV     al, [bp-2]
006 00030B 0246FF   ADD     al, [bp-1]
007 00030E 8846FF   MOV     [bp-1], al
008 000311 8A46FE   MOV     al, [bp-2]
009 000314 2A46FF   SUB     al, [bp-1]
010 000317 8846FE   MOV     [bp-2], al
011 00031A 8A46FE   MOV     al, [bp-2]
012 00031D B400     MOV     ah, 0
013 00031F 8A56FF   MOV     dl, [bp-1]
014 000322 B600     MOV     dh, 0
015 000324 F7E2     MUL     dx
016 000326 8846FE   MOV     [bp-2], al
017 000329 8A46FF   MOV     al, [bp-1]
018 00032C B400     MOV     ah, 0
019 00032E 8A56FE   MOV     dl, [bp-2]
020 000331 B600     MOV     dh, 0
021 000333 8BDA     MOV     bx, dx
022 000335 99       CWD
023 000336 F7FB     MOV     tmp, dx:ax ;Synthetic inst
024 000338 8846FF   MOV     [bp-1], al
025 00033B 8A46FF   MOV     al, [bp-1]
026 00033E B400     MOV     ah, 0
027 000340 8A56FE   MOV     dl, [bp-2]
028 000343 B600     MOV     dh, 0
029 000345 8BDA     MOV     bx, dx
030 000347 99       CWD
031 000348 F7FB     MOV     tmp, dx:ax ;Synthetic inst
032 00034A 8856FF   MOV     [bp-1], dl
033 00034D 8A46FE   MOV     al, [bp-2]
034 000350 B105     MOV     cl, 5
035 000352 D2E0     SHL     al, cl
036 000354 8846FE   MOV     [bp-2], al
037 000357 8A46FF   MOV     al, [bp-1]
038 00035A 8A4EFE   MOV     cl, [bp-2]
039 00035D D2E8     SHR     al, cl
040 00035F 8846FF   MOV     [bp-1], al
```

Figure 9-18: Byteops.a2
```c
void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    int loc1;
    int loc2;

    loc1 = 255;
    loc2 = 143;
    loc2 = (loc1 + loc2);
    loc1 = (loc1 - loc2);
    loc1 = (loc1 * loc2);
    loc2 = (loc2 / loc1);
    loc2 = (loc2 % loc1);
    loc1 = (loc1 << 5);
    loc2 = (loc2 >> loc1);
    printf("a = %d, b = %d\n", loc1, loc2);
}
```

Figure 9-19: Byteops.b
```c
#define TYPE unsigned char

main()
{
    TYPE a, b;
    a = 255;
    b = 143;
    b = a + b;
    a = a - b;
    a = a * b;
    b = b / a;
    b = b % a;
    a = a << 5;
    b = b >> a;
    printf("a = %d, b = %d\n", a, b);
}
```

Figure 9-20: Byteops.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>main</td>
<td>58</td>
<td>10</td>
<td>82.76</td>
</tr>
<tr>
<td>total</td>
<td>58</td>
<td>10</td>
<td>82.76</td>
</tr>
</tbody>
</table>

Figure 9-21: Byteops Statistics

### 9.7.3 Longops.exe

The longops program is similar to the intops and byteops programs, but makes use of two long variables. The disassembly program is shown in Figure 9-22, the decompiled C program in Figure 9-23, and the initial C program in Figure 9-24. The program has the following call graph:

```
main
  LXMUL@
  LDIV@
  LMOD@
  LXLSH@
  LXRSH@
  printf
```

Operations on long variables make use of idioms and run-time support routines of the compiler. In this program, long addition and subtraction are performed by the idioms of Chapter 4, Section 4.2.1. The run-time routines `LXMUL@`, `LDIV@`, `LMOD@`, `LXLSH@` and `LXRSH@` are used for long multiplication, division, modulus, left shift and right shift, respectively. Of these run-time routines, long multiplication, left shift and right shift are translatable into C; in some cases, macros are used to access the low or high part of a variable. The division and modulus routines are untranslatable into C, so assembler is generated for them. The long variables are placed on the stack at offsets -4 and -8 (see the `main` subroutine).

The `main` subroutine has 63 low-level instructions: 18 more than intops (28.57% of 63) and 5 more than byteops (7.9% of 63). The increase has two causes. First, transferring a long variable to registers takes two instructions rather than one, since the high and low parts are transferred to different registers. Second, there are subroutine calls to the run-time support routines. The decompiled `main` still has the same number of high-level instructions as in the previous two programs, a reduction of 84.13% in the number of intermediate instructions, as shown in Figure 9-25. The overall reduction is lower, 58.97%, because of the run-time routines that were translated to C. These routines were originally written in assembler and take their arguments in registers, so they contain little redundant register movement for the analysis to remove.
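The definitions of the `LO` and `HI` macros in `dcc.h` are not shown here. The following C sketch uses hypothetical value-only versions. Note that the decompiled code also uses these macros as assignment targets, which the sketch's versions do not support. The sketch shows how a 32-bit left shift is built from 16-bit halves, following the two cases visible in `LXLSH@`:

- a count below 16 moves the top bits of the low half into the high half;
- a larger count moves the low half up entirely.

`MAKE_LONG` and `shiftLeftLong` are likewise hypothetical.

```c
#include <stdint.h>

/* Hypothetical value-only accessors for the halves of a 32-bit value. */
#define LO(x) ((uint16_t)((uint32_t)(x) & 0xFFFFu))
#define HI(x) ((uint16_t)((uint32_t)(x) >> 16))
#define MAKE_LONG(hi, lo) (((uint32_t)(hi) << 16) | (uint32_t)(lo))

/* Left shift of a 32-bit value performed on its 16-bit halves. */
uint32_t shiftLeftLong(uint32_t value, unsigned count)
{
    uint16_t lo = LO(value), hi = HI(value);

    count &= 31;  /* assumption: only counts 0..31 are meaningful */
    if (count == 0)
        return value;
    if (count < 16) {
        /* bits shifted out of the low half enter the high half */
        hi = (uint16_t)((hi << count) | (lo >> (16 - count)));
        lo = (uint16_t)(lo << count);
    } else {
        /* the low half becomes the high half, shifted further */
        hi = (uint16_t)(lo << (count - 16));
        lo = 0;
    }
    return MAKE_LONG(hi, lo);
}
```

For every count from 0 to 31, the result equals the native shift `value << count` on a `uint32_t`.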
Figure 9-22: Longops.a2
010 0011E7 8B4E10    MOV     cx, [bp+10h]
011 0011EA 0BC9    OR      cx, cx
012 0011EC 7508    JNE     L3
013 0011EE 0BD2    OR      dx, dx
014 0011F0 7469    JE      L4
015 0011F2 0BD9    OR      bx, bx
016 0011F4 7465    JE      L4
017 0011F6 F7C70100 L3: TEST   di, 1
018 0011FA 751C    JNE     L5
019 0011FC 0BD2    OR      dx, dx
020 0011FE 790A    JNS     L6
021 001200 F7DA    NEG     dx
022 001202 F7D8    NEG     ax
023 001204 83DA00  SBB    dx, 0
024 001207 83CF0C  OR      di, 0Ch
025 00120A 0BC9    L6: OR    cx, cx
026 00120C 790A    JNS     L5
027 00120E F7D9    NEG     cx
028 001210 F7DB    NEG     bx
029 001212 83D900  SBB    cx, 0
030 001215 83F704  XOR    di, 4
031 001218 8BE9    L5: MOV   bp, cx
032 00121A B92000  MOV    cx, 20h
033 00121D 57    PUSH    di
034 00121E 33FF  XOR    di, di
035 001220 33F6  XOR    si, si
036 001222 D1E0    L7: SHL   ax, 1
037 001224 D1D2    RCL    dx, 1
038 001226 D1D6    RCL    si, 1
039 001228 D1D7    RCL    di, 1
040 00122A 3BFD    CMP    di, bp
041 00122C 720B    JB     L8
042 00122E 7704    JA     L9
043 001230 3BF3    CMP    si, bx
044 001232 7205    JB     L8
045 001234 2BF3    L9: SUB   si, bx
046 001236 1BFD    SBB    di, bp
047 001238 40    INC    ax
048 001239 E2E7    L8: LOOP  L7
049 00123B 5B    POP     bx
050 00123C F7C30200 TEST    bx, 2
051 001240 7406    JE     L10
052 001242 8BC6    MOV    ax, si
053 001244 8BD7    MOV    dx, di
054 001246 D1EB    SHR    bx, 1

Figure 9-22: Longops.a2 – Continued
026 00120C 790A          JNS    L15
027 00120E F7D9          NEG    cx
028 001210 F7DB          NEG    bx
029 001212 83D900         SBB    cx, 0
030 001215 83F704         XOR    di, 4
031 001218 8BE9          L15: MOV    bp, cx
032 00121A B92000         MOV    cx, 20h
033 00121D 57            PUSH   di
034 00121E 33FF          XOR    di, di
035 001220 33F6          XOR    si, si
036 001222 D1E0          L17: SHL    ax, 1
037 001224 D1D2          RCL    dx, 1
038 001226 D1D6          RCL    si, 1
039 001228 D1D7          RCL    di, 1
040 00122A 3BFD          CMP    di, bp
041 00122C 720B          JB     L18
042 00122E 7704          JA     L19
043 001230 3BF3          CMP    si, bx
044 001232 7205          JB     L18
045 001234 2BF3          L19: SUB    si, bx
046 001236 1BFD          SBB    di, bp
047 001238 40            INC    ax
048 001239 E2E7          L18: LOOP    L17
049 00123B 5B            POP    bx
050 00123C F7C30200       TEST    bx, 2
051 001240 7406          JE     L20
052 001242 8BC6          MOV    ax, si
053 001244 8BD7          MOV    dx, di
054 001246 D1EB          SHR    bx, 1
055 001248 F7C30400       L20: TEST    bx, 4
056 00124C 7407          JE     L21
057 00124E F7DA          NEG    dx
058 001250 F7D8          NEG    ax
059 001252 83DA00         SBB    dx, 0
060 001255 5F            L21: POP    di
061 001256 5E            POP    si
062 001257 5D            POP    bp
063 001258 CA0800         RETF    8
064 00125B F7F3          DIV    bx
065                      MOD    bx     ;Synthetic inst
066 00125D F7C70200       TEST    di, 2
067 001261 7402          JE     L22
068 001263 8BC2          MOV    ax, dx
069 001265 33D2          L22: XOR    dx, dx
070 001267 EBEC          JMP    L21

LDIV@ ENDP

Figure 9-22: Longops.a2 – Continued
LXMUL@ PROC FAR
000 0009C3 56  PUSH  si
001  MOV  tmp, ax  ;Synthetic inst
002  MOV  ax, si  ;Synthetic inst
003  MOV  si, tmp  ;Synthetic inst
004  MOV  tmp, ax  ;Synthetic inst
005  MOV  ax, dx  ;Synthetic inst
006  MOV  dx, tmp  ;Synthetic inst
007 0009C6 85C0  TEST  ax, ax
008 0009C8 7402  JE   L23
009 0009CA F7E3  MUL  bx
010 L23:  MOV  tmp, ax  ;Synthetic inst
011  MOV  ax, cx  ;Synthetic inst
012  MOV  cx, tmp  ;Synthetic inst
013 0009CD 85C0  TEST  ax, ax
014 0009CF 7404  JE   L24
015 0009D1 F7E6  MUL  si
016 0009D3 03C8  ADD  cx, ax
017 L24:  MOV  tmp, ax  ;Synthetic inst
018  MOV  ax, si  ;Synthetic inst
019  MOV  si, tmp  ;Synthetic inst
020 0009D6 F7E3  MUL  bx
021 0009D8 03D1  ADD  dx, cx
022 0009DA 5E  POP  si
023 0009DB CB  RETF
LXMUL@ ENDP

main PROC NEAR
000 0002FA 55  PUSH  bp
001 0002FB 8BEC  MOV  bp, sp
002 0002FD 83EC08  SUB  sp, 8
003 000300 C746FA0000  MOV  word ptr [bp-6], 0
004 000305 C746F8FF00  MOV  word ptr [bp-8], 0FFh
005 00030A C746FE0000  MOV  word ptr [bp-2], 0
006 00030F C746FC8F00  MOV  word ptr [bp-4], 8Fh
007 000314 8B56FA  MOV  dx, [bp-6]
008 000317 8B46F8  MOV  ax, [bp-8]
009 00031A 0346FC  ADD  ax, [bp-4]
010 00031D 1356FE  ADC  dx, [bp-2]
011 000320 8956FE  MOV  [bp-2], dx
012 000323 8946FC  MOV  [bp-4], ax
013 000326 8B56FA  MOV  dx, [bp-6]
014 000329 8B46F8  MOV  ax, [bp-8]
015 00032C 2B46FC  SUB  ax, [bp-4]
016 00032F 1B56FE  SBB  dx, [bp-2]
017 000332 8956FA  MOV  [bp-6], dx

Figure 9-22: Longops.a2 – Continued
main ENDP

Figure 9-22: Longops.a2 – Continued

```c
/*
 * Input file : longops.exe
 * File type  : EXE
 */

#include "dcc.h"

long LXMUL@ (long arg0, long arg1)
/* Uses register arguments:
 *   arg0 = dx:ax.
 *   arg1 = cx:bx.
 * Runtime support routine of the compiler.
 */
{
  int loc1;
  int loc2; /* tmp */

  loc2 = LO(arg0);
  LO(arg0) = loc1;
  loc1 = loc2;
  loc2 = LO(arg0);
  LO(arg0) = HI(arg0);
  if ((LO(arg0) & LO(arg0)) != 0) {
    LO(arg0) = (LO(arg0) * LO(arg1));
  }
  loc2 = LO(arg0);
  LO(arg0) = HI(arg1);
  HI(arg1) = loc2;
  if ((LO(arg0) & LO(arg0)) != 0) {
    LO(arg0) = (LO(arg0) * loc1);
    HI(arg1) = (HI(arg1) + LO(arg0));
  }
  loc2 = LO(arg0);
  LO(arg0) = loc1;
  loc1 = loc2;
  arg0 = (LO(arg0) * LO(arg1));
  HI(arg0) = (HI(arg0) + HI(arg1));
  return (arg0);
}

long LDIV@ (long arg0, long arg2)
/* Takes 8 bytes of parameters.
 * Runtime support routine of the compiler.
 * Untranslatable routine. Assembler provided.
 * Return value in registers dx:ax.
 * Pascal calling convention.
 */
{
    /* disassembly code here */
}

long LMOD@ (long arg0, long arg2)
/* Takes 8 bytes of parameters.
 * Runtime support routine of the compiler.
 * Untranslatable routine. Assembler provided.
 * Return value in registers dx:ax.
 * Pascal calling convention.
 */
{
    /* disassembly code here */
}

long LXLSH@ (long arg0, char arg1)
/* Uses register arguments:
 *   arg0 = dx:ax.
 *   arg1 = cl.
 * Runtime support routine of the compiler.
 */
{
    int loc1; /* bx */

    if (arg1 < 16) {
        loc1 = LO(arg0);
        LO(arg0) = (LO(arg0) << arg1);
        HI(arg0) = (HI(arg0) << arg1);
        HI(arg0) = (HI(arg0) | (loc1 >> (!arg1 + 16)));
        return (arg0);
    }
    else {
        HI(arg0) = LO(arg0);
        LO(arg0) = 0;
        HI(arg0) = (HI(arg0) << (arg1 - 16));
        return (arg0);
    }
}
```

Figure 9-23: Longops.b – Continued
```c
long LXRSH@ (long arg0, char arg1)
/* Uses register arguments:
 *   arg0 = dx:ax.
 *   arg1 = cl.
 * Runtime support routine of the compiler.
 */
{
    int loc1; /* bx */

    if (arg1 < 16) {
        loc1 = HI(arg0);
        LO(arg0) = (LO(arg0) >> arg1);
        HI(arg0) = (HI(arg0) >> arg1);
        LO(arg0) = (LO(arg0) | (loc1 << (!arg1 + 16)));
        return (arg0);
    }
    else {
        arg0 = HI(arg0);
        LO(arg0) = (LO(arg0) >> (arg1 - 16));
        return (arg0);
    }
}

void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    long loc1;
    long loc2;

    loc2 = 255;
    loc1 = 143;
    loc1 = (loc2 + loc1);
    loc2 = (loc2 - loc1);
    loc2 = LXMUL@ (loc2, loc1);
    loc1 = LDIV@ (loc1, loc2);
    loc1 = LMOD@ (loc1, loc2);
    loc2 = LXLSH@ (loc2, 5);
    loc1 = LXRSH@ (loc1, loc1);
    printf ("a = %ld, b = %ld\n", loc2, loc1);
}
```

Figure 9-23: Longops.b – Continued
```c
#define TYPE long

main()
{    TYPE a, b;
    a = 255;
    b = 143;
    b = a + b;
    a = a - b;
    a = a * b;
    b = b / a;
    b = b % a;
    a = a << 5;
    b = b >> a;
    printf ("a = %ld, b = %ld\n", a, b);
}
```

Figure 9-24: Longops.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>LXMUL@</td>
<td>24</td>
<td>19</td>
<td>20.83</td>
</tr>
<tr>
<td>LDIV@</td>
<td>72</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>LMOD@</td>
<td>72</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>LXLSH@</td>
<td>15</td>
<td>10</td>
<td>33.33</td>
</tr>
<tr>
<td>LXRSH@</td>
<td>15</td>
<td>9</td>
<td>40.00</td>
</tr>
<tr>
<td>main</td>
<td>63</td>
<td>10</td>
<td>84.13</td>
</tr>
<tr>
<td>total</td>
<td>117</td>
<td>48</td>
<td>58.97</td>
</tr>
</tbody>
</table>

Figure 9-25: Longops Statistics

### 9.7.4 Benchsho.exe

Benchsho is a program from the Plum-Hall benchmark suite, which benchmarks short integers. The program makes use of two long variables to iterate through the loop, and three (short) integer variables to execute 1000 operations. The disassembly program is shown in Figure 9-27, the decompiled C program in Figure 9-28, and the initial C program in Figure 9-29. The program has the following call graph:

```
main
  scanf
  printf
```

As seen in the disassembly of the program, the long variables are located in the stack at offsets -4 and -8, and the integer variables are located at offsets -14, -12, and -10. The final C code makes use of the integer variable `loc6` to hold the result of a Boolean expression (i.e. 0 or 1) and assign it to the corresponding variable. This Boolean variable is a register variable (register `ax`) and could have been eliminated from the code with further analysis of the control flow graph, in a similar way to the structuring of compound conditions.

Figure 9-26: Control Flow Graph for Boolean Assignment

For example, graph (a) in Figure 9-26 can be reduced to graph (b) if the following conditions are satisfied:

1. Node 1 is a 2-way node.
2. Nodes 2 and 3 have one in-edge from node 1 only, and lead to a common node 4.
3. Nodes 2 and 3 have one instruction only. This instruction assigns 0 and 1 respectively to a register.
4. Node 4 assigns the register of nodes 2 and 3 to a local variable. The register is not further used before redefinition in the program.

Since the register is used only once, to store the intermediate result of a Boolean expression evaluation, it is eliminated from the final code by assigning the Boolean expression directly to the target variable. This transformation removes not only the register involved, but also the two nodes that assigned a value to it (i.e. nodes 2 and 3 in the graph of Figure 9-26).
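The four conditions can be checked mechanically. The sketch below is hypothetical: the `Node` summary and `booleanAssignment` are not dcc's data structures. For brevity, it produces the replacement statement as text instead of rewriting the graph. Nodes are numbered from 0, so nodes 1 to 4 of Figure 9-26 are indices 0 to 3. The sketch also accepts the case in which the true branch assigns 0, and produces a negated condition for it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-node summary for the pattern of Figure 9-26. */
typedef struct {
    int  numSucc, succ[2];  /* out-edges; succ[0] is taken when cond holds */
    int  numPreds, pred0;   /* in-edges; pred0 is the only one if 1 pred    */
    int  numInst;           /* number of instructions in the node          */
    int  constAssigned;     /* constant of a lone "reg = const", else -1   */
    const char *reg;        /* register written (nodes 2, 3) or read (4)   */
    const char *target;     /* node 4: variable that receives reg          */
    bool regDeadAfter;      /* node 4: reg not used again before redefined */
    const char *cond;       /* node 1: branch condition                    */
} Node;

/* If header h starts the pattern (conditions 1-4 in the text), write the
 * replacement Boolean assignment into out and return true. */
bool booleanAssignment(const Node *g, int h, char *out, size_t size)
{
    const Node *n1 = &g[h];
    if (n1->numSucc != 2)                                    /* condition 1 */
        return false;

    const Node *nt = &g[n1->succ[0]], *ne = &g[n1->succ[1]];
    if (nt->numPreds != 1 || nt->pred0 != h ||               /* condition 2 */
        ne->numPreds != 1 || ne->pred0 != h ||
        nt->numSucc != 1 || ne->numSucc != 1 || nt->succ[0] != ne->succ[0])
        return false;

    if (nt->numInst != 1 || ne->numInst != 1 ||              /* condition 3 */
        nt->reg == NULL || ne->reg == NULL || strcmp(nt->reg, ne->reg) != 0)
        return false;

    const Node *n4 = &g[nt->succ[0]];
    if (n4->reg == NULL || strcmp(n4->reg, nt->reg) != 0 ||  /* condition 4 */
        !n4->regDeadAfter || n4->target == NULL)
        return false;

    if (nt->constAssigned == 1 && ne->constAssigned == 0)
        snprintf(out, size, "%s = (%s);", n4->target, n1->cond);
    else if (nt->constAssigned == 0 && ne->constAssigned == 1)
        snprintf(out, size, "%s = !(%s);", n4->target, n1->cond);
    else
        return false;
    return true;
}
```

Suppose register `ax` is set to 1 when `loc2 == loc3` holds and to 0 otherwise, and node 4 copies `ax` into `loc1`. The sketch then yields `loc1 = (loc2 == loc3);`.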

It is clear that the two Boolean assignments of Figure 9-28 can be transformed into the following code:

```c
loc1 = (loc2 == loc3);
/* other code */
loc1 = (loc2 > loc3);
```

which would make the final C program an exact decompilation of the original C program. Without this transformation, the generated C code is functionally equivalent to the initial C code and structurally equivalent to the decompiled graph. Since the graph of a Boolean assignment is structured by nature, omitting this transformation does not generate unstructured code in any way. This is unlike compound conditions, whose graphs are unstructured by nature and must be transformed into structured graphs.

Without the graph optimization, the final code generated by `dcc` achieves a 75.25% reduction in the number of intermediate instructions, as shown in Figure 9-30. For each Boolean assignment of the initial C code, there are three extra instructions due to the use of a temporary local variable (`loc6` in this case).

```
main PROC NEAR
000 0002FA 55         PUSH  bp
001 0002FB 8BEC       MOV   bp, sp
002 0002FD 83EC0E     SUB   sp, 0Eh
003 000300 8D46FC     LEA   ax, [bp-4]
004 000303 50         PUSH  ax
005 000304 B89401     MOV   ax, 194h
006 000307 50         PUSH  ax
007 000308 E8E914     CALL  near ptr scanf
008 00030B 59         POP   cx
009 00030C 59         POP   cx
010 00030D FF76FE     PUSH  word ptr [bp-2]
011 000310 FF76FC     PUSH  word ptr [bp-4]
012 000313 B89801     MOV   ax, 198h
013 000316 50         PUSH  ax
014 000317 E8510C     CALL  near ptr printf
015 00031A 83C406     ADD   sp, 6
016 00031D 8D46F2     LEA   ax, [bp-0Eh]
017 000320 50         PUSH  ax
018 000321 B8B201     MOV   ax, 1B2h
019 000324 50         PUSH  ax
020 000325 E8CC14     CALL  near ptr scanf
021 000328 59         POP   cx
022 000329 59         POP   cx
023 00032A 8D46F4     LEA   ax, [bp-0Ch]
024 00032D 50         PUSH  ax
025 00032E B8B601     MOV   ax, 1B6h
026 000331 50         PUSH  ax
027 000332 E8BF14     CALL  near ptr scanf
028 000335 59         POP   cx
029 000336 59         POP   cx
030 000337 C746FA0000 MOV   word ptr [bp-6], 0
031 00033C C746F80100 MOV   word ptr [bp-8], 1
032 0003BD 8B56FA     L1: MOV dx, [bp-6]
033 0003C0 8B46F8     MOV   ax, [bp-8]
034 0003C3 3B56FE     CMP   dx, [bp-2]
035 0003C6 7D03       JGE   L2
036 000344 C746F60100 L3: MOV word ptr [bp-0Ah], 1
037 0003AF 837EF628   L4: CMP word ptr [bp-0Ah], 28h
038 0003B3 7E96       JLE   L5
039 0003B5 8346F801   ADD   word ptr [bp-8], 1
040 0003B9 8356FA00   ADC   word ptr [bp-6], 0
041                   JMP   L1                   ;Synthetic inst
```

Figure 9-27: Benchsho.a2
Figure 9-27: Benchsho.a2 – Continued
```
    0003D5 FF76
    0003D8 B8BA01     MOV   ax, 1BAh
    0003DB 50         PUSH  ax
    0003DC E8C08B     CALL  near ptr printf
    0003DF 59         POP   cx
    0003E0 59         POP   cx
    0003E1 8BE5       MOV   sp, bp
    0003E3 5D         POP   bp
    0003E4 C3         RET
main ENDP
```

```c
/*
 * Input file: benchsho.exe
 * File type: EXE
 */
#include "dcc.h"

void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
  int loc1; int loc2; int loc3;
  long loc4; long loc5; int loc6; /* ax */

  scanf ("%ld", &loc5);
  printf ("executing %ld iterations\n", loc5);
  scanf ("%ld", &loc1);
  scanf ("%ld", &loc2);
  loc4 = 1;
  while ((loc4 <= loc5)) {
    loc3 = 1;
    while ((loc3 <= 40)) {
      loc1 = ((loc1 + loc2) + loc3);
      loc2 = (loc1 >> 1);
      loc1 = (loc2 % 10);
      if (loc2 == loc3) {
        loc6 = 1;
      } else {
        loc6 = 0;
      }
      loc1 = loc6;
      loc2 = (loc1 | loc3);
      loc1 = !loc2;
      loc2 = (loc1 + loc3);
      if (loc2 > loc3) {
        loc6 = 1;
      } else {
        loc6 = 0;
      }
      loc1 = loc6;
      loc3 = (loc3 + 1);
    }
    loc4 = (loc4 + 1);
  }
  printf ("a=%d\n", loc1);
}
```

Figure 9-28: Benchsho.b

```c
/* benchsho - benchmark for short integers
 * Thomas Plum, Plum Hall Inc, 609-927-3770
 * If machine traps overflow, use an unsigned type
 * Let T be the execution time in milliseconds
 * Then average time per operator = T/major usec
 * (Because the inner loop has exactly 1000 operations)
 */
#define STOR_CL auto
#define TYPE short
#include <stdio.h>

main (int ac, char *av[])
{
    STOR_CL TYPE a, b, c;
    long d, major;

    scanf ("%ld", &major);
    printf("executing %ld iterations\n", major);
    scanf ("%ld", &a);
    scanf ("%ld", &b);
    for (d = 1; d <= major; ++d)
    {
        /* inner loop executes 1000 selected operations */
        for (c = 1; c <= 40; ++c)
        {
            a = a + b + c;
            b = a >> 1;
            a = b % 10;
            a = b == c;
            b = a | c;
            a = !b;
            b = a + c;
            a = b > c;
        }
    }
    printf("a=%d\n", a);
}
```

Figure 9-29: Benchsho.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>main</td>
<td>101</td>
<td>25</td>
<td>75.25</td>
</tr>
<tr>
<td>total</td>
<td>101</td>
<td>25</td>
<td>75.25</td>
</tr>
</tbody>
</table>

Figure 9-30: Benchsho Statistics

### 9.7.5 Benchlng.exe

Benchlng is a program from the Plum-Hall benchmark suite, which benchmarks long variables. The program is exactly the same as the benchsho.exe program, but makes use of long variables rather than short integers. The disassembly program is shown in Figure 9-31, the decompiled C program in Figure 9-32, and the initial C program in Figure 9-33. The program has the following call graph:

```
main
    scanf
    printf
    LMOD@
```

As seen from the disassembly of the program, the long variables are located in the stack at offsets -4, -20, -16, -8, and -12. The final C decompiled code makes use of five long variables and an integer variable `loc6`. This latter variable is used as a Boolean variable to hold the contents of a Boolean expression evaluation. Three Boolean expression evaluations are seen in the final C code:

```
loc1 == loc2
(LO(loc1) | HI(loc1)) == 0
loc1 > loc2
```

All these expressions can be transformed into Boolean assignments by means of the transformation described in the previous section. The generated code would look like this:

```
loc4 = (loc1 == loc2);
/* other code here */
loc4 = ((LO(loc1) | HI(loc1)) == 0);
/* other code here */
loc4 = (loc1 > loc2);
```

The second Boolean expression ORs together the low and high words of a long variable and compares the result against zero; this is equivalent to a logical negation of the long variable, which would lead to the following final code:

```
loc4 = !loc1;
```

Compared with the benchsho program, benchlng has 38 more low-level instructions in main (139 rather than 101, so the extra instructions make up 27.34% of benchlng's main; the LMOD@ subroutine, which calculates the modulus of long variables and is untranslatable to a high-level language, is not counted) and three more instructions in the high-level representation of main (due to the logical negation of a long variable, which makes use of the temporary Boolean variable `loc6`). It achieves a reduction of 79.86% in instructions, as shown in Figure 9-34.
Figure 9-31: Benchlng.a2
L7: SUB si, bx
L6: LOOP L5
L5: SBB di, bp
INC ax
L4: LOOP L3
L3: SBB dx, di
INC ax
L2: LOOP L1
L1: SBB dx, di
INC ax
L0: LOOP L
L: MOV tmp, dx:ax ;Synthetic inst
DIV bx
MOD bx ;Synthetic inst
TEST di, 2
JE L10
MOV ax, 198h
015 00031A 83C406 ADD sp, 6
016 00031D 8D46EC LEA ax, [bp-14h]
017 000320 50 PUSH ax
018 000321 B8B201 MOV ax, 1B2h
019 000324 50 PUSH ax
020 000325 E84015 CALL near ptr scanf
021 000328 59 POP cx
022 000329 59 POP cx
023 00032A 8D46F0 LEA ax, [bp-10h]
024 00032D 50 PUSH ax
025 00032E B8B601 MOV ax, 1B6h
026 000331 50 PUSH ax
027 000332 E83315 CALL near ptr scanf
028 000335 59 POP cx
029 000336 59 POP cx
030 000337 C746FA0000 MOV word ptr [bp-6], 0
031 00033C C746F80100 MOV word ptr [bp-8], 1
032 00042D 8B56FA L11: MOV dx, [bp-6]
033 000430 8B46F8 MOV ax, [bp-8]
034 000433 3B56FE CMP dx, [bp-2]
035 000436 7D03 JGE L12
036 000344 C746F60000 L13: MOV word ptr [bp-0Ah], 0
037 000349 C746F40100 MOV word ptr [bp-0Ch], 1
038 00034C 837EF600 L14: CMP word ptr [bp-0Ah], 0
039 000411 837EF600 L14: CMP word ptr [bp-0Ah], 0
040 000415 7D03 JGE L15
041 000351 8B56EE L16: MOV dx, [bp-12h]
042 000354 8B46EC MOV ax, [bp-14h]
043 000357 0346F0 ADD ax, [bp-10h]
044 00035A 1356F2 ADC dx, [bp-0Eh]
045 00035D 0346F4 ADD ax, [bp-0Ch]
046 000360 1356F6 ADC dx, [bp-0Ah]
047 000363 8956EE MOV [bp-12h], dx
048 000366 8946EC MOV [bp-14h], ax
049 000369 8B56EE MOV dx, [bp-12h]
050 00036C 8B46EC MOV ax, [bp-14h]
051 00036F D1FA SAR dx, 1
052 000371 D1D8 RCR ax, 1
053 000373 8956F2 MOV [bp-0Eh], dx
054 000376 8946F0 MOV [bp-10h], ax
055 000379 33D2 XOR dx, dx
056 00037B B80A00 MOV ax, 0Ah

Figure 9-31: Benchlng.a2 – Continued
105 0003FB B80100  L22: MOV      ax, 1
107 000402 99       L23: CWD
108 000403 8956EE    MOV      [bp-12h], dx
109 000406 8946EC    MOV      [bp-14h], ax
110 000409 8346F401   ADD word ptr [bp-0Ch], 1
111 00040D 8356F600   ADC word ptr [bp-0Ah], 0
112                       JMP L14 ;Synthetic inst
113 000400 33C0      L21: XOR      ax, ax
114                        JMP L23 ;Synthetic inst
115 0003CE 33C0      L19: XOR      ax, ax
116                        JMP L20 ;Synthetic inst
117 0003A6 33C0      L17: XOR      ax, ax
118                        JMP L18 ;Synthetic inst
119 00041A 7F09      L15:      ;Synthetic inst
120 00041C 837EF428   CMP      word ptr [bp-0Ch], 28h
121 000420 7703      JA L24
122 000425 8346F801   ADD word ptr [bp-8], 1
123 000429 8356FA00   ADC word ptr [bp-6], 0
124                        JMP L11 ;Synthetic inst
126 00043B 7F08      L12:      ;Synthetic inst
127 00043D 3B46FC    CMP      ax, [bp-4]
128 000440 7703      JA L25
130 000445 FF76EE    L25: PUSH word ptr [bp-12h]
131 000448 FF76EC    PUSH word ptr [bp-14h]
132 00044B B8BA01    MOV ax, 1BAh
133 00044E 50        PUSH ax
134 00044F E88D0B    CALL near ptr printf
135 000452 83C406    ADD sp, 6
136 000455 8BE5      MOV sp, bp
137 000457 5D        POP bp
138 000458 C3        RET
main ENDP

Figure 9-31: Benchlng.a2 – Continued
/* Input file: benchlng.exe
 * File type: EXE
 */

#include "dcc.h"

long LMOD@ (long arg0, long arg2)
/* Takes 8 bytes of parameters.
 * Runtime support routine of the compiler.
 * Untranslatable routine. Assembler provided.
 * Return value in registers dx:ax.
 * Pascal calling convention.
 */
{
    /* disassembly code here */
}

void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    long loc1;
    long loc2;
    long loc3;
    long loc4;
    long loc5;
    int loc6; /* ax */

    scanf ("%ld", &loc5);
    printf ("executing %ld iterations\n", loc5);
    scanf ("%ld", &loc2);
    scanf ("%ld", &loc4);
    loc3 = 1;
    while ((loc3 <= loc5)) {
        loc2 = 1;
        while ((loc2 <= 40)) {
            loc4 = ((loc4 + loc1) + loc2);
            loc1 = (loc4 >> 1);
            loc4 = LMOD@ (loc1, 10);
            if (loc1 == loc2) {
                loc6 = 1;
            } else {
                loc6 = 0;
            }
            loc4 = loc6;
            loc1 = (loc4 | loc2);
            if ((LO(loc1) | HI(loc1)) == 0) {
                loc6 = 1;
            } else {
                loc6 = 0;
            }
            loc4 = loc6;
            loc1 = (loc4 + loc2);
            if (loc1 > loc2) {
                loc6 = 1;
            } else {
                loc6 = 0;
            }
            loc4 = loc6;
            loc2 = (loc2 + 1);
        }
        loc3 = (loc3 + 1);
    }
    printf("a=%d\n", loc4);
}

Figure 9-32: Benchlng.b
/* benchlng - benchmark for long integers
 * Thomas Plum, Plum Hall Inc, 609-927-3770
 * If machine traps overflow, use an unsigned type
 * Let T be the execution time in milliseconds
 * Then average time per operator = T/major usec
 * (Because the inner loop has exactly 1000 operations)
 */
#define TYPE long
#include <stdio.h>

main (int ac, char *av[])
{
    TYPE a, b, c;
    long d, major;

    scanf ("%ld", &major);
    printf("executing %ld iterations\n", major);
    scanf ("%ld", &a);
    scanf ("%ld", &b);
    for (d = 1; d <= major; ++d)
    {
        /* inner loop executes 1000 selected operations */
        for (c = 1; c <= 40; ++c)
        {
            a = a + b + c;
            b = a >> 1;
            a = b % 10;
            a = b == c;
            b = a | c;
            a = !b;
            b = a + c;
            a = b > c;
        }
    }
    printf("a=%d\n", a);
}

Figure 9-33: Benchlng.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>LMOD@</td>
<td>72</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>main</td>
<td>139</td>
<td>28</td>
<td>79.86</td>
</tr>
<tr>
<td>total</td>
<td>139</td>
<td>28</td>
<td>79.86</td>
</tr>
</tbody>
</table>

Figure 9-34: Benchlng Statistics
9.7.6 Benchmul.exe

Benchmul is another program from the Plum-Hall benchmarks. This program benchmarks integer multiplication by executing 1000 multiplications in a loop. The disassembly program is shown in Figure 9-35, the decompiled C program in Figure 9-36, and the initial C program in Figure 9-37. This program has the following call graph:

```
main
    scanf
    printf
```

Benchmul uses two long variables to loop a large number of times through the program, and three integer variables that perform the operations; one of these integer variables is not actually used in the program. As seen from the disassembly, the long variables are located on the stack at offsets -4 and -8, and the integer variables at offsets -12 and -10 and in the register variable si. The final C code is functionally equivalent to the initial C code, and a reduction of 86.36% of instructions was achieved by this program, as seen in Figure 9-38.
main PROC NEAR

000 0002FA 55  PUSH    bp
001 0002FB 8BEC  MOV    bp, sp
002 0002FD 83EC0C SUB    sp, 0Ch
003 000300 56  PUSH    si
004 000301 8D46FC LEA    ax, [bp-4]
005 000304 50  PUSH    ax
006 000305 B89401 MOV    ax, 194h
007 000308 50  PUSH    ax
008 000309 E8D014 CALL near ptr scanf
009 00030C 59  POP     cx
010 00030D 59  POP     cx
011 00030E FF76FE PUSH word ptr [bp-2]
012 000311 FF76FC PUSH word ptr [bp-4]
013 000314 B89801 MOV    ax, 198h
014 000317 50  PUSH    ax
015 000318 E8380C CALL near ptr printf
016 00031B 83C406 ADD sp, 6
017 00031E 8D46F4 LEA    ax, [bp-0Ch]
018 000321 50  PUSH    ax
019 000322 B8B201 MOV ax, 1B2h
020 000325 50  PUSH    ax
021 000326 E8B314 CALL near ptr scanf
022 000329 59  POP     cx
023 00032A 59  POP     cx
024 00032B 8D46F6 LEA    ax, [bp-0Ah]
025 00032E 50  PUSH    ax
026 000332 B8B501 MOV ax, 1B5h
027 000334 50  PUSH    ax
028 000336 E8A614 CALL near ptr scanf
029 000339 59  POP     cx
030 00033A 59  POP     cx
031 00033B C746FA0000 MOV word ptr [bp-6], 0
032 00033D C746F80100 MOV word ptr [bp-8], 1

034 0003AA 8B56FA LM1: MOV dx, [bp-6]
035 0003AD 8B46F8 MOV ax, [bp-8]
036 0003B0 3B56FE CMP dx, [bp-2]
037 0003B3 7C8F JL L2
038 0003B5 7F05 JG L3
039 0003B7 3B46FC CMP ax, [bp-4]
040 0003BA 7688 JBE L2
041 0003BC FF76F4 LM3: PUSH word ptr [bp-0Ch]
042 0003BF B8B801 MOV ax, 1B8h
043 0003C2 50  PUSH    ax
044 0003C3 E8D0DB CALL near ptr printf

Figure 9-35: Benchmul.a2
Figure 9-35: Benchmul.a2 – Continued
void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    int loc1;
    int loc2;
    long loc3;
    long loc4;
    int loc5;

    scanf ("%ld", &loc4);
    printf ("executing %ld iterations\n", loc4);
    scanf ("%d", &loc1);
    scanf ("%d", &loc2);
    loc3 = 1;
    while ((loc3 <= loc4)) {
        loc5 = 1;
        while ((loc5 <= 40)) {
            loc1 = (((((((((loc1 * loc1) * loc1) * loc1) * loc1) * loc1) * loc1) * loc1) * loc1) * loc1) * loc1) * 3);
            loc5 = (loc5 + 1);
        }
        loc3 = (loc3 + 1);
    }
    printf ("a=%d\n", loc1);
}

Figure 9-36: Benchmul.b
/* benchmul - benchmark for int multiply
* Thomas Plum, Plum Hall Inc, 609-927-3770
* If machine traps overflow, use an unsigned type
* Let T be the execution time in milliseconds
* Then average time per operator = T/major usec
* (Because the inner loop has exactly 1000 operations)
*/
#define STOR_CL auto
#define TYPE int
#include <stdio.h>

main (int ac, char *av[])
{ STOR_CL TYPE a, b, c;
  long d, major;

  scanf ("%ld", &major);
  printf("executing %ld iterations\n", major);
  scanf ("%d", &a);
  scanf ("%d", &b);
  for (d = 1; d <= major; ++d)
  {
    /* inner loop executes 1000 selected operations */
    for (c = 1; c <= 40; ++c)
    {
      a = 3 *a*a*a*a*a*a*a*a * a*a*a*a*a*a*a*a * a*a*a*a*a*a*a*a * a; /* 25 * */
    }
  }
  printf("a=%d\n", a);
}

Figure 9-37: Benchmul.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>main</td>
<td>88</td>
<td>12</td>
<td>86.36</td>
</tr>
<tr>
<td>total</td>
<td>88</td>
<td>12</td>
<td>86.36</td>
</tr>
</tbody>
</table>

Figure 9-38: Benchmul Statistics
9.7.7 Benchfn.exe

Benchfn is a program from the Plum-Hall benchmark suite, which benchmarks function calls; 1000 subroutine calls are done each time around the loop. The disassembly program is shown in Figure 9-39, the decompiled C program in Figure 9-40, and the initial C program in Figure 9-41. This program has the following call graph:

```
main
  scanf
  printf
  proc_1
    proc_2
      proc_3
        proc_4
```

Benchfn has four procedures and a main program. Three of the four procedures invoke other procedures, and the fourth procedure is empty. The percentage reduction in the number of intermediate instructions is lower for this program than for the previous programs, since it contains few expressions (which is not normally the case with high-level programs). As seen in the statistics of this program (see Figure 9-42), the empty procedure has a 100% reduction since the procedure prologue and trailer low-level instructions are eliminated in the C program; the other three procedures have an average reduction of 29.30% over the 29 procedure calls they perform; and the main program has an 81.08% reduction since expressions and assignments are used in this procedure. The overall reduction for the program is low, 56.10%, due to the lack of assignment statements in this program.
Figure 9-39: Benchfn.a2
Figure 9-39: Benchfn.a2 – Continued
<table>
<thead>
<tr>
<th>Address</th>
<th>Instructions</th>
</tr>
</thead>
<tbody>
<tr>
<td>031 0003BB 5D</td>
<td>POP bp</td>
</tr>
<tr>
<td>032 0003BC C3</td>
<td>RET</td>
</tr>
<tr>
<td>033 000394 E8AEFF</td>
<td>L2: CALL near ptr proc_1</td>
</tr>
<tr>
<td>034 000397 8346F801</td>
<td>ADD word ptr [bp-8], 1</td>
</tr>
<tr>
<td>035 00039B 8356FA00</td>
<td>ADC word ptr [bp-6], 0</td>
</tr>
<tr>
<td>036</td>
<td>JMP L1 ; Synthetic inst</td>
</tr>
</tbody>
</table>

main ENDP

Figure 9-39: Benchfn.a2 – Continued
/*
 * Input file : benchfn.exe
 * File type : EXE
 */

#include "dcc.h"

void proc_4 ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
}

void proc_3 ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    proc_4 ();
    proc_4 ();
    proc_4 ();
    proc_4 ();
    proc_4 ();
    proc_4 ();
    proc_4 ();
    proc_4 ();
    proc_4 ();
    proc_4 ();
}

void proc_2 ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    proc_3 ();
    proc_3 ();
    proc_3 ();
    proc_3 ();
    proc_3 ();
    proc_3 ();
    proc_3 ();
    proc_3 ();
    proc_3 ();
    proc_3 ();
}
void proc_1 ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    proc_2 ();
    proc_2 ();
    proc_2 ();
    proc_2 ();
    proc_2 ();
    proc_2 ();
    proc_2 ();
    proc_2 ();
    proc_2 ();
}

void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    long loc1;
    long loc2;

    scanf ("%ld", &loc2);
    printf ("executing %ld iterations\n", loc2);
    loc1 = 1;
    while ((loc1 <= loc2)) {
        proc_1 ();
        loc1 = (loc1 + 1);
    }
    printf ("finished\n");
}

Figure 9-40: Benchfn.b
/* benchfn - benchmark for function calls
 * Thomas Plum, Plum Hall Inc, 609-927-3770
 * Let T be the execution time in milliseconds
 * Then average time per operator = T/major usec
 * (Because the inner loop has exactly 1000 operations)
 */
#include <stdio.h>

f3() { ;}
f2() { f3();f3();f3();f3();f3();f3();f3();f3();f3();f3();} /* 10 */
f1() { f2();f2();f2();f2();f2();f2();f2();f2();f2();f2();} /* 10 */
f0() { f1();f1();f1();f1();f1();f1();f1();f1();f1();} /* 9 */

main (int ac, char *av[])
{   long d, major;
    scanf("%ld", &major);
    printf("executing %ld iterations\n", major);
    for (d = 1; d <= major; ++d)
        f0();  /* executes 1000 calls */
    printf("finished\n");
}

Figure 9-41: Benchfn.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>proc_4</td>
<td>4</td>
<td>0</td>
<td>100.00</td>
</tr>
<tr>
<td>proc_3</td>
<td>14</td>
<td>10</td>
<td>28.57</td>
</tr>
<tr>
<td>proc_2</td>
<td>14</td>
<td>10</td>
<td>28.57</td>
</tr>
<tr>
<td>proc_1</td>
<td>13</td>
<td>9</td>
<td>30.77</td>
</tr>
<tr>
<td>main</td>
<td>37</td>
<td>7</td>
<td>81.08</td>
</tr>
<tr>
<td>total</td>
<td>82</td>
<td>36</td>
<td>56.10</td>
</tr>
</tbody>
</table>

Figure 9-42: Benchfn Statistics
9.7.8 Fibo.exe

Fibo is a program that calculates the Fibonacci numbers of its input values. The Fibonacci number is computed by a recursive function that invokes itself twice. The disassembly program is shown in Figure 9-43, the decompiled C program in Figure 9-44, and the initial C program in Figure 9-45. Fibo has the following call graph:

```
main
  scanf
  printf
  exit
  proc_1
    proc_1
```

The `main` of the decompiled C program has the same number of instructions as the initial C program; the `for()` loop is represented by a `while()` loop. The recursive Fibonacci function, `proc_1` in the decompiled program, uses five instructions as opposed to three in the initial code. The extra instructions are due to a copy of the argument to a local variable (`loc1 = arg0;`), and to the placement of the result in a register variable along two different paths (i.e. two different possible results) before returning this value. The code is functionally equivalent to the initial code in all ways. Note that on the second recursive invocation of `proc_1`, the actual parameter expression is `(loc1 + -2)`, which is equivalent to `(loc1 - 2)`. The former expression comes from the disassembly of the program, which adds a negative number to a local variable rather than subtracting a positive number. As seen in the statistics of the program (see Figure 9-46), the individual and overall reduction in the number of intermediate instructions is 80.77%.
<table>
<tbody>
<tr>
<td>proc_1 PROC NEAR</td>
<td></td>
</tr>
<tr>
<td>000 00035B 55</td>
<td>PUSH bp</td>
</tr>
<tr>
<td>001 00035C 8BEC</td>
<td>MOV bp, sp</td>
</tr>
<tr>
<td>002 00035E 56</td>
<td>PUSH si</td>
</tr>
<tr>
<td>003 00035F 8B7604</td>
<td>MOV si, [bp+4]</td>
</tr>
<tr>
<td>004 000362 83FE02</td>
<td>CMP si, 2</td>
</tr>
<tr>
<td>005 000365 7E1C</td>
<td>JLE L1</td>
</tr>
<tr>
<td>006 000367 8BC6</td>
<td>MOV ax, si</td>
</tr>
<tr>
<td>007 000369 48</td>
<td>DEC ax</td>
</tr>
<tr>
<td>008 00036A 50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>009 00036B E8EDFF</td>
<td>CALL near ptr proc_1</td>
</tr>
<tr>
<td>010 00036E 59</td>
<td>POP cx</td>
</tr>
<tr>
<td>011 00036F 50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>012 000370 8BC6</td>
<td>MOV ax, si</td>
</tr>
<tr>
<td>013 000372 05FEFF</td>
<td>ADD ax, 0FFFEh</td>
</tr>
<tr>
<td>014 000375 50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>015 000376 E8E2FF</td>
<td>CALL near ptr proc_1</td>
</tr>
<tr>
<td>016 000379 59</td>
<td>POP cx</td>
</tr>
<tr>
<td>017 00037A 8BD0</td>
<td>MOV dx, ax</td>
</tr>
<tr>
<td>018 00037C 58</td>
<td>POP ax</td>
</tr>
<tr>
<td>019 00037D 03C2</td>
<td>ADD ax, dx</td>
</tr>
<tr>
<td>020 000388 5E</td>
<td>L2: POP si</td>
</tr>
<tr>
<td>021 000389 5D</td>
<td>POP bp</td>
</tr>
<tr>
<td>022 00038A C3</td>
<td>RET</td>
</tr>
<tr>
<td>023 00038B B80100</td>
<td>L1: MOV ax, 1</td>
</tr>
<tr>
<td>024 00038C EB00</td>
<td>JMP L2</td>
</tr>
<tr>
<td>proc_1 ENDP</td>
<td></td>
</tr>
<tr>
<td>main PROC NEAR</td>
<td></td>
</tr>
<tr>
<td>000 0002FA 55</td>
<td>PUSH bp</td>
</tr>
<tr>
<td>001 0002FB 8BEC</td>
<td>MOV bp, sp</td>
</tr>
<tr>
<td>002 0002FD 83EC04</td>
<td>SUB sp, 4</td>
</tr>
<tr>
<td>003 000300 56</td>
<td>PUSH si</td>
</tr>
<tr>
<td>004 000301 57</td>
<td>PUSH di</td>
</tr>
<tr>
<td>005 000302 B89401</td>
<td>MOV ax, 194h</td>
</tr>
<tr>
<td>006 000305 50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>007 000306 E8080C</td>
<td>CALL near ptr printf</td>
</tr>
<tr>
<td>008 000309 59</td>
<td>POP cx</td>
</tr>
<tr>
<td>009 00030A 8D46FC</td>
<td>LEA ax, [bp-4]</td>
</tr>
<tr>
<td>010 00030D 50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>011 00030E B8B101</td>
<td>MOV ax, 1B1h</td>
</tr>
<tr>
<td>012 000311 50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>013 000312 E88514</td>
<td>CALL near ptr scanf</td>
</tr>
<tr>
<td>014 000315 59</td>
<td>POP cx</td>
</tr>
<tr>
<td>015 000316 59</td>
<td>POP cx</td>
</tr>
</tbody>
</table>

Figure 9-43: Fibo.a2

```assembly
016 000317 BE0100  MOV    si, 1
018 000349 3B76FC   L3: CMP    si, [bp-4]
019 00034C 7ECE      JLE     L4
020 00034E 33C0      XOR     ax, ax
021 000350 50        PUSH    ax
022 000351 E87300    CALL    near ptr exit
023 000354 59        POP     cx
024 000355 5F        POP     di
025 000356 5E        POP     si
026 000357 8BE5      MOV     sp, bp
027 000358 5D        POP     bp
028 00035A C3        RET
029 00031C B8B401    L4: MOV    ax, 1B4h
030 00031F 50        PUSH    ax
031 000320 E8EE0B    CALL    near ptr printf
032 000323 59        POP     cx
033 000324 8D46FE    LEA     ax, [bp-2]
034 000327 50        PUSH    ax
035 000328 B8C301    MOV     ax, 1C3h
036 00032B 50        PUSH    ax
037 00032C E8B14     CALL    near ptr scanf
038 00032F 59        POP     cx
039 000330 59        POP     cx
040 000331 FF76FE    PUSH    word ptr [bp-2]
041 000334 E82400    CALL    near ptr proc_1
042 000337 59        POP     cx
043 000338 8BF8      MOV     di, ax
044 00033A 57        PUSH    di
045 00033B FF76FE    PUSH    word ptr [bp-2]
046 00033E B8C601    MOV     ax, 1C6h
047 000341 50        PUSH    ax
048 000342 E8CC0B    CALL    near ptr printf
049 000345 83C406    ADD     sp, 6
050 000348 46        INC     si
051 000349 B8B401    mov ax, 1B4h

main ENDP
```

Figure 9-43: Fibo.a2 – Continued
#include "dcc.h"

int proc_1 (int arg0)
/* Takes 2 bytes of parameters.
 * High-level language prologue code.
 * C calling convention.
 */
{
    int loc1;
    int loc2; /* ax */

    loc1 = arg0;
    if (loc1 > 2) {
        loc2 = (proc_1 ((loc1 - 1)) + proc_1 ((loc1 + -2)));
    } else {
        loc2 = 1;
    }
    return (loc2);
}

void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    int loc1; int loc2;
    int loc3; int loc4;

    printf("Input number of iterations: ");
    scanf("%d", &loc1);
    loc3 = 1;
    while ((loc3 <= loc1)) {
        printf("Input number: ");
        scanf("%d", &loc2);
        loc4 = proc_1 (loc2);
        printf("fibonacci(%d) = %u\n", loc2, loc4);
        loc3 = (loc3 + 1);
    }
    exit (0);
}

Figure 9-44: Fibo.b
```c
#include <stdio.h>

int main()
{   int i, numtimes, number;
    unsigned value, fib();

    printf("Input number of iterations: ");
    scanf ("%d", &numtimes);
    for (i = 1; i <= numtimes; i++)
    {
        printf ("Input number: ");
        scanf ("%d", &number);
        value = fib(number);
        printf("fibonacci(%d) = %u\n", number, value);
    }
    exit(0);
}

unsigned fib(x) /* compute fibonacci number recursively */
int x;
{
    if (x > 2)
        return (fib(x - 1) + fib(x - 2));
    else
        return (1);
}
```

Figure 9-45: Fibo.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>proc_1</td>
<td>26</td>
<td>5</td>
<td>80.77</td>
</tr>
<tr>
<td>main</td>
<td>52</td>
<td>10</td>
<td>80.77</td>
</tr>
<tr>
<td>total</td>
<td>78</td>
<td>15</td>
<td>80.77</td>
</tr>
</tbody>
</table>

Figure 9-46: Fibo Statistics
9.7.9 Crc.exe

Crc is a program that calculates the cyclic redundancy check (CRC) for a 1-character message block, and then passes the resulting CRC back into the CRC functions to see if the “received” 1-character message and CRC are correct. The disassembly program is shown in Figure 9-47, the decompiled C program in Figure 9-48, and the initial C program in Figure 9-49. Crc has the following call graph:

```
main
 proc_1
 proc_2
    LXLSH@
    LXRSH@
 proc_3
    proc_2
 printf
```

As seen in the initial C program, crc has three functions and a main procedure. The decompiled version of the program has five functions and a main program; the two extra functions are runtime support routines for long right and left shifts (LXRSH@ and LXLSH@ respectively). These two routines were originally written in assembler, and are translated into C by accessing the low and high parts of the long argument. As seen in the statistics of the program (see Figure 9-50), the user functions have a reduction of over 80% in intermediate instructions, and have the same number of high-level instructions as the original program. Function proc_1 is the crc_clear function, which returns zero. This function has an 83.33% reduction of intermediate instructions due to the overhead introduced by the procedure prologue and trailer code. Function proc_2 is the crc_update function, which calculates the CRC for the input argument according to the CCITT recommended CRC generator function. This function uses 32 bits to compute the result, and returns the lower 16 bits as the function’s value. The decompiled version of this function propagates the fact that only 16 bits of the result are used to the invoked runtime routine LXRSH@; hence this latter function returns only an integer (16 bits) rather than a long integer, and its code is much simpler than that of its counterpart LXLSH@ (which returns a long integer). The reduction in the number of instructions is 84.62%. Function proc_3 is the crc_finish function, which returns the final two CRC characters that are to be transmitted at the end of the block. This function calls the crc_update function twice, one call being an argument of the other. The reduction in the number of instructions is high (93.75%) since all 16 low-level instructions are transformed into 1 high-level return instruction. Finally, the main program invokes the functions in the right order; a reduction of 82.09% is achieved.
Note that integers are used in this program rather than characters, since the character variables are never used as characters as such (i.e. an unsigned character generates the same code). The overall reduction in intermediate instructions for the program is 77.78%, which is below 80% because of the runtime routines.
<table>
<thead>
<tr>
<th>Line</th>
<th>Address</th>
<th>Opcode</th>
<th>Arguments</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>000385</td>
<td>55</td>
<td>PUSH bp</td>
</tr>
<tr>
<td>001</td>
<td>000386</td>
<td>8BEC</td>
<td>MOV bp, sp</td>
</tr>
<tr>
<td>002</td>
<td>000388</td>
<td>33C0</td>
<td>XOR ax, ax</td>
</tr>
<tr>
<td>003</td>
<td>00038A</td>
<td>50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>004</td>
<td>00038B</td>
<td>33C0</td>
<td>XOR ax, ax</td>
</tr>
<tr>
<td>005</td>
<td>00038D</td>
<td>50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>006</td>
<td>00038E</td>
<td>FF7604</td>
<td>PUSH word ptr [bp+4]</td>
</tr>
<tr>
<td>007</td>
<td>000391</td>
<td>E86FFF</td>
<td>CALL near ptr proc_2</td>
</tr>
<tr>
<td>008</td>
<td>000393</td>
<td>59</td>
<td>POP cx</td>
</tr>
<tr>
<td>009</td>
<td>000395</td>
<td>59</td>
<td>POP cx</td>
</tr>
<tr>
<td>010</td>
<td>000396</td>
<td>50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>011</td>
<td>000397</td>
<td>E869FF</td>
<td>CALL near ptr proc_2</td>
</tr>
<tr>
<td>012</td>
<td>000399</td>
<td>59</td>
<td>MOV sp, bp</td>
</tr>
<tr>
<td>014</td>
<td>00039E</td>
<td>5D</td>
<td>POP bp</td>
</tr>
<tr>
<td>015</td>
<td>00039F</td>
<td>C3</td>
<td>RET</td>
</tr>
</tbody>
</table>

**proc_3 ENDP**

<table>
<thead>
<tr>
<th>Line</th>
<th>Address</th>
<th>Opcode</th>
<th>Arguments</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>001585</td>
<td>80F910</td>
<td>CMP cl, 10h</td>
</tr>
<tr>
<td>001</td>
<td>001588</td>
<td>7310</td>
<td>JAE L1</td>
</tr>
<tr>
<td>002</td>
<td>00158A</td>
<td>8BDA</td>
<td>MOV bx, dx</td>
</tr>
<tr>
<td>003</td>
<td>00158C</td>
<td>D3E8</td>
<td>SHR ax, cl</td>
</tr>
<tr>
<td>004</td>
<td>00158E</td>
<td>D3FA</td>
<td>SAR dx, cl</td>
</tr>
<tr>
<td>005</td>
<td>001590</td>
<td>F6D9</td>
<td>NEG cl</td>
</tr>
<tr>
<td>006</td>
<td>001592</td>
<td>80C110</td>
<td>ADD cl, 10h</td>
</tr>
<tr>
<td>007</td>
<td>001595</td>
<td>D3E3</td>
<td>SHR bx, cl</td>
</tr>
<tr>
<td>008</td>
<td>001597</td>
<td>0BC3</td>
<td>OR ax, bx</td>
</tr>
<tr>
<td>009</td>
<td>001599</td>
<td>CB</td>
<td>RETF</td>
</tr>
<tr>
<td>010</td>
<td>00159A</td>
<td>80E910</td>
<td>L1: SUB cl, 10h</td>
</tr>
<tr>
<td>011</td>
<td>00159D</td>
<td>8BC2</td>
<td>MOV ax, dx</td>
</tr>
<tr>
<td>012</td>
<td>00159F</td>
<td>99</td>
<td>CWD</td>
</tr>
<tr>
<td>013</td>
<td>0015A0</td>
<td>D3F8</td>
<td>SAR ax, cl</td>
</tr>
<tr>
<td>014</td>
<td>0015A2</td>
<td>CB</td>
<td>RETF</td>
</tr>
</tbody>
</table>

**LXRSH@ ENDP**

**proc_1 PROC NEAR**

<table>
<thead>
<tr>
<th>Line</th>
<th>Address</th>
<th>Opcode</th>
<th>Arguments</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>0002FA</td>
<td>55</td>
<td>PUSH bp</td>
</tr>
<tr>
<td>001</td>
<td>0002FB</td>
<td>8BEC</td>
<td>MOV bp, sp</td>
</tr>
<tr>
<td>002</td>
<td>0002FD</td>
<td>33C0</td>
<td>XOR ax, ax</td>
</tr>
<tr>
<td>004</td>
<td>000301</td>
<td>5D</td>
<td>POP bp</td>
</tr>
<tr>
<td>005</td>
<td>000302</td>
<td>C3</td>
<td>RET</td>
</tr>
</tbody>
</table>

**proc_1 ENDP**

Figure 9-47: Crc.a2

**LXLSH@ PROC FAR**

<table>
<thead>
<tr>
<th>Line / Address</th>
<th>Opcode</th>
<th>Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>000 0015A3</td>
<td>80F910</td>
<td>CMP cl, 10h</td>
</tr>
<tr>
<td>001 0015A6</td>
<td>7310</td>
<td>JAE L2</td>
</tr>
<tr>
<td>002 0015A8</td>
<td>8BD8</td>
<td>MOV bx, ax</td>
</tr>
<tr>
<td>003 0015AA</td>
<td>D3E0</td>
<td>SHL ax, cl</td>
</tr>
<tr>
<td>004 0015AC</td>
<td>D3E2</td>
<td>SHL dx, cl</td>
</tr>
<tr>
<td>005 0015AE</td>
<td>F6D9</td>
<td>NEG cl</td>
</tr>
<tr>
<td>006 0015B0</td>
<td>80C110</td>
<td>ADD cl, 10h</td>
</tr>
<tr>
<td>007 0015B3</td>
<td>D3EB</td>
<td>SHR bx, cl</td>
</tr>
<tr>
<td>008 0015B5</td>
<td>0BD3</td>
<td>OR dx, bx</td>
</tr>
<tr>
<td>009 0015B7</td>
<td>CB</td>
<td>RETF</td>
</tr>
<tr>
<td>010 0015B8</td>
<td>80E910</td>
<td>L2: SUB cl, 10h</td>
</tr>
<tr>
<td>011 0015BB</td>
<td>8BD0</td>
<td>MOV dx, ax</td>
</tr>
<tr>
<td>012 0015BD</td>
<td>33C0</td>
<td>XOR ax, ax</td>
</tr>
<tr>
<td>013 0015BF</td>
<td>D3E2</td>
<td>SHL dx, cl</td>
</tr>
<tr>
<td>014 0015C1</td>
<td>CB</td>
<td>RETF</td>
</tr>
</tbody>
</table>

**LXLSH@ ENDP**

**proc_2 PROC NEAR**

<table>
<thead>
<tr>
<th>Line / Address</th>
<th>Opcode</th>
<th>Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>000 000303</td>
<td>55</td>
<td>PUSH bp</td>
</tr>
<tr>
<td>001 000304</td>
<td>8BEC</td>
<td>MOV bp, sp</td>
</tr>
<tr>
<td>002 000306</td>
<td>83EC06</td>
<td>SUB sp, 6</td>
</tr>
<tr>
<td>003 000309</td>
<td>8B4604</td>
<td>MOV ax, [bp+4]</td>
</tr>
<tr>
<td>004 00030C</td>
<td>99</td>
<td>CWD</td>
</tr>
<tr>
<td>005 00030D</td>
<td>B108</td>
<td>MOV cl, 8</td>
</tr>
<tr>
<td>006 00030F</td>
<td>9AA3141000</td>
<td>CALL far ptr LXLSH@</td>
</tr>
<tr>
<td>007 000314</td>
<td>52</td>
<td>PUSH dx</td>
</tr>
<tr>
<td>008 000315</td>
<td>50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>009 000316</td>
<td>8A4606</td>
<td>MOV al, [bp+6]</td>
</tr>
<tr>
<td>010 000319</td>
<td>98</td>
<td>CBW</td>
</tr>
<tr>
<td>011 00031A</td>
<td>99</td>
<td>CWD</td>
</tr>
<tr>
<td>012 00031B</td>
<td>5B</td>
<td>POP bx</td>
</tr>
<tr>
<td>013 00031C</td>
<td>59</td>
<td>POP cx</td>
</tr>
<tr>
<td>014 00031D</td>
<td>03D8</td>
<td>ADD bx, ax</td>
</tr>
<tr>
<td>015 00031F</td>
<td>13CA</td>
<td>ADC cx, dx</td>
</tr>
<tr>
<td>016 000321</td>
<td>894EFC</td>
<td>MOV [bp-4], cx</td>
</tr>
<tr>
<td>017 000324</td>
<td>895EFA</td>
<td>MOV [bp-6], bx</td>
</tr>
<tr>
<td>018 000327</td>
<td>C746FE0000</td>
<td>MOV word ptr [bp-2], 0</td>
</tr>
<tr>
<td>020 000365</td>
<td>837EFE08</td>
<td>L3: CMP word ptr [bp-2], 8</td>
</tr>
<tr>
<td>021 000369</td>
<td>7CC3</td>
<td>JL L4</td>
</tr>
<tr>
<td>022 00036B</td>
<td>8B56FC</td>
<td>MOV dx, [bp-4]</td>
</tr>
<tr>
<td>023 00036E</td>
<td>8B46FA</td>
<td>MOV ax, [bp-6]</td>
</tr>
<tr>
<td>024 000371</td>
<td>2500FF</td>
<td>AND ax, 0FF00h</td>
</tr>
<tr>
<td>025 000374</td>
<td>81E2FF00</td>
<td>AND dx, 0FFh</td>
</tr>
<tr>
<td>026 000378</td>
<td>B108</td>
<td>MOV cl, 8</td>
</tr>
<tr>
<td>027 00037A</td>
<td>9A85141000</td>
<td>CALL far ptr LXRSH@</td>
</tr>
</tbody>
</table>

Figure 9-47: Crc.a2 – Continued
/*
* Input file: crc.exe 
* File type: EXE
*/

#include "dcc.h"

int proc_1 () 
/* Takes no parameters.
* High-level language prologue code.
*/
{
    return (0);
}

long LXLSH@ (long arg0, char arg1) 
/* Uses register arguments:
* arg0 = dx:ax.
* arg1 = cl.
* Runtime support routine of the compiler.
*/
{
    int loc1; /* bx */

    if (arg1 < 16) {
        loc1 = LO(arg0);
        LO(arg0) = (LO(arg0) << arg1);
        HI(arg0) = (HI(arg0) << arg1);
        HI(arg0) = (HI(arg0) | (loc1 >> (!arg1 + 16)));
        return (arg0);
    }
    else {
        HI(arg0) = LO(arg0);
        LO(arg0) = 0;
        HI(arg0) = (HI(arg0) << (arg1 - 16));
        return (arg0);
    }
}
int LXRSH@ (long arg0, char arg1)
/* Uses register arguments:
* arg0 = dx:ax.
* arg1 = cl.
* Runtime support routine of the compiler.
*/
{
  int loc1; /* bx */

  if (arg1 < 16) {
    loc1 = HI(arg0);
    LO(arg0) = (LO(arg0) >> arg1);
    HI(arg0) = (HI(arg0) >> arg1);
    return ((LO(arg0) | (loc1 << (!arg1 + 16))));
  }
  else {
    return ((HI(arg0) >> (arg1 - 16)));
  }
}

int proc_2 (int arg0, unsigned char arg1)
/* Takes 4 bytes of parameters.
* High-level language prologue code.
* C calling convention.
*/
{
  int loc1;
  long loc2;

  loc2 = (LXLSH@ (arg0, 8) + arg1);
  loc1 = 0;
  while ((loc1 < 8)) {
    loc2 = (loc2 << 1);
    if ((loc2 & 0x1000000) != 0) {
      loc2 = (loc2 ^ 0x1102100);
    }
    loc1 = (loc1 + 1);
  }
  return (LXRSH@ ((loc2 & 0xFFFF00), 8));
}

Figure 9-48: Crc.b – Continued
int proc_3 (int arg0)
/* Takes 2 bytes of parameters.
 * High-level language prologue code.
 * C calling convention.
 */
{
    return (proc_2 (proc_2 (arg0, 0), 0));
}

void main()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
    int loc1;
    int loc2;
    int loc3;
    int loc4;

    loc1 = 65;
    loc2 = proc_1();
    loc2 = proc_2(loc2, loc1);
    loc2 = proc_3(loc2);
    loc3 = ((loc2 & 0xFF00) >> 8);
    loc4 = (loc2 & 255);
    printf("%04x\n", loc2);
    loc2 = proc_1();
    loc2 = proc_2(loc2, loc1);
    loc2 = proc_2(loc2, loc3);
    loc2 = proc_2(loc2, loc4);
    printf("%04x\n", loc2);
}
/* crc_clear:
 *   This function clears the CRC to zero. It should be called prior to
 *   the start of the processing of a block for both received messages,
 *   and messages to be transmitted.
 *
 *   Calling sequence:
 *   short crc;
 *   crc = crc_clear();
 */
short crc_clear()
{
    return(0);
}

/*
 * crc_update:
 *   this function must be called once for each character which is
 *   to be included in the CRC for messages to be transmitted.
 *   This function is called once for each character which is included
 *   in the CRC of a received message, AND once for each of the two CRC
 *   characters at the end of the received message. If the resulting
 *   CRC is zero, then the message has been correctly received.
 *
 *   Calling sequence:
 *   short crc = crc_update(crc,next_char);
 */
short crc_update(crc,crc_char)
short crc;
char crc_char;
{
    long x;
    short i;

    /* "x" will contain the character to be processed in bits 0-7 and the CRC */
    /* in bits 8-23. Bit 24 will be used to test for overflow, and then cleared */
    /* to prevent the sign bit of "x" from being set to 1. Bits 25-31 are not */
    /* used. ("x" is treated as though it is a 32 bit register). */
    x = ((long)crc << 8) + crc_char;   /* Get the CRC and the character */

    /* Repeat the following loop 8 times (for the 8 bits of the character). */
    for(i = 0;i < 8;i++)
    {

Figure 9-49: Crc.c
        /* Shift the high-order bit of the character into the low-order bit of */
        /* the CRC, and shift the high-order bit of the CRC into bit 24. */
        x = x << 1;                     /* Shift "x" left one bit */

        /* Test to see if the old high-order bit of the CRC was a 1. */
        if(x & 0x01000000)              /* Test bit 24 of "x" */
            /* If the old high-order bit of the CRC was a 1, exclusive-or it with a one */
            /* to set it to 0, and exclusive-or the CRC with hex 1021 to produce the */
            /* CCITT-recommended CRC generator of: X**16 + X**12 + X**5 + 1. To produce */
            /* the CRC generator of: X**16 + X**15 + X**2 + 1, change the constant from */
            /* 0x01102100 to 0x01800500. This will exclusive-or the CRC with hex 8005 */
            /* and produce the same CRC that IBM uses for their synchronous transmission */
            /* protocols. */
            x = x ^ 0x01102100;         /* Exclusive-or "x" with a...*/
                                        /* ...constant of hex 01102100 */
        /* And repeat 8 times. */
    }
    /* End of "for" loop */

    /* Return the CRC as the 16 low-order bits of this function's value. */
    return(((x & 0x00ffff00) >> 8));    /* AND off the unneeded bits and... */
                                        /* ...shift the result 8 bits to the right */
}

/*
 * crc_finish:
 *   This function must be called once after all the characters in a block
 *   have been processed for a message which is to be TRANSMITTED. It returns
 *   the calculated CRC bytes, which should be transmitted as the two
 *   characters following the block. The first of these 2 bytes must be taken
 *   from the high-order byte of the CRC, and the second must be taken from the
 *   low-order byte of the CRC. This routine is NOT called for a message which
 *   has been RECEIVED.
 *
 *   Calling sequence:
 *   crc = crc_finish(crc);
 */

short crc_finish(crc)
short crc;
{
    /* Call crc_update twice, passing it a character of hex 00 each time, to */
    /* flush out the last 16 bits from the CRC calculation, and return the */
    /* result as the value of this function. */
    return(crc_update(crc_update(crc,'\0'),'\0'));
}

Figure 9-49: Crc.c – Continued
/*
 * This is a sample of the use of the CRC functions, which calculates the
 * CRC for a 1-character message block, and then passes the resulting CRC back
 * into the CRC functions to see if the "received" 1-character message and CRC
 * are correct.
 */

main()
{
    short crc; /* The calculated CRC */
    char crc_char; /* The 1-character message */
    char x, y; /* 2 places to hold the 2 "received" CRC bytes */

    crc_char = 'A'; /* Define the 1-character message */
    crc = crc_clear(); /* Reset the CRC to "transmit" a new message */
    crc = crc_update(crc, crc_char); /* Update the CRC for the first... */
        /* ...(and only) character of the message */
    crc = crc_finish(crc); /* Finish the transmission calculation */
    x = (char)((crc & 0xff00) >> 8); /* Extract the high-order CRC byte */
    y = (char)(crc & 0x00ff); /* And extract the low-order byte */
    printf("%04x\n", crc); /* Print the results */

    crc = crc_clear(); /* Prepare to "receive" a message */
    crc = crc_update(crc, crc_char); /* Update the CRC for the first... */
        /* ...(and only) character of the message */
    crc = crc_update(crc, x); /* Pass both bytes of the "received" ... */
    crc = crc_update(crc, y); /* ...CRC through crc_update, too */
    printf("%04x\n", crc); /* If the result was 0, then the message... */
        /* ...was received without error */
}

Figure 9-49: Crc.c – Continued
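The crc_update routine in Figure 9-49 keeps the CRC in bits 8-23 of a long, shifts the message in from bits 0-7, and reduces with the constant 0x01102100 whenever bit 24 is set; crc_finish then flushes the register with two zero characters. The sketch below restates that arithmetic with fixed-width C types and compares it with the conventional non-augmented bitwise formulation of the same polynomial (hex 1021). It is an illustration only: the function names are hypothetical, and characters are taken as uint8_t, an assumption that sidesteps the sign extension a plain char may undergo when a received CRC byte is 0x80 or above.

```c
#include <stddef.h>
#include <stdint.h>

/* Augmented update, as in crc_update: x holds the character in bits 0-7,
   the CRC in bits 8-23, and bit 24 catches the bit shifted out. */
uint16_t crc_update_aug(uint16_t crc, uint8_t ch)
{
    uint32_t x = ((uint32_t)crc << 8) | ch;
    for (int i = 0; i < 8; i++) {
        x <<= 1;
        if (x & 0x01000000UL)       /* old high-order CRC bit was 1 */
            x ^= 0x01102100UL;      /* clear bit 24, XOR the CRC with 0x1021 */
    }
    return (uint16_t)((x & 0x00ffff00UL) >> 8);
}

/* As crc_finish: flush the last 16 bits with two zero characters. */
uint16_t crc_finish_aug(uint16_t crc)
{
    return crc_update_aug(crc_update_aug(crc, 0), 0);
}

/* Conventional bitwise CRC-16 with polynomial 0x1021 (no augmentation). */
uint16_t crc_update_direct(uint16_t crc, uint8_t ch)
{
    crc ^= (uint16_t)(ch << 8);
    for (int i = 0; i < 8; i++)
        crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                             : (uint16_t)(crc << 1);
    return crc;
}

/* CRC of a whole block with each formulation, starting from zero. */
uint16_t crc_block_aug(const uint8_t *msg, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc_update_aug(crc, msg[i]);
    return crc_finish_aug(crc);
}

uint16_t crc_block_direct(const uint8_t *msg, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc_update_direct(crc, msg[i]);
    return crc;
}
```

Starting from zero, the augmented form plus the two flushing zero bytes yields the same value as the direct form, and passing a message followed by its two CRC bytes (high-order byte first) through the augmented update leaves zero, which is the check the sample main program performs.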

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>proc_1</td>
<td>6</td>
<td>1</td>
<td>83.33</td>
</tr>
<tr>
<td>LXLSH@</td>
<td>15</td>
<td>10</td>
<td>33.33</td>
</tr>
<tr>
<td>LXRSH@</td>
<td>15</td>
<td>6</td>
<td>60.00</td>
</tr>
<tr>
<td>proc_2</td>
<td>52</td>
<td>8</td>
<td>84.62</td>
</tr>
<tr>
<td>proc_3</td>
<td>16</td>
<td>1</td>
<td>93.75</td>
</tr>
<tr>
<td>main</td>
<td>67</td>
<td>12</td>
<td>82.09</td>
</tr>
<tr>
<td>total</td>
<td>171</td>
<td>38</td>
<td>77.78</td>
</tr>
</tbody>
</table>

Figure 9-50: Crc Statistics
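The runtime routines LXLSH@ and LXRSH@ in Figures 9-47 and 9-48 shift the long held in dx:ax by cl on a machine with 16-bit registers, taking one path when the count is below 16 and another otherwise. The sketch below writes both paths in portable C so they can be checked against a native 32-bit shift. Because @ is not valid in a C identifier, the names long_shl, long_sar and pair16 are hypothetical; the sketch assumes counts from 0 to 31, and assumes that converting to int16_t and right-shifting a negative int behave as two's-complement arithmetic, as on mainstream compilers.

```c
#include <stdint.h>

/* A 32-bit value held in two 16-bit registers, high:low (dx:ax). */
typedef struct { uint16_t hi, lo; } pair16;

/* Left shift by n (0..31), following the two paths of LXLSH@. */
pair16 long_shl(pair16 v, unsigned n)
{
    pair16 r;
    if (n < 16) {
        /* Shift both words, then OR into the high word the bits that left
           the low word (loc1 >> (16 - n) in the decompiled code). */
        r.lo = (uint16_t)((uint32_t)v.lo << n);
        r.hi = (uint16_t)(((uint32_t)v.hi << n) | ((uint32_t)v.lo >> (16 - n)));
    } else {
        /* The low word moves to the high word and the low word becomes 0. */
        r.hi = (uint16_t)((uint32_t)v.lo << (n - 16));
        r.lo = 0;
    }
    return r;
}

/* Arithmetic right shift by n (0..31), following the two paths of LXRSH@. */
pair16 long_sar(pair16 v, unsigned n)
{
    pair16 r;
    int32_t hi = (int16_t)v.hi;    /* high word with its sign */
    if (n < 16) {
        /* Shift the low word logically and the high word arithmetically,
           then OR into the low word the bits that left the high word. */
        r.lo = (uint16_t)(((uint32_t)v.lo >> n) | ((uint32_t)v.hi << (16 - n)));
        r.hi = (uint16_t)(hi >> n);
    } else {
        /* MOV ax,dx; CWD; SAR ax,(n - 16): dx becomes all sign bits. */
        r.lo = (uint16_t)(hi >> (n - 16));
        r.hi = (uint16_t)(hi >> 15);
    }
    return r;
}
```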
9.7.10 Matrixmu

Matrixmu is a program that multiplies two matrices. The program is incomplete in the sense that it does not initialize the matrices; it was decompiled to show that the forward substitution method of Chapter 5, Section 5.4.10 is able to find array expressions. The conversion of these expressions into arrays is not performed by dcc, but is explained in Chapter 5, Section 5.5. The disassembled program is shown in Figure 9-51, the decompiled C program in Figure 9-52, and the original C program in Figure 9-53. The call graph for this program is as follows:

```
main
  proc_1
```

Both user procedures are decompiled into the same number of high-level instructions as in the original C program: 10 for the matrix multiplication procedure and 1 for the main program. The reduction in the number of instructions exceeds 85% because of the large number of low-level instructions involved in computing array offsets. In the disassembled version of the program, the basic block at lines 026 to 069 of procedure proc_1 has 44 instructions, which are converted into two high-level instructions: a reduction of 95.45% in intermediate instructions. Overall, this program has an 86.90% reduction of intermediate instructions, as shown in Figure 9-54.
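Most of those instructions compute byte offsets. In lines 026-033 the address of a[i][k] is built as (si << 3) + [bp+4] + ([bp-2] << 1), and in lines 036-043 the address of b[k][j] as [bp-2] * 10 + [bp+6] + (di << 1); these are the expressions that forward substitution exposes in Figure 9-52. With 2-byte integers a row of a[5][4] is 8 bytes and a row of b[4][5] is 10 bytes, so, relative to the base addresses in [bp+4] and [bp+6], the offsets are ordinary row-major indexing. A small check of that correspondence, assuming int16_t stands in for the 16-bit int of the target and using hypothetical function names:

```c
#include <stddef.h>
#include <stdint.h>

#define N 5   /* n in Figure 9-53: rows of a */
#define M 4   /* m in Figure 9-53: columns of a, rows of b */

/* Offset of a[i][k] as built by lines 026-033: (si << 3) + ([bp-2] << 1). */
size_t offset_a(int i, int k) { return ((size_t)i << 3) + ((size_t)k << 1); }

/* Offset of b[k][j] as built by lines 036-043: [bp-2] * 10 + (di << 1). */
size_t offset_b(int k, int j) { return (size_t)k * 10 + ((size_t)j << 1); }

/* The same offsets obtained through C array indexing on 16-bit elements. */
size_t array_offset_a(int i, int k)
{
    static int16_t a[N][M];
    return (size_t)((const char *)&a[i][k] - (const char *)a);
}

size_t array_offset_b(int k, int j)
{
    static int16_t b[M][N];
    return (size_t)((const char *)&b[k][j] - (const char *)b);
}
```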
proc_1 PROC NEAR
000 0002FA 55 PUSH    bp
001 0002FB 8BEC MOV     bp, sp
002 0002FD 83EC02 SUB     sp, 2
003 000300 56 PUSH    si
004 000301 57 PUSH    di
005 000302 33F6 XOR    si, si
007 000378 83FE05 L1:  CMP    si, 5
008 00037B 7C89 JL      L2
009 00037D 5F POP     di
010 00037E 5E POP     si
011 00037F 8BE5 MOV     sp, bp
012 000381 5D POP     bp
013 000382 C3 RET
014 000306 33FF L2:  XOR    di, di
016 000372 83FF04 L3:  CMP    di, 4
017 000375 7C93 JL      L4
018 000377 46 INC     si
019 000378 55 JMP     L1 ;Synthetic inst
020 00030A C746FE0000 L4:  MOV    word ptr [bp-2], 0
022 00036B 837EFE04 L5:  CMP    word ptr [bp-2], 4
023 00036F 7CA0 JL      L6
024 000371 47 INC     di
025 000372 55 JMP     L3 ;Synthetic inst
026 000311 8BDE L6:  MOV     bx, si
027 000313 D1E3 SHL     bx, 1
028 000315 D1E3 SHL     bx, 1
029 000317 D1E3 SHL     bx, 1
030 000319 035E04 ADD     bx, [bp+4]
031 00031C 8B46FE MOV     ax, [bp-2]
032 00031F D1E0 SHL     ax, 1
033 000321 03D8 ADD     bx, ax
034 000323 8B07 MOV     ax, [bx]
035 000325 50 PUSH     ax
036 000326 8B46FE MOV     ax, [bp-2]
037 000329 BA0A00 MOV     dx, 0Ah
038 00032C F7E2 MUL     dx
039 00032E 8BD8 MOV     bx, ax
040 000330 035E06 ADD     bx, [bp+6]
041 000333 8BC7 MOV     ax, di
042 000335 D1E0 SHL     ax, 1
043 000337 03D8 ADD     bx, ax
044 000339 58 POP      ax

Figure 9-51: Matrixmu.a2

Figure 9-51: Matrixmu.a2 – Continued
/*
 * Input file : matrixmu.exe
 * File type : EXE
 */

#include "dcc.h"

void proc_1 (int arg0, int arg1, int arg2)
/* Takes 6 bytes of parameters.
 * High-level language prologue code.
 * C calling convention.
 */
{
  int loc1;
  int loc2;
  int loc3;

  loc2 = 0;
  while ((loc2 < 5)) {
    loc3 = 0;
    while ((loc3 < 4)) {
      loc1 = 0;
      while ((loc1 < 4)) {
        *((((loc2 * 10) + arg2) + (loc3 << 1))) =
        ((*(((loc2 << 3) + arg0) + (loc1 << 1))) *
        (*(((loc1 * 10) + arg1) + (loc3 << 1))) +
        (*(((loc2 * 10) + arg2) + (loc3 << 1))));
        loc1 = (loc1 + 1);
      }
      loc3 = (loc3 + 1);
    }
    loc2 = (loc2 + 1);
  }
}

void main ()
/* Takes no parameters.
 * High-level language prologue code.
 */
{
  int loc1;
  int loc2;
  int loc3;

  proc_1 (&loc3, &loc2, &loc1);
}

Figure 9-52: Matrixmu.b

#define n 5
#define m 4

static void multMatrix (int a[n][m], int b[m][n], int c[n][n])
{ int i, j, k;

   for (i=0; i<n; i++)
      for (j=0; j<m; j++)
          for (k=0; k<m; k++)
              c[i][j] = a[i][k] * b[k][j] + c[i][j];
}

main()
{ int a[n][m], b[n][m], c[n][m];

   multMatrix (a, b, c);
}

Figure 9-53: Matrixmu.c

<table>
<thead>
<tr>
<th>Subroutine</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>proc_1</td>
<td>70</td>
<td>10</td>
<td>85.71</td>
</tr>
<tr>
<td>main</td>
<td>14</td>
<td>1</td>
<td>92.86</td>
</tr>
<tr>
<td>total</td>
<td>84</td>
<td>11</td>
<td>86.90</td>
</tr>
</tbody>
</table>

Figure 9-54: Matrixmu Statistics
9.7.11 Overall Results

The summary results for the 10 programs presented in the previous sections are given in Figure 9-55. The total number of low-level intermediate instructions is 963, compared with 233 high-level instructions in the final programs, which gives an overall reduction of 75.80% (the mean of the per-program reductions is 76.25%). This reduction is mainly due to the optimizations performed during data flow analysis, particularly extended register copy propagation (Chapter 5, Section 5.4.10). The recognition of idioms in the low-level code also reduces the number of instructions and helps in the determination of data types such as long integers. Decompiled programs have the same number of user subroutines as the original programs, plus any runtime support routines used in the program. These latter routines are sometimes translatable into a high-level representation; assembler is generated whenever they are not.

<table>
<thead>
<tr>
<th>Program</th>
<th>Low-level</th>
<th>High-level</th>
<th>% Reduction</th>
</tr>
</thead>
<tbody>
<tr>
<td>intops</td>
<td>45</td>
<td>10</td>
<td>77.78</td>
</tr>
<tr>
<td>byteops</td>
<td>58</td>
<td>10</td>
<td>82.76</td>
</tr>
<tr>
<td>longops</td>
<td>117</td>
<td>48</td>
<td>58.97</td>
</tr>
<tr>
<td>benchsho</td>
<td>101</td>
<td>25</td>
<td>75.25</td>
</tr>
<tr>
<td>benchlng</td>
<td>139</td>
<td>28</td>
<td>79.86</td>
</tr>
<tr>
<td>benchmul</td>
<td>88</td>
<td>12</td>
<td>86.36</td>
</tr>
<tr>
<td>benchfn</td>
<td>82</td>
<td>36</td>
<td>56.10</td>
</tr>
<tr>
<td>fibo</td>
<td>78</td>
<td>15</td>
<td>80.77</td>
</tr>
<tr>
<td>crc</td>
<td>171</td>
<td>38</td>
<td>77.78</td>
</tr>
<tr>
<td>matrixmu</td>
<td>84</td>
<td>11</td>
<td>86.90</td>
</tr>
<tr>
<td>total</td>
<td>963</td>
<td>233</td>
<td>75.80</td>
</tr>
</tbody>
</table>

Figure 9-55: Results for Tested Programs
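Each % Reduction entry is (low-level - high-level) / low-level × 100. Summing the columns of Figure 9-55 gives 963 low-level and 233 high-level instructions, a 75.80% reduction, whereas the unweighted mean of the ten per-program percentages is 76.25%. The following sketch recomputes both figures from the table's own counts; the function names are illustrative only.

```c
#include <stddef.h>

/* Per-program instruction counts from Figure 9-55, intops ... matrixmu. */
static const int low[]  = { 45, 58, 117, 101, 139, 88, 82, 78, 171, 84 };
static const int high[] = { 10, 10,  48,  25,  28, 12, 36, 15,  38, 11 };
#define NPROG (sizeof low / sizeof low[0])

/* Percentage reduction from lo low-level to hi high-level instructions. */
double reduction(int lo, int hi)
{
    return 100.0 * (lo - hi) / lo;
}

/* Sum of one column of the table. */
int column_sum(const int *v)
{
    int sum = 0;
    for (size_t i = 0; i < NPROG; i++)
        sum += v[i];
    return sum;
}

/* Reduction computed from the column totals (the "total" row). */
double overall_reduction(void)
{
    return reduction(column_sum(low), column_sum(high));
}

/* Unweighted mean of the ten per-program percentages. */
double mean_reduction(void)
{
    double sum = 0.0;
    for (size_t i = 0; i < NPROG; i++)
        sum += reduction(low[i], high[i]);
    return sum / NPROG;
}
```

The total rows of Figures 9-50 and 9-54 follow the first convention, computing the percentage from the column totals.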
Chapter 10

Conclusions

This thesis has presented techniques for the reverse compilation or decompilation of binary programs, and provided algorithms for the implementation of the different phases of the decompiler. The methodology was implemented and tested in a prototype decompiler, dcc, which runs under DOS and Unix.

Decompilers use principles and techniques similar to those used in compilers. A decompiler has seven phases, which incorporate compiler and optimization phases. There is no lexical analysis phase, owing to the simplicity of the source machine language. The syntax analysis phase parses the source binary program, separating code from data and placing data references in the symbol table. The main difficulty in separating code from data is that both are represented in the same way in von Neumann machines. The intermediate code generation phase generates a low-level intermediate representation of the program. The semantic analysis phase checks the semantic meaning of groups of low-level instructions (idioms), gathers type information, and propagates it across the intermediate representation. The control flow graph generation phase generates a control flow graph for each subroutine of the program, and attaches the intermediate representation information to the nodes of the graph. The data flow analysis phase analyzes the low-level intermediate code and converts it into a high-level intermediate representation whose constructs are available in any high-level language. This transformation eliminates all low-level references to condition codes and registers, and introduces the high-level concept of expression. Subroutines that are not representable in a high-level language are flagged. The structure of the program is analyzed in the control flow analysis phase, which structures the control flow graph of each subroutine in the program. Finally, the code generation phase generates high-level code based on the high-level intermediate representation and the structured graph of each subroutine.

A complete decompilation of a program makes use not only of the decompiler but also of other related tools: the loader, the signature generator, the prototype generator, the disassembler, and the postprocessor. The loader loads the source binary program into memory; the signature generator generates signatures for known compilers and their libraries (if required); the prototype generator determines the formal argument types of library subroutines; the disassembler parses the program and produces an assembler output file; the decompiler uses the signature information to reduce the number of subroutines to decompile (i.e. it does not attempt to decompile library routines that are recognized by a signature or the loader); and the postprocessor transforms the decompiled high-level program into a semantically equivalent program that uses specific control structures available in the target language. In practice, a decompiler can take as input either a binary program or an assembler program, and produce a high-level language output program. Most of the literature on decompilers uses the latter approach, taking an assembler source program as input. This thesis concentrates on source binary programs, which contain far less information than assembler programs.

The techniques described in this thesis are general enough to construct decompilers for different machine architectures. The phases are grouped into three modules that separate machine- and language-dependent features: the front-end is a machine-dependent module that parses the source binary program and produces a low-level intermediate representation of the program and a control flow graph of each subroutine; the universal decompiling machine is a machine- and language-independent module that analyzes the intermediate code and the structure of the graph(s), and generates a high-level intermediate representation of the program and structured graph(s); and the back-end is a target-language-dependent module that generates high-level target code from the intermediate representation and the structure of the graph. In this way, a decompiler for a different machine can be built by writing a new front-end for that machine, and a decompiler for a different target high-level language can be built by writing a new back-end for the target language. In practice, this approach is limited by the choice of low-level intermediate representation.

The significant contributions of this thesis are the types of analysis performed in the universal decompiling machine: data flow analysis and control flow analysis, which transform the low-level (machine-like) intermediate code into a high-level (HLL-like) intermediate representation. The data flow analyzer uses optimization techniques based on compiler optimization principles, which eliminate the low-level concepts of condition codes and registers and introduce the high-level concept of expression. These techniques take into account interprocedural analysis, register spilling, and type propagation. The control flow analyzer uses structuring algorithms to determine the underlying high-level control structures of the program. These algorithms structure the graph according to a predefined, generic set of control structures available in most commonly used languages.

The implementation of these techniques in the prototype decompiler dcc demonstrates their feasibility. dcc is a decompiler for the DOS operating system and the Intel i80286 machine architecture that generates target C programs. This decompiler runs on a DecStation 3100 under Unix, and on Intel machines under DOS. dcc uses compiler and library signature recognition to decompile user routines only (whenever possible), rather than also decompiling compiler start-up and library routines. When no compiler signature is identified, all subroutines in the source binary program are decompiled; several of the library and compiler start-up routines cannot be translated into a high-level representation and are therefore only disassembled. dcc provides comments for each subroutine, and has command-line switches to generate the bitmap of the program, the call graph, an output assembler file, statistics on the number of low-level and high-level instructions in each subroutine, and information on the control flow graph of each subroutine.

Decompilation is used in two main areas of computer science: software maintenance and security. In software maintenance, a decompiler is used to recover lost or inaccessible source code, translate code written in an obsolete language into a newer language, structure old code written in an unstructured way (i.e. spaghetti code), migrate applications to a new hardware platform, and debug binary programs that are known to have a bug. In security, a decompiler is used to verify binary programs, to verify the correctness of the code produced by a compiler for safety-critical systems where the compiler is not trusted to generate correct code, and to check for the existence of malicious code such as viruses.

Further work on decompilation can be done in two areas: the separation of code and data, and the determination of data types such as arrays, records, and pointers. The former area needs a robust method of determining n-way branch statements (i.e. indexed jumps) and indirect subroutine calls. The latter area needs heuristic methods to identify different types of compound data types and propagate their values. Efficient implementation of the algorithms would provide a faster decompiler, although the speed of decompilation is not a concern given that a program is normally decompiled once only.
Appendix A

i8086 – i80286 Architecture

The Intel iAPX 8086, 8088, 80186 and 80286 machine architectures have the same types of registers, memory structure and input/output port organization [Int86, Int87]. These architectures are backward compatible; hence the 80286 supports all machine instructions supported by the 8086 architecture. The registers of these 16-bit machines are classified into five sets according to their usage: data, pointer, index, control, and segment registers; this classification is shown in Figure A-1.

<table>
<thead>
<tr>
<th>Type</th>
<th>Register</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data</td>
<td>ax</td>
<td>accumulator</td>
</tr>
<tr>
<td></td>
<td>bx</td>
<td>base register in some addressing modes</td>
</tr>
<tr>
<td></td>
<td>cx</td>
<td>counter</td>
</tr>
<tr>
<td></td>
<td>dx</td>
<td>general purpose</td>
</tr>
<tr>
<td>Pointer</td>
<td>sp</td>
<td>stack pointer</td>
</tr>
<tr>
<td></td>
<td>bp</td>
<td>base pointer</td>
</tr>
<tr>
<td>Index</td>
<td>si</td>
<td>source</td>
</tr>
<tr>
<td></td>
<td>di</td>
<td>destination</td>
</tr>
<tr>
<td>Control</td>
<td>ip</td>
<td>instruction pointer</td>
</tr>
<tr>
<td></td>
<td>flags</td>
<td>flags or status word</td>
</tr>
<tr>
<td>Segment</td>
<td>cs</td>
<td>code segment</td>
</tr>
<tr>
<td></td>
<td>ds</td>
<td>data segment</td>
</tr>
<tr>
<td></td>
<td>ss</td>
<td>stack segment</td>
</tr>
<tr>
<td></td>
<td>es</td>
<td>extra segment</td>
</tr>
</tbody>
</table>

Figure A-1: Register Classification

Data or general purpose registers can be accessed as word or byte registers. Each register has a high and a low byte, named by replacing the x of the register name with h for the high byte or with l for the low byte (e.g. ah and al for ax). The flags register is a special purpose register that keeps track of the condition codes set by different instructions. The structure of this register is shown in Figure A-2. As can be seen, not all bits are used; unused bits are reserved by Intel.
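A minimal sketch of how a byte register aliases half of its word register follows. The helper names are hypothetical and operate on a register value rather than on real hardware. In particular, writing the low byte leaves the high byte unchanged.

```c
#include <stdint.h>

/* Byte halves of a 16-bit data register: bits 8-15 and bits 0-7. */
uint8_t reg_high(uint16_t r) { return (uint8_t)(r >> 8); }
uint8_t reg_low(uint16_t r)  { return (uint8_t)(r & 0xFF); }

/* Writing one byte register changes only that half of the word register. */
uint16_t set_high(uint16_t r, uint8_t h) { return (uint16_t)((r & 0x00FF) | (h << 8)); }
uint16_t set_low(uint16_t r, uint8_t l)  { return (uint16_t)((r & 0xFF00) | l); }
```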

Memory is structured as an array of 8-bit bytes stored in little-endian order (i.e. the most significant byte of a word is stored at the higher memory address). Memory is divided into segments, each of which is a linear sequence of 64K bytes; memory is therefore addressed via a segment and offset pair.
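Both rules take only a few lines of C. The sketch below uses hypothetical names and does not model the 1 MB wrap-around of the 8086. It reads and writes a little-endian word and forms a physical address as segment × 16 + offset. As a check against Figure 9-47, the far call 9AA3141000 in proc_2 stores the offset 14A3h followed by the segment 0010h, and these combine to 015A3h, the address at which LXLSH@ is listed.

```c
#include <stdint.h>

/* Little-endian word access: the low-order byte is at the lower address. */
uint16_t read_word_le(const uint8_t *mem, uint32_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

void write_word_le(uint8_t *mem, uint32_t addr, uint16_t w)
{
    mem[addr]     = (uint8_t)(w & 0xFF);   /* least significant byte first */
    mem[addr + 1] = (uint8_t)(w >> 8);
}

/* Physical address of segment:offset = segment * 16 + offset.
   The 1 MB wrap-around of the 8086 is not modelled. */
uint32_t phys_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```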

Input/output port organization consists of up to 64K 8-bit ports or 32K 16-bit ports, located in an addressing space separate from the memory space.

### A.1 Instruction Format

The length of an 80286 instruction (excluding prefixes) varies from 1 to 6 bytes. There are two types of opcodes: 1-byte opcodes and compound opcodes. 1-byte opcodes use the first byte of an instruction as the opcode, followed by the fields byte, at most 2 bytes of displacement, and at most 2 bytes of data. The fields byte contains information about registers, immediate operands, and/or displacement data. Compound opcodes store part of the opcode in the first byte of the instruction, and part in three bits of the second byte (see Figure A-3). The first byte determines the group table to which the instruction belongs, and the 3-bit opcode in the second byte is the index into that table (i.e. each table has 8 entries). The remaining bits of the second byte are used as the fields byte. The rest of the instruction is structured in the same way as for 1-byte opcodes [LG86].

![Figure A-3: Compound Opcodes' Second Byte](image)
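The byte after the first opcode byte splits into mod (bits 7-6), reg (bits 5-3) and r/m (bits 2-0); for a compound opcode the reg bits select the entry in the group table. The sketch below decodes second bytes of the shift/rotate group (first bytes D0-D3), whose table contents follow Intel's group-2 assignment. The types and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields of the byte following the first opcode byte. */
typedef struct { unsigned mod, reg, rm; } fields;

fields split_fields(uint8_t b)
{
    fields f;
    f.mod = b >> 6;          /* bits 7-6 */
    f.reg = (b >> 3) & 7;    /* bits 5-3: a register, or the group-table index */
    f.rm  = b & 7;           /* bits 2-0 */
    return f;
}

/* Group table for the shift/rotate opcodes D0-D3; entry 6 is not used. */
static const char *const group2[8] = {
    "ROL", "ROR", "RCL", "RCR", "SHL", "SHR", "(not used)", "SAR"
};

/* 16-bit register names in the order of the 3-bit register encoding. */
static const char *const reg16[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

const char *group2_mnemonic(uint8_t second_byte)
{
    return group2[split_fields(second_byte).reg];
}

const char *reg16_name(unsigned n)
{
    return reg16[n & 7];
}
```

For example, D3FA and D3E8 decode to SAR dx, cl and SHR ax, cl (mod 3 marks a register operand), as at lines 004 and 003 of the LXRSH@ listing in Figure 9-47, and D3E3 decodes to SHL bx, cl.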

In the 80286, almost all byte combinations are valid opcodes. There are 229 1-byte opcodes, 29 compound opcodes and 6 prefix instructions. A complete list of the machine language instructions, mnemonics and operands is given in Section A.2.

The fields byte is used to calculate the effective address (EA) of the operand. This byte is made up of 3 fields: the reg 3-bit field, which takes the value of a register; the r/m 3-bit field, which is used as a second register or a memory operand; and the mod 2-bit field, which determines the number of displacement bytes (DISP), whether \( r/m \) is used as a register or a memory operand, or the effective address of instructions that are neither indexed nor based-indexed. The structure of this byte is shown in Figure A-4. An algorithm to interpret the fields byte is shown in Figure A-5.

```plaintext
case (mod) of {
  0: if (r/m == 6)    /* get 2 bytes of displacement: direct address */
         EA = dispHi:dispLo;
     else             /* no extra bytes */
         DISP = 0;
  1: /* get 1 byte of displacement */
     DISP = dispLo sign-extended to 16 bits;
  2: /* get 2 bytes of displacement */
     DISP = dispHi:dispLo;
  3: /* register mode */
     r/m is treated as a register field;
}
```
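The algorithm above can be rendered as runnable C. This is a sketch with hypothetical names. It takes a pointer to the fields byte, reports whether the operand is a register or a direct address, and returns the displacement (sign-extended for mod 1, kept as a 16-bit two's-complement value) together with the number of displacement bytes consumed. For example, the bytes 46 FE decode to [bp-2], as in the Chapter 9 listings.

```c
#include <stdint.h>

typedef struct {
    int is_register;    /* mod == 3: r/m names a register */
    int is_direct;      /* mod == 0 and r/m == 6: disp is the address itself */
    uint16_t disp;      /* DISP, 16-bit two's complement */
    unsigned length;    /* displacement bytes following the fields byte */
} operand;

/* Displacement bytes are stored low byte first. */
static uint16_t word_at(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

operand decode_operand(const uint8_t *p)
{
    unsigned mod = p[0] >> 6;
    unsigned rm = p[0] & 7;
    operand o = { 0, 0, 0, 0 };

    switch (mod) {
    case 0:
        if (rm == 6) {                 /* 2 bytes: direct address */
            o.is_direct = 1;
            o.disp = word_at(p + 1);
            o.length = 2;
        }                              /* otherwise DISP = 0 */
        break;
    case 1:                            /* 1 byte, sign-extended to 16 bits */
        o.disp = (uint16_t)(p[1] < 0x80 ? p[1] : (p[1] | 0xFF00));
        o.length = 1;
        break;
    case 2:                            /* 2 bytes */
        o.disp = word_at(p + 1);
        o.length = 2;
        break;
    default:                           /* mod 3: register mode */
        o.is_register = 1;
        break;
    }
    return o;
}
```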

The EA for indexed and based-indexed operands is calculated according to the \( r/m \) field; each value is mapped to an index register or a combination of base and index registers, as shown in Figure A-6.

<table>
<thead>
<tr>
<th>Value of \( r/m \)</th>
<th>Indexed register(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>bx + si</td>
</tr>
<tr>
<td>1</td>
<td>bx + di</td>
</tr>
<tr>
<td>2</td>
<td>bp + si</td>
</tr>
<tr>
<td>3</td>
<td>bp + di</td>
</tr>
<tr>
<td>4</td>
<td>si</td>
</tr>
<tr>
<td>5</td>
<td>di</td>
</tr>
<tr>
<td>6</td>
<td>bp</td>
</tr>
<tr>
<td>7</td>
<td>bx</td>
</tr>
</tbody>
</table>

Figure A-6: Mapping of \( r/m \) field

The final effective address is the sum of the displacement (DISP) and the register(s) given by the \( r/m \) bits.
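A sketch of the mapping in Figure A-6 and the final addition follows, with hypothetical names. The sum is taken modulo 64K because the effective address is a 16-bit offset. For mod 0 with r/m 6, the code returns the direct address instead of bp plus a displacement.

```c
#include <stdint.h>

/* Hypothetical register snapshot: only the base and index registers. */
typedef struct { uint16_t bx, bp, si, di; } regs;

/* Register(s) selected by the r/m field (Figure A-6), summed modulo 64K. */
uint16_t rm_registers(const regs *r, unsigned rm)
{
    switch (rm & 7) {
    case 0:  return (uint16_t)(r->bx + r->si);
    case 1:  return (uint16_t)(r->bx + r->di);
    case 2:  return (uint16_t)(r->bp + r->si);
    case 3:  return (uint16_t)(r->bp + r->di);
    case 4:  return r->si;
    case 5:  return r->di;
    case 6:  return r->bp;
    default: return r->bx;
    }
}

/* EA of a memory operand (mod 0, 1 or 2): DISP plus the r/m register(s),
   except that mod 0 with r/m 6 is a direct 16-bit address. */
uint16_t effective_address(const regs *r, unsigned mod, unsigned rm, uint16_t disp)
{
    if (mod == 0 && (rm & 7) == 6)
        return disp;
    return (uint16_t)(rm_registers(r, rm) + disp);
}
```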

Each combination of \( mod, r/m \) values uses a default segment register for its addressing; these default segments are shown in Figure A-7. Whereas the effective address of an operand is determined by the combination of the \( mod, r/m \) fields, the final physical address is calculated by adding the EA to the contents of the default segment register multiplied by 16. As a general rule, when the \( bp \) register is used the default segment is \( ss \); otherwise the default segment is \( ds \).

<table>
<thead>
<tr>
<th>\( r/m \) / \( mod \)</th>
<th>0</th>
<th>1</th>
<th>2</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>DS</td>
<td>DS</td>
<td>DS</td>
</tr>
<tr>
<td>1</td>
<td>DS</td>
<td>DS</td>
<td>DS</td>
</tr>
<tr>
<td>2</td>
<td>SS</td>
<td>SS</td>
<td>SS</td>
</tr>
<tr>
<td>3</td>
<td>SS</td>
<td>SS</td>
<td>SS</td>
</tr>
<tr>
<td>4</td>
<td>DS</td>
<td>DS</td>
<td>DS</td>
</tr>
<tr>
<td>5</td>
<td>DS</td>
<td>DS</td>
<td>DS</td>
</tr>
<tr>
<td>6</td>
<td>DS</td>
<td>SS</td>
<td>SS</td>
</tr>
<tr>
<td>7</td>
<td>DS</td>
<td>DS</td>
<td>DS</td>
</tr>
</tbody>
</table>

Figure A-7: Default Segments
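The rule behind Figure A-7 and the physical address computation can be written as a small C sketch. The `SegReg` enumeration and both function names are illustrative assumptions. mod = 3 selects a register operand rather than memory, so the sketch does not handle it. The physical address is returned as the unwrapped sum. On the 80286 in real mode this sum can exceed 20 bits (up to 10FFEFh), and whether it wraps at 1 MB depends on the A20 address line, which the sketch does not model.

```c
#include <stdint.h>

typedef enum { SEG_ES, SEG_CS, SEG_SS, SEG_DS } SegReg;

/* Default segment of a memory operand (mod = 0, 1 or 2), per Figure A-7:
 * operands addressed through bp use ss, all others use ds. */
SegReg defaultSegment(unsigned mod, unsigned rm)
{
    rm &= 7;
    if (rm == 2 || rm == 3)          /* bp + si, bp + di */
        return SEG_SS;
    if (rm == 6 && mod != 0)         /* bp + DISP; mod = 0, r/m = 6 is a direct address */
        return SEG_SS;
    return SEG_DS;
}

/* Physical address = segment * 16 + effective address. The result is not
 * wrapped at 1 MB, so it can reach 10FFEFh. */
uint32_t physicalAddress(uint16_t segment, uint16_t ea)
{
    return ((uint32_t)segment << 4) + ea;
}
```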

The segment override prefix is a 1-byte opcode that replaces the default segment register for the next instruction only (i.e. the instruction that immediately follows it). The segment is selected by a 2-bit field (bits 3 and 4) of the prefix byte; all other bits are constant, as illustrated in Figure A-8.

Figure A-8: Segment Override Prefix
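Only bits 3 and 4 of the prefix vary. As a result, the four override prefixes listed in Figure A-9 (26h, 2Eh, 36h and 3Eh) all match the bit pattern 001ss110. A decoder can therefore recognise and decode them with a mask, as in this sketch. The function names are hypothetical. The 2-bit field uses the standard encoding 00 = es, 01 = cs, 10 = ss, 11 = ds.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SEG_ES, SEG_CS, SEG_SS, SEG_DS } SegReg;

/* A segment override prefix has the form 001ss110: masking out bits 3-4
 * must leave 00100110b (26h). */
bool isSegmentOverride(uint8_t b)
{
    return (b & 0xE7) == 0x26;
}

/* Extracts the 2-bit segment field (bits 3 and 4) of the prefix byte. */
SegReg overrideSegment(uint8_t b)
{
    return (SegReg)((b >> 3) & 3);
}
```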

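There are two repeat prefix opcodes, `repne` and `repe` (listed as `rep` in Figure A-9). Each one repeats the following string instruction, decrementing `cx` after every iteration, until `cx` reaches zero; if `cx` is zero initially, the string instruction is not executed at all. When the repeated instruction is `cmps` or `scas`, the repetition can also end early on the result of the comparison: `repe` stops as soon as the operands are not equal (zero flag clear), and `repne` stops as soon as they are equal (zero flag set). The prefixes are normally used with string instructions such as `movs` and `ins` to process a whole string with a single instruction.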
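The interaction between `cx` and the zero flag is easiest to see in a small model of `repe cmpsb`. The sketch is a deliberate simplification:

- it takes plain C pointers in place of `ds:si` and `es:di`;
- it assumes the direction flag is clear, so both pointers advance;
- the function name `repeCmpsb` is hypothetical.

The zero flag is passed in and out so that it is left unchanged when `cx` starts at zero.

```c
#include <stdint.h>

/* Simplified model of "repe cmpsb" (direction flag clear).
 * si and di stand in for ds:si and es:di; cx is the repeat count.
 * Returns the final value of cx; *zf holds the zero flag. */
uint16_t repeCmpsb(const uint8_t *si, const uint8_t *di, uint16_t cx, int *zf)
{
    while (cx != 0) {              /* cx = 0: instruction not executed, flags unchanged */
        *zf = (*si++ == *di++);    /* cmpsb sets ZF when the bytes are equal */
        cx--;                      /* cx is decremented after every iteration */
        if (!*zf)                  /* repe stops once the bytes differ */
            break;
    }
    return cx;
}
```

A `repne` variant would differ only in the exit test, breaking out of the loop when `*zf` is set.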

### A.2 Instruction Set

The instruction set of the i80286 is described in terms of the machine opcode, the assembler mnemonic, and the assembler operands to the instruction. The following conventions are used to describe such an instruction set:

- reg8: 8-bit register.
- reg16: 16-bit register.
- mem8: 8-bit memory value.
- mem16: 16-bit memory value.
- immed8: 8-bit immediate value.
- immed16: 16-bit immediate value.
- immed32: 32-bit immediate value.
- segReg: 16-bit segment register.

Figure A-9 shows all 1-byte opcodes. A compound opcode uses the reg field (bits 3 to 5) of the byte that follows it (the byte containing the mod and r/m fields) as an index into a table of 8 entries; these tables are shown in Figures A-10, A-11, A-12, and A-13. The figures summarise tables given in [Int86, Int87].

<table>
<thead>
<tr>
<th>Machine Opcode</th>
<th>Assembler Mnemonic and Operands</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>ADD reg8/mem8,reg8</td>
</tr>
<tr>
<td>01</td>
<td>ADD reg16/mem16,reg16</td>
</tr>
<tr>
<td>02</td>
<td>ADD reg8,reg8/mem8</td>
</tr>
<tr>
<td>03</td>
<td>ADD reg16,reg16/mem16</td>
</tr>
<tr>
<td>04</td>
<td>ADD AL,immed8</td>
</tr>
<tr>
<td>05</td>
<td>ADD AX,immed16</td>
</tr>
<tr>
<td>06</td>
<td>PUSH es</td>
</tr>
<tr>
<td>07</td>
<td>POP es</td>
</tr>
<tr>
<td>08</td>
<td>OR reg8/mem8,reg8</td>
</tr>
<tr>
<td>09</td>
<td>OR reg16/mem16,reg16</td>
</tr>
<tr>
<td>0A</td>
<td>OR reg8,reg8/mem8</td>
</tr>
<tr>
<td>0B</td>
<td>OR reg16,reg16/mem16</td>
</tr>
<tr>
<td>0C</td>
<td>OR al,immed8</td>
</tr>
<tr>
<td>0D</td>
<td>OR ax,immed16</td>
</tr>
<tr>
<td>0E</td>
<td>PUSH cs</td>
</tr>
<tr>
<td>0F</td>
<td>Not used</td>
</tr>
<tr>
<td>10</td>
<td>ADC reg8/mem8,reg8</td>
</tr>
<tr>
<td>11</td>
<td>ADC reg16/mem16,reg16</td>
</tr>
<tr>
<td>12</td>
<td>ADC reg8,reg8/mem8</td>
</tr>
<tr>
<td>13</td>
<td>ADC reg16,reg16/mem16</td>
</tr>
<tr>
<td>14</td>
<td>ADC al,immed8</td>
</tr>
<tr>
<td>15</td>
<td>ADC ax,immed16</td>
</tr>
<tr>
<td>16</td>
<td>PUSH ss</td>
</tr>
<tr>
<td>17</td>
<td>POP ss</td>
</tr>
<tr>
<td>18</td>
<td>SBB reg8/mem8,reg8</td>
</tr>
<tr>
<td>19</td>
<td>SBB reg16/mem16,reg16</td>
</tr>
<tr>
<td>1A</td>
<td>SBB reg8,reg8/mem8</td>
</tr>
<tr>
<td>1B</td>
<td>SBB reg16,reg16/mem16</td>
</tr>
<tr>
<td>1C</td>
<td>SBB al,immed8</td>
</tr>
<tr>
<td>1D</td>
<td>SBB ax,immed16</td>
</tr>
<tr>
<td>1E</td>
<td>PUSH ds</td>
</tr>
<tr>
<td>1F</td>
<td>POP ds</td>
</tr>
<tr>
<td>20</td>
<td>AND reg8/mem8,reg8</td>
</tr>
<tr>
<td>21</td>
<td>AND reg16/mem16,reg16</td>
</tr>
<tr>
<td>22</td>
<td>AND reg8,reg8/mem8</td>
</tr>
<tr>
<td>23</td>
<td>AND reg16,reg16/mem16</td>
</tr>
<tr>
<td>24</td>
<td>AND al,immed8</td>
</tr>
<tr>
<td>25</td>
<td>AND ax,immed16</td>
</tr>
<tr>
<td>26</td>
<td>Segment override</td>
</tr>
<tr>
<td>27</td>
<td>DAA</td>
</tr>
</tbody>
</table>

Figure A-9: 1-byte Opcodes
<table>
<thead>
<tr>
<th>Machine Opcode</th>
<th>Assembler Mnemonic and Operands</th>
</tr>
</thead>
<tbody>
<tr>
<td>28</td>
<td>SUB reg8/mem8,reg8</td>
</tr>
<tr>
<td>29</td>
<td>SUB reg16/mem16,reg16</td>
</tr>
<tr>
<td>2A</td>
<td>SUB reg8,reg8/mem8</td>
</tr>
<tr>
<td>2B</td>
<td>SUB reg16,reg16/mem16</td>
</tr>
<tr>
<td>2C</td>
<td>SUB al,immed8</td>
</tr>
<tr>
<td>2D</td>
<td>SUB ax,immed16</td>
</tr>
<tr>
<td>2E</td>
<td>Segment override</td>
</tr>
<tr>
<td>2F</td>
<td>DAS</td>
</tr>
<tr>
<td>30</td>
<td>XOR reg8/mem8,reg8</td>
</tr>
<tr>
<td>31</td>
<td>XOR reg16/mem16,reg16</td>
</tr>
<tr>
<td>32</td>
<td>XOR reg8,reg8/mem8</td>
</tr>
<tr>
<td>33</td>
<td>XOR reg16,reg16/mem16</td>
</tr>
<tr>
<td>34</td>
<td>XOR al,immed8</td>
</tr>
<tr>
<td>35</td>
<td>XOR ax,immed16</td>
</tr>
<tr>
<td>36</td>
<td>Segment override</td>
</tr>
<tr>
<td>37</td>
<td>AAA</td>
</tr>
<tr>
<td>38</td>
<td>CMP reg8/mem8,reg8</td>
</tr>
<tr>
<td>39</td>
<td>CMP reg16/mem16,reg16</td>
</tr>
<tr>
<td>3A</td>
<td>CMP reg8,reg8/mem8</td>
</tr>
<tr>
<td>3B</td>
<td>CMP reg16,reg16/mem16</td>
</tr>
<tr>
<td>3C</td>
<td>CMP al,immed8</td>
</tr>
<tr>
<td>3D</td>
<td>CMP ax,immed16</td>
</tr>
<tr>
<td>3E</td>
<td>Segment override</td>
</tr>
<tr>
<td>3F</td>
<td>AAS</td>
</tr>
<tr>
<td>40</td>
<td>INC ax</td>
</tr>
<tr>
<td>41</td>
<td>INC cx</td>
</tr>
<tr>
<td>42</td>
<td>INC dx</td>
</tr>
<tr>
<td>43</td>
<td>INC bx</td>
</tr>
<tr>
<td>44</td>
<td>INC sp</td>
</tr>
<tr>
<td>45</td>
<td>INC bp</td>
</tr>
<tr>
<td>46</td>
<td>INC si</td>
</tr>
<tr>
<td>47</td>
<td>INC di</td>
</tr>
<tr>
<td>48</td>
<td>DEC ax</td>
</tr>
<tr>
<td>49</td>
<td>DEC cx</td>
</tr>
<tr>
<td>4A</td>
<td>DEC dx</td>
</tr>
<tr>
<td>4B</td>
<td>DEC bx</td>
</tr>
<tr>
<td>4C</td>
<td>DEC sp</td>
</tr>
<tr>
<td>4D</td>
<td>DEC bp</td>
</tr>
<tr>
<td>4E</td>
<td>DEC si</td>
</tr>
<tr>
<td>4F</td>
<td>DEC di</td>
</tr>
</tbody>
</table>

Figure A-9: 1-byte Opcodes – Continued
<table>
<thead>
<tr>
<th>Machine Opcode</th>
<th>Assembler Mnemonic and Operands</th>
</tr>
</thead>
<tbody>
<tr>
<td>50</td>
<td>PUSH ax</td>
</tr>
<tr>
<td>51</td>
<td>PUSH cx</td>
</tr>
<tr>
<td>52</td>
<td>PUSH dx</td>
</tr>
<tr>
<td>53</td>
<td>PUSH bx</td>
</tr>
<tr>
<td>54</td>
<td>PUSH sp</td>
</tr>
<tr>
<td>55</td>
<td>PUSH bp</td>
</tr>
<tr>
<td>56</td>
<td>PUSH si</td>
</tr>
<tr>
<td>57</td>
<td>PUSH di</td>
</tr>
<tr>
<td>58</td>
<td>POP ax</td>
</tr>
<tr>
<td>59</td>
<td>POP cx</td>
</tr>
<tr>
<td>5A</td>
<td>POP dx</td>
</tr>
<tr>
<td>5B</td>
<td>POP bx</td>
</tr>
<tr>
<td>5C</td>
<td>POP sp</td>
</tr>
<tr>
<td>5D</td>
<td>POP bp</td>
</tr>
<tr>
<td>5E</td>
<td>POP si</td>
</tr>
<tr>
<td>5F</td>
<td>POP di</td>
</tr>
<tr>
<td>60</td>
<td>PUSHA</td>
</tr>
<tr>
<td>61</td>
<td>POPA</td>
</tr>
<tr>
<td>62</td>
<td>BOUND reg16/mem16,reg16</td>
</tr>
<tr>
<td>63</td>
<td>Not used</td>
</tr>
<tr>
<td>64</td>
<td>Not used</td>
</tr>
<tr>
<td>65</td>
<td>Not used</td>
</tr>
<tr>
<td>66</td>
<td>Not used</td>
</tr>
<tr>
<td>67</td>
<td>Not used</td>
</tr>
<tr>
<td>68</td>
<td>PUSH immed16</td>
</tr>
<tr>
<td>69</td>
<td>IMUL reg16,reg16/mem16,immed16</td>
</tr>
<tr>
<td>6A</td>
<td>PUSH immed8</td>
</tr>
<tr>
<td>6B</td>
<td>IMUL reg16,reg16/mem16,immed8</td>
</tr>
<tr>
<td>6C</td>
<td>INSB</td>
</tr>
<tr>
<td>6D</td>
<td>INSW</td>
</tr>
<tr>
<td>6E</td>
<td>OUTSB</td>
</tr>
<tr>
<td>6F</td>
<td>OUTSW</td>
</tr>
<tr>
<td>70</td>
<td>JO immed8</td>
</tr>
<tr>
<td>71</td>
<td>JNO immed8</td>
</tr>
<tr>
<td>72</td>
<td>JB immed8</td>
</tr>
<tr>
<td>73</td>
<td>JNB immed8</td>
</tr>
<tr>
<td>74</td>
<td>JZ immed8</td>
</tr>
<tr>
<td>75</td>
<td>JNZ immed8</td>
</tr>
<tr>
<td>76</td>
<td>JBE immed8</td>
</tr>
<tr>
<td>77</td>
<td>JA immed8</td>
</tr>
</tbody>
</table>

Figure A-9: 1-byte Opcodes – Continued
<table>
<thead>
<tr>
<th>Machine Opcode</th>
<th>Assembler Mnemonic and Operands</th>
</tr>
</thead>
<tbody>
<tr>
<td>78</td>
<td>JS immed8</td>
</tr>
<tr>
<td>79</td>
<td>JNS immed8</td>
</tr>
<tr>
<td>7A</td>
<td>JP immed8</td>
</tr>
<tr>
<td>7B</td>
<td>JNP immed8</td>
</tr>
<tr>
<td>7C</td>
<td>JL immed8</td>
</tr>
<tr>
<td>7D</td>
<td>JNL immed8</td>
</tr>
<tr>
<td>7E</td>
<td>JLE immed8</td>
</tr>
<tr>
<td>7F</td>
<td>JG immed8</td>
</tr>
<tr>
<td>80</td>
<td>Table2 reg8</td>
</tr>
<tr>
<td>81</td>
<td>Table2 reg16</td>
</tr>
<tr>
<td>82</td>
<td>Table2 reg8</td>
</tr>
<tr>
<td>83</td>
<td>Table2 reg8, reg16</td>
</tr>
<tr>
<td>84</td>
<td>TEST reg8/mem8,reg8</td>
</tr>
<tr>
<td>85</td>
<td>TEST reg16/mem16,reg16</td>
</tr>
<tr>
<td>86</td>
<td>XCHG reg8,reg8</td>
</tr>
<tr>
<td>87</td>
<td>XCHG reg16,reg16</td>
</tr>
<tr>
<td>88</td>
<td>MOV reg8/mem8,reg8</td>
</tr>
<tr>
<td>89</td>
<td>MOV reg16/mem16,reg16</td>
</tr>
<tr>
<td>8A</td>
<td>MOV reg8,reg8/mem8</td>
</tr>
<tr>
<td>8B</td>
<td>MOV reg16,reg16/mem16</td>
</tr>
<tr>
<td>8C</td>
<td>MOV reg16/mem16,segReg</td>
</tr>
<tr>
<td>8D</td>
<td>LEA reg16,reg16/mem16</td>
</tr>
<tr>
<td>8E</td>
<td>MOV segReg,reg16/mem16</td>
</tr>
<tr>
<td>8F</td>
<td>POP reg16/mem16</td>
</tr>
<tr>
<td>90</td>
<td>NOP</td>
</tr>
<tr>
<td>91</td>
<td>XCHG ax,cx</td>
</tr>
<tr>
<td>92</td>
<td>XCHG ax,dx</td>
</tr>
<tr>
<td>93</td>
<td>XCHG ax,bx</td>
</tr>
<tr>
<td>94</td>
<td>XCHG ax,sp</td>
</tr>
<tr>
<td>95</td>
<td>XCHG ax,bp</td>
</tr>
<tr>
<td>96</td>
<td>XCHG ax,si</td>
</tr>
<tr>
<td>97</td>
<td>XCHG ax,di</td>
</tr>
<tr>
<td>98</td>
<td>CBW</td>
</tr>
<tr>
<td>99</td>
<td>CWD</td>
</tr>
<tr>
<td>9A</td>
<td>CALL immed32</td>
</tr>
<tr>
<td>9B</td>
<td>WAIT</td>
</tr>
<tr>
<td>9C</td>
<td>PUSHF</td>
</tr>
<tr>
<td>9D</td>
<td>POPF</td>
</tr>
<tr>
<td>9E</td>
<td>SAHF</td>
</tr>
<tr>
<td>9F</td>
<td>LAHF</td>
</tr>
</tbody>
</table>

Figure A-9: 1-byte Opcodes – Continued
<table>
<thead>
<tr>
<th>Machine Opcode</th>
<th>Assembler Mnemonic and Operands</th>
</tr>
</thead>
<tbody>
<tr>
<td>A0</td>
<td>MOV al,[mem8]</td>
</tr>
<tr>
<td>A1</td>
<td>MOV ax,[mem16]</td>
</tr>
<tr>
<td>A2</td>
<td>MOV [mem8],al</td>
</tr>
<tr>
<td>A3</td>
<td>MOV [mem16],ax</td>
</tr>
<tr>
<td>A4</td>
<td>MOVSB</td>
</tr>
<tr>
<td>A5</td>
<td>MOVSW</td>
</tr>
<tr>
<td>A6</td>
<td>CMPSB</td>
</tr>
<tr>
<td>A7</td>
<td>CMPSW</td>
</tr>
<tr>
<td>A8</td>
<td>TEST al,immed8</td>
</tr>
<tr>
<td>A9</td>
<td>TEST ax,immed16</td>
</tr>
<tr>
<td>AA</td>
<td>STOSB</td>
</tr>
<tr>
<td>AB</td>
<td>STOSW</td>
</tr>
<tr>
<td>AC</td>
<td>LODSB</td>
</tr>
<tr>
<td>AD</td>
<td>LODSW</td>
</tr>
<tr>
<td>AE</td>
<td>SCASB</td>
</tr>
<tr>
<td>AF</td>
<td>SCASW</td>
</tr>
<tr>
<td>B0</td>
<td>MOV al,immed8</td>
</tr>
<tr>
<td>B1</td>
<td>MOV cl,immed8</td>
</tr>
<tr>
<td>B2</td>
<td>MOV dl,immed8</td>
</tr>
<tr>
<td>B3</td>
<td>MOV bl,immed8</td>
</tr>
<tr>
<td>B4</td>
<td>MOV ah,immed8</td>
</tr>
<tr>
<td>B5</td>
<td>MOV ch,immed8</td>
</tr>
<tr>
<td>B6</td>
<td>MOV dh,immed8</td>
</tr>
<tr>
<td>B7</td>
<td>MOV bh,immed8</td>
</tr>
<tr>
<td>B8</td>
<td>MOV ax,immed16</td>
</tr>
<tr>
<td>B9</td>
<td>MOV cx,immed16</td>
</tr>
<tr>
<td>BA</td>
<td>MOV dx,immed16</td>
</tr>
<tr>
<td>BB</td>
<td>MOV bx,immed16</td>
</tr>
<tr>
<td>BC</td>
<td>MOV sp,immed16</td>
</tr>
<tr>
<td>BD</td>
<td>MOV bp,immed16</td>
</tr>
<tr>
<td>BE</td>
<td>MOV si,immed16</td>
</tr>
<tr>
<td>BF</td>
<td>MOV di,immed16</td>
</tr>
<tr>
<td>C0</td>
<td>Table1 reg8</td>
</tr>
<tr>
<td>C1</td>
<td>Table1 reg8, reg16</td>
</tr>
<tr>
<td>C2</td>
<td>RET immed16</td>
</tr>
<tr>
<td>C3</td>
<td>RET</td>
</tr>
<tr>
<td>C4</td>
<td>LES reg16/mem16,mem16</td>
</tr>
<tr>
<td>C5</td>
<td>LDS reg16/mem16,mem16</td>
</tr>
<tr>
<td>C6</td>
<td>MOV reg8/mem8,immed8</td>
</tr>
<tr>
<td>C7</td>
<td>MOV reg16/mem16,immed16</td>
</tr>
</tbody>
</table>

Figure A-9: 1-byte Opcodes – Continued
<table>
<thead>
<tr>
<th>Machine Opcode</th>
<th>Assembler Mnemonic and Operands</th>
</tr>
</thead>
<tbody>
<tr>
<td>C8</td>
<td>ENTER immed16, immed8</td>
</tr>
<tr>
<td>C9</td>
<td>LEAVE</td>
</tr>
<tr>
<td>CA</td>
<td>RET immed16</td>
</tr>
<tr>
<td>CB</td>
<td>RET</td>
</tr>
<tr>
<td>CC</td>
<td>INT 3</td>
</tr>
<tr>
<td>CD</td>
<td>INT immed8</td>
</tr>
<tr>
<td>CE</td>
<td>INTO</td>
</tr>
<tr>
<td>CF</td>
<td>IRET</td>
</tr>
<tr>
<td>D0</td>
<td>Table1 reg8</td>
</tr>
<tr>
<td>D1</td>
<td>Table1 reg16</td>
</tr>
<tr>
<td>D2</td>
<td>Table1 reg8</td>
</tr>
<tr>
<td>D3</td>
<td>Table1 reg16</td>
</tr>
<tr>
<td>D4</td>
<td>AAM</td>
</tr>
<tr>
<td>D5</td>
<td>AAD</td>
</tr>
<tr>
<td>D6</td>
<td>Not used</td>
</tr>
<tr>
<td>D7</td>
<td>XLAT [bx]</td>
</tr>
<tr>
<td>D8</td>
<td>ESC immed8</td>
</tr>
<tr>
<td>D9</td>
<td>ESC immed8</td>
</tr>
<tr>
<td>DA</td>
<td>ESC immed8</td>
</tr>
<tr>
<td>DB</td>
<td>ESC immed8</td>
</tr>
<tr>
<td>DC</td>
<td>ESC immed8</td>
</tr>
<tr>
<td>DD</td>
<td>ESC immed8</td>
</tr>
<tr>
<td>DE</td>
<td>ESC immed8</td>
</tr>
<tr>
<td>DF</td>
<td>ESC immed8</td>
</tr>
<tr>
<td>E0</td>
<td>LOOPNE immed8</td>
</tr>
<tr>
<td>E1</td>
<td>LOOPE immed8</td>
</tr>
<tr>
<td>E2</td>
<td>LOOP immed8</td>
</tr>
<tr>
<td>E3</td>
<td>JCXZ immed8</td>
</tr>
<tr>
<td>E4</td>
<td>IN al,immed8</td>
</tr>
<tr>
<td>E5</td>
<td>IN ax,immed8</td>
</tr>
<tr>
<td>E6</td>
<td>OUT immed8,al</td>
</tr>
<tr>
<td>E7</td>
<td>OUT immed8,ax</td>
</tr>
<tr>
<td>E8</td>
<td>CALL immed16</td>
</tr>
<tr>
<td>E9</td>
<td>JMP immed16</td>
</tr>
<tr>
<td>EA</td>
<td>JMP immed32</td>
</tr>
<tr>
<td>EB</td>
<td>JMP immed8</td>
</tr>
<tr>
<td>EC</td>
<td>IN al,dx</td>
</tr>
<tr>
<td>ED</td>
<td>IN ax,dx</td>
</tr>
<tr>
<td>EE</td>
<td>OUT dx,al</td>
</tr>
<tr>
<td>EF</td>
<td>OUT dx,ax</td>
</tr>
</tbody>
</table>

Figure A-9: 1-byte Opcodes – Continued

<table>
<thead>
<tr>
<th>Machine Opcode</th>
<th>Assembler Mnemonic and Operands</th>
</tr>
</thead>
<tbody>
<tr>
<td>F0</td>
<td>LOCK</td>
</tr>
<tr>
<td>F1</td>
<td>Not used</td>
</tr>
<tr>
<td>F2</td>
<td>REPNE</td>
</tr>
<tr>
<td>F3</td>
<td>REP</td>
</tr>
<tr>
<td>F4</td>
<td>HLT</td>
</tr>
<tr>
<td>F5</td>
<td>CMC</td>
</tr>
<tr>
<td>F6</td>
<td>Table3 reg8</td>
</tr>
<tr>
<td>F7</td>
<td>Table3 reg16</td>
</tr>
<tr>
<td>F8</td>
<td>CLC</td>
</tr>
<tr>
<td>F9</td>
<td>STC</td>
</tr>
<tr>
<td>FA</td>
<td>CLI</td>
</tr>
<tr>
<td>FB</td>
<td>STI</td>
</tr>
<tr>
<td>FC</td>
<td>CLD</td>
</tr>
<tr>
<td>FD</td>
<td>STD</td>
</tr>
<tr>
<td>FE</td>
<td>Table4 reg8</td>
</tr>
<tr>
<td>FF</td>
<td>Table4 reg16</td>
</tr>
</tbody>
</table>

Figure A-9: 1-byte Opcodes – Continued
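The regular layout of opcodes 00h to 3Dh in Figure A-9 can be exploited when decoding. For opcodes whose low three bits are 0 to 5:

- bits 3 to 5 select the operation, in the same order as Table2 (ADD, OR, ADC, SBB, AND, SUB, XOR, CMP);
- the low three bits select the operand form;
- bit 0 chooses 8- or 16-bit operands, and for the first four forms bit 1 gives the direction.

The following C sketch, with hypothetical names, decodes only that regular subset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operations selected by bits 3-5 (same order as Table2). */
static const char *const aluOps[8] = {
    "ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"
};

/* Operand forms selected by the low 3 bits (0-5). */
static const char *const aluForms[6] = {
    "reg8/mem8,reg8", "reg16/mem16,reg16",
    "reg8,reg8/mem8", "reg16,reg16/mem16",
    "al,immed8",      "ax,immed16"
};

/* Returns true and fills *mnemonic and *operands if op is one of the
 * regular ALU opcodes (00h-3Fh with low bits 0-5); false otherwise. */
bool decodeAlu(uint8_t op, const char **mnemonic, const char **operands)
{
    if (op > 0x3F || (op & 7) > 5)
        return false;            /* push/pop seg, prefixes, DAA, etc. */
    *mnemonic = aluOps[(op >> 3) & 7];
    *operands = aluForms[op & 7];
    return true;
}
```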


<table>
<thead>
<tr>
<th>Index</th>
<th>Assembler Mnemonic</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>ROL</td>
</tr>
<tr>
<td>1</td>
<td>ROR</td>
</tr>
<tr>
<td>2</td>
<td>RCL</td>
</tr>
<tr>
<td>3</td>
<td>RCR</td>
</tr>
<tr>
<td>4</td>
<td>SHL</td>
</tr>
<tr>
<td>5</td>
<td>SHR</td>
</tr>
<tr>
<td>6</td>
<td>Not used</td>
</tr>
<tr>
<td>7</td>
<td>SAR</td>
</tr>
</tbody>
</table>

Figure A-10: Table1 Opcodes
<table>
<thead>
<tr>
<th>Index</th>
<th>Assembler Mnemonic</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>ADD</td>
</tr>
<tr>
<td>1</td>
<td>OR</td>
</tr>
<tr>
<td>2</td>
<td>ADC</td>
</tr>
<tr>
<td>3</td>
<td>SBB</td>
</tr>
<tr>
<td>4</td>
<td>AND</td>
</tr>
<tr>
<td>5</td>
<td>SUB</td>
</tr>
<tr>
<td>6</td>
<td>XOR</td>
</tr>
<tr>
<td>7</td>
<td>CMP</td>
</tr>
</tbody>
</table>

Figure A-11: Table 2 Opcodes

<table>
<thead>
<tr>
<th>Index</th>
<th>Assembler Mnemonic</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>TEST</td>
</tr>
<tr>
<td>1</td>
<td>Not used</td>
</tr>
<tr>
<td>2</td>
<td>NOT</td>
</tr>
<tr>
<td>3</td>
<td>NEG</td>
</tr>
<tr>
<td>4</td>
<td>MUL</td>
</tr>
<tr>
<td>5</td>
<td>IMUL</td>
</tr>
<tr>
<td>6</td>
<td>DIV</td>
</tr>
<tr>
<td>7</td>
<td>IDIV</td>
</tr>
</tbody>
</table>

Figure A-12: Table 3 Opcodes

<table>
<thead>
<tr>
<th>Index</th>
<th>Assembler Mnemonic</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>INC</td>
</tr>
<tr>
<td>1</td>
<td>DEC</td>
</tr>
<tr>
<td>2</td>
<td>CALL (near, indirect)</td>
</tr>
<tr>
<td>3</td>
<td>CALL (far, indirect)</td>
</tr>
<tr>
<td>4</td>
<td>JMP (near, indirect)</td>
</tr>
<tr>
<td>5</td>
<td>JMP (far, indirect)</td>
</tr>
<tr>
<td>6</td>
<td>PUSH</td>
</tr>
<tr>
<td>7</td>
<td>Not used</td>
</tr>
</tbody>
</table>

Figure A-13: Table 4 Opcodes
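A decoder reaches the tables in Figures A-10 to A-13 through the reg field (bits 3 to 5) of the byte that follows the compound opcode. The sketch below mirrors the four tables as given and returns `NULL` for entries marked "Not used". `groupMnemonic` and the table names are hypothetical. The sketch does not reject combinations that the processor treats as invalid, such as FEh with an index above 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Figures A-10 to A-13; NULL marks "Not used". */
static const char *const table1[8] = {"ROL", "ROR", "RCL", "RCR", "SHL", "SHR", NULL, "SAR"};
static const char *const table2[8] = {"ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"};
static const char *const table3[8] = {"TEST", NULL, "NOT", "NEG", "MUL", "IMUL", "DIV", "IDIV"};
static const char *const table4[8] = {"INC", "DEC", "CALL", "CALL", "JMP", "JMP", "PUSH", NULL};

/* Mnemonic of a compound opcode, indexed by the reg field (bits 3-5) of
 * the byte that follows the opcode; NULL if the opcode is not compound or
 * the entry is not used. */
const char *groupMnemonic(uint8_t opcode, uint8_t modrm)
{
    unsigned idx = (modrm >> 3) & 7;
    switch (opcode) {
    case 0x80: case 0x81: case 0x82: case 0x83:
        return table2[idx];
    case 0xC0: case 0xC1: case 0xD0: case 0xD1: case 0xD2: case 0xD3:
        return table1[idx];
    case 0xF6: case 0xF7:
        return table3[idx];
    case 0xFE: case 0xFF:
        return table4[idx];
    default:
        return NULL;
    }
}
```

For example, the bytes 83h E8h select index 5 of Table2, i.e. SUB.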
Appendix B

Program Segment Prefix

The program segment prefix (PSP) is a 256-byte block of information, apparently a remnant of the CP/M operating system, that was adopted to assist in porting CP/M programs to the DOS environment [Dun88b]. When a program is loaded into memory, DOS builds a PSP in the first 256 bytes of the allocated memory block. The fields of the PSP are shown in Figure B-1.

<table>
<thead>
<tr>
<th>Segment offset</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>00h</td>
<td>terminate vector: interrupt 20h (transfer to DOS)</td>
</tr>
<tr>
<td>02h</td>
<td>segment address of the first paragraph beyond the program's allocated memory</td>
</tr>
<tr>
<td>04h</td>
<td>reserved</td>
</tr>
<tr>
<td>05h</td>
<td>call vector function: far call to DOS’s function request handler</td>
</tr>
<tr>
<td>0Ah</td>
<td>copy of the parent’s program termination handler vector</td>
</tr>
<tr>
<td>0Eh</td>
<td>copy of the parent’s control-c/control-break handler vector</td>
</tr>
<tr>
<td>12h</td>
<td>copy of the parent’s critical error handler vector</td>
</tr>
<tr>
<td>16h</td>
<td>reserved</td>
</tr>
<tr>
<td>2Ch</td>
<td>address of the first paragraph of the DOS environment</td>
</tr>
<tr>
<td>2Eh</td>
<td>reserved</td>
</tr>
<tr>
<td>50h</td>
<td>INT 21h and RETF (far return) instructions</td>
</tr>
<tr>
<td>53h</td>
<td>reserved</td>
</tr>
<tr>
<td>5Ch</td>
<td>default file control block (FCB) built from the first command-line parameter</td>
</tr>
<tr>
<td>6Ch</td>
<td>default FCB built from the second command-line parameter</td>
</tr>
<tr>
<td>80h</td>
<td>command tail; also used as the default disk transfer area (DTA)</td>
</tr>
</tbody>
</table>

Figure B-1: PSP Fields

The terminate vector (offset 00h of the PSP) used to be the warm boot/terminate (WBOOT) vector under CP/M. The call vector function (offset 05h of the PSP) used to be the basic disk operating system (BDOS) vector under CP/M.
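The layout in Figure B-1 maps naturally onto a C structure. This sketch is illustrative only, and rests on these assumptions:

- The field names are hypothetical.
- Fields that are not simple words are kept as byte arrays, so the compiler introduces no padding. The `_Static_assert` checks, which need C11, confirm the offsets.
- The word fields assume a little-endian host.

The command tail at 80h starts with a length byte. The text follows it, ending in a carriage return (0Dh) that the length does not count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout of the 256-byte PSP (Figure B-1). */
typedef struct {
    uint8_t  int20[2];          /* 00h: INT 20h instruction */
    uint16_t memTopSeg;         /* 02h: segment beyond allocated memory */
    uint8_t  reserved1;         /* 04h */
    uint8_t  dosCall[5];        /* 05h: far call to DOS function handler */
    uint8_t  terminateVec[4];   /* 0Ah */
    uint8_t  ctrlBreakVec[4];   /* 0Eh */
    uint8_t  critErrorVec[4];   /* 12h */
    uint8_t  reserved2[0x16];   /* 16h-2Bh */
    uint16_t envSeg;            /* 2Ch: environment segment */
    uint8_t  reserved3[0x22];   /* 2Eh-4Fh */
    uint8_t  int21Retf[3];      /* 50h: INT 21h, RETF */
    uint8_t  reserved4[9];      /* 53h-5Bh */
    uint8_t  fcb1[16];          /* 5Ch */
    uint8_t  fcb2[20];          /* 6Ch-7Fh */
    uint8_t  cmdTail[128];      /* 80h: length byte, then text */
} PSP;

_Static_assert(offsetof(PSP, envSeg) == 0x2C, "environment segment offset");
_Static_assert(offsetof(PSP, cmdTail) == 0x80, "command tail offset");
_Static_assert(sizeof(PSP) == 256, "PSP is 256 bytes");

/* Copies the command tail into out (at least 128 bytes) as a C string
 * and returns its length. */
size_t commandTail(const PSP *psp, char *out)
{
    size_t len = psp->cmdTail[0];
    if (len > 127)
        len = 127;              /* stay inside the 128-byte area */
    memcpy(out, &psp->cmdTail[1], len);
    out[len] = '\0';
    return len;
}
```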
Appendix C

Executable File Format

The DOS operating system supports two different types of executable files: .exe and .com files. The former allows large programs that use multiple segments in memory; the latter is for small programs that fit into one segment (i.e. 64 KB maximum) [Dun88b].

### C.1 .exe Files

The .exe file consists of a header and a load module, as shown in Figure C-1. The file header consists of a 28-byte fixed formatted area and a relocation table of variable size. The load module is a fully linked image of the program. It carries no information on how to separate the segments in the module, because DOS does not need to know how the program is segmented.

Figure C-1: Structure of an .exe File

The structure of the header’s formatted area is shown in Figure C-2. The size of a page is 512 bytes, and the size of a paragraph is 16 bytes. The program image size is calculated from the values in the formatted area as the difference between the file size and the header size. The file size is derived from the number of file pages (rounded up) and the number of bytes in the last page, and the header size is the number of header paragraphs multiplied by 16.

The relocation table is a list of pointers to words within the load module that must be adjusted at load time: the start segment address at which the program is loaded is added to each of these words. Each pointer is stored as 2 words, an offset followed by a segment, relative to the start of the load module.
<table>
<thead>
<tr>
<th>Bytes</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>00-01h</td>
<td>.exe signature (4Dh, 5Ah)</td>
</tr>
<tr>
<td>02-03h</td>
<td>number of bytes in the last page</td>
</tr>
<tr>
<td>04-05h</td>
<td>number of pages (rounded up)</td>
</tr>
<tr>
<td>06-07h</td>
<td>number of entries in the relocation table</td>
</tr>
<tr>
<td>08-09h</td>
<td>number of paragraphs in the header</td>
</tr>
<tr>
<td>0A-0Bh</td>
<td>minimum number of paragraphs required for data and stack</td>
</tr>
<tr>
<td>0C-0Dh</td>
<td>maximum number of memory paragraphs</td>
</tr>
<tr>
<td>0E-0Fh</td>
<td>pre-relocated initial ss value</td>
</tr>
<tr>
<td>10-11h</td>
<td>initial sp value (absolute value)</td>
</tr>
<tr>
<td>12-13h</td>
<td>complemented checksum (1’s complement)</td>
</tr>
<tr>
<td>14-15h</td>
<td>initial ip value</td>
</tr>
<tr>
<td>16-17h</td>
<td>pre-relocated initial value of cs</td>
</tr>
<tr>
<td>18-19h</td>
<td>relocation table offset</td>
</tr>
<tr>
<td>1A-1Bh</td>
<td>overlay number (default: 0000h)</td>
</tr>
</tbody>
</table>

Figure C-2: Fixed Formatted Area
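The size computation and the relocation step described above can be sketched in C. The function names are hypothetical. The sketch assumes `hdr` points to the start of the file and covers the whole header, including the relocation table. It treats a last-page count of 0 as a full final page, a common reading of that field. Each relocation entry is read as an offset word followed by a segment word, and `loadSeg` is added to the word it addresses.

```c
#include <stdint.h>

/* Reads a little-endian 16-bit word. */
static uint16_t rdWord(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Load-module size in bytes, from the fixed formatted area (Figure C-2).
 * Returns 0 if the signature (4Dh, 5Ah) is missing or the fields are
 * inconsistent. */
uint32_t exeImageSize(const uint8_t *hdr)
{
    if (hdr[0] != 0x4D || hdr[1] != 0x5A)
        return 0;
    uint32_t lastPage  = rdWord(hdr + 0x02);          /* bytes in last page */
    uint32_t pages     = rdWord(hdr + 0x04);          /* 512-byte pages, rounded up */
    uint32_t headerLen = rdWord(hdr + 0x08) * 16u;    /* paragraphs -> bytes */
    if (pages == 0 || lastPage > 512)
        return 0;
    uint32_t fileSize = pages * 512u;
    if (lastPage != 0)                                /* 0 taken to mean a full page */
        fileSize -= 512u - lastPage;
    if (headerLen > fileSize)
        return 0;
    return fileSize - headerLen;
}

/* Adds loadSeg to every word named in the relocation table. Each entry is
 * an offset word followed by a segment word, relative to the load module. */
void applyRelocations(const uint8_t *hdr, uint8_t *image, uint32_t imageSize,
                      uint16_t loadSeg)
{
    uint16_t count = rdWord(hdr + 0x06);
    const uint8_t *entry = hdr + rdWord(hdr + 0x18);
    for (uint16_t i = 0; i < count; i++, entry += 4) {
        uint32_t addr = ((uint32_t)rdWord(entry + 2) << 4) + rdWord(entry);
        if (addr + 1 >= imageSize)
            continue;                         /* entry points outside the image */
        uint16_t w = (uint16_t)(rdWord(image + addr) + loadSeg);
        image[addr]     = (uint8_t)(w & 0xFF);
        image[addr + 1] = (uint8_t)(w >> 8);
    }
}
```

As a worked example, a file of 2 pages with 16 bytes in the last page is 512 + 16 = 528 bytes long. With a 2-paragraph (32-byte) header, its load module is 496 bytes.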

### C.2 .com Files

A .com file is a program image without a header (i.e. equivalent to the load module of an .exe file), so the program is loaded into memory “as is”. Unlike .exe programs, .com programs can use only one segment (up to 64 KB). The format was designed to transport programs from CP/M to the DOS environment.

Appendix D

Low-level to High-level Icode Mapping

The mapping between low-level and high-level Icodes is shown in Figure D-1. The symbols in the high-level Icode column have the following meanings:

- a dash (-) means that the low-level Icode has no high-level counterpart;
- an asterisk (*) means that the low-level Icode forms part of a high-level instruction only when it occurs in an idiom;
- f means that an Icode flag is set and the instruction is not considered any further;
- cc means that the low-level instruction sets a condition code; it has no high-level counterpart and is eliminated by condition code propagation;
- n means that the low-level Icode was not considered in the analysis performed by dcc; all such Icodes deal with machine string instructions.

The initial mapping of low-level to high-level Icodes is expressed in terms of registers. Further data flow analysis on the Icodes transforms these instructions into expressions that do not make use of temporary registers, only variables and register variables (if any).
<table>
<thead>
<tr>
<th>Low-level Icode</th>
<th>High-level Icode</th>
</tr>
</thead>
<tbody>
<tr>
<td>iAAA</td>
<td>-</td>
</tr>
<tr>
<td>iAAD</td>
<td>-</td>
</tr>
<tr>
<td>iAAM</td>
<td>-</td>
</tr>
<tr>
<td>iAAS</td>
<td>-</td>
</tr>
<tr>
<td>iADC</td>
<td>*</td>
</tr>
<tr>
<td>iADD</td>
<td>asgn (+)</td>
</tr>
<tr>
<td>iAND</td>
<td>asgn (&amp;)</td>
</tr>
<tr>
<td>iBOUND</td>
<td>f</td>
</tr>
<tr>
<td>iCALL</td>
<td>call</td>
</tr>
<tr>
<td>iCALLF</td>
<td>call</td>
</tr>
<tr>
<td>iCLC</td>
<td>cc</td>
</tr>
<tr>
<td>iCLD</td>
<td>cc</td>
</tr>
<tr>
<td>iCLI</td>
<td>-</td>
</tr>
<tr>
<td>iCMC</td>
<td>cc</td>
</tr>
<tr>
<td>iCMP</td>
<td>cc</td>
</tr>
<tr>
<td>iCMPS</td>
<td>n</td>
</tr>
<tr>
<td>iREPNE_CMPS</td>
<td>n</td>
</tr>
<tr>
<td>iREPE_CMPS</td>
<td>n</td>
</tr>
<tr>
<td>iDAA</td>
<td>-</td>
</tr>
<tr>
<td>iDAS</td>
<td>-</td>
</tr>
<tr>
<td>iDEC</td>
<td>asgn (- 1)</td>
</tr>
<tr>
<td>iDIV</td>
<td>asgn (/)</td>
</tr>
<tr>
<td>iENTER</td>
<td>f</td>
</tr>
<tr>
<td>iESC</td>
<td>f</td>
</tr>
<tr>
<td>iHLT</td>
<td>-</td>
</tr>
<tr>
<td>iIDIV</td>
<td>asgn (/)</td>
</tr>
<tr>
<td>iIMUL</td>
<td>asgn (*)</td>
</tr>
<tr>
<td>iIN</td>
<td>-</td>
</tr>
<tr>
<td>iINC</td>
<td>asgn (+ 1)</td>
</tr>
<tr>
<td>iINS</td>
<td>-</td>
</tr>
<tr>
<td>iREP_INS</td>
<td>-</td>
</tr>
<tr>
<td>iINT</td>
<td>-</td>
</tr>
<tr>
<td>iINTO</td>
<td>-</td>
</tr>
<tr>
<td>iIRET</td>
<td>-</td>
</tr>
<tr>
<td>iJB</td>
<td>jcond (&lt;)</td>
</tr>
<tr>
<td>iJBE</td>
<td>jcond (&lt;=)</td>
</tr>
<tr>
<td>iJAE</td>
<td>jcond (&gt;=)</td>
</tr>
<tr>
<td>iJA</td>
<td>jcond (&gt;)</td>
</tr>
<tr>
<td>iJE</td>
<td>jcond (==)</td>
</tr>
<tr>
<td>iJNE</td>
<td>jcond (&lt;&gt;)</td>
</tr>
</tbody>
</table>

Figure D-1: Icode Opcodes
<table>
<thead>
<tr>
<th>Low-level Icode</th>
<th>High-level Icode</th>
</tr>
</thead>
<tbody>
<tr>
<td>iJL</td>
<td>jcond (&lt;)</td>
</tr>
<tr>
<td>iJGE</td>
<td>jcond (&gt;=)</td>
</tr>
<tr>
<td>iJLE</td>
<td>jcond (&lt;=)</td>
</tr>
<tr>
<td>iJG</td>
<td>jcond (&gt;)</td>
</tr>
<tr>
<td>iJS</td>
<td>jcond (&lt; 0)</td>
</tr>
<tr>
<td>iJNS</td>
<td>jcond (&gt;= 0)</td>
</tr>
<tr>
<td>iJO</td>
<td>-</td>
</tr>
<tr>
<td>iJNO</td>
<td>-</td>
</tr>
<tr>
<td>iJP</td>
<td>-</td>
</tr>
<tr>
<td>iJNP</td>
<td>-</td>
</tr>
<tr>
<td>iJCXZ</td>
<td>jcond (cx == 0)</td>
</tr>
<tr>
<td>iJNCXZ</td>
<td>jcond (cx &lt;&gt; 0)</td>
</tr>
<tr>
<td>iJMP</td>
<td>jmp</td>
</tr>
<tr>
<td>iJMPF</td>
<td>jmp</td>
</tr>
<tr>
<td>iLAHF</td>
<td>-</td>
</tr>
<tr>
<td>iLDS</td>
<td>asgn (far pointer)</td>
</tr>
<tr>
<td>iLEA</td>
<td>asgn (near pointer)</td>
</tr>
<tr>
<td>iLEAVE</td>
<td>ret</td>
</tr>
<tr>
<td>iLES</td>
<td>asgn (far pointer)</td>
</tr>
<tr>
<td>iLOCK</td>
<td>-</td>
</tr>
<tr>
<td>iLODS</td>
<td>n</td>
</tr>
<tr>
<td>iREP_LODS</td>
<td>n</td>
</tr>
<tr>
<td>iMOV</td>
<td>asgn (=)</td>
</tr>
<tr>
<td>iMOVS</td>
<td>n</td>
</tr>
<tr>
<td>iREP_MOVS</td>
<td>n</td>
</tr>
<tr>
<td>iMOD</td>
<td>asgn (%)</td>
</tr>
<tr>
<td>iMUL</td>
<td>asgn (*)</td>
</tr>
<tr>
<td>iNEG</td>
<td>asgn (-)</td>
</tr>
<tr>
<td>iNOT</td>
<td>asgn (~)</td>
</tr>
<tr>
<td>iNOP</td>
<td>-</td>
</tr>
<tr>
<td>iOR</td>
<td>asgn (|)</td>
</tr>
<tr>
<td>iOUT</td>
<td>-</td>
</tr>
<tr>
<td>iOUTS</td>
<td>-</td>
</tr>
<tr>
<td>iREP_OUTS</td>
<td>-</td>
</tr>
<tr>
<td>iPOP</td>
<td>pop</td>
</tr>
<tr>
<td>iPOPA</td>
<td>-</td>
</tr>
<tr>
<td>iPOPF</td>
<td>-</td>
</tr>
<tr>
<td>iPUSH</td>
<td>push</td>
</tr>
<tr>
<td>iPUSHA</td>
<td>-</td>
</tr>
<tr>
<td>iPUSHF</td>
<td>-</td>
</tr>
</tbody>
</table>

Figure D-1: Icode Opcodes – Continued
<table>
<thead>
<tr>
<th>Low-level Icode</th>
<th>High-level Icode</th>
</tr>
</thead>
<tbody>
<tr>
<td>iRCL</td>
<td>*</td>
</tr>
<tr>
<td>iRCR</td>
<td>*</td>
</tr>
<tr>
<td>iREPE</td>
<td>n</td>
</tr>
<tr>
<td>iREPNE</td>
<td>n</td>
</tr>
<tr>
<td>iRET</td>
<td>ret</td>
</tr>
<tr>
<td>iRETF</td>
<td>ret</td>
</tr>
<tr>
<td>iROL</td>
<td>*</td>
</tr>
<tr>
<td>iROR</td>
<td>*</td>
</tr>
<tr>
<td>iSAHF</td>
<td>-</td>
</tr>
<tr>
<td>iSAR</td>
<td>*</td>
</tr>
<tr>
<td>iSHL</td>
<td>asgn (&lt;&lt;)</td>
</tr>
<tr>
<td>iSHR</td>
<td>asgn (&gt;&gt;)</td>
</tr>
<tr>
<td>iSBB</td>
<td>*</td>
</tr>
<tr>
<td>iSCAS</td>
<td>n</td>
</tr>
<tr>
<td>iREPNE_SCAS</td>
<td>n</td>
</tr>
<tr>
<td>iREPE_SCAS</td>
<td>n</td>
</tr>
<tr>
<td>iSIGNEX</td>
<td>asgn (=)</td>
</tr>
<tr>
<td>iSTC</td>
<td>cc</td>
</tr>
<tr>
<td>iSTD</td>
<td>cc</td>
</tr>
<tr>
<td>iSTI</td>
<td>-</td>
</tr>
<tr>
<td>iSTOS</td>
<td>n</td>
</tr>
<tr>
<td>iREP_STOS</td>
<td>n</td>
</tr>
<tr>
<td>iSUB</td>
<td>asgn (-)</td>
</tr>
<tr>
<td>iTEST</td>
<td>cc</td>
</tr>
<tr>
<td>iWAIT</td>
<td>f</td>
</tr>
<tr>
<td>iXCHG</td>
<td>asgn (uses tmp)</td>
</tr>
<tr>
<td>iXLAT</td>
<td>-</td>
</tr>
<tr>
<td>iXOR</td>
<td>asgn (^)</td>
</tr>
</tbody>
</table>

Figure D-1: Icode Opcodes – Continued
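For the conditional jumps, the mapping in Figure D-1 amounts to choosing a relational operator. The below/above jumps imply an unsigned comparison, and the less/greater jumps a signed one. The C sketch below uses a hypothetical enumeration that borrows the Icode names. It writes inequality as C's `!=` where Figure D-1 uses `<>`. It also assumes the jump follows a compare whose operands are already known.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the low-level conditional-jump Icodes. */
typedef enum { iJB, iJBE, iJAE, iJA, iJE, iJNE, iJL, iJGE, iJLE, iJG } JccIcode;

/* Relational operator of the high-level jcond for a jump following a
 * compare; *isUnsigned reports whether the comparison is unsigned. */
const char *jcondOperator(JccIcode op, bool *isUnsigned)
{
    switch (op) {
    case iJB:  *isUnsigned = true;  return "<";
    case iJBE: *isUnsigned = true;  return "<=";
    case iJAE: *isUnsigned = true;  return ">=";
    case iJA:  *isUnsigned = true;  return ">";
    case iJL:  *isUnsigned = false; return "<";
    case iJLE: *isUnsigned = false; return "<=";
    case iJGE: *isUnsigned = false; return ">=";
    case iJG:  *isUnsigned = false; return ">";
    case iJE:  *isUnsigned = false; return "==";   /* signedness irrelevant */
    case iJNE: *isUnsigned = false; return "!=";
    }
    return NULL;
}
```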
Appendix E

Comments and Error Messages displayed by dcc

dcc writes a series of comments into the output C and assembler files, containing information collected during the analysis of each subroutine. These comments appear before each subroutine. The following comments are supported by dcc:

- “Takes %d bytes of parameters.”
- “Uses register arguments:” (and lists the registers and the formal argument name).
- “Takes no parameters.”
- “Runtime support routine of the compiler.”
- “High-level language prologue code.”
- “Untranslatable routine. Assembler provided.”
- “Return value in register %s.” (register(s) provided).
- “Pascal calling convention.”
- “C calling convention.”
- “Unknown calling convention.”
- “Incomplete due to an untranslatable opcode”
- “Incomplete due to an indirect jump”
- “Indirect call procedure.”
- “Contains self-modifying code.”
- “Contains coprocessor instructions.”
- “Irreducible control flow graph.”

Assembler subroutines are also commented, as are all DOS kernel services (interrupts 20h to 2Fh). Appendix F lists all DOS interrupts supported by dcc.

dcc also displays two different types of errors: fatal and non-fatal. Fatal errors terminate the execution of dcc, displaying the error with enough information to determine what happened. Non-fatal errors do not cause dcc to terminate, and are treated as warnings to the user.

The fatal errors supported by dcc are:

- “Invalid option -%c.”
- “Usage: dcc [-a1a2mpsvV][-o asmfile] DOS_executable”
- “New EXE format not supported.”
- “Cannot open file %s.”
- “Error while reading file %s.”
- “Invalid instruction %02X at location %06lX.”
- “Don’t understand 80386 instruction %02X at location %06lX.”
- “Instruction at location %06lX goes beyond loaded image.”
- “malloc of %ld bytes failed.”
- “Failed to find a basic block for jump to %ld in subroutine %s.”
- “Basic Block is a synthetic jump.”
- “Failed to find a basic block for interval.”
- “Definition not found for condition code usage at opcode %d.”

The non-fatal errors supported by dcc are:

- “Segment override with no memory operand at location %06lX.”
- “REP prefix without a string instruction at location %06lX.”
- “Conditional jump use, definition not supported at opcode %d.”
- “Definition-use not supported. Definition opcode = %d, use opcode = %d.”
- “Failed to construct do..while() condition.”
- “Failed to construct while() condition.”

Appendix F

DOS Interrupts

The DOS kernel provides services to application programs via software interrupts 20h..2Fh. Interrupt 21h deals with character input/output, files, records, directory operations, disk, processes, memory management, network functions, and miscellaneous system functions; the function number is held in register ah. Figure F-1 lists the different interrupts provided by DOS [Dun88a]. These interrupts are commented by dcc when producing the disassembly of a subroutine.
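The commenting of DOS calls can be sketched as a table lookup keyed on the function number in `ah`. The sketch is deliberately partial: it holds a handful of entries copied from Figure F-1, and the names are hypothetical. It assumes an earlier pass has already determined the value of `ah` at the `int 21h` instruction, for example from a preceding `MOV ah,immed8`.

```c
#include <stddef.h>
#include <stdint.h>

/* A few entries from Figure F-1, keyed by the function number in ah. */
static const struct {
    uint8_t ah;
    const char *name;
} int21Names[] = {
    { 0x00, "Terminate process" },
    { 0x09, "Display string" },
    { 0x25, "Set interrupt vector" },
    { 0x30, "Get DOS version number" },
    { 0x3D, "Open file" },
    { 0x3F, "Read file or device" },
    { 0x40, "Write file or device" },
};

/* Comment text for "int 21h" given the known value of ah, or NULL if the
 * function is not in the (partial) table. */
const char *int21Comment(uint8_t ah)
{
    for (size_t i = 0; i < sizeof int21Names / sizeof int21Names[0]; i++)
        if (int21Names[i].ah == ah)
            return int21Names[i].name;
    return NULL;
}
```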
<table>
<thead>
<tr>
<th>Interrupt</th>
<th>Function</th>
<th>Function name</th>
</tr>
</thead>
<tbody>
<tr>
<td>20h</td>
<td></td>
<td>Terminate process</td>
</tr>
<tr>
<td>21h</td>
<td>0h</td>
<td>Terminate process</td>
</tr>
<tr>
<td>21h</td>
<td>1h</td>
<td>Character input with echo</td>
</tr>
<tr>
<td>21h</td>
<td>2h</td>
<td>Character output</td>
</tr>
<tr>
<td>21h</td>
<td>3h</td>
<td>Auxiliary input</td>
</tr>
<tr>
<td>21h</td>
<td>4h</td>
<td>Auxiliary output</td>
</tr>
<tr>
<td>21h</td>
<td>5h</td>
<td>Printer output</td>
</tr>
<tr>
<td>21h</td>
<td>6h</td>
<td>Direct console input/output</td>
</tr>
<tr>
<td>21h</td>
<td>7h</td>
<td>Unfiltered character input without echo</td>
</tr>
<tr>
<td>21h</td>
<td>8h</td>
<td>Character input without echo</td>
</tr>
<tr>
<td>21h</td>
<td>9h</td>
<td>Display string</td>
</tr>
<tr>
<td>21h</td>
<td>Ah</td>
<td>Buffered keyboard input</td>
</tr>
<tr>
<td>21h</td>
<td>Bh</td>
<td>Check input status</td>
</tr>
<tr>
<td>21h</td>
<td>Ch</td>
<td>Flush input buffer and then input</td>
</tr>
<tr>
<td>21h</td>
<td>Dh</td>
<td>Disk reset</td>
</tr>
<tr>
<td>21h</td>
<td>Eh</td>
<td>Select disk</td>
</tr>
<tr>
<td>21h</td>
<td>Fh</td>
<td>Open file</td>
</tr>
<tr>
<td>21h</td>
<td>10h</td>
<td>Close file</td>
</tr>
<tr>
<td>21h</td>
<td>11h</td>
<td>Find first file</td>
</tr>
<tr>
<td>21h</td>
<td>12h</td>
<td>Find next file</td>
</tr>
<tr>
<td>21h</td>
<td>13h</td>
<td>Delete file</td>
</tr>
<tr>
<td>21h</td>
<td>14h</td>
<td>Sequential read</td>
</tr>
<tr>
<td>21h</td>
<td>15h</td>
<td>Sequential write</td>
</tr>
<tr>
<td>21h</td>
<td>16h</td>
<td>Create file</td>
</tr>
<tr>
<td>21h</td>
<td>17h</td>
<td>Rename file</td>
</tr>
<tr>
<td>21h</td>
<td>18h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>19h</td>
<td>Get current disk</td>
</tr>
<tr>
<td>21h</td>
<td>1Ah</td>
<td>Set DTA address</td>
</tr>
<tr>
<td>21h</td>
<td>1Bh</td>
<td>Get default drive data</td>
</tr>
<tr>
<td>21h</td>
<td>1Ch</td>
<td>Get drive data</td>
</tr>
<tr>
<td>21h</td>
<td>1Dh</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>1Eh</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>1Fh</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>20h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>21h</td>
<td>Random read</td>
</tr>
<tr>
<td>21h</td>
<td>22h</td>
<td>Random write</td>
</tr>
<tr>
<td>21h</td>
<td>23h</td>
<td>Get file size</td>
</tr>
<tr>
<td>21h</td>
<td>24h</td>
<td>Set relative record number</td>
</tr>
<tr>
<td>21h</td>
<td>25h</td>
<td>Set interrupt vector</td>
</tr>
<tr>
<td>21h</td>
<td>26h</td>
<td>Create new PSP</td>
</tr>
<tr>
<td>21h</td>
<td>27h</td>
<td>Random block read</td>
</tr>
<tr>
<td>21h</td>
<td>28h</td>
<td>Random block write</td>
</tr>
</tbody>
</table>

Figure F-1: DOS Interrupts
<table>
<thead>
<tr>
<th>Interrupt</th>
<th>Function</th>
<th>Function name</th>
</tr>
</thead>
<tbody>
<tr>
<td>21h</td>
<td>29h</td>
<td>Parse filename</td>
</tr>
<tr>
<td>21h</td>
<td>2Ah</td>
<td>Get date</td>
</tr>
<tr>
<td>21h</td>
<td>2Bh</td>
<td>Set date</td>
</tr>
<tr>
<td>21h</td>
<td>2Ch</td>
<td>Get time</td>
</tr>
<tr>
<td>21h</td>
<td>2Dh</td>
<td>Set time</td>
</tr>
<tr>
<td>21h</td>
<td>2Eh</td>
<td>Set verify flag</td>
</tr>
<tr>
<td>21h</td>
<td>2Fh</td>
<td>Get DTA address</td>
</tr>
<tr>
<td>21h</td>
<td>30h</td>
<td>Get DOS version number</td>
</tr>
<tr>
<td>21h</td>
<td>31h</td>
<td>Terminate and stay resident</td>
</tr>
<tr>
<td>21h</td>
<td>32h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>33h</td>
<td>Get or set break flag</td>
</tr>
<tr>
<td>21h</td>
<td>34h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>35h</td>
<td>Get interrupt vector</td>
</tr>
<tr>
<td>21h</td>
<td>36h</td>
<td>Get drive allocation info</td>
</tr>
<tr>
<td>21h</td>
<td>37h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>38h</td>
<td>Get or set country info</td>
</tr>
<tr>
<td>21h</td>
<td>39h</td>
<td>Create directory</td>
</tr>
<tr>
<td>21h</td>
<td>3Ah</td>
<td>Delete directory</td>
</tr>
<tr>
<td>21h</td>
<td>3Bh</td>
<td>Set current directory</td>
</tr>
<tr>
<td>21h</td>
<td>3Ch</td>
<td>Create file</td>
</tr>
<tr>
<td>21h</td>
<td>3Dh</td>
<td>Open file</td>
</tr>
<tr>
<td>21h</td>
<td>3Eh</td>
<td>Close file</td>
</tr>
<tr>
<td>21h</td>
<td>3Fh</td>
<td>Read file or device</td>
</tr>
<tr>
<td>21h</td>
<td>40h</td>
<td>Write file or device</td>
</tr>
<tr>
<td>21h</td>
<td>41h</td>
<td>Delete file</td>
</tr>
<tr>
<td>21h</td>
<td>42h</td>
<td>Set file pointer</td>
</tr>
<tr>
<td>21h</td>
<td>43h</td>
<td>Get or set file attributes</td>
</tr>
<tr>
<td>21h</td>
<td>44h</td>
<td>IOCTL (input/output control)</td>
</tr>
<tr>
<td>21h</td>
<td>45h</td>
<td>Duplicate handle</td>
</tr>
<tr>
<td>21h</td>
<td>46h</td>
<td>Redirect handle</td>
</tr>
<tr>
<td>21h</td>
<td>47h</td>
<td>Get current directory</td>
</tr>
<tr>
<td>21h</td>
<td>48h</td>
<td>Allocate memory block</td>
</tr>
<tr>
<td>21h</td>
<td>49h</td>
<td>Release memory block</td>
</tr>
<tr>
<td>21h</td>
<td>4Ah</td>
<td>Resize memory block</td>
</tr>
<tr>
<td>21h</td>
<td>4Bh</td>
<td>Execute program (exec)</td>
</tr>
<tr>
<td>21h</td>
<td>4Ch</td>
<td>Terminate process with return code</td>
</tr>
<tr>
<td>21h</td>
<td>4Dh</td>
<td>Get return code</td>
</tr>
<tr>
<td>21h</td>
<td>4Eh</td>
<td>Find first file</td>
</tr>
<tr>
<td>21h</td>
<td>4Fh</td>
<td>Find next file</td>
</tr>
<tr>
<td>21h</td>
<td>50h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>51h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>52h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>53h</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

Figure F-1: DOS Interrupts – Continued
<table>
<thead>
<tr>
<th>Interrupt</th>
<th>Function</th>
<th>Function name</th>
</tr>
</thead>
<tbody>
<tr>
<td>21h</td>
<td>54h</td>
<td>Get verify flag</td>
</tr>
<tr>
<td>21h</td>
<td>55h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>56h</td>
<td>Rename file</td>
</tr>
<tr>
<td>21h</td>
<td>57h</td>
<td>Get or set file date and time</td>
</tr>
<tr>
<td>21h</td>
<td>58h</td>
<td>Get or set allocation strategy</td>
</tr>
<tr>
<td>21h</td>
<td>59h</td>
<td>Get extended error info</td>
</tr>
<tr>
<td>21h</td>
<td>5Ah</td>
<td>Create temporary file</td>
</tr>
<tr>
<td>21h</td>
<td>5Bh</td>
<td>Create new file</td>
</tr>
<tr>
<td>21h</td>
<td>5Ch</td>
<td>Lock or unlock file region</td>
</tr>
<tr>
<td>21h</td>
<td>5Dh</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>5Eh</td>
<td>Get machine name</td>
</tr>
<tr>
<td>21h</td>
<td>5Fh</td>
<td>Device redirection</td>
</tr>
<tr>
<td>21h</td>
<td>60h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>61h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>62h</td>
<td>Get PSP address</td>
</tr>
<tr>
<td>21h</td>
<td>63h</td>
<td>Get DBCS lead byte table</td>
</tr>
<tr>
<td>21h</td>
<td>64h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>65h</td>
<td>Get extended country info</td>
</tr>
<tr>
<td>21h</td>
<td>66h</td>
<td>Get or set code page</td>
</tr>
<tr>
<td>21h</td>
<td>67h</td>
<td>Set handle count</td>
</tr>
<tr>
<td>21h</td>
<td>68h</td>
<td>Commit file</td>
</tr>
<tr>
<td>21h</td>
<td>69h</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>6Ah</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>6Bh</td>
<td>Reserved</td>
</tr>
<tr>
<td>21h</td>
<td>6Ch</td>
<td>Extended open file</td>
</tr>
<tr>
<td>22h</td>
<td></td>
<td>Terminate handler address</td>
</tr>
<tr>
<td>23h</td>
<td></td>
<td>Ctrl-C handler address</td>
</tr>
<tr>
<td>24h</td>
<td></td>
<td>Critical-error handler address</td>
</tr>
<tr>
<td>25h</td>
<td></td>
<td>Absolute disk read</td>
</tr>
<tr>
<td>26h</td>
<td></td>
<td>Absolute disk write</td>
</tr>
<tr>
<td>27h</td>
<td></td>
<td>Terminate and stay resident</td>
</tr>
<tr>
<td>28h</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>29h</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>2Ah</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>2Bh</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>2Ch</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>2Dh</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>2Eh</td>
<td></td>
<td>Reserved</td>
</tr>
<tr>
<td>2Fh</td>
<td>1h</td>
<td>Print spooler</td>
</tr>
<tr>
<td>2Fh</td>
<td>2h</td>
<td>Assign</td>
</tr>
<tr>
<td>2Fh</td>
<td>10h</td>
<td>Share</td>
</tr>
<tr>
<td>2Fh</td>
<td>B7h</td>
<td>Append</td>
</tr>
</tbody>
</table>

Figure F-1: DOS Interrupts – Continued
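A decompiler can use a table such as Figure F-1 to annotate `int` instructions. The interrupt number is the operand of the instruction. For interrupts 21h and 2Fh, the function is selected by the value in register AH at the call. The C fragment below is a minimal sketch of such a lookup, assuming the value of AH has already been determined as a constant at the call site (for instance, by data flow analysis). The type `DosIntEntry`, the array `dosInts` and the function `dosIntName` are hypothetical names chosen for illustration. They are not taken from dcc, and only a few rows of the figure are included.

```c
#include <assert.h>
#include <stddef.h>

/* One row of Figure F-1 (hypothetical layout, for illustration only). */
typedef struct {
    int intr;          /* interrupt number, e.g. 0x21                 */
    int fn;            /* function number (AH at the call), or -1     */
    const char *name;  /* function name as listed in the figure       */
} DosIntEntry;

/* A small subset of Figure F-1; a full table would list every row. */
static const DosIntEntry dosInts[] = {
    {0x21, 0x3C, "Create file"},
    {0x21, 0x3D, "Open file"},
    {0x21, 0x3E, "Close file"},
    {0x21, 0x3F, "Read file or device"},
    {0x21, 0x40, "Write file or device"},
    {0x21, 0x4C, "Terminate process with return code"},
    {0x25, -1,   "Absolute disk read"},   /* no function number */
    {0x2F, 0x01, "Print spooler"},
};

/* Returns the name for interrupt `intr` called with AH = `ah`.
   For entries whose fn is -1 the value of AH is ignored.
   Returns NULL if the pair does not appear in the table. */
static const char *dosIntName(int intr, int ah)
{
    size_t i;

    assert(intr >= 0 && intr <= 0xFF);  /* interrupt numbers fit in a byte */
    for (i = 0; i < sizeof dosInts / sizeof dosInts[0]; i++) {
        if (dosInts[i].intr == intr &&
            (dosInts[i].fn == -1 || dosInts[i].fn == ah))
            return dosInts[i].name;
    }
    return NULL;
}
```

For example, for the sequence `mov ah, 4Ch` followed by `int 21h`, the call `dosIntName(0x21, 0x4C)` returns "Terminate process with return code", which could then be emitted as a comment next to the generated code. A linear search is adequate for a table of this size. A direct index by interrupt and function number would be an alternative if lookups were frequent.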
Bibliography


