Disassembler using High Level Pro cessor Mo dels

A T h e s i s S u b m i t t e d

in P a r t i a l Fu l l l m e n t o f th e Requirem ents

fo r t h e D e g r e e o f

M a s t e r o f Te c h n o l o g y

by

Nihal Chand Jain

t o t h e

Department of Computer Science Engineering

Indian Institute of Technology Kanpur

Jan

Certicate

Certied that the work contained in the thesis entitled D i s

a s s e m b l e r u s i n g H i g h L e v e l P r o c e s s o r M o d e l s by MrN i h a l

C h a n d J a i n has b een carried out under my sup ervision and

that this work has not b een submitted elsewhere for a degree

Dr Ra jat Mo ona

Asso ciate Professor

Department of Computer Science Engineering

Indian Institute of Technology

Kanpur

Jan ii

Abstract

The design of a high p erformance system requires an integrated environment to sim

ulate and analyze the p erformance of various design alternatives In this thesis we

have develop ed a g e n e r i c disassem b ler for an integrated environment where SimnML

acts as the sp ecication language for pro cessor p erformance mo del The SimnML

an extension of nML machine description formalism is a simple elegant and p owerful

language to mo del machine b ehavior at instruction level As part of the thesis work

we have designed an in t e r m e d i a t e re p r e s e n t a t i o n I R for pro cessor sp ecication written

in SimnML language The IR is simple and facilitates the development of various

to ols such as assembler backend generator instruction set simulator trace

generator etc based on the pro cessor sp ecication A to ol IR G e n e r a t o r is develop ed

which takes a pro cessor sp ecication written in SimnML language and pro duces it

in the intermediate representation Further a G e n e r i c S y m b o l i c Disassem b ler is de

velop ed which takes the intermediate representation of a pro cessor and a relo catable

binary le in ELF format as input and pro duces an equivalent program in assembly

language of the pro cessor The disassembler is generic enough to b e used for all typ e

of pro cessors

Acknowledgments

I am grateful to my thesis sup ervisor Dr Ra jat Mo ona who guided me at every

stage of this pro ject with his valuable suggestions whose qualities have attracted me

a lot His ideas have b een of great help in exploring the areas which otherwise would

have b een imp ossible I thank the Almighty for giving such a brotherly gure as my

guide

This work is done as a part of the ongoing research in Cadence Research Center at

I IT Kanpur I express my gratitude to Cadence India Ltd for their enduring supp ort

to this work Apart from the ample nancial supp ort provided by the fellowship it

has b een a source of p ersonal pride and motivation to b e called a Cadence Fellow

I am also greatly indebted to Dr Deepak Gupta and Dr Sanjeev Agarwal for

their guidance and supp ort throughout my work I express my heartfelt thanks to all

the faculty memb ers for teaching the principles in most exciting and enjoyable way

I am greatly indebted to my seniors sp ecially Atul and Kshitiz for helping me out

in crucial situations I would also like to thank VRa jesh Subhash and Shishir who

has help ed me with lots of ideas throughout the work I thank all my MTech class

mates esp ecially Professor Zade Bepari Kapil Atul Mano j Anna Anjali Uma

Prasad Srikar Gopi Girish Prasanna Kousik and Ma jor Ajay for b eing aectionate

and the source of inspiration for me My gratitude go es to all of my MTech batch

mates who made my stay in HallV I ITK a memorable one I acknowledge the

MTech batch for their exciting company I wish I could express my thankfulness

to all my old friends for their love supp ort and encouragement

I thank my parents my brothers for their love and aection I have b een receiving

I am grateful to all of them for their eorts in building my career Finally I would

also like to thank Go d for b eing kind to me and driving me through this journey i

Contents

Acknowledgments i

Intro duction

Motivation

Overview of Related Work

Goals Achieved

Organization of Rep ort

Intermediate Representation of Pro cessor Mo dels

SimnML Language

SimnML Grammar

Design of an Intermediate Representation

Simplication of Information by Substitution

Simplifying the Hierarchy

Representation of Attribute Denition

Structure of the Intermediate Representation

Conversion from High Level to Intermediate Representation

Pass Macro Prepro cessor

Pass Parsing and Flattening the Hierarchy

Design and Implementation of Disassembler ii

Input Binary File ELF Structure

Two Pass Design of Disassembler

Resolving References

Extracting Information from Intermediate Representation

Extracting Syntax and Image of instructions

Instruction Matching Algorithm

Extracting Control Transfer Instruction

Evaluation of Next Instruction Address

Implementation Details of the Disassembler

Initialization Phase

First Pass of Disassembly

Second Pass of Disassembly

Disassembly of Other Sections

Results and Conclusion

Results

Example

Example

Conclusion

Future Work and Extensions

A Grammar of SimnML Language

B File Format of Intermediate Representation

B Meta Table

B Constant Table

B Resource Table

B Identier Table

B Attribute Table iii

B Memory Table

B AndRule Table

B OrRule Table

B Syntax Table

B Image Table

B String Table

B Integer Table

B PrexAttributeDenition Table

C Users Manual

C IRGenerator

C Usage

C Disassembler

C Usage iv

List of Tables

Test Results

Enco ding of data typ es

Parameter Typ e for a n d r u l e

Example of the String Table

Interpretation of the String Table

Interpretation of the tuple used in Prex Notation

Op erators Used in Prex Attribute Denition v

List of Figures

System Overview

System Overview with IR

SimnML Sp ecication for a Simple Pro cessor

Algorithm for Flattening of o r r u l e s

SimnML Program for a Hyp othetical Pro cessor

Example of o r r u l e s Flattening

Example of a n d r u l e Flattening

Algorithm for Flattening of S y n t a x I m a g e Attribute Denitions

Example of S y n t a x Attribute Denitions Flattening

Algorithm for Calculating Mask Values

Algorithm for Calculating More Mask Values

Algorithm to Find Data Enco ding of the Host Pro cessor

Example of the Conguration File vi

Chapter

Intro duction

The design of a high p erformance system requires complex software to ols Designers

use p owerful and generic mo deling to ols to evaluate many alternative implementa

tions In addition designers need hardware and software co design and other tradeos

at early stages of the system design to keep the development cost down Therefore

system designers need an integrated environment which allows them to simulate and

analyze the p erformance of various design alternatives

In this thesis we have used S i m n M L language which is primarily an extension

of the n M L language for pro cessor mo deling and designed a g e n e r i c pr o c e s s o r in d e

p e n d e n t sy m b o l i c disassem b ler For this purp ose we have also designed an in t e r m e d i a t e

re p r e s e n t a t i o n I R for the pro cessor sp ecication written in the SimnML language

The IR is simple but p owerful enough to facilitate the development of various to ols

such as assembler compiler backend generator instruction simulator etc based on

the pro cessor sp ecication We have designed a to ol IR g e n e r a t o r which takes a

pro cessor sp ecication in the SimnML language and provides the intermediate rep

resentation of the pro cessor sp ecication as output The g e n e r i c s y m b o l i c disassem b ler

takes the intermediate representation and a relo catable binary le of a pro cessor and

provides the corresp onding program as output

Motivation

A pro cessor mo del provides means to facilitate hardware and software co design and

coanalysis early in the system design pro cess To mo del the candidate application and

pro cessor mo del interaction a systematic design pro cess is required A systematic

design pro cess starts with selecting the application and involves writing a mo del that

measures the p erformance of the system testing the system analyzing the results

and rening the mo del to enhance p erformance In this pro cess the mo del undergo es

several changes till the desired p erformance is achieved This approach requires to

have an environment where changes to the design are made at one place and the

corresp onding changes in other to ols are automated Such an integrated environment

can incorp orate the mo del changes and validation rapidly

Processor

Specification in

Sim-nML

INSTRUCTION COMPILER TRACE ASSEMBLER SET DIS-ASSEMBLER GENERATOR

SIMULATOR BACK-END

Figure System Overview

In this thesis we are discussing the design of an integrated environment where

S i m n M L language acts as the sp ecication language for pro cessors in a generic

way The pro cessor sp ecication written in SimnML language can b e used to gen

erate various pro cessor sp ecic to ols such as compiler backend generator assembler

disassembler trace generator instruction set simulator IS S etc The ISS can simu

late the execution of a binary program using the pro cessor sp ecication Therefore

the ISS helps in p erformance mo deling A compiler backend generator integrated

with a compiler frontend can b e used to generate a complete compiler for the same Sim-nML file with macros

MACRO PRE - PROCESSOR

Sim-nML file without macros

IR = Intermediate Representation

IR- Generator

IR IR IR IR IR

INSTRUCTION COMPILER TRACE

ASSEMBLER SET DIS-ASSEMBLER BACK-END GENERATOR

SIMULATOR

Figure System Overview with IR

pro cessor Thus an integrated environment as shown in gure is designed with

SimnML as the sp ecication language to automate the generation of various pro ces

sor sp ecic to ols While it is convenient to describ e pro cessor mo dels in SimnML it

is not so convenient for the to ols to use SimnML sp ecication directly as input due

to hierarchical description formalism of SimnML Further the direct usage requires

all to ols to duplicate the eort in pro cessing of pro cessor sp ecications This mo

tivated us to design a simple in t e r m e d i a t e re p r e s e n t a t i o n I R for SimnML language

sp ecication so that the design of various to ols is simplied The system design can

b e viewed as shown in gure with integration of the IR and previous view

In emb edded applications it is helpful to study the algorithm used by the applica

tion program and then change it to suit the sp ecic needs However it is convenient

to op erate at assembly language level rather than at the binary executable level This

motivated us to design a sy m b o l i c g e n e r i c disassem b ler to translate a relo catable bi

nary co de to its assembly language counterpart In the symb olic disassembly sy m b o l s

are used to refer to the lo cations and functions rather than the absolute addresses

in the assembly language program The disassembler develop ed in this thesis uses

the pro cessor mo dels sp ecied in the IR t h u s it is g e n e r i c The assembly language

program helps in extracting a lot of hidden information and further improvements or

analysis can b e done

Overview of Related Work

Performance mo deling of a system is a growing area and a lot of research has b een

pursued in this area These previous works have resulted in a set of p erformance

mo deling to ols using dierent languages for pro cessor sp ecication

VHDL is an expressive language with full hierarchy and congurations that allow

development and application of highly congurable and exible mo dels There are

wealth of VHDLbased mo deling to ols as describ ed in various works

SLED a Sp ecication Language for Enco ding and Deco ding is used for ab

stract binary and assemblylanguage representation of machine instructions SLED

is suitable for describing b oth CISC and RISC machines Pro cessor representation

for MIPS SPARC Alpha Pentium PowerPC and Motorola are also written in

SLED and a to olkit the New Jersey MachineCo de NJMC To olkit is implemented

to help programmers write applications that pro cess machine co deassemblers dis

assemblers co de generators tracers prolers and A Disassembler for

SPARC is also implemented in the NJMC using SLED as pro cessor sp ecication

language

Visualization based Microarchitecture Workb ench VMW is an infrastructure

which facilitates the sp ecication of instruction set architecture and microarchitecture

of a machine in concise manner VMW provides all necessary infrastructure software

to the designer including generic simulation software visualization supp ort software

and graphical user interface software VMW automatically integrates the machine

sp ecication and infrastructure software to generate a customized p erformance simu

lator based on the tracedriven simulation approach Thus VMW provides a p owerful

environment for mo dern sup erscalar pro cessor design

Trimaran System is an integrated compilation and p erformance monitoring in

frastructure for Instruction Level Parallel ILP architectures The ILP architecture

HPLPD parameterized by a machine description allows the user to exp eriment

with dierent machines The HPLPD architecture supp orts novel features such as

predication control and data sp eculation and compiler controlled management of the

memory hierarchy A cyclelevel HPLPD simulator provides a detailed simulation

environment to get various information The information is used for proledriven

optimization and for validation of new optimization The machine description is sp ec

ied in a high level textual language HMDES A compiler frontend IMPACT

and a compiler backend Elcor parameterized by the machine description together

provides exp erimentation for new ILP architectures and the compiler mo dules needed

to generate highp erformance co de for these architectures The mo dular Elcor uses

an intermediate representation throughout its mo dule which enable the construction

and insertion of new compilation mo dules into the compiler in a easy way

Other than these complete machine simulation environments many p erformance

mo dels exist for analyzing the individual comp onents such as pro cessors caches etc

A Framework for Statistical Mo deling of Sup erscalar Pro cessor Performance is dis

cussed in Performance Estimation for RealTime Distributed Emb edded Systems

is discussed in An IS S Instruction Set Simulator is develop ed to simulate an

architecture of a pro cessor which is dened through templates Further a p erfor

mance simulator is implemented using traces from the IS S as input which has b een

used to evaluate the UltraSPARCcompatible architecture A cycle accurate mo del

of UltraSPARC pro cessor is written in C to verify the pro cessor by cross checking

the RTL mo del at run time as well as to provide accurate p erformance estimates

In the area of disassembling several disassemblers have b een implemented as

listed in Among these IDA Pro is a disassembler based on FLIRT Fast

Library Identication and Recognition Technology and can disassemble binary les

for several pro cessors

Goals Achieved

In this thesis work we aimed at developing an integrated environment for pro cessor

mo deling using S i m n M L language for pro cessor sp ecication The development of

a complete integrated environment is in progress where other to ols ie simulator

trace generator are under development The goals achieved in this thesis work are

listed b elow

In t e r m e d i a t e Represen tation I R for SimnML language sp ecication is designed

which is simple but p owerful enough to facilitate the design of various pro cessor

sp ecic to ols

IR G e n e r a t o r is designed and implemented which takes a pro cessor sp ecica

tion in SimnML language and provides an intermediate representation of the

pro cessor sp ecication as output

S y m b o l i c G e n e r i c Disassem b ler is designed and implemented which takes the

intermediate representation and a relo catable binary le of a pro cessor and

provides corresp onding assembly language program of the pro cessor as output

Work done in this thesis is also outlined in the gure

Organization of Rep ort

The rest of the thesis is organized as follows In c h a p t e r we describ e the design

of the intermediate representation and the implementation of the IRGenerator after

giving an overview of the SimnML language In c h a p t e r we describ e the design and

implementation of the symb olic generic disassembler Finally we conclude in c h a p t e r

and provide the results We also enumerate p ossible future work in this area Context

free grammar of the SimnML language is listed in A p p e n d i x A Detailed format of

the intermediate representation is given in A p p e n d i x B Lastly users manuals for the

IRGenerator and the disassembler are given in A p p e n d i x C

Chapter

Intermediate Representation of

Pro cessor Mo dels

One part of this thesis involves the development of In t e r m e d i a t e Represen tation I R

of the pro cessor mo del We develop ed a to ol IR G e n e r a t o r which takes a pro ces

sor sp ecication written in SimnML language as input and pro duces corresp onding

intermediate representation of pro cessor sp ecication as output In order to have

intermediate representation usable by all frontend to ols such as disassembler assem

bler simulator etc certain goals were setup b ehind the design of the IR as listed

b elow

The IR should b e as simple as p ossible

The IR should not lose any useful information which is available in original

input of SimnML sp ecication

The IR should not have any unnecessary or redundant information

The IR should b e easy to understand as well as to use

It should b e easy and ecient to retrieve the required information from the IR

The IR should b e exible and extensible

The IR should facilitate the design of various pro cessor sp ecic to ols such as as

sembler disassembler simulator trace generator compiler backend generator

etc

Before discussing the IR in detail it is necessary to understand the structure of the

input S i m n M L work by VRa jesh is primarily an extension of nML designed

by Markus Freerick Here we will discuss SimnML in brief for b etter understanding

of our work More information ab out SimnML can b e found in relevant literature

SimnML Language

S i m n M L is an extensible formalism targeted for describing arbitrary single pro

cessor It facilitates the description at instruction set level and

hides the implementation details In SimnML the instruction set is enumerated by

1

an a t t r i b u t e g r a m m a r The semantic action of an instruction is comp osed of frag

ments that are distributed over the whole sp ecication tree ie the common b ehavior

of a class of instructions is captured at the top level of the tree and the sp ecialized

b ehavior of subclasses is captured at the subsequent lower levels

SimnML Grammar

SimnML grammar has a xed start symb ol namely in s t r u c t i o n and two kind of

pro ductions namely o r r u l e which lo oks like

op n n n n

and a n d r u l e which lo oks like

op n p t p t

a e

a e

1

An attribute grammar is a context free grammar in which for each nonterminal a xed set

of attributes and for each pro duction a set of semantic rule is given In SimnML grammar all

nonterminals have to have derivations So we dont dierentiate b etween pro ductions and non

terminals

let REGS

type long card

type index cardREGS

reg RREGSlong

reg PClong

reg AClong

mode REGiindexRi

syntax formatRdi

image formatbi

op instructionxinstracti on

action PC PC

xaction

syntax xsyntax

image ximage

op instraction move store

op movesrcREG

action AC src

syntax formatload ssrcsyntax

image format bsrcimage

op storesrcREG

action src AC

syntax formatstore ssrcsyntax

image format b srcimage

Figure SimnML Sp ecication for a Simple Pro cessor

n n n n are nonterminals and each ti is a token Each ai is an attribute

name and ei is its denition The pi are names of the parameters used in the attribute

denitions

SimnML grammar predenes some attributes namely s y n t a x im a g e a c t i o n

u s e s v o l a t i l e a l i a s and in i t The sy n t a x attribute describ es the textual syntax of

the instruction The im a g e attribute describ es the binary co ding of the instruction

The a c t i o n attribute describ es the semantics of an instruction The u s e s attribute is

used to describ e the resource usage mo del and the control ow of an instruction The

v o l a t i l e a l i a s and in i t attributes are valid for memory variables The in i t attribute

is used to assign initial values to memory variables while v o l a t i l e attribute is used to

dene the volatile name of the memory

The SimnML description in gure is that of a simple machine with two instruc

tions the load instruction which is used to load accumulator AC with the contents

of a register sp ecied by an argument and the store instruction which is used to

store the value of the accumulator AC to the register sp ecied by an argument The

register PC has sp ecial semantics and p oints to the nexttob eexecuted instruction

In the most pro cessors addressing mo des and instructions are orthogonal to each

other Therefore describing an instruction with each of the p ossible addressing mo des

explo de the size of the description Therefore SimnML separates addressing mo de

description as register addressing mo de is describ ed in gure with declaration of

m o d e r u l e REG

The SimnML also supp orts resource and exception declaration which are useful

for resource usage mo del In addition SimnML supp orts macros and declarations

for typ es and constants This enhances the clarity of the description In app endix A

SimnML grammar is given in detail

The SimnML formalism helps in describing the pro cessor concisely and precisely

Thus SimnML description of a pro cessor can b e used as input to various to ols such

as assembler and disassembler generators compiler backend generators and general

purp ose instruction set simulators

Design of an Intermediate Representation

A pro cessor sp ecication in SimnML language is a human readable text le Several

constructs are provided in SimnML to enhance the clarity and readability of the

description In order to retrieve the desired information from such a description a

to ol needs to p erform parsing of input variable substitution etc An intermediate

representation helps in reducing such extra burden on the to ol Thus we need an

intermediate representation keeping previously mentioned goals in mind In this

section we will discuss the design of the IR in detail

Simplication of Information by Substitution

SimnML language allows the constant denition using letspecification ielet

REGS In SimnML sp ecication le wherever a constant is referenced its value

is substituted in the IR For example value of the constant REGS ie is substi

tuted whereever REGS is used in the example given in gure Thus constants are

not referenced in the IR of the pro cessor sp ecication Therefore all such constant

declarations can b e eliminated from the IR However some constant might b e used

by the to ols ie constant like byte order may b e used by to ols to dene the byte

ordering of a pro cessor As it is dicult to guess what all constant denitions might

b e used by all such to ols it was decided to retain information ab out all constant

declarations in the IR even if these are not referenced anywhere

SimnML language has some basic data typ es and allows new data typ e denitions

using basic data typ es and previously dened user data typ es Since all user dened

data typ es can b e built using only basic data typ es all variables are redened with

only basic data typ es in the IR Thus all user dened data typ e declarations are of

no use and are eliminated from the IR For example parameter i in m o d e r u l e REG

is redened with data typ e card Now typ e denition index is eliminated from

the IR

SimnML allows macro declarations macro name and macro denition in the

pro cessor sp ecication to save users eort in writing it These macro declaration

may have parameters and may use macros within the macro denition Wherever a

macro name is used in the sp ecication corresp onding macro denition is substituted

in the IR Thus all macro declarations are eliminated from the IR

There are some other constructs in the SimnML which are simplied in the IR

For example in the SimnML all memory variables o p r u l e s attribute names pa

rameter names in a n d r u l e etc are given unique identier names and everywhere

corresp onding identier name is used for reference As length of an identier name

is variable it wastes a lot of pro cessing time to retrieve the information ab out a par

ticular identier SimnML also allows the use of some identier name for o p r u l e s

even b efore they are dened This necessarily requires a to ol to do multiples passes

over pro cessor sp ecication Many of these identiers are not signicant at all for

example parameter names In the IR all signicant identiers are assigned a unique

integer key and all their references are replaced by the use of the corresp onding key

It simplies the information retrieval from the IR The mapping b etween the key and

the identier is also provided in the IR though the to ols may never need to refer to

these

Simplifying the Hierarchy

In SimnML information ab out an instruction is comp osed of fragments that are

distributed over the whole sp ecication tree with ro ot no de named as in s t r u c t i o n To

get information ab out one particular instruction a complete path from ro ot no de to

a leaf no de is traversed with prop er parameter substitution at all levels of the tree

If all such paths are traversed then information ab out all p ossible instructions are

obtained This pro cess is called attening of the tree In the IR information ab out

the instructions are attened using two dierent algorithms

First algorithm p erforms attening of all o r r u l e s and is describ ed in gure

Basically all references of any o r r u l e are eliminated from all the o r r u l e and a n d

ru l e denitions Therefore all o r r u l e denitions can b e eliminated from the IR

But some o r r u l e denitions might b e used by other to ols For example if ro ot no de

in s t r u c t i o n itself is an o r r u l e then information ab out all its children will b e useful for

the to ols Therefore all o r r u l e s resultant from the algorithm are stored collectively

at one place in the IR even if these o r r u l e s will not b e referenced anywhere

Elimination of o r r u l e parameters from an a n d r u l e denition results in generation

of new a n d r u l e s All attributes of the a n d r u l e remain unchanged in the new a n d

ru l e s To make the IR compact these new a n d r u l e s are treated as su b r u l e s of the

original a n d r u l e All su b r u l e s of an a n d r u l e are stored along with the a n d r u l e in the

IR The references for the attributes in the a n d r u l e are not duplicated for su b r u l e s

Working of the algorithm can b e understo o d with an example of SimnML program

given in gure Figure explains the working of the algorithm on o r r u l e s Figure

shows the working of the algorithm on a particular a n d r u l e

Representation of Attribute Denition

In SimnML language sp ecication memory variables m o d e r u l e s and o p r u l e s decla

rations dene attribute names and their denitions The attribute denition is either

an expression consisting of various op erands and op erator or a sequence of statements

Algorithm

For each o r r u l e R do the following steps

i

For all child no des of R do the following step

i

If the child no de is an o r r u l e C

i

then replace the child no de by all children no de of C

i

For each a n d r u l e A do the following steps

i

For each parameter P of A do the following step recursively

i i

If P is an o r r u l e say R where R has nchildren namely C C C

i 1 2 n

then create nnew s u b r u l e s and asso ciate them with the a n d r u l e A In

i

th

the i s u b r u l e the parameter P is declared of typ e C

i i

Figure Algorithm for Flattening of o r r u l e s

separated by a semicolon Each of these statements might b e a simple assignment

statement or a conditional statement or a function call or a use of an attribute from

some o p r u l e Refer to app endix A for SimnML grammar

For sy n t a x and im a g e attributes denition is always an expression which evalu

ates to a string In the IR a record is stored for each s y n t a x and im a g e attribute

denition The record includes a string value corresp onding to the expression The

string values are evaluated by a l g o r i t h m given in gure Basically the algorithm

p erforms substitution of parameter values in the expression to evaluate the string

value However the expressions also have references to parameters which can only b e

known at the run time of a to ol For example sy n t a x attribute denition of m o d e r u l e

REG has reference d for parameter i In the IR a tuple fXYZg is used after a

parameter reference such as d Each tuple represents a parameter which can b e

converted using parameter reference In the tuple X denotes an a n d r u l e Y denotes

a su b r u l e and Z denotes a parameter numb er Example in gure provides the IR

translation for the sy n t a x attribute denitions of a few a n d r u l e s in the example given

in gure

The record holds another string called do tex p re ssio n as shown in gure The

dote x p ressio n denotes the sequence of parameter substitution applied for calculating

s y n t a x and im a g e attribute values Each do tex p ression contains a numb er of

tuples each of typ e XY All tuples except for the rst one are put in parentheses In

type indexcard

reg PCcard

mode SHORT MEM REG

mode MEMiindexMRi

syntax formatRdi

image formatbi

mode REGiindexRi

syntax formatRdi

image formatbi

op instructionxinstracti on

syntax xsyntax

image ximage

op instraction aluop moveop

op aluopsrcSHORTdstSHO RTa aal uac tion

syntax formats ssaasyntaxsrcsynta xds tsy ntax

image formatb b baaimagesrcimage dst imag e

op aluaction aadd asub

op aadd

syntax add

image

op asub

syntax sub

image

op moveop move store

op movesrcSHORTdstSHORT

syntax formatmove sssrcsyntaxdstsy ntax

image format b bdstimagesrcimage

op storesrcSHORTdstSHOR T

syntax formatmove sssrcsyntaxdstsy ntax

image format b bsrcimagedstimage

Figure SimnML Program for a Hyp othetical Pro cessor

Before application of algorithm

op instraction aluop moveop

op aluaction aadd asub

op moveop move store

mode SHORT REG MEM

After application of algorithm

op instraction aluop move store

op aluaction aadd asub

op moveop move store

mode SHORT REG MEM

Figure Example of o r r u l e s Flattening

Before application of algorithm

op aluopsrcSHORT dstSHORT aaaluaction

After application of algorithm

op aluop

subrule src REG dst REG aa aadd

subrule src REG dst REG aa asub

subrule src REG dst MEM aa aadd

subrule src REG dst MEM aa asub

subrule src MEM dst REG aa aadd

subrule src MEM dst REG aa asub

subrule src MEM dst MEM aa aadd

subrule src MEM dst MEM aa asub

Figure Example of a n d r u l e Flattening

the rst tuple XY X denotes an a n d r u l e and Y denotes the corresp onding s u b r u l e

Rest of the tuples denote the parameters for the string A tuple dote x p ressio n

corresp onding to a parameter is the do tex p ressio n asso ciated with the corresp onding

a n d r u l e and su b r u l e

In the SimnML instructions are describ ed in a hierarchical manner The s y n t a x

and im a g e attribute records asso ciated with all the no des ie o p r u l e and m o d e r u l e

Algorithm

For each a n d r u l e rep eat following steps

2

Chose an a n d r u l e A

i

Take the sy n t a x i m a g e attribute denitions D of the a n d r u l e A

i

For each su b r u l e S of A rep eat following steps

i 1

If attribute denition D takes no parameter then D is the resultant string

value of s y n t a x i m a g e attribute for S

i

If D has reference to a parameter P of basic data typ e then insert a tuple

i

A S P in D

i i i

If D has reference to a parameter P of typ e a n d r u l e say A where numb er

i

3

of s y n t a x i m a g e attribute values asso ciated with A are n then create n new

s y n t a x i m a g e attribute values by substituting each sy n t a x i m a g e attribute

value in place of parameter reference P one by one These nnew attribute

i

values are asso ciated with the su b r u l e S and so with the a n d r u l e A

i i

Figure Algorithm for Flattening of S y n t a x I m a g e Attribute Denitions

in the sp ecication tree are evaluated The sy n t a x and im a g e records of instructions

in the instruction set are given by the sy n t a x and im a g e attribute records of the

o p r u l e named instruction Rest of the records hold enco ding of partial s y n t a x and

im a g e attribute strings In the IR the sy n t a x and im a g e attribute records for all

the a n d r u l e s are stored Although to ols such as assembler disassembler compiler

simulator etc need only the attribute records of o p r u l e instruction other records

might b e helpful for other purp oses such as to build the sp ecication tree back

Other attributes in the SimnML are used to hold semantic action asso ciated with

the instruction For example to simulate the b ehavior of an instruction attribute

denition of a c t i o n attribute is used A to ol such as the instruction set simulator

could b e made to run faster if such attribute denitions are represented dierently

Usually expressions inside an attribute denition are written in an inx notation using

priority and asso ciativity rules to deco de an expression uniquely However prex or

2

Andrules are chosen by starting with all leaf no des of sp ecication tree then all no des ab ove

the leaf no des and so on

3

Syntax image attribute values asso ciated with all subrules of an andrule are called syn

tax image attribute values of the andrule

Before application of algorithm

For and rule mode REG

subrule iindex

syntax formatRdi

For and rule mode MEM

subrule iindex

syntax formatRdi

For and rule op aadd

subrule no parameter

syntax add

For and rule op asub

subrule no parameter

syntax sub

For and rule op aluop

see figure for subrules

syntax formats s saasyntaxsrcsynta xds tsy ntax

After application of algorithm

For and rule mode REG

subrule syntaxstring dotexpr

Rd

For and rule mode MEM

Rd

For and rule op aadd

add

For and rule op asub

sub

For and rule op aluop

add Rd Rd

sub Rd Rd

add Rd Rd

sub Rd Rd

add Rd Rd

sub Rd Rd

add Rd Rd

sub Rd Rd

Figure Example of S y n t a x Attribute Denitions Flattening

p ostx notation is b etter for faster evaluation as the priority and asso ciativity b ecomes

implicit

In the IR prex notation is used for all attribute denitions except s y n t a x and

im a g e attributes Using such a representation to ols like simulator trace generator

compiler backend generator etc can b e made to run fast

Structure of the Intermediate Representation

As it is evident the structure of the IR should b e capable of storing information ab out

constants identiers o r r u l e s a n d r u l e s and information ab out attributes such as

s y n t a x im a g e a c t i o n etc Some of this information can b e represented in a xed size

data structure whereas rest of the information requires variable size data structure

For faster retrieval of information we separate out the variable size data structure

and store it at one place

The IR structure is essentially a collection of various tables Information of each

typ e is stored in a dierent table The entries in most of these tables are xed

size records However some tables hold variable size records We have group ed

the similar typ e of information under same table by creating dierent record Also

at some places we created two dierent tables for clarity although they b oth hold

information in similar typ e of record A table of contents is also added in the IR

which contains the lo cation and name of all the tables This simplies the access

mechanism for all tables In brief the IR consists of following tables

M e t a ta b l e This is a table of contents having a road map to know ab out the

lo cation and name of other tables in the IR

C o n s t a n t ta b l e This table holds the all constant declarations in the SimnML

pro cessor sp ecications For the example given in gure this table will contain

the following

name type value

REGS integer

R e s o u r c e ta b l e This table holds the names of the resources which are declared

with re so u rced eclara tion s Each resource is assigned a unique key by which it

is referred to at other places

A t t r i b u t e t a b l e This table holds the name and the corresp onding key of all

distinct attributes used in the input pro cessor sp ecication For the example

given in gure this table will contain the following

key name

syntax

image

Id e n t i e r t a b l e This table holds the name of all the identiers other than

those sp ecied in the c o n s t a n t t a b l e and in the re s o u r c e t a b l e Each identier

is assigned a unique key to refer to the identier at other places For the earlier

example the following is the contents of the identier table

key name type

PC regvar

MEM modeand

REG modeand

SHORT modeor

instruction opand

instraction opor

aluop opand

moveop opor

aluaction opor

aadd opand

asub opand

move opand

store opand

M e m o r y t a b l e This table holds the information ab out all memory variables

declared with a re g or a m e m declaration It includes a unique key typ e and

size of the data and information to lo cate various attributes of the variable

stored in other tables For the earlier example the following is the contents of

the identier table

key Namekey type size attribute

card

Note that instead of storing the name of memory variable ie PC the key

assigned in the id e n t i e r t a b l e is used

O r R u l e ta b l e This table holds the information ab out children of all o r r u l e s

m o d e r u l e s or o r r u l e s It holds records as sp ecied earlier in gure

A n d R u l e t a b l e This table holds the information ab out all a n d r u l e s m o d e

ru l e s and o p r u l e s along with the su b r u l e s asso ciated with them It also holds

the information to lo cate the attribute denitions stored in other tables It

holds records as sp ecied earlier in gure

S y n t a x t a b l e This table holds the syntaxrecord asso ciated with the s y n t a x

attribute denitions of all a n d r u l e s It also holds the information to asso ciate

the corresp ondence b etween the a n d r u l e ta b l e and the sy n t a x t a b l e as sp ecied

earlier in gure

Im a g e t a b l e This table holds the imagerecord asso ciated with the image

attribute denitions of all a n d r u l e s It also holds the information to asso ciate

the corresp ondence b etween the a n d r u l e ta b l e and the im a g e t a b l e It holds

records similar to the sy n t a x t a b l e

S t r i n g ta b l e This table is used for storing variable length string null termi

nated such as identier names This table helps in having xed size entries

in other tables For clarity we used identiernames and strings in example of

tables describ ed earlier In reality all such strings are stored in the s t r i n g t a b l e

and corresp onding index into the string table is stored in other tables

In t e g e r t a b l e This table is used for storing only integer values These inte

gers are asso ciated with other tables and represent dierent meanings in dier

ent contexts This table helps in having xed size entries in other table For

example list of attributes present for an a n d r u l e are stored as list of corre

sp onding attributekeys in the in t e g e r t a b l e The a n d r u l e holds the infor

mation which asso ciates the list of integers stored in the in t e g e r t a b l e as list of

attributekeys

Pre xA ttrib u teD e n ition Ta b l e This table holds the attribute denition of all

the attributes except s y n t a x and im a g e a t t r i b u t e s asso ciated with memory

variables and a n d r u l e s These denitions are stored in prex notation Other

tables store the information to lo cate the appropriate attribute denition cor

rectly

In App endix B we present the structure of each of the tables The following two

p oints are imp ortant

A crucial decision ab out the IR is whether it should b e a human readable text

le or a binary le We decided to have a binary le as output to enable fast

pro cessing by various to ols

The data enco ding of output le is dep endent on the pro cessor on which it

is created ie data enco ding can b e little endian or big endian dep ending on

the pro cessor A to ol can gure out the endianness of the IR by reading the

table of contents irresp ective of the typ e of the machine on which the to ols is

running For example the records of a m e t a t a b l e contain three elds n o o f r e c

si z e o f r e c and si z e o f t a b l e These elds in the rst record represent the m e t a

t a b l e entries itself Therefore the n o o f r e c contains the total numb er of tables

si z e o f r e c contains the size of each record in the meta table and si z e o f t a b l e

contains the total size of the meta table A to ol can read three values and check

if the following equation is satised

n o o f r e c si z e o f r e c si z e o f t a b l e

If this equation is not satised then the endianness of the IR and the machine

on which the to ol is running are not the same otherwise they are the same

Conversion from High Level to Intermediate

Representation

The conversion from SimnML to the IR is done in the following two passes

Pass Macro Prepro cessor

The IR do es not retain any macro denition from the source For ease of implementa

tion macro pro cessing is implemented as a separate pass over SimnML sp ecication

le This part is b eing done in another pro ject by Y Subhash Chandra but we are

also describing it here for the sake of continuity The macro prepro cessor takes the

SimnML le with macro denitions as input and pro duces a SimnML le without

macros It gathers all macro denitions and converts them into equivalent m

macro denitions Then m a standard utility available on Unix is run on this le

to get the SimnML le without any macros

Pass Parsing and Flattening the Hierarchy

Pass two takes a SimnML sp ecication le for a pro cessor as input and pro duces the

sp ecication in the IR for that pro cessor This pass pro ceeds in three phases

The rst phase involves the parsing of input le During the parsing all relevant

information is gathered in appropriate data structures Attribute denitions

for all attributes except sy n t a x and im a g e attributes are converted into prex

notations during the parsing time As so on as a denition is complete it is

stored in the prexattributedenition table In this pass three temp orary

les are used to store the stringtable the integertable and the prexattribute

denition table resp ectively Each of these table are later merged into the IR

In the second phase rst half of the tree attening is p erformed It eliminates

references of all o r r u l e s

In the third phase second half of the tree attening is p erformed All a n d r u l e s

are attened further and s y n t a x and im a g e attributes denition records are

created with prop er parameter substitutions as describ ed earlier

At the end of the second pass all tables are written in the output le and all cor

resp onding data structures are freed Temp orary les generated during this pass are

concatenated at prop er places in the output le During this pass all p ossible errors

at various places are also checked and appropriate error messages are generated In

case of an error in the rst phase the second and the third phases are not p erformed

Chapter

Design and Implementation of

Disassembler

A disassembler is a to ol which takes a binary le relo catable ob ject le executable

le etc as input and gives the corresp onding assembly language program as output

We have designed and implemented a g e n e r i c s y m b o l i c disassem b ler referred to as a

disassembler now onwards which takes an ELF binary le for a pro cessor and

generates the assembly language program The disassembler is generic and pro cessor

indep endent It takes a pro cessor sp ecication in the IR as another input The

disassembler generates symb ols to refer to the lo cations and functions rather than

the absolute addresses in the output assembly language program Thus the output

le resembles the original source from which the binary le was pro duced In the

output le the format of assembler directives is the ATT format and that of

the assembly language instructions is the one sp ecied in the pro cessor sp ecication

The pro cess of disassembly involves reading a binary instruction searching in the

instruction set and generating assembly language instruction The input binary le

contains almost all w e l l m o s t o f the necessary information of the original source

le Unfortunately the pro cess of disassembly is nontrivial as the binary le is

not designed to undergo disassembly Assemblers throw away a lot of information

present in the original source which is irrelevant to the execution of the program The

greatest problem in disassembling is to identify and distinguish co de instructions

and data as b oth are represented as sequence of bytes Furthermore designing a

generic disassembler involves extra eort b ecause information ab out instruction set

of a pro cessor is co ded in the pro cessor sp ecication le Instruction set of the

pro cessor must b e extracted in a format so that an instruction read from the binary

le can b e identied easily In addition information ab out numb er of instructions in

the instruction set length of an instruction parameters in an instruction etc varies

from pro cessor to pro cessor Various dierent pro cessors evaluate the target address

for jump instructions using bits available in the instruction in dierent ways which

aects the design of a disassembler

Lastly the complexity of the symb olic disassembler is high b ecause it uses symb ols

to refer to the lo cations While programming users normally use symb ols n a m e s to

refer to variables and functions The usually retain the names of functions

and global variables sometimes in the compiled binary les However symb ols

corresp onding to lo cal variables or lo cations are not retained Thus disassembler has

to generate new names if not available in the binary le

In this chapter we shall describ e the algorithm used by the disassembler for the

disassembly Basically the approach adopted is to p oint out what information is

available and how it contributes in the generation of the nal output

Input Binary File ELF Structure

Let us b egin by examining the structure of the binary le in ELF format w h i c h is

a n in p u t to t h e disassem b ler as taken from the manual

A le in ELF format always b egins with a header ca l l e d th e ELF h e a d e r which

is in a machine indep endent format so that it can b e read on any pro cessor This

header contains information which helps in interpreting the contents of the rest of

the le Thus the ELF header is the master key to the rest of the information in the

le A binary ob ject le contains information group ed together in logical units called

s e c t i o n s There are numerous sections in the ob ject le each dedicated to holding a

particular kind of information p r o g r a m d a t a co d e e t c Each section has a se c t i o n

h e a d e r which holds the necessary information to interpret the section The section

headers are collected and placed in a table called the se c t i o n h e a d e r t a b l e The ELF

header contains information to lo cate this section header table

The sections which are relevant for the purp ose of disassembly can b e briey

summarized as follows

text section This section contains the program co de

data section This section contains the initialized global program data

ro data section This section contains the initialized global readonly data

for example constants

bss section This section contains uninitialized global data

symtab section This section contains information regarding various sym

b ols used in the program fu n c t i o n s g l o b a l v a r i a b l e s e t c Typ e size and lex

emes are the chief pieces of information maintained for each entry The lo cation

se c t i o n o s e t pa i r is also stored for each entry

reltext section or relatext section As the name suggests this

section is the relo cation section with resp ect to the text section This section

contains the information needed by the linker to allow it to ll in values of

symb ols used in the text section which are only available at the link time

Basically this section provides for a mechanism to asso ciate a given oset in

the co de with an entry in the symb ol table This information is used in the

disassembly to regenerate the symb ol names in the output

Two Pass Design of Disassembler

In order to generate a full assembly le as output we need to generate

instructions p r e f e r a b l y u s i n g s y m b o l i c n a m e s fo r m e m o r y a n d s y m b o l re f e r

e n c e s within the text section

initialized data in c l u d i n g si z e a n d t y p e in f o r m a t i o n a l o n g w i t h sy m b o l n a m e s

from the data section

readonly data t h i s co u l d be c h a r a c t e r s t r i n g s from the rodata section

the makeup of the bss section t h e se c t i o n m e a n t fo r un in it ia liz ed d a t a w h i c h

d o e s n o t oc c u p y a n y s p a c e in t h e o b j e c t l e b u t w h o s e m a k e u p n e e d s t o be p r e

se r v e d be c a u s e s y m b o l s m a y be d e n e d w i t h re s p e c t to it

the assembler directives to glue up the whole output

The pro cessing of the text section is almost indep endent from the rest The

text section may contain data and other things apart from the program co de Al

though compilers do not mix data with co de an assembly language programmer may

do so Apart from this even simple actions like aligning the co de for a new function

to the nearest byte b oundary can intro duce gaps in the text section

The assembler lls in these gaps by some random or irrelevant value since these

lo cations are never executed The problem is that there is no way to distinguish data

from gaps within instructions If the normal pro cess of disassembly is allowed to take

its own course by treating these gaps as genuine co de the op co de alignment may b e

destroyed Once misaligned there is no way to recover and we may get unreliable

disassembly Thus it is absolutely essential to prevent the pro cessing of such gaps

We do this by identifying the basic blo cks in the co de section Each basic blo ck

constitutes a valid address range in the text section This is achieved by making

one extra pass of co de analysis on the text section Thus our disassembler is a two

pass disassembler

For a generic disassembler information ab out a pro cessors instruction set such as

s y n t a x and im a g e of instructions must b e extracted from the intermediate representa

tion of the pro cessor sp ecication When some instruction uses a reference which can

b e a symb ol the disassembler needs to resolve the reference for symb olic disassem

bly and use the symb olname in the assembly language output program Therefore

b efore discussing ab out the working of rst and second pass we will discuss ab out

references and then ab out the information extracted from the IR

Resolving References

The programmers normally co de their applications by dening symb ols in the program

in various sections text data etc The basic purp ose that these symb ols serve

is to asso ciate a name with a lo cation in one of the sections The programmer can

then refer to these lo cations using symb ol names

In the ELF le there is a relo cation section reltext or relatext This

section provides relo cation information with resp ect to the text section in most of

the relo catable ob ject les When the assembler encounters a symb ol say in the text

section it creates an entry in the relo cation section which asso ciates the o ccurrence

t h e o s e t a t w h i c h t h e sy m b o l re f e r e n c e oc c u r r e d in t h e re l o c a t a b l e o b j e c t l e a n d n o t

t h e lo c a t i o n o f th e a c t u a l s y m b o l it s e l f with the symb ol table entry of the symb ol In

some cases the assembler may even asso ciate the o ccurrence with the symb ol table

entry of the section with resp ect to which the symb ol is dened and include the oset

of the symb ol as the addend in the reference lo cation This primarily happ ens for

static variables whose information is not exp orted at the link time Both these cases

may arise and need to b e handled separately

Now when some instruction uses a reference which can b e a symb ol the disas

sembler needs to resolve the o ccurrence of the reference and use the symb olname

In the b est case the name used could b e the same as the original symb ol name

While resolving a reference to a lo cation we normally pro ceed to determine if there

is an entry in the relo cation table corresp onding to the o ccurrence in which case it

is enough to resolve whether the entry refers to an ob ject g l o b a l d a t a it e m or to a

section glo ba lst a tic it e m In the former case the name is available in the symb ol

table itself In the later case we need to generate a name for the symb ol but only

after ensuring that no other name has already b een generated for that symb ol

Extracting Information from Intermediate Rep

resentation

The in t e r m e d i a t e re p r e s e n t a t i o n I R of a pro cessor sp ecication provides a lot of in

formation ab out the pro cessor For the purp ose of disassembly we need the following

information ab out a pro cessors instruction set

Syntax and Image What is the assembly language sy n t a x and corresp onding

binary im a g e for the instructions

Arguments information For each instruction how many arguments are

needed the typ e and length of each of the arguments how to deco de the argu

ments and how to present the arguments in the assembly language

Control transfer instructions which are the instructions which can trans

fer control from one place to another These can b e further sub divided as un

conditional or conditional jump instructions unconditional or conditional call

instructions to a pro cedure and unconditional or conditional return from a

pro cedure instructions

Oset calculation For a control transfer instruction how do es a pro cessor

enco de the address of the next instruction

For a sp ecic disassembler for a pro cessor all this information can b e hardco ded

However for a generic disassembler this information must b e extracted from the

intermediate representation of the pro cessor sp ecication

Extracting Syntax and Image of instructions

The IR of the pro cessor sp ecication contains sy n t a x and im a g e records for all the

instructions We extract these corresp onding to the instruction o p r u l e These

records enco de the syntax of an assembly language instruction corresp onding binary

image and information ab out the arguments Arguments typ e information is found

with the help of the a n d r u l e t a b l e

The im a g e record includes a string corresp onding to the binary image of the

instruction The string do es not hold the binary image of the instruction verbatim

For example a record for add instruction describ ed in gure has the syntaxstring as

add rdfgrdfg and the imagestring as bfgbfg

The instruction is bits long and it takes two arguments Both arguments are

represented in bits are card typ e If instruction add rr is assembled

then its corresp onding binary image will b e Therefore we should have

a way to asso ciate the corresp ondence b etween the string stored in the im a g e record

and the binary image of the instruction read from the input binary le Further we

should b e able to nd the value of the arguments used by the instructions

For this purp ose we evaluate two binary strings namely image and imagemask

for each of the im a g e record Basically the image can b e taken as the string value

which results from the bitwise a n d ing of the binary im a g e of the instruction and

the imagemask Length of image is equal to the instructions length in bits The

algorithm is given in gure For the example of add instruction the image will b e

and the imagemask will b e

Now it is easy to nd out whether a given sequence of bits matches with any of the

instruction in the instruction set of the pro cessor It will b e a sequential a n d ing and

comparing op erations on the instruction set Moreover if an instruction is matched

then values of all the arguments can b e computed to generate the assembly language

Algorithm

For each im a g e record rep eat the following steps and

Take the imagestring from the im a g e record

If a bit or is stored in the imagestring

then f

copy the bit value as it is in the image

copy bit in the imagemask

g

If a parameter reference such as d is stored in the im a g e string

then f

Find the length L of the parameter

Copy L s in the image

Copy L s in the imagemask

Note the information ab out the parameter It includes p osition typ e

length a n d r u l e numb er s u b r u l e numb er and parameternumb er The

last three elds are available in the im a g e record as a tuple

g

Figure Algorithm for Calculating Mask Values

instruction

Instruction Matching Algorithm

As we describ ed earlier we can identify whether a given sequence of bits represent an

instruction or not using the sequential a n d ing and comparing algorithm This algo

rithm is simple but inecient The ineciencies will b e even higher if the instruction

set have variable length instructions

We have designed an ecient algorithm as given in gure This algorithm is

based on an observation that instruction set of a pro cessor uses some xed numb er

of bits for in any instruction By lo oking at these bits all the instructions

can b e divided uniquely into dierent categories All instructions will have same bit

Algorithm

Find maximum length l of the instruction

max

Initialize a generalmask G of length l with all s

max

Do bitwise a n d ing of string image of all the instructions with G At end the

G will have the required value

Now rep eat the following steps for all the instructions

Do bitwise a n d ing of the image with the G and call it R

if a bucket is having the bucketvalue same as the R then store the

instruction in the bucket

Otherwise create a new bucket and store the instruction in the bucket

Assign R as the bucketvalue for this bucket

For each bucket compute bucketmask The bucketmask is a string resulting

from the bitwise a n d ing of all image strings of the instructions stored in the

bucket

Sort the buckets according to the bucketvalues

Sort the instructions within each bucket according to the image string

Figure Algorithm for Calculating More Mask Values

values for the opcode in each category In each category again some xed numb er

of bits dierentiate among the instructions We call these bits as subcode Most

of the pro cessors use this twolevel of hierarchy in assignment of the opcode to the

instruction

In the algorithm a binary string named generalmask is calculated to iden

tify the instruction category We call these category as dierent buckets The

generalmask will have at bit p ositions used for For example we will

get the generalmask value as xFC x x x for PowerPC pro cessor

that has bit long instruction with rst bits as an opcode The instructions are

group ed in buckets Each bucket is assigned a bucketvalue which is the binary

string resultant from the bitwise a n d ing of the binary string image of the instruction

and the generalmask Each bucket is assigned a bucketmask to identify a instruc

tion among the instructions stored in each bucket The bucketmask has bit value

at all those p ositions which are used for the opcode and the subcode All buckets

and the instructions within each bucket are sorted with resp ect to the bucketvalue

and the binary string image resp ectively to reduce the searching time

Now the instruction matching algorithm is describ ed as follows

Call given sequence of bits to b e identied as D

Do the bitwise a n d ing of the D and generalmask

Find the bucket B where the instruction might b e stored For this purp ose do

the binary search with comparison of resultant string and bucketvalue

If no bucket is found then there is no such instruction

Otherwise do the bitwise a n d ing of the D and the bucketmask asso ciated with

the bucket B

Find the instruction I For this do the binary search with comparison of the

resultant string and the image

If search fails then there is no such instruction

Extracting Control Transfer Instruction

In the binary le there may b e gaps in b etween the instructions due to the alignment

constraints Control of the program execution never reaches to such gaps The ow of

a program is aected by the control transfer instructions We have divided the control

transfer instructions under six categories namely unconditional and conditional jump

instructions unconditional and conditional call to a pro cedure instructions and

lastly unconditional and conditional return from a pro cedure instructions The

pro cess of disassembly takes care of such instructions An o ccurance of an instruction

of such typ e is used to identify the address ranges that contains the co de Otherwise

disassembled instructions sequence might b e completely wrong Therefore we need

the information ab out all such instructions

Instructions under each category can b e found by a simple metho d The metho d

is based on the assumption that instructions are describ ed in a hierarchical manner in

the pro cessor sp ecication If a complete instruction sp ecication tree is made then

instruction of a category can b e marked under a subtree ie an instruction is put

under a particular category if the ro ot no de of the corresp onding subtree is traversed

during attening of the instruction If a pro cessor sp ecication is not written in this

manner then a little eort is needed to mo dify it One can add an o r r u l e with all

the instructions of a category as children no des of the o r r u l e

The disassembler takes an identier name for each category from the user These

identiers denote no des of various subtrees asso ciated with various categories As

describ ed earlier the s y n t a x and im a g e records of an instruction hold dotexpressions

which provide the sequence of no des traversed during attening of the instruction If

an instruction b elongs to any of these category then the ro ot no de of the category

tree must b e enco ded in the dotexpression If any of these no des is found in the dot

expression then the instruction is put under the corresp onding category Otherwise

the instruction is not a control transfer instruction and termed as a simple instruction

There can b e a situation when an instruction b elongs to two such subtrees This

will happ en if tree of one category is also a subtree of another category of tree For

example the PowerPC pro cessor do es not have any call typ e instruction It uses

jump typ e of instructions itself to transfer the control to a subroutine It stores a

return address in the link register and set some bits to treat the jump instruction

as a call instruction For such conditions instructions are matched according to

a priority rule We have assigned the priority to unconditional jump conditional

jump unconditional call conditional call unconditional return and conditional return

typ e of instructions resp ectively in that order If an instruction matches under two

categories it is put under the higher priority category

Evaluation of Next Instruction Address

To identify valid co de address ranges it is necessary to evaluate the target address

of the control transfer instructions A control transfer instruction either gives the

starting address of a new address range or causes the end of current address range

For a control transfer instruction each pro cessor enco des the address of the next

instruction in a dierent way For example the next instruction address for jump typ e

of instructions may b e sp ecied relative to the current The relative

oset value is enco ded in the instruction itself The enco ding of oset value might

b e dierent on dierent pro cessors In some cases the next instruction address can

b e determined only at the run time for example the case when the next instruction

address is taken from a pro cessor register or memory We are interested in nding out

the next instruction address from an instruction image of binary le if p ossible If its

value can not b e determined we do not use it for the identication of co de address

ranges

The intermediate representation of the pro cessor holds the attribute denitions

for all the instructions The attribute denition corresp onding to a c t i o n attribute

simulates the semantic of an instruction In the SimnML a register called PC has

sp ecial semantic and normally p oints to the nexttob e executed instruction For

controltransfer instructions the attribute denition of a c t i o n attribute must mo dify

the PC value in some way In some pro cessors there are more than one such sp ecial

register such as oldpc newpc currentpc

The disassembler takes a set of identier names from the user These are essen

tially the names of such sp ecial purp ose registers We extract the attribute denitions

corresp onding to a c t i o n attribute for all controltransfer instructions If we have the

binary image of an instruction we can get the address of the next instruction by

simulating the execution of the attribute denition When a statement mo dies the

value of the program counter any of the identier in set entered by the user its new

value is taken as the address of the next instruction If the statement requires a value

which is unknown then we can not determine the address of the next instruction

Implementation Details of the Disassembler

As we said earlier the disassembler is a two pass disassembler In reality these two

passes are made over text section only While disassembling the text section

information is gathered which aid in disassembly of the other sections In the rst

pass it gathers information like references list of basic blo ck etc which is used to

pro duce output in the second pass using symb ol name for references The disassembler

pro ceeds in following phases

Initialization Phase

As said earlier the disassembler takes ELF binary le and pro cessor sp ecication in

the IR as input The disassembler do es the following tasks in this phase

It identies the data enco ding of the host pro cessor using the algorithm given

in gure

It checks the integrity of the IR le by lo oking for the META TABLEtable

of contents at the start

It then reads the Meta Table entry and detects the data enco ding used in the

IR le as describ ed in section If the data enco ding of the host pro cessor

is dierent from that in the IR then a ag is set to indicate that the data read

from the IR le must b e converted to prop er data enco ding b efore its use

The disassembler then extracts the information required for disassembly from

the IR le as describ ed in the section

It checks the integrity of the binary le by checking the magic numb er in the

ELF header

From the ELF header it detects the data enco ding of the binary le and sets

a ag if data enco ding diers from the source architecture This indicates that

data read from the binary le must b e converted into prop er data enco ding

b efore its use

Lastly it reads in the information from the binary le to b e used for future ac

cess This information which includes things like the ELF header symb ol table

etc is held in appropriate data structures so that all the required information

is easily available

First Pass of Disassembly

In the rst pass of disassembly all basic blo cks of the co de are identied The

identication pro cess is based on the assumption that there must always b e some

way a pa t h to reach the co de If there is no such path ie there is no jumpcall

Assume that the u n s i g n e d c h a r a c t e r is byte long and the u n s i g n e d in t e g e r is

byte long

Take a u n s i g n e d c h a r a c t e r p ointer P and u n s i g n e d in t e g e r p ointer I Let I and

P b oth p oints to the same byte structure

Store at P P and P Store xFF at P

If value at I is equal to xFF

then dataenco ding of the pro cessor is b i g e n d i a n

If value at I is equal to xFF

then dataenco ding of the pro cessor is li t t l e e n d i a n

Figure Algorithm to Find Data Enco ding of the Host Pro cessor

to this co de it shall never get executed and hence we need not worry ab out it

Since the ELF binary le contains the names of the functions in the symb ol table

these are taken as the basic blo cks in the b eginning The algorithm then pro ceeds to

trace each one of these one by one in order to discover all p ossible program paths

A stack is used for storing the unpro cessed entry p oints An instruction is identied

using the instruction matching algorithm describ ed earlier in section After

each instruction matching instruction buer p ointer is moved ahead according to

instruction length of the matched instruction No attempt is made to interpret the

contents of the instructions except that a constant vigil is kept over control transfer

instructions

A control transfer instruction either gives a new entry p oint or causes the end

of the current trace If the instruction comes under the category of unconditional

return instruction or unconditional jump instruction then the instruction is the last

instruction of the current trace Otherwise tracing is continued All control instruc

tions except the instructions coming under the category of unconditional return and

conditional return give a new entry p oint The address of the next instruction is

found by the approach describ ed in the section

The information gathered in the rst pass is stored for use in the second pass The

disassembler maintains a list of pairs Each asso ciation consists of one entry p oint

in the text section and the corresp onding name by which it is referred The list is

built up during this pass Whenever a control transfer instruction refers to an oset

in the co de section an attempt is made to resolve the reference u s i n g th e a p p r o a c h

g i v e n in s e c t i o n If the reference is not resolved a new symb ol is generated

and app ended to the list Future references to the same oset would resolve into this

new name At the end the information a b o u t t h e sy m b o l s a n d t h e e n t r y po i n t s are

sorted with resp ect to the address of the entry p oints

At the end of this pass all adjacent basic blo cks are merged to form a bigger

basic blo ck The intention is to obtain range of addresses which contain only co de

and no datagaps After obtaining these ranges the second pass simply pro cesses

the regions covered by them Lastly all basic blo cks are sorted with resp ect to the

starting address to ease the translation to the assembly instructions

Second Pass of Disassembly

The ob jective of this pass is to generate assembly language instruction from their

binary counterpart Since the address ranges of valid co de have b een identied in the

rst pass we only need to disassemble the instructions in each address range The

instruction disassembly is carried out in the following steps

S y m b o l G e n e r a t i o n To p erform the symb olic disassembly at the b eginning of

the disassembly of an instruction it is checked whether a symb ol is asso ciated

with the address of the current instruction and if so the symb ol typ e function

name or just a lab el is also extracted If the symb ol refers to a function further

information regarding the typ e and size of the function is also extracted

In s t r u c t i o n G e n e r a t i o n An instruction is matched using the instruction match

ing algorithm a s d e s c r i b e d in se c t i o n and corresp onding assembly lan

guage instruction is output with appropriate parameters If the instruction is

a control transfer instruction a symb ol is found from the symb ol table con

structed during the rst pass and corresp onding symb ol name is used in the

disassembled instruction Further memory references to the data rodata

and other such sections are found In case of such a reference we try to resolve

the address The size of the op erand and the symb ol name is stored for the

reference

This pro cedure is rep eated for each instruction in all address ranges The gaps

within the text section are overlo oked for the purp ose of text disassembly One

p ossibility is to simply ignore the bytes in the gaps and change the current lo cation

counter so that it reaches the b eginning of the next valid address range However it

is p ossible that these gaps contain initialized data w h i c h a r e n o t re f e r e n c e d b y t h e

n o r m a l m e t h o d s fo r in s t a n c e u s i n g re g i s t e r in d i r e c t i o n in s t e a d o f s y m b o l s o t h e r w i s e

it w o u l d h a v e be e n d i s c o v e r e d d u r i n g t h e r s t pa s s In such a case ignoring them

might break the intended equivalence b etween the relo catable ob ject le and the

generated assembly co de Therefore we simply output bytes in the gaps as data and

generate appropriate pseudoops to glue the co de

Disassembly of Other Sections

While making the second pass through the text section information is gathered

regarding the references made to the other sections This information together with

the symb ol table information is used to disassemble the data section We simply

scan the data section lo oking for those osets for which a symb ol name is available

At these osets the corresp onding symb ol name is output as a lab el Moreover the

length information gathered during the second pass is also output for the data item

At all other osets the data is dump ed byte by byte

The rodata section is dealt similarly except that the symb ol names are not

retained in the binary le Thus they need to b e generated afresh

Chapter

Results and Conclusion

Results

We have discussed the design of the intermediate representation The IR fullls all

the goals which were setup b ehind the design of the IR Some advantages of the IR

are enumerated b elow

All information which was available in the SimnML sp ecication can b e re

trieved Moreover parsing eort needed in other to ols to get the required

information has b een saved

All forward references of identiers have b een resolved Thus multiple passes

are not necessarily needed in the to ols to resolve the references

Most of the tables in the IR contain xed size records Thus it is easy and

ecient to retrieve the required information from the IR

Hierarchy of the information has b een attened while retaining the path of

attening Therefore pro cessing needed in other to ols is reduced

All sy n t a x and im a g e attribute denitions of the instructions are available collec

tively in one table Therefore design of the to ols such as assembler disassembler

simulator etc is simplied as they need to gather information only from one

place

In the IR the attribute denitions for all the attributes except sy n t a x and

im a g e are represented in prex notation Therefore to ols such as simulator

trace generator compiler backend generator etc can b e made to run fast

The IR is exible enough for further extension One can add more tables in the

IR without any problem

The to ol IRGenerator is tested for PowerPC pro cessor sp ecication The

IRGenerator is tried on Pentium littleendian based Linux machines DECAlpha

littleendian based DEC machines and UltraSparc bigendian based Sun OS ma

chines The interop erability among these machines is also tried and found to work

We have also implemented the generic symb olic disassembler The disassembler

can take the IR for a pro cessor sp ecication and ELF binary for that pro cessor as

inputs The salient features of the disassembler are as follows

The disassembler is generic and pro cessor indep endent

The disassembler uses symb ols to refer to the lo cations and functions Therefore

the output le resembles the original source from which binary le was pro duced

The disassembler is tested for PowerPC IR generated through the IRGenerator

The disassembler is also tried on the ab ove mentioned architectures It is also veried

that it can take the IR generated on a littleendian pro cessor while running on big

endian pro cessor and viceversa

The disassembler is tested for several programs Some of the test results are given

in the table All the C programs are compiled using GNU C crosscompiler for

PowerPC pro cessor running on Pentium based Linux machines It is observed that

all the corresp onding instructions are matching in the source assembly program and

output assembly programs except those instructions which are not implemented in

the sp ecication Dierences in total line numb ers are coming due to unimplemented

instructions b ecause corresp onding binary images are output byte by byte in dierent

lines in the output assembly program

Some of the example outputs of the disassembler are given here

Program Numb er No of Lines No of Lines in No of Lines in

in C Program Assembly Program Output Assembly Program

Table Test Results

Example

The following C program compiled using GNU C crosscompiler for PowerPC

pro cessor running on Pentium based Linux machines

file examplec

main

int abc

a

b

c a b

The compilation results into the follwoing assembly program

file examplec

gcccompiled

section text

align

globl main

type mainfunction

main

stwu

stw

mr

li

stw

li

stw

lwz

lwz

add

stw

L

lwz

lwz

mr

blr

The result of disassembling the corresp onding relo catable le using our disassembler

is shown b elow

section text

align

globl main

size main

type mainfunction

main

gcccompiled

stwu

stw

or

addi

stw

addi

stw

lwz

lwz

add

stw

lwz

lwz

or

bclr

As it evident the disassembler generates a correct assembly language le with nec

essary assembler directives Moreover the symb olic name of the function main is

retained together with its typ e size and binding information The unnecessary lab els

used in the source assembly program have b een removed Instructions such as or and

mr are alias of each other In the output le or instruction is generated as that was

the one sp ecied in the pro cessor sp ecication The output le is crosscompiled and

disassembler is run on the corresp onding binary le The generated output is similar

to the original one

Example

Let us now take a more complicated example The source C program is shown b elow

The program has a conditional if statement that gets compiled to multiple branches

file examplec

main

int abmin

a

b

if a b min b

else min a

The assembly program generated by the crosscompiler is shown b elow

file examplec

gcccompiled

section text

align

globl main

type mainfunction

main

stwu

stw

mr

li

stw

li

stw

lwz

lwz

cmpw

bc L

lwz

stw

b L

L

lwz

stw

L

L

lwz

lwz

mr

blr

The result of disassembling the corresp onding relo catable le using our disassembler

is shown b elow

section text

align

globl main

size main

type mainfunction

main

gcccompiled

stwu

stw

or

addi

stw

addi

stw

lwz

lwz

cmp

bc text

lwz

stw

b text

text

lwz

stw

text

lwz

lwz

or

bclr

For this example we can observe that the sequence of instructions generated by the

disassembler is almost similar to the original except for a few symb ol names which

are generated by the disassembler New lab els like text text have the format

sectionname counter These symb ols are not there in the binary le as these are

considered lo cal and thrown away by the assembler

Conclusion

SimnML language an extension of nML machine description formalism is a simple

elegant and p owerful enough to mo del machine b ehavior at instruction level In this

thesis we have discussed an integrated environment where SimnML acts as the sp ec

ication language for pro cessor p erformance mo del in a generic way The integrated

environment helps in automatic generation of compiler assembler disassembler in

struction set simulator and trace generator

As part of the thesis work we have designed an in t e r m e d i a t e re p r e s e n t a t i o n I R

for pro cessor sp ecication written in SimnML language We have demonstrated how

the intermediate representation simplies the development of various to ols such as

compiler assembler disassembler instruction set simulator trace generator etc We

have also develop ed a to ol IR G e n e r a t o r which takes a pro cessor sp ecication writ

ten in SimnML language and pro duces the intermediate representation of pro cessor

sp ecication Further a G e n e r i c S y m b o l i c Disassem b ler is develop ed which takes an

intermediate representation of a pro cessor and a relo catable binary le in ELF format

as input and pro duces an equivalent program in assembly language of the pro cessor

We have also given the test results for PowerPC pro cessor Although these to ols

are tested only for PowerPC pro cessor sp ecications their design is generic enough

to b e used for all typ e of RISC and CISC pro cessor sp ecications

Future Work and Extensions

There are many things which can b e undertaken as an extension of this work

F l a t t e n i n g o f a l l A t t r i b u t e s In the IR all except sy n t a x and im a g e attribute

denitions are stored without any attening involving parameter substitution

If attribute denitions of all the attributes can b e attened then it might further

simplify the design of some to ols such as compiler backend generator

S u p p o r t fo r O t h e r F i l e F o r m a t Currently the disassembler can only accept

relo catable ob ject les in the ELF format It would b e nice if it could b e

extended to understand other format also such as COFF aout etc

App endix A

Grammar of SimnML Language

C o n v e n t i o n We have used following convention in describing the Context Free Gram

mar CFG of SimnML language

r u l e X jY means either X or Y is derived from rule We have written X and

Y in separate lines

Keywords are written in smallcase letters

The start symb ol is MachineSp ec

Y means the derivation of Y where X is used as a qualier For example X

Let Identier means an identier sp ecied in LetDef Const Expr means a con

Expr means an expression of card typ e stant expression Card

X Y Z means the derivation of Z where X and Y b oth are used as quali

ers For example Card Const Expr means a constant expression of card typ e

Para Mo de Identier means an identier used as a parameter name which is of

m o d e typ e

Following are the terminal symb ols used in describing the grammar We have

used regular grammar notation here

letter azAZ

digit

bin

hex af

alpha azAZ

Identifier letter alpha

CARDCONSTANT digit

FIXEDCONSTANT digit digit

BINARYCONSTANT bbin

HEXCONSTANT xhex

STRINGCONSTANT sequence of characters written in doublequotes

Following is the Context Free Grammar for SimnML language

MachineSpec

MachineSpec LetDef

MachineSpec MacroDef

MachineSpec TypeSpec

MachineSpec ResourceSpec

MachineSpec ExceptionSpec

MachineSpec MemorySpec

MachineSpec ModeSpec

MachineSpec OpSpec

LetDef let LetIdentifier ConstExpr

MacroDef macro MacroIdentifierList MacroExpr

TypeSpec type TypeIdentifier TypeDef

TypeDef bool

int CardConstExpr

card CardConstExpr

fix CardConstExpr CardConstExpr

float CardConstExpr CardConstExpr

IntConstExpr IntConstExpr

enum EnumIdentifierList

instidtype

IdentifierList Identifier

IdentifierList Identifier

ResourceSpec resource ResourceIdentifierList

ExceptionSpec exception ExceptionIdentifierList

MemorySpec MemRegPart CardConstExpr Type OptMemAttrList

MemRegPart mem MemIdentifier

reg MemIdentifer

Type TypeDef

TypeIdentifier

OptMemAttrList

MemAttrDefList

MemAttrDefList MemAttrDef

MemAttrDefList MemAttrDef

MemAttrDef volatile StringConstExpr

alias MemLocation

initial ConstExpr

uses UsesDef

MemLocation MemIdentifier ConstOptBitOptr

MemIdentifier CardConstExpr ConstOptBitOptr

ModeSpec mode ModeIdentifier ModeSpecPart

ModeSpecPart AndRule OptionModeExpr AttrDefList

OrRule

OptionModeExpr

Expr

OpSpec op OpIdentifier OpRulePart

OpRulePart AndRule AttrDefList

OrRule

OrRule OrIdentifierList

AndRule ParamList

ParamList

ParamListPart

ParamList ParamListPart

ParamListPart ParamIdentifier ParamType

ParamIdentifier ParamRuleIdentifier

ParamMemIdentifier

ParamType Type

RuleIdentifier

RuleIdentifier OpIdentifier

ModeIdentifier

AttrDefList

AttrDefList AttrDef

AttrDef AttrIdentifier AttrDefPart

syntax AttrStringExpr

image AttrStringExpr

action Sequence

uses UsesDef

AttrDefPart Expr

Sequence

UsesDef UsesOrSequence

UsesDef UsesOrSequence

UsesOrSequence UsesIfAtom

UsesOrSequence UsesIfAtom

UsesIfAtom UsesCondAtom

if BoolExpr then UsesIfAtom OptionElseAtom endif

OptionElseAtom

else UsesIfAtom

UsesCondAtom UsesAndAtom

BoolExpr UsesAndAtom

UsesAndAtom UsesActionAtom

UsesAndAtom UsesActionAtom

UsesActionAtom UsesDefAtom

UsesDefAtom UsesActionAttr OptionalTime

UsesActionAttr AttrIdentifier

action

ParamRuleIdentifier action

ParamRuleIdentifier AttrIdentifier

OptionalTime

CardExpr

UsesLocation MemIdentifier OptBitOptr

ResourceIdentifier

MemIdentifier CardExpr OptBitOptr

UsesDefAtom UsesLocation OptionalTime

ParamRuleIdentifier uses

UsesOrSequence

AttrStringExpr ParamRuleIdentifier syntax

ParamRuleIdentifier image

StringConstExpr

format STRINGCONST FormatIdlist

FormatIdlist FormatId

FormatIdlist FormatId

FormatId ParamMemIdentifier

ParamRuleIdentifier image

ParamRuleIdentifier syntax

Sequence

StatementList

StatementList Statement

StatementList Statement

Statement

nop

action

AttrIdentifier

ParamRuleIdentifier action

ParamRuleIdentifier AttrIdentifier

Location Expr

CondStatement

FunctionName ArgList

error STRINGCONST

FunctionName STRINGCONST

ArgList

Expr

ArgList Expr

OptBitOptr

CardExpr CardExpr

Location MemLocation

ParaLocation

Location Location

MemLocation MemIdentifier OptBitOptr

MemIdentifier CardExpr OptBitOptr

ParaLocation ParaMemIdentifier OptBitOptr

ParaMemIdentifier CardExpr OptBitOptr

CondStatement if BoolExpr then Sequence OptionalElse endif

switch Expr CaseList

OptionalElse

else Sequence

CaseList CaseStat

CaseList CaseStat

CaseStat CaseOption Sequence

CaseOption case ConstExpr

default

Expr UnconditionalExpr

AttrExpr

if BoolExpr then Expr OptionElseExpr endif

switch Expr CaseExprList

AttrExpr ParamRuleIdentifier syntax

ParamRuleIdentifier image

ParamRuleIdentifier AttrIdentifier

OptionElseExpr

else Expr

CaseExprList CaseExprStat

CaseExprList CaseExprStat

CaseExprStat CaseOption Expr

UnconditionalExpr ExprPart

coerce Type Expr

format StringExpr ArgList

FunctionName ArgList

ExprPart LogAndExpr

ExprPart LogAndExpr

LogAndExpr NotExpr

LogAndExpr NotExpr

NotExpr InclusiveOrExpr

InclusiveOrExpr

InclusiveOrExpr ExclusiveOrExpr

InclusiveOrExpr ExclusiveOrExpr

ExclusiveOrExpr AndExpr

ExclusiveOrExpr AndExpr

AndExpr EqualityExpr

AndExpr EqualityExpr

EqualityExpr RelationalExpr

EqualityExpr RelationalExpr

EqualityExpr RelationalExpr

RelationalExpr ShiftExpr

RelationalExpr ShiftExpr

RelationalExpr ShiftExpr

RelationalExpr ShiftExpr

RelationalExpr ShiftExpr

ShiftExpr AddExpr

ShiftExpr AddExpr

ShiftExpr AddExpr

ShiftExpr AddExpr

ShiftExpr AddExpr

AddExpr MulExpr

AddExpr MulExpr

AddExpr MulExpr

MulExpr PowerExpr

MulExpr PowerExpr

MulExpr PowerExpr

MulExpr PowerExpr

PowerExpr SimpleExpr

SimpleExpr CardExpr

SimpleExpr Expr

SimpleExpr

SimpleExpr

SimpleExpr

LocationOpd

SimpleOpearand

LocationOpd MemLocationOpd

ParamLocationOpd

LocationOpd LocationOpd

MemLocationOpd MemIdentifier CardExpr OptBitOptr

MemIdentifier OptBitOptr

ParamLocationOpd ParamMemIdentifier CardExpr OptBitOptr

ParamMemIdentifier OptBitOptr

SimpleOperand FIXEDCONST

CARDCONST

STRINGCONST

BINARYCONST

HEXCONST

N o t e Following op erators give the b o olean result in an expression

jj

N o t e We have used the zero value as false and the nonzero value as true for

b o olean expression

N o t e For MemSp ec one new attribute initial is dened to store the initial

values of a memory variable

App endix B

File Format of Intermediate

Representation

In this app endix we will discuss the layout of the le for the intermediate representa

tion The le consists of various xed or variable size tables where the name of each

table is xed A table named as meta table is always the rst table in the le All

other tables can reside anywhere in the le and can b e lo cated using the meta table

The following are the tables available presently in the IR

META TABLE

CONSTANT TABLE

ATTRIBUTE TABLE

RESOURCE TABLE

IDENTIFIER TABLE

MEMORY TABLE

AND RULE TABLE

OR RULE TABLE

SYNTAX TABLE

IMAGE TABLE

STRING TABLE

INTEGER TABLE

PREFIX ATTR DEF TABLE

Each table consists of an array of records Each record in a table constitutes of various

elds For each table all the elds of rst records are written rst in the le Then

all the elds of second record are written and so on We have used the word re c o r d

and e n t r y interchangeably The elds might b e stored either in li t t l e e n d i a n enco ding

or b i g e n d i a n enco ding dep ending on the pro cessor on which the le is created

C o n v e n t i o n Each table is describ ed by dening its record format We have

used a Clike struct denition to describ e a record For each record elds are

written from top to b ottom in the le In describing the record following data

typ es are b eing used

Byte u n s i g n e d c h a r

Word u n s i g n e d s h o r t in t

Dword u n s i g n e d in t

SByte si g n e d c h a r

SWord si g n e d sh o r t in t

SDWord si g n e d in t

String Null terminated array of c h a r a c t e r s

Address Dword

Offset Dword

B Meta Table

The Meta table holds the table of contents for all the tables which are present in the

le Each record of the meta table stores the information to lo cate a table Each

record has the following format

typ edef struct f

String ta b l e n a m e

s i z e Dword ta b l e

Address t a b l e o s e t

Dword to t a l re c o r d

Dword re c o r d s i z e

g Meta Record

t a b l e n a m e This eld stores the xed name of a table which is a byte null

terminated string Name of all the tables are written earlier

t a b l e si z e This eld holds the size in bytes of a table

o s e t This eld holds the starting oset in bytes of a table in the le t a b l e

re c o r d This eld holds the numb er of record stored in a table For the t o t a l

string table it holds the value

re c o r d si z e This eld holds the size of a record in bytes of a table If a

record for a table is variable in size then this eld contains the

value

The data enco ding of the IR is dep endent on the pro cessor on which it is created

ie data enco ding can b e little endian or big endian dep ending on the pro cessor

A to ol can gure out the endianness of the IR by reading the table of contents

irresp ective of the typ e of the machine on which the to ols is running First record of

the table represent the m e t a t a b l e entries itself Therefore the n o o f r e c contains the

total numb er of tables including the m e t a t a b l e si z e o f r e c contains the size of each

record in the meta table and siz eo fta b le contains the total size of the meta table

including the rst record A to ol can read these values and check if the following

equation is satised

n o o f r e c siz e o frec si z e o f t a b l e

If this equation is not satised then the endianness of the IR and the machine on

which the to ol is running are not the same otherwise they are the same In the

former case this equation must b e satised after the endianness conversion of the

elds values

B Constant Table

Each record of the constant table holds the informations ab out the constants see

section in the following format

typ edef struct f

Oset id n a m e

Dword v a l t y p

SDword v a l u e

g Const Record

id n a m e This eld holds the index into the string table As discussed

earlier string table holds null terminated strings Thus this

eld represents a reference to the constant name

v a l t y p This eld indicates typ e of the value asso ciated with the constant

for in t e g e r typ e or for a st r i n g typ e

v a l u e If the v a l t y p eld represents in t e g e r then this eld holds the

corresp onding s i g n e d in t e g e r value If the v a l t y p eld represents

st r i n g then this eld holds the u n s i g n e d in t e g e r index into the

string table from where a null terminated string value can b e

retrieved

B Resource Table

Each entry of this table holds the information ab out a re s o u r c e Each re s o u r c e is

assigned a unique integer key by which it is referenced at other places Each record

has the following format

typ edef struct f

n a m e Oset re s

Dword re s k e y

g Resource Record

n a m e This eld holds the index into the string table In the re s

string table the name of the re s o u r c e is stored at this in

dex

re s k e y This eld holds the key value assigned to the re s o u r c e

B Identier Table

This table holds the informations ab out all the identiers used in the pro cessor sp ec

ication le other than those sp ecied in the constant table and the resource

table Each identier is assigned a unique integer key which is used to refer to the

identier at other places Each record has the following format

typ edef struct f

Oset id n a m e

t y p Dword id

Dword id k e y

g Identier Record

id n a m e This eld holds an index into the string table The string

table holds a null terminated string at this index which is the

name of the identier

t y p This eld indicates the typ e of the identier and may have one id

of the following values

Undened Identier

Name of a memory Variable

Name of an o r r u l e of m o d e typ e

Name of an a n d r u l e of m o d e typ e

Name of an o r r u l e of o p typ e

Name of an a n d r u l e of o p typ e

Name of an Exception

others Unsp ecied

id k e y This eld holds the key value assigned to the identier

B Attribute Table

Each entry of this table holds the name of an a t t r i b u t e Each a t t r i b u t e is assigned a

unique integer key to refer to it at other places Each record has the following format

typ edef struct f

Oset a t t r n a m e

k e y Dword a t t r

g Attribute Record

a t t r n a m e This eld holds an index into the string table The string

table holds a null terminated string at this index which is the

name of the a t t r i b u t e

a t t r k e y This eld holds the key value assigned to the a t t r i b u t e

val is dened to store the N o t e For m o d e sp ecication one new attribute

optional expression asso ciated with

B Memory Table

Each entry of this table holds the information ab out a memory variable sp ecied

with re g or m e m sp ecication construct of SimnML language Each record has the

following format

typ edef struct f

Dword id k e y

Dword si z

Dword t o t a t t r

Dword m e m re g

t y p Dword d a t a

Dword v a l u e

Dword v a l u e

Dword a t t r li s t in d e x

Record g Memory

id k e y This eld stores the key value asso ciated with the identier

name of a memory variable The key value is assigned in the

identifier table

si z A memory declaration denes a memory base ie a set of mem

ory lo cations accessible under a name and an index This eld

sp ecies the numb er of such lo cations

t o t a t t r A memory declaration may also dene values for some predened

attributes This eld sp ecies how many attributes are dened

for the memory variable

re g This eld holds a value if the memory identier is declared m e m

using R e g sp ecication It holds if the memory identier is

declared using m e m sp ecication Both typ e of identiers are

similar in nature except that rst typ e of identiers refer to pro

cessor registers and second typ e of identiers refer to memory

lo cations

d a t a ty p

v a l u e

v a l u e A memory lo cation might hold values of dierent data typ es

The data typ e is enco ded in a tuple d a t a t y p v a l u e v a l u e

ty p sp ecies what typ e of values can b e stored First eld d a t a

in a memory lo cation Second and third eld stores the value

t y p eld Table shows the p ossible values according to the d a t a

for these eld

li s t in d e x If the t o t a t t r eld has a value then this eld is ignored and a t t r

should b e Otherwise it sp ecies an index into the integer

table At this index three integers are stored for each of the a t

a t t r t r i b u t e s Therefore the total numb er of integers are t o t a l

Each integer triple indicates a t t r k e y o s e t le n where the

k e y is the key corresp onding to a t t r i b u t e n a m e assigned in a t t r

the attribute table The second eld of triple o s e t is the

starting tuple numb er into the prefixattributedefinit ion

table where denition of the attribute is stored in prex nota

tion Third eld of triple le n is the numb er of tuples for its

attribute denition

B AndRule Table

This table holds the information ab out all the a n d r u l e s m o d e and o p type It

1

includes the information ab out su b r u l e s and a t t r i b u t e s The su b r u l e s of an

a n d r u l e are numb ered from to n and parameters are numb ered as to m from

1

Refer to section

Data Typ e d a t a t y p v a l u e v a l u e

b o ol

cardn n

intn n

xn m n m

oatn m n m

rangen m n m

enumid id m m

Table Enco ding of data typ es

left to right Each record has the following format

typ edef struct f

Dword a n d k e y

Dword id k e y

s u b ru l e Dword t o t a l

Dword t o t a l pa r a

a t t r Dword t o t a l

Dword a t t r li s t in d e x

li s t in d e x Dword pa r a

Rule Record gAnd

a n d k e y This eld holds an integer which is a unique key assigned to an

a n d r u l e This key is used later to refer to the a n d r u l e

k e y This eld holds the key value which is assigned to the identier id

name of the a n d r u l e in the identifier table

su b ru l e This eld holds the numb er of su b r u l e s generated by attening t o t a l

of the a n d r u l e

t o t a l pa r a This eld holds the numb er of parameters taken by the a n d r u l e

t o t a l a t t r This eld sp ecies the numb er of attributes dened for the a n d

ru l e

li s t in d e x If t o t a l a t t r eld has value then this eld is ignored and has a a t t r

value otherwise it sp ecies an index into the integer table

At this index three integers are stored for each of the attributes

Each integer triple indicates a t t r k e y o s e t a n d le n similar

to the one describ ed in the memory table There are two excep

tions here If a t t r k e y refers to a sy n t a x or im a g e attribute

then o s e t eld contains the starting index in the syntax table

or the image table and le n eld contains the total numb er of

sy n t a x or im a g e records corresp onding to the a n d r u l e

li s t in d e x If t o t a l pa r a eld has value then this eld is ignored Other pa r a

wise it sp ecies an index into the integer table At this index

three integers are stored for each of the parameter Initially all

parameters triples of rst su b r u l e are written then all parame

ter triples of second su b r u l e are written and so on Thus if we

have n su b r u l e s and m parameters then there will b e nm such

integer triples Each integer triple indicates d a t a t y p v a l u e

v a l u e ie the data typ e of parameter Table shows p ossible

values for elds of the triples

Data Typ e d a t a ty p v a l u e v a l u e

b o ol

cardn n

intn n

xn m n m

oatn m n m

rangen m n m

enumid id m m

andrule a n d k e y

Table Parameter Typ e for a n d r u l e

B OrRule Table

This table holds the information of all o r r u l e s m o d e or o p typ e Each entry de

2

scrib es the children no des of an o r r u l e Each record has the following format

typ edef struct f

k e y Dword o r

Dword id k e y

c h i l d Dword t o t a l

Dword c h i l d li s t in d e x

gOr Rule Record

2

Refer to section

o r k e y This eld holds an integer which is a unique key assigned to

an o r r u l e

k e y This eld holds the key value asso ciated with the identier id

name of the o r r u l e in the identifier table

t o t a l c h i l d This eld holds the integer numb er which indicate numb er of

children generated by the attening pro cedure for the o r r u l e

li s t in d e x This eld holds the index into the integer table where a list c h i l d

of a n d k e y values are stored Numb er of such a n d k e y values

is given by the value of to t a l c h i l d These a n d k e y are uses to

refer to the a n d r u l e assigned in the andrule table

B Syntax Table

This table holds the sy n t a x re c o r d s asso ciated with the s y n t a x attribute denition of

all a n d r u l e s Each record has the following format

typ edef struct f

Dword syn key

expr len Dword dot

Oset dot expr oset

expr len Dword syn

Oset syn expr oset

g Syntax Record

k e y This eld holds an integer which is a unique key assigned to sy n

a sy n t a x re c o r d In the andrule table the key is used to

get the attribute information of sy n t a x attribute

d o t e x p r le n This eld holds the length of a character string named as

do tex p ressio n Refe r to section

e x p r o s e t This eld holds the oset in bytes into the string table d o t

where actual dote x p ressio n is stored as a sequence of charac

ters

sy n e x p r le n This eld holds the length of the character string named as

sy n t a x s t r i n g of the instruction

sy n e x p r o s e t This eld holds the oset in bytes into the string table

where the sy n ta x strin g is stored as a sequence of characters

B Image Table

This table holds the im a g e re c o r d s asso ciated with the im a g e attribute denition of

all a n d r u l e s Each record has the following format

typ edef struct f

Dword img key

Dword dot expr len

Oset dot expr oset

Dword syn expr len

expr oset Oset img

g Image Record

im g k e y This is the unique integer assigned to each im a g e re c o r d In

the andrule table this value is used to get the attribute

information of im a g e attribute

e x p r le n This eld holds the length of the character string named as d o t

do tex p ressio n Refer to section

d o t e x p r o s e t This eld holds the oset in bytes into the string table

where actual dote x p ressio n is stored as a sequence of charac

ters

sy n e x p r le n This eld holds the length of the character string named as

im a g e s t r i n g of the instruction

e x p r o s e t This eld holds the oset in bytes into the string table sy n

where the im a g e s t r i n g is stored as a sequence of characters

B String Table

This table holds null terminated character sequences commonly called s t r i n g s These

strings are referred to by an index into the string table The rst byte at index

zero always contains a n u l l character Similarly the last byte also contains a n u l l

character ensuring n u l l termination for all strings A string whose index is zero

sp ecies either no name or a null name dep ending on the context We show one

example of the string table of size bytes in table and the st r i n g s asso ciated

with various indices in table

null i d e n t i f i e

r null P C null null i n s t

r u c t i o n null null

Table Example of the String Table

Index st r i n g

identier

PC

instruction

struction

null

Table Interpretation of the String Table

B Integer Table

This table holds list of u n s i g n e d in t e g e r values Dword typ e These integers represent

dierent meanings in dierent contexts The integers are referred to by an in d e x into

the integer table The rst entry always stored in this table contains The in d e x

refers to the starting entry and not the starting oset The oset can b e found by

multiplying the index and the the size of Dword

B PrexAttributeDenition Table

This table holds various a t t r i b u t e denitions in prex notation All attributes ex

cept the sy n t a x and im a g e are converted into the prex notation and stored in this

table Each item of the prex expression is stored in the following record of typ e

Record Tuple

typ edef struct f

Word ty p

SDword v a l u e

g Tuple Record

t y p This eld holds an integer value to indicate the typ e of tuple ie

an o p e r a t o r t u p l e or o p e r a n d t u p l e If tuple is of o p e r a n d ty p e

then this eld also enco des the typ e of op erand

v a l u e This eld holds a integer value which will b e interpreted accord

ing to the value of t y p eld

An attribute denition is stored in the andrule table and in the memory table

with the starting index into the prefixattributedefinition table and the num

b er of items in the prex notation of the denition Table shows the p ossible values

of ty p eld and corresp onding interpretation of v a l u e eld If the t y p eld holds the

value then the tuple is o p e r a t o r t u p l e otherwise the tuple is o p e r a n d t u p l e If the

tuple is of o p e r a t o r t y p e then v a l u e eld holds an integer which indicates op erator

name and arity Table shows all p ossible values for this eld and corresp onding

arity of the op erator

Typ e of the tuple t y p eld v a l u e eld

Op erator op erator numb er see table

Fixed constant signed integer value of

op erand

Card constant unsigned integer value of

op erand

Binary constant Oset into the string table

Hex constant Oset into the string table

String constant Oset into the string table

Memory variable key of the identier as assigned in

the memory table

Attribute typ e key of the attribute name as as

signed in the attribute table

Parameter typ e parameter numb er left most is

assigned numb er

Resource typ e key of the resource name as as

signed in the resource table

Exception typ e Key of the identier as assigned

in the identifier table

Table Interpretation of the tuple used in Prex Notation

There are as many op erands available as needed for an op erator Since the arity for

an op erator is xed the numb er of arguments is implicit For example an expression

P C P C is P C P C in prex notation and it has items The rst item is

an op erator Second is a m e m o r y v a r i a b l e with value eld b eing the index into the

memory table Third item is again an op erator The last eld is a x ed co n stan t

v a l u e Name of Op erator Symb ol Arity of Op erator

Addition Binary

Subtraction Binary

Multiplication Binary

Division Binary

MOD Binary

EXP Binary

Greator than Binary

Less than Binary

Equal to Binary

Not equal to Binary

GEQ Binary

LEQ Binary

Logical AND Binary

Logical OR j Binary

Logical XOR Binary

AND Binary

OR jj Binary

Left Shift Binary

Right Shift Binary

Rotate Left Binary

Rotate Right Binary

Dot Binary

Concatenation Binary

Indexing Binary

Assignment Binary

Statement Separator Binary

Unary Addition Unary

UNOT OPERATOR Unary

Unary Subtraction Unary

Bitwise NOT Unary

Bit Range Ternary

IF if then else Ternary

Function canonical function nary

Switch switch nary

default Expression default ary

NULL nothing ary

Hash Binary

Comma Binary

Condition fg Unary

Colon Binary

Table Op erators Used in Prex Attribute Denition

For detailed description of each op erator read the SimnML sp ecication given

in A p p e n d i x A There are some sp ecial cases which are describ ed here

The rst case is for Bit Range op erator which has the inx notation as

o p d o p d o p d

Equivalent prex notation used is as follows

o p e r a t o r b itr an g eop er ator o p d o p d o p d

The second case is for if then else If there is no op erand in e l s e part then

NULL op erator ary see table is b eing used

The third case is when there is a no attribute expression for an attribute We

have used NULL op erator to denote it

The fourth case is that of a switch op erator General inx notation for this is

switch expr

case Expr Sequence

case Expr Sequence

default Sequencei

case Exprn Sequencen

The corresp onding prex notation is as follows

operator switch

n expr

Expr Sequence

Expr Sequence

DEFAULT OPERATOR Sequencei

Exprn Sequencen

The rst item is an op erator with op erator name as switch Then next item is

a simple op erand tuple of Card constant typ e and value as n After that expr

will b e again written in prex notation It will b e followed by nop erands where

each op erand is an expression in prex notation and sequence of statements in

prex notation Default op erator is a ary op erator so it can b e taken as a

prex expression

The fth case is that of a canonical function General notation for this is as

follows

function name A r g A r g A r g A r g n

where each argument is again an expression The corresp onding prex notation

is as follows

operator function

length of name function name string

n Arg ArgArgn

The rst item is a function op erator Second tuple is a string constant typ e typ

String constant value byte oset into the string table where function name

is written Next item is a simple op erand tuple with typ as Card constant and

value as n Then each argument is represented in prex notation

There is one sp ecial case with function op erator where the function name is

co e r c e This function takes rst argument as a data typ e In the IR we con

vert data typ es to the basic data typ es and represent them using three numb ers

d a t a ty p e v a l u e and v a l u e as describ ed in table Thus the data typ e param

eter for the co e r c e function is converted to three integers internally Therefore

we have two extra parameters for this function Thus numb er of parameters

are increased by two

App endix C

Users Manual

In this thesis we develop ed two to ols IRGenerator and disassembler Both the to ols

have a command line interface that is conventional for the utilitiescommands in a

Unix system If a to ol is run without any arguments then it displays a small help

giving all the options

C IRGenerator

The IRGenerator is used to translate SimnML sp ecication into the IR It is avail

able as a command called irg

C Usage

Use irg d h w o irfilename SimnMLinputfile

d To get info in debugtmp

h to get this message

w to get warning messages Default no warning

o irfilename IR will be in file irfilename otherwise

default file name is IR

Descriptions for all options are as follows

d This is an optional argument It makes available a lot of debugging informa

tion in the debugtmp le in the current directory By default no debugging

information is generated

h This is an optional argument If this option is sp ecied then a small help

message is generated and all other arguments are ignored

w This option is used to see the warning messages These message are gener

ated while translating the SimnML sp ecication le into the IR By default no

warning messages are displayed These messages provide the information which

might b e useful for the sp ecication writers For example a c t i o n attribute for

all a n d r u l e s must b e sp ecied If there is an a n d r u l e with no a c t i o n attribute

denition then a warning is displayed Sometime this may b e the intention of

the user while sometime this may b e an error

l e n a m e This option is used to set the output le name By default o ir

output le named IR is created in the current directory

S i m n M L in p u t l e This argument must always b e present which represents

the input SimnML le without m a c r o s

If some errors o ccur in the translation then appropriate error messages are displayed

and to ol exits

C Disassembler

The disassembler is used to translate a relo catable binary co de to its assembly lan

guage counterpart The input binary le must b e in the ELF format The disassem

bler also requires a pro cessor sp ecication in the IR The disassembler is available as

a command disa

C Usage

Use disa d h w o outputfilename i irinputfile

c configfile objfilename

d To get debug info in debugtmp

h to get this message

w to get warning messages Default no warning

o outputfilename output assembly language file name

otherwise default file name is outfiles

i irfilename input file having IR of processor specification

otherwise default file name is IR

c configfile input file having various arguments for Disassembler

otherwise default file name is CONFIG

objfilename input relocatable ELF file to be disassembled

Descriptions for all options are as follows

d This is an optional argument It makes available a lot of debugging informa

tion in the debugtmp le in the current directory By default no debugging

information is generated

h This is an optional argument If this option is sp ecied then a small help

message is generated and all other arguments are ignored

w This option is used to see the warning messages These message are

generated while disassembling the binary le By default no warning messages

are displayed

l e n a m e This option is used to name the output le containing the o o u t p u t

assembly language program By default output le is named outles and is

created in the current directory

l e n a m e This option is used to name the pro cessor sp ecication le in i ir

the IR By default input le named IR is used in the current directory

c co n g l e This option is used to name the conguration le This le

contains the list of identiers corresp onding to control transfer instructions

By default input le named CONFIG in the current directory is used as

conguration le The format of each line in the conguration le is as follows

identifiertype identifiername

Each line starts with followed by a typ e name This typ e name provides

the typ e of identier followed and can b e one of the following

BRANCH UNCOND

BRANCH COND

CALL UNCOND

CALL COND

RETURN UNCOND

RETURN COND

After the id e n t i e r t y p e name of an identier is followed Basically each iden

tier name corresp onds to the category of control transfer instructions given by

1

id e n t i e r ty p e and represents a su b t r e e for its class in the sp ecication le

The co n g l e also holds the names of identiers which are used as program

counter in the pro cessor sp ecication Format for sp ecifying this information is

similar except that a list of identier names can b e given separated by commas

t y p e eld will have the value PC CLASS All these lines can b e in The id e n t i e r

any order An example le is given in gure for more clarity Here CIA and

CLASS registers used in the sp ecication le NIA are the names of the PC

PCCLASS CIA NIA

BRANCHUNCOND branchuncond

BRANCHCOND branchcond

CALLUNCOND calluncond

CALLCOND callcond

RETURNUNCOND brancondlr

RETURNCOND retcond

Figure Example of the Conguration File

l e n a m e This mandatory option provide the ELF binary le name to b e o b j

disassembled

1

Refer to section

Bibliography

Freerick M The nML Machine Description Formalism July

d o c s n m l d v i g z http w w w cstu berlin d e m f x d v i

Guilfanov I IDAPro The MultiPro cessor MultiOS Interactive Disassem

bler httpwwwdatarescuecomidahtm

Maturana James and Jeffery A Cycle Accurate Mo del of UltraSPARC

In Proceed in g s o f t h e In t e r n a t i o n a l C o n f e r e n c e o n C o m p u t e r D e s i g n I C C D

httpwwwcomputerorgconferenpro ceediccdabstracthtm

Moona R and VRajesh Pro cessor Mo deling for HardwareSoftware Co de

sign In t e r n a t i o n a l C o n f o n VLSI D e s i g n Jan

Mslee Pro cessor Mo deling and Verication May

verhtml httpcamarskaistackrmsleeabstractmo d

Raksey N and Fernandez Sp ecifying Representations of Machine In

structions ACM Tra n s a c t i o n o n Progra m m in g La n g a u g e s a n d S y s t e m s

May httpwwwcsvirginiaedunrpubssp ecifyingabstracthtml

httpwwwcsvirginiaedunrto olkitexamplessparcsparcdishtml

Ramsey N and Cifuentes C SPARC Disassembler

httpwwwcsvirginiaedunrto olkitexamplessparcsparcdishtml

Rose Steeves and Carpenter VHDL Performance Mo delling Aug

httpwwwhtchoneywellcompro jectsrasspRASSPconf html

Pro cessor Mo dels References to VHDL To ols httprasspscraorg

vhdlmo delsmo dsl quick indexhtml

Shen J P and Noonburg D B A Framework for Sta

tistical Mo deling of Sup erscalar Pro cessor Performance T h i r d IE E E

S y m p o s i u m o n High P erform an ce C o m p u t e r A rc h i t e c t u r e H P C A

httpwwwfo olabscomderekn

Trimaran An Infrastructure for Research in InstructionLevel Parallelism Sep

httpwwwtrimaranorg

MDES Manual httpwwwtrimaranorgdo csmdes manualp df

Trung A D and John Paul S VMW A VisualizationBased Microar

chitecture Workb ench IE E E C o m p u t e r Dec

VRajesh A Generic Approach to Performance Mo deling and its Application

to Simulator Generator Masters thesis Department of Computer Science and

Eng I IT Kanpur July http w w w cseiitk acin u sersv rajesh sim n m l

Wolf and Yen Performance Estimation for Real Time Dis

tributed Emb edded Systems In Proceed in g s o f t h e In

te r n a t i o n a l C o n f e r e n c e o n C o m p u t e r D e s i g n I C C D

httpwwwcomputerorgconferenpro ceediccdabstracthtm

Y Subhash C MTech Thesis Work yet to b e submitted Masters thesis

Department of Computer Science and Eng I IT Kanpur April

Disassembling References to various Disassembler httpwwwituqeduau

GROUPScsmdecompilation disasmhtml MENURESEARCH

UNIX S y s t e m V R e l e a s e Prog ra m m ers G u i d e ANSI C a n d Prog ra m m in g

S u p p o r t To o l s

P o w e r P C RISC M ic ro p roce ss o r U s e r s M a n u a l

GNU Assembler Manual httpwwwfreebsdorginfoasallasallinfo

Manualhtml