
Introspection via Self Debugging

Russell Harmon
[email protected]
Rochester Institute of Technology
Computer Science
December 12, 2013

1 Introduction

The omnipresent support for introspection in modern programming languages indicates the usefulness of the tool [4, 5, 19, 21]. Unfortunately, C, which is one of the most pervasive programming languages and the foundation of nearly every modern operating system, does not support introspection. By leveraging an existing debugger, an API entitled Ruminate brings introspection to the programming language C.

Debuggers have long had access to the type information which is needed for introspection. On most UNIX platforms, this is accomplished by the debugger reading any debugging symbols which may be present in the target binary, and inspecting the state of the target program. These debugging symbols are not present by default and the compiler must be instructed to add the symbols at compile time. These techniques are leveraged to gain the information which is needed to create an introspection API and, building on that, an API which can convert arbitrary types to and from JSON.

2 Motivation

One of the motivating factors for any language introducing introspection as a feature is the following use case:

You are tasked with changing the save game format of a popular 1980s style terminal based game from a binary format, composed of writing the structs which compose the game state to disk, to a more flexible JSON format. After investigation, you discover that in order to do this, you can use the Jansson [17] C library to produce JSON. In order to do so, you invoke variants of the json_object_set function as given by the following prototype:

    int json_object_set(
        json_t *object,
        const char *key,
        json_t *value
    );

You observe that json_object_set takes as parameters the name and value of the field to be written, necessitating the writing of a separate json_object_set call for every field of every aggregate type.
After considering the literally thousands of fields across the nearly three hundred structs in the game, you give up in frustration.

If a programmer were able to introspect types in C, they could write a generalized JSON conversion function which could determine the name of every aggregate type and aggregate member procedurally, thereby significantly shortening the amount of code needed. A programmer could also use an introspective library for creation of platform independent binary structure representations for use in network communication. Clearly, it is a significant convenience to developers to be able to write code which is able to introspect upon data in a meta-programming style.

3 Introspection in Current Programming Languages

Introspection is found in many of the programming languages commonly used today including Java [19], Ruby [4], Python [21], Perl [5] and a limited form of introspection in C++ [29]. The various approaches to introspection differ in implementation details; some receive introspection as a direct consequence of the way they implement objects while others provide it as part of the standard library. Despite this, they all provide approximately the same set of features. It is by these features that introspection can be defined, rather than the details of how the features are implemented.

Introspection implementations generally provide several different forms of introspection. A common form of introspection provided is type introspection. Specifically, a program leveraging type introspection is able to inspect the types of identifiers or values used in the program. Another form of introspection is function introspection. This form of introspection allows programs to retrieve information about functions which is not part of the type system, such as the function's name or the names of its arguments. Finally, a third form of introspection is stack introspection. This allows a program to retrieve a list of stack frames at a given point in a program's execution, commonly referred to as a stack trace or backtrace.
Existing attempts to add introspection to C or C++ frequently require a separate description of the object to be implemented, which is generated using a separate parser [23], a complementary metadata object [2], or require specific code to be written that describes the type. All of these introspection implementations have the limitation that objects which come from external libraries cannot be introspected. Ruminate neither has this library boundary limitation nor requires external compile-time tools or hand written object descriptions in order to operate. Instead, Ruminate requires only that the library or executable to introspect contain debugging symbols.

4 Debugging in C

There already exist a number of tools for interactive debugging of C programs. Some of the more well known ones include GDB [9], WinDBG, Visual Studio's debugger and LLDB [25]. Traditionally, these debuggers have been used interactively via the command line, whereas more recently debuggers such as the one embedded within Visual Studio integrate into an IDE. An understanding of debugging in general, and of LLDB specifically, is crucial to the understanding of this document, so some time will be spent explaining debugging.

Conceptually, a debugger is composed of two major components, a symbol parser and a process controller. Among other types of symbols in a binary, Linux usually uses DWARF [6] debugging symbols. These debugging symbols are intended for a debugger to parse and inform the debugger about some information which is not available otherwise from inspection of the compiled binary. This information includes the source file name(s), line number to compiled instruction mappings and type information. Interactive debugging using a debugger is possible without debugging symbols, but difficult.
The other major piece of a debugger is the ability to control another process. This is necessary in order for the debugger to inspect or modify a debuggee's runtime state, set breakpoints or watchpoints and intercept signals. In order to accomplish this, specific support must exist in the kernel which is hosting the process to be controlled. Across the various modern platforms, there exist several different implementations enabling one process to control another. On Linux, the API for process control is ptrace(2) [20].

An important aspect of the type information which is available to a debugger is that this information is almost entirely static. For instance, during an interactive debugging session when printing a variable the debugger knows only the type of the variable being displayed, rather than the type of the data itself. This is in stark contrast with other introspective languages where the type information is carried with the data and can be recovered without any additional context. An example of the result of this under LLDB is shown in Fig. 1. Notice that even though the value of baz is the string "Hello World!", because the type of baz is void *, LLDB is unable to deduce the type.

4.1 LLDB

LLDB [25] is a debugger built on the LLVM [27] framework. Designed to be used as a library, it vends a public C++ API which is promised to be relatively stable, and has bindings to Python in which the LLDB authors have written its unit test suite [24].

Figure 2 shows a simple debugging session using LLDB. In it, a test program is launched and the value of a stack-local variable is printed. Take note that LLDB is aware that the type of foo.bar is char *. In fact, regardless of the language, most debuggers make available to their users a non-strict subset of the type information which is available to the programmer writing the original source file.

Under LLDB's public API, a type is represented by an SBType [18].
In order to get an instance of SBType, you can either retrieve the type by name, or retrieve the type of a variable by that variable's name with the debuggee stopped at a breakpoint. Once that is accomplished, an SBType can give you much of the static type information about that variable which exists in the target's debugging symbols. When an operation is performed on an SBType, LLDB lazily retrieves the type information needed to service that operation. Building on clang [3], LLDB uses the debugging symbols to generate a partial clang AST. This AST is then retained for future inspection of that type.

    Process 12066 stopped
    * thread #1: tid = 0x1c03, 0x0000000100000f64 a.out`main + 20 at a.c:3
        frame #0: 0x0000000100000f64 a.out`main + 20 at a.c:3
       1    int main() {
       2        void *baz = "Hello World!";
    -> 3    }
    (lldb) print baz
    (void *) $0 = 0x0000000100000f66

Figure 1: Static Type Information in Debuggers

    Current executable set to './a.out' (x86_64).
    (lldb) breakpoint set -n main
    Breakpoint created: 1: name = 'main', locations = 1
    (lldb) run
    Process 10103 launched: './a.out' (x86_64)
    Process 10103 stopped
    * thread #1: tid = 0x1c03, 0x0000000100000f60 a.out`main + 16 at a.c:6
        frame #0: 0x0000000100000f60 a.out`main + 16 at a.c:6
       3    };
       4    int main() {
       5        struct foo foo;
    -> 6        foo.bar = "Hello World!";
       7    }
    (lldb) next
    Process 10103 stopped
    * thread #1: tid = 0x1c03, 0x0000000100000f64 a.out`main + 20 at a.c:7
        frame #0: 0x0000000100000f64 a.out`main + 20 at a.c:7
       4    int main() {
       5        struct foo foo;
       6        foo.bar = "Hello World!";
    -> 7    }
    (lldb) print foo.bar
    (char *) $0 = 0x0000000100000f66 "Hello World!"

Figure 2: Interactive Debugging with LLDB

5 Related Work

A System for Runtime Type Introspection in C++ [2] discusses an approach to introspection for C++ whereby metadata objects are created using macros which are expected to be called at the definition of the object which is to be introspected.
The Seal C++ Reflection System [23] discusses an introspection system for C++ which uses a metadata generation tool to create descriptor files which contain the information needed for introspection.