Namespaces, Source Code Analysis, and Byte Code Compilation UseR! 2004
Luke Tierney
Department of Statistics & Actuarial Science
University of Iowa

Introduction
• R is a powerful, high-level language.
• As R is used for larger programs there is a need for tools to
  – help make code more reliable and robust
  – help improve performance
• This talk outlines three approaches:
  – name space management
  – code analysis tools
  – byte code compilation

Why Name Spaces
Two issues:
• static binding of globals
• hiding internal functions
Common solution: name space management tools.

Static Binding of Globals
• R functions usually use other functions and variables:
    f <- function(z) 1/sqrt(2 * pi) * exp(-z^2 / 2)
• Intent: exp, sqrt, pi from base.
• Dynamic global environment: definitions in base can be masked.

Hiding Internal Functions
Some useful programming guidelines:
• Build more complex functions from simpler ones.
• Create and (re)use functional building blocks.
• A function too large to fit in an editor window may be too complex.
Problem: All package variables are globally visible.
• Lots of little functions means clutter for user.
• Lots of functions means name conflicts more likely.
• Consequence: often use big functions with repeated code.

Name Spaces for Packages
Starting with R 1.7.0 a package can have a name space:
• Only explicitly exported variables are visible when attached or imported.
• Variables needed from other packages can be imported.
• Imported packages are loaded; may not be attached.
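The masking problem described under Static Binding of Globals can be reproduced in a few lines. This is a minimal sketch: f is the function from the slide, while before, after, and restored are names introduced here only for illustration.

```r
# Standard normal density, as defined on the slide.
f <- function(z) 1/sqrt(2 * pi) * exp(-z^2 / 2)

before <- f(0)   # pi is found in base: about 0.3989

# A definition outside base (here in the calling environment) masks base's pi.
pi <- 3
after <- f(0)    # f silently picks up the masking value: about 0.4082

# Removing the mask makes base's pi visible again.
rm(pi)
restored <- f(0)
```

Functions defined inside a package name space are not affected in this way, because their globals are resolved through the name space, its imports, and base before the global environment is searched.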
Name Spaces for Packages (cont.)
Adding a name space to a package involves:
• Adding a NAMESPACE file
• Replacing require calls by import directives.
• Replacing .First.lib by .onLoad (and maybe .onAttach).

NAMESPACE File Directives
• export
    export(as.stepfun, ecdf, is.stepfun, stepfun)
• exportPattern
    exportPattern("\\.test$")
• import
    import(mva)
• importFrom
    importFrom(stepfun, as.stepfun)

NAMESPACE File Directives (cont.)
• useDynLib
    useDynLib(stats)
• S3method
    S3method(print, dendrogram)

NAMESPACE File Directives (cont.)
• exportClass, exportClasses
    exportClasses(mle, profile.mle, summary.mle)
• exportMethods
    exportMethods(AIC, BIC, coef, confint, logLik, ...)
• importClassFrom
• importMethods

Name Spaces and Method Dispatch
• S3 dispatch is based on combining generic and class name.
  – no hope of private classes
• Looked up in environment where generic is called.
• Problem: if a package is imported but not attached its methods may not be visible at the call site.

Name Spaces and Method Dispatch (cont.)
• One solution: register S3 methods with the generic.
  – methods are always available to the generic.
  – methods need not be exported
    ∗ enforces calling methods only via generic.
    ∗ simplifies author/maintainer's task
• Name space integration is conceptually simpler for S4 classes, methods, and generic functions.
• The current implementation is evolving and may become simpler.

Name Space Odds and Ends
• Name spaces are sealed.
  – cannot add internal variables, imports, exports
  – cannot change values by assignment
  – simplifies implementation
  – helps with byte code compilation
• Exports can be accessed by "fully qualified name", e.g. stats::ppr.
• Internal variables can be accessed using a triple colon, e.g. stats:::vcov.coxph

Source Code Analysis
• R provides a powerful infrastructure for managing test code.
• Test code alone cannot cover all possible execution paths.
• Source code analysis provides a complementary approach.
• Source code analysis examines code for possibly erroneous constructs:
  – using variables that are not defined
  – calling functions with the wrong number of arguments
  – calling functions with incorrect argument types
• Name spaces help to make checks more precise.

Some Source Code Analysis Tools
• The package codetools provides some experimental tools:
  – checkUsage checks individual functions.
  – checkUsagePackage checks a loaded package.
• It is likely that these will eventually be merged into the tools package.

An Artificial Example
    g <- function(x, y = TRUE) {
        exp <- y
        w <- x
        y <- x
        if (exp) exp(x+3) + ext(z-3)
        else log(x, bace=2)
    }
Results of a code analysis:
    > checkUsage(g, name="g")
    g: no visible global function definition for 'ext'
    g: no visible binding for global variable 'z'
    g: possible error in log(x, bace = 2): unused argument(s) (bace ...)
    g: local variable 'w' assigned but may not be used

An Artificial Example (cont.)
A More Sensitive Analysis:
    > checkUsage(g, name="g", all=TRUE)
    g: local variable 'exp' may shadow global value
    g: no visible global function definition for 'ext'
    g: no visible binding for global variable 'z'
    g: possible error in log(x, bace = 2): unused argument(s) (bace ...)
    g: local variable 'exp' used as function with no apparent local function definition
    g: local variable 'w' assigned but may not be used
    g: parameter 'y' changed by assignment

Issues and Tradeoffs
• Finding the right sensitivity is challenging
  – high sensitivity causes too many false positives
  – low sensitivity misses too many real errors
• Current approach allows tuning by various parameters
• Another option is to attempt to prioritize messages
  – has had some successes on Linux kernel code

Issues and Tradeoffs (cont.)
• More sophisticated checks are needed
• Are user-supplied checks possible?
• Inferred or estimated type information may help
• Intra-procedural analysis may also help
• Partial evaluation may be useful
• May also be able to detect possible inefficiencies
• Source annotation mechanisms may help

Profiling
• Rprof takes snapshots of call stack
• summaryRprof reports cumulative time in each function.
• Tools to show more detail may help.
• One example: call graph color coded by total time in function.
• Package proftools contains some first steps.
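The Rprof and summaryRprof workflow from the Profiling slide can be tried on a small workload. This is only a sketch: slow_density and prof_file are hypothetical names chosen for the example, and it assumes an R build with profiling support.

```r
# Hypothetical workload: evaluates the normal density by hand, many times.
slow_density <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sum(exp(-0.5 * rnorm(1000)^2) / sqrt(2 * pi))
  }
  total
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)   # start taking snapshots of the call stack
# Keep working for about one second so that several samples are recorded.
start <- proc.time()[["elapsed"]]
while (proc.time()[["elapsed"]] - start < 1) {
  slow_density(200)
}
Rprof(NULL)        # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)   # cumulative time per function
head(prof$by.self)    # time spent in each function itself
```

The by.total table is the one the slide refers to: it attributes to each function the time spent in it and in everything it calls.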
Byte Code Compilation
• Compilation can improve efficiency:
  – user code will run faster
  – less native system code needed
• Developing a compiler can clarify the language:
  – features that are hard to compile are hard to understand
• Code analysis is closely related to compilation
  – code analysis for compilation is more conservative
  – code analysis for correctness is more speculative

A Simple Example
• Simplified normal density function:
    f <- function(x, mu=0, sigma=1)
        (1/sqrt(2 * pi)) * exp(-0.5 * ((x - mu)/sigma)^2) / sigma
• Compiled with
    fc <- cmpfun(f)

Generated Code
• Byte code and assembly code for a stack machine:

    Byte code   Assembly code       Comment
    16 1        LDCONST 0.398942    push $1/\sqrt{2\pi}$
    16 2        LDCONST -0.5        push constant $-0.5$
    20 3        GETVAR x            get, push x
    20 4        GETVAR mu           get, push mu
    45          SUB                 subtract
    20 5        GETVAR sigma        get, push sigma
    47          DIV                 divide
    16 6        LDCONST 2           push 2
    48          EXPT                pop x, y, push $x^y$
    46          MUL                 multiply
    50          EXP                 pop x, push $e^x$
    46          MUL                 multiply
    20 5        GETVAR sigma        get, push sigma
    47          DIV                 divide
    1           RETURN              pop, return value

Generated Code (cont.)
• The compiler
  – folds constant expressions like $1/\sqrt{2\pi}$
  – inlines basic arithmetic functions
• Some timings for 1,000,000 repetitions:

    Function   x = 1    x = seq(0,3,len=5)
    f          14.62    17.75
    fc          3.95     7.67
    dnorm       4.59     7.24

• Most improvement comes from constant folding.
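The comparison of f, fc, and dnorm can be repeated in a current R session. This is a sketch under stated assumptions, not the setup used for the talk: it relies on the compiler package that now ships with R rather than the experimental compiler of 2004, it uses fewer repetitions (the reps value is chosen here), and it switches off the JIT compiler, which R 3.4.0 and later enable by default, so that f really runs uncompiled.

```r
library(compiler)

# Turn the JIT off for the comparison and remember the previous level.
old_jit <- enableJIT(0)

# Simplified normal density from the slide.
f <- function(x, mu = 0, sigma = 1)
  (1/sqrt(2 * pi)) * exp(-0.5 * ((x - mu)/sigma)^2) / sigma

fc <- cmpfun(f)   # byte-compiled version of f

x <- seq(0, 3, length.out = 5)

# The three versions should agree up to floating point error.
same_compiled <- isTRUE(all.equal(f(x), fc(x)))
same_dnorm    <- isTRUE(all.equal(fc(x), dnorm(x)))

# Rough timings; the absolute numbers depend on the machine and R version.
reps <- 100000
t_f     <- system.time(for (i in seq_len(reps)) f(x))
t_fc    <- system.time(for (i in seq_len(reps)) fc(x))
t_dnorm <- system.time(for (i in seq_len(reps)) dnorm(x))
print(rbind(f = t_f, fc = t_fc, dnorm = t_dnorm)[, "elapsed"])

# Restore the previous JIT setting.
enableJIT(old_jit)
```

The timings will not match the table above; only the ordering of f against fc is likely to be comparable.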
The Virtual Machine
• byte code instruction set
• stack architecture
• similar approach to Python, Perl, many Scheme systems
• also related to JVM, .NET
• Alternatives:
  – threaded code (using GCC extensions)
  – generate C code
  – generate JVM, .NET code

Compiler Operation
• Optimizations:
  – constant folding
  – special opcodes for most SPECIALs, many BUILTINs
  – inlines simple .Internal calls: dnorm(y, 2, 3) is replaced by
        .Internal(dnorm(y, mean = 2, sd = 3, log = FALSE))
  – special opcodes for many .Internals
• Compiler currently uses a single recursive pass:
  – fast, but limits optimizations.

Timings and Performance