Towards the Future Internet
G. Tselentis et al. (Eds.)
IOS Press, 2010
© 2010 The authors and IOS Press. All rights reserved.
doi:10.3233/978-1-60750-539-6-41

Towards Security Climate Forecasts¹

Stephan NEUHAUS² and Fabio MASSACCI, Università degli Studi di Trento, Trento, Italy

Abstract. The complexity and interdependencies of deployed software systems have grown to the point where we can no longer make confident predictions about the security properties of those systems from first principles alone. It is also very difficult to state correctly all the relevant assumptions that underlie proofs of security, and attackers constantly seek to undermine exactly these assumptions. Complexity metrics generally do not correlate with vulnerabilities, and security best practices are usually founded on anecdotes rather than empirical data. In this paper, we argue that we therefore need to embrace empirical methods from other sciences that face the same problem, such as physics, meteorology, or medicine. Building on previous promising work, we suggest a system that can deliver security forecasts just like climate forecasts.

Keywords. Large system security, empirical research, climate forecasts

Introduction

Software systems have become so complex that we can no longer effectively understand the behaviour of the entire system from the behaviour of its smallest parts alone: in order to find out about the security of a complex system, it is no longer sufficient to know the precise semantics of an if or while statement. This is not because such knowledge is useless, but rather because the amount of interaction between program statements, interpreters, compilers, libraries, operating systems and the run-time environment creates an explosion in interconnection that makes predicting system behaviour uncertain at best and impossible at worst. This situation is worsening as we progress towards the Future Internet, where applications will consist of large numbers of interacting components, written by different organisations according to different development standards.

Physicists face a similar problem when asked to describe the behaviour of an ideal gas: from Newtonian mechanics, it should be possible to compute the temperature or pressure of a gas by calculating the movements of individual gas molecules, but in practice there are so many that this is impossible. Thus, high-level gas laws describing the behaviour of a gas as a whole were invented. In a similar way, medical researchers and meteorologists use models based on differential equations to forecast the spread of infectious diseases and to make climate forecasts, respectively.

We propose to use a similar approach, where we no longer infer the macroscopic behaviour of a system from properties of its microscopic constituents such as library components, programming language statements or even machine instructions, but where we make statistical predictions about the system's behaviour from simpler, higher-level metrics.

Note that our point is not that it is only complexity, as understood by software engineers and expressed in complexity metrics, that is causing the inability to predict system behaviour. If that were the case, adding more computing power would make predictions possible again. Rather, it is that the causes and effects of vulnerabilities are non-local (i.e., they can occur at widely separated points in the same source code, in different source codes, between code in one system and code in a library, in the interaction between some code and its execution environment, etc.) and are hence not accessible to source code complexity metrics. Vulnerability is a systemic issue.

In the rest of this paper, we first present case studies of two very large and widely deployed software systems (Section 1) in order to make the point that the sheer size of software makes it important to make reasonably precise predictions about the location of vulnerabilities: otherwise, quality assurance teams won't know where to look. Next, we briefly review the state of the art in software security (Section 2), and then propose our solution to the problems that we identified (Section 3). We finish with the outline of a research program that could address these points, and conclude (Section 4).

¹ Research supported by the EU under the project EU-IST-IP-MASTER (FP7-216917).
² Corresponding author.

1. Case Studies

1.1. Mozilla

Mozilla³ is a large software system: as of 4 January 2007, Mozilla contained 1,799 directories and 13,111 C/C++ files, which can be combined into 10,452 components, where a component comprises all files whose names differ only in their suffixes. These files have a large degree of interconnection: there are over ten thousand unique #include statements and over ninety thousand unique function calls. At the same time, the Mozilla Foundation has published 134 Mozilla Foundation Security Advisories (MFSAs), which caused 302 bug reports. Of all 10,452 components, 424, or 4.05%, were found to be vulnerable [16].

Studies by Shin and Williams [19] have looked at a particularly vulnerability-ridden part of Mozilla: the JavaScript engine. JavaScript is a way to execute code from within Web pages; it is executed inside a sandbox, which regulates all access attempts by the JavaScript program to the outside world. Still, attacks using JavaScript have shown a remarkable ability to break out of their sandboxes into the execution environment. Shin and Williams have looked at the correlation between code complexity metrics and vulnerabilities inside Mozilla's JavaScript engine and found only low correlations; such correlations would not be enough to predict which components inside the JavaScript engine had as yet undiscovered vulnerabilities. This supports our point, since code complexity metrics operate under the assumption that vulnerabilities are local phenomena; after all, code complexity metrics are, by definition, computed at the source code level.

Let us now put ourselves in the shoes of a Mozilla QA engineer. The new release is days away and we have the suspicion that there are components with undetected vulnerabilities. Yet, even with a large staff, we cannot hope to inspect all 10,000 components comprehensively: we have to make a choice, since our QA resources are finite. Out of

³ http://www.mozilla.org/

Figure 1. Location of vulnerabilities in Mozilla source code.

those 10,000 components we can perhaps choose ten for a thorough inspection. Which ten are going to be the lucky ones? Going only by personal experience, we would probably tend to use the anecdotal knowledge floating around in our QA team and look closely at those modules that already have reported vulnerabilities and are therefore known troublemakers. How good would such an approach be?

In our own study of Mozilla [16], we first mapped Mozilla Foundation Security Advisories (MFSAs) back to those source files that were fixed as a consequence of these MFSAs. From this purely empirical approach, we got two main results. The first is shown in Figure 1, which contains a graphical depiction of the distribution of vulnerabilities in Mozilla's source code. There, named rectangles represent directories and unnamed rectangles represent components. The size of a rectangle is proportional to the size of the component in bytes, and a rectangle is shaded darker the more vulnerabilities it has had. The JavaScript engine, the object of Shin and Williams's study above, occupies the lower left-hand corner of the picture, and it is apparent that JavaScript is indeed vulnerability-ridden. But what can also be seen from the picture is that there are subsystems that are almost as bad, such as the “layout” subsystem (above the JavaScript rectangle), which is concerned with cascading style sheets and the like. Overall, the distribution of vulnerabilities is very uneven: there are also large parts that have no vulnerabilities at all.

The second result is shown in Figure 2 (left). In this figure, we counted how many MFSAs applied to a vulnerable component. The most striking feature is that there are more than twice as many components with one fix as there are components with two or more fixes. This is apparent from the tall spike at the left and the subsequent series of bars that decrease so rapidly in height that there is no discernible height difference between 6 and 14 MFSAs. To return to our QA perspective, components with many vulnerabilities are rare, whereas components with only one vulnerability are common. Therefore, if we looked only at components with an established history of vulnerabilities, we would waste effort: past history of vulnerabilities is not a good predictor of future vulnerabilities.
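To make the mapping step concrete, the following sketch (in Python) shows how advisory-to-file fix data of the kind described above could be aggregated into per-component vulnerability counts. The advisory identifiers and file names are invented for illustration; they are not taken from the actual Mozilla data set, and the tooling used in the study [16] may of course differ.

```python
# Sketch: aggregate advisory fix data into per-component vulnerability counts.
# A component groups all files whose names differ only in their suffix, so
# jsscope.cpp and jsscope.h both belong to the component js/src/jsscope.
# The advisory/file pairs below are invented.
import os
from collections import Counter

fixes = [
    ("MFSA-A", "js/src/jsscope.cpp"),
    ("MFSA-A", "js/src/jsscope.h"),
    ("MFSA-B", "js/src/jsscope.cpp"),
    ("MFSA-B", "layout/base/nsFrameManager.cpp"),
]

def component(path):
    base, _suffix = os.path.splitext(path)
    return base

advisories_per_component = {}
for advisory, path in fixes:
    advisories_per_component.setdefault(component(path), set()).add(advisory)

for comp, advisories in sorted(advisories_per_component.items()):
    print(f"{comp}: {len(advisories)} advisories")

# How many components have exactly n advisories? (Cf. Figure 2, left.)
print(Counter(len(a) for a in advisories_per_component.values()))
```

Run over the real fix data, an aggregation of this kind yields the distribution shown in Figure 2 (left).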
Figure 2. The number of components versus the number of MFSAs in Mozilla (left), and distribution of Red Hat Security Advisories (right).

1.2. Red Hat

We also looked at the 3,241 packages in Red Hat [15]. Out of these 3,241 packages, 1,133 packages, or 35%, had vulnerabilities reported in 1,646 Red Hat Security Advisories (RHSAs) between January 2000 and August 2008, inclusive. Again, there is no evidence that the developers of those vulnerable packages were negligent, and again, most vulnerable packages have only one or two vulnerabilities, so finding promising candidates for QA measures will again be difficult; see Figure 2 (right).

For Red Hat, the situation is even more dire than for Mozilla. Mozilla is at least developed in a single language, C++. The packages of Red Hat, however, are developed in a variety of languages, and for some of these languages, like Python, general static analysis tools do not even exist outside of prototypes [23].

2. State of the Art

The typical answer to source code security problems is usually to employ either static code analysis tools to catch defects, or formal methods, as proof assistant systems that allow rigorous verification that code conforms to its specification. Another attempt to control security problems is to check a piece of code for known classes of problems with source code complexity metrics, or with security best practices.
Static analysis tools can provide huge benefits, if properly employed: every bug they catch and which is subsequently fixed is a bug that won't make it into production. On the other hand, static analysis has false positives, and it will therefore sometimes flag practices that look insecure but which might actually be perfectly safe [4]. Combined with the anecdotally reported tendency of managers to demand “zero defects”, but actually meaning “zero complaints from the static analysis tool”, this may lead developers to “fix” perfectly good code for the sake of shutting up the static analysis tool. Another problem is that static analysis will generally only catch bugs (defined as an “implementation-level software problem”), but not flaws (defined as “a problem at a deeper level[, ...] present (or absent!) at the design level”) [10]. As static analysis makes bugs easier to find and harder to exploit, we believe that attackers will tend to go more after flaws. This trend is visible in the decline of the buffer overflow and the rise of cross-site scripting and request forgery attacks [6]; also, see below.

Let us look now at proof assistant systems. They promise freedom from bugs and flaws by first requiring a rigorous specification of the desired system behaviour, usually in the form of some kind of formal logic, and then helping to construct a proof that shows that the implementation fulfils the specification. The practical problems with proof assistants are manifold. First, they have a steep learning curve that many software engineers find impossible to scale [8, Chapter 4]. Second, they do not work on legacy systems like Microsoft Windows or Linux, since those systems do not have a formal specification and cannot retroactively be equipped with one. Third, proof assistants are very difficult to deploy for very large systems. To the authors' knowledge, only one industrial-strength system has been developed in this way, in the Verisoft project⁴, and development speed has been anecdotally reported as about one page of fully verified C code per developer per week, even though the developers were MS and PhD students who were all highly trained in formal logic. Development and verification of a string function library was so difficult as to be accepted as an MS thesis [21]. Finally, even when the system is fully specified and the implementation proven to be correct, there may still be practical attacks.

That last point is usually surprising to advocates of proof assistants, but it is in reality true of every scientific model, including the one we are proposing. Practical attacks arise not because the security proof is wrong; rather, they arise because an attacker has found a clever way to subvert the assumptions of the security proof. For example, a key recovery attack on AES succeeded because the (explicitly stated) assumption that array accesses would take constant time was found to be wrong [1]. Proof assistants can deliver on their promise of defect-free systems only if all the relevant assumptions have been stated, and stated correctly, and that is simply very difficult to do. Therefore, to paraphrase a point made by Brian Chess for static analysis tools [5], having a system certified as defect-free by a proof assistant does not mean that good design principles like defense in depth can be ignored. So one of the main benefits of employing proof assistants (guaranteed freedom from defects) is weakened.

Now, let us look at complexity metrics. As we have already said above, the one major study of the correlation between code complexity metrics and vulnerabilities [19] found very weak correlations (ρ ≤ 0.21), even for a piece of code with as many vulnerabilities as Mozilla's JavaScript subsystem.
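To make the meaning of such correlation figures concrete, the following sketch computes a Spearman rank correlation between a per-component complexity metric and a vulnerability count; the numbers are invented and SciPy is assumed to be available.

```python
# Sketch: rank correlation between an invented per-component complexity
# metric and an invented vulnerability count.
from scipy.stats import spearmanr

cyclomatic_complexity = [12, 45, 7, 88, 23, 5, 61, 19, 33, 4]
vulnerability_count   = [ 0,  2, 0,  1,  0, 0,  3,  1,  0, 0]

rho, p_value = spearmanr(cyclomatic_complexity, vulnerability_count)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
```

A correlation around 0.2, as reported for Mozilla's JavaScript engine [19], is far too weak to serve as a ranking criterion for inspection.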
The only other major study that we know of was done on five large subsystems of Microsoft Windows, to find out whether metrics correlated with defects, a more general class than vulnerabilities [13]. They found that for every subsystem there were metrics that gave weak to moderate correlations (generally, ρ < 0.5), but that there was no single metric that would perform well for all subsystems.

Finally, let us look at security best practices, which are usually coding or process guidelines. These best practices are based on anecdotal evidence that some development

practice is better or worse than another, but which are not generally founded on actual data. For example, the SANS System Administrator Security Best Practices [18] contains such blanket statements as “know more about security of the systems you are administering” (know more than what?) and “The system console should be physically protected” (which does not work on a laptop), or advice on user passwords that flatly contradicts what can be expected of users [17].

At best, best practices are derived from carefully questioning practitioners and finding out what works and what doesn't, such as the Building Security In Maturity Model (BSIMM) [11]. This approach is much more in line with what we have in mind, but it has two drawbacks. First, scientific truths cannot in general be elicited by polls and questionnaires. Second, the answers that one gets when asking people who run large-scale software security initiatives in many domains, such as finance, software and so on, will necessarily be very general and very broad: the advice in the BSIMM is more on the organisational side of things and probably not helpful when we need to decide which of the ten thousand Mozilla components to examine before the next release.

⁴ http://www.verisoft.de/

3. An Empirical Approach

As we have seen in the last section, static analysis, while useful, has a number of problems, such as high rigidity, a focus on bugs instead of flaws, and false positives. Proof assistants cannot be usefully deployed in the majority of software development projects today because of their learning curve and system size restrictions. Metrics simply fail to correlate enough to make vulnerability prediction possible, and security best practices are unfounded or very general.

To remedy this situation, we propose a two-step program. In a first step, there would have to be empirical studies, in order to get reliable information on the kinds and distribution of real vulnerabilities in real software. This addresses the problem that security best practices are usually not founded on actual data. The second step (which can in fact be done concurrently with the first) would build predictive models that predict the location of vulnerabilities, vulnerability trends, or any other interesting vulnerability metric. The models would need to be evaluated against real software. This addresses the problems of rigidity, focus, and correlation.

The question is: does this work? In other words, could high correlations actually result? What about the learning curve and the sizes of systems that can be analysed in this way? We give three examples of our own empirical work that have shown promise.

We studied Mozilla in 2007 [16], and apart from the purely empirical results we were interested in predicting which files in Mozilla would need fixing because of unknown vulnerabilities. We found that imports (#include preprocessor statements) and function calls were excellent predictors of vulnerability: show me what you import (or what functions you call), and I tell you how vulnerable you are. We used a machine learning tool to create a model that would allow us to predict how many vulnerabilities a component ought to have. We applied the model to the roughly 10,000 components that had no known vulnerabilities in January 2007 and took the top ten components. When we looked again in July 2007, about 50 of those 10,000 components had needed to be fixed because of newly discovered vulnerabilities. Our list of ten contained five of those 50; see Table 1. Random selection would on average have produced no hits.
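The following sketch shows the flavour of the “show me what you import” predictor described above, here phrased as a ranking problem with scikit-learn (assumed to be available). The component names, imports and labels are invented, and the actual study [16] used its own tooling and model, so this is an illustration of the idea rather than a reimplementation.

```python
# Sketch: rank components without known vulnerabilities by how similar their
# imports are to those of components that did have vulnerabilities.
# All component names, imports and labels below are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

train = [                                   # (imports, had_vulnerability)
    ({"jsapi.h": 1, "jsatom.h": 1}, 1),
    ({"nsIContent.h": 1, "jsapi.h": 1}, 1),
    ({"nsString.h": 1, "prtypes.h": 1}, 0),
    ({"prtypes.h": 1, "plstr.h": 1}, 0),
]
candidates = {                              # components with no known
    "dom/src/foo": {"nsIContent.h": 1, "jsapi.h": 1},   # vulnerabilities yet
    "xpcom/bar":   {"nsString.h": 1, "plstr.h": 1},
}

vec = DictVectorizer()
X = vec.fit_transform([imports for imports, _ in train])
y = [label for _, label in train]
model = SVC(kernel="linear").fit(X, y)

# Rank candidates by distance from the separating hyperplane; the top of the
# list goes to the QA team first.
scores = model.decision_function(vec.transform(list(candidates.values())))
for name, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:+.2f}")
```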

Table 1. Top ten Mozilla components that we predicted in January 2007 to have vulnerabilities. The components marked ‘•’ actually had vulnerabilities discovered in them between January and July 2007.

No. Component
 • 1  js/src/jsxdrapi
 • 2  js/src/jsscope
   3  modules/plugin/base/src/nsJSNPRuntime
   4  js/src/jsatom
   5  js/src/jsdate
   6  cck/expat/xmlparse/xmlparse
 • 7  layout/xul/base/src/nsSliderFrame
   8  dom/src/base/nsJSUtils
 • 9  layout/tables/nsTableRowFrame
 • 10 layout/base/nsFrameManager

Precision versus Recall

● SVM ● ● ● ● Decision Tree ●● ● ● ●● ● ● ●● ●● ● ●●●● ● ● ●● ● ● ●● ● ● ● ●● ●●●● ● ● ● Precision 0.4 0.5 0.6 0.7 0.8 0.9 0.4 0.5 0.6 0.7 0.8 0.9 Recall

Figure 3. Performance of two different machine learning methods when predicting vulnerable packages in Red Hat. Good predictors have high x and y values, so better methods have symbols more to the top right.

Similarly, we studied Red Hat Linux and the vulnerabilities in its packages [15]. Using techniques such as Formal Concept Analysis, we were able to identify packages with a high risk of containing unidentified vulnerabilities. We also found that we could use machine learning again to extend the results from Mozilla: show me on which packages you depend and I tell you how vulnerable you are. Results for two different machine learning methods are shown in Figure 3. In this picture, a better prediction method will have symbols that are located higher and more to the right than an inferior method. What can be seen from this picture is that different methods have different prediction power; we will return to this point later. Finally, just as with Mozilla, we used the regression model to prepare a list of packages that we predicted would develop vulnerabilities. Out of the top 25 packages, nine actually were fixed within six months of making the prediction; see Table 2. Again, random selection would have produced no hits.

Both systems are very large, yet building models was very fast: for Mozilla, building a complete regression model took less than two minutes on a MacBook. For Red Hat, building a model was even faster. Both approaches give information that is easy to understand (“Package/Source File x likely contains unknown vulnerabilities”) and that does not require one to scale a learning curve.
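In outline, a comparison of the kind shown in Figure 3 looks as follows. The data here is synthetic (a stand-in for a binary package-by-dependency matrix with a “vulnerable” label), scikit-learn is assumed to be available, and the actual evaluation in [15] was of course performed on the real Red Hat data.

```python
# Sketch: compare two prediction methods by precision and recall on a
# synthetic stand-in for a package-by-dependency matrix.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score

X, y = make_classification(n_samples=600, n_features=100, n_informative=20,
                           weights=[0.65], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("SVM", SVC(kernel="linear")),
                    ("Decision Tree", DecisionTreeClassifier(random_state=0))]:
    predicted = model.fit(X_train, y_train).predict(X_test)
    print(f"{name}: precision = {precision_score(y_test, predicted):.2f}, "
          f"recall = {recall_score(y_test, predicted):.2f}")
```

Plotting such precision/recall pairs for several methods and parameter settings yields a picture like Figure 3.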

Table 2. Top 25 predicted Red Hat packages. Packages marked ‘•’ actually had vulnerabilities.

No. Package
 • #1  mod_php
   #2  php-dbg
   #3  php-dbg-server
   #4  perl-DBD-Pg
   #5  kudzu
   #6  irda-utils
   #7  hpoj
   #8  libbdevid-python
   #9  mrtg
 • #10 evolution28-evolution-data-server
   #11 lilo
 • #12 ckermit
 • #13 dovecot
   #14 kde2-compat
   #15 gq
 • #16 vorbis-tools
   #17
   #18 taskjuggler
 • #19 ddd
   #20 tora
 • #21 libpurple
   #22 libwvstreams
 • #23 pidgin
   #24 linuxwacom
 • #25 policycoreutils-newrole

For a completely different example, we looked at the database of Common Vulnerabilities and Exposures (CVE) [12] with the goal of automatically finding vulnerability trends [14]. We used topic models [9,3,22,2] on the entire corpus of CVE entries and found a number of prevalent topics. Figure 4 shows our main results. The x axes show the years from 2000 to 2008, inclusive. The y axes show the relative frequency of the topic: if the value is 0.1, this means that 10% of all CVE entries in that year were about this topic. All topics are shown using the same y axis to facilitate comparison.

What we can see from that figure is that SQL injection, cross-site scripting and cross-site request forgery have all been on the rise since 2000, whereas insecure defaults and buffer overflows have been on the decline since 2000. An earlier publication [6] comes to a different conclusion with respect to cross-site request forgery, but from analysing this data we agree with Jeremiah Grossman, who has called cross-site request forgery a “sleeping giant” [7]. Interestingly, there are also some topics to which we could not assign suitable names, so these might be examples of as yet unknown trends.

Again, the data set is very large: the CVE contains data as far back as 1988. And again, the information we extract is easy to understand (“Vulnerability type x is rising/falling”).

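As an illustration of the trend-detection idea, the following sketch topic-models a handful of invented CVE-style descriptions and tracks the average topic weight per year. It uses scikit-learn's LDA implementation as an assumed stand-in; the actual analysis [14] may use a different topic model and, of course, the full CVE corpus.

```python
# Sketch: detect vulnerability-type trends by topic-modelling CVE-style
# descriptions and averaging topic weights per year. Entries are invented.
from collections import defaultdict
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

entries = [
    (2002, "buffer overflow in parser allows arbitrary code execution"),
    (2003, "stack buffer overflow allows remote code execution"),
    (2006, "cross-site scripting in login form allows script injection"),
    (2007, "sql injection in search parameter allows database access"),
    (2008, "cross-site request forgery allows actions as the victim"),
    (2008, "sql injection via crafted cookie allows data disclosure"),
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform([text for _, text in entries])

lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)
doc_topics = lda.transform(X)            # one topic-weight vector per entry

# Average topic weight per year; a weight that grows over the years
# corresponds to a rising trend (cf. Figure 4).
by_year = defaultdict(list)
for (year, _), weights in zip(entries, doc_topics):
    by_year[year].append(weights)
for year in sorted(by_year):
    mean = sum(by_year[year]) / len(by_year[year])
    print(year, [round(float(w), 2) for w in mean])
```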
4. Research Program and Conclusion

Building on the successes of the work above in order to produce climate forecasts for security would involve two distinct steps:

Data Sets. There is no scarcity of vulnerability information. Still, it is often very difficult to compare research that was done using different data sets, even when such research is topically close. This is because different data sets, such as the Mozilla and Red Hat vulnerability data, are maintained for different purposes and are often of different quality levels. Data sets such as the CVE cannot even be said to be internally consistent with respect to quality, since many people can make entries. This leads to an enormous spread in entry quality, ranging from almost useless entries to complete descriptions of a vulnerability's cause and its remedy. However, comparability of data sets is highly desirable, since it is otherwise impossible to consistently apply and rank different prediction methods. Therefore, data sets will have to be analysed with the goal of finding good indicators of their quality, and of finding ways to make them comparable.

Topics shown in Figure 4: Arbitrary Code, Buffer Overflow, Cross-Site Request Forgery, Cross-Site Scripting, Format String, Insecure Defaults, Link Resolution, PHP, Privilege Escalation, Resource Management, and SQL Injection. Each panel plots the relative frequency of the topic (0.00 to 0.10) against the year (2000 to 2008).

Figure 4. Relative frequency of some topics identified by topic modelling the CVE corpus.

Perhaps standards can be devised that would ensure the comparability of results obtained on different data sets with the same methods. It would also be possible to develop benchmark data sets on which researchers and practitioners can try out their methods.

Prediction Models. The two studies on Mozilla and Red Hat have shown that it is possible to construct high-precision forecasts of vulnerability locations. However, the predictor (imports, function calls or package dependencies) was not arrived at by a systematic process. Therefore, other models would need to be developed that predict vulnerabilities in different dimensions, such as:

• granularity (method, file, package, ...);
• predictors (dependencies, developers, complexity metrics [19], day-of-week of committing the source code [20], ...);
• degree of interaction: does reacting to the predictions coming out of the model improve security, and how do the predictors change when such interaction happens?
• robust trend analysis: can we predict whether classes of vulnerabilities will become more or less prevalent, and can we find as yet unknown trends in the data?

These models need to be applied to wide ranges of data in order to see where they work and where they don't. At the end of this process, we would ideally understand how vulnerabilities appear in software and therefore arrive at a theory that would allow us to make predictions just like climate forecasts.

To conclude, building on previous successful work in predicting vulnerabilities in large software systems, we proposed a system which could produce high-precision, climate forecast-like security predictions. Such a system would deal with the following undesirable properties of conventional approaches:

• It would be based on real data, not on anecdotal evidence.
• It would be as project-specific as need be, not constrained by built-in rules of what is unsafe programming and what is not.
• Its results would be actionable.
• It could handle projects of any size.
• It could be used on legacy projects.
• It would have a comparatively flat learning curve.

References

[1] Dan Bernstein. Cache-timing attacks on AES. http://cr.yp.to/papers.html#cachetiming, 2004.
[2] David Blei and Jon McAuliffe. Supervised topic models. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 121–128, Cambridge, MA, 2008. MIT Press.
[3] David M. Blei and John D. Lafferty. Dynamic topic models. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 113–120, New York, NY, USA, 2006. ACM.
[4] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A static analyzer for finding dynamic programming errors. Software Practice and Experience, 30(7):775–802, 2000.
[5] Brian Chess and Gary McGraw. Static analysis for security. IEEE Security and Privacy, 2(6):76–79, 2004.
[6] Steven M. Christey and Robert A. Martin. Vulnerability type distributions in CVE. http://cwe.mitre.org/documents/vuln-trends/index.html, May 2007.
[7] Jeremiah Grossman. CSRF, the sleeping giant. http://jeremiahgrossman.blogspot.com/2006/09/csrf-sleeping-giant.html, September 2006.
[8] Peter Gutmann. Cryptographic Security Architecture. Springer Verlag, October 2003.
[9] David Hall, Daniel Jurafsky, and Christopher Manning. Studying the history of ideas using topic models. In Proceedings of EMNLP 2008: Conference on Empirical Methods in Natural Language Processing, pages 363–371, October 2008.
[10] Gary McGraw. Software Security: Building Security In. Addison-Wesley, February 2006.
[11] Gary McGraw, Brian Chess, and Sammy Migues. Building Security In Maturity Model v 1.5 (Europe Edition). Fortify, Inc., and Cigital, Inc., 2009.
[12] MITRE. Common vulnerabilities and exposures. http://cve.mitre.org/, September 2009.
[13] Nachiappan Nagappan, Thomas Ball, and Andreas Zeller. Mining metrics to predict component failures. In 27th International Conference on Software Engineering, May 2005.
[14] Stephan Neuhaus. CVE trend forecasting with topic models. Technical report, Università degli Studi di Trento, September 2009. To appear.
[15] Stephan Neuhaus and Thomas Zimmermann. The beauty and the beast: Vulnerabilities in Red Hat's packages. In Proceedings of the 2009 USENIX Annual Technical Conference, Berkeley, CA, USA, July 2009. USENIX Association.
[16] Stephan Neuhaus, Thomas Zimmermann, Christian Holler, and Andreas Zeller. Predicting vulnerable software components. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS), New York, New York, USA, October 2007. ACM Press.
[17] Shannon Riley. Password security: What users know and what they actually do. Usability News, 8(1), 2006.
[18] Harish Setty. System administrator security best practices. White paper, SANS Institute, 2001.
[19] Yonghee Shin and Laurie Williams. An empirical model to predict security vulnerabilities using code complexity metrics. In Proc. Second Int'l Symposium on Empirical Software Engineering and Measurement (ESEM 2008), pages 315–317, 2008.
[20] Jacek Sliwerski, Thomas Zimmermann, and Andreas Zeller. Don't program on Fridays! How to locate fix-inducing changes. In Proceedings of the 7th Workshop Software Reengineering, May 2005.
[21] Artem Starostin. Formal verification of a C library for strings. Master's thesis, Saarland University, 2006.
[22] Xuerui Wang and Andrew McCallum. Topics over time: A non-Markov continuous-time model of topical trends. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 424–433, New York, NY, USA, 2006. ACM.
[23] Andrzej Wasylkowski. Statyczne sprawdzanie poprawności typowej programów napisanych w Pythonie (Static checking of type correctness of programs written in Python). Master's thesis, Wroclaw University, July 2005.