Type-Check Python Programs with a Union Type System
Total Page:16
File Type:pdf, Size:1020Kb
日本ソフトウェア科学会第 37 回大会 (2020 年度) 講演論文集 Type-check Python Programs with a Union Type System Senxi Li Tetsuro Yamazaki Shigeru Chiba We propose that a simple type system and type inference can type most of the Python programs with an empirical study on real-world code. Static typing has been paid great attention in the Python language with its growing popularity. At the same time, what type system should be adopted by a static checker is quite a challenging task because of Python's dynamic nature. To exmine whether a simple type system and type inference can type most of the Python programs, we collected 806 Python repositories on GitHub and present a preliminary evaluation over the results. Our results reveal that most of the real-world Python programs can be typed by an existing, gradual union type system. We discovered that 82.4% of the ex- pressions can be well typed under certain assumptions and conditions. Besides, expressions of a union type are around 3%, which indicates that most of Python programs use simple, literal types. Such preliminary discovery progressively reveals that our proposal is fairly feasible. while the effectiveness of the type system has not 1 Introduction been empirically studied. Other works such as Py- Static typing has been paid great attention in type implement a strong type inferencer instead of the Python language with its growing popularity. enforcing type annotations. At the same time, what type system should be As a dynamically typed language, Python excels adopted by a static checker is quite a challenging at rapid prototyping and its avail of metaprogram- task because of Python's dynamic nature. There ming and reflection. Pythonists can also customize have been sufficient existing works on static typing their own metaclasses and create classes at run- for Python. Mypy [5], one of the most famed static time. Beside, programmers can introspect or fur- type checkers in Python, use type annotations in- ther create, modify Python objects dynamically in troduced from Python 3.0 and statically typecheck any sense. However, these powerful features would programs by giving semantics to the annotations. be nightmare for static checkers. Modifying a class Mypy has a powerful type system with features attribute with eval(`Rect.setColor("red")') at such as bidirectional type inference and generics, runtime will not provide any type information to a static checker. Therefore, the checker will complain ∗Type-check Python Programs with a Union Type and raise an attribute error if the program inquires System this attibute by accessing Rect.color, or will just This is an unrefereed paper. Copyrights belong to the Author(s). throw an unknown type if the type system is grad- Senxi Li, Tetsuro Yamazaki, Shigeru Chiba, 東京大 ual. The operations can hardly provide any type 学大学院情報理工学系研究科, Graduate School of information before runtime and thus a checker will Information Science and Technology, University of Tokyo. fail to typecheck that piece of code. Consequently, it is impossible for any exising type system to type- To answer the question, we collected Python reposi- check all Python programs at a theoretical level. tories from GitHub and analyzed their source code While such discussion can easily drive us to an with a static checker driven by a designed, sim- end that no existing type system can fully type- ple gradual union type system. We believe that check all Python programs, the following assump- union type can handle abstraction and polymor- tions play a substantial role in our motivation: phism within bounds, which can meet the damand • Python programs seems easy. Being a popu- of real-world Python code. In addition, we ap- lar programming language used by people from ply a type inference approach that infers function web developers to data scientists, most of the and method types by type information from run- Python programs shall be easy to read and time. While runtime can only tell properties of ob- share. As another aspect, machine learning jects under limited conditions, we claim that vari- tasks take a significant role in Python appli- ables, functions and objects can be correctly in- cations in recent years. In spite of their indeed ferred and typed if runtime can consistently provide complicated algorithms, it can be smoothly their types. imagined that simple data types can handle Our preliminary result shows that our designed, their code and many of the expressions are just simple type system and type inference algorithms literals like numbers and lists. Initializing a can typecheck most of the Python programs. The tensor with a value of List[float], function investigation indicates that 82.4% of the expres- of mathemetical multiplication over matrix and sions in the source code are well typed under certain vector being typed as List[List[float]] ! conditions. Besides, only 3.1% of the expressions List[float] ! List[float] and so on may are of union type. This paper also involves further serve as evidence for our assumption. evaluation to better analyze if such a type system • Programmers are competent and well-educated. can satisfy Python's need. This hypothesis is heavily rooted in the com- This paper proceeds in the following manner. petent programmer hypothesis [1], which states Section 2 discuesses our motivation and briefly re- that most software faults introduced by ex- views benefits of static typing over dynamic typing. perienced programmers are due to small syn- Section 3 proposes a pseudo language and by de- tactic errors. Furthermore, the majority of scribing its core calculus we further discuss how our Python programmers are believed to have a system performs type checking. Section 4 shows background of some statically typed languages. our empirical study of real-world Python programs In other words, well-educated programmers and its results will help answer the research ques- write programs that may obey a type system tions that we address. Section 5 mentions related they are familiar with implicitly, even Python works and a conclusion with directions for future syntatically never impedes them writing \un- work ends this paper. typable" code. Accordingly, a simple type sys- tem for objects can type most of such Python 2 Motivation programs. There have been many complex type systems and These concerns and assumptions lead us to our type inference algorithms proposed for Python in research question: Is there an exising type system order to statically typecheck programs. However, able to type most of real-world Python programs? they are not successful because of the dynamic na- ture of Python. Many of the existing works ap- ning the user code. ply strong type systems such as callable type to At this moment it seems that we can type Python type functions/methods and generics to abstract programs if a programmer carefully inject type an- objects more precisely. Type inference algorithms notations with the usage of a well implemented infer types with hints avaiable in the source code. type checker. However, the situation will not let In the following of this section, we will illustrate us stay optimistic as long as we are engaging with how some of existing type systems can and cannot Python. As a powerful dynamic language, Python type Python programs with examples. has a strong functionality on introspection and re- One of the primary benefits of static typing over flection, enabling programmers to introspect or fur- dynamic typing is that it helps localize errors. For ther modify objects at runtime. Consider the fol- example, suppose a programmer misunderstands lowing piece of code: the use of a library function, such as a getSum func- 1 class Rect: 2 pass tion in a module arith which expects one argument 3 − of a list of integers but instead the programmer 4 def touchColor(arg: Rect) > str: 5 exec("arg.color = `red'") passes in a list of strings. 6 return arg.color 1 from arith import getSum 2 Though this program can be seriously complained 3 lst = [ , , ] "sum" "the" "list" about the bad attribute usage, the given code is 4 getSum(lst) absolutely valid in Python. In the body of func- In a purely dynamically typed language, calling the tion touchColor, the behavior of the argument arg function in such a way will cause a runtime error. is modified by the builtin function exec such that Perhaps such runtime error will occur deep inside a string is assigned to an attribute color of arg. the body of the function in arith, which contains Calling this function by passing in an instance of operations that can be performed on numbers but class Rect at runtime will return "red" just as de- not strings. On the other hand, if the programmer signed. Nevertheless, this piece of code cannot be make use of a static type checker, the error can be well typed with a normal type system, or any ex- detected and caught before the call to getSum. The isting ones even its argument and return are finely following shows that the parameter and return of annotated. A static checker, such as Mypy [5], will function getSum is annotated using type annota- complain that \Rect" has no attribute \color". The tions. confliction stems from the fact that the reflection 1 from typing import List at line 5 does not render any type information to 2 3 def getSum(argList: List[int]) −> int: the checker statically. 4 ... 3 Core Calculus The parameter of the function is well annotated with a generic list type so that a static checker Observations and presumptions bring us to the can understand what type of elements a list should following idea: a simple but well designed type sys- hold. By documenting and enforcing type con- tem can type most of the Python programs.