Improving Software Security with a C Pointer Analysis

Total Page:16

File Type:pdf, Size:1020Kb

Improving Software Security with a C Pointer Analysis Improving Software Security with a C Pointer Analysis Dzintars Avots Michael Dalton V. Benjamin Livshits Monica S. Lam Computer Science Department Stanford University Stanford, CA 94305 {dzin, mwdalton, livshits, lam}@cs.stanford.edu ABSTRACT 1. INTRODUCTION This paper presents a context-sensitive, inclusion-based, Software vulnerabilities in C programs have led to tremen- field-sensitive points-to analysis for C and uses the analysis dous losses of productivity and have caused damages esti- to detect and prevent security vulnerabilities in programs. mated in billions of dollars. These vulnerabilities often stem In addition to a conservative analysis, we propose an opti- from the lack of type safety in the C language, with buffer mistic analysis that assumes a more restricted C semantics overflows and format string exploits being two well-known that reflects common C usage to increase the precision of security vulnerabilities. the analysis. Tools exist which audit programs for vulnerabilities; how- This paper uses the proposed pointer alias analyses to ever, the unsafe nature of C makes it difficult to create pre- infer the types of variables in C programs and shows that cise software analysis tools. In particular, pointer alias an- most C variables are used in a manner consistent with their alysis is an important component in any program auditing declared types. We show that pointer analysis can be used tool. The semantics of C can cause a sound pointer analysis to reduce the overhead of a dynamic string-buffer overflow to generate many false points-to relations. Existing software detector by 30% to 100% among applications with signifi- checkers often minimize false positive warnings by using un- cant overheads. Finally, using pointer analysis, we statically sound pointer analysis instead. The Metal compiler and the found six format string vulnerabilities in two of the 12 pro- Intrinsa system, known to find numerous serious vulnerabil- grams we analyzed. ities in widely used operating systems, assume that pointers are unaliased until proven otherwise with a local analysis [4, 9]. The same assumption was used in a buffer overflow Categories and Subject Descriptors detection tool by Wagner et al. [18]. However, by making D.2.5 [Software Engineering]: Testing and Debugging; such unsound assumptions, these tools may not find all pos- D.3.4 [Programming Languages]: Processors—Compil- sible vulnerabilities and cannot be used to guarantee that a ers; D.2.3 [Operating Systems]: Security and Protection program is safe. This paper explores the use of more advanced pointer alias analysis to create practical tools for improving software se- General Terms curity. We present a context-sensitive, inclusion-based, and Algorithms, Programming languages, Software security, field-sensitive points-to analysis for C. We apply the analysis Vulnerabilities, Software errors. to (1) infer the type of variables in C based on their usage, (2) reduce the overhead of a dynamic bounds checker, and (3) find format string vulnerabilities. Keywords Program analysis, context-sensitive, pointer analysis, type 1.1 Pointer Alias Analysis safety, error detection, software security, buffer overflows, Our analysis is based on a points-to algorithm devel- dynamic analysis, security flaws, format string violations. oped originally for Java [20] using a binary-decision-diagram (BDD) representation. Our analysis is context-sensitive, This material is based upon work supported by the National meaning that calling contexts along different acyclic call Science Foundation under Grant No. 0326227 and an NSF paths have distinct points-to relations. It is inclusion-based, Graduate Student Fellowship. meaning that two pointers may point to overlapping but dif- ferent sets of objects. It is also field-sensitive, meaning that separate fields in an object have distinct points-to relations. The BDD representation efficiently handles the exponential Permission to make digital or hard copies of all or part of this work for number of contexts by exploiting their similarities. It also personal or classroom use is granted without fee provided that copies are accelerates inclusion-based analysis through efficient transi- not made or distributed for profit or commercial advantage and that copies tive closure computations [3, 20, 24]. Field-sensitive analy- bear this notice and the full citation on the first page. To copy otherwise, to ses are known to be much more precise than field-insensitive republish, to post on servers or to redistribute to lists, requires prior specific approaches. Our analysis, however, is flow-insensitive: the permission and/or a fee. ICSE’05, May 15–21, 2005, St. Louis, Missouri, USA. order of execution of program statements is not modeled. Copyright 2005 ACM 1-58113-963-2/05/0002 ...$5.00. Because of the semantics of the C language, pointer anal- ysis for C is much more complex than that for Java. While threat in software systems today, consistently dominating Java enforces type safety by limiting user access to memory, CERT advisories [5]. Despite years of effort on this topic, C affords the user complete, direct control. A C programmer there is no practical static checker that can guarantee to can take the address of any field within an object, treat any find all buffer overflows automatically. The C semantics portion of memory as a contiguous region of bytes, perform make it difficult to create a dynamic bounds checker that can arbitrary address arithmetic, and perform arbitrary object ensure no buffer overflows occur without breaking existing casts, whether between variant types of a union or in com- programs. plete violation of declared type. Although many of these CRED (C Range Error Detector) is a recently developed actions may violate the letter or spirit of the C99 standard, dynamic bounds checker that has been shown to work prop- in practice most C compilers allow them. None of the pro- erly on over one million lines of C code[15]. Since buffer over- posed C pointer alias analyses are strictly sound because flow exploits tend to be transported as user input strings, such an analysis would conclude that most locations in me- CRED can be run with less overhead by checking only string mory may point to any location in memory. In particular, buffer overflows. Such an approach is shown to be effective they all assume that the relative placement of objects in me- against a testbed of 20 different buffer overflow attacks [21]. mory is undefined. Most programs satisfy this assumption, CRED is based on Jones and Kelly’s concept of referent which is part of the C99 standard, for the sake of portability. objects [11]. A pointer’s referent object is the object that it We propose in this paper a conservative pointer analysis, re- is intended to reference. A pointer arithmetic operation pro- ferred to as cons, that is sound for all programs satisfying duces a new address, but the result has the same referent ob- the above assumption. ject as the original pointer. When a pointer is dereferenced, Our conservative pointer analysis is likely to be imprecise it must point within the bounds of its referent object. CRED because programs frequently conform to the C99 standard. uses an object table to dynamically track the base and extent By imposing additional assumptions on the C programs that of each object. It instruments pointer arithmetic, derefer- reflect common usage, we can improve upon the pointer an- ences, and string-manipulation functions such as memcpy to alysis results. We thus also propose an optimistic analysis, associate referent objects with pointer addresses and check referred to as pcp (practical C pointers), that is not only that out-of-bounds pointers are not dereferenced. sound for programs adhering to the C99 standard, but also Normally, CRED enters all referent objects into the ob- for programs that may violate its type model. For example, ject table. To reduce the overhead of the strings-only CRED we use structural equivalence to determine type compatibil- system, we use our points-to results to inform CRED about ity, whereas C99 resorts to name equivalence for aggregates objects which do not contain string-typed values. This op- such as structures and unions. timization reduces the overhead of some of the benchmarks Like the analysis for Java, we express our points-to anal- by 30% to 100%. ysis in Datalog, a logic programming language used in de- ductive databases [17]. We use the bddbddb tool to translate 1.4 Format String Vulnerabilities the Datalog rules into efficient BDD operations [20]. The Another common security threat is the format string vul- resulting points-to relations can be used in subsequent Dat- nerability exploit. A program may be exploited if it passes a alog programs, making it easy to develop further algorithms user-supplied string as the format string argument to a sys- based on the results. tem function of the printf family. The user-supplied string may contain format specifiers that can cause the program to 1.2 Type Inference write to unintended memory locations. The static detection The declared types of C variables are hints indcating how of user-supplied format strings is an example of a taint an- the variables are likely to be used. Type inference deter- alysis, and requires the results of a pointer analysis. We use mines the actual types of objects by analyzing the usage of our pointer analysis to identify strings containing data that those objects. This is of great interest, since it can show may be supplied by the user, and to report instances of for- that a program is type safe and portable, or identify un- mat string arguments which point to these tainted strings. safe program operations that should be audited statically or We find actual format string vulnerabilities in two of our checked dynamically. benchmarks.
Recommended publications
  • Yutaka Oiwa. "Implementation of a Fail-Safe ANSI C Compiler"
    Implementation of a Fail-Safe ANSI C Compiler 安全な ANSI C コンパイラの実装手法 Doctoral Dissertation 博士論文 Yutaka Oiwa 大岩 寛 Submitted to Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo on December 16, 2004 in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract Programs written in the C language often suffer from nasty errors due to dangling pointers and buffer overflow. Such errors in Internet server programs are often ex- ploited by malicious attackers to “crack” an entire system, and this has become a problem affecting society as a whole. The root of these errors is usually corruption of on-memory data structures caused by out-of-bound array accesses. The C lan- guage does not provide any protection against such out-of-bound access, although recent languages such as Java, C#, Lisp and ML provide such protection. Never- theless, the C language itself should not be blamed for this shortcoming—it was designed to provide a replacement for assembly languages (i.e., to provide flexible direct memory access through a light-weight high-level language). In other words, lack of array boundary protection is “by design.” In addition, the C language was designed more than thirty years ago when there was not enough computer power to perform a memory boundary check for every memory access. The real prob- lem is the use of the C language for current casual programming, which does not usually require such direct memory accesses. We cannot realistically discard the C language right away, though, because there are many legacy programs written in the C language and many legacy programmers accustomed to the C language and its programming style.
    [Show full text]
  • WWW-Based Collaboration Environments with Distributed Tool Services
    WWWbased Collab oration Environments with Distributed To ol Services Gail E Kaiser Stephen E Dossick Wenyu Jiang Jack Jingshuang Yang SonnyXiYe Columbia University Department of Computer Science Amsterdam Avenue Mail Co de New York NY UNITED STATES fax kaisercscolumbiaedu CUCS February Abstract Wehave develop ed an architecture and realization of a framework for hyp ermedia collab oration environments that supp ort purp oseful work by orchestrated teams The hyp ermedia represents all plausible multimedia artifacts concerned with the collab orative tasks at hand that can b e placed or generated online from applicationsp ecic materials eg source co de chip layouts blueprints to formal do cumentation to digital library resources to informal email and chat transcripts The environment capabilities include b oth internal hyp ertext and external link server links among these artifacts which can b e added incrementally as useful connections are discovered pro jectsp ecic hyp ermedia search and browsing automated construction of artifacts and hyp erlinks according to the semantics of the group and individual tasks and the overall pro cess workow application of to ols to the artifacts and collab orativework for geographically disp ersed teams We present a general architecture for what wecallhyp ermedia subwebs and imp osition of groupspace services op erating on shared subwebs based on World Wide Web technology which could b e applied over the Internet andor within an organizational intranet We describ e our realization in OzWeb which
    [Show full text]
  • The Pros of Cons: Programming with Pairs and Lists
    The Pros of cons: Programming with Pairs and Lists CS251 Programming Languages Spring 2016, Lyn Turbak Department of Computer Science Wellesley College Racket Values • booleans: #t, #f • numbers: – integers: 42, 0, -273 – raonals: 2/3, -251/17 – floang point (including scien3fic notaon): 98.6, -6.125, 3.141592653589793, 6.023e23 – complex: 3+2i, 17-23i, 4.5-1.4142i Note: some are exact, the rest are inexact. See docs. • strings: "cat", "CS251", "αβγ", "To be\nor not\nto be" • characters: #\a, #\A, #\5, #\space, #\tab, #\newline • anonymous func3ons: (lambda (a b) (+ a (* b c))) What about compound data? 5-2 cons Glues Two Values into a Pair A new kind of value: • pairs (a.k.a. cons cells): (cons v1 v2) e.g., In Racket, - (cons 17 42) type Command-\ to get λ char - (cons 3.14159 #t) - (cons "CS251" (λ (x) (* 2 x)) - (cons (cons 3 4.5) (cons #f #\a)) Can glue any number of values into a cons tree! 5-3 Box-and-pointer diagrams for cons trees (cons v1 v2) v1 v2 Conven3on: put “small” values (numbers, booleans, characters) inside a box, and draw a pointers to “large” values (func3ons, strings, pairs) outside a box. (cons (cons 17 (cons "cat" #\a)) (cons #t (λ (x) (* 2 x)))) 17 #t #\a (λ (x) (* 2 x)) "cat" 5-4 Evaluaon Rules for cons Big step seman3cs: e1 ↓ v1 e2 ↓ v2 (cons) (cons e1 e2) ↓ (cons v1 v2) Small-step seman3cs: (cons e1 e2) ⇒* (cons v1 e2); first evaluate e1 to v1 step-by-step ⇒* (cons v1 v2); then evaluate e2 to v2 step-by-step 5-5 cons evaluaon example (cons (cons (+ 1 2) (< 3 4)) (cons (> 5 6) (* 7 8))) ⇒ (cons (cons 3 (< 3 4))
    [Show full text]
  • The Use of UML for Software Requirements Expression and Management
    The Use of UML for Software Requirements Expression and Management Alex Murray Ken Clark Jet Propulsion Laboratory Jet Propulsion Laboratory California Institute of Technology California Institute of Technology Pasadena, CA 91109 Pasadena, CA 91109 818-354-0111 818-393-6258 [email protected] [email protected] Abstract— It is common practice to write English-language 1. INTRODUCTION ”shall” statements to embody detailed software requirements in aerospace software applications. This paper explores the This work is being performed as part of the engineering of use of the UML language as a replacement for the English the flight software for the Laser Ranging Interferometer (LRI) language for this purpose. Among the advantages offered by the of the Gravity Recovery and Climate Experiment (GRACE) Unified Modeling Language (UML) is a high degree of clarity Follow-On (F-O) mission. However, rather than use the real and precision in the expression of domain concepts as well as project’s model to illustrate our approach in this paper, we architecture and design. Can this quality of UML be exploited have developed a separate, ”toy” model for use here, because for the definition of software requirements? this makes it easier to avoid getting bogged down in project details, and instead to focus on the technical approach to While expressing logical behavior, interface characteristics, requirements expression and management that is the subject timeliness constraints, and other constraints on software using UML is commonly done and relatively straight-forward, achiev- of this paper. ing the additional aspects of the expression and management of software requirements that stakeholders expect, especially There is one exception to this choice: we will describe our traceability, is far less so.
    [Show full text]
  • Java for Scientific Computing, Pros and Cons
    Journal of Universal Computer Science, vol. 4, no. 1 (1998), 11-15 submitted: 25/9/97, accepted: 1/11/97, appeared: 28/1/98 Springer Pub. Co. Java for Scienti c Computing, Pros and Cons Jurgen Wol v. Gudenb erg Institut fur Informatik, Universitat Wurzburg wol @informatik.uni-wuerzburg.de Abstract: In this article we brie y discuss the advantages and disadvantages of the language Java for scienti c computing. We concentrate on Java's typ e system, investi- gate its supp ort for hierarchical and generic programming and then discuss the features of its oating-p oint arithmetic. Having found the weak p oints of the language we pro- p ose workarounds using Java itself as long as p ossible. 1 Typ e System Java distinguishes b etween primitive and reference typ es. Whereas this distinc- tion seems to b e very natural and helpful { so the primitives which comprise all standard numerical data typ es have the usual value semantics and expression concept, and the reference semantics of the others allows to avoid p ointers at all{ it also causes some problems. For the reference typ es, i.e. arrays, classes and interfaces no op erators are available or may b e de ned and expressions can only b e built by metho d calls. Several variables may simultaneously denote the same ob ject. This is certainly strange in a numerical setting, but not to avoid, since classes have to b e used to intro duce higher data typ es. On the other hand, the simple hierarchy of classes with the ro ot Object clearly b elongs to the advantages of the language.
    [Show full text]
  • A Metaobject Protocol for Fault-Tolerant CORBA Applications
    A Metaobject Protocol for Fault-Tolerant CORBA Applications Marc-Olivier Killijian*, Jean-Charles Fabre*, Juan-Carlos Ruiz-Garcia*, Shigeru Chiba** *LAAS-CNRS, 7 Avenue du Colonel Roche **Institute of Information Science and 31077 Toulouse cedex, France Electronics, University of Tsukuba, Tennodai, Tsukuba, Ibaraki 305-8573, Japan Abstract The corner stone of a fault-tolerant reflective The use of metalevel architectures for the architecture is the MOP. We thus propose a special implementation of fault-tolerant systems is today very purpose MOP to address the problems of general-purpose appealing. Nevertheless, all such fault-tolerant systems ones. We show that compile-time reflection is a good have used a general-purpose metaobject protocol (MOP) approach for developing a specialized runtime MOP. or are based on restricted reflective features of some The definition and the implementation of an object-oriented language. According to our past appropriate runtime metaobject protocol for implementing experience, we define in this paper a suitable metaobject fault tolerance into CORBA applications is the main protocol, called FT-MOP for building fault-tolerant contribution of the work reported in this paper. This systems. We explain how to realize a specialized runtime MOP, called FT-MOP (Fault Tolerance - MetaObject MOP using compile-time reflection. This MOP is CORBA Protocol), is sufficiently general to be used for other aims compliant: it enables the execution and the state evolution (mobility, adaptability, migration, security). FT-MOP of CORBA objects to be controlled and enables the fault provides a mean to attach dynamically fault tolerance tolerance metalevel to be developed as CORBA software. strategies to CORBA objects as CORBA metaobjects, enabling thus these strategies to be implemented as 1 .
    [Show full text]
  • LISTSERV 16.0 Site Manager's Operations Manual
    Information in this document is subject to change without notice. Companies, names, and data used in examples herein are fictitious unless otherwise noted. L-Soft does not endorse or approve the use of any of the product names or trademarks appearing in this document. Permission is granted to copy this document, at no charge and in its entirety, if the copies are not used for commercial advantage, the source is cited, and the present copyright notice is included in all copies. Recipients of such copies are equally bound to abide by the present conditions. Prior written permission is required for any commercial use of this document, in whole or in part, and for any partial reproduction of the contents of this document exceeding 50 lines of up to 80 characters, or equivalent. The title page, table of contents, and index, if any, are not considered to be part of the document for the purposes of this copyright notice, and can be freely removed if present. Copyright 2009 L-Soft international, Inc. All Rights Reserved Worldwide. LISTSERV is a registered trademark licensed to L-Soft international, Inc. ListPlex, CataList, and EASE are service marks of L-Soft international, Inc. LSMTP is a registered trademark of L-Soft international, Inc. The Open Group, Motif, OSF/1 UNIX and the “X” device are registered trademarks of The Open Group in the United State and other countries. Digital, Alpha AXP, AXP, Digital UNIX, OpenVMS, HP, and HP-UX are trademarks of Hewlett- Packard Company in the United States and other countries. Microsoft, Windows, Windows 2000, Windows XP, and Windows NT are registered trademarks of Microsoft Corporation in the United States and other countries.
    [Show full text]
  • Cognitive Abilities, Interfaces and Tasks: Effects on Prospective Information Handling in Email
    Cognitive Abilities, Interfaces and Tasks: Effects on Prospective Information Handling in Email by Jacek Stanisław Gwizdka A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Mechanical and Industrial Engineering University of Toronto © Copyright by Jacek Gwizdka 2004 “Cognitive Abilities, Interfaces and Tasks: Effects on Prospective Information Handling in Email” Degree of Doctor of Philosophy, 2004 Jacek Stanisław Gwizdka Department of Mechanical and Industrial Engineering, University of Toronto Abstract This dissertation is focused on new email user interfaces that may improve awareness and handling of task-laden messages in the inbox. The practical motivation for this research was to help email users process messages more effectively. A field study was conducted to examine email practices related to handling messages that refer to pending tasks. Individual differences in message handling style were observed, with one group of users transferring such messages out of their email pro- grams to other applications (e.g., calendars), while the other group kept prospective messages in email and used the inbox as a reminder of future events. Two novel graphical user interfaces were designed to facilitate monitoring and retrieval of prospective information from email messages. The TaskView interface displayed task- laden messages on a two-dimensional grid (with time on the horizontal axis). The WebT- askMail interface retained the two-dimensional grid, but extended the representation of pending tasks (added distinction between events and to-do's with deadlines), a vertical date reading line, and more space for email message headers. ii Two user studies were conducted to test hypothesized benefits of the new visual repre- sentations and to examine the effects of different levels of selected cognitive abilities on task-laden message handling performance.
    [Show full text]
  • Unix Quickref.Dvi
    Summary of UNIX commands Table of Contents df [dirname] display free disk space. If dirname is omitted, 1. Directory and file commands 1994,1995,1996 Budi Rahardjo ([email protected]) display all available disks. The output maybe This is a summary of UNIX commands available 2. Print-related commands in blocks or in Kbytes. Use df -k in Solaris. on most UNIX systems. Depending on the config- uration, some of the commands may be unavailable 3. Miscellaneous commands du [dirname] on your site. These commands may be a commer- display disk usage. cial program, freeware or public domain program that 4. Process management must be installed separately, or probably just not in less filename your search path. Check your local documentation or 5. File archive and compression display filename one screenful. A pager similar manual pages for more details (e.g. man program- to (better than) more. 6. Text editors name). This reference card, obviously, cannot de- ls [dirname] scribe all UNIX commands in details, but instead I 7. Mail programs picked commands that are useful and interesting from list the content of directory dirname. Options: a user's point of view. 8. Usnet news -a display hidden files, -l display in long format 9. File transfer and remote access mkdir dirname Disclaimer make directory dirname The author makes no warranty of any kind, expressed 10. X window or implied, including the warranties of merchantabil- more filename 11. Graph, Plot, Image processing tools ity or fitness for a particular purpose, with regard to view file filename one screenfull at a time the use of commands contained in this reference card.
    [Show full text]
  • Chapter 15 Functional Programming
    Topics Chapter 15 Numbers n Natural numbers n Haskell numbers Functional Programming Lists n List notation n Lists as a data type n List operations Chapter 15: Functional Programming 2 Numbers Natural Numbers The natural numbers are the numbers 0, 1, Haskell provides a sophisticated 2, and so on, used for counting. hierarchy of type classes for describing various kinds of numbers. Introduced by the declaration data Nat = Zero | Succ Nat Although (some) numbers are provided n The constructor Succ (short for ‘successor’) as primitives data types, it is has type Nat à Nat. theoretically possible to introduce them n Example: as an element of Nat the number 7 through suitable data type declarations. would be represented by Succ(Succ(Succ(Succ(Succ(Succ(Succ Zero)))))) Chapter 15: Functional Programming 3 Chapter 15: Functional Programming 4 Natural Numbers Natural Numbers Every natural number is represented by a Multiplication ca be defined by unique value of Nat. (x) :: Nat à Nat à Nat m x Zero = Zero On the other hand, not every value of Nat m x Succ n = (m x n) + m represents a well-defined natural number. n Example: ^, Succ ^, Succ(Succ ^) Nat can be a member of the type class Eq Addition ca be defined by instance Eq Nat where Zero == Zero = True (+) :: Nat à Nat à Nat Zero == Succ n = False m + Zero = m Succ m == Zero = False m + Succ n = Succ(m + n) Succ m == Succ n = (m == n) Chapter 15: Functional Programming 5 Chapter 15: Functional Programming 6 1 Natural Numbers Haskell Numbers Nat can be a member of the type class Ord instance
    [Show full text]
  • Site Manager's Operations Manual for LISTSERV®, Version 14.3
    L-Soft international, Inc. Site Manager's Operations Manual for LISTSERV®, version 14.3 7 December 2004 LISTSERV 14.3 Release The reference number of this document is 0412-MD-01. Information in this document is subject to change without notice. Companies, names and data used in examples herein are fictitious unless otherwise noted. L-Soft international, Inc. does not endorse or approve the use of any of the product names or trademarks appearing in this document. Permission is granted to copy this document, at no charge and in its entirety, provided that the copies are not used for commercial advantage, that the source is cited and that the present copyright notice is included in all copies, so that the recipients of such copies are equally bound to abide by the present conditions. Prior written permission is required for any commercial use of this document, in whole or in part, and for any partial reproduction of the contents of this document exceeding 50 lines of up to 80 characters, or equivalent. The title page, table of contents and index, if any, are not considered to be part of the document for the purposes of this copyright notice, and can be freely removed if present. The purpose of this copyright is to protect your right to make free copies of this manual for your friends and colleagues, to prevent publishers from using it for commercial advantage, and to prevent ill-meaning people from altering the meaning of the document by changing or removing a few paragraphs. Copyright © 1996-2004 L-Soft international, Inc.
    [Show full text]
  • Subtyping on Nested Polymorphic Session Types 3 Types [38]
    Subtyping on Nested Polymorphic Session Types Ankush Das1, Henry DeYoung1, Andreia Mordido2, and Frank Pfenning1 1 Carnegie Mellon University, USA 2 LASIGE, Faculdade de Ciˆencias, Universidade de Lisboa, Portugal Abstract. The importance of subtyping to enable a wider range of well- typed programs is undeniable. However, the interaction between subtyp- ing, recursion, and polymorphism is not completely understood yet. In this work, we explore subtyping in a system of nested, recursive, and polymorphic types with a coinductive interpretation, and we prove that this problem is undecidable. Our results will be broadly applicable, but to keep our study grounded in a concrete setting, we work with an extension of session types with explicit polymorphism, parametric type construc- tors, and nested types. We prove that subtyping is undecidable even for the fragment with only internal choices and nested unary recursive type constructors. Despite this negative result, we present a subtyping algo- rithm for our system and prove its soundness. We minimize the impact of the inescapable incompleteness by enabling the programmer to seed the algorithm with subtyping declarations (that are validated by the al- gorithm). We have implemented the proposed algorithm in Rast and it showed to be efficient in various example programs. 1 Introduction Subtyping is a standard feature in modern programming languages, whether functional, imperative, or object-oriented. The two principal approaches to un- derstanding the meaning of subtyping is via coercions or as subsets. Either way, subtyping allows more programs as well-typed, programs can be more concise, and types can express more informative properties of programs. When it comes to the interaction between advanced features of type systems, specifically recursive and polymorphic types, there are still some gaps in our arXiv:2103.15193v1 [cs.PL] 28 Mar 2021 understanding of subtyping.
    [Show full text]