Object-Oriented Programming in R
Susana Eyheramendy
Introduction
• Object-oriented programming (OOP) has become a widely used and valuable tool for software engineering.
• Its value comes from the fact that software is often easier to design, write and maintain when there is a clear separation between the data representation and the operations performed on it.
• In an OOP system, real physical things are generally represented by classes, and methods (functions) are written to handle the different manipulations that need to be performed on the objects.
• Many people's view of OOP is based on a class-centric system, in which classes define objects and there are repositories for the methods that can act on those objects.
• R separates the class specification from the specification of generic functions; it is a function-centric system.
• R has two main internal OOP systems: S3 and S4.
• S3 is easy to use but can be made unreliable through nothing other than bad luck or a poor choice of names.
• S4 is better suited for developing large software projects but is more complex to use.
• An OOP language should support four general elements:
  • Objects: encapsulate state information and control behavior.
  • Classes: describe general properties for groups of objects.
  • Inheritance: new classes can be defined in terms of existing classes.
  • Polymorphism: a (generic) function behaves differently, although with similar outputs, depending on the class of one or more of its arguments.
• In S3, there is no formal specification for classes, so there is only weak control over objects and inheritance. The emphasis of the S3 system was on generic functions and polymorphism.
• In S4, formal class definitions were included in the language. Building on these, more controlled software tools and paradigms were introduced for creating objects and handling inheritance.
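A few lines of R show the function-centric design and the S3/S4 contrast described above. This is only an illustrative sketch. The generic `describe`, the class names `passenger` and `Point`, and their slots are hypothetical names chosen for this example; they do not come from the slides or any package.

In S3, the generic exists independently of any class, and the class is only an attribute. In S4, the class is declared with `setClass()`, and `new()` checks the slot types.

```r
library(methods)  # S4 tools; attached by default in interactive R sessions

# S3: the generic function is defined on its own, separate from any class.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "some object"
describe.passenger <- function(x, ...) paste("passenger", x$name)

# An S3 "class" is only the class attribute; nothing validates the list contents.
p <- structure(list(name = "Ada"), class = "passenger")
describe(p)    # dispatches to describe.passenger: "passenger Ada"
describe(42)   # no numeric method exists, so describe.default: "some object"

# S4: the (hypothetical) class is declared formally, with typed slots.
setClass("Point", representation(x = "numeric", y = "numeric"))
pt <- new("Point", x = 1, y = 2)

# new() validates slot types, so a character value for x is rejected.
bad <- tryCatch(new("Point", x = "one", y = 2),
                error = function(e) "rejected")
bad            # "rejected"
```

The S3 object could have been given any class attribute without complaint. That freedom is what makes S3 easy to use but also easy to misuse.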
The basics of OOP
• Classes describe the objects that will be represented in computer code.
• A class specification details all the properties that are needed to describe an object.
• An object is an instance of exactly one class, and the class definition and representation determine the properties of the object. Instances of a class differ only in their state.
• New classes can be defined in terms of existing classes through an operation called inheritance.
• Inheritance allows new classes to extend existing ones, often by adding new slots or by combining two or more existing classes into a single composite entity.
• If a class A extends the class B, then we say that A is a subclass of B, and equivalently that B is a superclass of A.
• No class can be its own subclass. A class is a subclass of each of its superclasses.
• If the language allows a class to extend at most one class, then we say that the language has single inheritance.
• Computing the class hierarchy is then very simple. The resulting hierarchy is a tree, and there is a single unique path from any node to the root of the tree. This path yields the class linearization.
• In the S3 system, the class of an instance is determined by the values in the class attribute, which is a character vector whose order serves as the linearization.
• If the language allows a class to directly extend several classes, then we say that the language supports multiple inheritance, and computing the class linearization is more difficult.
• S4 supports multiple inheritance.
• A method is a type of function that is invoked depending on the class of one or more of its arguments; this process is called dispatch. In some systems, such as S3, methods can be invoked directly, but it is more common for them to be invoked via a generic function.
• When a generic function is invoked, the set of methods that might apply must be sorted into a linear order, with the most specific method first and the least specific method last. This is often called method linearization, and computing it depends on being able to linearize the class hierarchy.
• If the language supports dispatching on a single argument, then we say it has single dispatch. The S3 system uses single dispatch.
• If the language supports dispatching on several arguments, we say that it supports multiple dispatch. The set of specific classes of the arguments for each formal parameter of the generic function is called the signature. S4 supports multiple dispatch.
• Multiple dispatch adds a further complication: precedence among the arguments. In particular, when method selection depends on inheritance, there may be more than one superclass for which a method has been defined. In this case, a notion of distance between a class and its superclasses is used to guide selection.
• The evaluation process for a call to a generic function is roughly as follows:
  • The actual classes of the supplied arguments that match the signature of the generic function are determined.
  • Based on these, the available methods are ordered from most specific to least specific.
  • After any code supplied in the generic is evaluated, control is transferred to the most specific method.
• In S4, a generic function has a fixed set of named formal arguments, and these form the basis of the signature. Any call to the generic is dispatched with respect to its signature.
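The dispatch concepts above (S3 single dispatch along the class vector, S4 multiple inheritance, and S4 multiple dispatch with fallback to an inherited method) can be sketched in a few lines. All generic, class and slot names below are hypothetical: `speak`, `collide`, `Named`, `Dated`, `Event`, `Shape`, `Circle` and `Square`.

`NextMethod()` moves to the next class in the S3 class vector. The two-argument S4 signatures show exact matches taking precedence over methods reached through inheritance.

```r
library(methods)

# S3 single dispatch: the class vector c("dog", "animal") is the linearization.
speak <- function(x, ...) UseMethod("speak")
speak.animal <- function(x, ...) "generic animal noise"
speak.dog <- function(x, ...) paste("woof, then", NextMethod())

rex <- structure(list(), class = c("dog", "animal"))
speak(rex)  # speak.dog, then speak.animal via NextMethod()

# S4 multiple inheritance: Event directly extends two classes.
setClass("Named", representation(name = "character"))
setClass("Dated", representation(date = "character"))
setClass("Event", contains = c("Named", "Dated"))
ev <- new("Event", name = "launch", date = "2024-01-01")
is(ev, "Named") && is(ev, "Dated")  # TRUE: ev is an instance of both superclasses

# S4 multiple dispatch: the signature covers both arguments a and b.
setClass("Shape", representation("VIRTUAL"))
setClass("Circle", representation(r = "numeric"), contains = "Shape")
setClass("Square", representation(side = "numeric"), contains = "Shape")

setGeneric("collide", function(a, b) standardGeneric("collide"))
setMethod("collide", signature("Circle", "Circle"),
          function(a, b) "circle-circle")
setMethod("collide", signature("Shape", "Shape"),
          function(a, b) "shape-shape")

c1 <- new("Circle", r = 1)
s1 <- new("Square", side = 2)
collide(c1, c1)  # exact match on both arguments: "circle-circle"
collide(c1, s1)  # no (Circle, Square) method, so the inherited (Shape, Shape) one
```

Keeping the (Shape, Shape) method as a general fallback means every pair of shapes is handled, while more specific pairs can be added later without changing existing calls.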
The basics of OOP: inheritance
• Consider as an example the S4 modeling of airline passengers, from R Programming for Bioinformatics (pp. 70-71):

… would define the Passenger class as having slots for passenger name (which might itself be a class), an origin and a destination. Now, to implement a new class for frequent flyers, e.g., FreqFlyer, we do not want to create a whole new set of class definitions. Instead, we can extend the Passenger class by adding one or more slots to describe new properties that will be recorded only for frequent flyers. We then say that FreqFlyer is a subclass of Passenger and that Passenger is a superclass of FreqFlyer.

The inheritance relationships imply a form of polymorphism. Any instance of the subclass, in our example FreqFlyer, can be used in place of an instance of the superclass, in our example Passenger. This must be true, since the FreqFlyer class has every slot that an instance of the Passenger class has. The relationship between a subclass and its superclasses should be an "is a" relationship: every frequent flyer is a passenger, but not all passengers are frequent flyers.

Sometimes the notion of subclass and superclass can be confusing. One reason the more specialized class is called a subclass is that the set of objects that can be used exchangeably with the FreqFlyer class is a subset of those that can be used exchangeably with the Passenger class. In the example below, we provide a very basic S4 implementation of the Passenger and FreqFlyer classes.

    > setClass("Passenger", representation(name = "character",
    +     origin = "character", destination = "character"))
    [1] "Passenger"
    > setClass("FreqFlyer", representation(ffnumber = "numeric"),
    +     contains = "Passenger")
    [1] "FreqFlyer"
    > getClass("FreqFlyer")
    Slots:

    Name:   ffnumber      name    origin
    Class:   numeric character character

    Name:  destination
    Class:   character

    Extends: "Passenger"
    > subClassNames("Passenger")
    [1] "FreqFlyer"
    > superClassNames("FreqFlyer")
    [1] "Passenger"

Exercise 3.1
Define a class for passenger names that has slots for the first name, middle initial and last name. Change the definition of the Passenger class to reflect your new class. Does this change the inheritance properties of the Passenger class or the FreqFlyer class?
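The sketch below continues the Passenger/FreqFlyer example to show substitutability in action. It repeats the two `setClass()` calls so it runs on its own. It then defines a generic `itinerary` (a hypothetical name, not from the book) with a method for Passenger only. The passenger names, frequent-flyer number and airport codes are made up.

The transcript's `subClassNames()` and `superClassNames()` do not belong to R's methods package; to my knowledge they come from the book's companion package, RBioinf. This sketch therefore uses `is()`, `extends()` and the `subclasses` slot of the class representation, all from methods.

```r
library(methods)

# Class definitions repeated from the transcript so this block is self-contained.
setClass("Passenger", representation(name = "character",
    origin = "character", destination = "character"))
setClass("FreqFlyer", representation(ffnumber = "numeric"),
    contains = "Passenger")

# Hypothetical generic with a method defined only for the superclass.
setGeneric("itinerary", function(p) standardGeneric("itinerary"))
setMethod("itinerary", "Passenger",
    function(p) paste0(p@name, ": ", p@origin, " -> ", p@destination))

ada <- new("Passenger", name = "Ada", origin = "SCL", destination = "LIM")
bob <- new("FreqFlyer", name = "Bob", origin = "SCL", destination = "GRU",
    ffnumber = 12345)

itinerary(ada)                     # "Ada: SCL -> LIM"
itinerary(bob)                     # "Bob: SCL -> GRU", via the inherited Passenger method
is(bob, "Passenger")               # TRUE: every frequent flyer is a passenger
is(ada, "FreqFlyer")               # FALSE: not every passenger is a frequent flyer
extends("FreqFlyer", "Passenger")  # TRUE
names(getClass("Passenger")@subclasses)  # known subclasses of Passenger
```

No FreqFlyer method is defined, so `itinerary(bob)` is resolved through inheritance. The FreqFlyer object is accepted wherever a Passenger is expected, which is the "is a" relationship in practice.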