Object-Oriented Programming in R

Susana Eyheramendy

Introduction

• Object-oriented programming (OOP) has become a widely used and valuable tool for software engineering.

• Its value derives from the fact that it is often easier to design, write and maintain software when there is some clear separation of the data representation from the operations that are to be performed on it.

Introduction

• In an OOP system, real physical things are generally represented by classes, and methods (functions) are written to handle the different manipulations that need to be performed on the objects.

• Many people's view of OOP is based on a class-centric system, in which classes define objects and there are repositories for the methods that can act on those objects.

• R separates the class specification from the specification of generic functions; it is a function-centric system.

Introduction

• R supports two internal OOP systems: S3 and S4.

• S3 is easy to use but can be made unreliable through nothing other than bad luck, or a poor choice of names.

• S4 is better suited for developing large software projects but has an increased complexity of use.

Introduction

• Four general elements that an OOP language should support:

  • Objects: encapsulate state information and control behavior.

  • Classes: describe general properties for groups of objects.

  • Inheritance: new classes can be defined in terms of existing classes.

  • Polymorphism: a (generic) function has different behaviors, although similar outputs, depending on the class of one or more of its arguments.

Introduction

• In S3, there is no formal specification for classes and hence there is weak control of objects and inheritance. The emphasis of the S3 system was on generic functions and polymorphism.

• In S4, formal class definitions were included in the language and, based on these, more controlled software tools and paradigms for the creation of objects and the handling of inheritance were introduced.
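A minimal sketch of that contrast, assuming hypothetical class names PersonS3 and PersonS4 (they are not from the source): the S3 class is only a label attached to a list, so a wrongly typed field goes unnoticed, while the S4 class declares a typed slot that new() enforces. The slots argument used here is the newer alternative to representation().

```r
library(methods)  # attached by default in most sessions; explicit for scripts

# S3: the class is just an attribute, so a wrongly typed field is accepted.
p3 <- list(name = 42)            # "name" is meant to be character
class(p3) <- "PersonS3"          # hypothetical S3 class
print(inherits(p3, "PersonS3"))  # TRUE, although nothing was validated

# S4: the class is declared formally, with a typed slot.
setClass("PersonS4", slots = c(name = "character"))
p4 <- new("PersonS4", name = "Ada")

# A wrongly typed slot value is rejected when the object is created.
bad <- tryCatch(new("PersonS4", name = 42),
                error = function(e) "rejected")
print(bad)                       # "rejected"
```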

The basics of OOP

• Classes describe the objects that will be represented in computer code.

• A class specification details all the properties that are needed to describe an object.

• An object is an instance of exactly one class and it is the class definition and representation that determine the properties of the object. Instances of a class differ only in their state.

• New classes can be defined in terms of existing classes through an operation called inheritance.

The basics of OOP

• Inheritance allows new classes to extend existing ones, often by adding new slots or by combining two or more existing classes into a single composite entity.

• If a class A extends the class B, then we say that B is a superclass of A, and equivalently that A is a subclass of B.

• No class can be its own subclass. A class is a subclass of each of its superclasses.
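To check the direction of the relationship in S4, here is a small sketch with hypothetical classes Animal and Dog; extends() and is() come from the methods package.

```r
library(methods)

# Hypothetical classes: Dog extends Animal, so Dog is the subclass
# and Animal is the superclass.
setClass("Animal", slots = c(name = "character"))
setClass("Dog", contains = "Animal", slots = c(breed = "character"))

extends("Dog", "Animal")   # TRUE
extends("Animal", "Dog")   # FALSE: the relationship is one-directional

rex <- new("Dog", name = "Rex", breed = "collie")
is(rex, "Animal")          # TRUE: a Dog can stand in for an Animal
is(rex)                    # "Dog" "Animal", most specific class first
```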

The basics of OOP

• If the language only allows a class to extend, at most, one class, then we say that the language has single inheritance.

• Computing the class hierarchy is then very simple, since the resulting hierarchy is a tree and there is a single unique path from any node to the root of the tree. This path yields the class linearization.

• In the S3 system, the class of an instance is determined by the values in the class attribute, which is a character vector.

The basics of OOP

• If the language allows a class to directly extend several classes, then we say that the language supports multiple inheritance, and computing the class linearization is more difficult.

• S4 supports multiple inheritance.
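S4 multiple inheritance can be sketched with hypothetical classes Swimmer, Glider and Duck, where Duck directly extends both of the others. Called with a single argument, is() lists the class followed by its superclasses, which is one way to see a linearization of the hierarchy.

```r
library(methods)

# Hypothetical classes; Duck directly extends two unrelated classes.
setClass("Swimmer", slots = c(maxDepth = "numeric"))
setClass("Glider",  slots = c(maxAltitude = "numeric"))
setClass("Duck", contains = c("Swimmer", "Glider"))

donald <- new("Duck", maxDepth = 2, maxAltitude = 100)
is(donald, "Swimmer")   # TRUE
is(donald, "Glider")    # TRUE
is(donald)              # "Duck" followed by its two superclasses
donald@maxAltitude      # slots inherited from both superclasses are present
```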

The basics of OOP

• A method is a type of function that is invoked depending on the class of one or more of its arguments and this process is called dispatch. While in some systems, such as S3, methods can be invoked directly, it is more common for them to be invoked via a generic function.

• When a generic function is invoked, the set of methods that might apply must be sorted into a linear order, with the most specific method first and the least specific method last. This is often called method linearization, and computing it depends on being able to linearize the class hierarchy.

The basics of OOP

• If the language supports dispatching on a single argument, then we say it has single dispatch. The S3 system uses single dispatch.

• When the language supports dispatching on several arguments, we say that the language supports multiple dispatch, and the set of specific classes of the arguments for each formal parameter of the generic function is called the signature. S4 supports multiple dispatch.

• With multiple dispatch, the additional complication of precedence of the arguments arises. In particular, when method selection depends on inheritance, there may be more than one superclass for which a method has been defined. In this case, a concept of the distance between the class and its superclasses is used to guide selection.

The basics of OOP

• The evaluation process for a call to a generic function is roughly as follows:

  • The actual classes of supplied arguments that match the signature of the generic function are determined. Based on these, the available methods are ordered from most specific to least. Then, after evaluating any code supplied in the generic, control is transferred to the most specific method.

• In S4, a generic function has a fixed set of named formal arguments and these form the basis of the signature. Any call to the generic will be dispatched with respect to its signature.

The basics of OOP: inheritance

• Consider, as an example of the S4 system, the modeling of airline passengers.

R Programming for Bioinformatics (p. 70)

We could define the Passenger class as having slots for passenger name (which might itself be a class), an origin and a destination. Now, to implement a new class for frequent flyers, e.g., FreqFlyer, we do not want to create a whole new set of class definitions, but rather we can extend the Passenger class by adding one or more slots to describe new properties that will be recorded only for frequent flyers. We then say that FreqFlyer is a subclass of Passenger and that Passenger is a superclass of FreqFlyer.

The inheritance relationships imply a form of polymorphism. Any instance of the subclass, in our example FreqFlyer, can be used in place of an instance of the superclass, in our example Passenger. This must be true since the FreqFlyer class has every slot that an instance of the Passenger class has. The relationship between a subclass and its superclasses should be an "is a" relationship: every frequent flyer is a passenger, and not all passengers are frequent flyers.

Sometimes the notion of subclass and superclass can be confusing. One reason that the more specialized class is called a subclass is that the set of objects that can be used interchangeably with the FreqFlyer class is a subset of those that can be used interchangeably with the Passenger class. In the example below, we provide a very basic S4 implementation of the Passenger and FreqFlyer classes.

> setClass("Passenger", representation(name = "character",
+     origin = "character", destination = "character"))
[1] "Passenger"
> setClass("FreqFlyer", representation(ffnumber = "numeric"),
+     contains = "Passenger")
[1] "FreqFlyer"
> getClass("FreqFlyer")

Slots:

Name:   ffnumber      name    origin
Class:   numeric character character

Name:  destination
Class:   character

Extends: "Passenger"

> subClassNames("Passenger")
[1] "FreqFlyer"
> superClassNames("FreqFlyer")
[1] "Passenger"

Exercise 3.1

Define a class for passenger names that has slots for the first name, middle initial and last name. Change the definition of the Passenger class to reflect your new class. Does this change the inheritance properties of the Passenger class or the FreqFlyer class?


3.2.2 Dispatch

We must also briefly digress to consider some of the issues involved in applying methods to objects. A method is a specialized function that can be applied to instances of one or more classes. The process of determining the appropriate method to invoke is called dispatch. A call to a function, such as plot, will invoke a method that is determined by the class of the first argument in the call to plot.

When a generic function is called, it must examine the supplied arguments and determine the applicable methods. All applicable methods are ordered; details on how this is done for S4 are given in Section 3.4.9, while for S3 the hierarchy is intrinsically linear and hence has an obvious order. In both systems, the applicable methods are arranged from most specific to least specific and the most specific method is invoked. During evaluation, control may be passed to less specific methods by calling NextMethod in S3 and via callNextMethod for S4. This strategy tends to help simplify the code. If we return to our frequent flyer example, we can imagine a print method for passengers that prints their names and flight details. A print method for frequent flyers could simply invoke the passenger method, and then add a line indicating the frequent flyer number. Using this approach, very little additional code is needed; and if the printing of passenger information is changed, the update is automatically applied to printing of frequent flyer information.

Exercise 3.2

Write a simple show method for the Passenger class. Write a show method for the FreqFlyer class that makes use of the show method for passengers. For S4 you will want to use setMethod and callNextMethod, while for an S3 implementation you will need to use NextMethod and name the print methods print.Passenger and print.FreqFlyer.
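One possible sketch of the S4 part of Exercise 3.2, repeating the class definitions from the S4 example so the block runs on its own. The printed format is an arbitrary choice, not the book's solution.

```r
library(methods)

# Class definitions as in the S4 Passenger/FreqFlyer example.
setClass("Passenger", representation(name = "character",
         origin = "character", destination = "character"))
setClass("FreqFlyer", representation(ffnumber = "numeric"),
         contains = "Passenger")

# show() is the S4 generic used for automatic printing.
setMethod("show", "Passenger", function(object) {
  cat("Passenger:", object@name, "\n")
  cat("Flight:", object@origin, "->", object@destination, "\n")
})

setMethod("show", "FreqFlyer", function(object) {
  callNextMethod()  # run the Passenger method on the same object first
  cat("Frequent flyer number:", object@ffnumber, "\n")
})

ff <- new("FreqFlyer", name = "Josephine Physicist", origin = "SEA",
          destination = "YVR", ffnumber = 10)
ff  # auto-printing calls show(), which dispatches to the FreqFlyer method
```

If the passenger lines are later changed, only the Passenger method needs editing; the FreqFlyer output picks up the change through callNextMethod().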

With both S3 and S4, dispatching is implemented through the use of generic functions. In the S3 system, the generic function typically only examines the first argument and dispatches depending on its class. In S3, methods are not explicitly registered but are determined by a function naming convention: the method for a generic such as print and a class such as Passenger is the function named print.Passenger.

The basics of OOP: Dispatch

• A method is a specialized function that can be applied to instances of one or more classes (objects).

• The process of determining the appropriate method to invoke is called dispatch.

• A call to a function, such as plot, will invoke a method that is determined by the class of the first argument in the call to plot.
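Since S4 dispatches on the classes of all arguments in the signature, different argument combinations can select different methods. A minimal sketch with a hypothetical generic joinTwo():

```r
library(methods)

# Hypothetical generic that dispatches on both x and y.
setGeneric("joinTwo", function(x, y) standardGeneric("joinTwo"))

setMethod("joinTwo", signature("numeric", "numeric"),
          function(x, y) x + y)
setMethod("joinTwo", signature("character", "character"),
          function(x, y) paste(x, y))
setMethod("joinTwo", signature("character", "numeric"),
          function(x, y) paste(x, format(y)))

joinTwo(1, 2)       # 3
joinTwo("a", "b")   # "a b"
joinTwo("n =", 3)   # "n = 3"

# No method matches the signature (numeric, character), so dispatch fails.
tryCatch(joinTwo(1, "a"), error = function(e) "no applicable method")
```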

The basics of OOP: Dispatch

• When a generic function is called, it must examine the supplied arguments and determine the applicable methods. All applicable methods are arranged from most specific to least specific and the most specific method is invoked.

• During evaluation, control may be passed to less specific methods by calling NextMethod in S3 and via callNextMethod for S4.
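The ordering from most to least specific, and the hand-off to a less specific method, can be sketched in S4 with a hypothetical three-level hierarchy (Square extends Polygon, which extends Shape) and a hypothetical generic describeIt(). There is no Square method, so the Polygon method, one step up the hierarchy, is preferred over the Shape method, two steps up; callNextMethod() then passes control on to the Shape method.

```r
library(methods)

# Hypothetical hierarchy: Square -> Polygon -> Shape.
setClass("Shape", slots = c(id = "character"))
setClass("Polygon", contains = "Shape")
setClass("Square", contains = "Polygon")

setGeneric("describeIt", function(obj) standardGeneric("describeIt"))
setMethod("describeIt", "Shape", function(obj) "shape")
setMethod("describeIt", "Polygon", function(obj) {
  # callNextMethod() invokes the next, less specific method (Shape here)
  paste("polygon, which is a", callNextMethod())
})

sq4 <- new("Square", id = "s1")
describeIt(sq4)                      # "polygon, which is a shape"
describeIt(new("Shape", id = "x"))   # "shape"
```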

The basics of OOP: Dispatch

• For example, consider a print method for passengers that prints their names and flight details.

• A print method for frequent flyers could simply invoke the passenger method, and then add a line indicating the frequent flyer number. Using this approach, very little additional code is needed; and if the printing of passenger information is changed, the update is automatically applied to printing of frequent flyer information.
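The S3 version of that idea, which also covers the S3 part of Exercise 3.2, might look like this sketch; print.FreqFlyer reuses print.Passenger through NextMethod(), so a change to the passenger output carries over automatically. The output format is an arbitrary choice.

```r
# S3 methods follow the generic.class naming convention.
print.Passenger <- function(x, ...) {
  cat("Passenger:", x$name, "\n")
  cat("Flight:", x$origin, "->", x$destination, "\n")
  invisible(x)
}

print.FreqFlyer <- function(x, ...) {
  NextMethod()  # continue with the next class in class(x): print.Passenger
  cat("Frequent flyer number:", x$ffnumber, "\n")
  invisible(x)
}

y <- list(name = "Josephine Physicist", origin = "SEA",
          destination = "YVR", ffnumber = 10)
class(y) <- c("FreqFlyer", "Passenger")
print(y)
```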

The basics of OOP: Dispatch

• With both S3 and S4, dispatching is implemented through the use of generic functions.

9 OBJECT-ORIENTED PROGRAMMING

Many programmers believe that object-oriented programming (OOP) makes for clearer, more reusable code. Though very different from the familiar OOP languages like C++, Java, and Python, R is very much OOP in outlook. The following themes are key to R:

• Everything you touch in R, ranging from numbers to character strings to matrices, is an object.

• R promotes encapsulation, which is packaging separate but related data items into one class instance. Encapsulation helps you keep track of related variables, enhancing clarity.

• R classes are polymorphic, which means that the same function call leads to different operations for objects of different classes. For instance, a call to print() on an object of a certain class triggers a call to a print function tailored to that class. Polymorphism promotes reusability.

• R allows inheritance, which allows extending a given class to a more specialized class.

The S3 system

• S3 is the original R structure for classes.

• S3 is still the dominant class paradigm in R use today.

• Most of R's built-in classes are of the S3 type.

• An S3 class consists of a list, with a class name attribute and dispatch capability added, which enables the use of generic functions.

• S4 classes were developed later with the goal of adding safety (you cannot accidentally access a class component that does not already exist).

The S3 system

• Generic functions and methods are widely used but there is little use of inheritance and classes are quite loosely defined.

• Some classes are internal or implicit and others are specified explicitly, typically by using the class attribute.

• One determines the class of an object using the function class().


the variables, and possibly about the samples. Defining a suitable class yields self-describing data. The major benefits that we have found to programming with self-describing data are that it is easy to return to a project after some months and re-do an analysis. We have also found that it is relatively easy to hand off a project from one analyst to another. But perhaps the greatest benefit has come from defining specialized subsetting methods, that is, methods for [ that help to construct an appropriate subset of the object, with all variables correctly aligned.

3.3 S3 OOP

The S3 system is relatively easy to describe and to use. It is particularly well suited to interactive use but is not particularly robust. Generic functions and methods are quite widely used but there is little use of inheritance and classes are quite loosely defined. In some sense, all objects in R are instances of some class. Some classes are internal or implicit and others are specified explicitly, typically by using the class attribute. In the S3 system, one determines the class of an object using the function class, and for most purposes this is sufficient; however, there are some important exceptions that arise with respect to internal functions. While there is no formal mechanism for organizing or representing instances of a class, they are typically lists, where the different slots are represented as named elements in the list. Using setOldClass will register an S3 class as an S4 class.

The class attribute is a vector of character values, each of which specifies a particular class. The most specific class comes first, followed by any less specific classes. For our frequent flyer example from Section 3.2.1, the class vector should always have FreqFlyer first and Passenger second. The recommended way of testing whether an S3 object is an instance of a particular class is to use the inherits function. Direct inspection of the class attribute is not recommended since implicit classes, such as matrix and array, are not listed in the class attribute. Notice in the code below that the class of x changes when a dimension attribute is added, that there is no class attribute, and that once x is a matrix it is no longer considered to be an integer. (In R 4.0.0 and later, class(x) reports "matrix" "array" for the matrix rather than just "matrix".)

> x = 1:10
> class(x)
[1] "integer"
> dim(x) = c(2, 5)
> class(x)
[1] "matrix"
> attr(x, "class")
NULL
> inherits(x, "integer")
[1] FALSE

In the next example we return to our FreqFlyer example and provide an S3 implementation.

> x = list(name = "Josephine Biologist", origin = "SEA",
+     destination = "YXY")
> class(x) = "Passenger"
> y = list(name = "Josephine Physicist", origin = "SEA",
+     destination = "YVR", ffnumber = 10)
> class(y) = c("FreqFlyer", "Passenger")
> inherits(x, "Passenger")
[1] TRUE
> inherits(x, "FreqFlyer")
[1] FALSE
> inherits(y, "Passenger")
[1] TRUE

A major problem with this approach is that there is no mechanism that programmers can use to ensure that all instances of the Passenger or FreqFlyer classes have the correct slots, the correct types of values in those slots, and the correct class attribute. One can easily produce an object with these classes that has none of the slots we have defined. As a result, one typically has to do a great deal of checking of arguments in every S3 method.
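A common S3 mitigation, which is a convention rather than part of the S3 system itself, is to create objects only through constructor functions that check their inputs. The helpers new_passenger() and new_freqflyer() below are hypothetical, and nothing prevents a caller from bypassing them and setting the class attribute by hand.

```r
# Hypothetical constructor: checks field types before attaching the class.
new_passenger <- function(name, origin, destination) {
  stopifnot(is.character(name), length(name) == 1,
            is.character(origin), is.character(destination))
  structure(list(name = name, origin = origin, destination = destination),
            class = "Passenger")
}

# Hypothetical constructor: builds a Passenger, adds the extra field,
# and keeps the class vector ordered most specific first.
new_freqflyer <- function(name, origin, destination, ffnumber) {
  stopifnot(is.numeric(ffnumber))
  obj <- new_passenger(name, origin, destination)
  obj$ffnumber <- ffnumber
  class(obj) <- c("FreqFlyer", class(obj))
  obj
}

y <- new_freqflyer("Josephine Physicist", "SEA", "YVR", 10)
class(y)  # "FreqFlyer" "Passenger"

# A wrongly typed name is caught at construction time.
tryCatch(new_passenger(1, "SEA", "YXY"), error = function(e) "rejected")
```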


The function is.object tests whether or not an R object has a class attribute. This is somewhat important as the help page for class indicates that some dispatch is restricted to objects for which is.object is true.

> x = 1:10
> is.object(x)
[1] FALSE
> class(x) = "myint"
> is.object(x)
[1] TRUE

3.3.1 Implicit classes

The earliest versions of the S language predate the widespread use of object-oriented programming and hence the class representations for some of the more primitive or basic classes do not use the class attribute. For example, functions and closures are implicitly of class function while matrices and arrays are implicitly of classes matrix and array, respectively.

> x = matrix(1:10, ncol = 2)
> class(x) = "matrix"
> x
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> is.object(x)
[1] FALSE
> oldClass(x) = "matrix"
> x
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
attr(,"class")
[1] "matrix"
> is.object(x)
[1] TRUE
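The difference between the implicit class reported by class() and the stored class attribute returned by oldClass() can be checked directly. Since R 4.0.0, class() on a matrix returns both "matrix" and "array", so this sketch tests for membership rather than an exact value.

```r
m <- matrix(1:10, ncol = 2)
class(m)       # "matrix" "array" in R 4.0.0 and later
oldClass(m)    # NULL: no class attribute is stored
is.object(m)   # FALSE

f <- function(x) x
class(f)       # "function", an implicit class
oldClass(f)    # NULL
```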

The S3 system

> v <- 1:10
> v
[1]  1  2  3  4  5  6  7  8  9 10
> attributes(v)
NULL
> class(v)
[1] "integer"
> class(v) <- "character"
> attributes(v)
NULL
> class(v)
[1] "character"

This chapter covers OOP in R. We'll discuss programming in the two types of classes, S3 and S4, and then present a few useful OOP-related R utilities.

9.1 S3 Classes

The original R structure for classes, known as S3, is still the dominant class paradigm in R use today. Indeed, most of R's own built-in classes are of the S3 type. An S3 class consists of a list, with a class name attribute and dispatch capability added. The latter enables the use of generic functions, as we saw in Chapter 1. S4 classes were developed later, with the goal of adding safety, meaning that you cannot accidentally access a class component that is not already in existence.
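The safety difference is easiest to see with a misspelled component name. In this minimal sketch with hypothetical classes RecS3 and RecS4, the S3 list silently returns NULL, while S4 slot access raises an error:

```r
library(methods)

# S3: components are list elements, so a typo just yields NULL.
r3 <- structure(list(name = "Ada"), class = "RecS3")
r3$nmae        # NULL, no error

# S4: the available slots are fixed by the class definition.
setClass("RecS4", slots = c(name = "character"))
r4 <- new("RecS4", name = "Ada")
tryCatch(r4@nmae, error = function(e) "error: no such slot")
```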

9.1.1 S3 Generic Functions

As mentioned, R is polymorphic, in the sense that the same function can lead to different operations for different classes. You can apply plot(), for example, to many different types of objects, getting a different type of plot for each. The same is true for print(), summary(), and many other functions.

In this manner, we get a uniform interface to different classes. For example, if you are writing code that includes plot operations, polymorphism may allow you to write your program without worrying about the various types of objects that might be plotted.

In addition, polymorphism certainly makes things easier to remember for the user and makes it fun and convenient to explore new library functions and associated classes. If a function is new to you, just try running plot() on the function's output; it will likely work. From a programmer's viewpoint, polymorphism allows writing fairly general code, without worrying about what type of object is being manipulated, because the underlying class mechanisms take care of that.

The functions that work with polymorphism, such as plot() and print(), are known as generic functions. When a generic function is called, R will then dispatch the call to the proper class method, meaning that it will reroute the call to a function defined for the object's class.

9.1.2 Example: OOP in the lm() Linear Model Function

As an example, let's look at a simple regression analysis run via R's lm() function. First, let's see what lm() does:

> ?lm

The output of this help query will tell you, among other things, that this function returns an object of class "lm". Let's try creating an instance of this object and then printing it:

> x <- c(1,2,3)
> y <- c(1,3,8)
> lmout <- lm(y ~ x)
> class(lmout)
[1] "lm"
> lmout

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
       -3.0          3.5

Here, we printed out the object lmout. (Remember that by simply typing the name of an object in interactive mode, the object is printed.) The R interpreter then saw that lmout was an object of class "lm" and thus called print.lm(), a special print method for the "lm" class. In R terminology, the call to the generic function print() was dispatched to the method print.lm() associated with the class "lm". Let's take a look at the generic function and the class method in this case:

> print
function(x, ...) UseMethod("print")
> print.lm
function (x, digits = max(3, getOption("digits") - 3), ...)
{
    cat("\nCall:\n", deparse(x$call), "\n\n", sep = "")
    if (length(coef(x))) {
        cat("Coefficients:\n")
        print.default(format(coef(x), digits = digits), print.gap = 2,
            quote = FALSE)
    }
    else cat("No coefficients\n")
    cat("\n")
    invisible(x)
}

You may be surprised to see that print() consists solely of a call to UseMethod(). But this is actually the dispatcher function, so in view of print()’s role as a generic function, you should not be surprised after all.
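The same generic/method relationship can be inspected for any lm fit. This sketch uses the built-in cars data rather than the x and y above, and utils::getS3method(), which retrieves a registered S3 method even when it is not exported under that name:

```r
fit <- lm(dist ~ speed, data = datasets::cars)  # data shipped with base R
class(fit)                                       # "lm"

# The generic's body is nothing but the dispatch call.
body(print)                                      # UseMethod("print")

# Look up the method that print(fit) dispatches to.
meth <- utils::getS3method("print", "lm")
is.function(meth)                                # TRUE
```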

OOP in the lm() Linear Model function

# S3 classes
library(car)  # for the Prestige data
mod.prestige <- lm(prestige ~ income + education + women, data = Prestige)
attributes(mod.prestige)
$names
 [1] "coefficients"  "residuals"     "effects"       "rank"          "fitted.values"
 [6] "assign"        "qr"            "df.residual"   "xlevels"       "call"
[11] "terms"         "model"

$class
[1] "lm"

class(mod.prestige)
[1] "lm"

S3 generic functions and methods
• The generic function is responsible for setting up the evaluation environment and for initiating dispatch.
• A generic function does this through a call to UseMethod that initiates the dispatch on a single argument, usually the first argument to the generic function.
• The generic is typically a very simple function with only two formal arguments, one often named x and the other the ... argument.

This chapter covers OOP in R. We’ll discuss programming in the two types of classes, S3 and S4, and then present a few useful OOP-related R utilities.

9.1 S3 Classes
The original R structure for classes, known as S3, is still the dominant class paradigm in R use today. Indeed, most of R’s own built-in classes are of the S3 type. An S3 class consists of a list, with a class name attribute and dispatch capability added. The latter enables the use of generic functions, as we saw in Chapter 1. S4 classes were developed later, with the goal of adding safety, meaning that you cannot accidentally access a class component that is not already in existence.
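The safety difference mentioned above can be seen in a few lines. This is a sketch with a hypothetical "person" class; setClass(), new() and the @ slot operator come from the methods package, which R attaches by default.

```r
# S3: an instance is just a list with a class attribute.
p <- list(name = "Ann", age = 41)
class(p) <- "person"
p$nmae            # misspelled component name: silently returns NULL

# S4: slots are declared in a formal class definition.
library(methods)
setClass("personS4", slots = c(name = "character", age = "numeric"))
q <- new("personS4", name = "Ann", age = 41)
q@name            # "Ann"

# Accessing an undeclared slot is an error rather than a silent NULL.
res <- tryCatch(q@nmae, error = function(e) conditionMessage(e))
res
```

With S3 the typo goes unnoticed until some later computation misbehaves; with S4 it stops immediately at the point of the mistake.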


S3 generic functions and methods

[…] and new arguments that are appropriate to the computations they will perform. A disadvantage of this approach is that mistakes in naming arguments will be silently ignored. The mis-typed name will not match any formal argument and hence is placed in the ... argument, where it is never used. In R, UseMethod dispatches on the class as returned by class, not that returned by oldClass. Not all method dispatch honors implicit classes. In particular, group generics (Section 3.3.5) and internal generics do not. Group generics dispatch on the oldClass for efficiency reasons, and internal generics only dispatch on objects for which is.object is TRUE. An internal generic is a function that calls directly to C code (a primitive or internal function), and there checks to see if it should dispatch. To make use of these, you will need to explicitly set the class attribute. You can do that using class<-, oldClass<- or by setting the attribute directly using attr<-. For most generic functions, a default method will be needed. The default method is invoked if no applicable methods are found, or if the least specific method makes a call to NextMethod.

• Methods are regular functions and are identified by their name, which is a concatenation of the name of the generic and the name of the class that they are intended to apply to, separated by a dot.
• A simple generic function named fun and a default method are shown below. The string default is used as if it were a class and indicates that the method is a default method for the generic.

> fun = function(x, ...) UseMethod("fun")
> fun.default = function(x, ...) print("In the default method")
> fun(2)
[1] "In the default method"

Next, consider a class system with two classes, Foo, which extends Bar. Then we define two methods, fun.Foo and fun.Bar. We have them print out a message, call the function NextMethod, and then print out a second message.

> fun.Foo = function(x) {
+   print("start of fun.Foo")
+   NextMethod()
+   print("end of fun.Foo")
+ }
> fun.Bar = function(x) {
+   print("start of fun.Bar")
+   NextMethod()
+   print("end of fun.Bar")
+ }


80 R Programming for Bioinformatics

Now we can show how dispatch occurs by creating an instance that has both classes and calling fun with that instance as the first argument.

> x = 1
> class(x) = c("Foo", "Bar")
> fun(x)

[1] "start of fun.Foo"
[1] "start of fun.Bar"
[1] "In the default method"
[1] "end of fun.Bar"
[1] "end of fun.Foo"

Notice that the call to NextMethod transfers control to the next most specific method. This is one of the benefits of using an OOP paradigm. Typically, less code needs to be written, and it is easier to maintain: the methods for Foo do not need to know much about Bar and vice versa, since a specific method for each class can handle that class’s computations.
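Because NextMethod() returns whatever the next method returns, a specific method can build on the result of a less specific one instead of only printing messages. A small sketch: the generic describe() is a hypothetical name, and the classes reuse the Foo/Bar layout of the example above.

```r
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default"

# Each method prepends its own label to whatever the next method returns.
describe.Bar <- function(x, ...) c("Bar", NextMethod())
describe.Foo <- function(x, ...) c("Foo", NextMethod())

obj <- structure(1, class = c("Foo", "Bar"))
describe(obj)                          # returns c("Foo", "Bar", "default")
describe(structure(2, class = "Bar"))  # returns c("Bar", "default")
```

The Bar method never mentions Foo, yet a Foo object still gets Bar’s contribution through the chain of NextMethod() calls.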

Exercise 3.4 Returning to our ExpressionSet example, Section 3.3.2, instances of EXPRS3 can be very large and we want to control the default information that is printed by R. Write S3 print methods for the PHENODS3 and EXPRS3 classes.

3.3.3.1 Finding methods
Due to the somewhat simple nature of the S3 system, there is very little introspection or reflection possible. The function methods reports on all available methods for a given generic function, but it does this simply by looking at the names. We demonstrate its use on the S3 generic function mean in the code below.

> methods("mean")

[1] mean.Date       mean.POSIXct    mean.POSIXlt
[4] mean.data.frame mean.default    mean.difftime


Don’t worry about the details of print.lm(). The main point is that the printing depends on context, with a special print function called for the "lm" class. Now let’s see what happens when we print this object with its class attribute removed:

> unclass(lmout)
$coefficients
(Intercept)           x
       -3.0         3.5

$residuals
   1    2    3
 0.5 -1.0  0.5

$effects
(Intercept)           x
  -6.928203   -4.949747    1.224745

$rank
[1] 2
...

I’ve shown only the first few lines here; there’s a lot more. (Try running this on your own!) But you can see that the author of lm() decided to make print.lm() much more concise, limiting it to printing a few key quantities.
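The same inspection can be done programmatically. A short sketch reusing the toy data from the lm() example: unclass() strips the class attribute, leaving an ordinary list whose components names() can list.

```r
x <- c(1, 2, 3)
y <- c(1, 3, 8)
lmout <- lm(y ~ x)

class(lmout)           # "lm"
is.list(lmout)         # TRUE: underneath, the S3 object is a list
bare <- unclass(lmout)
class(bare)            # "list": no class attribute, so print.default() is used
head(names(bare), 4)   # the first few components of the list
coef(lmout)            # intercept -3.0, slope 3.5
```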

9.1.3 Finding the Implementations of Generic Methods
You can find all the implementations of a given generic method by calling methods(), like this:

> methods(print)
 [1] print.acf*
 [2] print.anova
 [3] print.aov*
 [4] print.aovlist*
 [5] print.ar*
 [6] print.Arima*
 [7] print.arima0*
 [8] print.AsIs
 [9] print.aspell*
[10] print.Bibtex*
[11] print.browseVignettes*
[12] print.by
[13] print.check_code_usage_in_package*
[14] print.check_demo_index*
[15] print.checkDocFiles*

One can also use methods to find all available methods for a given class. In the code below we find all methods for the class glm.

> methods(class = "glm")
 [1] add1.glm*           anova.glm
 [3] confint.glm*        cooks.distance.glm*
 [5] deviance.glm*       drop1.glm*
 [7] effects.glm*        extractAIC.glm*
 [9] family.glm*         formula.glm*
[11] influence.glm*      logLik.glm*
[13] model.frame.glm     predict.glm
[15] print.glm           residuals.glm
[17] rstandard.glm       rstudent.glm
[19] summary.glm         vcov.glm*
[21] weights.glm*

   Non-visible functions are asterisked

To retrieve the definition of a method, even those that are not exported from a name space, the function getS3method can be used, as can the more general function getAnywhere. There is no simple way to determine which S3 classes are defined nor much about those classes.
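A short sketch of the lookup functions named above, using methods that ship with R; noSuchClass is a deliberately nonexistent class name used only to show the optional = TRUE behaviour.

```r
library(utils)  # methods(), getS3method() and getAnywhere() are in utils

# methods() lists the methods of a generic, found by their names.
m <- methods("mean")
"mean.default" %in% as.character(m)

# getS3method() retrieves a method definition, exported or not.
f <- getS3method("print", "factor")
is.function(f)

# With optional = TRUE, a missing method gives NULL instead of an error.
getS3method("print", "noSuchClass", optional = TRUE)

# getAnywhere() searches attached packages and loaded namespaces by name.
hits <- getAnywhere("print.factor")
length(hits$objs)
```

Like methods(), these tools work purely from function names; none of them can tell you which S3 classes exist.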

3.3.4 Details of dispatch
This section provides a detailed discussion of how S3 dispatch works, and can be skipped by readers who are not interested in the inner workings of that system. As we noted above, methods are identified based solely on their names, so a function named plot.Foo would be interpreted as a plot method for objects from the class Foo, whether or not that is what the author of that function intended. This can lead to problems, as different package authors may use what they believe are perfectly innocent function names, such as plot.Foo, never intending for them to be dispatched on. For this reason it is advised that you not use function names with an embedded ‘.’ unless they are intended to be S3 methods.
S3 dispatch works essentially as follows. A call to the function UseMethod finds the most specific method and creates a new function call with arguments in the same order and with the same names as they were supplied to the generic function. Any local variables defined in the body of the generic function, before the call to UseMethod, are retained in the evaluation environment. Any statements in the body of the generic function after the call to UseMethod will not be evaluated, as UseMethod does not return. UseMethod dispatches on the value returned by class. In the example below we redefine the Foo method for the function fun in order to demonstrate that some special variables have been installed into the evaluation environment of the method.

The S3 system
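Two of the special variables installed by UseMethod(), .Generic and .Class, can be inspected directly. This sketch uses a hypothetical generic inspect() instead of redefining fun.Foo, and returns the values rather than printing them.

```r
inspect <- function(x, ...) UseMethod("inspect")

inspect.Foo <- function(x, ...) {
  # Variables available in this method's evaluation frame:
  #   .Generic  the name of the generic that dispatched the call
  #   .Class    the class vector used for dispatch, from the matched class onward
  list(generic = .Generic, class = as.character(.Class))
}

x <- structure(1, class = c("Foo", "Bar"))
str(inspect(x))
```

.Class is what NextMethod() consults to decide where to go next, which is why the Foo/Bar example could move from fun.Foo to fun.Bar.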

# S3 generic functions and methods
print                      # the print generic
print.lm                   # print method for "lm" objects
mod.prestige
print(mod.prestige)        # equivalent
print.lm(mod.prestige)     # equivalent, but bad form
methods("print")           # print methods
methods(class="lm")        # methods for objects of class "lm"

 [1] add1.lm*           alias.lm*          anova.lm           case.names.lm*
 [5] confint.lm*        cooks.distance.lm* deviance.lm*       dfbeta.lm*
 [9] dfbetas.lm*        drop1.lm*          dummy.coef.lm*     effects.lm*
[13] extractAIC.lm*     family.lm*         formula.lm*        hatvalues.lm
[17] influence.lm*      kappa.lm           labels.lm*         logLik.lm*
[21] model.frame.lm     model.matrix.lm    plot.lm            predict.lm
[25] print.lm           proj.lm*           residuals.lm       rstandard.lm
[29] rstudent.lm        simulate.lm*       summary.lm         variable.names.lm*
[33] vcov.lm*

   Non-visible functions are asterisked

So, the function is in the utils namespace, and we can execute it by adding such a qualifier:

> utils:::print.aspell(aspout)
mispelled
  wrds:1:15

• You can find the invisible functions via the function getAnywhere().
• You can see all the generic methods this way:

> methods(class="default") ...

9.1.4 Writing S3 Classes
S3 classes have a rather cobbled-together structure. A class instance is created by forming a list, with the components of the list being the member variables of the class. (Readers who know Perl may recognize this ad hoc nature in Perl’s own OOP system.) The "class" attribute is set by hand by using the attr() or class() function, and then various implementations of generic functions are defined. We can see this in the case of lm() by inspecting the function:

> lm
...
    z <- list(coefficients = if (is.matrix(y)) matrix(,0,3) else numeric(0L),
              residuals = y, fitted.values = 0 * y, weights = w, rank = 0L,
              df.residual = if (is.matrix(y)) nrow(y) else length(y))
}
...
class(z) <- c(if(is.matrix(y)) "mlm", "lm")
...

Again, don’t mind the details; the basic process is there. A list was created and assigned to z, which will serve as the framework for the "lm" class instance (and which will eventually be the value returned by the function). Some components of that list, such as residuals, were already assigned when the list was created. In addition, the class attribute was set to "lm" (and possibly to "mlm", as will be explained in the next section).
As an example of how to write an S3 class, let’s switch to something simpler. Continuing our employee example from Section 4.1, we could write this:

> j <- list(name="Joe", salary=55000, union=T)
> class(j) <- "employee"
> attributes(j) # let's check



$names
[1] "name"   "salary" "union"

$class
[1] "employee"

Before we write a print method for this class, let’s see what happens when we call the default print():

> j
$name
[1] "Joe"

$salary
[1] 55000

$union
[1] TRUE

attr(,"class")
[1] "employee"

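Essentially, j was treated as a list for printing purposes. Now let’s write our own print method:

print.employee <- function(wrkr, ...) {   # "..." matches the print() generic's signature
   cat(wrkr$name,"\n")
   cat("salary",wrkr$salary,"\n")
   cat("union member",wrkr$union,"\n")
   invisible(wrkr)                          # print methods conventionally return their argument invisibly
}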

So, any call to print() on an object of class "employee" should now be referred to print.employee(). We can check that formally:

> methods(,"employee")
[1] print.employee

Or, of course, we can simply try it out:

> j
Joe
salary 55000
union member TRUE
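Nothing in S3 stops a caller from building a malformed "employee" list by hand. A common R convention, not something the text above prescribes, is to wrap the list-plus-class steps in a constructor function that checks its inputs. The function name employee() and its particular checks are illustrative assumptions.

```r
# Hypothetical constructor for the "employee" class used above.
employee <- function(name, salary, union = FALSE) {
  # Validate inputs before building the object.
  stopifnot(is.character(name), length(name) == 1,
            is.numeric(salary), salary >= 0,
            is.logical(union))
  obj <- list(name = name, salary = salary, union = union)
  class(obj) <- "employee"
  obj
}

j <- employee("Joe", 55000, union = TRUE)
class(j)                             # "employee"
inherits(j, "employee")              # TRUE
# employee("Joe", salary = "a lot")  # would stop: is.numeric(salary) is not TRUE
```

Creating instances only through such a function keeps every object’s components consistent, which the S3 system itself does not enforce.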


9.1.5 Using Inheritance
The idea of inheritance is to form new classes as specialized versions of old ones. In our previous employee example, for instance, we could form a new class devoted to hourly employees, "hrlyemployee", as a subclass of "employee", as follows:

k <- list(name="Kate", salary= 68000, union=F, hrsthismonth= 2)
class(k) <- c("hrlyemployee","employee")

Our new class has one extra variable: hrsthismonth. The name of the new class consists of two character strings, representing the new class and the old class. Our new class inherits the methods of the old one. For instance, print.employee() still works on the new class:

> k
Kate
salary 68000
union member FALSE

Given the goals of inheritance, that is not surprising. However, it’s important to understand exactly what transpired here.
Once again, simply typing k resulted in the call print(k). In turn, that caused UseMethod() to search for a print method on the first of k’s two class names, "hrlyemployee". That search failed, so UseMethod() tried the other class name, "employee", and found print.employee(). It executed the latter.
Recall that in inspecting the code for "lm", you saw this line:

class(z) <- c(if(is.matrix(y)) "mlm", "lm")

You can now see that "mlm" is a subclass of "lm" for vector-valued response variables.
9.1.6 Extended Example: A Class for Storing Upper-Triangular Matrices
Now it’s time for a more involved example, in which we will write an R class "ut" for upper-triangular matrices. These are square matrices whose elements below the diagonal are zeros, such as shown in Equation 9.1.

\[
\begin{pmatrix}
1 & 5 & 12 \\
0 & 6 & 9 \\
0 & 0 & 2
\end{pmatrix}
\tag{9.1}
\]

Our motivation here is to save storage space (though at the expense of a little extra access time) by storing only the nonzero portion of the matrix.

NOTE The R class "dist" also uses such storage, though in a more focused context and without the class functions we have here.
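Before building the full class, the storage idea itself can be sketched in a few lines. This is a simplified illustration, not the "ut" implementation the text goes on to develop: the helper names pack_ut() and unpack_ut() are hypothetical, and the upper triangle (diagonal included) is stored column by column in a plain numeric vector.

```r
# Keep only the upper triangle, diagonal included, in column-major order.
pack_ut <- function(m) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  structure(list(vals = m[upper.tri(m, diag = TRUE)], n = nrow(m)),
            class = "ut")
}

# Rebuild the full square matrix, filling the lower triangle with zeros.
unpack_ut <- function(u) {
  m <- matrix(0, u$n, u$n)
  m[upper.tri(m, diag = TRUE)] <- u$vals
  m
}

a <- rbind(c(1, 5, 12),
           c(0, 6,  9),
           c(0, 0,  2))   # the matrix in Equation 9.1
u <- pack_ut(a)
u$vals                    # 6 stored values instead of 9
identical(unpack_ut(u), a)
```

For an n-by-n matrix this stores n(n+1)/2 numbers instead of n^2, and the cost of the saving is the extra indexing work on every access.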

S3 system: group generics
• The S3 object system has the capability for defining methods for groups of functions simultaneously. These tools are mainly used to define methods for three defined sets of operators.
• This means that operators such as == or < can have their behavior modified for members of special classes.
• The functions and operators have been grouped into three categories and group methods can be written for each of these categories.

Group generic functions


Group     Functions
Math      abs, acos, acosh, asin, asinh, atan, atanh, ceiling, cos, cosh, cumsum, exp, floor, gamma, lgamma, log, log10, round, signif, sin, sinh, tan, tanh, trunc
Summary   all, any, max, min, prod, range, sum
Ops       +, -, *, /, ^, <, >, <=, >=, !=, ==, %%, %/%, &, |, !

Table 3.1: Group generic functions.

Generic functions require no special markup, but must be exported if they are intended for others to use.

3.3.5 Group generics

The S3 object system also has the capability for defining methods for groups of functions simultaneously. These tools are mainly used to define methods for three defined sets of operators. For several types of built-in functions, R provides a dispatching mechanism for operators. This means that operators such as == or < can have their behavior modified for members of special classes. The functions and operators have been grouped into three categories, and group methods can be written for each of these categories. There is currently no mechanism to add groups. It is possible to write methods specific to any function within a group, and a method defined for a single member of a group takes precedence over the group method.

The three groups of operators (Table 3.1) are called Math, Summary and Ops. The online help system provides more detail, as do R Development Core Team (2007b), Chambers and Hastie (1992), Venables and Ripley (2000), and Chambers (2008).

The method used for an operator in the Ops group is determined as follows. If both operands correspond to the same method, or if one operand corresponds to a method that takes precedence over that of the other operand, then that method is used. If both operands have methods and the methods conflict, then the default method is used. If only one operand has a corresponding method, then that operand's method is used. Class methods dominate group methods.
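Here is a minimal sketch of an Ops group method that makes these rules concrete. It uses a hypothetical class "money" with an amount field, an assumption made for illustration that does not come from the text above. One Ops.money() function handles every binary operator in the group, and inside it .Generic holds the name of the operator that triggered dispatch.

```r
# Minimal sketch of an S3 Ops group method.
# The class "money" and its field "amount" are hypothetical.
money <- function(amount) structure(list(amount = amount), class = "money")

Ops.money <- function(e1, e2) {
  if (missing(e2)) stop("unary operators are not supported in this sketch")
  # Unwrap whichever operands are "money"; leave plain numbers alone
  a1 <- if (inherits(e1, "money")) e1$amount else e1
  a2 <- if (inherits(e2, "money")) e2$amount else e2
  # .Generic is the operator name, e.g. "+" or ">"; get() fetches it
  result <- get(.Generic)(a1, a2)
  # Arithmetic results stay "money"; comparisons return plain logicals
  if (.Generic %in% c("+", "-", "*", "/")) money(result) else result
}

x <- money(10)
y <- money(4)
(x + y)$amount   # both operands share Ops.money()
x > y            # comparison: a plain logical
(x * 2)$amount   # only e1 has a method, so it is used
(2 * x)$amount   # only e2 has a method, so it is used
```

In x + y both operands correspond to the same method. In x * 2 and 2 * x only one operand has a method, so that method is used, as the rules above describe.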

3.3.6 S3 replacement methods

It is possible in R to have a complex statement as the left-hand side of an assignment; such an assignment is carried out by a replacement function (see Section 2.7 for more details). The general idea is easily extended to generic functions.

Group generic functions

• It is possible to write methods specific to any function within a group; a method defined for a single member of the group takes precedence over the group method.
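The precedence rule in the bullet above can be checked with a small sketch. The class "temp" is hypothetical, and so is its via field, which records which method produced a result. Math.temp() covers every function in the Math group. round.temp() is a method for one member of the group, so it should take over for round().

```r
# Sketch: a member-specific method overrides the group method.
# Class "temp" and its "via" field are hypothetical illustration aids.
temp <- function(deg, via = NA) structure(list(deg = deg, via = via), class = "temp")

# Group method: used for floor(), ceiling(), sqrt(), ... on "temp" objects
Math.temp <- function(x, ...) {
  temp(get(.Generic)(x$deg, ...), via = "Math.temp")
}

# Method for a single member of the Math group: wins for round()
round.temp <- function(x, digits = 0, ...) {
  temp(round(x$deg, digits), via = "round.temp")
}

t1 <- temp(21.76)
floor(t1)$via      # handled by the group method
round(t1, 1)$via   # handled by the specific method
```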

The S3 system

# S3 "inheritance"
library(carData)   # provides the Mroz data (also attached by library(car))
mod.mroz <- glm(lfp ~ ., family=binomial, data=Mroz)
class(mod.mroz)    # "glm" "lm"

The S3 system

# Example: a logistic-regression function
# (predictors is evaluated lazily, after the constant column is added,
#  so it includes "Constant" when constant=TRUE)
lreg3 <- function(X, y, predictors=colnames(X), max.iter=10,
                  tol=1E-6, constant=TRUE) {
    if (!is.numeric(X) || !is.matrix(X))   # data checks
        stop("X must be a numeric matrix")
    if (!is.numeric(y) || !all(y == 0 | y == 1))
        stop("y must contain only 0s and 1s")
    if (nrow(X) != length(y))
        stop("X and y contain different numbers of observations")
    if (constant) {   # attach constant?
        X <- cbind(1, X)
        colnames(X)[1] <- "Constant"
    }
    b <- b.last <- rep(0, ncol(X))
    it <- 1
    while (it <= max.iter){
        p <- as.vector(1/(1 + exp(-X %*% b)))
        var.b <- solve(crossprod(X, p * (1 - p) * X))
        b <- b + var.b %*% crossprod(X, y - p)
        if (max(abs(b - b.last)/(abs(b.last) + 0.01*tol)) < tol) break
        b.last <- b
        it <- it + 1
    }
    if (it > max.iter) warning("maximum iterations exceeded")
    dev <- -2*sum(y*log(p) + (1 - y)*log(1 - p))
    result <- list(coefficients=as.vector(b), var=var.b, deviance=dev,
                   converged= it <= max.iter, predictors=predictors)
    class(result) <- "lreg3"   # assign class
    result
}

The S3 system

Mroz$lfp <- with(Mroz, ifelse(lfp == "yes", 1, 0))
Mroz$wc <- with(Mroz, ifelse(wc == "yes", 1, 0))
Mroz$hc <- with(Mroz, ifelse(hc == "yes", 1, 0))

mod.mroz.3 <- with(Mroz, lreg3(cbind(k5, k618, age, wc, hc, lwg, inc), lfp))

class(mod.mroz.3)
mod.mroz.3   # whoops! no print method yet, so the raw list is printed

print.lreg3 <- function(x, ...) {   # print method for class "lreg3"
    coef <- x$coefficients
    names(coef) <- x$predictors
    print(coef)
    if (!x$converged) cat("\n *** lreg did not converge ***\n")
    invisible(x)   # note: passes through argument invisibly
}
mod.mroz.3

The S3 system

summary   # summary generic
summary.lreg3 <- function(object, ...) {   # summary method for class "lreg3"
    b <- object$coefficients
    se <- sqrt(diag(object$var))
    z <- b/se
    table <- cbind(b, se, z, 2*(1-pnorm(abs(z))))   # two-sided p-values
    colnames(table) <- c("Estimate", "Std.Err", "Z value", "Pr(>|z|)")
    rownames(table) <- object$predictors
    result <- list(coef=table, deviance=object$deviance,
                   converged=object$converged)
    class(result) <- "summary.lreg3"   # creates an object of class "summary.lreg3"
    result
}
print.summary.lreg3 <- function(x, ...) {   # print method for class "summary.lreg3"
    printCoefmat(x$coef)
    cat("\nDeviance =", x$deviance,"\n")
    if (!x$converged) cat("\n Note: *** lreg did not converge ***\n")
    invisible(x)
}
summary(mod.mroz.3)
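summary.lreg3() and print.summary.lreg3() follow a common S3 convention. The summary method computes and returns an object of its own class, and a separate print method only displays it. The sketch below shows the same pattern on a deliberately tiny, hypothetical class "tally", so the computed values stay accessible without parsing printed output.

```r
# Sketch of the summary()/print.summary pattern.
# The class "tally" is hypothetical and unrelated to lreg3.
tally <- function(x) structure(list(x = x), class = "tally")

# summary method: compute, do not print; return an object of a new class
summary.tally <- function(object, ...) {
  res <- list(n = length(object$x), mean = mean(object$x))
  class(res) <- "summary.tally"
  res
}

# print method for the summary class: display only
print.summary.tally <- function(x, ...) {
  cat(sprintf("n = %d, mean = %g\n", x$n, x$mean))
  invisible(x)
}

s <- summary(tally(c(2, 4, 9)))
s        # auto-printing dispatches to print.summary.tally()
s$mean   # computed values remain available for further use
```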

The S3 system

# writing a generic function
names(summary(mod.prestige))
rsq <- function(model, ...) {
    UseMethod("rsq")
}
rsq.lm <- function(model, adjusted=FALSE, ...) {
    summary <- summary(model)
    if (adjusted) summary$adj.r.squared else summary$r.squared
}
rsq(mod.prestige)
rsq(mod.prestige, adjusted=TRUE)
rsq(mod.mroz)   # inherits rsq.lm(), but a glm summary has no r.squared (NULL)
names(summary(mod.mroz))
names(mod.mroz)
rsq.glm <- function(model, ...) {
    1 - model$deviance/model$null.deviance
}
rsq(mod.mroz)

9.1.5 Using Inheritance

The idea of inheritance is to form new classes as specialized versions of old ones. In our previous employee example, for instance, we could form a new class devoted to hourly employees, "hrlyemployee", as a subclass of "employee", as follows:

k <- list(name="Kate", salary= 68000, union=F, hrsthismonth= 2)
class(k) <- c("hrlyemployee","employee")

Our new class has one extra variable: hrsthismonth. The name of the new class consists of two character strings, representing the new class and the old class. Our new class inherits the methods of the old one. For instance, print.employee() still works on the new class:

> k
Kate
salary 68000
union member FALSE

Given the goals of inheritance, that is not surprising. However, it’s important to understand exactly what transpired here. Once again, simply typing k resulted in the call print(k). In turn, that caused UseMethod() to search for a print method on the first of k’s two class names, "hrlyemployee". That search failed, so UseMethod() tried the other class name, "employee", and found print.employee(). It executed the latter.

Recall that in inspecting the code for "lm", you saw this line:

class(z) <- c(if(is.matrix(y)) "mlm", "lm")

You can now see that "mlm" is a subclass of "lm" for vector-valued response variables.
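The dispatch sequence just described can be reproduced in a few lines. In this sketch print.employee() is a stand-in written for illustration: it mimics the output shown above and does not reproduce the earlier definition. print.hrlyemployee() is a hypothetical addition. It shows how a subclass method can do its own work and then hand off to the parent class's method with NextMethod().

```r
# Stand-in print method for "employee" (illustrative, mirrors the output above)
print.employee <- function(x, ...) {
  cat(x$name, "\n", sep = "")
  cat("salary ", x$salary, "\n", sep = "")
  cat("union member ", x$union, "\n", sep = "")
  invisible(x)
}

k <- list(name = "Kate", salary = 68000, union = FALSE, hrsthismonth = 2)
class(k) <- c("hrlyemployee", "employee")

print(k)                   # no print.hrlyemployee() yet, so print.employee() runs
inherits(k, "employee")    # TRUE: "employee" is among k's classes

# Hypothetical subclass method: reuse the parent's output, then add to it
print.hrlyemployee <- function(x, ...) {
  NextMethod()             # dispatches to print.employee(x)
  cat("hours this month ", x$hrsthismonth, "\n", sep = "")
  invisible(x)
}
print(k)
```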

S3 example: A class for storing upper triangular matrices

• We will write an R class "ut" for upper-triangular matrices (square matrices whose elements below the diagonal are zeros).
• The motivation is to save storage space. For example, the matrix in Equation 9.1 will be stored as
  > mat <- c(1,5,6,12,9,2)

9.1.6 Extended Example: A Class for Storing Upper-Triangular Matrices

Now it’s time for a more involved example, in which we will write an R class "ut" for upper-triangular matrices. These are square matrices whose elements below the diagonal are zeros, such as shown in Equation 9.1.

$$
\begin{pmatrix} 1 & 5 & 12 \\ 0 & 6 & 9 \\ 0 & 0 & 2 \end{pmatrix}
\qquad (9.1)
$$

Our motivation here is to save storage space (though at the expense of a little extra access time) by storing only the nonzero portion of the matrix.

NOTE The R class "dist" also uses such storage, though in a more focused context and without the class functions we have here.

The component mat of this class will store the matrix. As mentioned, to save on storage space, only the diagonal and above-diagonal elements will be stored, in column-major order. Storage for the matrix (9.1), for instance, consists of the vector (1,5,6,12,9,2), and the component mat has that value. We will include a component ix in this class, to show where in mat the various columns begin. For the preceding case, ix is c(1,2,4), meaning that column 1 begins at mat[1], column 2 begins at mat[2], and column 3 begins at mat[4]. This allows for handy access to individual elements or columns of the matrix. The following is the code for our class.

1  # class "ut", compact storage of upper-triangular matrices
2
3  # utility function, returns 1+...+i
4  sum1toi <- function(i) return(i*(i+1)/2)
5
6  # create an object of class "ut" from the full matrix inmat (0s included)
7  ut <- function(inmat) {   # constructor
8     n <- nrow(inmat)
9     rtrn <- list()   # start to build the object
10    class(rtrn) <- "ut"
11    rtrn$mat <- vector(length=sum1toi(n))
12    rtrn$ix <- sum1toi(0:(n-1)) + 1   # where in mat each column begins
13    for (i in 1:n) {
14       # store column i
15       ixi <- rtrn$ix[i]
16       rtrn$mat[ixi:(ixi+i-1)] <- inmat[1:i,i]
17    }
18    return(rtrn)
19 }
20
21 # uncompress utmat to a full matrix
22 expandut <- function(utmat) {
23    n <- length(utmat$ix)   # numbers of rows and cols of matrix
24    fullmat <- matrix(nrow=n,ncol=n)
25    for (j in 1:n) {
26       # fill jth column
27       start <- utmat$ix[j]
28       fin <- start+j-1
29       abovediagj <- utmat$mat[start:fin]   # above-diag part of col j
30       fullmat[,j] <- c(abovediagj,rep(0,n-j))
31    }
32    return(fullmat)
33 }
34
35 # print matrix (S3 print methods take x and ...)
36 print.ut <- function(x, ...)
37    print(expandut(x))


38
39 # multiply one ut matrix by another, returning another ut instance;
40 # implement as a binary operation
41 "%mut%" <- function(utmat1,utmat2) {
42    n <- length(utmat1$ix)   # numbers of rows and cols of matrix
43    utprod <- ut(matrix(0,nrow=n,ncol=n))   # allocate space for the product
44    for (i in 1:n) {   # compute col i of product
45       # let a[j] and bj denote columns j of utmat1 and utmat2, respectively,
46       # so that, e.g. b2[1] means element 1 of column 2 of utmat2
47       # then column i of product is equal to
48       # bi[1]*a[1] + ... + bi[i]*a[i]
49       # find index of start of column i in utmat2
50       startbi <- utmat2$ix[i]
51       # initialize vector that will become bi[1]*a[1] + ... + bi[i]*a[i]
52       prodcoli <- rep(0,i)
53       for (j in 1:i) {   # find bi[j]*a[j], add to prodcoli
54          startaj <- utmat1$ix[j]
55          bielement <- utmat2$mat[startbi+j-1]
56          prodcoli[1:j] <- prodcoli[1:j] +
57             bielement * utmat1$mat[startaj:(startaj+j-1)]
58       }
59       # store column i of the product; its lower 0s are implicit
60       startprodcoli <- sum1toi(i-1)+1   # same position as startbi
61       utprod$mat[startbi:(startbi+i-1)] <- prodcoli
62    }
63    return(utprod)
64 }
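The storage scheme behind "ut" (the mat and ix components described earlier) can be explored without the class machinery. This sketch builds both components by hand for the matrix in Equation 9.1. It then reads back element (i, j) as mat[ix[j] + i - 1]. The helper name utelt() is hypothetical and is not part of the listing.

```r
# Sketch: the compact "ut" layout built by hand for Equation 9.1.
full <- rbind(c(1, 5, 12),
              c(0, 6,  9),
              c(0, 0,  2))
n <- nrow(full)

# Column-major storage of the diagonal and above-diagonal elements
mat <- unlist(lapply(1:n, function(j) full[1:j, j]))
mat   # 1 5 6 12 9 2

# ix[j] = start of column j in mat = 1 + (0 + 1 + ... + (j-1)),
# the same values as sum1toi(0:(n-1)) + 1 in the listing
ix <- (0:(n - 1)) * (1:n) / 2 + 1
ix    # 1 2 4

# Hypothetical accessor for element (i, j); below-diagonal entries are 0
utelt <- function(mat, ix, i, j) if (i > j) 0 else mat[ix[j] + i - 1]
utelt(mat, ix, 2, 3)   # 9
utelt(mat, ix, 3, 1)   # 0
```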

Let’s test it.

> test
function() {
   utm1 <- ut(rbind(1:2,c(0,2)))
   utm2 <- ut(rbind(3:2,c(0,1)))
   utp <- utm1 %mut% utm2
   print(utm1)
   print(utm2)
   print(utp)
   utm1 <- ut(rbind(1:3,0:2,c(0,0,5)))
   utm2 <- ut(rbind(4:2,0:2,c(0,0,1)))
   utp <- utm1 %mut% utm2
   print(utm1)
   print(utm2)
   print(utp)
}
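Because %mut% works column by column, its results can be cross-checked against R's dense %*% on the 3-by-3 matrices that test() uses. The sketch below uses plain base R, and colcombo() is a hypothetical helper. It rebuilds column i of the product as B[1,i]*A[,1] + ... + B[i,i]*A[,i], which is the linear combination described in the comments of %mut%.

```r
# Sketch: the column view of the product, checked with dense matrices.
A <- rbind(1:3, 0:2, c(0, 0, 5))
B <- rbind(4:2, 0:2, c(0, 0, 1))
P <- A %*% B   # ordinary dense product, for reference

# Column i of A %*% B as a combination of A's columns; terms with
# j > i are omitted because B is upper triangular (B[j, i] = 0 there)
colcombo <- function(A, B, i) {
  out <- numeric(nrow(A))
  for (j in 1:i) out <- out + B[j, i] * A[, j]
  out
}

colcombo(A, B, 3)   # 9 4 5, i.e. 2*A[,1] + 2*A[,2] + 1*A[,3]
```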

Column i of the product can be expressed as a linear combination of the columns of the first factor. It will help to see a specific example of this property, shown in Equation 9.2.

$$
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 5 \end{pmatrix}
\begin{pmatrix} 4 & 3 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 4 & 5 & 9 \\ 0 & 1 & 4 \\ 0 & 0 & 5 \end{pmatrix}
\qquad (9.2)
$$

The comments in the code say that, for instance, column 3 of the product is equal to the following:

$$
2 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
+ 2 \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}
+ 1 \begin{pmatrix} 3 \\ 2 \\ 5 \end{pmatrix}
= \begin{pmatrix} 9 \\ 4 \\ 5 \end{pmatrix}
$$

Inspection of Equation 9.2 confirms the relation. Couching the multiplication problem in terms of columns of the two input matrices enables us to compact the code and likely to increase its speed. The latter again stems from vectorization, a benefit discussed in detail in Chapter 14. This approach is used in the loop beginning at line 53. (Arguably, in this case, the increase in speed comes at the expense of readability of the code.)


S3 Example: A procedure for polynomial regression

Consider a statistical regression setting with one predictor variable.

In principle, you can get better and better models by fitting polynomials of higher and higher degrees.

However, at some point, this becomes overfitting, so that the prediction of new, future data actually deteriorates for degrees higher than some value.

S3 Example: A procedure for polynomial regression

The class "polyreg" aims to deal with this issue. It fits polynomials of various degrees but assesses fits via cross-validation to reduce the risk of overfitting.

In this form of cross-validation, known as the leaving-one-out method, for each point we fit the regression to all the data except this observation, and then we predict that observation from the fit.

An object of this class consists of outputs from the various regression models, plus the original data.


9.1.7 Extended Example: A Procedure for Polynomial Regression

As another example, consider a statistical regression setting with one predictor variable. Since any statistical model is merely an approximation, in principle, you can get better and better models by fitting polynomials of higher and higher degrees. However, at some point, this becomes overfitting, so that the prediction of new, future data actually deteriorates for degrees higher than some value.

The class "polyreg" aims to deal with this issue. It fits polynomials of various degrees but assesses fits via cross-validation to reduce the risk of overfitting. In this form of cross-validation, known as the leaving-one-out method, for each point we fit the regression to all the data except this observation, and then we predict that observation from the fit. An object of this class consists of outputs from the various regression models, plus the original data.

The following is the code for the "polyreg" class.

# "polyreg," S3 class for polynomial regression in one predictor variable

# polyfit(y,x,maxdeg) fits all polynomials up to degree maxdeg; y is
# vector for response variable, x for predictor; creates an object of
# class "polyreg"
polyfit <- function(y,x,maxdeg) {
   # form powers of predictor variable, ith power in ith column
   pwrs <- powers(x,maxdeg)   # could use orthog polys for greater accuracy
   lmout <- list()   # start to build class
   class(lmout) <- "polyreg"   # create a new class
   for (i in 1:maxdeg) {
      lmo <- lm(y ~ pwrs[,1:i])
      # extend the lm class here, with the cross-validated predictions
      lmo$fitted.cvvalues <- lvoneout(y,pwrs[,1:i,drop=FALSE])
      lmout[[i]] <- lmo
   }
   lmout$x <- x
   lmout$y <- y
   return(lmout)
}

# print() for an object of class "polyreg": print
# cross-validated mean-squared prediction errors
print.polyreg <- function(x, ...) {
   fits <- x
   maxdeg <- length(fits) - 2   # every element except x and y is a fit
   n <- length(fits$y)
   tbl <- matrix(nrow=maxdeg,ncol=1)
   colnames(tbl) <- "MSPE"
   for (i in 1:maxdeg) {
      fi <- fits[[i]]
      errs <- fits$y - fi$fitted.cvvalues
      spe <- crossprod(errs,errs)   # sum of squared prediction errors
      tbl[i,1] <- spe/n
   }
   cat("mean squared prediction errors, by degree\n")
   print(tbl)
   invisible(x)
}

# forms matrix of powers of the vector x, through degree dg
powers <- function(x,dg) {
   pw <- matrix(x,nrow=length(x))
   prod <- x
   for (i in seq_len(dg - 1)) {   # seq_len() avoids looping over 2:1 when dg is 1
      prod <- prod * x
      pw <- cbind(pw,prod)
   }
   return(pw)
}

# finds cross-validated predicted values; could be made much faster via
# matrix-update methods
lvoneout <- function(y,xmat) {
   n <- length(y)
   predy <- vector(length=n)
   for (i in 1:n) {
      # regress, leaving out ith observation
      lmo <- lm(y[-i] ~ xmat[-i,])
      betahat <- as.vector(coef(lmo))
      # the 1 accommodates the constant term
      predy[i] <- betahat %*% c(1,xmat[i,])
   }
   return(predy)
}

# polynomial function of x, coefficients cfs (note: masks stats::poly())
poly <- function(x,cfs) {
   val <- cfs[1]
   prod <- 1
   dg <- length(cfs) - 1
   for (i in seq_len(dg)) {
      prod <- prod * x
      val <- val + cfs[i+1] * prod
   }
   return(val)   # value of the polynomial at x
}

As you can see, "polyreg" consists of polyfit(), the constructor function, and print.polyreg(), a print function tailored to this class. It also contains several utility functions to evaluate powers and polynomials and to perform cross-validation. (Note that in some cases here, efficiency has been sacrificed for clarity.) As an example of using the class, we’ll generate some artificial data and create an object of class "polyreg" from it, printing out the results.

> n <- 60
> x <- (1:n)/n
> y <- vector(length=n)
> for (i in 1:n) y[i] <- sin((3*pi/2)*x[i]) + x[i]^2 + rnorm(1,mean=0,sd=0.5)
> dg <- 15
> (lmo <- polyfit(y,x,dg))
mean squared prediction errors, by degree
           MSPE
 [1,] 0.4200127
 [2,] 0.3212241
 [3,] 0.2977433
 [4,] 0.2998716
 [5,] 0.3102032
 [6,] 0.3247325
 [7,] 0.3120066
 [8,] 0.3246087
 [9,] 0.3463628
[10,] 0.4502341
[11,] 0.6089814
[12,] 0.4499055
[13,]        NA
[14,]        NA
[15,]        NA
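The comment on lvoneout() notes that it could be made much faster via matrix-update methods. The sketch below shows one standard least-squares shortcut; it is offered as an alternative, not as the book's method. The leave-one-out residual for observation i equals e_i / (1 - h_ii). Here e_i is the ordinary residual and h_ii is the i-th diagonal element of the hat matrix, which R returns via hatvalues(). The names loo_fast() and loo_slow() are hypothetical, and the simulated data only imitate the example above.

```r
# Sketch: leave-one-out predictions from a single least-squares fit.
# Uses the identity y_i - yhat_(-i) = e_i / (1 - h_ii).
loo_fast <- function(y, xmat) {
  fit <- lm(y ~ xmat)
  y - residuals(fit) / (1 - hatvalues(fit))   # cross-validated predictions
}

# Brute-force version for comparison (same idea as lvoneout(): n refits)
loo_slow <- function(y, xmat) {
  xmat <- as.matrix(xmat)
  n <- length(y)
  predy <- numeric(n)
  for (i in 1:n) {
    fit <- lm(y[-i] ~ xmat[-i, , drop = FALSE])
    predy[i] <- sum(coef(fit) * c(1, xmat[i, ]))   # 1 for the intercept
  }
  predy
}

set.seed(1)
x <- (1:30) / 30
y <- sin(3 * pi / 2 * x) + x^2 + rnorm(30, sd = 0.5)
xm <- cbind(x, x^2, x^3)   # cubic fit
all.equal(unname(loo_fast(y, xm)), loo_slow(y, xm))
```

These predictions give the MSPE for one degree as mean((y - loo_fast(y, xm))^2). That is the quantity print.polyreg() tabulates, computed here from one fit instead of n.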


A plot() method for the class can be written in the same way. The following version draws the raw data and then, on request, overlays the polynomial fits of chosen degrees:

# generic plot(); plots fits against raw data
plot.polyreg <- function(x, ...) {
   fits <- x
   plot(fits$x,fits$y,xlab="X",ylab="Y")   # plot data points as background
   maxdg <- length(fits) - 2
   cols <- c("red","green","blue")
   dg <- curvecount <- 1
   while (dg < maxdg) {
      prompt <- paste("RETURN for XV fit for degree",dg,"or type degree",
         "or q for quit ")
      rl <- readline(prompt)
      dg <- if (rl == "") dg else if (rl != "q") as.integer(rl) else break
      lines(fits$x,fits[[dg]]$fitted.values,col=cols[curvecount%%3 + 1])
      dg <- dg + 1
      curvecount <- curvecount + 1
   }
}

S4 classes

Some programmers feel that S3 does not provide the safety normally associated with OOP.

For example, consider our earlier employee database example, where our class "employee" had three fields: name, salary, and union. Here are some possible mishaps:
• We forget to enter the union status.
• We misspell union as onion.
• We create an object of some class other than "employee" but accidentally set its class attribute to "employee".

In each of these cases, R will not complain.

The goal of S4 is to elicit a complaint and prevent such accidents.
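The sketch below shows how S4 complains about the first two mishaps, plus a slot given the wrong type. It defines "employee" with setClass() and uses the slots= argument, the style current setClass() documentation favors over the older representation(). It also adds a validity function requiring exactly one union value. That rule is an assumption added for illustration; without it, new() would silently accept a missing union slot as logical(0).

```r
# Sketch: S4 checks that catch the mishaps above.
# The validity rule (exactly one union value) is an illustrative assumption.
setClass("employee",
         slots = c(name = "character", salary = "numeric", union = "logical"),
         validity = function(object) {
           if (length(object@union) != 1) "union must be a single TRUE/FALSE"
           else TRUE
         })

# A correctly built object
joe <- new("employee", name = "Joe", salary = 55000, union = TRUE)
joe@salary

# 1. Forgetting union: rejected by the validity function
r1 <- tryCatch(new("employee", name = "Joe", salary = 55000),
               error = function(e) "error")
# 2. Misspelling union as onion: not a slot of "employee"
r2 <- tryCatch(new("employee", name = "Joe", salary = 55000, onion = TRUE),
               error = function(e) "error")
# 3. Wrong type: "yes" is a character string, not a logical
r3 <- tryCatch(new("employee", name = "Joe", salary = 55000, union = "yes"),
               error = function(e) "error")
c(r1, r2, r3)   # each attempt failed with an error
```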

Overview of the differences between S3 and S4 classes

S4 structures are considerably richer than S3 structures, but here we present just the basics. Table 9-1 shows an overview of the differences between the two classes.

Table 9-1: Basic Differences Between S3 and S4

Operation                  S3                             S4
Define class               Implicit in constructor code   setClass()
Create object              Build list, set class attr     new()
Reference member variable  $                              @
Implement generic f()      Define f.classname()           setMethod()
Declare generic            UseMethod()                    setGeneric()
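The following sketch places the rows of Table 9-1 side by side. The class names employee3 and employee4 and the generics describe() and describe4() are hypothetical. They are chosen only so that the S3 and S4 versions do not collide in one session:

```r
## S3: the class is implicit in the constructor code
employee3 <- function(name, salary, union) {
  structure(list(name = name, salary = salary, union = union),
            class = "employee3")                     # build list, set class attr
}
describe <- function(x, ...) UseMethod("describe")  # declare generic
describe.employee3 <- function(x, ...) {            # define f.classname()
  paste(x$name, "earns", x$salary)                  # members referenced with $
}

## S4: the class is declared explicitly
setClass("employee4",
         representation(name = "character", salary = "numeric",
                        union = "logical"))          # define class
setGeneric("describe4", function(x) standardGeneric("describe4"))  # declare generic
setMethod("describe4", "employee4", function(x) {                  # implement generic
  paste(x@name, "earns", x@salary)                                 # members referenced with @
})

j3 <- employee3("Joe", 55000, TRUE)
j4 <- new("employee4", name = "Joe", salary = 55000, union = TRUE)  # create object
describe(j3)
describe4(j4)
```

Both calls return the same string. The difference is that the S4 class and generic are declared up front, which lets R check slot names and types.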

9.2.1 Writing S4 Classes

You define an S4 class by calling setClass(). Continuing our employee example, we could write the following:

> setClass("employee",
+   representation(
+     name="character",
+     salary="numeric",
+     union="logical")
+ )
[1] "employee"

This defines a new class, "employee", with three member variables of the specified types. Now let’s create an instance of this class, for Joe, using new(), a built-in constructor function for S4 classes:

> joe <- new("employee",name="Joe",salary=55000,union=T)
> joe
An object of class "employee"
Slot "name":
[1] "Joe"

Slot "salary":
[1] 55000

Slot "union":
[1] TRUE

Note that the member variables are called slots, referenced via the @ symbol. Here's an example:

> joe@salary
[1] 55000

We can also use the slot() function, say, as another way to query Joe’s salary:

> slot(joe,"salary")
[1] 55000

We can assign components similarly. Let’s give Joe a raise:

> joe@salary <- 65000
> joe
An object of class "employee"
Slot "name":
[1] "Joe"

Slot "salary":
[1] 65000

Slot "union":
[1] TRUE

Nah, he deserves a bigger raise than that:

> slot(joe,"salary") <- 88000
> joe
An object of class "employee"
Slot "name":
[1] "Joe"

Slot "salary":
[1] 88000

Slot "union":
[1] TRUE


As noted, an advantage of using S4 is safety. To illustrate this, suppose we were to accidentally spell salary as salry, like this:

> joe@salry <- 48000
Error in checkSlotAssignment(object, name, value) :
  "salry" is not a slot in class "employee"

By contrast, in S3 there would be no error message. S3 classes are just lists, and you are allowed to add a new component (deliberately or not) at any time.
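The same check applies beyond misspelled slot names. This sketch repeats the employee definition so that it runs on its own. It shows that assigning a value of the wrong type, or passing an unknown slot to new(), also fails. Note, however, that the first mishap from the S3 discussion is not caught by default. If you forget the union status, new() silently fills the omitted slot with an empty vector of the declared type. Catching that requires a validity function.

```r
setClass("employee",
         representation(name = "character", salary = "numeric",
                        union = "logical"))
joe <- new("employee", name = "Joe", salary = 55000, union = TRUE)

# Misspelled slot name: rejected
msg1 <- tryCatch({ joe@salry <- 48000; "no error" },
                 error = function(e) conditionMessage(e))

# Right slot, wrong type: rejected too
msg2 <- tryCatch({ joe@salary <- "a lot"; "no error" },
                 error = function(e) conditionMessage(e))

# Unknown slot passed to new(): rejected
msg3 <- tryCatch({ new("employee", name = "Joe", onion = TRUE); "no error" },
                 error = function(e) conditionMessage(e))

# Omitted slot: NOT an error by default; union is an empty logical vector
bob <- new("employee", name = "Bob", salary = 40000)
length(bob@union)
```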

9.2.2 Implementing a Generic Function on an S4 Class

To define an implementation of a generic function on an S4 class, use setMethod(). Let's do that for our class "employee" here. We'll implement the show() function, which is the S4 analog of S3's generic "print". As you know, in R, when you type the name of a variable while in interactive mode, the value of the variable is printed out:

> joe
An object of class "employee"
Slot "name":
[1] "Joe"

Slot "salary":
[1] 88000

Slot "union":
[1] TRUE

Since joe is an S4 object, the action here is that show() is called. In fact, we would get the same output by typing this:

> show(joe)

Let's override that, with the following code:

setMethod("show", "employee",
   function(object) {
      inorout <- ifelse(object@union,"is","is not")
      cat(object@name,"has a salary of",object@salary,
          "and",inorout, "in the union", "\n")
   }
)

The first argument gives the name of the generic function for which we will define a class-specific method, and the second argument gives the class name. We then define the new function.

Let's try it out:

> joe
Joe has a salary of 88000 and is in the union
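The show() example overrides a generic that already exists. Table 9-1 also lists setGeneric() for declaring a new generic. A minimal sketch follows, in which giveRaise() is a hypothetical generic and not part of R:

```r
setClass("employee",
         representation(name = "character", salary = "numeric",
                        union = "logical"))

# Declare a new generic function; "giveRaise" is a hypothetical name
setGeneric("giveRaise", function(object, amount) standardGeneric("giveRaise"))

# Implement it for class "employee". R objects are not modified in place,
# so the method returns an updated copy that the caller must reassign.
setMethod("giveRaise", "employee", function(object, amount) {
  object@salary <- object@salary + amount
  object
})

joe <- new("employee", name = "Joe", salary = 55000, union = TRUE)
joe <- giveRaise(joe, 33000)
joe@salary
```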

9.3 S3 Versus S4

The type of class to use is the subject of some controversy among R programmers. In essence, your view here will likely depend on your personal choice of which you value more: the convenience of S3 or the safety of S4.

John Chambers, the creator of the S language and one of the central developers of R, recommends S4 over S3 in his book Software for Data Analysis (Springer, 2008). He argues that S4 is needed in order to write “clear and reliable software.” On the other hand, he notes that S3 remains quite popular.

Google’s R Style Guide, which you can find at http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html, is interesting in this regard. Google comes down squarely on the S3 side, stating “avoid S4 objects and methods when possible.” (Of course, it’s also interesting that Google even has an R style guide in the first place!)

NOTE A nice, concrete comparison of the two methods is given in Thomas Lumley’s “Programmer’s Niche: A Simple Class, in S3 and S4,” R News, April 1, 2004, pp. 33–36.

9.4 Managing Your Objects

As a typical R session progresses, you tend to accumulate a large number of objects. Various tools are available to manage them. Here, we’ll look at the following:

• The ls() function
• The rm() function
• The save() function
• Several functions that tell you more about the structure of an object, such as class() and mode()
• The exists() function

9.4.1 Listing Your Objects with the ls() Function

The ls() command will list all of your current objects. A useful named argument for this function is pattern, which enables wildcards. Here, you tell ls() to list only the objects whose names include a specified pattern. The following is an example.

S4 system

• The S4 system was designed to overcome some of the deficiencies of the S3 system as well as to provide other functionality that was simply missing from the S3 system.
• Among the major changes between S3 and S4 are the explicit representation of classes, together with tools that support programmatic inspection of the class definitions and properties.
• Multiple dispatch is supported in S4, but not in S3, and S4 methods are registered directly with the appropriate generic.

S4 system

• These changes greatly increase the stability of the system and make it much more likely that code will perform as intended by its authors.
• This comes with some costs; code is slightly slower and it is more difficult to design and modify a system interactively.
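The sketch below illustrates multiple dispatch. It defines one generic whose method is selected by the classes of both arguments, whereas S3's UseMethod() dispatches on the first argument only. The classes Dog and Cat and the generic meet() are hypothetical:

```r
setClass("Dog", representation(name = "character"))
setClass("Cat", representation(name = "character"))

# One generic with two arguments; S4 can dispatch on the classes of both
setGeneric("meet", function(x, y) standardGeneric("meet"))

setMethod("meet", signature("Dog", "Cat"), function(x, y) "chase")
setMethod("meet", signature("Cat", "Dog"), function(x, y) "hiss")
setMethod("meet", signature("Dog", "Dog"), function(x, y) "sniff")

rex <- new("Dog", name = "Rex")
tom <- new("Cat", name = "Tom")
meet(rex, tom)   # chosen by the pair (Dog, Cat)
meet(tom, rex)   # same classes in the other order: a different method
meet(rex, rex)
```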

S4 system: classes

A class definition specifies the structure, inheritance and initialization of instances of that class. A class is defined by a call to the function setClass. The following arguments can be specified in the call to setClass:

• Class: a character string naming the class.
• representation: a named vector of types or classes. The names correspond to the slot names in the class and the types indicate what type of value can be stored in the slot.
• contains: a character vector of class names, indicating the classes extended or subclassed by the new class.
• prototype: an object (usually a list) providing the default data for the slots specified in the representation.
• validity: a function that checks the validity of instances of the class. It must return either TRUE or a character vector describing how the object is invalid.

S4 system: classes

• Once a class has been defined by a call to setClass, it is possible to create instances of the class through calls to new.
• The prototype argument can be used to define default values to use for the different components of the class. Prototype values can be overridden by expressly setting the value for the slot in the call to new.
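Here is a sketch of the prototype and validity arguments applied to the employee class. The particular checks are illustrative assumptions, not taken from the source. As described above, the validity function returns TRUE or a character vector listing the problems:

```r
setClass("employee",
         representation(name = "character", salary = "numeric",
                        union = "logical"),
         # default for union when new() is not given a value
         prototype = prototype(union = FALSE),
         # must return TRUE, or a character vector describing the problems
         validity = function(object) {
           problems <- character(0)
           if (length(object@name) != 1L)
             problems <- c(problems, "name must be a single string")
           if (length(object@salary) != 1L || isTRUE(object@salary < 0))
             problems <- c(problems, "salary must be a single non-negative number")
           if (length(problems) == 0L) TRUE else problems
         })

joe <- new("employee", name = "Joe", salary = 55000)   # union taken from prototype
joe@union

msg <- tryCatch(new("employee", name = "Ann", salary = -1),
                error = function(e) conditionMessage(e))

# new() runs the validity check, but plain slot assignment only checks the
# slot's type, so call validObject() yourself after modifying an object
joe@salary <- -5
check <- tryCatch({ validObject(joe); "valid" }, error = function(e) "invalid")
```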

81 Object-Oriented Programming in R 85

3.4.1 Classes

Classes are instances of the classRepresentation class, and are first-class objects in the language. They can be created by users and existing classes can typically be extended or subclassed. Classes can be instantiable or virtual; instances can be created for instantiable classes but not for virtual classes. Besides the setClass arguments listed above, there are others as well.

Example

In the code below, we create a new class named A that has a single slot, s1, that contains numeric data, and we set the prototype for that slot to be 0.

> setClass("A", representation(s1 = "numeric"),
+     prototype = prototype(s1 = 0))
[1] "A"
> myA = new("A")
> myA
An object of class "A"
Slot "s1":
[1] 0

Example

86 R Programming for Bioinformatics

> m2 = new("A", s1 = 10)
> m2
An object of class "A"
Slot "s1":
[1] 10

We can create a second class B that contains A, so that B is a direct subclass of A or, put another way, B inherits from class A. Any instance of the class B will have all the slots in the A class and any additional ones defined specifically for B. Duplicate slot names are not allowed, so the slot names for B must be distinct from those for A.

> setClass("B", contains = "A", representation(s2 = "character"),
+     prototype = list(s2 = "hi"))
[1] "B"
> myB = new("B")
> myB
An object of class "B"
Slot "s2":
[1] "hi"

Slot "s1":
[1] 0

Classes can be removed using the function removeClass. However, this is not especially useful since you cannot remove classes from attached packages. The removeClass function is most useful when experimenting with class creation interactively. But in most cases, users are developing classes within packages, and the simple expedient of removing the class definition and rebuilding the package is generally used instead. We demonstrate the use of this function on a user-defined class in the code below.

> setClass("Ohno", representation(y = "numeric"))

[1] "Ohno"

> getClass("Ohno")

S4 system: classes

• We can create a second class B that contains A, so that B is a direct subclass of A or, put another way, B inherits from class A.
• Any instance of the class B will have all the slots in the A class and any additional ones defined specifically for B. Duplicate slot names are not allowed, so the slot names for B must be distinct from those for A.
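Inheritance applies to methods as well as slots. In this sketch, a method written for A is used for an instance of B. A more specific method for B then reuses the A method through callNextMethod(). The generic info() is hypothetical:

```r
setClass("A", representation(s1 = "numeric"), prototype = prototype(s1 = 0))
setClass("B", contains = "A", representation(s2 = "character"),
         prototype = list(s2 = "hi"))

# A generic with a method for A only ("info" is a hypothetical name)
setGeneric("info", function(obj) standardGeneric("info"))
setMethod("info", "A", function(obj) paste("s1 =", obj@s1))

myB <- new("B", s1 = 5)
is(myB, "A")            # an instance of B is also an A
before <- info(myB)     # no method for B yet, so the A method is inherited

# A more specific method for B can build on the A method via callNextMethod()
setMethod("info", "B", function(obj) {
  base <- callNextMethod()          # result of the A method
  paste(base, "and s2 =", obj@s2)
})
after <- info(myB)
```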

S4 system: classes

• Classes can be removed using the function removeClass. However, this is not especially useful since you cannot remove classes from attached packages.
• The removeClass function is most useful when experimenting with class creation interactively.

> getClass("Ohno")
Slots:

Name: y
Class: numeric

> removeClass("Ohno")

[1] TRUE

> tryCatch(getClass("Ohno"), error = function(x) "Ohno is gone")

[1] "Ohno is gone"

3.4.1.1 Introspection

Once a class has been defined, there are a number of software tools that can be used to find out about that class. These include getSlots, which reports the slot names and types, and slotNames, which reports only the slot names. These functions are demonstrated using the class A defined above.

> getSlots("A")

s1 "numeric"

> slotNames("A")

[1] "s1"

The class itself can be retrieved using getClass. The function extends can be called with either the name of a single class, or two class names. If called with two class names, it returns TRUE if its first argument is a subclass of its second argument. If called with a single class name, it returns the names of all the classes that class extends (its superclasses), including the class itself. Because this is slightly confusing, the RBioinf package defines two helper functions, superClassNames and subClassNames, which print the names of the superclasses and of the subclasses, respectively. The use of these functions is shown in the code below.

> extends("B")

[1] "B" "A"

S4 system: classes

• Once a class has been defined, there are a number of software tools that can be used to find out about that class.
• These include:
  • getSlots, which reports the slot names and types, and
  • slotNames, which reports only the slot names.

S4 system: classes

• The class itself can be retrieved using getClass.

• The function extends can be called with either the name of a single class, or two class names.

• If called with two class names, it returns TRUE if its first argument is a subclass of its second argument.

• If called with a single class name, it returns the names of all its superclasses, including the class itself.

• Additional helper functions have been defined in the RBioinf package, superClassNames and subClassNames, to print the names of the superclasses and of the subclasses, respectively.

Example

> extends("B", "A")

[1] TRUE

> extends("A", "B")

[1] FALSE

> superClassNames("B")

[1] "A"

> subClassNames("A")

[1] "B"
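RBioinf may not be installed. Without it, roughly the same information can be read from the class representation returned by getClass(). Its contains slot lists the superclasses, and its subclasses slot lists the known subclasses. The following is a minimal sketch that assumes only the methods package:

```r
setClass("A", representation(s1 = "numeric"))
setClass("B", contains = "A", representation(s2 = "character"))

getSlots("B")                      # own and inherited slots, with their types
slotNames("B")
extends("B", "A")                  # TRUE: B is a subclass of A

# The class representation returned by getClass() records both directions
superB <- names(getClass("B")@contains)     # classes that B extends
subA   <- names(getClass("A")@subclasses)   # known subclasses of A
superB
subA
```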

These functions also provide information about built-in classes that have been converted via setOldClass.

> getClass("matrix")

No Slots, prototype of class "matrix"

Extends:
Class "array", directly
Class "structure", by class "array", distance 2
Class "vector", by class "array", distance 3, with explicit coerce

Known Subclasses:
Class "array", directly, with explicit test and coerce

> extends("matrix")

[1] "matrix" "array" "structure" "vector"

To determine whether or not a class has been defined, use isClass. You can test whether or not an R object is an instance of an S4 class using isS4. All S4 objects should also return TRUE for is.object, but so will any object with a class attribute.

S4 system: classes

• To determine whether or not a class has been defined, use isClass.

• You can test whether or not an R object is an instance of an S4 class using isS4.

• All S4 objects should also return TRUE for is.object, but so will any object with a class attribute.
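Here is a quick check of the three predicates. It uses a hypothetical S4 class A and a hypothetical S3 object:

```r
setClass("A", representation(s1 = "numeric"))
a <- new("A", s1 = 1)
s3obj <- structure(list(), class = "myS3class")   # hypothetical S3 object

isClass("A")             # TRUE: the class has been defined
isClass("NoSuchClass")   # FALSE
isS4(a)                  # TRUE
isS4(s3obj)              # FALSE
is.object(a)             # TRUE
is.object(s3obj)         # TRUE as well: it has a class attribute
is.object(1:3)           # FALSE: no class attribute
```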

S4 system: classes

• The standard mechanism for coercing objects from one class to another is the function as, which has two forms.

• One form is coercion where an instance of one class is coerced to the other class, and

• the second form is an assignment version, where a portion of the object supplied is coerced.

• The second form is really only applicable to situations where one class is a subclass of the other.


3.4.1.2 Coercion

The standard mechanism for coercing objects from one class to another is the function as, which has two forms. One form is coercion where an instance of one class is coerced to the other class, and the second form is an assignment version, where a portion of the object supplied is coerced. The second form is really only applicable to situations where one class is a subclass of the other. In the example below, we first create an instance of B, then coerce it to be an instance of A. The method for this is automatically available since the classes are nested, and in fact you can also coerce from the superclass to the subclass, with missing slots being filled in from the prototype.

> myb = new("B")
> as(myb, "A")
An object of class "A"
Slot "s1":
[1] 0

The second form is the assignment form where we replace the A part of myb with the new values in mya.

> mya = new("A", s1 = 20)
> as(myb, "A") <- mya
> myb
An object of class "B"
Slot "s2":
[1] "hi"

Slot "s1":
[1] 20

When classes are not nested, the user must provide an explicit version of the coercion function, and optionally of the replacement function. The syntax is setAs(from, to, def, replace), where from and to are the names of the classes between which coercion is being defined. The coercion function is supplied as the argument def. It must be a function of one argument, an instance of the from class, and it must return an instance of the to class. In the example below we show the call to setAs that defines the coercion between the graphAM class, from the graph package, and the matrix class. The graphAM class represents a graph in terms of an adjacency matrix.


S4 system: classes

• When classes are not nested, the user must provide an explicit version of the coercion function, and optionally of the replacement function.

• The syntax is setAs(from, to, def, replace), where the from and to are the names of the classes between which coercion is being defined.

• The coercion function is supplied as the argument def. It must be a function of one argument, an instance of the from class, and it must return an instance of the to class.

S4 system: classes

• Once a class has been defined, users will want to create instances of that class. The creation of instances is controlled by three separate but related tools:
  • the specification of a prototype for the class,
  • the creation of an initialize method, or
  • values supplied in the call to new.

The graphAM class represents a graph in terms of an adjacency matrix, so the coercion to matrix is quite straightforward. The coercion in the other direction is more complicated.

> setAs(from = "graphAM", to = "matrix", function(from) {
+     if ("weight" %in% names(edgeDataDefaults(from))) {
+         tm <- t(from@adjMat)
+         tm[tm != 0] <- unlist(edgeData(from, attr = "weight"))
+         m <- t(tm)
+     } else {
+         m <- from@adjMat
+     }
+     rownames(m) <- colnames(m)
+     m
+ })

Calls to setAs install a method, constructed from the supplied function, on the generic function coerce. You can view the available methods using showMethods.
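You do not need the graph package to see how setAs works. The sketch below defines two hypothetical, non-nested classes, Celsius and Fahrenheit, each with a single numeric slot deg (these are our own invention). It supplies both def and replace, so that as can also be used on the left of an assignment. Finally, it calls showMethods to list the coerce methods that involve Celsius.

```r
library(methods)

# Two hypothetical, unrelated classes
setClass("Celsius", representation(deg = "numeric"))
setClass("Fahrenheit", representation(deg = "numeric"))

setAs("Celsius", "Fahrenheit",
      # def: takes an instance of the from class, returns one of the to class
      def = function(from) new("Fahrenheit", deg = from@deg * 9 / 5 + 32),
      # replace: takes (from, value) and returns the modified from object
      replace = function(from, value) {
          from@deg <- (value@deg - 32) * 5 / 9
          from
      })

boil <- as(new("Celsius", deg = 100), "Fahrenheit")
boil@deg                     # 212

x <- new("Celsius", deg = 0)
as(x, "Fahrenheit") <- new("Fahrenheit", deg = 212)   # uses replace
x@deg                        # 100

# setAs installed methods on coerce (and coerce<-); list them
showMethods("coerce", classes = "Celsius")
```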

3.4.1.3 Creation of new instances

Once a class has been defined, users will want to create instances of that class. The creation of instances is controlled by three separate but related tools: the specification of a prototype for the class, the creation of an initialize method, or values supplied in the call to new. The value returned by the initialize method must be a valid object of the class being initialized. In general, it is a suitably transformed version of the .Object parameter. For complex or large objects, we recommend creating your own constructor function instead, since calls to new tend to be somewhat fragile and can be inefficient.

When a call to new is made, the following procedure is used:

1. The class prototype is used to create an initial instance.
2. That instance is passed to the initialize method hierarchy. Provided any user-supplied initialize methods call callNextMethod, this hierarchy is traversed until the default method is reached.
3. The default method modifies the value according to the arguments supplied to new, and the result is returned.

The prototype can be set using either a list or a call to prototype. In the example below, we define a class, Ex1, whose prototype has a random sample of values from the N(0, 1) distribution in its s1 slot.

> setClass("Ex1", representation(s1 = "numeric"),
+     prototype = prototype(s1 = rnorm(10)))
[1] "Ex1"

> b = new("Ex1")
> b

An object of class "Ex1"
Slot "s1":
 [1] -1.3730 -0.5483  0.2648  0.0487  1.4423  0.0283  1.1793
 [8] -1.6695 -0.0536  0.0729

Exercise 3.6

What happens if you generate a second instance of the Ex1 class? Why might this not be desirable? Examine the prototype for the class and see if you can understand what has happened. Will changing the prototype to list(s1=quote(rnorm(10))) fix the problem?

When a subclass, such as B from our previous example, is defined, then a prototype is constructed from the prototypes of the superclasses for slots that are not specified in the prototype for the subclass. We see, below, that the prototype for B has a value for the s1 slot, even though none was formally supplied, and that value is the one for the superclass A.

> bb = getClass("B")
> bb@prototype

attr(,"s2")
[1] "hi"
attr(,"s1")
[1] 0

If desired, one can define an initialize method for a class. The default initialize method takes either named arguments, where the names are those of slots, or one or more unnamed arguments that correspond to instances of any superclass. It is an error to supply more than one instance of the same superclass, or to supply the same named argument twice. When constructing the object, all values corresponding to superclasses are used first, and then the named arguments are applied. Thus, named arguments take precedence.

In the example below, we define two new classes: a simple class, W, and a class WA that is a subclass of both A (defined earlier) and W. When creating new instances of W and A, we use named arguments to the initialize method. When creating a new instance of the WA class, we use the unnamed variant and supply instances of the superclasses.


> setClass("W", representation(c1 = "character"))

[1] "W"

> setClass("WA", contains = c("A", "W"))

[1] "WA"

> a1 = new("A", s1 = 20)
> w1 = new("W", c1 = "hi")
> new("WA", a1, w1)

An object of class "WA"
Slot "s1":
[1] 20

Slot "c1":
[1] "hi"
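The text above says that named arguments take precedence over values taken from superclass instances. The sketch below checks this by passing both an instance of A and a named s1 in the same call to new. It repeats the class definitions so that it runs on its own; here A is defined without a prototype, which does not affect the result.

```r
library(methods)

setClass("A", representation(s1 = "numeric"))
setClass("W", representation(c1 = "character"))
setClass("WA", contains = c("A", "W"))

a1 <- new("A", s1 = 20)
w1 <- new("W", c1 = "hi")

# Superclass instances are applied first, then the named slot argument
wa <- new("WA", a1, w1, s1 = 5)
wa@s1    # 5, not 20: the named argument wins
wa@c1    # "hi", taken from w1
```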

In the next example we define an initialize method that takes a value for one of the slots and computes the value for the other from it. We named the formal argument to the initialize method b1, but that was not necessary and any other name would work. However, we find it helpful to use the slot name when the value is meant to correspond to a slot. The user-supplied initialize method overrides the default method. Unless it passes its remaining arguments on to callNextMethod, you can no longer use the slot names, or an instance of a superclass, in the call to new.

> setClass("XX", representation(a1 = "numeric", b1 = "character"), prototype(a1 = 8, b1 = "hi there"))

[1] "XX"
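The initialize method described above is not reproduced in this excerpt. The sketch below shows one way it could be written. The rule that a1 is set to the number of characters in b1 is our own assumption, chosen only for illustration. Because this version forwards ... to callNextMethod, slot names can still be supplied to new; a method without ... would not accept them. When b1 is not supplied, the prototype values are kept, so new("XX") still prints as shown below.

```r
library(methods)

# Class repeated from above so that this block runs on its own
setClass("XX", representation(a1 = "numeric", b1 = "character"),
         prototype(a1 = 8, b1 = "hi there"))

# Hypothetical initialize method: when b1 is supplied, a1 is computed from it
setMethod("initialize", "XX", function(.Object, ..., b1) {
    # let the default method handle the prototype and any other arguments
    .Object <- callNextMethod(.Object, ...)
    if (!missing(b1)) {
        .Object@b1 <- b1
        .Object@a1 <- as.numeric(nchar(b1))   # assumed rule, for illustration
    }
    .Object
})

x <- new("XX", b1 = "yowser")
x@a1        # 6
new("XX")   # prototype values: a1 = 8, b1 = "hi there"
```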

> new("XX")

An object of class "XX"
Slot "a1":
[1] 8

Slot "b1":
[1] "hi there"

Types of classes

• A class can be instantiable or virtual.
• Direct instances of virtual classes cannot be created.
• One can test whether or not a class is virtual using isVirtualClass().
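A minimal sketch of the distinction between virtual and instantiable classes. The classes Shape and Circle are hypothetical. Shape is made virtual with the special "VIRTUAL" representation, new refuses to create it directly, and instances of its subclass Circle are still recognized as Shapes.

```r
library(methods)

# A virtual class, declared with the special "VIRTUAL" representation
setClass("Shape", representation("VIRTUAL"))
# An instantiable subclass
setClass("Circle", contains = "Shape", representation(r = "numeric"))

isVirtualClass("Shape")    # TRUE
isVirtualClass("Circle")   # FALSE

# new() refuses to create a direct instance of a virtual class
msg <- tryCatch(new("Shape"), error = function(e) "cannot instantiate")
msg                        # "cannot instantiate"

c1 <- new("Circle", r = 1)
is(c1, "Shape")            # TRUE: subclass instances are Shapes
```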


3.4.5 Accessor functions

Accessing slots directly using the @ operator relies on the implementation details of the class, and such access makes it very difficult to change that implementation later. In many cases it is advantageous to provide accessor functions for some, or all, of the components of an object. Suppose that the class Foo has a slot named a. To create an accessor function for this slot, we create a generic function named a and a method for instances of the class Foo.

> setClass("Foo", representation(a = "ANY"))

[1] "Foo"

> setGeneric("a", function(object) standardGeneric("a"))

[1] "a"

> setMethod("a", "Foo", function(object) object@a)

[1] "a"

> b = new("Foo", a = 10)
> a(b)
[1] 10

3.4.6 Using S3 classes with S4 classes

S3 classes can be used to describe the contents of a slot in an S4 class. They can also be used for dispatch in S4 methods, after first creating an S4 virtualization of the class. This is done with a call to setOldClass, and many such classes are created when the methods package is attached. All classes created by a call to setOldClass inherit from the class oldClass.

> setOldClass("mymatrix")
> getClass("mymatrix")

Virtual Class

No Slots, prototype of class "S4"

Extends: "oldClass"
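A minimal sketch of the two uses of setOldClass described above: typing a slot of an S4 class and dispatching an S4 method. The class Holder and the generic describeObj are hypothetical names used only for illustration. The S3 instance itself is still created the S3 way, by setting its class attribute.

```r
library(methods)

# Register the S3 class with S4 (as in the example above)
setOldClass("mymatrix")

# The S3 class can now describe a slot of an S4 class ...
setClass("Holder", representation(m = "mymatrix"))
# ... and appear in the signature of an S4 method
setGeneric("describeObj", function(obj) standardGeneric("describeObj"))
setMethod("describeObj", "mymatrix", function(obj) "an S3 mymatrix")

# S3 instances are created by setting the class attribute
mm <- structure(matrix(1:4, nrow = 2), class = "mymatrix")
h <- new("Holder", m = mm)

describeObj(h@m)             # "an S3 mymatrix"
isVirtualClass("mymatrix")   # TRUE
```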

The resulting S4 classes are virtual classes, so instances cannot be created directly; instead, you create instances as you would for any other S3 class.

S4 generic functions and methods

• Generic functions are created by calls to setGeneric. Once a generic exists, methods can be associated with it through calls to setMethod.
• The arguments of the method must conform, to some extent, with those of the generic function. The method definition indicates the class of each of the formal arguments; this is called the signature of the method. There can be, at most, one method with any given signature.

S4 generic functions and methods

• In most cases the call to setGeneric follows a very simple pattern. A number of arguments can be specified when calling setGeneric:
  • the name argument specifies the name of the generic function;
  • the def argument provides the definition of the generic function.

S4 generic functions and methods

• In almost all cases, the body of the function supplied as the def argument will be a call to standardGeneric. This function is used to:
  • dispatch to methods based on the arguments supplied to the generic function, and
  • establish a default method that will be used if no method with a matching signature is found.
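A related way to obtain a default method, sketched below under the assumption that an ordinary function of the same name already exists: calling setGeneric with only the name turns that function into the default method. The default is then used whenever no more specific method matches. The function scaleIt is hypothetical.

```r
library(methods)

# An ordinary function (hypothetical) that will become the default method
scaleIt <- function(x, k = 2) x * k

# With no def, setGeneric uses the existing function as the default method
setGeneric("scaleIt")
setMethod("scaleIt", "character", function(x, k = 2) strrep(x, k))

scaleIt(3)             # 6: no numeric method, so the default is used
scaleIt("ab")          # "abab": dispatched to the character method
isGeneric("scaleIt")   # TRUE
```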

setMethod()

setMethod(f, signature = character(), definition,
          where = topenv(parent.frame()), valueClass = NULL, sealed = FALSE)

• f: a generic function or the character-string name of the function.
• signature: a match of formal argument names for f with the character-string names of the corresponding classes. If the signature is not trivial, you should use method.skeleton to generate a valid call to setMethod.
• definition: a function definition, which becomes the method called when the arguments in a call to f match the classes in signature, either directly or through inheritance.
• where: the environment in which to store the definition of the method. For setMethod, it is recommended to omit this argument and to put the call in source code that is evaluated at the top level, either in an R session (by something equivalent to a call to source) or as part of the R source code for a package. For removeMethod, the default is the location of the (first) instance of the method for this signature.
• valueClass: obsolete and unused, but see the same argument for setGeneric.
• sealed: if TRUE, the method so defined cannot be redefined by another call to setMethod (although it can be removed and then re-assigned).
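A minimal sketch of the signature argument. Classes can be matched to the generic's formal arguments by name or by position. getMethod then retrieves the method stored for an exact signature. The generic combine2 is hypothetical.

```r
library(methods)

setGeneric("combine2", function(x, y) standardGeneric("combine2"))

# Classes matched to formal arguments by name ...
setMethod("combine2", signature(x = "character", y = "numeric"),
          function(x, y) paste(x, y))
# ... or by position, in the order of the generic's formal arguments
setMethod("combine2", signature("numeric", "numeric"),
          function(x, y) x + y)

combine2("n =", 3)   # "n = 3"
combine2(1, 2)       # 3

# getMethod retrieves the method stored for exactly this signature
m <- getMethod("combine2", signature("numeric", "numeric"))
is(m, "MethodDefinition")   # TRUE
```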

The syntax of setGeneric is quite straightforward. The def argument is a function, and each of its named arguments can be dispatched on. The ... argument should be used if other arguments to the generic will be permitted, but those arguments cannot be dispatched on. In the code below, the generic function has two named arguments, object and x, and methods can be defined that specify different signatures for these two arguments.

> setGeneric("foo", function(object, x) standardGeneric("foo"))

[1] "foo"

> setMethod("foo", signature("numeric", "character"),
+     function(object, x) print("Hi, I'm method one"))

[1] "foo"

Exercise 3.9 Define another method for the generic function foo defined above, with a different signature. Test that the correct method is dispatched to for different arguments.

Any argument passed through the ... argument cannot be dispatched on. It is possible to have named arguments that are not part of the signature of the generic function. This is achieved by explicitly stating the signature for the generic function using the signature argument in the call to setGeneric, as is demonstrated below. In that case it may make sense for a method to provide default values for the arguments not in the signature.

> setGeneric("genSig", signature = c("x"), function(x, y = 1) standardGeneric("genSig"))

[1] "genSig"

> setMethod("genSig", signature("numeric"), function(x, y = 20) print(y))

[1] "genSig"

> genSig(10)

[1] 20

S4 generic functions and methods

• Any argument passed through the ... argument cannot be dispatched on.
• It is possible to have named arguments that are not part of the signature of the generic function. This is achieved by explicitly stating the signature for the generic function using the signature argument in the call to setGeneric.
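A minimal sketch, repeating the genSig example so that it runs on its own, of what restricting the signature implies. The method's default for y is used when y is missing. Because y is not in the generic's signature, trying to register a method that dispatches on y is rejected by setMethod.

```r
library(methods)

setGeneric("genSig", signature = c("x"),
           function(x, y = 1) standardGeneric("genSig"))
setMethod("genSig", signature("numeric"), function(x, y = 20) y)

genSig(10)          # 20: the method's default for y is used
genSig(10, y = 3)   # 3

# y is not in the generic's signature, so it cannot be used for dispatch
failed <- tryCatch({
    setMethod("genSig", signature("numeric", "numeric"),
              function(x, y) x + y)
    FALSE
}, error = function(e) TRUE)
failed              # TRUE
```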


S4 generic functions and methods

• Whether or not a function is a generic function can be determined using isGeneric.
• Generic functions can be removed using removeGeneric. This is of limited use, since only generic functions defined in the user's workspace are easily removed.
• To find all generic functions that are defined, and the packages that they are defined in, use the function getGenerics with no arguments.
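A minimal sketch of isGeneric and removeGeneric on a generic created in the user's workspace. The generic toyGen is hypothetical. removeGeneric removes the methods and also the generic function itself.

```r
library(methods)

setGeneric("toyGen", function(x) standardGeneric("toyGen"))
before <- isGeneric("toyGen")   # TRUE

# A generic defined in the user's workspace is easy to remove
removeGeneric("toyGen")
after <- isGeneric("toyGen")    # FALSE
exists("toyGen")                # FALSE: the function itself is gone
```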

Example

> getClass("ObjectsWithPackage")
Class "ObjectsWithPackage" [package "methods"]

Slots:

Name:      .Data   package
Class: character character

Extends:
Class "character", from data part
Class "vector", by class "character", distance 2
Class "data.frameRowLabels", by class "character", distance 2
Class "characterORMIAME", by class "character", distance 2

The package slot of the object returned by getGenerics records where each generic function is defined, so it can be used to select the generic functions defined in a given package. In the example below, we load the Biobase package and then try to find all generic functions that are defined in it.

> library("Biobase")
> allG = getGenerics()
> allGs = split(allG@.Data, allG@package)
> allGBB = allGs[["Biobase"]]
> length(allGBB)

[1] 78

Next we use the where argument to only get generic functions defined in Biobase. But we see that there are more generic functions reported than above. This is because in using this approach, we are getting all generic functions that have a method defined for them in the package, not all generic functions defined in the package. If we restrict these generic functions to those whose package description is Biobase, then we get the same answer as above.

> allGbb = getGenerics("package:Biobase")
> length(allGbb)

[1] 90

3.4.7.1 Evaluation model for generic functions

When the generic function is invoked, the supplied arguments are matched to the arguments of the generic function, and those that correspond to arguments in the signature of the generic are evaluated. This eager evaluation of arguments in the signature is a substantial change from the lazy evaluation semantics used for standard function invocation. Once evaluation of the generic function begins, all methods registered with the generic function are inspected and the applicable methods are determined. A method is applicable if, for all arguments in its signature, the class specified in the method either matches the class of the supplied argument or is a superclass of it. The applicable methods are ordered from most specific to least specific. Dispatch is entirely determined by the signature and by the methods registered at the time evaluation of the generic function begins.

Evaluation model for generic functions

• When the generic function is invoked, the supplied arguments are matched to the arguments of the generic function, and those that correspond to arguments in the signature of the generic are evaluated.
• Once evaluation of the generic function begins, all methods registered with the generic function are inspected and the applicable methods are determined.

Evaluation model for generic functions

• A method is applicable if, for all arguments in its signature, the class specified in the method either matches the class of the supplied argument or is a superclass of it.
• The applicable methods are ordered from most specific to least specific. Dispatch is entirely determined by the signature and by the methods registered at the time evaluation of the generic function begins.
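A minimal sketch of the eager-versus-lazy distinction. The generic lazyDemo is hypothetical. Its signature contains only x, so x is evaluated to find its class. The non-signature argument y stays a promise and is never evaluated if the method does not use it.

```r
library(methods)

setGeneric("lazyDemo", signature = "x",
           function(x, y) standardGeneric("lazyDemo"))
setMethod("lazyDemo", "numeric", function(x, y) "y never used")

# y is not in the signature and the method never touches it,
# so the stop() call is never evaluated
ok <- lazyDemo(1, stop("y was evaluated"))
ok      # "y never used"

# x is in the signature, so it is evaluated eagerly to find its class
res <- tryCatch(lazyDemo(stop("x was evaluated"), 1),
                error = function(e) conditionMessage(e))
res     # "x was evaluated"
```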

The syntax of method declaration

• Methods are declared and assigned to generic functions through calls to setMethod.
• They can be removed through a call to either removeMethod or removeMethods.
• The method should have one argument matching each argument in the signature of the generic function.
• These arguments can correspond to any defined class, or they can be one of the two special classes ANY and missing.

The syntax of method declaration

• Use ANY if the method will accept any value for that argument.
• The class missing is appropriate when the method will handle some, but not all, of the arguments in the signature of the generic.
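A minimal sketch using both special classes. The generic greet is hypothetical. One method handles a call where times is missing, one requires a numeric times, and an ANY, ANY method catches every other combination.

```r
library(methods)

setGeneric("greet", function(name, times) standardGeneric("greet"))

# Handles calls where the second argument is not supplied
setMethod("greet", signature("character", "missing"),
          function(name, times) paste("Hello", name))
setMethod("greet", signature("character", "numeric"),
          function(name, times) rep(paste("Hello", name), times))
# Fallback that accepts any value for both arguments
setMethod("greet", signature("ANY", "ANY"),
          function(name, times) "Hello, whoever you are")

greet("Ann")       # "Hello Ann"
greet("Ann", 2)    # "Hello Ann" "Hello Ann"
greet(42, TRUE)    # "Hello, whoever you are"
```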

The syntax of method declaration

• When ... is an argument to the generic function, you can define methods with named arguments that will be handled by the ... argument to the generic function. Some care is needed, however, because these arguments, in some sense, do not count.
• There can be only one method with any given signature (set of classes defined for the formal arguments to the generic), regardless of whether or not the other argument names match.


3.4.8 The syntax of method declaration

Methods are declared and assigned to generic functions through calls to setMethod. They can be removed through a call to either removeMethod or removeMethods. The method should have one argument matching each argument in the signature of the generic function. These arguments can correspond to any defined class, or they can be one of the two special classes ANY and missing. Use ANY if the method will accept any value for that argument. The class missing is appropriate when the method will handle some, but not all, of the arguments in the signature of the generic.

Exercise 3.10 Write different methods for the generic function foo defined above that make use of ANY and missing in the signature. Test these methods to be sure they behave as you expect.

When ... is an argument to the generic function, you can define methods with named arguments that will be handled by the ... argument to the generic function. Some care is needed, however, because these arguments, in some sense, do not count. There can be only one method with any given signature (set of classes defined for the formal arguments to the generic), regardless of whether or not the other argument names match.

> setGeneric("bar", function(x, y, ...) standardGeneric("bar"))
[1] "bar"
> setMethod("bar", signature("numeric", "numeric"),
+     function(x, y, d) print("Method1"))
[1] "bar"
> ## same signature, so this replaces the method above
> setMethod("bar", signature("numeric", "numeric"),
+     function(x, y, z) print("Method2"))
[1] "bar"
> bar(1, 1, z = 20)
[1] "Method2"
> bar(2, 2, 30)
[1] "Method2"
> tryCatch(bar(2, 4, d = 20), error = function(e) print("no method1"))
[1] "no method1"


Calling foo, the generic function defined earlier, shows that dispatch depends on the classes of both arguments. No method has been defined for two numeric arguments:

> foo(5, "l")
[1] "Hi, I'm method one"
> foo(5, 3)
Error in function (classes, fdef, mtable) :
  unable to find an inherited method for function "foo", for signature "numeric", "numeric"



The S4 system

# definition of S4 classes
setClass("lreg4", representation(coefficients="numeric", var="matrix",
    iterations="numeric", deviance="numeric", predictors="character"))

The S4 system

lreg4 <- function(X, y, predictors=colnames(X), constant=TRUE,
                  max.iter=10, tol=1E-6) {
    if (!is.numeric(X) || !is.matrix(X))
        stop("X must be a numeric matrix")
    if (!is.numeric(y) || !all(y == 0 | y == 1))
        stop("y must contain only 0s and 1s")
    if (nrow(X) != length(y))
        stop("X and y contain different numbers of observations")
    if (constant) {
        X <- cbind(1, X)
        colnames(X)[1] <- "Constant"
    }
    b <- b.last <- rep(0, ncol(X))
    it <- 1
    while (it <= max.iter){
        # Newton-Raphson (IRLS) step for the logistic regression coefficients
        p <- as.vector(1/(1 + exp(-X %*% b)))
        var.b <- solve(crossprod(X, p * (1 - p) * X))
        b <- b + var.b %*% crossprod(X, y - p)
        if (max(abs(b - b.last)/(abs(b.last) + 0.01*tol)) < tol) break
        b.last <- b
        it <- it + 1
    }
    if (it > max.iter) warning("maximum iterations exceeded")
    # create an instance of the "lreg4" class; the default for predictors is
    # evaluated lazily here, after X has gained the "Constant" column
    result <- new("lreg4", coefficients=as.vector(b), var=var.b,
                  iterations=it,
                  deviance=-2*sum(y*log(p) + (1 - y)*log(1 - p)),
                  predictors=predictors)
    result
}

The S4 system

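library(carData)  # assumed source of the Mroz data frame
# lfp, wc and hc are "no"/"yes" factors; recode as 0/1 so lreg4()'s checks pass
Mroz[c("lfp", "wc", "hc")] <- lapply(Mroz[c("lfp", "wc", "hc")], function(v) as.numeric(v == "yes"))
mod.mroz.4 <- with(Mroz, lreg4(cbind(k5, k618, age, wc, hc, lwg, inc), lfp))
class(mod.mroz.4)
mod.mroz.4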

The S4 system

show # the S4 generic function show

# defining an S4 method
setMethod("show", signature(object="lreg4"),
    definition=function(object) {
        coef <- object@coefficients
        names(coef) <- object@predictors
        print(coef)
    }
)
mod.mroz.4  # invokes show method

The S4 system

setMethod("summary", signature(object="lreg4"),
    definition=function(object, ...) {
        b <- object@coefficients
        se <- sqrt(diag(object@var))
        z <- b/se
        # two-sided p-values
        table <- cbind(b, se, z, 2*(1-pnorm(abs(z))))
        colnames(table) <- c("Estimate", "Std.Err", "Z value", "Pr(>|z|)")
        rownames(table) <- object@predictors
        printCoefmat(table)
        cat("\nDeviance =", object@deviance,"\n")
    }
)
summary(mod.mroz.4)

The S4 system

# Lexical scope
f <- function (x) x + a
a <- 10
x <- 5
f(2)  # x bound to 2 in frame of f(), a to 10 in global frame
x     # global x is undisturbed

f <- function (x) {
    a <- 5
    g(x)
}
g <- function(y) y + a
f(2)  # a bound to 10 in global frame
a     # global a is undisturbed

f <- function (x) {
    a <- 5
    g <- function (y) y + a
    g(x)
}
f(2)  # a is bound to 5, x to 2 in frame of f(), y to 2 in frame of g()

The S4 system

# a function that returns a closure (function + environment)
makePower <- function(power) {
    function(x) x^power
}
square <- makePower(2)
square        # power bound to 2
square(4)
cuberoot <- makePower(1/3)
cuberoot      # power bound to 1/3
cuberoot(64)
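The "environment" half of each closure can be inspected directly. This short sketch repeats makePower and looks up the binding of power in the enclosing environment of each returned function, which shows that every closure carries its own copy.

```r
makePower <- function(power) {
    function(x) x^power
}
square <- makePower(2)
cuberoot <- makePower(1/3)

# Each closure's enclosing environment is the frame of its makePower() call
ls(environment(square))                        # "power"
get("power", envir = environment(square))      # 2
get("power", envir = environment(cuberoot))    # 0.3333333
square(4)                                      # 16
```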

The semantics of method invocation

• When a generic function is invoked, the classes of all supplied arguments that are in the signature of the generic function form the target signature.
• A method is said to be applicable for this target signature if, for every argument in the signature, the class specified by the method is the same as the class of the corresponding supplied argument, is a superclass of that class, or is ANY.
• To order the applicable methods, we need a metric on the classes.

The semantics of method invocation

• A simple metric is the following:
  • if the classes are the same, the distance is zero;
  • if the class in the signature of the method is a direct superclass of the class of the supplied argument, then the distance is one, and so on;
  • the distance from a class to ANY is chosen to be larger than any other distance.
• The distance between an applicable method and the target signature can then be computed by summing the distances over all arguments in the signature of the generic function. These distances are then used to order the methods.

The semantics of method invocation

• Once the ordered list of methods has been computed, control is passed to the most specific method.
• In S4, control returns to the generic function once the method has finished, so post-processing is possible.
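A minimal sketch of both points, using hypothetical classes and generics. For who(leaf, 1), both methods are applicable. Mid,numeric is at distance 1 + 0, while Base,ANY is at distance 2 plus the large ANY distance, so the Mid method is chosen. The generic twice has code after standardGeneric, which post-processes whatever the method returns.

```r
library(methods)

# Hypothetical three-level hierarchy: Leaf -> Mid -> Base
setClass("Base", representation(v = "numeric"))
setClass("Mid", contains = "Base")
setClass("Leaf", contains = "Mid")

setGeneric("who", function(obj, other) standardGeneric("who"))
setMethod("who", signature("Base", "ANY"), function(obj, other) "Base,ANY")
setMethod("who", signature("Mid", "numeric"), function(obj, other) "Mid,numeric")

leaf <- new("Leaf", v = 1)
who(leaf, 1)      # "Mid,numeric": the closest applicable method
who(leaf, "a")    # "Base,ANY": the only applicable method

# Control returns to the generic, so its body can post-process the result
setGeneric("twice", function(x) {
    value <- standardGeneric("twice")
    value * 2
})
setMethod("twice", "numeric", function(x) x + 1)
twice(3)          # the method returns 4; the generic doubles it to 8
```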

Finding methods

• We will often need to determine which methods are registered with a particular generic function.
• At other times we will want to determine whether a particular signature will be handled by a generic.
• Functionality of this sort is provided by the functions listed next.
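The sketch below illustrates how several of these functions differ when a method exists only for a superclass. The functions are showMethods, existsMethod, getMethod, hasMethod and selectMethod, each described in the bullets that follow. The classes Parent and Child and the generic info are hypothetical. The exact-match checks are run before any inherited lookup.

```r
library(methods)

# Hypothetical classes and generic for illustration
setClass("Parent", representation(v = "numeric"))
setClass("Child", contains = "Parent")
setGeneric("info", function(obj) standardGeneric("info"))
setMethod("info", "Parent", function(obj) "method for Parent")

showMethods("info")                        # prints the registered methods

# Exact-signature lookups: no method was defined for Child itself
exact <- existsMethod("info", "Child")     # FALSE
res <- tryCatch(getMethod("info", "Child"),
                error = function(e) "no exact method")

# Lookups that use inheritance find the Parent method
inherited <- hasMethod("info", "Child")    # TRUE
m <- selectMethod("info", "Child")
m(new("Child", v = 1))                     # "method for Parent"
```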

Finding methods

• showMethods shows the methods for one or more generic functions. The classes argument can be used to ask for all methods that have a particular class in their signature. The output is printed to standard output by default; with printTo = FALSE it is returned as a character vector instead.
• getMethod returns the method for a specific generic function whose signature is congruent with the specified signature. An error is thrown if no such method exists.
• findMethod returns the packages in the search path that contain a definition for the specified generic and signature.

Finding methods

• selectMethod returns the method for a specific generic function and signature, but differs from getMethod in that inheritance is used to identify a method.
• existsMethod tests for a method with a signature congruent to the one provided, registered with the specified generic function. No inheritance is used. It returns either TRUE or FALSE.
• hasMethod tests whether a method can be found for the specified generic function and signature. Unlike existsMethod, it uses inheritance, so inherited methods (including methods for ANY) count. It returns FALSE if there is no generic function of that name.

Finding Documentation

• Either a direct call to help or the use of the ? operator will obtain the help page for most functions.
• To find out about classes, an infix syntax is used. For example, the help page for the graph class, from the graph package, is displayed by either of:

class?graph

help("graph-class")

134 Finding Documentation

• Help for generic functions requires no special syntax; one just looks for help on the name of the generic function.

• There are two different ways to find the help page for the method of the nodes generic function for an argument of class graphNEL:

method?nodes("graphNEL")

help("nodes,graphNEL-method")

135 Finding Documentation

• The RBioinf package provides the function S4Help:

library("RBioinf")

S4Help()

• The function takes the name of either an S4 generic function or an S4 class and provides a selection menu to choose a help page.

136 Managing S3 and S4 together

• Testing for inheritance is done differently in S3 and S4: S3 uses the function inherits, while S4 uses is.
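The contrast can be seen with two small objects. The S3 class names C1 and C2 mirror the example on the next slide, while Animal and Dog are hypothetical S4 classes made up for this sketch. The S3 object carries its whole class vector in the class attribute, so inherits can check it directly; the S4 object records only its own class, and is consults the class definitions to find superclasses.

```r
library(methods)

# S3: the class attribute lists every class the object inherits from.
x <- structure(list(), class = c("C1", "C2"))
inherits(x, "C2")   # TRUE
isS4(x)             # FALSE

# S4 (hypothetical class names): inheritance comes from the class definitions.
setClass("Animal", representation(name = "character"))
setClass("Dog", contains = "Animal")
d <- new("Dog", name = "Rex")
is(d, "Animal")     # TRUE
isS4(d)             # TRUE
```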

137 Managing S3 and S4 together

> setOldClass(c("C1", "C2"))
> is(x, "C2")
[1] TRUE

The function isS4 returns TRUE for an instance of an S4 class. For primitive functions that support dispatch, S4 methods are restricted to S4 objects. The function asS4 can be used to allow an instance of an S3 class to be passed to an S4 method. In the next example we show that when x is an S3 instance, we do not dispatch to the S4 method, but once we use asS4, dispatch to the S4 method occurs.

> x = 1
> setClass("A", representation(s1 = "numeric"))
[1] "A"
> setMethod("+", c("A", "A"), function(e1, e2) print("howdy"))
[1] "+"
> class(x) = "A"
> x + x
[1] 2
attr(,"class")
[1] "A"
> asS4(x) + x
[1] "howdy"
[1] "howdy"

138

3.8.1 Getting and setting the class attribute

Another difference between the S3 and S4 systems comes from the return value of the class function. For instances of S3 classes, the class attribute should hold the names of all classes that the object inherits from, and this vector is returned. For instances of S4 classes, the class attribute always has length one, naming the most specific class, and that is what is returned. Inheritance is determined from the existing class definitions. Use of the oldClass mechanism