Lew - Week 3 Background: A Plot Is Created in Base R Graphics by First Calling a High-Level Function That Creates a Complete Plot

R Graphics
Statistics X410A - Spring 2014 - Lew - Week 3

Background
A plot is created in base R graphics by first calling a high-level function that creates a complete plot.
Second, low-level functions add more output to the plot (such as additional lines or points).
Plots in R are created in layers, like a cake.
In RStudio, plots appear in the plot window (lower right-hand corner).

Basic Rules
If there is only going to be one plot per page, a high-level function is used to start a new plot on a new page.
We can put multiple plots on a page: a high-level function is instructed to start the next plot on the same page, only starting a new page when the number of plots per page is exceeded.
All low-level functions add output to the current plot only.

plot( )
The most important high-level function in base R graphics is plot( ).
plot( ) can take many arguments.
The first argument to plot( ) is data, such as a vector, but it could be a data frame.
The second argument can be data if a scatterplot is desired, but it can be nearly anything. R is quite flexible.

Examples (four cases)

Explanation
In the first case, all of the data to plot are specified in a single data frame. The result is a scatterplot matrix, which is good for exploring.
In the second case, a grouping variable (a factor) such as Diet yields a bar chart (bar plot).
The third and fourth cases look the same.
Third: separate x and y variables are specified as two separate arguments.
Fourth: a formula of the form y ~ x is given, plus a data frame that contains the variables mentioned in the formula.

More Explanations
The plot( ) function is generic; therefore, plot( ) can cope with the same data being specified in several different formats (and it will produce the same result).
However, the fact that plot( ) is generic also means that if plot( ) is given different types of data, it might produce different types of plots.
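The four ways of calling plot( ) described above can be sketched as follows. This is only an illustration: the built-in ChickWeight data set is an assumption, chosen because it contains a Diet factor, and it may not be the data used in class.

```r
# Assumption: the built-in ChickWeight data set stands in for the class data.
# as.data.frame() drops the extra grouped-data classes and keeps a plain data frame.
cw <- as.data.frame(ChickWeight)

# Case 1: a data frame with several columns gives a scatterplot matrix.
plot(cw[, c("weight", "Time", "Diet")])

# Case 2: a single factor gives a bar plot of counts per level.
plot(cw$Diet)

# Case 3: separate x and y vectors give a scatterplot.
plot(cw$Time, cw$weight)

# Case 4: a formula plus a data frame gives the same scatterplot.
plot(weight ~ Time, data = cw)
```

Cases 3 and 4 draw the same points; only the way the data are handed to plot( ) differs.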
Example

Features
R graphics does not make a big distinction between different types of plots.
R just sees different types of plots as variations in the presentation of the data.
For example, we do not need an entirely new command to have a line plot. We can modify options on a scatterplot.

Example

Additional Features
R graphics also does not make a big distinction between a plot of a single set of data and a plot containing multiple series of data.
Additional data series can be added to a plot using low-level functions such as points( ) or lines( ).

Example

Types of Graphs
R can make just about any type of plot. Some do not have names.
Others we see often:
histograms and kernel density plots
dot plots
bar charts (simple, stacked, grouped)
line charts
pie charts (simple, annotated, 3D)
boxplots (simple, notched, violin plots, bagplots)
scatter plots (simple, with fit lines, scatterplot matrices, high density plots, and 3D plots)

Customizing Plots
Using the par( ) function we can examine the default graphical settings.
R lets us change nearly every one (there are dozens of them), but we will only examine a few commonly modified settings in detail today.

Colors
R has 657 built-in color names; many more colors can be created.
There are 3 basic settings: col=, fg= and bg=
col= is used for the colors of lines, symbols and text.
fg= specifies the colors of axes and borders (other arguments do this too).
bg= specifies the color of the background or fill.

Colors
There are several specific settings, such as col.main= for the plot title, col.axis= for the axis annotation (tick labels), etc.
Colors can be specified by number, name, hexadecimal, or RGB.

Color Example - Names

Color by Number

Color by RGB

Colors in Action

Line Type & Width
lty= controls the appearance of lines on a plot; the values 1 through 6 are used.
lwd= controls line thickness. It defaults to 1; lwd=2 is interpreted as twice as thick, lwd=3 as three times as thick, and so forth.
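The color, lty= and lwd= settings can be tried together in one small plot. The sketch below is an illustration only; the particular colors and the variable names x and line_cols are assumptions made for the example.

```r
x <- 1:10

# High-level call: an empty frame tall enough to hold six horizontal lines.
plot(x, x, type = "n", ylim = c(0, 7), xlab = "x", ylab = "lty value",
     main = "Line types 1 to 6")

# Colors given by name, by number (looked up in the palette),
# by hexadecimal string, and by rgb().
line_cols <- c("steelblue", palette()[2], "#228B22",
               rgb(0.5, 0, 0.5), "orange", "black")

for (i in 1:6) {
  # Low-level call: line i is drawn at height i with line type i,
  # a line width that grows with i, and its own color.
  lines(x, rep(i, length(x)), lty = i, lwd = i, col = line_cols[i])
}
```

Because lines( ) is a low-level function, every pass through the loop adds to the same plot rather than starting a new one.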
Line Type & Width Example

Line Type & Width Example

Fonts
The settings family= and font= control drawing text on a plot.
family= is a character value like “Gill Sans”; specific fonts must be installed on your operating system to work. Generic settings (which work anywhere) are “serif”, “sans”, and “mono”. This affects text in the figure region.
font= takes 1 (normal), 2 (bold), 3 (italic), or 4 (bold italic). This affects text in the plot region.

Family & Fonts

Text Size
ps= and cex= control the appearance of text on a graphic.
ps= controls the absolute font size in points; its default depends on the graphics device and is commonly 12.
cex= is character expansion; it is a multiplier and its default is 1.
The final font size specification is ps * cex.

Data Symbols
There are 26 basic symbols.
The pch= setting takes the values 0 through 25.
See ?pch for more help.

pch= symbols 0 through 25

Example

Titles
You can add titles in the plot( ) call, OR you can use the title( ) function to add labels to a plot.
Settings are: main="main title", sub="sub-title", xlab="x-axis label", ylab="y-axis label". They are the same whether you use plot( ) or title( ).

Legend
Legends are placed within the plot region created by plot( ).
A legend needs an x,y coordinate for placement; here we use the minimum of x and y=17.
The annotation for the legend follows: 3:5 for the number of gears.
We specify multiple colors and symbols with c( ).

Axes
You can create custom axes using the axis( ) function.
axis(side, at=, labels=, pos=, lty=, col=, las=, tck=, ...)
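Titles, a legend, and custom axes can be combined in a single plot. The sketch below assumes the built-in mtcars data set (suggested by the legend text about gears 3 to 5); the variables plotted, the colors, and the symbols are guesses for illustration, not the original class example.

```r
# Assumption: mtcars, with weight on x and miles per gallon on y.
gear_cols <- c("red", "blue", "darkgreen")   # one color per gear value
gear_pch  <- c(15, 16, 17)                   # one symbol per gear value
idx <- match(mtcars$gear, 3:5)               # position of each car's gear in 3:5

# High-level plot with the default axes and labels suppressed.
plot(mtcars$wt, mtcars$mpg, col = gear_cols[idx], pch = gear_pch[idx],
     axes = FALSE, ann = FALSE)

# Titles added afterwards with title().
title(main = "Mileage by weight",
      xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Custom axes: side 1 is the bottom, side 2 is the left;
# las = 1 keeps the tick labels horizontal.
axis(side = 1, at = 2:5)
axis(side = 2, at = seq(10, 35, by = 5), las = 1)
box()

# Legend placed at an x,y coordinate, one entry per gear value.
legend(x = min(mtcars$wt), y = 17, legend = 3:5, title = "Gears",
       col = gear_cols, pch = gear_pch)
```

Suppressing the default axes with axes = FALSE is what leaves room for the axis( ) calls.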
But you might just use plot( ), because it has the most important axis options (labeling the axis and setting its range of values): xlab=, ylab=, xlim=, ylim=

Break Here
Please take a 10 minute break here.
There is an in-class exercise if you want to keep working.
I will walk around and answer questions.

Common Stat Graphics
Bar plot or bar chart
Mosaic plot
Stem and leaf
Dot plot or stripchart
Boxplot
Histogram

Bar Plot (Bar Chart)
A bar plot displays the frequency (or relative frequency) of each level of a categorical variable.
Bar plots can be displayed vertically or horizontally.
Bar plots can be grouped or stacked.
Don’t confuse them with histograms (later).

Example

Mosaic Plot
The mosaic plot is a graphical representation of a two-way frequency table (contingency table).
A mosaic plot is divided into rectangles so that the area of each rectangle is proportional to the proportion of the Y variable in each level of the X variable.

Example

Stem & Leaf Plot
A stem-and-leaf display presents quantitative data in a graphical format and is used to visualize the shape of a distribution.
It is like a histogram.
It is used in exploratory data analysis for small datasets.

Example

Dot Plot or Stripchart
A stripchart or dot plot is a one-dimensional scatter plot.
It is used for small datasets.
Use the stripchart( ) function to create dot plots and stripcharts.

Example

Boxplot
boxplot( ) is used to depict groupings of numerical data through their five-number summaries.
It will indicate which observations might be considered outliers.
Boxplots can be drawn horizontally or vertically.
boxplot( ) can operate on a single vector of values, but it is usually better suited to comparing two or more groups.

Example

Example

Histograms
The base R function hist( ) allows you to draw histograms.
The function truehist( ) from MASS is preferred.
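The two histogram functions can be compared side by side. The sketch below assumes the built-in faithful data set as an example vector; it is not the data from the class example.

```r
library(MASS)   # ships with standard R installations; provides truehist()

# Assumption: eruption durations from the built-in faithful data set.
x <- faithful$eruptions

op <- par(mfrow = c(1, 2))   # two plots side by side on one page

h <- hist(x, main = "hist()")    # vertical axis shows frequency counts
truehist(x)                      # vertical axis shows density
title(main = "truehist()")       # low-level call adds a title to the second plot

par(op)                          # restore the previous page layout
```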
Both functions only need data (a vector) to produce a histogram.
By default, hist( ) uses the range of the variable on the x-axis and frequency counts on the y-axis.
By default, truehist( ) uses a relative frequency density scale on its vertical axis, so the product of the dimensions of any panel gives the relative frequency. Hence the total area under the histogram is 1 and it is directly comparable with estimates of the probability density function.

Example

truehist( ) & low-level functions

Exporting Graphics
pdf( ), jpeg( ), and png( ) are some of the possible graphics device choices that will save a graphic to your computer.
They can take arguments which give instructions about the saved graphic (e.g., height and width).
dev.off( ) is used to close the device and write the file.

Export Example

Break Here
Please take a 10 minute break here.
There is an in-class exercise if you want to keep working.
I will walk around and answer questions.

Basic Mapping
R can map data.
A library of maps is available.
We can combine maps and other functions to produce some interesting geographic analyses.
The basic package is maps; mapdata adds further map databases.

Map examples (six cases)

package Lattice
"Trellis" or Lattice plots extend the usual kind of univariate and bivariate plots to situations where some categorical variable or factor may influence the distribution of the data.
They do this by generating an array of simple plots arranged according to the values of some "conditioning" variables.

Steps
Call library(lattice).
Create or locate the factor which is used as a condition (use equal.count( ), for example). In lattice, it is known as a “shingle”.
Use one of the lattice plotting functions (xyplot( ), for example).

Basic plot + map

Conditioning

Lattice Example

Two “shingles”

Layering it on

Stacked Bar Chart

Boxplot

Dot plot

Strip plot

3D plot
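The three lattice steps listed earlier can be sketched as follows. The built-in quakes data set and the name depth_shingle are assumptions made for this illustration; they are not taken from the class examples.

```r
library(lattice)   # step 1: load lattice (ships with standard R installations)

# Step 2: build the conditioning variable.
# Assumption: the built-in quakes data set stands in for the class data.
# equal.count() cuts a numeric variable into overlapping intervals (a shingle).
depth_shingle <- equal.count(quakes$depth, number = 4, overlap = 0.1)

# Step 3: call a lattice plotting function.
# The part after "|" is the conditioning variable: one panel per depth interval.
p <- xyplot(lat ~ long | depth_shingle, data = quakes,
            xlab = "Longitude", ylab = "Latitude")

print(p)   # lattice plots are drawn when the object is printed
```

Unlike base graphics, xyplot( ) returns an object, so the plot can be stored, modified, and printed later.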
