Tut#15-16: Pandas/Numpy
CPSC 501
Dr. J. Hudson
University of Calgary
Arshia Hosseini
T01/T02

Basics
• The primary data structures in pandas are implemented as two classes:
  • DataFrame, which you can imagine as a relational data table, with rows and named columns.
  • Series, which is a single column. A DataFrame contains one or more Series and a name for each Series.

Loading a file
• The following example loads a file with California housing data
• The example uses DataFrame.describe to show interesting statistics about a DataFrame
• Another useful function is DataFrame.head, which displays the first few records of a DataFrame:

Cont’d
• Another powerful feature of pandas is graphing. For example, DataFrame.hist lets you quickly study the distribution of values in a column:

Accessing/Manipulating Data
• Accessing data works the same way as with Python’s list and dict.
• You can also apply Python’s basic arithmetic operations to Series, or you can use them as arguments to NumPy functions.

Modifying
• Modifying is also straightforward.
• Adding two Series to an existing DataFrame:
• Both Series and DataFrame objects also define an index property that assigns an identifier value to each Series item or DataFrame row.
• Call DataFrame.reindex to manually reorder the rows.

Numpy
• NumPy’s main object is an n-dimensional array (ndarray). Dimensions are called axes. The class is numpy.ndarray, often referred to simply as array; instances are usually created with the numpy.array function.
• Some of the attributes are:
  • ndarray.ndim: the number of axes (dimensions) of the array.
  • ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
  • ndarray.size: the total number of elements of the array. This is equal to the product of the elements of shape.
  • ndarray.dtype: an object describing the type of the elements in the array.
    One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.
  • ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8).
  • ndarray.data: the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

An example

Creation
• You can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.

Cont’d
• Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

Operations
• Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
• By default, unary operations such as sum, min, and max apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array.

Universal Functions

Iterating/Slicing

Copying
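The NumPy points above (attributes, creation, elementwise operations, the axis parameter, universal functions, slicing, and copying) can be sketched in one short script. This is only an illustrative sketch, not the code from the original slides: the array contents and all variable names are assumptions chosen for the example.

```python
import numpy as np

# Illustration data (an assumption, not from the slides): 3 rows x 5 columns, values 0..14.
a = np.arange(15).reshape(3, 5)

# Attributes described in the tutorial.
print(a.ndim)      # number of axes
print(a.shape)     # size of the array along each axis
print(a.size)      # total number of elements
print(a.dtype)     # element type (the default integer type is platform dependent)
print(a.itemsize)  # bytes per element

# Creation from a Python list: the dtype is deduced from the elements.
b = np.array([1.5, 2.0, 3.5])

# Placeholder-content constructors for arrays whose size is known up front.
zeros = np.zeros((2, 3))
ones = np.ones((2, 3), dtype=np.int16)

# Arithmetic operators apply elementwise and produce a new array.
c = np.array([10, 20, 30]) - np.array([1, 2, 3])

# Reductions treat the array as a flat list unless an axis is given.
total = a.sum()           # over every element
col_sums = a.sum(axis=0)  # one sum per column
row_mins = a.min(axis=1)  # one minimum per row

# Universal functions (ufuncs) also operate elementwise.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Basic slicing returns a view on the same data; copy() makes independent data.
view = a[:, 1:3]
dup = a.copy()
dup[0, 0] = 99  # does not change a
```

Passing axis=0 collapses the rows (one result per column), while axis=1 collapses the columns (one result per row). Because a basic slice shares memory with the original array, writing through view would modify a, whereas dup is safe to change.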
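Returning to the pandas part of the tutorial, the calls it names (DataFrame.describe, DataFrame.head, Series arithmetic, NumPy functions on a Series, adding columns, index, and DataFrame.reindex) can be tried on a tiny in-memory table. This is a minimal sketch under stated assumptions: the tutorial loads a California housing file, but no file is read here, and the column names and values below are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and numbers are invented for this sketch.
cities = pd.DataFrame({
    "name": ["A", "B", "C"],
    "population": [100, 200, 400],
})

# Access a column like a dict entry; the result is a Series.
population = cities["population"]

# A Series can be passed to NumPy functions.
log_pop = np.log(population)

# Add two new Series to the existing DataFrame.
cities["area"] = pd.Series([10.0, 20.0, 50.0])
cities["density"] = cities["population"] / cities["area"]  # elementwise division

# Summary statistics for numeric columns, and the first few records.
stats = cities.describe()
first_two = cities.head(2)

# Every DataFrame has an index; reindex reorders rows by index label.
print(cities.index)
reordered = cities.reindex([2, 0, 1])
print(reordered)
```

DataFrame.hist is left out of the sketch because it needs a plotting backend such as matplotlib to be installed.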