Cheat Sheet Pandas Python.Indd
Total Page:16
File Type:pdf, Size:1020Kb
Python For Data Science Cheat Sheet Asking For Help Dropping >>> help(pd.Series.loc) Pandas Basics >>> s.drop(['a', 'c']) Drop values from rows (axis=0) Selection Also see NumPy Arrays >>> df.drop('Country', axis=1) Drop values from columns(axis=1) Learn Python for Data Science Interactively at www.DataCamp.com Getting Sort & Rank >>> s['b'] Get one element -5 Pandas >>> df.sort_index() Sort by labels along an axis Get subset of a DataFrame >>> df.sort_values(by='Country') Sort by the values along an axis The Pandas library is built on NumPy and provides easy-to-use >>> df[1:] Assign ranks to entries Country Capital Population >>> df.rank() data structures and data analysis tools for the Python 1 India New Delhi 1303171035 programming language. 2 Brazil Brasília 207847528 Retrieving Series/DataFrame Information Selecting, Boolean Indexing & Setting Basic Information Use the following import convention: By Position (rows,columns) Select single value by row & >>> df.shape >>> import pandas as pd >>> df.iloc[[0],[0]] >>> df.index Describe index Describe DataFrame columns 'Belgium' column >>> df.columns Pandas Data Structures >>> df.info() Info on DataFrame >>> df.iat([0],[0]) Number of non-NA values >>> df.count() Series 'Belgium' Summary A one-dimensional labeled array a 3 By Label Select single value by row & >>> df.sum() Sum of values capable of holding any data type b -5 >>> df.loc[[0], ['Country']] Cummulative sum of values 'Belgium' column labels >>> df.cumsum() >>> df.min()/df.max() Minimum/maximum values c 7 Minimum/Maximum index value Index >>> df.at([0], ['Country']) >>> df.idxmin()/df.idxmax() d 4 'Belgium' >>> df.describe() Summary statistics Mean of values >>> df.mean() By Label/Position >>> df.median() Median of values >>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd']) >>> df.ix[2] Select single row of DataFrame Country Brazil subset of rows Applying Functions Capital Brasília Columns Population 207847528 >>> f = lambda x: x*2 Country Capital Population >>> df.apply(f) Apply function A two-dimensional labeled >>> df.ix[:,'Capital'] Select a single column of >>> df.applymap(f) Apply function element-wise data structure with columns 0 Brussels subset of columns 0 Belgium Brussels 11190846 of potentially different types 1 New Delhi 2 Brasília Data Alignment 1 India New Delhi 1303171035 Index >>> df.ix[1,'Capital'] Select rows and columns Internal Data Alignment 2 Brazil Brasília 207847528 'New Delhi' NA values are introduced in the indices that don’t overlap: Boolean Indexing >>> data = {'Country': ['Belgium', 'India', 'Brazil'], >>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd']) >>> s[~(s > 1)] Series s where value is not >1 'Capital': ['Brussels', 'New Delhi', 'Brasília'], >>> s[(s < -1) | (s > 2)] s where value is <-1 or >2 >>> s + s3 'Population': [11190846, 1303171035, 207847528]} >>> df[df['Population']>1200000000] Use filter to adjust DataFrame a 10.0 b NaN >>> df = pd.DataFrame(data, Setting c 5.0 columns=['Country', 'Capital', 'Population']) >>> s['a'] = 6 Set index a of Series s to 6 d 7.0 I/O Arithmetic Operations with Fill Methods You can also do the internal data alignment yourself with Read and Write to CSV Read and Write to SQL Query or Database Table the help of the fill methods: >>> pd.read_csv('file.csv', header=None, nrows=5) >>> from sqlalchemy import create_engine >>> s.add(s3, fill_value=0) >>> df.to_csv('myDataFrame.csv') >>> engine = create_engine('sqlite:///:memory:') a 10.0 >>> pd.read_sql("SELECT * FROM my_table;", engine) b -5.0 Read and Write to Excel c 5.0 >>> pd.read_sql_table('my_table', engine) d 7.0 >>> pd.read_excel('file.xlsx') >>> pd.read_sql_query("SELECT * FROM my_table;", engine) >>> s.sub(s3, fill_value=2) >>> pd.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1') >>> s.div(s3, fill_value=4) read_sql()is a convenience wrapper around read_sql_table() and Read multiple sheets from the same file >>> s.mul(s3, fill_value=3) read_sql_query() >>> xlsx = pd.ExcelFile('file.xls') >>> df = pd.read_excel(xlsx, 'Sheet1') >>> pd.to_sql('myDf', engine) DataCamp Learn Python for Data Science Interactively.