Introduction to Python Pandas for Data Analytics

Srijith Rajamohan
Advanced Research Computing, Virginia Tech
Tuesday 19th July, 2016

Course Contents
This week:
• Introduction to Python
• Python Programming
• NumPy
• Plotting with Matplotlib
• Introduction to Python Pandas
• Case study
• Conclusion

Section 1: Introduction to Python

Python Features
Why Python?
• Interpreted
• Intuitive and minimalistic code
• Expressive language
• Dynamically typed
• Automatic memory management

Python Features
Advantages
• Ease of programming
• Minimizes the time to develop and maintain code
• Modular and object-oriented
• Large community of users
• A large standard and user-contributed library
Disadvantages
• Interpreted and therefore slower than compiled languages
• Decentralized with packages

Code Performance vs Development Time
[Figure: code performance vs development time]

Versions of Python
• Two versions of Python in use: Python 2 and Python 3
• Python 3 is not backward-compatible with Python 2
• A lot of packages are available for Python 2
• Check version using the following command
Example
    $ python --version

Section 2: Python programming

Variables
• Variable names can contain alphanumeric characters and some special characters
• It is common to have variable names start with a lower-case letter and class names start with a capital letter
• Some keywords are reserved, such as 'and', 'assert', 'break', 'lambda'. A list of keywords is located at https://docs.python.org/2.5/ref/keywords.html
• Python is dynamically typed: the type of the variable is derived from the value it is assigned.
• A variable is assigned using the '=' operator

Variable types
• Variable types
  • Integer (int)
  • Float (float)
  • Boolean (bool)
  • Complex (complex)
  • String (str)
  • ...
  • User Defined! (classes)
• Documentation
  • https://docs.python.org/2/library/types.html
  • https://docs.python.org/2/library/datatypes.html

Variable types
• Use the type function to determine variable type
Example
    >>> log_file = open("/home/srijithr/logfile", "r")
    >>> type(log_file)
    file

Variable types
• Variables can be cast to a different type
Example
    >>> share_of_rent = 295.50 / 2.0
    >>> type(share_of_rent)
    float
    >>> rounded_share = int(share_of_rent)
    >>> type(rounded_share)
    int

Operators
• Arithmetic operators: +, -, *, /, // (integer division, also for floating point numbers), ** (power)
• Boolean operators: and, or and not
• Comparison operators: >, <, >= (greater or equal), <= (less or equal), == (equality)

Strings (str)
Example
    >>> dir(str)
    [..., 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith',
     'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit',
     'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
     'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition',
     'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase',
     'title', 'translate', 'upper', 'zfill']

Strings
Example
    >>> greeting = "Hello world!"
    >>> len(greeting)
    12
    >>> greeting
    'Hello world!'
    >>> greeting[0]  # indexing starts at 0
    'H'
    >>> greeting.replace("world", "test")
    'Hello test!'

Printing strings
Example
    # concatenates strings with a space
    >>> print("Go", "Hokies")
    Go Hokies
    # concatenated without space
    >>> print("Go" + "Tech" + "Go")
    GoTechGo
    # C-style string formatting
    >>> print("Bar Tab = %f" % 35.28)
    Bar Tab = 35.280000
    # Creating a formatted string
    >>> total = "My Share = %.2f. Tip = %d" % (11.76, 2.352)
    >>> print(total)
    My Share = 11.76. Tip = 2

Lists
Array of elements of arbitrary type
Example
    >>> numbers = [1, 2, 3]
    >>> type(numbers)
    list
    >>> arbitrary_array = [1, numbers, "hello"]
    >>> type(arbitrary_array)
    list

Lists
Example
    # create a new empty list
    >>> characters = []
    # add elements using 'append'
    >>> characters.append("A")
    >>> characters.append("d")
    >>> characters.append("d")
    >>> print(characters)
    ['A', 'd', 'd']

Lists
Lists are mutable - their values can be changed.
Example
    >>> characters = ["A", "d", "d"]
    # Changing second and third element
    >>> characters[1] = "p"
    >>> characters[2] = "p"
    >>> print(characters)
    ['A', 'p', 'p']

Lists
Example
    >>> characters = ["A", "d", "d"]
    # Inserting before "A", "d", "d"
    >>> characters.insert(0, "i")
    >>> characters.insert(1, "n")
    >>> characters.insert(2, "s")
    >>> characters.insert(3, "e")
    >>> characters.insert(4, "r")
    >>> characters.insert(5, "t")
    >>> print(characters)
    ['i', 'n', 's', 'e', 'r', 't', 'A', 'd', 'd']

Lists
Example
    >>> characters = ['i', 'n', 's', 'e', 'r', 't', 'A', 'd', 'd']
    # Remove first occurrence of "A" from list
    >>> characters.remove("A")
    >>> print(characters)
    ['i', 'n', 's', 'e', 'r', 't', 'd', 'd']
    # Remove an element at a specific location
    >>> del characters[7]
    >>> del characters[6]
    >>> print(characters)
    ['i', 'n', 's', 'e', 'r', 't']

Tuples
Tuples are like lists except that they are immutable. The other difference is in performance.
Example
    >>> point = (10, 20)  # Note () for tuples instead of []
    >>> type(point)
    tuple
    >>> point = 10, 20
    >>> type(point)
    tuple
    >>> point[2] = 40  # This will fail!
    TypeError: 'tuple' object does not support item assignment

Dictionary
Dictionaries are collections of key-value pairs
Example
    >>> prices = {"Eggs": 2.30,
    ...           "Sausage": 4.15,
    ...           "Spam": 1.59,}
    >>> type(prices)
    dict
    >>> print(prices)
    {'Eggs': 2.3, 'Sausage': 4.15, 'Spam': 1.59}
    >>> prices["Spam"]
    1.59

Conditional statements: if, elif, else
Example
    >>> I_am_tired
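
A minimal sketch of the if / elif / else form named in the slide title. The function name describe_energy, the variable hours_slept, and the thresholds are illustrative assumptions and are not taken from the original slides.

```python
def describe_energy(hours_slept):
    """Return a label chosen by an if / elif / else chain.

    Hypothetical helper for illustration only; the name and the
    thresholds are assumptions, not part of the original slides.
    """
    if hours_slept < 5:
        # First condition that is True wins; later branches are skipped
        label = "tired"
    elif hours_slept < 8:
        # Checked only when the 'if' condition was False
        label = "okay"
    else:
        # Runs when none of the conditions above were True
        label = "rested"
    return label


for hours in (4, 6, 9):
    print(hours, describe_energy(hours))
```

Only one branch runs per call: conditions are tested from top to bottom, and the else branch is the fallback when every condition is False.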
