CSIT

Advance

Python for Data Science

A.Andrew Bergeran M.Sc.,B.Ed.,M.B.A [email protected] 9444473301


Table of Contents

Python

Chapter 1: Introduction, Python Environment
Chapter 2: Technology Specifications, Python Data Types
Chapter 3: Decision Making and Loops
Chapter 4: String, Tuples, Sets, Dictionaries
Chapter 5: Functions, ZIP, MAP, Filter, Lambda
Chapter 6: File Handling and Exception Handling
Chapter 7: Object Oriented Programming
Chapter 8: Regular Expression
Chapter 9: CGI – Common Gateway Interface
Chapter 10: SQLite DB, JSON

Data Science

Chapter 11: Pandas - Data structures and analysis
Chapter 12: NumPy - Numerical Computing
Chapter 13: Matplotlib - 2D/3D plotting, Excel - Data Visualization
Chapter 14: SciPy - Scientific Computing
Chapter 15: SymPy - Symbolic mathematics
Chapter 16: SciKit - Machine Learning
Chapter 17: Beautiful Soup - HTML/XML Parser, Web Scraping


Introduction Chapter 1

Python is an easy-to-learn, powerful programming language developed by Guido van Rossum. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python's elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python web site, https://www.python.org, and may be freely distributed.

The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications.

Python is a general-purpose interpreted programming language that is often applied in scripting roles, so it serves both as a programming language and as a scripting language.

Python is a readable, dynamic, pleasant, flexible, fast, and powerful language. It is multi-purpose (Web, GUI, scripting, etc.).

Python is platform independent, open source, and supports object-oriented programming (OOP).

Python is widely used in Artificial Intelligence, Machine Learning and Data Analytics. It has very powerful statistical and visualization libraries, and its efficient high-level data structures make it well suited to Data Science.


History

Python was invented in the Netherlands in the early 1990s by Guido van Rossum. It was conceived in the late 1980s, and its implementation started in December 1989. Van Rossum was a fan of "Monty Python's Flying Circus", a famous BBC comedy television series, and named the language after it.

Python 1.0 was released in 1994 (1.x series)
Python 2.0 was released in 2000 (2.x series)
Python 3.0 was released in 2008 (3.x series)
Python 3.5 was released in 2015

Python Features

Interpreted Language - The interpreter reads the source code of the program line by line, parses it, and interprets the instructions. Python interpreters are available for many operating systems.

Python is an object-oriented, open source scripting language. It has large standard libraries to solve common tasks and is a cross-platform language.

Python uses no braces to indicate blocks of code for class and function definitions or flow control. Blocks of code are denoted by line indentation. Python has a very simple and elegant syntax, and Python programs are much easier to read and write than programs in languages such as C++ or C#.

Summary: Interpreted, Object Oriented, Open Source, Cross Platform, Line Indentation, Large Standard Libraries, Very Simple and Elegant Syntax, Automatic Memory Management, Support for Third-Party Utilities, Mixable


Scope

• Scientific and Numeric
• System Programming
• Web Applications
• Testing Scripts
• Programming
• Component Integration
• Database Application
• Network Programming
• Game Development

Users of Python

Google – Web Spider and Search Engine
NASA – Scientific Calculations
Intel, Cisco, Hewlett-Packard, Seagate, IBM – use Python for hardware testing
ESRI (Environmental Systems Research Institute) – GIS (Geographic Information Systems) mapping products
YouTube – Video Sharing
Research Scholars, Statistics Projects, Data Scientists


Compiling and Interpreting

Many languages require the program to be compiled into a form that the machine understands; the compiled program is then executed.

Python instead runs the program through an interpreter, so the programmer does not perform a separate compile step.

Python Interpreter

Python code must always be run by the interpreter. The source code is first translated to byte code, which is then run by the Python Virtual Machine (PVM). This compilation happens automatically.

Compilation is simply a translation step, and the byte code is a low-level, platform-independent representation of the source code.

Note that Python byte code is not binary machine code.
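The translation to byte code described above can be observed from inside Python. The following is a minimal sketch using the standard dis module and the built-in compile(); the function name add and the source string are only illustrations, and the exact instruction listing printed by dis differs between Python versions.

```python
import dis

def add(a, b):
    """Return the sum of two values."""
    return a + b

# Every function carries its compiled byte code in a code object.
bytecode = add.__code__.co_code
print(type(bytecode))          # the raw byte code is a bytes object

# dis.dis() prints a human-readable listing of the byte code instructions.
dis.dis(add)

# compile() performs the translation step explicitly on a source string.
code_obj = compile("result = 2 + 3", "<example>", "exec")
namespace = {}
exec(code_obj, namespace)      # the Python Virtual Machine runs the byte code
print(namespace["result"])
```

The listing shows instructions for the virtual machine rather than processor instructions, which is why byte code is portable across platforms.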


Installation & Configuration

Software Requirements:
• OS: Windows 7 Service Pack 1 / Windows 8 / 10
• Python 3.6.2

Optional IDE:
• JDK 1.7
• NetBeans IDE 8.0.2
• Python Plugins for NetBeans 8.0
• Install IIS and configure CGI
• DB Browser for SQLite

Go to https://www.python.org/downloads/ and download Python 3.6.2. Run the downloaded file. This brings up the Python install wizard; accept the default settings and wait until the install is finished.

IDLE - Integrated Development and Learning Environment
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:14:34)


Python Environment Mode

The Python interpreter can be used in interactive mode and in scripting mode.

Interactive Mode:

Quickly interact with Python on a command-line interface; each statement is executed as soon as it is entered.

Scripting mode:

Write multiple lines of code, save them, and execute them as a program. Python script source code is stored in a file with the .py extension.
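As an illustration of scripting mode, the sketch below shows what a small script file could look like. The file name hello.py and the function greet are hypothetical; the `if __name__ == "__main__":` guard is a common convention, not something this handout requires.

```python
# hello.py (hypothetical file name)

def greet(name):
    """Build a greeting for the given name."""
    return "Hello, " + name

if __name__ == "__main__":
    # This block runs only when the file is executed as a script,
    # for example with:  python hello.py
    print(greet("World"))
```

In interactive mode the same function could be typed at the >>> prompt and called immediately, without saving a file.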


Integrated Development Environments for Python

• An open source cross-platform IDE with autocomplete.
• Eclipse with the PyDev plug-in.
• An IDE for Python and Ruby.
• Koding, a free online development environment.
• Komodo IDE, an IDE for Python, PHP and Ruby.
• NetBeans, written in Java; it runs wherever a JVM is installed.
• PIDA, an open source IDE.
• PyCharm, a proprietary and open source IDE.
• PyScripter, a free and open-source Python IDE.
• Python Tools for Visual Studio, a free and open-source plug-in for Visual Studio.

Python Standard Library

Numeric and Mathematical Modules: numbers, math, cmath, decimal, fractions, random, statistics
File and Directory Access: pathlib, os.path, fileinput, filecmp
Data Persistence: sqlite3, pickle
Data Compression and Archiving: zlib, gzip, bz2, zipfile, tarfile
File Formats: csv
Cryptographic Services: hashlib, hmac, secrets
Operating System Services: os, io, time
Concurrent Execution: threading
Networking: socket, ssl
Internet Data Handling: email, json, base64
Structured Markup Processing Tools: html, xml.dom
Internet Protocols: cgi, urllib, http

https://docs.python.org/3/library/

Python Libraries for Data Science

Numerical libraries - NumPy, SciPy, SymPy
NumPy = Numerical Python, an array package with advanced math functionality
SciPy = Scientific Python, a library of algorithms and mathematical tools
SymPy = Symbolic Mathematics (algebraic evaluation, complex numbers)

Mathematical libraries - Matplotlib, NumPy, SymPy
Matplotlib = powerful visualizations (2D plotting)

Data Structure and Analysis - Pandas
Pandas = data manipulation, aggregation, and visualization

Scientific Computing - SciPy, Scikit
SciPy, Scikit-learn (Machine Learning)

Web Scraping
BeautifulSoup = XML and HTML parsing library
Scrapy = a library for writing crawling programs

NLTK = Natural Language Toolkit (Linguistics, Cognitive Science, Artificial Intelligence)

Graphics Frameworks - Panda3d, PyGame
UI Frameworks - PyGTK, PyQt

pip Package Manager

pip is a package manager used to install and manage Python software packages.

    pip install <package-name>
    pip list                       # displays the list of currently installed modules
    pip uninstall <package-name>

Go to the Command Prompt:

    C:\Users\Admin>cd\
    C:\>cd Python
    C:\Python>cd Scripts
    C:\Python\Scripts> pip list

Sample output:

    beautifulsoup4 (4.6.0)
    bs4 (0.0.1)
    numpy (1.14.0)
    pip (9.0.1)
    pyexcel (0.5.7)
    pyexcel-io (0.5.6)
    pyexcel-xls (0.5.5)
    requests (2.18.4)
    setuptools (28.8.0)
    simplejson (3.13.2)
    urllib3 (1.22)
    xlrd (1.1.0)
    XlsxWriter (1.0.2)
    xlwt (1.3.0)

Python Keywords

Keywords are the reserved words in Python. A keyword cannot be used as a variable name, function name or any other identifier. Keywords define the syntax and structure of the Python language, and they are case sensitive. Each keyword has a special meaning and a specific operation. There are 33 keywords in Python 3.6.4.

    >>> import keyword
    >>> keyword.kwlist

[import, as, from, and, or, not, True, False, None, if, else, elif, for, in, while, with, break, continue, class, def, pass, return, del, global, try, except, finally, raise, lambda, nonlocal, is, assert, yield]
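The keyword module can also test a single name, which is handy before choosing an identifier. This is a small sketch; the variable names candidate and safe_name are illustrative, and the number of keywords printed depends on the Python version in use.

```python
import keyword

print(len(keyword.kwlist))            # number of keywords in this Python version
print(keyword.iskeyword("lambda"))    # True: lambda is reserved
print(keyword.iskeyword("print"))     # False: print is a built-in function, not a keyword

# A common workaround when a desired name is reserved: append an underscore.
candidate = "class"
if keyword.iskeyword(candidate):
    safe_name = candidate + "_"
else:
    safe_name = candidate
print(safe_name)                      # class_
```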

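Comments

    # this is a single-line comment; everything after # is treated as a comment

    """This is a
    multi-line comment"""

A triple-quoted string that is not assigned to anything is commonly used as a multi-line comment.

Python Simple Programs

    print("Hello World")
    a = 5
    b = 5
    c = a + b
    print("Ans: ", c)

Runtime Input from console

    a = input("Enter A :")    # input() always returns a string
    print(a)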


Python Data Types

Numbers, Strings, Tuple, Set, List, Dictionaries

Python data types differ in some aspects from those of other programming languages. The Python interpreter determines the type of the data being stored, so there is no need to declare the data type of a memory location. Many native (built-in) data types are available in Python.

Numbers: int, float, complex (Python 3 has no separate long type; int has unlimited precision)

Literals: Any number or string value written directly in the code. A literal is the data that is given to a variable or constant.

Variables

A variable is a name used to refer to a memory location: a user-defined container that can hold a literal value. The equal sign (=) is used to assign values to variables.

    a = 10          # int
    b = 15.20       # float
    c = "Python"    # string
    print(a)

1. Assigning a single value to multiple variables:

    x = y = z = 50

2. Assigning multiple values to multiple variables:

    a, b, c = 5, 10, 15

Identifiers

An identifier is the name given to entities such as classes, objects, functions, and variables. It helps to differentiate one entity from another.

Rules: An identifier must start with a letter or an underscore (A-Z, a-z, _). Identifiers are case sensitive: Hello, hello and HELLO are 3 separate identifiers.


Operators

Operators are symbols used to perform mathematical or logical operations on operands. They return a result that can be used in the application.

• Arithmetic Operators ( + , - , * , / , % , ** , // )
• Assignment Operators ( = , += , -= , *= , /= , %= , **= , //= )
• Comparison (Relational) Operators ( == , != , > , < , <= , >= )
• Logical Operators ( and , or , not )
• Identity Operators ( is , is not )
• Bitwise Operators ( & , | , ^ , ~ , << , >> )
• Membership Operators ( in , not in )

Python resolves precedence in the order ( ), then **, then * / // %, then + -. This matches PEMDAS, an acronym for parentheses, exponents, multiplication, division, addition, subtraction. Multiplication and division share one precedence level and are evaluated left to right, as do addition and subtraction.
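A few short expressions make the precedence order concrete. This is only an illustrative sketch; the variable names are arbitrary.

```python
# Multiplication binds tighter than addition.
no_parens = 2 + 3 * 4
print(no_parens)          # 14

# Parentheses are evaluated first.
with_parens = (2 + 3) * 4
print(with_parens)        # 20

# Exponentiation is right-associative: 2 ** (3 ** 2).
power_chain = 2 ** 3 ** 2
print(power_chain)        # 512

# ** binds tighter than unary minus: -(2 ** 2).
negated = -2 ** 2
print(negated)            # -4

# * and / share one level and run left to right: (8 / 4) * 2.
left_to_right = 8 / 4 * 2
print(left_to_right)      # 4.0

# Floor division and remainder.
print(7 // 2, 7 % 2)      # 3 1
```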

Explicit Type Conversion:

In explicit type conversion, the user converts the data type of an object to the required data type with predefined functions such as int(), float() and str().

    x = 5
    y = "2.5"           # "2.5" is a string, so it is converted to float
    z = x + float(y)
    print(z)            # 7.5

Implicit Type Conversion:

In implicit type conversion, Python automatically converts the data type of an object.

    a = 1
    b = 1.1
    c = a + b
    print(c)            # 2.1
    print(type(c))      # <class 'float'>

Number Functions

    A = int(5.1)          # results in 5
    A = int("20")         # results in 20
    A = int("0100", 2)    # results in 4, binary string to integer
    A = float(2)          # results in 2.0
    A = float("2")        # results in 2.0
    A = float("2.1")      # results in 2.1

    print(chr(65))        # returns the character for the code, o/p: A
    print(ord("A"))       # returns the code of the character, o/p: 65
    print(abs(-10.010))   # returns the absolute value, o/p: 10.01

The divmod() function returns the quotient as well as the remainder of a division.

    q, r = divmod(5, 3)
    print(q, r)           # o/p: 1 2

round() function

    print(round(4.3333, 2))    # o/p: 4.33
    print(round(4.33))         # o/p: 4
    print(round(4.33, 0))      # o/p: 4.0

Math Functions

    import math           # see also abs(x), pow(x, y)
    x = 16
    print(math.sqrt(x))   # o/p: 4.0


GCD (Greatest Common Divisor)

    import math    # math.gcd replaces fractions.gcd, which was removed in Python 3.9
    a = int(input("Enter the first number:"))     # e.g. 12
    b = int(input("Enter the second number:"))    # e.g. 28
    print("The GCD of the two numbers is", math.gcd(a, b))    # o/p: 4

Packages

The import keyword is used to import modules into the current namespace.

    import math
    print(math.pi)

The form from ... import ... is used to import specific attributes or functions into the current namespace, for example from math import cos.

    from math import pi
    print(pi)

Check the versions of libraries

    import sys
    print(sys.path)       # directories searched for modules
    print('Python: {}'.format(sys.version))
    import numpy
    print('numpy: {}'.format(numpy.__version__))


Decision Making

If - Else

Syntax

    if expression1:
        statement(s)
    elif expression2:
        statement(s)
    else:
        statement(s)

Example

    a = 1
    b = 5
    if (a > b):
        print(a)
    else:
        print(b)

o/p: 5

Loops: For & While

Syntax

    for iterating in sequence:
        statements(s)

Examples

    for i in "Python":
        print(i)

o/p: P y t h o n (one character per line)

    for i in ("Name1", "Name2"):
        print(i)

o/p: Name1 Name2 (one per line)

    for i in (1, 2, 3, 4, 5):
        print(i)

o/p: 1 2 3 4 5 (one per line)

Range

Syntax

    for iterating in range:
        statements(s)

Examples

    i = 0
    for i in range(5):
        i = i + 1
        print(i)

o/p: 1 2 3 4 5 (one per line)

    for i in range(5):
        print(i)

o/p: 0 1 2 3 4 (one per line)

    for i in range(1, 5):
        print(i)

o/p: 1 2 3 4 (one per line)

Factorial

    fact = 1
    for i in range(1, 6):
        fact = fact * i
        print(fact)

o/p: 1 2 6 24 120 (one per line)

While

Syntax

    while expression:
        statements(s)

Example

    i = 0
    while (i <= 5):
        i = i + 1
        print(i)

o/p: 1 2 3 4 5 6 (one per line)

Random Number

    import random
    for i in range(10):
        print(random.randrange(1, 10))    # random integer from 1 to 9

List of array values

    x = [10, 20, 30, 40, 50]
    for i in range(1, 5):
        print(x[i])

o/p: 20 30 40 50 (one per line)

Count Characters

    text = "Python is an easy to learn, powerful programming language."
    c = 0
    for i in text:
        if i == 'p' or i == 'P':    # use the logical operator "or" to combine conditions
            c = c + 1
    print(c)


Python Date and Time

Python provides the datetime package to deal with dates and times. It helps to retrieve the current date and time and to manipulate them using built-in methods. Its date, time and datetime classes provide a number of functions to deal with dates, times and time intervals.

    import datetime
    print(datetime.date.today())

    d_date = datetime.datetime.now()
    print(d_date.strftime("%d/%m/%Y %I:%M:%S %p"))    # day/month/year, 12-hour clock with AM/PM

Strings

A string is an ordered sequence of characters. Both single and double quotes can be used. The Python string is a built-in text sequence type used to handle textual data.

String Special Operators and Accessing Python Strings

    str1 = "Hello" + "World"    # concatenation +
    print("Hello" * 3)          # repetition *

Built-in String Methods

    str1 = "hello world"
    print(str1.capitalize())    # o/p: Hello world
    print(str1.count("l"))      # o/p: 3
    print(len(str1))            # string length, o/p: 11
    print(str1.upper())
    print(str1.lower())
    print(str1.strip())         # remove surrounding white space; see also lstrip(), rstrip()

Count the number of occurrences of a substring inside the string:

    print(str1.count("l", 0, len(str1)))    # count(substring, begin=0, end=n)

Find the first occurrence of a substring in the string:

    print(str1.find("o", 0, len(str1)))     # returns -1 if not found
    print(str1.index("o", 0, len(str1)))    # raises ValueError if not found

Test Condition in a string

    name = "Andrew Bergeran"
    print(name.count("Andrew", 0, len(name)))
    print(name.endswith("geran", 0, len(name)))      # True
    print(name.startswith("And", 0, len(name)))      # True

    num = "100"
    print(num.isdigit())
    # other tests: isalnum(), isalpha(), islower(), isupper(), isspace(), isnumeric()

    str2 = "hello world"    # second string for the comparison
    if str1 == str2:
        print("Equal")
    else:
        print("Not Equal")


Split the String Sequence

    text = "Hey there! what should this string be?"
    print("Split the words of the string: ", text.split(" "))

Slice

A slice starts at the start index and stops before the end index; the end index itself is not included. The largest useful end value is the size (length) of the string.

    Size / Length     1   2   3   4   5   6   7
    Value             W   e   l   c   o   m   e
    Forward Index     0   1   2   3   4   5   6
    Backward Index   -7  -6  -5  -4  -3  -2  -1

    str1 = "Welcome"
    print(len(str1))      # length           o/p: 7
    print(str1[0])        # index            o/p: W
    print(str1[3:])       # slicing          o/p: come
    print(str1[:3])       # slicing          o/p: Wel
    print(str1[3:7])      # range slicing    o/p: come
    print(str1[::-1])     # reverse          o/p: emocleW
    print(str1[-1])       # backward index   o/p: e
    print(str1[1:-2])     # slicing          o/p: elco
    print(str1[0:7:2])    # str1[start:end:stride]    o/p: Wloe

The stride is the index gap between selected characters. By default the stride is 1.


Sequence

A string is an ordered immutable sequence. Sequences are collections of items, and the items can be of any Python type.

    # Use of the in operator to check for existence
    name = "Andrew Bergeran"
    if 'Ber' in name:
        print(True)
    else:
        print(False)

Tuples

Tuples are ordered immutable sequences. The values of tuple elements cannot be updated or changed. Tuples can be thought of as read-only lists.

    t = (10, 20, 30, 40, 50)
    print(t[0])

Adding two Tuples

    tuple1 = (10, 20, 30)
    tuple2 = (0, 5)
    c = (tuple1 + tuple2)
    print(c)

Functions of Tuple

    print(len(tuple1))
    print(max(tuple1))
    print(min(tuple1))
    # del tuple1    # deletes the whole tuple

Access tuple values:

    print(tuple1[0])
    print(tuple1[0:2])
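The read-only behaviour of tuples can be checked directly. The sketch below uses an illustrative tuple named point; it shows that item assignment fails, and that a "changed" tuple is really a new tuple built from the old one.

```python
point = (10, 20, 30)

# Item assignment is not allowed on a tuple.
try:
    point[0] = 100
except TypeError as err:
    print("Cannot modify a tuple:", err)

# Build a new tuple instead of changing the existing one.
updated = (100,) + point[1:]
print(updated)        # (100, 20, 30)
print(point)          # (10, 20, 30), the original is unchanged

# Tuple unpacking assigns each element to its own variable.
x, y, z = point
print(x, y, z)        # 10 20 30
```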


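Sets

Sets are unordered, mutable collections of unique elements. The elements themselves must be immutable (hashable). Duplicate elements are discarded automatically.

    A = {10, 20, 30, 40}
    B = {20, 50, 60}

    C = A - B
    print(C)    # in A but not in B, difference        o/p: {40, 10, 30}

    C = A & B
    print(C)    # in A and in B, intersection          o/p: {20}

    C = A | B
    print(C)    # in A or in B, union                  o/p: {50, 20, 40, 10, 60, 30}

    C = A ^ B
    print(C)    # only in A or only in B, exclusive or o/p: {40, 10, 50, 60, 30}

Because sets are unordered, the elements may be printed in a different order.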
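The following sketch illustrates how duplicates are discarded and how a set can be changed after creation. The names marks and frozen are illustrative; frozenset is the built-in read-only counterpart of set.

```python
marks = {10, 20, 20, 30}       # the duplicate 20 is stored only once
print(len(marks))              # 3

marks.add(40)                  # add a new element
marks.discard(10)              # remove an element (no error if it is absent)
print(sorted(marks))           # [20, 30, 40]

# Subset test: every element of {20, 30} is in marks.
print({20, 30} <= marks)       # True

# frozenset is an immutable set: it has no add() method.
frozen = frozenset(marks)
try:
    frozen.add(50)
except AttributeError:
    print("frozenset cannot be modified")
```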

List

Lists are ordered mutable sequences; a list grows or shrinks automatically as elements are added or removed. Lists are similar to arrays in C, but the elements of a list can be of different data types.

    list1 = [10, 20, 30]
    print(list1)
    print(list1[0])
    list1[0] = 100              # updates the element at index 0 (mutable)

    list2 = [10, 20.5, 'a', 'name']
    print(list2[3])
    list2[0] = 100
    print(list2[0])

Functions of List

    print(min(list1))
    print(max(list1))
    print(len(list1))

    a = [10, 20, [1, 2], 30]    # a list nested inside a list
    print(a)
    print(a[0])
    print(a[2][1])
    print(len(a))
    print(a[1:4])

    list1.sort()                # sort the list
    list1.reverse()             # reverse the list
    list1.insert(0, 100)        # insert 100 at index 0
    list1.pop(0)                # remove the element at index 0
    list1.remove(20)            # remove the value 20 from the list
    list1.append(25)            # add 25 as the last element of the list
    print(list1.index(30))      # get the index of the value 30


Dictionaries

Python's dictionaries are a kind of hash table. They work like the associative arrays or hashes found in Perl and consist of key-value pairs.

A dictionary is mutable, i.e., its values can be updated. Each key must be unique and immutable. A value is accessed by its key; the value can be updated, while the key cannot be changed. A dictionary is known as an associative array because the keys work as indexes and are decided by the user.

A dictionary is a mutable collection of key-value mappings. It can be nested, can vary in size, and is based on a hash table for efficiency. In Python 3.6 and earlier the order of the items should be treated as arbitrary; from Python 3.7 a dictionary preserves insertion order.

    d = {'id': 10, 'name': 'Andrew'}
    print(d['name'])

    student = {'sno': 101, 'sname': 'stud1', 'course': 'CS'}
    print(student['sno'])
    print(len(student))

    # student.clear()    # removes all items
    print(student)
    print(student.keys())
    print(student.values())
    print(student.items())
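Updating values, adding keys and looking up a key that may be missing are the most common dictionary operations. A minimal sketch, reusing the illustrative student record from above; the keys year and grade are made up for the example.

```python
student = {'sno': 101, 'sname': 'stud1', 'course': 'CS'}

student['course'] = 'IT'        # update the value of an existing key
student['year'] = 2             # add a new key-value pair

# get() returns a default instead of raising KeyError for a missing key.
print(student.get('grade', 'N/A'))    # N/A

# Iterate over the key-value pairs.
for key, value in student.items():
    print(key, '=', value)

removed = student.pop('year')   # remove a key and return its value
print(removed)                  # 2
print('year' in student)        # False
```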


Function

A function is a block of organized, reusable code that is used to perform a single, related action. Function blocks begin with the keyword def followed by the function name and parentheses ( ).

Syntax

    def functionname(parameters):
        "function_doc string"
        function suite
        return [expression]

Simple Function

    def f1():
        str1 = "simple Function"
        print(str1)

    f1()

Return Function

    def calc(x, y):
        z = x + y
        return z

    print(calc(5, 5))

    def even(a):
        if a % 2 == 0:
            return True
        else:
            return False


ZIP

zip() iterates over multiple iterables in parallel. It returns tuples, each containing one element from every iterable.

    l1 = [1, 2, 3]
    l2 = ["A", "B", "C"]
    for x in zip(l1, l2):
        print(x)

o/p: (1, 'A') (2, 'B') (3, 'C'), each tuple on a separate line

ZIP using two different datasets, iterated in parallel

    f1 = open('emp.txt')    # contains the lines X, Y, Z
    f2 = open('sal.txt')    # contains the lines 10000, 20000, 30000
    for line1, line2 in zip(f1, f2):
        print(line1.strip() + " " + line2.strip())    # strip the trailing newline of each line
    f1.close()
    f2.close()

o/p:

    X 10000
    Y 20000
    Z 30000
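Two properties of zip() are worth seeing in isolation: it stops at the shortest iterable, and its pairs convert directly into a dictionary. The sketch below uses in-memory lists with illustrative names instead of the files above.

```python
names = ["X", "Y", "Z"]
salaries = [10000, 20000, 30000, 40000]   # one extra value on purpose

# zip() stops when the shortest iterable is exhausted.
pairs = list(zip(names, salaries))
print(pairs)              # [('X', 10000), ('Y', 20000), ('Z', 30000)]

# Pairs of (key, value) can be turned into a dictionary.
salary_by_name = dict(zip(names, salaries))
print(salary_by_name["Y"])    # 20000

# zip(*pairs) reverses the operation ("unzip").
unzipped_names, unzipped_salaries = zip(*pairs)
print(unzipped_names)     # ('X', 'Y', 'Z')
```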


MAP

map() applies a function to every element of the input iterables.

    def sum(a, b):    # note: this name shadows the built-in sum()
        return a + b

    L1 = [1, 2, 3]
    L2 = [10, 20, 30]
    L = [x for x in map(sum, L1, L2)]
    print(L)          # o/p: [11, 22, 33]

Filter

filter() takes a function returning True or False and keeps only the elements for which it returns True.

    def even(a):
        if a % 2 == 0:
            return True
        else:
            return False

    L1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    L = [x for x in filter(even, L1)]
    print(L)          # o/p: [0, 2, 4, 6, 8, 10]


Lambda

A lambda is a small anonymous function written as a single expression, loosely comparable to an inline function in C. A lambda expression behaves like an ordinary function.

    y = (lambda x: x * 5)(6)
    print(y)                 # o/p: 30
    square = lambda x: x * x
    print(square(5))         # o/p: 25

Lambdas find a lot of use with map and filter. A lambda passed to map or filter replaces the separately defined helper function, which eliminates the need for the sum and even functions.

Map using Lambda function

    L1 = [1, 2, 3]
    L2 = [5, 6, 7]
    L = [x for x in map(lambda a, b: a + b, L1, L2)]
    print(L)                 # o/p: [6, 8, 10]

Filter using Lambda function

    fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
    oddnumbers = list(filter(lambda x: x % 2, fibonacci))
    print(oddnumbers)        # o/p: [1, 1, 3, 5, 13, 21, 55]
    evennumbers = list(filter(lambda x: x % 2 == 0, fibonacci))
    print(evennumbers)       # o/p: [0, 2, 8, 34]


File Handling

    fhand = open('newfile.txt', 'r')    # r=read, rb=read binary, w=write, a=append
    str1 = fhand.read()
    print(str1)
    fhand.close()                       # close the file when finished

    fout = open('newfile.txt', 'a')
    str2 = "Demo"
    fout.write(str2)
    fout.close()                        # closing flushes the data to disk
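A with statement closes the file automatically, even if an error occurs while reading or writing. The sketch below is self-contained: it writes to a temporary directory created with the standard tempfile module, so the path variable and file contents are assumptions made for the example.

```python
import os
import tempfile

# Use a temporary directory so the example does not touch existing files.
path = os.path.join(tempfile.mkdtemp(), "newfile.txt")

# 'w' creates the file (or truncates an existing one).
with open(path, "w") as fout:
    fout.write("Demo\n")

# 'a' appends to the end of the file.
with open(path, "a") as fout:
    fout.write("Second line\n")

# 'r' reads; iterating over the file yields one line at a time.
with open(path, "r") as fhand:
    lines = [line.rstrip("\n") for line in fhand]

print(lines)    # ['Demo', 'Second line']
```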

Exception Handling

An exception is an event that occurs during the execution of a program and disrupts the normal flow of the program's instructions. In other words, it is any abnormal condition that interrupts the flow of the program. When a Python script raises an exception, it must handle the exception immediately; otherwise it terminates and quits.

Keywords: try, raise, except, else, finally

    try:
        program code
    except Exception:
        code executed when the exception occurs
    else:
        code executed when no exception occurs

Forms of the except clause: except, except IndexError, except IndexError as e, except FileNotFoundError

    try:
        fhand = open('newfile.txt', 'r')
        str1 = fhand.read()
        print(str1)
    except IOError:
        print("Error: can't find file or read data")
    else:
        print("Read content from the file successfully")
        fhand.close()

Other Exceptions

1. ZeroDivisionError: occurs when a number is divided by zero.
2. NameError: occurs when a name is not found. The name may be local or global.
3. IndentationError: occurs when incorrect indentation is given.
4. IOError: occurs when an input/output operation fails.
5. EOFError: occurs when the end of the file is reached and yet operations are being performed.
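The order in which the try, except, else and finally blocks run is easiest to see by recording each step. This is a minimal sketch; the function names safe_divide and check_age and the log messages are illustrative.

```python
def safe_divide(a, b):
    """Divide a by b and record which blocks were executed."""
    log = []
    try:
        result = a / b
    except ZeroDivisionError:
        log.append("except")        # runs only when the division fails
        result = None
    else:
        log.append("else")          # runs only when no exception occurred
    finally:
        log.append("finally")       # always runs
    return result, log

print(safe_divide(10, 2))    # (5.0, ['else', 'finally'])
print(safe_divide(10, 0))    # (None, ['except', 'finally'])

def check_age(age):
    """raise signals an error condition explicitly."""
    if age < 0:
        raise ValueError("age cannot be negative")
    return age

try:
    check_age(-1)
except ValueError as e:
    print("Invalid input:", e)
```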


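Class & Object

    class stud:
        def f1(self):
            print("Simple Class")

    s1 = stud()
    s1.f1()

Constructor

The __init__ method is the constructor and __del__ is the destructor. Python does not support constructor overloading: if a class defines __init__ more than once, the last definition replaces the earlier ones, so only one constructor is written here.

    class stud:
        a = 10    # class attributes
        b = 10

        def __init__(self, a, b):
            self.a = a    # instance attributes
            self.b = b
            print(self.a, self.b)
            c = a + b
            print(c)

        def __del__(self):
            print(stud, "destroyed")

    s1 = stud(2, 2)

o/p: 2 2, then 4; the "destroyed" message is printed when the object is destroyed.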
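Since a class can have only one __init__, default argument values are a common way to get the effect of both a default and a parameterized constructor. A minimal sketch under that assumption; the class name Student and the method total are illustrative and not part of the original example.

```python
class Student:
    def __init__(self, a=10, b=10):
        # Default values play the role of a "default constructor".
        self.a = a
        self.b = b

    def total(self):
        """Return the sum of the two instance attributes."""
        return self.a + self.b

s1 = Student()         # uses the defaults a=10, b=10
s2 = Student(2, 2)     # supplies both values

print(s1.total())      # 20
print(s2.total())      # 4
```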


Inheritance

    class stud:
        def f1(self):
            print("Base Class")

    class dept(stud):
        def f2(self):
            print("Derived Class")

    d1 = dept()
    d1.f1()    # method inherited from the base class
    d1.f2()

Regular Expression

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. The module re provides full support for Perl-like regular expressions in Python.

Regular Expression: Metacharacters

    \w         alphanumeric character
    \W         non-alphanumeric character
    \d         digit
    \D         non-digit
    \s         whitespace character
    \S         non-whitespace character
    [a-z]      character between a and z
    [0-9]      character between 0 and 9
    [a-zA-Z]   character between a and z or between A and Z
    a|b        or condition (inside square brackets, | is a literal character)
    [^ac]      any character except those in this set
    ^          beginning of the string
    $          end of the string
    +          one or more occurrences of the previous character
    ?          zero or one occurrence of the previous character
    *          zero or more occurrences of the previous character

    import re

    # match: matches only from the beginning of the string
    string = ('hello [email protected] and us email [email protected]')
    matchobj = re.match('hello', string)
    print(matchobj.group())     # o/p: hello

    # search: searches through the whole string
    string = ('hello [email protected] and contact us email [email protected]')
    searchobj = re.search('contact', string)
    print(searchobj.group())    # o/p: contact

    # findall: returns a list of all matching strings
    string = ('hello [email protected] and us email [email protected]')
    findobj = re.findall(r'\S+@\S+', string)    # raw string keeps the backslashes intact
    print(findobj)              # o/p: [email protected] [email protected]
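The difference between match, search and findall can be tried on a self-contained string. The addresses below (alice@example.com, bob@example.org) are hypothetical values chosen for the sketch; the pattern \S+@\S+ is the one used above.

```python
import re

text = "hello alice@example.com and contact us email bob@example.org"

# findall returns every non-overlapping match as a list.
emails = re.findall(r"\S+@\S+", text)
print(emails)                    # ['alice@example.com', 'bob@example.org']

# search returns the first match anywhere in the string; groups capture parts.
first = re.search(r"(\S+)@(\S+)", text)
print(first.group(1))            # alice
print(first.group(2))            # example.com

# match only succeeds at the beginning of the string.
print(re.match(r"contact", text))            # None
print(re.match(r"hello", text).group())      # hello
```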

    string = ("Andrew 9444473301 Bergeran 7904204339")
    findobj = re.findall('[a-zA-Z]+', string)
    # findobj = re.findall('[0-9]+', string)
    print(findobj)              # o/p: ['Andrew', 'Bergeran']

    # match validation, matches from the beginning
    string = ('https://google.co.in')
    regex = re.compile("http://|https://")    # matches either prefix
    match = regex.match(string)
    # or: match = re.match("http://|https://", string)
    if match:
        print("Valid URL", match.group())
        # print(match.span())    # start and end index
    else:
        print("Invalid URL")


    # search validation, searches through the string
    string = ('hello [email protected] and us email [email protected]')
    regex = re.compile("[email protected]")
    search = regex.search(string)
    # or: search = re.search("[email protected]", string)
    if search:
        print("Search Found : ", search.group())
    else:
        print("Not Found")

Pattern Matching

    import re
    pattern = re.compile(r"^UG[-](\d\d)[-]F(CS|IT|CA)[-](\d\d)$")
    sid = "UG-18-FCS-11"
    matchobject = pattern.match(sid)
    if matchobject:
        print("match found " + matchobject.group())     # whole match: UG-18-FCS-11
        print("match found " + matchobject.group(1))    # first group: 18
    else:
        print("notfound")


CGI – Common Gateway Interface

The Common Gateway Interface, or CGI, is a set of standards that define how information is exchanged between a web server and a custom script. It is a standard for external gateway programs to interface with information servers such as HTTP servers. The line Content-type:text/html\r\n\r\n is part of the HTTP header, which is sent to the browser to describe the content. Note that the cgi and cgitb modules were deprecated in Python 3.11 and removed in Python 3.13; the examples here target Python 3.6.

CGI Programs

    print('Content-type:text/html\r\n\r\n')
    print('<html>')
    print('<head>')
    print('<title>Hello Word - First CGI Program</title>')
    print('</head>')
    print('<body>')
    print('<h2>Python CGI- Program</h2>')
    print('</body>')
    print('</html>')


CGI

The HTML form supplies two fields, First Name and Last Name, which the script reads as first_name and last_name.

    import cgi, cgitb
    form = cgi.FieldStorage()
    first_name = form.getvalue('first_name')
    last_name = form.getvalue('last_name')
    print("Content-type:text/html\r\n\r\n")
    print("<html>")
    print("<head>")
    print("<title>Hello - Second CGI Program</title>")
    print("</head>")
    print("<body>")
    print("<h2>Welcome %s and %s</h2>" % (first_name, last_name))
    print("</body>")
    print("</html>")


SQLite Chapter 10

The Python standard for database interfaces is the Python DB-API, and most Python database interfaces adhere to it. Through this API, Python provides interfaces for a wide range of database servers such as SQLite, MySQL, Microsoft SQL Server 2000, Oracle, Sybase, PostgreSQL, Informix, Interbase, etc.

Retrieve information from SQLite DB

    import cgi, cgitb
    import sqlite3
    print("Content-type:text/html\r\n\r\n")
    conn = sqlite3.connect('Stud.db')
    print("Opened database successfully")
    cursor = conn.execute("SELECT * from T1")
    for row in cursor:
        print("Roll No = ", row[0], "<br>")
        print("NAME = ", row[1], "<br>")
        print("Course = ", row[2], "<br>")
    print("Operation done successfully")
    conn.close()    # the parentheses are required to actually close the connection

Store information to SQLite DB

    import cgi, cgitb
    import sqlite3
    print("Content-type:text/html\r\n\r\n")
    conn = sqlite3.connect('Stud.db')
    cursor = conn.execute('insert into t1(rollno,sname,course) values(?,?,?)', (110, 'amal', 'C#'))
    conn.commit()
    conn.close()
    print("ok")


Store information into the Database

    import sqlite3
    dbcon = sqlite3.connect('NIIT.db')
    query = dbcon.execute("insert into stud(S_Id,S_Name,Course) values (?,?,?)", (7, 'test2', 's1'))
    dbcon.commit()
    dbcon.close()
    print("Data Saved")

Retrieve information from the Database

    dbcon = sqlite3.connect('NIIT.db')    # reconnect, since the connection was closed above
    query = dbcon.execute("select * from stud")
    for row in query:
        print(row[0])
        print(row[1])
        print(row[2])
    dbcon.close()
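The store-and-retrieve cycle can be tried without creating a database file by connecting to the special name ":memory:". The sketch below is self-contained; the table layout mirrors the stud table used above, but the column types and sample rows are assumptions made for the example.

```python
import sqlite3

# ":memory:" creates a temporary database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stud (S_Id INTEGER PRIMARY KEY, S_Name TEXT, Course TEXT)"
)

# The ? placeholders let sqlite3 insert the values safely.
rows = [(1, "stud1", "CS"), (2, "stud2", "IT")]
conn.executemany(
    "INSERT INTO stud (S_Id, S_Name, Course) VALUES (?, ?, ?)", rows
)
conn.commit()

# Placeholders work in WHERE clauses as well.
result = conn.execute(
    "SELECT S_Id, S_Name, Course FROM stud WHERE Course = ?", ("CS",)
).fetchall()
print(result)    # [(1, 'stud1', 'CS')]

conn.close()
```

Passing values through placeholders rather than building the SQL string by hand avoids quoting mistakes and SQL injection.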


JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. The json library can parse JSON from strings or files into a Python dictionary or list. It can also convert Python dictionaries or lists into JSON strings.

Data can be presented in different kinds of encoding such as CSV, XML and JSON. The processing differs for each format, so Python provides a separate module for each of them, and the matching module must be imported to work with a given encoding.

    import json
    data = '''
    [
      { "id" : "01", "name" : "Amalesh" },
      { "id" : "02", "name" : "Bergeran" }
    ]'''
    info = json.loads(data)
    print('Usercount:', len(info))
    for item in info:
        print('Id', item['id'])
        print('Name', item['name'])
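The reverse direction, converting Python lists and dictionaries into a JSON string, uses json.dumps. A minimal sketch; the records reuse the sample ids and names from the example above, and the variable names are illustrative.

```python
import json

records = [
    {"id": "01", "name": "Amalesh"},
    {"id": "02", "name": "Bergeran"},
]

# dumps() serializes Python objects to a JSON string; indent makes it readable.
text = json.dumps(records, indent=2)
print(text)

# loads() parses the string back into Python objects.
restored = json.loads(text)
print(restored == records)       # True

# Without indent the output is compact.
compact = json.dumps({"id": "01"})
print(compact)                   # {"id": "01"}
```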

Data Science

Pandas: Data structures and analysis
NumPy: Base n-dimensional array package
Matplotlib: Comprehensive 2D/3D plotting
SciPy: Fundamental library for scientific computing
SymPy: Symbolic mathematics
SciKit: Machine Learning
Beautiful Soup: HTML/XML Parser, Web Scraping


Pandas Chapter 11

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

• A fast and efficient DataFrame object.
• Tools for reading and writing data in different formats: CSV and text files, Microsoft Excel, SQL databases.
• Intelligent data alignment and integrated handling of missing data.
• Flexible reshaping and pivoting of data sets.
• Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
• High performance merging and joining of data sets.
• Time series functionality.
• Highly optimized for performance.

    pip install pandas

    import pandas as pd
    idata = pd.read_csv("C:/Users/admin/Desktop/Andrew/ins.csv")
    print(idata)

    # print(idata.head())                        # first rows
    # print(idata.tail())                        # last rows
    # print(idata[:3])                           # subset: first three rows
    # print(idata[::-1])                         # reverse row order
    # print(idata[["age", "sex"]])               # select columns
    # idata1 = idata[idata['smoker'] == 'yes']   # filter rows by condition

Data Frames

    import pandas as pd
    raw = {'sid': [1, 2, 3], 'sname': ["A", "B", "C"]}
    raw = pd.DataFrame(raw, columns=['sid', 'sname'])
    print(raw)

Word Counter

    import collections
    raw1 = ["Cat", "Car", "Bus", "Car"]
    print(collections.Counter(raw1))

    # or using this
    from collections import Counter
    wordcount = Counter(raw1)
    print(wordcount)

Word count of a text file:

    from collections import Counter
    file = open("Data.txt", "r", encoding="utf-8")
    wordcount = Counter(file.read().split())
    file.close()
    print(wordcount)
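A Counter can also report the most frequent words and returns 0 for words it has never seen. The sketch below uses an in-memory sentence instead of Data.txt; the sentence itself is made up for the example.

```python
from collections import Counter

text = "the cat and the car and the bus"

# Lower-case and split on whitespace before counting.
wordcount = Counter(text.lower().split())

# most_common(n) returns the n most frequent items as (word, count) pairs.
print(wordcount.most_common(2))    # [('the', 3), ('and', 2)]

print(wordcount["cat"])            # 1
print(wordcount["train"])          # 0, missing keys count as zero
```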

NumPy

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and FORTRAN code
• Useful linear algebra, Fourier transform, and random number capabilities

    pip install numpy

The statistics below are computed on the expenses column of the idata DataFrame loaded in the Pandas section. The result names avoid shadowing the built-in max(), min() and sum() functions.

    import numpy as np
    max_exp = np.max(idata['expenses'])
    print(max_exp)
    min_exp = np.min(idata['expenses'])
    print(min_exp)
    sum_exp = np.sum(idata['expenses'])
    print(sum_exp)
    mean = np.mean(idata['expenses'])
    print(mean)
    median = np.median(idata['expenses'])
    print(median)
    std = np.std(idata['expenses'])
    print(std)
    des = idata['expenses'].describe()    # summary statistics from pandas
    print(des)

Data Visualization

Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects (e.g., points, lines or bars) contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science.

Data visualization tools go beyond the standard charts and graphs used in Microsoft Excel spreadsheets, displaying data in more sophisticated ways such as infographics.

Data visualization has become a core part of modern business intelligence (BI). Visualization tools are typically easier to operate than traditional statistical analysis software or earlier versions of BI software.


Matplotlib

Matplotlib is a Python 2D/3D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.

    pip install matplotlib

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    sdata = pd.read_csv("C:/Users/admin/Desktop/stud.csv")
    data = sdata[['Name', 'Physics', 'Chemistry', 'Maths']]
    print(data)
    data.plot(kind='bar')
    index = np.arange(len(data))          # one tick per student
    plt.xticks(index, data['Name'])
    plt.yticks(np.arange(0, 110, 10))
    plt.ylabel("Marks in %")
    plt.xlabel("Name of the Students")
    plt.title("Performance of the Students")
    plt.show()



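Reading and writing XLS format

    from pyexcel_xls import save_data
    from pyexcel_xls import read_data

    # One sheet named "sheet1" with two rows of three cells
    data = {"sheet1": [[1, 2, 3], [4, 5, 6]]}
    save_data("t1.xls", data)

    # Read the file back; the result maps sheet names to lists of rows
    data = read_data("t1.xls")
    print(data)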


Reading and writing XLSX format

    import xlsxwriter

    # Write three cells into the first worksheet of a new workbook
    workbook = xlsxwriter.Workbook("t2.xlsx")
    worksheet = workbook.add_worksheet()
    worksheet.write('A1', 'Test Data')
    worksheet.write('A2', 'Test')
    worksheet.write('A3', 'Data')
    workbook.close()

    import xlrd

    # xlrd 2.0 and later read only .xls files, so this example needs an older xlrd (1.x)
    workbook = xlrd.open_workbook("t2.xlsx")
    worksheet = workbook.sheet_by_name("Sheet1")
    num_rows = worksheet.nrows
    num_cols = worksheet.ncols

    # Collect the sheet as a list of rows, each row a list of cell values
    result_data = []
    for curr_row in range(num_rows):
        row_data = []
        for curr_col in range(num_cols):
            data = worksheet.cell_value(curr_row, curr_col)
            row_data.append(data)
        result_data.append(row_data)
    print(result_data)


Excel Chart

    import xlsxwriter

    workbook = xlsxwriter.Workbook("chart.xlsx")
    worksheet = workbook.add_worksheet()
    bold = workbook.add_format({'bold': 1})

    headings = ['S.No', 'Mark 1', 'Mark 2']
    data = [
        [101, 102, 103, 104, 105],
        [80, 90, 85, 95, 70],
        [60, 80, 90, 75, 85],
    ]

    # Headings in row 1, then one column of data under each heading
    worksheet.write_row('A1', headings, bold)
    worksheet.write_column('A2', data[0])
    worksheet.write_column('B2', data[1])
    worksheet.write_column('C2', data[2])

    chart1 = workbook.add_chart({'type': 'column'})

    # First series (Mark 1), addressed with A1-style formula strings
    chart1.add_series({
        'name': '=Sheet1!$B$1',
        'categories': '=Sheet1!$A$2:$A$6',
        'values': '=Sheet1!$B$2:$B$6',
    })

    # Second series (Mark 2), addressed with zero-indexed
    # [sheet, first_row, first_col, last_row, last_col] lists
    chart1.add_series({
        'name': ['Sheet1', 0, 2],
        'categories': ['Sheet1', 1, 0, 5, 0],
        'values': ['Sheet1', 1, 2, 5, 2],
    })

    chart1.set_title({'name': 'Results'})
    chart1.set_x_axis({'name': 'Students'})
    chart1.set_y_axis({'name': 'Marks'})
    chart1.set_style(10)
    worksheet.insert_chart('D2', chart1, {'x_offset': 25, 'y_offset': 10})
    workbook.close()

The worksheet contains:

    S.No   Mark 1   Mark 2
    101    80       60
    102    90       80
    103    85       90
    104    95       75
    105    70       85
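The two add_series calls above address cells in two ways: as an A1-style formula string, and as a list of zero-indexed values [sheet, first_row, first_col, last_row, last_col]. The following is a small sketch of how the list form maps to the string form. The helper names col_letter and list_to_a1 are hypothetical and written here only for illustration; they are not part of xlsxwriter.

```python
def col_letter(col):
    """Convert a zero-indexed column number to Excel letters (0 -> 'A', 26 -> 'AA')."""
    letters = ""
    col += 1                                   # work with 1-based numbers
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def list_to_a1(sheet, first_row, first_col, last_row, last_col):
    """Build an absolute A1-style range from zero-indexed row and column numbers."""
    return "={}!${}${}:${}${}".format(
        sheet,
        col_letter(first_col), first_row + 1,  # rows are 1-based in A1 notation
        col_letter(last_col), last_row + 1,
    )


# Five data rows below the heading: zero-indexed rows 1 to 5 are sheet rows 2 to 6
mark2_values = list_to_a1("Sheet1", 1, 2, 5, 2)
print(mark2_values)   # =Sheet1!$C$2:$C$6
```

Because the list form counts rows from zero, five rows of data under a heading end at row index 5, which is sheet row 6.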


SciPy Chapter 14

SciPy is a library for scientific computing in Python. It has several toolboxes for solving common scientific computing problems.

SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. With SciPy, an interactive Python session becomes a data-processing and system-prototyping environment rivaling systems such as MATLAB, IDL, Octave, R-Lab, and SciLab.

    pip install scipy

    import pandas as pd
    from scipy.stats import ttest_ind

    idata = pd.read_excel("C:/Users/admin/Desktop/ins.xlsx")

    # Split the rows into smokers and non-smokers
    cat1 = idata[idata["smoker"] == "yes"]
    cat2 = idata[idata["smoker"] == "no"]

    # Two-sample t-test on expenses; equal_var=False does not assume equal variances
    print(ttest_ind(cat1['expenses'], cat2['expenses'], equal_var=False))

Output:

    Ttest_indResult(statistic=-0.4032651174742153, pvalue=0.6975832044571335)

    import numpy as np
    from scipy import linalg

    mat = np.array([[2, 1], [4, 3]])
    print(mat)
    print(linalg.det(mat))   # Compute the determinant of an array.
    print(linalg.inv(mat))   # Compute the (multiplicative) inverse of a matrix.
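A quick way to check the determinant and inverse computed above is to multiply the matrix by its inverse and compare the product with the identity matrix. The sketch below does that for the same 2 x 2 matrix and then uses linalg.solve on a small system of equations; the right-hand side [3, 7] is made up for illustration.

```python
import numpy as np
from scipy import linalg

mat = np.array([[2, 1], [4, 3]])

det = linalg.det(mat)   # 2*3 - 1*4 = 2
inv = linalg.inv(mat)

# A matrix multiplied by its inverse should give the identity matrix
is_identity = np.allclose(mat @ inv, np.eye(2))

# Solve the system 2x + y = 3 and 4x + 3y = 7
rhs = np.array([3, 7])
solution = linalg.solve(mat, rhs)

print("determinant:", det)
print("inverse:")
print(inv)
print("mat @ inv is the identity:", is_identity)
print("solution [x, y]:", solution)
```

For solving equations, linalg.solve is generally preferred over computing the inverse and multiplying, since it avoids forming the inverse explicitly.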


SymPy Chapter 15

pip install sympy

Symbolic computation deals with the computation of mathematical objects symbolically. This means that the mathematical objects are represented exactly, not approximately, and mathematical expressions with unevaluated variables are left in symbolic form.

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.

    from sympy import *

    print(sqrt(8))                       # stays exact: 2*sqrt(2)
    x, y, z = symbols('x y z')
    print(expand((x + 1)**2))            # x**2 + 2*x + 1
    print(diff(sin(x)*exp(x), x))        # derivative of sin(x)*exp(x) with respect to x
    a = Rational(1, 2)                   # the exact fraction 1/2
    print(a)
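The difference between exact and approximate results is easiest to see side by side. The following is a minimal sketch comparing the standard math module, which works with floating-point numbers, against SymPy, which keeps square roots and fractions in exact form. The variable names are illustrative.

```python
import math

import sympy as sp

x = sp.symbols("x")

# Floating-point approximation versus exact symbolic value
approx_root = math.sqrt(8)
exact_root = sp.sqrt(8)                 # simplified to 2*sqrt(2), not a decimal

# Fractions: floats are rounded, Rational values are not
float_sum = 1 / 2 + 1 / 3
exact_sum = sp.Rational(1, 2) + sp.Rational(1, 3)

# Expressions with unevaluated variables stay symbolic
expanded = sp.expand((x + 1) ** 2)
factored = sp.factor(expanded)

print("math.sqrt(8)  =", approx_root)
print("sympy sqrt(8) =", exact_root)
print("float sum     =", float_sum)
print("exact sum     =", exact_sum)
print("expanded      =", expanded)
print("factored      =", factored)
```

Importing SymPy under a short name such as sp, instead of using a star import, keeps its sqrt and other functions from replacing names that come from other modules.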


SciKit Chapter 16

Scikit-learn is a machine learning library for the Python programming language.

It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting and k-means, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Simple and efficient tools for data mining and data analysis.

Supervised learning
- Classification
- Regression

Unsupervised learning
- Clustering
- Density estimation and visualization

In supervised learning, the data comes with additional attributes that we want to predict. In unsupervised learning, the training data consists of a set of input vectors x without any corresponding target values.


Training set and testing set

Machine learning is about learning some properties of a data set and applying them to new data. This is why a common practice when evaluating a machine learning algorithm is to split the data at hand into two sets: the training set, on which we learn data properties, and the testing set, on which we test these properties.

    pip install scikit-learn
    pip install scipy

or, from downloaded wheel files:

    pip install scikit_learn-0.19b2-cp36-cp36m-win32.whl
    pip install scipy-1.0.1-cp36-none-win32.whl

    from sklearn import svm
    from sklearn import datasets

    # Load dataset
    iris = datasets.load_iris()
    clf = svm.LinearSVC()

    # Learn from the data
    clf.fit(iris.data, iris.target)

    # Predict for unseen data
    print(clf.predict([[5.0, 3.6, 1.3, 0.25]]))
    print(clf.coef_)
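The example above fits the classifier on the whole iris data set, so it does not yet separate training data from testing data. The following is a minimal sketch of the split described in this section, using scikit-learn's train_test_split function. The 25% test size, the random_state value and the max_iter setting are choices made here for illustration.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Hold out 25% of the samples for testing; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

# Learn only from the training set
clf = svm.LinearSVC(max_iter=10000)
clf.fit(X_train, y_train)

# Evaluate on samples the model has not seen during training
accuracy = clf.score(X_test, y_test)

print("training samples:", len(X_train))
print("testing samples:", len(X_test))
print("accuracy on the testing set: {:.3f}".format(accuracy))
```

Scoring on the held-out samples gives an estimate of how the model behaves on new data, which a score computed on the training data cannot provide.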


Linear Regression

Sample data (Sales1.csv):

    Revenue   TV    NEWS
    96        5     1.5
    90        2     2
    95        4     1.5
    92        2.5   2.5
    95        3     3.3

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv('Sales1.csv')
    x = df[['TV', 'NEWS']]     # Data
    y = df[['Revenue']]        # Target

    linreg = LinearRegression().fit(x, y)
    print('R-Squared score(training):{:.6f}'.format(linreg.score(x, y)))

Output:

    R-Squared score(training):0.953869
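To see what the score means, the R-squared value can be recomputed by hand. The following is a sketch that uses NumPy least squares in place of scikit-learn, with the five rows of the sample data typed in directly; the variable names are illustrative. R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

# The five rows of the sample data
tv = np.array([5.0, 2.0, 4.0, 2.5, 3.0])
news = np.array([1.5, 2.0, 1.5, 2.5, 3.3])
revenue = np.array([96.0, 90.0, 95.0, 92.0, 95.0])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(tv), tv, news])

# Ordinary least squares: choose coef to minimise the squared error of X @ coef
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
predicted = X @ coef

ss_res = np.sum((revenue - predicted) ** 2)        # residual sum of squares
ss_tot = np.sum((revenue - revenue.mean()) ** 2)   # total sum of squares
r_squared = 1.0 - ss_res / ss_tot

print("intercept, TV and NEWS coefficients:", coef)
print("R-squared: {:.6f}".format(r_squared))
```

A value close to 1 means the fitted line explains most of the variation in Revenue for these five rows. Because the score is measured on the same rows used for fitting, it says little about how the model would do on new data.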


Beautiful Soup Chapter 17

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.

urllib.request is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function. We can download a web page's HTML using three lines of code:

    # pip install beautifulsoup4
    # urllib.request is part of the Python standard library and needs no installation

    import urllib.request
    x = urllib.request.urlopen('https://www.python.org')
    html = x.read()          # read the response once; a second read() would return nothing
    print(html)

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    for script in soup(["script", "style"]):
        script.extract()     # remove CSS and JavaScript
    text = soup.get_text()

    print(soup.title)
    print(soup.title.string)

    metatags = soup.find_all('meta', attrs={'name': 'Keywords'})
    for tag in metatags:
        print(tag)

    for a in soup.find_all('a', href=True):
        print("URL: ", a['href'])
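The same Beautiful Soup calls can be tried without a network connection by parsing an HTML string. The following is a minimal sketch; the HTML snippet is made up for illustration and stands in for a downloaded page.

```python
from bs4 import BeautifulSoup

# A made-up page, standing in for HTML fetched with urllib.request
html = """
<html>
  <head>
    <title>Sample Page</title>
    <meta name="Keywords" content="python, scraping">
    <style>body { color: black; }</style>
  </head>
  <body>
    <p>Hello</p>
    <a href="https://www.python.org">Python</a>
    <a href="/local/page">Local</a>
    <a>No link target</a>
    <script>console.log("ignored");</script>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Drop script and style elements so get_text() returns only visible text
for tag in soup(["script", "style"]):
    tag.extract()

title = soup.title.string
keywords = [m["content"] for m in soup.find_all("meta", attrs={"name": "Keywords"})]
links = [a["href"] for a in soup.find_all("a", href=True)]   # skips <a> without href
text = soup.get_text()

print("title:", title)
print("keywords:", keywords)
print("links:", links)
```

Passing href=True to find_all keeps only the anchors that actually carry an href attribute, which is why the third link in the snippet is left out.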


Web Scraping

    import re
    import urllib.request
    from bs4 import BeautifulSoup

    req = urllib.request.Request('https://python.org')
    page = urllib.request.urlopen(req)
    # print(page)
    soup = BeautifulSoup(page, 'html.parser')
    print(soup.title)
    print(soup.title.string)

    # Keep only the links whose href starts with https://
    links = soup.find_all('a', attrs={'href': re.compile("^https://")})
    for link in links:
        names = link.get_text(strip=True)   # link text, even when the first child is a tag
        fullLink = link.get('href')
        print(names + " - " + fullLink)

Output:

    Welcome to Python.org
    Docs - https://docs.python.org
    PyPI - https://pypi.python.org/
    License - https://docs.python.org/3/license.html
    Beginner's Guide - https://wiki.python.org/moin/BeginnersGuide
    Developer's Guide - https://devguide.python.org/
    FAQ - https://docs.python.org/faq/
    Python Books - https://wiki.python.org/moin/PythonBooks
    Python Wiki - https://wiki.python.org/moin/
    Code of Conduct - https://www.python.org/psf/codeofconduct/
