Working with Web APIs

CS111 Computer Programming
Department of Computer Science
Wellesley College

Working with data

Question for you: What are some examples of “large” data we have used in past Lectures/Labs/Psets?

Nature of Data: Static vs. Dynamic

Data Storage: In our own computers or somewhere else?

Examples of interesting Data

o Weather
o Sensor data (earthquakes, sea levels, pollution, etc.)
o Sports scores
o Movie schedules
o Yelp reviews about businesses
o Facebook comments
o Stock markets
o Economic development
o Public opinion
o Wellesley Fresh Menu
o Etc.

Common features of such data:
- Continuous change
- Too big to store in one place

Characteristics of Data

o Data is stored in remote computers connected to the Internet.
o Companies such as Google and Facebook operate facilities known as “data centers”. [Photos from Google Data Centers: http://www.google.com/about/datacenters/gallery/#/tech/1]
o Data is often stored in a machine-readable format, such as JSON or XML, or within a database.
o Data can be displayed in a human-readable format, such as a web page that can be viewed with a browser.

Human-Readable Data

Weather forecast on Google Now

Machine-Readable Data

Weather data from the Weather Underground API
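To make the contrast concrete, here is a minimal sketch of turning machine-readable data into a human-readable sentence. The JSON string and its field names (`city`, `temp_f`, `conditions`) are made up for illustration; they are not the actual Weather Underground response format.

```python
import json

# Hypothetical machine-readable weather record (made-up fields and values).
raw = '{"city": "Wellesley", "temp_f": 48, "conditions": "Cloudy"}'

# json.loads turns the JSON text into a Python dictionary.
record = json.loads(raw)

# Build a human-readable sentence from the parsed fields.
summary = "{}: {} and {} degrees F".format(
    record["city"], record["conditions"], record["temp_f"])
print(summary)
```

The same record could feed a web page, a chart, or a text message; the machine-readable form is what makes that reuse easy.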

Bridging the Gap

One way: Write Python programs

Things we need to know (at high-level)

o What is the Internet?
o What is the Web (WWW)?
o What are URLs?
o What is an HTTP request?
o What is HTML?
o What is an API?
o What is a Web API?

Once we have briefly looked at these concepts, we will learn how to write Python code that:

o Reads data from remote machines with Web APIs.
o Manipulates the JSON results to extract what we need.
o Generates HTML pages to view in the browser.

What is the Internet?

A system of interconnected computer networks that links together billions of devices using the TCP/IP communication protocols. Take CS242 Computer Networks to learn more about TCP/IP.

Clients and Servers

o Client sends a request
o Server sends a reply
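The request/reply pattern can be sketched inside a single Python program. This is only an illustration: `socket.socketpair()` stands in for a real network connection, and the messages and the helper `handle_request` are hypothetical.

```python
import socket

def handle_request(request):
    """Server side: choose a reply for a request (hypothetical rules)."""
    if request == "Show me the schedule page.":
        return "Here you go."
    return "Sorry, I do not understand."

# socketpair() gives two connected endpoints inside one program, standing in
# for a client and a server that would normally be on different machines.
client, server = socket.socketpair()

# Client sends a request.
client.sendall("Show me the schedule page.".encode("utf-8"))

# Server receives the request and sends a reply.
request = server.recv(1024).decode("utf-8")
server.sendall(handle_request(request).encode("utf-8"))

# Client receives the reply.
reply = client.recv(1024).decode("utf-8")
print(reply)

client.close()
server.close()
```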

Example: File Transfer

Client: You and your CyberDuck application.
Server: cs.wellesley.edu
Request: Here is a file to save in my cs server account.
Response: Got it.

Example: WWW

Client: You and your browser.
Server: cs111.wellesley.edu
Request: Show me the schedule page.
Response: Here you go.

WWW = World Wide Web

Tim Berners-Lee is the inventor of the WWW (1989), an application that runs on the Internet.

He created:
- URLs,
- the HTTP protocol,
- the HTML language

He didn’t patent his technology; he put it on the Internet for free, so that other people could build upon it.

Internet vs. WWW

o The Internet is the physical network of computers all over the world.
o The World Wide Web is a virtual network of websites connected by hyperlinks (or links).
o The Web is only one of the many applications that run on the Internet.
o The Web uses HTTP (HyperText Transfer Protocol) to allow clients and servers to communicate.
o Clients: Chrome, Safari, Firefox, Explorer.
o Servers: nytimes.com, facebook.com, cs.wellesley.edu

What is a URL?

o URL = Uniform Resource Locator
o Specifies the location of a resource (web page, image, sound file, movie, etc.) in a remote server on the Internet.
o Also known as a web address.

    http://cs111.wellesley.edu/content/info/simple.html

- protocol: http
- domain name (host server): cs111.wellesley.edu
- path: /content/info/
- file: simple.html

Sidenote: Your own space on the server

o Each of you has a folder in our server.
o Type the following URL in your web browser (by using your personal account name): http://cs.wellesley.edu/~yourAccountName/

TO DO:
1. Change something in the file simple.html.
2. Upload it to the public_html folder with CyberDuck.
3. View the page in the browser.
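The parts of a URL can also be pulled apart in code. A minimal sketch, assuming Python 3 and its standard `urllib.parse` module; the variable names are only for illustration.

```python
from urllib.parse import urlparse
import posixpath

url = "http://cs111.wellesley.edu/content/info/simple.html"
parts = urlparse(url)

protocol = parts.scheme   # the part before "://"
domain = parts.netloc     # the host server's domain name

# Split the remaining path into the folder and the file name.
folder, filename = posixpath.split(parts.path)

print(protocol)
print(domain)
print(folder)
print(filename)
```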

HTTP = HyperText Transfer Protocol

Client examples: a PC with Internet Explorer, a Mac with Safari.

An HTTP Request

User requests the page:

http://cs111.wellesley.edu/content/info/simple.html

The browser then prepares an HTTP request and sends it to the server.
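An HTTP request is plain text. As a rough sketch, the function below (a hypothetical helper, not what any particular browser does) builds the smallest reasonable GET request for a URL. A real browser sends additional header lines, such as `User-Agent` and `Accept`.

```python
from urllib.parse import urlparse

def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for url."""
    parts = urlparse(url)
    lines = [
        "GET {} HTTP/1.1".format(parts.path or "/"),  # request line
        "Host: {}".format(parts.netloc),              # which server we want
        "Connection: close",                          # one request, then done
        "",                                           # blank line ends the headers
        "",
    ]
    # HTTP separates lines with carriage return + line feed.
    return "\r\n".join(lines)

request_text = build_get_request(
    "http://cs111.wellesley.edu/content/info/simple.html")
print(request_text)
```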

An HTTP Response

header

content

    Welcome to CS111!
    Learn about Web APIs
    ...

Details about the process

The browser shows details of the HTTP connection

Using Python’s requests module

    import requests
    httpResp = requests.get("http://cs111.wellesley.edu/content/info/simple.html")

    In [6]: httpResp.status_code
    Out [6]: 200

    In [7]: httpResp.headers
    Out [7]: {'Content-Length': '497', 'Accept-Ranges': 'bytes',
              'Server': 'Apache/2.2.15 (Red Hat)',
              'Last-Modified': 'Mon, 14 Nov 2016 19:55:37 GMT',
              'Connection': 'close',
              'Date': 'Tue, 15 Nov 2016 14:49:48 GMT',
              'Content-Type': 'text/html; charset=UTF-8'}

Getting the content

    In [5]: httpResp.content
    Out [5]: '\n \n\n \n \n \n \n\n\n
        Welcome to CS111! \n
        Learn about Web APIs \n \n
        Today our topic is how to use Web APIs in our programs. \n \n
        Some examples of Web APIs are: \n \n \n
        Facebook Graph API \n
        Google Maps API \n
        Twitter API \n
        WellesleyFreshPal API \n
        \n\n\n'
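The status code 200 means the request succeeded. As a small sketch, Python 3's standard `http` module can translate a numeric status code into its standard phrase; the helper name `describe_status` is made up for this example.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the status code followed by its standard HTTP phrase."""
    return "{} {}".format(code, HTTPStatus(code).phrase)

print(describe_status(200))
print(describe_status(404))
```

Checking the status code before using the content is a simple way to notice that a page was not found or that the server had a problem.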

Print the content

    In [8]: print httpResp.content

    Welcome to CS111!
    Learn about Web APIs

HTML – Language of Web Pages

HTML = HyperText Markup Language
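Because HTML is plain text with markup tags, a Python program can produce a page by building a string. The sketch below is one possible way to do it, assuming Python 3; the function `make_page` and the choice of tags (`h1`, `ol`, `li`) are illustrative and are not taken from the course page itself.

```python
from html import escape

def make_page(title, items):
    """Return HTML text with a heading and a numbered list of items."""
    lines = ["<html>", "<body>", "<h1>{}</h1>".format(escape(title)), "<ol>"]
    for item in items:
        # escape() protects characters such as < and & inside the text.
        lines.append("<li>{}</li>".format(escape(item)))
    lines.extend(["</ol>", "</body>", "</html>"])
    return "\n".join(lines)

page = make_page("Welcome to CS111!",
                 ["Facebook Graph API", "Google Maps API", "Twitter API"])
print(page)
```

Saving the resulting string to a file whose name ends in .html lets a browser display it as a web page.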

Things we need to know (at high-level)

o What is the Internet?
o What is the Web (WWW)?
o What are URLs?
o What is an HTTP request?
o What is HTML?
o What is an API?
o What is a Web API?

Now that we have briefly looked at these concepts, we will learn how to write Python code that:

o Reads data from remote machines with Web APIs.
o Manipulates the JSON results to extract what we need.
o Generates HTML pages to view in the browser.

Big idea number 1: Abstraction

Contract / API
Implementer / Designer
User / Client

*Visit the Python standard library for some useful Python contracts, which are known as Application Programming Interfaces (APIs).

The API for the math module

API = Application Programming Interface

Documentation that shows how to access functions and data, without knowing the implementation details.

Python and the math library reside inside our computer.
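For instance, we can call functions and read constants from the math module by following its documentation, without knowing how they are implemented. A short sketch with illustrative variable names:

```python
import math

# We rely only on the documented contract of each name, not on how it works.
root = math.sqrt(16)            # square root of 16
rounded_down = math.floor(3.7)  # largest integer not greater than 3.7
circle_area = math.pi * 2 ** 2  # area of a circle with radius 2

print(root)
print(rounded_down)
print(circle_area)
```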

Web APIs

Documentation that shows how to access functions and data, which are located in a remote server on the Internet.

Remember Location Tracking in PS7?

    https://maps.googleapis.com/maps/api/staticmap?size=600x600&markers=label:S|42.29408,-71.30208&markers=label:E|42.29493,-71.30498&path=42.29408,-71.30208|42.29405,-71.30114|42.29405,-71.30029|42.2948,-71.29971|42.2953,-71.29925|42.29547,-71.30099|42.29549,-71.30237|42.29501,-71.30384|42.29493,-71.30498

An example of using a method in a remote server to generate the image of a map with a path on it.

API: Google Static Maps
https://developers.google.com/maps/documentation/static-maps/intro
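The part of the URL after the `?` is a list of `name=value` parameters separated by `&`. As a sketch, assuming Python 3's `urllib.parse`, the code below takes such a URL apart. To keep it short, the `path` here lists only the first and last points of the PS7 path.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://maps.googleapis.com/maps/api/staticmap?"
       "size=600x600"
       "&markers=label:S|42.29408,-71.30208"
       "&markers=label:E|42.29493,-71.30498"
       "&path=42.29408,-71.30208|42.29493,-71.30498")

# parse_qs maps each parameter name to the list of values given for it.
params = parse_qs(urlsplit(url).query)

# The path value is a list of "latitude,longitude" points separated by "|".
path_points = [tuple(float(number) for number in point.split(","))
               for point in params["path"][0].split("|")]

print(params["size"])
print(params["markers"])
print(path_points)
```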

Use your browser to follow along with these steps. Enter the URL and then see the result.

Step-by-step [1]

request: https://maps.googleapis.com/maps/api/staticmap
response: The Google Maps API server rejected your request. Invalid request. Missing the 'size' parameter.

Step-by-step [2]

request: https://maps.googleapis.com/maps/api/staticmap?size=600x600
response: (shown as an image on the slide)

Step-by-step [3]

request: https://maps.googleapis.com/maps/api/staticmap?size=600x600&markers=label:S|42.29408,-71.30208
response: (shown as an image on the slide)

Step-by-step [4]

request: https://maps.googleapis.com/maps/api/staticmap?size=600x600&markers=label:S|42.29408,-71.30208&markers=label:E|42.29493,-71.30498
response: (shown as an image on the slide)
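The same request URLs can be assembled in Python, adding one parameter at a time. This is a sketch under a few assumptions: Python 3, a hypothetical helper `build_url`, and no API key (the service may require one today).

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/staticmap"

def build_url(params):
    """Join the base URL and a list of (name, value) parameters."""
    if not params:
        return BASE_URL
    # safe=":|," keeps those characters readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=":|,")

# Each step adds one more parameter to the previous request.
steps = [
    [],
    [("size", "600x600")],
    [("size", "600x600"), ("markers", "label:S|42.29408,-71.30208")],
    [("size", "600x600"), ("markers", "label:S|42.29408,-71.30208"),
     ("markers", "label:E|42.29493,-71.30498")],
]
urls = [build_url(params) for params in steps]

for url in urls:
    print(url)
```

Using a list of pairs rather than a dictionary allows the `markers` parameter to appear more than once.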

Google Geocoding API

Geocoding: Provide text address, get back latitude/longitude values.

request: https://maps.googleapis.com/maps/api/geocode/json?address=106+Central+St,Wellesley,MA
response: (shown as an image on the slide)

Exercise: Find all Middletown-s in the U.S.

request: https://maps.googleapis.com/maps/api/geocode/json?address=Middletown
response: A JSON string that contains information about the found towns.

desired output:

    Middletown, CT, USA
    Middletown, NJ, USA
    Middletown, OH, USA
    Middletown, NY 10940, USA
    Middletown, RI, USA
    Middletown, DE, USA
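One possible approach to the extraction step is sketched below. It does not contact the server: `sample_response` is a hand-written, heavily shortened stand-in for the JSON string, and it assumes the response has a top-level `results` list whose entries contain a `formatted_address` field. A real response carries many more fields per result.

```python
import json

# Hand-written, shortened stand-in for the JSON string the server returns.
sample_response = """
{
  "results": [
    {"formatted_address": "Middletown, CT, USA"},
    {"formatted_address": "Middletown, NJ, USA"},
    {"formatted_address": "Middletown, OH, USA"}
  ],
  "status": "OK"
}
"""

def formatted_addresses(json_text):
    """Return the formatted address of every result in a geocoding reply."""
    data = json.loads(json_text)
    if data.get("status") != "OK":
        return []  # nothing usable came back
    return [result["formatted_address"] for result in data["results"]]

for address in formatted_addresses(sample_response):
    print(address)
```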