
CS 655 / 441 Fall 2007 Lecture 9a: Sessions and Cookies

1 Review: Structure of a Web Application

On every interchange between client and server, server must:
• Parse request.
• Look up session state and global state.
• Change session and global states based on request.
• Choose which page/screen/interface to show user next.
• Create and send that page, possibly with a cookie.

Code to act on the user input must be distinct from code to produce user output because the same input must sometimes produce different outputs at different times. The script model of one page per URL does not work. Therefore:
• Server dispatches to controller based on URL.
• Controller makes necessary changes in session and global states.
• Controller calls View at end of execution.
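
The dispatch structure described above can be sketched in a few lines. This is a minimal illustration in plain Java, not the servlet API; the class name Dispatcher, the nested Controller interface, the view names, and the /checkout path are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class Dispatcher {
    // Hypothetical controller contract: update state, then name the view to show next.
    interface Controller {
        String handle(Map<String, String> params, Map<String, Object> session);
    }

    private final Map<String, Controller> routes = new HashMap<>();

    void register(String path, Controller controller) {
        routes.put(path, controller);
    }

    // Dispatch on the URL path; the chosen controller decides which view is rendered.
    String dispatch(String path, Map<String, String> params, Map<String, Object> session) {
        Controller controller = routes.get(path);
        if (controller == null) {
            return "notFound";
        }
        return controller.handle(params, session);
    }

    public static void main(String[] args) {
        Dispatcher dispatcher = new Dispatcher();
        // The same URL yields different views depending on the session state.
        dispatcher.register("/checkout",
                (p, s) -> s.containsKey("user") ? "checkoutView" : "loginView");

        Map<String, Object> sessionState = new HashMap<>();
        Map<String, String> noParams = new HashMap<>();
        System.out.println(dispatcher.dispatch("/checkout", noParams, sessionState));
        sessionState.put("user", "alice");
        System.out.println(dispatcher.dispatch("/checkout", noParams, sessionState));
    }
}
```

The example shows why input handling and output production are kept apart: the request for /checkout is identical both times, but the view chosen depends on the state the controller finds.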

1.1 Templates

• A template is a basic HTML page stored in a file.
• Placeholders are left in the file.
• The servlet fills in the placeholders, thereby combining template with data to produce HTML sent to browser.
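
Placeholder filling can be sketched as follows. The notes do not specify a placeholder syntax, so the ${name} form used here is an assumption, and TemplateFiller is a hypothetical helper rather than the template engine used in the course.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateFiller {
    // Assumed placeholder syntax: ${name}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replace every placeholder with its value; unknown names become empty strings.
    static String fill(String template, Map<String, String> data) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = data.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "<html><body><p>Hello, ${user}!</p></body></html>";
        System.out.println(fill(template, Map.of("user", "Alice")));
    }
}
```

A real template engine would also HTML-escape the inserted values; that step is left out here to keep the sketch short.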

1.2 Scripts

• Servlets are written as a mixture of HTML code and JSP or Java code.
• A pre-processor produces the final servlet by putting println() statements around the HTML portions of the script and removing the JSP tag markers.

1.3 Templates vs. Scripts

1.3.1 Scripts

• Easy to write.
• Easy to understand.
• Presentation (web designer’s job) and computation (web application programmer’s job) mixed together in source file.

1.3.2 Templates

• A bit more structured.
• Computation and presentation well separated.

We will use templates in this course.

2 The Model

For web applications, a model consists of one global state and many session states, one of which is active for a given session.

2.1 Global Persistent State

Example: database of airline times, census records, etc.
• The server stores global persistent state in a database of a type appropriate for the application.
• This state is shared by all requests and thus by all clients.
• What happens to the state if the server crashes?
• Concurrent access?

2.2 User Persistent State

• Some aspects of user state should be persistent and therefore stored in a database. For example, Walgreens keeps track of all of my prescriptions. Barnes and Noble maintains my wish-list (in addition to my shopping cart).
• Many web apps permanently keep track of users.
• User state objects should be easily loaded and stored; an OO database is nice. (This should be a completely separate database, with a different administrator, from the list of things for sale.)
• Cookies should not be used to maintain the user persistent state because they are not secure and they are not reliable (easily and often deleted).

2.3 User Logins and Session State

Session state involves a single user at a single session. It is separate from the User Persistent State: it is discarded at the end of the session.
• A session is created when the user first enters an application.
• The act of logging in links the current session state object to the persistent user state object.
• Cookies are often used to maintain part of the session state.
• A unique cookie is stored in each browser.
• The server recognizes cookies on subsequent interchanges.
• The cookie ID is used in the server to look up the session state.
• All cookies issued by the server must be unique.

Example: Shopping carts are often part of the session state.
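
The lookup of session state by cookie ID can be sketched as below, assuming an in-memory map stands in for the server's session table. SessionStore and its method names are hypothetical; a servlet container provides its own session mechanism.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SessionStore {
    private final SecureRandom random = new SecureRandom();
    // cookie ID -> session state (for example, the shopping cart)
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Create a session and return the unique ID to send to the browser as the cookie value.
    String create() {
        String id;
        do {
            byte[] bytes = new byte[16];
            random.nextBytes(bytes);
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) {
                sb.append(String.format("%02x", b & 0xff));
            }
            id = sb.toString();
            // putIfAbsent returns non-null only if the ID already exists, so IDs stay unique.
        } while (sessions.putIfAbsent(id, new ConcurrentHashMap<>()) != null);
        return id;
    }

    // Look up the session state for a returning cookie; null if the cookie is unknown.
    Map<String, Object> lookup(String cookieId) {
        return cookieId == null ? null : sessions.get(cookieId);
    }

    // Logging in links the session to the persistent user record (here, just a user name).
    boolean login(String cookieId, String userName) {
        Map<String, Object> state = lookup(cookieId);
        if (state == null) {
            return false;
        }
        state.put("user", userName);
        return true;
    }

    // Session state is discarded at the end of the session.
    void end(String cookieId) {
        if (cookieId != null) {
            sessions.remove(cookieId);
        }
    }
}
```

A typical use is `String id = store.create();` on the first request, then `store.lookup(id)` on every later request that carries the cookie.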

2.4 Multi-Threaded Session Issues

• Multiple threads (from multiple sessions) may be running the same servlet object at the same time.
• Servlets may be created and destroyed (and moved between JVMs) at will by the servlet container.
• Two servlets may access the same global or session state at the same time.
• Conclusion: Servlets must be thread-safe.
• Best practice: don’t store state information in servlets; use global and session states instead.
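
One way to follow the best practice above is sketched here: the handler method keeps no state of its own, and the shared state is held in concurrency-safe structures. HitCounter is a hypothetical stand-in for a servlet plus its global state, not code from the course.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class HitCounter {
    // Global state lives outside the handler and is safe for concurrent access.
    private static final ConcurrentHashMap<String, AtomicInteger> HITS = new ConcurrentHashMap<>();

    // Stateless handler: everything it needs is passed in or held in the shared state.
    static int recordHit(String page) {
        return HITS.computeIfAbsent(page, k -> new AtomicInteger()).incrementAndGet();
    }

    static int count(String page) {
        AtomicInteger counter = HITS.get(page);
        return counter == null ? 0 : counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Eight threads play the role of eight concurrent requests to the same servlet.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    recordHit("/index");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(count("/index")); // no updates are lost: 8000
    }
}
```

With a plain int field and a HashMap instead, concurrent increments could be lost, which is the hazard the bullets above describe.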

3 Cookies

[http://en.wikipedia.org/wiki/HTTP_cookie] javax.servlet.http.Cookie lecture02.Servlet5.java

Technically, a cookie is an arbitrary piece of data chosen by the Web server and sent to the browser¹. Cookies are identified by the triple name/domain/path. The browser returns a cookie to the server with every succeeding request to that server, introducing a state (memory of previous events) into otherwise stateless HTTP transactions. Without cookies, each retrieval of a Web page or component of a Web page is an isolated event, mostly unrelated to all other views of the pages of the same site. By returning a cookie to a web server, the browser provides the server a means of connecting the current page view with prior page views.

¹ The term “cookie” is derived from “magic cookie,” a well-known concept in Unix computing which inspired both the idea and the name of HTTP cookies.

3.1 Purposes

• Cookies are used by Web servers to differentiate users and to maintain data related to the user during navigation. They were introduced to provide a way of implementing a “shopping cart”.
• Cookies are used to support logins; they allow a server to know whether a user is already authenticated, and therefore is allowed to access services or perform operations that are restricted to logged-in users. The procedure is as follows:
  – The server receives a username and password from an HTML login form and checks them; if they are correct, it sends back a page confirming that the login has been successful, together with a cookie, and stores the user/cookie pair as part of the session state.
  – Every time the user requests a page from the server, the browser sends the cookie back to the server.
  The server compares the cookie with its list of stored cookies; if a match is found, the server knows which user has requested that page. This method is commonly used by sites that allow logging in, such as Yahoo!, Wikipedia, and Facebook.
• Cookies can be used for allowing users to express preferences (presentation and functionality) about a Web site. Users select their preferences by entering them in a Web form and submitting it to the server. The server encodes them in a cookie and sends it back to the user’s browser. After that, the browser will send the cookie to the server on every access. This allows the server to consistently adapt to user preferences. For example, the Google search engine allows the user to choose how many results are to be shown for every query. This choice is maintained in a cookie, across sessions, and the cookie is sent to the server on future sessions. Google stores the user preferences in a cookie of name PREF. This cookie is created with default values when the user accesses the site for the first time. The default cookie value contains the string NR=10, to indicate a preference of ten hits displayed in each page. If the user changes this number to 20 in the preference page, the server modifies the cookie with NR=20.
• Cookies can be used for tracking the path of a user while visiting the web pages of a site.
  – If the user requests a page of the site, but the request contains no cookie, the server presumes that this is the first page visited by the user; the server creates a random string and sends it as a cookie back to the browser together with the requested page.
  – From this point on, the cookie will be automatically sent by the browser to the server every time a new page from the site is requested.
  – The server sends the page as usual, but also stores the URL of the requested page along with the date/time and the cookie in a log database, using the random number in the cookie as the lookup key.
  – By searching the log DB for requests with this cookie number, it is possible to find out which pages, and in which sequence, the user has visited.
  One application of tracking is producing usage statistics. Another is to remember where a user has been. For example, suppose you were browsing a non-secure website. There might be several places on that website where you could choose to log in to the secure part of the site. After logging in, you want the server to return you to the page that you were working on before the login. A temporary cookie is one way to do this.
• Images or other objects contained in a Web page may reside in servers different from the one holding the page. In order to show such a page, the browser downloads all these objects, possibly receiving cookies. These cookies are called third-party cookies if the server sending them is located outside the domain of the Web page. This condition is common with on-line advertisement. Indeed, web banners are typically stored in servers of the advertising company, which are not in the domain of the Web pages showing them. Third-party cookies are used to create an anonymous profile of the user. This allows the advertising company to select the banner to show to a user based on the user’s profile. The advertising industry has denied any other use of these profiles. Many modern browsers block third-party cookies if requested by the user.
  If third-party cookies are not rejected by the browser, an advertising company can track a user across the sites where it has placed a banner. In particular, whenever a user views a page containing a banner, the browser retrieves the banner from a server of the advertising company. If this server has previously set a cookie, the browser sends it back, allowing the advertising company to link this access with the previous one. By choosing a unique banner URL for every Web page where it is placed or by using the HTTP referer field, the advertising company can then find out which pages the user has viewed. The same technique can be used with web bugs. These, unlike the obvious banners, are images embedded in the Web page that are undetectable by the user (e.g. they are tiny and/or transparent).
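
The path-tracking procedure in the list above can be sketched as follows. An in-memory map stands in for the log database, the date/time field is omitted for brevity, and VisitLog with its method names is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class VisitLog {
    // cookie value -> URLs requested, in order (stand-in for the log database)
    private final Map<String, List<String>> log = new ConcurrentHashMap<>();

    // Record one request. A request without a cookie is treated as a first visit
    // and gets a fresh random string. Returns the cookie value to send back.
    String record(String cookie, String url) {
        String id = (cookie != null) ? cookie : UUID.randomUUID().toString();
        List<String> visits = log.computeIfAbsent(id, k -> new ArrayList<>());
        synchronized (visits) {
            visits.add(url);
        }
        return id;
    }

    // Return the pages visited under this cookie, in the order they were requested.
    List<String> path(String cookie) {
        List<String> visits = log.get(cookie);
        if (visits == null) {
            return new ArrayList<>();
        }
        synchronized (visits) {
            return new ArrayList<>(visits);
        }
    }

    public static void main(String[] args) {
        VisitLog visitLog = new VisitLog();
        String id = visitLog.record(null, "/home");   // first visit: no cookie yet
        visitLog.record(id, "/products");             // browser returns the cookie
        System.out.println(visitLog.path(id));
    }
}
```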

3.2 Implementation

Cookies can be set by a web server, or by a script in a language such as JavaScript, if supported and enabled by the browser. Cookie specifications say that a browser should be able to store at least 300 cookies of 4 kilobytes each, and at least 20 cookies per server or domain. Common browsers support 30 to 50 cookies per domain. In practice, cookies must be smaller than 4 KB. (Internet Explorer imposes a 4 KB total for all cookies stored in a given domain.) Cookie names are case insensitive (standard RFC 2965, October 2000).

Figure 1: HTTP cookie exchange protocol. Exchange between client browser and server: the user visits a web application URL; the server creates a cookie for this session and returns page + cookie; the browser stores the cookie; when the user requests another page, the browser sends the URL and the cookie back; the server identifies the returning user and sends another response; the interaction continues in the same way.

Setting a cookie. Transfer of Web pages follows the HTTP protocol; a browser requests a page from a web server by sending it an HTTP request. For example, a browser can connect to the server www.w3.org by sending it this request:

    GET /index. HTTP/1.1
    Host: www.w3.org

The server replies by sending the requested page preceded by a similar packet of text, called an HTTP header. This packet may contain lines requesting the browser to store cookies:

    HTTP/1.1 200 OK
    Content-type: text/html
    Set-Cookie: name=value

    (content of page)

The following is an actual response containing a cookie, generated by Google:

Figure 2: A response header containing a set-cookie request.

Returning a cookie. The Set-Cookie line is a request for the browser to store the string name=value and send it back in all future requests to the server. If the browser supports cookies and cookies are enabled, every subsequent page request to the same server contains the cookie. For example, the browser requests the page http://www.w3.org/spec.html by sending the server www.w3.org a request like the following:

    GET /spec.html HTTP/1.1
    Host: www.w3.org
    Cookie: name=value
    Accept: */*

This is a request for another page from the same server, and differs from the first one above because it contains the string that the server has previously sent to the browser. This way, the server knows that this request is related to the previous one. The server answers by sending the requested page, possibly adding other cookies as well.

The value of a cookie can be modified by the server by sending a new Set-Cookie: oldname=newvalue line in response to a page request. The browser then replaces the old value with the new one. The term “cookie crumb” is sometimes used to refer to the name-value pair. Since cookies are identified by the triple name/domain/path, the same name but different domains or paths identify different cookies with possibly different values. As a result, cookie values are changed only if a new value is given for the same name, domain, and path.

The Set-Cookie line is typically not created by the HTTP server itself but by a program. The HTTP server only sends the result of the program (a document preceded by the header containing the cookies) to the client browser.

Cookies can also be set by JavaScript or similar scripts running within the browser. In JavaScript, the object document.cookie is used for this purpose. For example, the instruction document.cookie = "temperature=20" creates a cookie of name temperature and value 20.
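
On the server side, the value of the Cookie request header has to be split back into name/value pairs before a program can use it. The following is a minimal sketch of that step; CookieHeader is a hypothetical helper, and a servlet container normally does this parsing itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CookieHeader {
    // Parse the value of a Cookie request header, e.g. "name=value; temperature=20".
    static Map<String, String> parse(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (headerValue == null) {
            return cookies;
        }
        for (String pair : headerValue.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip pieces that are not name=value
            }
            String name = pair.substring(0, eq).trim();
            if (name.isEmpty()) {
                continue; // skip pieces with an empty name
            }
            cookies.put(name, pair.substring(eq + 1).trim());
        }
        return cookies;
    }

    public static void main(String[] args) {
        System.out.println(parse("name=value; temperature=20"));
    }
}
```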

Cookie attributes. As a minimum, a cookie contains a name/value pair. It may also contain an expiration date, a path, a domain name, and a flag indicating whether the cookie is intended only for encrypted connections². These pieces of data follow the name=value pair and are separated by semicolons. For example, a cookie can be created by the server by sending a line

    Set-Cookie: name=value; expires=date; path=/; domain=.example.org

If the domain and path are not specified, they default to the domain and path in the HTTP request. The domain and path strings may tell the browser to send the cookie when it normally would not. For security reasons, the cookie is accepted only if the server is a member of the domain specified by the domain string.
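
The attribute syntax can be explored with the JDK's own client-side class java.net.HttpCookie, which is distinct from the javax.servlet.http.Cookie class used in servlets. The sketch below parses a Set-Cookie line and applies the RFC 2965 style domain check; SetCookieDemo and the host names are hypothetical.

```java
import java.net.HttpCookie;
import java.util.List;

final class SetCookieDemo {
    // Parse a single Set-Cookie header line and return the cookie it describes.
    static HttpCookie parseOne(String headerLine) {
        List<HttpCookie> cookies = HttpCookie.parse(headerLine);
        return cookies.get(0);
    }

    public static void main(String[] args) {
        HttpCookie c = parseOne("Set-Cookie: name=value; path=/; domain=.example.org; secure");
        System.out.println(c.getName() + "=" + c.getValue());
        System.out.println("path=" + c.getPath() + " domain=" + c.getDomain()
                + " secure=" + c.getSecure());

        // A browser accepts the cookie only if the sending host belongs to the domain.
        System.out.println(HttpCookie.domainMatches(".example.org", "www.example.org"));
        System.out.println(HttpCookie.domainMatches(".example.org", "www.example.com"));
    }
}
```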

Persistent cookies. If the cookie setter specifies a deletion date, the cookie will be removed on that date. If the cookie setter does not specify a date, the cookie is removed once the user quits his or her browser. The expiration date is specified in the “Wdy, DD-Mon-YYYY HH:MM:SS GMT” format. Specifying a date allows a cookie to survive across sessions. For example, a shopping site could use persistent cookies to store the shopping carts. If the user’s system crashed, or he left his browser without making a purchase, the contents of the cart would still be there when he later returned.

Expiration. Cookies expire under any of these conditions:
• At the end of the user session (i.e. when the browser is shut down), if the cookie is not persistent.
• An expiration date has been specified, and has passed.
• The expiration date of the cookie is changed (by the server or the script) to a date in the past. This provision allows a server or script to delete a cookie.
• The browser deletes the cookie by user request.
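
Producing the “Wdy, DD-Mon-YYYY HH:MM:SS GMT” date and deleting a cookie by back-dating it can be sketched as follows. CookieExpiry and its methods are hypothetical helpers; path and domain attributes are left out, although a deletion has to use the same name, domain, and path as the cookie being removed.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class CookieExpiry {
    // "Wdy, DD-Mon-YYYY HH:MM:SS GMT", with English day and month names.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE, dd-MMM-yyyy HH:mm:ss 'GMT'", Locale.US);

    // Build a Set-Cookie line for a persistent cookie that expires at the given time.
    static String persistent(String name, String value, ZonedDateTime expires) {
        String date = FORMAT.format(expires.withZoneSameInstant(ZoneOffset.UTC));
        return "Set-Cookie: " + name + "=" + value + "; expires=" + date;
    }

    // Delete a cookie by setting its expiration date in the past.
    static String delete(String name) {
        ZonedDateTime epoch = ZonedDateTime.of(1970, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        return persistent(name, "", epoch);
    }

    public static void main(String[] args) {
        ZonedDateTime end2010 = ZonedDateTime.of(2010, 12, 31, 23, 59, 59, 0, ZoneOffset.UTC);
        System.out.println(persistent("RMID", "732423sdfs73242", end2010));
        System.out.println(delete("RMID"));
    }
}
```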

A sample cookie. The following is an actual cookie sent by a Web server (identifying information has been changed):

    Set-Cookie: RMID=732423sdfs73242; expires=Fri, 31-Dec-2010 23:59:59 GMT; path=/; domain=.example.net

The name of this particular cookie is RMID; its value is the string 732423sdfs73242, a randomly chosen identifier. The server can use an arbitrary string as the value of a cookie, and it is a good idea to use a string that cannot be easily guessed. An alternative is to form a cookie value by collapsing the values of a number of variables into a single URL-encoded string, for example a=12&b=abcd&c=32. The path and domain strings / and .example.net tell the browser to send the cookie when requesting an arbitrary page of the domain .example.net, with an arbitrary path.
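
Collapsing several variables into one URL-encoded cookie value, and recovering them again, can be sketched as follows. CookieValueCodec is a hypothetical helper name; the encoding itself uses the JDK's URLEncoder and URLDecoder.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class CookieValueCodec {
    // Collapse variables into a single string such as a=12&b=abcd&c=32.
    static String encode(Map<String, String> vars) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Recover the variables from the encoded string, keeping their order.
    static Map<String, String> decode(String encoded) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip pieces that are not name=value
            }
            vars.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                     URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return vars;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("a", "12");
        vars.put("b", "abcd");
        vars.put("c", "32");
        System.out.println(encode(vars)); // a=12&b=abcd&c=32
    }
}
```

Encoding the individual names and values matters when they contain & or =, which would otherwise be confused with the separators.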

² Standard RFC 2965 also specifies that cookies must have a mandatory version number, but this is usually omitted.

4 Alternatives to Cookies

Some of the operations that can be realised using cookies can also be realised using other mechanisms. However, these alternatives to cookies have their own drawbacks, which make cookies usually preferred to them in practice. Most of the following alternatives allow for user tracking, even if not as reliably as cookies. As a result, privacy is an issue even if cookies are rejected by the browser or not set by the server.

IP address. An unreliable technique for tracking users is based on storing the IP addresses of the computers requesting the pages. However, this technique only allows tracking and cannot replace cookies in their other uses. IP-address tracking is typically less reliable in identifying a user than cookies because computers and proxies may be shared by several users, and the same computer may be assigned different Internet addresses in different work sessions (this is often the case for dial-up connections). Moreover, tracking by IP address can be impossible with systems, such as Tor, that are used to retain Internet anonymity. With such systems, not only could one browser carry multiple addresses throughout a session, but multiple users could appear to be coming from the same IP address, thus making the IP address useless for tracking. Finally, some major ISPs, including AOL, route all web traffic through a small number of proxies, making IP-address tracking wholly unworkable.

URL (query string). A more precise technique is based on embedding information into URLs. The query string part of the URL is the one that is typically used for this purpose, but other parts can be used as well. The PHP session mechanism uses this method if cookies are not enabled. This method consists of the Web server appending query strings to the links of a Web page it holds when sending it to a browser. When the user follows a link, the browser returns the attached query string to the server. Query strings used in this way are very similar to cookies, both being arbitrary pieces of information chosen by the server and sent back by the browser. However, there are some differences: since a query string is part of a URL, if that URL is later reused, the same attached piece of information is sent to the server. For example, if the preferences of a user are encoded in the query string of a URL and the user sends this URL to another user by e-mail, those preferences will be used for that other user as well. Moreover, even if the same user accesses the same page two times, there is no guarantee that the same query string is used in both views. For example, if the same user arrives at the same page but coming from a page internal to the site the first time and from an external search engine the second time, the respective query strings are typically different while the cookies would be the same. Other drawbacks of query strings are related to security: storing data that identifies a session in a query string enables or simplifies session fixation attacks, referer logging attacks and other security exploits. Transferring session identifiers as HTTP cookies is more secure.
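
Appending a session identifier to the links of a page can be sketched as follows. The parameter name sid and the class UrlRewriter are assumptions made for illustration; they are not the names used by PHP or by any particular framework.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class UrlRewriter {
    // Append the session ID as a query parameter, keeping any existing query and fragment.
    static String withSession(String url, String sessionId) {
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) {
            fragment = url.substring(hash);
            url = url.substring(0, hash);
        }
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "sid="
                + URLEncoder.encode(sessionId, StandardCharsets.UTF_8) + fragment;
    }

    public static void main(String[] args) {
        System.out.println(withSession("/cart", "abc123"));          // /cart?sid=abc123
        System.out.println(withSession("/search?q=tea", "abc123"));  // /search?q=tea&sid=abc123
    }
}
```

Because the identifier is now part of the URL, anyone who receives a copied link also receives the session, which is the drawback the paragraph above describes.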

Hidden form fields. A form of session tracking, used by ASP.NET, is to use web forms with hidden fields. This technique is very similar to using URL query strings to hold the information and has many of the same advantages and drawbacks. However, it presents two advantages from the point of view of the tracker: first, having the tracking information placed in the HTML source rather than the URL means that it is not noticed by the average user; second, the session information is not copied when the user copies the URL (to save the page on disk or send it via email, for example). A drawback of this technique is that session information is in the HTML code; therefore, each web page must be generated dynamically each time someone requests it, placing an additional workload on the web server.
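
Generating such a hidden field is a one-line string operation once the value is escaped. The sketch below is a hypothetical helper (HiddenField, with an assumed field name sid), not the mechanism ASP.NET itself uses.

```java
final class HiddenField {
    // Escape the characters that are special inside an HTML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Render a hidden input carrying one piece of session information.
    static String render(String name, String value) {
        return "<input type=\"hidden\" name=\"" + escape(name)
                + "\" value=\"" + escape(value) + "\">";
    }

    public static void main(String[] args) {
        System.out.println("<form method=\"post\" action=\"/next\">");
        System.out.println("  " + render("sid", "abc123"));
        System.out.println("  <input type=\"submit\" value=\"Continue\">");
        System.out.println("</form>");
    }
}
```

The field has to be written into every form of every page, which is why each page must be generated dynamically.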

HTTP authentication. For authentication, HTTP supports two protocols: basic access authentication and digest access authentication, both of which allow access to a Web page only when the user has provided the correct username and password. If the server requires such credentials for granting access to a Web page, the browser requests them from the user; once obtained, the browser stores them and uses them for accessing subsequent pages, without requiring the user to provide them again. From the point of view of the user, the effect is the same as if cookies were used: username and password are only requested once, and from that point on the user is given access to the site. In the basic access authentication protocol, a combination of username and password is sent to the server in every browser request. This means that someone listening in on this traffic can simply read this information and store it for later use. This problem is overcome in the digest access authentication protocol, in which the browser sends a hash computed from the username, the password, and a random number (nonce) created by the server, rather than the password itself.
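
The weakness of basic access authentication is easy to see in code: the credentials are only Base64-encoded, so anyone who captures the header can decode them. The sketch below uses the JDK's Base64 class; BasicAuth is a hypothetical helper, and the credentials are the well-known example pair from the HTTP authentication RFC.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuth {
    private static final String PREFIX = "Authorization: Basic ";

    // Build the header a browser sends with every request under basic authentication.
    static String header(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(raw);
    }

    // What an eavesdropper can do: decode the header back into "user:password".
    static String decode(String headerLine) {
        String token = headerLine.substring(PREFIX.length());
        return new String(Base64.getDecoder().decode(token), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String line = header("Aladdin", "open sesame");
        System.out.println(line);         // Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println(decode(line)); // Aladdin:open sesame
    }
}
```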