M. Weintraub CS4500: BASIC SECURITY

Thanks go to Christo Wilson and Nate Derbinsky for allowing incorporation of their materials

CORE SECURITY CONCERNS

1. Confidentiality • Information protection from unauthorized access or disclosure

2. Integrity • Information protection from unauthorized modification or destruction

3. Availability • System protection from unauthorized disruption

2 SMALL SAMPLING OF RECENT HEADLINES

May 4, 2017

River City Media March, 2017

3 SECURITY IS A TRADE-OFF

NEED: You need/want to expose a system to specific groups or individuals

RISK: Someone/something will gain access that shouldn't or prevent someone from accessing the system who should

DECISION: Balancing the NEED against the RISK

Security is a cost. The goal is to reduce the RISK as much as possible for the cost you are willing to pay.

4 STORING PASSWORDS

5 AUTHENTICATION ▪ How you verify an actor’s identity

▪ Critical for security of systems ▪ Permissions, capabilities, and access control are all contingent upon knowing the identity of the actor

▪ Typically parameterized as a username and a secret ▪ The secret attempts to limit unauthorized access

6 TYPES OF SECRETS

Actors provide their secret to log-in to a system

Three classes of secrets: 1. Something you know Example: a password 2. Something you have Examples: a smart card or smart phone 3. Something you are Examples: fingerprint, voice scan, iris scan

7 ATTACKER GOALS AND THREAT MODEL

▪ Assume we have a system storing usernames and passwords ▪ The attacker has access to the password database/file

Attacker: "I wanna login to those user accounts!"

Database (User, Password) → Cracked Passwords (User, Password)

cbw p4ssW0rd
sandi puppies
amislove 3spr3ss0

8 CHECKING PASSWORDS ▪ System must validate passwords provided by users →passwords must be stored somewhere ▪ Simple storage: plain text

password.txt

cbw p4ssw0rd
sandi i heart doggies
amislove 93Gd9#jv*0x3N
bob security

9 PASSWORD FILES CAN BE STOLEN ▪ Attackers often compromise systems ▪ They may be able to steal the password file ▪ Linux: /etc/shadow ▪ Windows: c:\windows\system32\config\sam

▪ If the passwords are plain text, what happens? The attacker can now log-in as any user, including root/administrator

Passwords should never be stored in plain text

10 HIDING THE VALUES: HASHED PASSWORDS

▪ Key idea: store hashed versions of passwords, never the passwords themselves ▪ Use one-way cryptographic hash functions ▪ Examples: MD5, SHA1, SHA256, SHA512, bcrypt, PBKDF2, scrypt ▪ Cryptographic hash functions transform input data into scrambled output data ▪ Deterministic: hash(A) = hash(A) ▪ High entropy: small changes to the input produce very different outputs ▪ MD5(security) = e91e6348157868de9dd8b25c81aebfb9 ▪ MD5(security1) = 8632c375e9eba096df51844a5a43ae93 ▪ MD5(Security) = 2fae32629d4ef4fc6341f1751b405e45 ▪ Collision resistant ▪ Locating A’ such that hash(A) = hash(A’) takes a long time (hopefully) ▪ Example: 2^21 tries for MD5
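The determinism and "small change, very different output" properties listed above can be tried directly. This is a minimal sketch using Python's standard hashlib; MD5 appears only because the slide uses it, and the helper name md5_hex is an illustrative assumption.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a UTF-8 string (illustrative helper)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Deterministic: hashing the same input twice gives the same digest.
same = md5_hex("security") == md5_hex("security")
print("deterministic:", same)

# Small changes to the input give unrelated-looking digests.
for word in ("security", "security1", "Security"):
    print(word, md5_hex(word))
```

The printed digests can be compared by eye against the values on the slide.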

11 HASHED PASSWORD EXAMPLE MD5(p4ssw0rd) = 2a9d119df47ff993b662a8ef36f9ea20 User: cbw

MD5(2a9d119df47ff993b662a8ef36f9ea20) = b35596ed3f0d5134739292faa04f7ca3

hashed_password.txt

cbw 2a9d119df47ff993b662a8ef36f9ea20
sandi 23eb06699da16a3ee5003e5f4636e79f
amislove 98bd0ebb3c3ec3fbe21269a8d840127c
bob e91e6348157868de9dd8b25c81aebfb9

12 ATTACKING PASSWORD HASHES

▪ Recall: cryptographic hashes are collision resistant ▪ Locating A’ such that hash(A) = hash(A’) takes a long time (hopefully)

▪ Are hashed passwords secure from cracking? ➢ No!

▪ Problem: users choose poor passwords ▪ Most common passwords: 123456, password ▪ Username: cbw, Password: cbw

▪ Weak passwords enable dictionary attacks

13 MOST COMMON PASSWORDS

Rank  2013       2014       2015       2016
1     123456     123456     123456     123456
2     password   123456     123456     password
3     12345678   12345      12345678   12345
4     qwerty     12345678   qwerty     12345678
5     abc123     qwerty     12345      football
6     123456789  123456789  123456789  qwerty
7     111111     1234       football   1234567890
8     1234567    baseball   1234       1234567
9     iloveyou   dragon     1234567    princess
10    adobe123   football   baseball   1234

14 DICTIONARY ATTACKS

[Diagram: an English dictionary and a list of common passwords produce a list of possible password hashes, which is compared against hashed_password.txt]

Common for 60-70% of hashed passwords to be cracked in <24 hours
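A dictionary attack against unsalted hashes can be sketched in a few lines. This is an illustration under stated assumptions, not a real cracking tool: the stolen file is built in place from a few passwords that appear earlier in the slides, the word list is a tiny made-up sample, and md5_hex and dictionary_attack are hypothetical helper names.

```python
import hashlib


def md5_hex(word: str) -> str:
    """Hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def dictionary_attack(stolen, wordlist):
    """Return {user: password} for every stolen hash that matches a dictionary word."""
    # Each candidate is hashed once; the lookup table works against every
    # unsalted hash in the file at the same time.
    lookup = {md5_hex(word): word for word in wordlist}
    return {user: lookup[h] for user, h in stolen.items() if h in lookup}


# Stand-in for a stolen hashed_password.txt (built here, not real data).
stolen_hashes = {
    "bob": md5_hex("security"),
    "sandi": md5_hex("puppies"),
    "amislove": md5_hex("93Gd9#jv*0x3N"),
}
wordlist = ["123456", "password", "security", "puppies", "qwerty"]

cracked = dictionary_attack(stolen_hashes, wordlist)
print(cracked)
```

Only the random-looking password survives, which is the point of the slide: the hash function is not broken, the passwords are simply guessable.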

15 HARDENING PASSWORD HASHES

▪ Problem: cryptographic hashes are deterministic ▪ hash(p4ssw0rd) = hash(p4ssw0rd) ▪ This enables attackers to build lists of hashes

▪ Solution: make each password hash unique ▪ Add a salt to each password before hashing ▪ hash(salt + password) = password hash ▪ Each user has a unique, random salt ▪ Salts can be stored in plain text
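The hash(salt + password) construction above might look like the following sketch. SHA-256, the 16-byte salt length, and the function names are assumptions chosen for illustration; a fast hash like this is used only to show salting itself.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def salted_hash(password: str, salt: Optional[bytes] = None):
    """Return (salt, hex digest of hash(salt + password))."""
    if salt is None:
        salt = secrets.token_bytes(16)  # unique, random salt per user
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def verify(password: str, salt: bytes, stored_digest: str) -> bool:
    """Recompute the salted hash and compare in constant time."""
    _, candidate = salted_hash(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


# Two users with the same password end up with different stored hashes.
salt_a, hash_a = salted_hash("p4ssw0rd")
salt_b, hash_b = salted_hash("p4ssw0rd")
print(hash_a != hash_b, verify("p4ssw0rd", salt_a, hash_a))
```

Because the salt is stored next to the hash, verification needs nothing secret beyond the password itself.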

16 EXAMPLE SALTED HASHES

hashed_password.txt

cbw 2a9d119df47ff993b662a8ef36f9ea20
sandi 23eb06699da16a3ee5003e5f4636e79f
amislove 98bd0ebb3c3ec3fbe21269a8d840127c
bob e91e6348157868de9dd8b25c81aebfb9

hashed_and_salted_password.txt

cbw a8 af19c842f0c781ad726de7aba439b033
sandi 0X 67710c2c2797441efb8501f063d42fb6
amislove hz 9d03e1f28d39ab373c59c7bb338d0095
bob K@ 479a6d9e59707af4bb2c618fed89c245

17 ATTACKING SALTED PASSWORDS

[Diagram: the precomputed list of possible password hashes produces no matches against hashed_and_salted_password.txt. The attacker must instead build a separate list per salt: hash('a8' + word) for cbw, hash('0X' + word) for sandi, and so on.]

18 BREAKING HASHED PASSWORDS

Stored passwords should always be salted ▪ Forces the attacker to brute-force each password individually

▪ Problem: it is now possible to compute hashes very quickly ▪ GPU computing: hundreds of small CPU cores ▪ nVidia GeForce GTX Titan Z: 5,760 cores ▪ GPUs can be rented from the cloud very cheaply ▪ 2x GPUs for $0.65 per hour (2014 prices)

▪ A modern x86 server can hash all possible 6 character long passwords in 3.5 hours ▪ Upper and lowercase letters, numbers, symbols ▪ (26+26+10+32)^6 = 690 billion combinations ▪ A modern GPU can do the same thing in 16 minutes

▪ Most users use (slightly permuted) dictionary words, no symbols ▪ Predictability makes cracking much faster ▪ Lowercase + numbers → (26+10)^6 = 2B combinations
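The search-space figures on this slide are easy to check. The sketch below recomputes both alphabet sizes and derives the hash rates implied by the slide's "3.5 hours" and "16 minutes" figures; the derived rates are back-of-the-envelope estimates, not measurements.

```python
def search_space(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet_size ** length


full = search_space(26 + 26 + 10 + 32, 6)   # upper, lower, digits, symbols
reduced = search_space(26 + 10, 6)          # lowercase letters and digits only

# Hash rates implied by the slide's timings for the full 6-character space.
server_rate = full / (3.5 * 3600)           # hashes per second
gpu_rate = full / (16 * 60)                 # hashes per second

print(f"full space:    {full:,}")
print(f"reduced space: {reduced:,}")
print(f"implied server rate: {server_rate:,.0f} hashes/s")
print(f"implied GPU rate:    {gpu_rate:,.0f} hashes/s")
print(f"reduced space on the server: {reduced / server_rate:.0f} s")
```

Dropping symbols and uppercase letters shrinks the space by a factor of roughly 300, which is why predictable passwords fall so quickly.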

19 HARDENING SALTED PASSWORDS

▪ Problem: typical hashing algorithms are too fast ▪ Enables GPUs to brute-force passwords

▪ Old solution: hash the password multiple times ▪ Known as key stretching ▪ Example: crypt used 25 rounds of DES

▪ New solution: use hash functions that are designed to be slow ▪ Examples: bcrypt, PBKDF2, scrypt ▪ These algorithms include a work factor that increases the time complexity of the calculation ▪ scrypt also requires a large amount of memory to compute, further complicating brute-force attacks
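The effect of a work factor can be seen with PBKDF2, which ships in Python's standard hashlib. This is a sketch for illustration: the iteration counts are arbitrary examples, not recommendations, and stretch is a hypothetical helper name.

```python
import hashlib
import secrets
import time


def stretch(password: str, salt: bytes, iterations: int) -> bytes:
    """Derive a key with PBKDF2-HMAC-SHA256; `iterations` is the work factor."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


salt = secrets.token_bytes(16)
for iterations in (1_000, 100_000):
    start = time.perf_counter()
    stretch("my super secret password", salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>7} iterations: {elapsed:.4f} s")
```

Raising the iteration count slows every guess by the same factor for the attacker, while a legitimate login pays the cost only once.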

20 BCRYPT EXAMPLE

Python example; install the bcrypt package

[cbw@ativ9 ~] python3
>>> import bcrypt
>>> password = b"my super secret password"  # bcrypt works on bytes
>>> fast_hashed = bcrypt.hashpw(password, bcrypt.gensalt(4))   # work factor 4 (lowest the package accepts)
>>> slow_hashed = bcrypt.hashpw(password, bcrypt.gensalt(12))  # work factor 12
>>> pw_from_user = input("Enter your password: ").encode("utf-8")
>>> if bcrypt.checkpw(pw_from_user, slow_hashed):
...     print("It matches! You may enter the system")
... else:
...     print("No match. You may not proceed")

21 NONE OF THESE TECHNIQUES ARE PERFECT, SO YOU MUST ACCEPT THERE WILL BE BREACHES ▪ Suppose you build an extremely secure password storage system ▪ All passwords are salted and hashed by a high-work factor function

▪ It is still possible for a dedicated attacker to steal and crack passwords ▪ Given enough time and money, much becomes possible ▪ E.g. The NSA

▪ Question: is there a principled way to detect password breaches?

22 HONEYWORDS

▪ Key idea: store multiple salted/hashed passwords for each user ▪ As usual, users create a single password and use it to login ▪ User is unaware that additional honeywords are stored with their account

▪ Implement a honeyserver that stores the index of the correct password for each user ▪ Honeyserver is logically and physically separate from the password database ▪ Silently checks that users are logging in with true passwords, not honeywords

▪ What happens after a data breach? ▪ Attacker dumps the user/password database… ▪ But the attacker doesn’t know which passwords are honeywords ▪ Attacker cracks all passwords and uses them to login to accounts ▪ If the attacker logs-in with a honeyword, the honeyserver raises an alert!
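The honeyword flow described above can be sketched as two cooperating objects. Everything here is a simplified assumption: the class and method names are hypothetical, both "servers" live in one process, and a fast SHA-512 stands in for the slow hash a real deployment would use.

```python
import hashlib
import secrets


def h(salt: bytes, password: str) -> str:
    """Salted SHA-512 digest (stand-in for a slow password hash)."""
    return hashlib.sha512(salt + password.encode("utf-8")).hexdigest()


class HoneyServer:
    """Separate service that knows only the index of each user's real password."""

    def __init__(self):
        self._index = {}
        self.alerts = []

    def register(self, user, index):
        self._index[user] = index

    def check(self, user, index):
        if self._index[user] == index:
            return True
        self.alerts.append(user)  # a honeyword was used: likely breach
        return False


class PasswordDB:
    """Stores several salted hashes per user and cannot tell which one is real."""

    def __init__(self, honeyserver):
        self._rows = {}
        self._honey = honeyserver

    def add_user(self, user, real_password, honeywords):
        candidates = honeywords + [real_password]
        secrets.SystemRandom().shuffle(candidates)  # hide the real one's position
        rows = []
        for pw in candidates:
            salt = secrets.token_bytes(8)
            rows.append((salt, h(salt, pw)))
        self._rows[user] = rows
        self._honey.register(user, candidates.index(real_password))

    def login(self, user, password):
        for i, (salt, digest) in enumerate(self._rows[user]):
            if h(salt, password) == digest:
                return self._honey.check(user, i)
        return False


honey = HoneyServer()
db = PasswordDB(honey)
db.add_user("cbw", "p4ssW0rd", ["123456", "Turtles!"])
print(db.login("cbw", "p4ssW0rd"))                # real password
print(db.login("cbw", "Turtles!"), honey.alerts)  # honeyword raises an alert
```

The design choice that matters is the split: dumping the password database alone does not reveal which entry is real, so an attacker who cracks all three has a good chance of tripping the alert.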

23 HONEYWORDS EXAMPLE

Cracked Passwords

User      PW 1     PW 2      PW 3
cbw       123456   p4ssW0rd  Turtles!
sandi     puppies  iloveyou  blizzard
amislove  coff33   3spr3ss0  qwerty

SHA512("fI" | "p4ssW0rd") → bHDJ8l

Database

User      Salt 1  H(PW 1)  Salt 2  H(PW 2)  Salt 3  H(PW 3)
cbw       aB      y4DvF7   fI      bHDJ8l   52      Puu2s7
sandi     0x      plDS4F   K2      R/p3Y8   8W      S8x4Gk
amislove  9j      0F3g5H   /s      03d5jW   cV      1sRbJ5

Honeyserver !

User      Index
cbw       2
sandi     3
amislove  1

24 TAKEAWAYS

1. Never store passwords in plain text 2. Always salt and hash passwords before storing them 3. Use hash functions with a high work factor 4. Implement honeywords to detect breaches

These rules apply anytime you need to authenticate users ▪ Operating systems, etc.

25 PASSWORD QUALITY

$S = \log_2 N^L \;\Rightarrow\; L = \dfrac{S}{\log_2 N}$

▪ How do we measure password quality? Entropy ▪ N – the number of possible symbols (e.g. lowercase, uppercase, numbers, etc.) ▪ S – the strength of the password, in bits ▪ L – the length of the password ▪ Formula tells you length L needed to achieve a desired strength S… ▪ … for randomly generated passwords ▪ Is this a realistic measure in practice?
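The entropy formula can be turned into two small helpers. This sketch assumes, as the slide notes, that passwords are generated uniformly at random; the function names are illustrative.

```python
import math


def strength_bits(length: int, symbols: int) -> float:
    """S = L * log2(N): strength in bits of a random password."""
    return length * math.log2(symbols)


def length_needed(bits: float, symbols: int) -> int:
    """Smallest L with L * log2(N) >= S."""
    return math.ceil(bits / math.log2(symbols))


print(round(strength_bits(8, 26), 1))   # 8 random lowercase letters
print(length_needed(80, 26))            # lowercase only, 80-bit target
print(length_needed(80, 26 + 26 + 10))  # upper, lower, digits, 80-bit target
```

For human-chosen passwords the real strength is lower than this formula suggests, because the characters are not independent or uniformly distributed.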

26 WHY YOU WANT LONG, RANDOM PASSWORDS

$S = L \cdot \log_2 N$

[Figure: password strength in bits (0 to 200, from Very Weak to Very Strong) versus password length in characters (0 to 35), with one curve each for alphabets of 26, 26+26, and 26+26+10 characters]

27 EVEN WITH GOOD PASSWORDS AND STRONG STORAGE, PEOPLE ARE A WEAK LINK

▪ Most people have difficulty remembering >4 passwords ▪ Thus, people tend to reuse passwords across services ▪ What happens if any one of these services is compromised?

▪ Service-specific passwords are a beneficial form of compartmentalization ▪ Limits the damage when one service is inevitably breached

▪ Some service providers now check for password reuse ▪ Forbid users from selecting passwords that have appeared in leaks

28 PASSWORDS ARE FORGOTTEN/LOST, WHICH MEANS WE NEED TO RESET THEM

▪ Problem: hashed passwords cannot be recovered (hopefully)

“Hi… I forgot my password. Can you email me a copy? Kthxbye”

• This is why systems typically implement password reset – Use out-of-band info to authenticate the user – Overwrite hash(old_pw) with hash(new_pw) • Be careful: it is possible to crack password reset

29 LONG, WEIRD SECURE PASSWORDS AND USER CENTERED DESIGN

://imgs.xkcd.com/comics/password_strength.png

30 PUBLIC SERVICE ANNOUNCEMENT: HAVE YOUR CREDS BEEN COMPROMISED?

▪ Check: ';--have i been pwned? ▪ User/e-mail ▪ Services ▪ Common passwords

▪ CHANGE YOUR CREDS!

31 SECURING APPS

32 NEARLY EVERYTHING IS NOW A WEB-APP

▪ The Web has become a powerful platform for developing and distributing applications ▪ Huge user population ▪ Relatively easy to develop and deploy cross-platform

▪ Platform has evolved significantly ▪ Very simple model initially ▪ Today, it is a mix of client- and server-side components ▪ Web services and APIs are now ubiquitous

▪ Geared towards an open model ▪ On the client-side, all documents, data, and code are visible/modifiable ▪ Commonplace to link to or share data/code with other (untrusted) sites

33 TYPICAL WEB ARCHITECTURE POST-2015

Client Side: Network Protocols, HTML Parser, CSS Parser, Document Model, JS Runtime, Renderer, Storage, Cookies

Protocols: FTP, HTTP 1.0/1.1, HTTP 2.0, SSL and TLS, Websocket

Server Side: Network Protocols, Application Code (Java, PHP, Python, Node, etc), Database

34 WEB APPS ARE VULNERABLE

Web apps can be hacked →Data breaches and theft →Site Defacement →Invasion of privacy →Malware distribution →Beachhead into internal networks

35 BREAKING DOWN THE WEB: HTTP PROTOCOL IS TEXT BASED AND STATELESS

▪ Hypertext Transfer Protocol ▪ Intended for downloading HTML documents ▪ Can be generalized to download any kind of file ▪ HTTP message format ▪ Text based protocol, typically over TCP ▪ Stateless ▪ Requests and responses must have a header, body is optional ▪ Headers includes key: value pairs ▪ Body typically contains a file (GET) or user data (POST)
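Because HTTP is text based, a request can be built and taken apart with plain string handling. The sketch below is a simplified illustration (no chunked bodies, repeated headers, or error handling); the host name and form fields are made-up examples, and parse_request is a hypothetical helper.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.x request into (method, path, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")      # blank line ends the header
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)  # request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")        # key: value pairs
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


form_body = "user=cbw&passwd=p4ssW0rd"
raw_request = (
    "POST /login HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(form_body)}\r\n"
    "\r\n"
    + form_body
)

method, path, version, headers, body = parse_request(raw_request)
print(method, path, version)
print(headers)
print(body)
```

Nothing in the message ties it to an earlier request, which is what "stateless" means here.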

36 HTTP METHODS

Verb Description

GET Retrieve resource at a given path

HEAD Identical to a GET, but response omits body

POST Submit data to a given path, might create resources as new paths

PUT Submit data to a given path, creating the resource if it does not exist or modifying the existing resource at that path

DELETE Deletes resource at a given path

TRACE Echoes request

OPTIONS Returns supported HTTP methods given a path

CONNECT Creates a tunnel to a given network location

37 PROVIDING STATE: COOKIES

▪ Cookies are a basic mechanism for persistent state ▪ Allows services to store a small amount of data at the client (usually ~4K) ▪ Often used for identification, authentication, user tracking

▪ Attributes ▪ Domain and path restrict the resources the browser will send cookies to ▪ Expiration sets how long the cookie is valid ▪ Additional security restrictions (added much later): HttpOnly, Secure

▪ Originally introduced in 1994 in Mosaic Netscape 0.9beta ▪ Originally developed for MCI (now Verizon) for an e-commerce application ▪ Wanted an improved mechanism for access control vs. HTTP Authentication

▪ Cookies allow the server to set some state on the client ▪ State is reflected back to the server in every HTTP request

38 SESSION COOKIE EXAMPLE

1. Client submits login credentials 2. App validates credentials 3. App generates and stores a cryptographically secure session identifier ▪ e.g., Hashed, encoded nonce 4. App uses Set-Cookie to set session ID 5. Client sends session ID as part of subsequent requests using Cookie 6. Session dropped by cookie expiration or removal of server-side session record
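Steps 3 to 6 above can be sketched with an in-memory session table. This is a minimal illustration under stated assumptions: credential checking (steps 1 and 2) is taken as already done, the 30-minute lifetime is arbitrary, the header parsing handles only a single cookie, and all function names are hypothetical.

```python
import secrets
import time

SESSIONS = {}           # session_id -> (username, expiry timestamp)
SESSION_TTL = 30 * 60   # seconds; an arbitrary illustrative lifetime


def create_session(username):
    """Step 3 and 4: generate an unpredictable ID and emit a Set-Cookie header."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = (username, time.time() + SESSION_TTL)
    return "Set-Cookie: session={}; HttpOnly; Secure; Path=/".format(session_id)


def session_id_from(cookie_header):
    """Extract the ID from a 'Cookie: session=...' header (single cookie only)."""
    return cookie_header.partition("session=")[2].strip()


def lookup_session(cookie_header):
    """Step 5: map the cookie back to a user, or None if unknown or expired."""
    session_id = session_id_from(cookie_header)
    record = SESSIONS.get(session_id)
    if record is None or record[1] < time.time():
        SESSIONS.pop(session_id, None)
        return None
    return record[0]


def logout(cookie_header):
    """Step 6: drop the server-side record."""
    SESSIONS.pop(session_id_from(cookie_header), None)


set_cookie = create_session("cbw")
sid = set_cookie.split("session=")[1].split(";")[0]
request_header = "Cookie: session=" + sid
print(lookup_session(request_header))
logout(request_header)
print(lookup_session(request_header))
```

The cookie carries only a random identifier; everything the server believes about the user stays on the server, so a forged or guessed cookie value maps to nothing.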

nonce is an arbitrary number that may only be used once. See nonce word for origin.

39 SESSION COOKIES

Advantages
1. Flexible – authentication delegated to app layer
2. Support for logout
3. Large number of ready-made session management frameworks

Disadvantages
1. Flexible – authentication delegated to app layer
2. Session security depends on secrecy, unpredictability, and tamper-evidence of cookie

40 SOP: SAME ORIGIN POLICY

▪ Same-Origin Policy (SOP) states that subjects from one origin cannot access objects from another origin

▪ SOP is the basis of classic web security ▪ Some exceptions to this policy, as we will see ▪ SOP has been relaxed over time to make controlled sharing easier

▪ For cookies: ▪ Domains are the origins ▪ Cookies are the subjects

41 JAVASCRIPT – FOR ANYTHING YOU WANT DONE DYNAMICALLY CLIENT-SIDE

▪ Language for browser scripting ▪ Developed by Brendan Eich for Netscape in 10 days ▪ Now used server-side, embedded in documents like PDFs, etc.

▪ Features ▪ Dynamic, weakly-typed ▪ Prototype-based inheritance ▪ First-class functions

▪ Many frameworks for abstracting the DOM (jQuery), templating and view management (angular), visualization (d3js)

42 JAVASCRIPT EXAMPLES

▪ Inline

▪ Embedded

▪ External

43 WHAT JAVASCRIPT LOOKS LIKE

var n = 1;
var s = 'what';
var fn = function(x, y) { return x + y; }
var arr = ['foo', 'bar', 0];
var obj = { msg: s, op: fn, };

var fn = function(msg) { // ...
};
addEventListener('click', fn, false);

44 JSON (DATA STRUCTURE)

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

Slide labels: object, pair, array, members

45 EXAMPLE OF USING JAVASCRIPT AND WHAT IT MEANS FOR SECURITY

This is my page.

This is my page.

Can JS from google.com read password?

Can JS in the main context do the following: document.getElementById(‘goog’).cookie?

46 SAME ORIGIN POLICY APPLIES HERE TOO

▪ Origin = <protocol, hostname, port>

▪ The Same-Origin Policy (SOP) states that subjects from one origin cannot access objects from another origin ▪ This applies to JavaScript ▪ JS from origin D cannot access objects from origin D’ ▪ E.g. the iframe example ▪ However, JS included in D can access all objects in D

• E.g.
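The origin comparison behind the Same-Origin Policy can be sketched as follows, taking an origin to be the (scheme, host, port) triple. The URLs are made-up examples and the helper names are illustrative; real browsers apply additional rules that this sketch ignores.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


print(same_origin("http://www.example.com/a", "http://www.example.com:80/b"))
print(same_origin("http://www.example.com/", "https://www.example.com/"))
print(same_origin("http://www.example.com/", "http://evil.example.net/"))
```

Paths do not matter: two pages on the same scheme, host, and port share an origin even if they live in different directories.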

47 ATTACKING WEB CLIENTS

48 YOUR BROWSER STORES A LOT OF SENSITIVE INFORMATION

1. Your browsing history 2. Saved usernames and passwords 3. Saved forms (e.g. credit card numbers) 4. Cookies (especially session cookies)

▪ Browsers try their hardest to secure this information ▪ i.e. prevent an attacker from stealing this information

However, nobody is perfect ;)

49 THE CLIENT IS THE WEAK-LINK

▪ Attacker’s goal: ▪ Steal information from your browser (e.g. your session cookie for bofa.com)

▪ Browser’s goal: isolate code from different origins ▪ Don’t allow the attacker to exfiltrate private information from your browser

▪ Attacker’s capability: trick you into clicking a link ▪ May direct to a site controlled by the attacker ▪ May direct to a legitimate site (but in a nefarious way…)

50 CROSS-SITE SCRIPTING (XSS)

▪ XSS means running code from an untrusted origin ▪ Usually a result of a document integrity violation

▪ Documents are compositions of trusted, developer-specified objects and untrusted input ▪ Allowing user input to be interpreted as document structure (i.e., elements) can lead to malicious code execution

▪ Typical hacker goals ▪ Steal authentication credentials (session IDs) ▪ Or, more targeted unauthorized actions

51 XSS TYPES (NOT EXHAUSTIVE)

1. Reflected ▪ Code is included as part of a malicious link ▪ Code is included in the page rendered by visiting the link ▪ So the injected script runs in the site's origin and can access cookies, etc. as if it were legitimate

2. Stored ▪ Attacker submits malicious code to server ▪ Server app persists malicious code to storage ▪ Victim accesses page that includes stored code ▪ E.g. the attacker submits the code through a form, the server stores it, and it runs when a victim later opens the page

52 VULNERABLE WEBSITE, TYPE 1

▪ Suppose we have a search site, www.websearch.com

http://www.websearch.com/search?q=Christo+Wilson

Web Search

Results for: Christo Wilson

Christo Wilson – Professor at Northeastern http://www.ccs.neu.edu/home/cbw/index.html

53 REFLECTED XSS ATTACK

http://www.websearch.com/search?q=

1) Send malicious link to the victim

websearch.com

Origin: www.websearch.com session=xI4f-Qs02fd evil.com

54 VULNERABLE WEBSITE, TYPE 2

▪ Suppose we have a social network, www.friendly.com

friendly

What’s going on?

I hope you like pop-tarts ;)

Update Status

55 VULNERABLE WEBSITE, TYPE 2

▪ Suppose we have a social network, www.friendly.com

friendly

Latest Status Updates

I hope you like pop-tarts ;) Monday, March 23, 2015

56 STORED XSS ATTACK

friendly.com

Origin: www.friendly.com session=xI4f-Qs02fd evil.com

57 MITIGATING XSS ATTACKS FROM THE SERVER SIDE

1. Validate ALL INPUT

2. FILTER ALL OUTPUT

There are protections on the client side, but these are outside our discussion.

58 VALIDATE ANY AND ALL INPUTS

x = request.args.get('msg')
if not is_valid_base64(x):
    abort(500)

▪ Goal is to check that application inputs are "valid" ▪ Request parameters, header data, posted data, etc.

▪ Assumption is that well-formed data should also not contain attacks ▪ Also relatively easy to identify all inputs to validate

▪ However, it's difficult to ensure that valid == safe ▪ Much can happen between input validation checks and document interpolation

▪ Assume the worst: the client is clueless or evil. Design for that.
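The slide's snippet calls is_valid_base64 without defining it. One possible implementation, offered as an assumption about what such a check could look like, uses the strict mode of Python's standard base64 module.

```python
import base64
import binascii


def is_valid_base64(value) -> bool:
    """Return True only if `value` is a string of well-formed base64."""
    if not isinstance(value, str):
        return False  # covers a missing parameter (None)
    try:
        # validate=True rejects any character outside the base64 alphabet.
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


print(is_valid_base64("aGVsbG8="))
print(is_valid_base64("<script>alert(1)</script>"))
print(is_valid_base64(None))
```

As the slide warns, passing this check says only that the input is well formed; whatever the decoded bytes contain still has to be treated as untrusted.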

59 OUTPUT SANITIZATION

{{sanitize(data)}}
Template run server side to insert dynamic content ▪ Another approach is to sanitize untrusted data during interpolation ▪ Remove or encode special characters like ‘<‘ and ‘>’, etc. ▪ Easier to achieve a strong guarantee that script can't be injected into a document ▪ But, it can be difficult to specify the sanitization policy (coverage, exceptions) ▪ Must take interpolation context into account ▪ CDATA, attributes, JavaScript, CSS ▪ Nesting! ▪ Requires a robust browser model
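Encoding special characters during interpolation might look like this sketch, which uses html.escape from Python's standard library. The render_greeting function and its template are made up for illustration, and this encoding fits only HTML text and quoted-attribute contexts, not JavaScript or CSS contexts.

```python
import html


def render_greeting(user_name: str) -> str:
    """Interpolate untrusted data into HTML text, encoding <, >, &, and quotes."""
    return "<p>Hi {}</p>".format(html.escape(user_name, quote=True))


print(render_greeting("Christo"))
print(render_greeting("<script>alert(document.cookie)</script>"))
```

The second call shows the payload rendered as inert text instead of being parsed as an element.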

60 CHALLENGES OF SANITIZING DATA

HTML Sanitization

User Info

Hi {{user.name}}

Attribute Sanitization

Was this sanitized by the server?

61 ATTACKING WEB SERVERS

62 THE STORY SO FAR…

▪ Thus far, we have looked at client-side attacks ▪ The attacker wants to steal private info from the client ▪ Attacker uses creative tricks to avoid SOP restrictions ▪ Web servers are equally nice targets for attackers ▪ Servers often have access to large amounts of privileged data ▪ E.g. personal information, medical histories, financial data, etc. ▪ Websites are useful platforms for launching attacks ▪ E.g. Redirects to drive-by installs, clickjacking, etc.

63 WEB ARCHITECTURE CIRCA-2015

Protocols: FTP, HTTP 1.0/1.1, HTTP 2.0, SSL and TLS, Websocket

Server Side: Network Protocols, Application Code (Java, PHP, Python, Node, etc), CGI Scripts, Database

64 MODEL-LAYER VULNERABILITIES

▪ Web apps typically require a persistent store, often a relational database (increasingly not) ▪ Structured Query Language (SQL) is a popular interface to relational databases

65 SQL

SELECT user, passwd, admin FROM users;
INSERT INTO users(user) VALUES('admin');
UPDATE users SET passwd='...' WHERE user='admin';
DELETE FROM users WHERE user='admin';

▪ Relatively simple declarative language for defining relational data and operations over that data ▪ Note again a natural distinction between structure and content ▪ Common operations: ▪ SELECT retrieves data from the store ▪ INSERT adds data to the store ▪ UPDATE modifies data in the store ▪ DELETE removes data from the store

66 SQL INJECTION

▪ SQL queries often involve untrusted data ▪ App is often responsible for interpolating user data into queries ▪ Insufficient sanitization could lead to modification of query semantics ▪ Confidentiality – modify queries to return unauthorized data ▪ Integrity – modify queries to perform unauthorized updates

67 SQL INJECTION EXAMPLES

Original query: "SELECT name, description FROM items WHERE id='" + req.args.get('id', '') + "'"

Result after injection: SELECT name, description FROM items WHERE id='12' UNION SELECT username, passwd FROM users;--';

Original query: "UPDATE users SET passwd='" + req.args.get('pw', '') + "' WHERE user='" + req.args.get('user', '') + "'"

Result after injection: UPDATE users SET passwd='...' WHERE user='dude' OR 1=1;--';

▪ Similarly to XSS, problem often arises when delimiters are unfiltered

68 SQL INJECTION EXAMPLES

Original query: SELECT * FROM users WHERE id=$user_id;

Result after injection: SELECT * FROM users WHERE id=1 UNION SELECT ... --;

▪ Vulnerabilities also arise from improper validation ▪ e.g., failing to enforce that numbers are valid
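The UNION-based injection from the examples above can be reproduced safely against an in-memory SQLite database, next to a parameterized query that does not have the problem. The table contents and function names are made up for illustration, and the payload drops the trailing semicolon from the slide's version so that it stays a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id TEXT, name TEXT, description TEXT);
    CREATE TABLE users (username TEXT, passwd TEXT);
    INSERT INTO items VALUES ('12', 'widget', 'a small widget');
    INSERT INTO users VALUES ('admin', 'hunter2');
""")


def find_item_unsafe(item_id: str):
    """Vulnerable: user data is concatenated into the query text."""
    query = "SELECT name, description FROM items WHERE id='" + item_id + "'"
    return conn.execute(query).fetchall()


def find_item_safe(item_id: str):
    """Parameterized: the database treats item_id purely as a value."""
    query = "SELECT name, description FROM items WHERE id=?"
    return conn.execute(query, (item_id,)).fetchall()


payload = "12' UNION SELECT username, passwd FROM users --"
print(find_item_unsafe(payload))  # leaks rows from the users table
print(find_item_safe(payload))    # no item has that literal id
print(find_item_safe("12"))
```

With the placeholder, the quote character in the payload never reaches the SQL parser as syntax, so the structure of the query cannot change.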

69 BLIND SQL INJECTION

▪ Basic SQL injection requires knowledge of the schema ▪ e.g., knowing which table contains user data, and the structure of that table ▪ Blind SQL injection leverages information leakage ▪ Used to recover schemas, execute queries ▪ Requires some observable indicator of query success or failure ▪ e.g., a blank page (success/true) vs. an error page (failure/false) ▪ Leakage performed bit-by-bit

70 BLIND SQL INJECTION

▪ Given the ability to execute queries and an oracle, extracting information is then a matter of automated requests 1. "Is the first bit of the first table's name 0 or 1?" 2. "Is the second bit of the first table's name 0 or 1?" 3. ...
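The bit-by-bit extraction loop can be simulated without a database. In this sketch the oracle function stands in for "did the page load or show an error?"; the secret value, the 8-bit character assumption, and the function names are all illustrative.

```python
SECRET_TABLE_NAME = "users"  # unknown to the attacker in a real attack


def oracle(char_index: int, bit_index: int) -> bool:
    """Simulated true/false signal: is this bit of this character set?"""
    if char_index >= len(SECRET_TABLE_NAME):
        return False
    return bool((ord(SECRET_TABLE_NAME[char_index]) >> bit_index) & 1)


def extract(max_chars: int = 32) -> str:
    """Recover the hidden string one bit (one request) at a time."""
    recovered = []
    for i in range(max_chars):
        code = 0
        for bit in range(8):
            if oracle(i, bit):       # one automated request per bit
                code |= 1 << bit
        if code == 0:                # all bits zero: past the end of the string
            break
        recovered.append(chr(code))
    return "".join(recovered)


print(extract())
```

Each character costs eight yes/no questions, so even a long schema is only a few thousand automated requests away.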

71 SQL INJECTION DEFENSES

SELECT * FROM users WHERE user='{{sanitize($id)}}';

▪ Sanitization ▪ Prepared statements ▪ Trust the database to interpolate user data into queries correctly ▪ Object-relational mappings (ORM) ▪ Libraries that abstract away writing SQL statements ▪ Java – Hibernate ▪ Python – SQLAlchemy, Django, SQLObject ▪ Ruby – Rails, Sequel ▪ Node.js – Sequelize, ORM2, Bookshelf ▪ Domain-specific languages ▪ LINQ (C#), Slick (Scala), ...

72 WHAT ABOUT NOSQL?

▪ SQL databases have fallen out of favor versus NoSQL databases like MongoDB and Redis ▪ Are NoSQL databases vulnerable to injection? ▪ YES. ▪ All untrusted input should always be validated and sanitized ▪ Even with ORM and NoSQL

73 COMMON GATEWAY INTERFACE (CGI)

▪ CGI was the original means of presenting dynamic content to users ▪ Server-side generation of content in response to parameters ▪ Well-defined interface between HTTP input, scripts, HTTP output ▪ Scripts traditionally reside in /cgi-bin ▪ Many improved standards exist (FastCGI, WSGI) ▪ Often, these CGI scripts invoke other programs using untrusted input

74 CGI SHELL INJECTION

Expected input: [email protected]
Malicious input: [email protected]; nc -l 1337 -e /bin/sh; cat

@app.route('/email')
def email_message():
    email = req.args.get('email', '')
    msg = req.args.get('msg', '')
    cmd = 'sendmail -f {0} [email protected]'.format(email)
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, shell=True)
    # ...
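One way to avoid the injection shown above is to skip the shell entirely and pass the arguments as a list, with a conservative check on the address first. This is a sketch under stated assumptions: the recipient address is a placeholder (the slide's address is not recoverable), the regular expression is a deliberately strict illustration rather than a full e-mail grammar, and the helper names are hypothetical.

```python
import re
import subprocess

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
RECIPIENT = "admin@example.com"  # placeholder recipient for illustration


def build_sendmail_argv(sender: str):
    """Validate the sender and return an argument list (no shell involved)."""
    if not EMAIL_RE.fullmatch(sender):
        raise ValueError("invalid sender address")
    return ["sendmail", "-f", sender, RECIPIENT]


def send(sender: str, msg: str) -> None:
    """Run sendmail with shell=False (the default), feeding msg on stdin."""
    argv = build_sendmail_argv(sender)
    subprocess.run(argv, input=msg.encode("utf-8"), check=True)


print(build_sendmail_argv("cbw@example.com"))
try:
    build_sendmail_argv("cbw@example.com; nc -l 1337 -e /bin/sh; cat")
except ValueError as err:
    print("rejected:", err)
```

Even without the validation step, the list form means a semicolon in the input is just a character inside one argument, not a command separator.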

▪ Shell injection still prevalent on the Web today

75 UNRESTRICTED UPLOADS

▪ Analogous to command injection, apps are often vulnerable to unrestricted uploads ▪ i.e., file injection ▪ One obvious attack is to upload a malicious CGI script ▪ Can trick users into visiting the script ▪ Or, attack the site ▪ Many other possibilities ▪ Upload malicious images that attack image processing code ▪ DoS via upload of massive files ▪ Overwrite critical files

76 GENERAL TIPS

▪ Never trust user input ▪ Always sanitize!! ▪ Heartbleed

▪ Secure software is hard, prone to mistakes ▪ Don’t try to re-implement crypto ▪ Implement multiple layers of security ▪ Principle of least privilege

▪ Assume hacks will succeed ▪ Oracle: 80% of data loss from internal sources ▪ Audit logs

77 XKCD

78 BACKUP

79 HOW THE WEB WORKS

81 WEB APPS ARE VULNERABLE

As the popularity of the Web has grown, attackers have shifted their focus towards it Web apps often possess large degrees of authority ▪ Access to sensitive data on the client- and server-side

Web apps can be exploited to abuse this authority ▪ Data breaches and theft ▪ Site Defacement ▪ Invasion of privacy ▪ Malware distribution ▪ Beachhead into internal networks

82 TIMELINE

▪ 1991: First version of Hypertext Markup Language (HTML) released by Sir Tim Berners-Lee ▪ Markup language for displaying documents ▪ Contained 18 tags, including anchor (<a>), a.k.a. a hyperlink

▪ 1991: First version of Hypertext Transfer Protocol (HTTP) is published ▪ Berners-Lee’s original protocol only included GET requests for HTML ▪ Today's HTTP is more general, with many request methods (e.g. PUT) and document types

83 1992:

84 WEB ARCHITECTURE CIRCA-1992

Client Side: Network Protocols, HTML Parser, Document Renderer

Protocols: Gopher, FTP, HTTP

Server Side: Network Protocols, HTML

85 HTML

▪ Hypertext Markup Language ▪ HTML 2.0 → 3.2 → 4.0 → 4.01 → XHTML 1.1 → XHTML 2.0 → HTML 5

▪ Syntax ▪ Hierarchical tags (elements), originally based on SGML

▪ Structure ▪ <head> contains metadata ▪ <body> contains content

86 HTML HTML may embed other resources from the same origin Hello World … or from other origins

Hello World

(cross origin

embedding) I am 12 and what is this?

87 HTTP PROTOCOL

▪ Hypertext Transfer Protocol ▪ Intended for downloading HTML documents ▪ Can be generalized to download any kind of file ▪ HTTP message format ▪ Text based protocol, typically over TCP ▪ Stateless ▪ Requests and responses must have a header, body is optional ▪ Headers includes key: value pairs ▪ Body typically contains a file (GET) or user data (POST) ▪ Various versions ▪ 0.9 and 1.0 are outdated, 1.1 is most common, 2.0 has just been ratified

88 HTTP MESSAGES

90 HTTP AUTHENTICATION

▪ Access control mechanism built into HTTP itself

▪ Server indicates that authentication is required in HTTP response

▪ WWW-Authenticate: Basic realm="$realmID"

▪ Client submits base64-encoded username and password in the clear

▪ Authorization: Basic BASE64($user:$passwd)

▪ HTTP is stateless, so this must be sent with every request

▪ No real logout mechanism

▪ Digest variant uses hash construction (usually MD5)

▪ Is HTTP Authentication a good idea?
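The Authorization: Basic header described above is simple to build and just as simple to reverse, which helps answer the slide's closing question. The credentials below are made-up examples and the helper names are illustrative.

```python
import base64


def basic_auth_header(user: str, passwd: str) -> str:
    """Build the header: Authorization: Basic BASE64(user:passwd)."""
    token = base64.b64encode(f"{user}:{passwd}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic(header: str):
    """Recover (user, passwd) from the header; base64 is an encoding, not encryption."""
    token = header.split("Basic ", 1)[1]
    user, _, passwd = base64.b64decode(token).decode("utf-8").partition(":")
    return user, passwd


header = basic_auth_header("cbw", "p4ssW0rd")
print(header)
print(decode_basic(header))
```

Anyone who can observe the request can run the decoding step, so without TLS the password travels effectively in the clear on every request.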

91 TIMELINE

▪ 1994: Cookies introduced in Mosaic Netscape 0.9beta

▪ Originally developed at the behest of MCI for an e-commerce application ▪ Wanted an improved mechanism for access control vs. HTTP Authentication

▪ Cookies allow the server to set some state on the client ▪ State is reflected back to the server in every HTTP request

92 COOKIES

▪ Cookies are a basic mechanism for persistent state ▪ Allows services to store a small amount of data at the client (usually ~4K) ▪ Often used for identification, authentication, user tracking

▪ Attributes ▪ Domain and path restricts resources browser will send cookies to ▪ Expiration sets how long cookie is valid ▪ Additional security restrictions (added much later): HttpOnly, Secure

▪ Manipulated by Set-Cookie and Cookie headers

93 COOKIE EXAMPLE Client Side Server Side

GET /login_form.html HTTP/1.0

HTTP/1.0 200 OK

POST /cgi/login.sh HTTP/1.0

HTTP/1.0 302 Found
Set-Cookie: logged_in=1;

GET /private_data.html HTTP/1.0
Cookie: logged_in=1;

This is a terrible, insecure way to implement login cookies

94 SESSION COOKIE EXAMPLE

1. Client submits login credentials 2. App validates credentials 3. App generates and stores a cryptographically secure session identifier ▪ e.g., Hashed, encoded nonce ▪ e.g., HMAC(session_id) 4. App uses Set-Cookie to set session ID 5. Client sends session ID as part of subsequent requests using Cookie 6. Session dropped by cookie expiration or removal of server-side session record

95 SESSION COOKIES

Advantages
1. Flexible – authentication delegated to app layer (vs. HTTP Authentication)
2. Support for logout
3. Large number of ready-made session management frameworks

Disadvantages
1. Flexible – authentication delegated to app layer
2. Session security depends on secrecy, unpredictability, and tamper-evidence of cookie

96 MANAGING STATE

▪ Each origin may set cookies ▪ Objects from embedded resources may also set cookies

▪ When the browser sends an HTTP request to origin D, which cookies are included? ▪ Only cookies for origin D that obey the specific path constraints

97 SAME ORIGIN POLICY

▪ The Same-Origin Policy (SOP) states that subjects from one origin cannot access objects from another origin ▪ SOP is the basis of classic web security ▪ Some exceptions to this policy, as we will see ▪ SOP has been relaxed over time to make controlled sharing easier

▪ In the case of cookies ▪ Domains are the origins ▪ Cookies are the subjects

98 TIMELINE

▪ 1995: JavaScript introduced with Netscape Navigator 2.0 ▪ Netscape allowed Java plugins to be embedded in webpages ▪ Designed to be a lightweight alternative to Java for beginners ▪ No relationship to Java, other than the name

▪ 1996: Microsoft introduces JScript and VBScript with IE 3.0 ▪ JScript was similar, but not identical to, JavaScript (embrace, extend, extinguish)

99 JAVASCRIPT

▪ Language for browser scripting ▪ Developed by Brendan Eich for Netscape in 10 days ▪ Now used server-side, embedded in documents like PDFs, etc.

▪ Features ▪ Dynamic, weakly-typed ▪ Prototype-based inheritance ▪ First-class functions

▪ Many frameworks for abstracting the DOM (jQuery), templating and view management (angular), visualization (d3js)

100 JAVASCRIPT

▪ Inline ▪

▪ Embedded ▪

▪ External ▪

101 JAVASCRIPT

var n = 1;
var s = 'what';
var fn = function(x, y) {
  return x + y;
}
var arr = ['foo', 'bar', 0];
var obj = { msg: s, op: fn, };

var fn = function(msg) {
  // ...
};
addEventListener('click', fn, false);

102

The Document Object Model (DOM) provides an API for accessing browser state and frame contents ▪ Accessible via JavaScript

1. Browser state ▪ Document, windows, frames, history, location, navigator (browser type and version) 2. Document ▪ Properties – e.g., links, forms, anchors ▪ Methods to add, remove, modify elements ▪ Ability to attach listeners to objects for events (e.g. click, mouse over, etc.)

103 JAVASCRIPT AND THE DOM

window.location = 'http://google.com/';

document.write('');

var ps = document.getElementsByTagName('p');

var es = document.getElementById('msg'); es = es.firstChild; es.innerHTML = 'A new link to Google';

alert('My cookies are: ' + document.cookie);

104 SAME ORIGIN POLICY, REVISITED

▪ JavaScript enables dynamic inclusion of objects

document.write('');

▪ A webpage may include objects and code from multiple domains ▪ Should JavaScript from one domain be able to access objects in other domains?

105 MIXING ORIGINS

This is my page.

This is my page.

Can JS from google.com read password?

Can JS in the main context do the following: document.getElementById('goog').cookie?

106 SAME ORIGIN POLICY

▪ Origin = (protocol, hostname, port)

▪ The Same-Origin Policy (SOP) states that subjects from one origin cannot access objects from another origin ▪ This applies to JavaScript ▪ JS from origin D cannot access objects from origin D’ ▪ E.g. the iframe example ▪ However, JS included in D can access all objects in D

• E.g.
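A same-origin comparison can be sketched in a few lines. This assumes the usual browser definition for scripts, where an origin is the (scheme, host, port) triple; the helper names `origin` and `same_origin` and the example URLs are illustrative only.

```python
from urllib.parse import urlsplit

# Ports implied by the scheme when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """True when both URLs share scheme, host, and port."""
    return origin(url_a) == origin(url_b)
```

Under this definition, `http://example.com/` and `https://example.com/` are different origins, as are `https://example.com/` and `https://www.example.com/`.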

107 TIMELINE

▪ 1996: Netscape releases Secure Sockets Layer version 3 (SSLv3) ▪ Attributed to famous cryptographer Taher Elgamal ▪ SSLv1 was never publicly released, and SSLv2 had serious security problems

▪ 1996: W3C releases the spec for Cascading Style Sheets (CSS1) ▪ First proposed by Håkon Wium Lie, now at ▪ Allows developers to separate content and markup from display attributes ▪ First implemented in IE 3, no browser was fully compatible until IE 5 in 2000

108 CSS

▪ Cascading stylesheets ▪ Language for styling HTML ▪ Decoupled from content and structure ▪ Selectors ▪ Match styles against DOM elements (id, class, positioning in tree, etc.) ▪ Directives ▪ Set style properties on elements

109 CSS

1. Inline ▪

2. Embedded

3. External ▪

110 CSS

body { font-family: sans-serif; }

#content { width: 75%; margin: 0 auto; }

a#logo { background-image: url(//img/logo.png); }

.button { // ... }

p > span#icon { background-image: url(':...'); }

Beware: some browsers allow JS inside CSS

111 WEB ARCHITECTURE CIRCA-1992

▪ Client Side: Network Protocols, HTML Parser, Document Renderer
▪ Protocols: Gopher, FTP, HTTP
▪ Content: HTML
▪ Server Side: Network Protocols

112 WEB ARCHITECTURE CIRCA-2015

▪ Client Side: Network Protocols, HTML Parser, Document Model, JS Runtime, Renderer, CSS Parser, Storage, Cookies
▪ Protocols: FTP, HTTP 1.0/1.1, HTTP 2.0, SSL and TLS, Websocket
▪ Content: HTML, JS, CSS
▪ Server Side: Network Protocols, Database, Application Code (Java, PHP, Python, Node, etc.)

113 TIMELINE

▪ 1999: Microsoft enables access to IXMLHttpRequest ActiveX plugin in IE 5 ▪ Allows JavaScript to programmatically issue HTTP requests ▪ Adopted as closely as possible by Netscape’s Gecko engine in 2000 ▪ Eventually led to AJAX, REST, and other crazy Web-dev buzzwords

114 XMLHTTPREQUEST (XHR)

▪ API for asynchronous network requests in JavaScript ▪ Browser-specific API (still to this day) ▪ Often abstracted via a library (jQuery) ▪ SOP restrictions apply (with exceptions; more on this later) ▪ Typical workflow ▪ Handle client-side event (e.g. button click) ▪ Invoke XHR to server ▪ Load data from server (HTML, XML, JSON) ▪ Update DOM

115 XHR EXAMPLE

116 XHR VS. SOP

▪ Legal: requests for objects from the same origin
$.get('server.php?var=' + my_val);
▪ Illegal: requests for objects from other origins ▪ Why not?
$.get('https://facebook.com/');

▪ Mitigations ▪ JSONP (old-school, unsafe hack) ▪ XDR and CORS (modern techniques) ▪ More on these later…

117 SENDING DATA OVER HTTP

▪ Four ways to send data to the server ▪ Embedded in the URL (typically URL encoded, but not always) ▪ In cookies (cookie encoded) ▪ Inside a custom HTTP request header ▪ In the HTTP request body (form-encoded)

POST /purchase.html?user=cbw&item=iPad&price=399.99#shopping_cart HTTP/1.1    (1)
… other headers…
Cookie: user=cbw; item=iPad; price=399.99;    (2)
X-My-Header: cbw/iPad/399.99    (3)

user=cbw&item=iPad&price=399.99    (4)
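The four locations can be seen side by side in a request object built with Python's standard library. This is only a sketch: the host `shop.example` is a made-up placeholder, and the request is constructed but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"user": "cbw", "item": "iPad", "price": "399.99"}

req = Request(
    # 1. Embedded in the URL (query string, URL encoded)
    url="http://shop.example/purchase.html?" + urlencode(fields),
    # 4. In the HTTP request body (form encoded)
    data=urlencode(fields).encode("ascii"),
    headers={
        # 2. In cookies
        "Cookie": "; ".join(f"{k}={v}" for k, v in fields.items()),
        # 3. Inside a custom HTTP request header
        "X-My-Header": "/".join(fields.values()),
        "Content-Type": "application/x-www-form-urlencoded",
    },
    method="POST",
)
```

All four carry the same three values, so a server that trusts any one of them without validation can be fed attacker-chosen data through that channel.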

118 TRANSPORT LAYER SECURITY

119 SSL/TLS

▪ Protocol for confidentiality, integrity, and authentication between clients and servers ▪ Introduced by Netscape in 1995 as the Secure Sockets Layer (SSL) ▪ Designed to encapsulate HTTP, hence HTTPS ▪ Transport Layer Security (TLS) is the upgraded standard ▪ Defined in an RFC in 1999 (RFC 2246) ▪ Supersedes SSL: SSL is known to be insecure and should not be used ▪ Sits between transport and application layers ▪ Thus, applications must be TLS-aware ▪ The server must have an asymmetric keypair; a client keypair is optional ▪ X.509 certificates contain signed public keys ▪ PKI rooted in trusted (?) Certificate Authorities (CAs)

120 GOALS OF TLS

• Confidentiality and integrity: use BofA’s public key to negotiate a session key; encrypt all traffic
• Authentication: BofA’s cert can be validated by checking Verisign’s signature

[Figure: the client’s trusted key store holds Verisign’s certificate; BofA’s certificate contains BofA’s public key and is signed by Verisign]

121 LET’S TALK ABOUT CERTIFICATES

▪ Suppose you start a new website and you want TLS encryption ▪ You need a certificate. How do you get one? ▪ Option 1: generate a certificate yourself ▪ Use openssl to generate a new asymmetric keypair ▪ Use openssl to generate a certificate that includes your new public key ▪ Problem? ▪ Your new cert is self-signed, i.e. not signed by a trusted CA ▪ Browsers cannot authenticate your cert to a trusted root CA ▪ Users will be shown a scary security warning when they visit your site ▪ Option 2: ▪ Pay a well-known CA to sign your certificate ▪ Any browser that trusts the CA will also trust your new cert ▪ Option 3: ▪ Let’s Encrypt https://letsencrypt.org ▪ Free, automated, and open Certificate Authority

122 TLS TIPS

▪ Software configuration (e.g. Apache, nginx) is tough and can lead to (exploitable) mistakes ▪ Prefer strong ciphers, disable old protocols, … ▪ Check Qualys:
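The same advice applies to application code that opens TLS connections. As a small sketch using Python's standard `ssl` module (client side, not an Apache or nginx configuration), old protocol versions can be disabled while keeping certificate checks on:

```python
import ssl

# Client-side context: certificate and hostname verification are enabled by default.
ctx = ssl.create_default_context()

# Refuse SSLv3, TLS 1.0, and TLS 1.1.
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
```

Choosing TLS 1.2 as the floor is an assumption for this example; the right minimum depends on which clients and servers must be supported.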

123 CONFUSED DEPUTIES

▪ Original example due to Hardy in 1988
1. Compiler is installed in SYSX directory
2. Compiler has permissions to write into SYSX directory, also allows users to pass a path for debug output
3. Billing information for company stored at SYSX/BILL
4. Malicious user gives SYSX/BILL as the debug path
5. Compiler happily overwrites the company's billing records
▪ In this example, the compiler is a confused deputy ▪ Tricked into performing a malicious action that it had permissions to perform, but should not have ▪ Classic argument for capabilities

124 SESSION FIXATION

▪ There are many ways to get session cookies wrong, but here's one example ▪ Session fixation allows an attacker to fix a victim's session ID to a known value ▪ Attacker tricks victim into clicking on a link containing an attacker-supplied session ID ▪ Attack requires IDs to be accepted from request parameters ▪ Web app must accept IDs it did not itself generate ▪ Furthermore, session IDs in request parameters can leak via Referer header
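One common countermeasure is to never adopt a client-supplied ID and to issue a fresh one at login. The sketch below assumes an in-memory `SESSIONS` dict and a hypothetical `login` helper; credential checking is omitted.

```python
import secrets

SESSIONS = {}  # hypothetical in-memory store: session_id -> username


def login(username, presented_session_id=None):
    """Return a fresh session ID after a successful login (auth check omitted)."""
    # Never adopt an ID supplied by the client: discard it if we know it...
    SESSIONS.pop(presented_session_id, None)
    # ...and always issue a new, server-generated one.
    new_id = secrets.token_urlsafe(32)
    SESSIONS[new_id] = username
    return new_id
```

With this approach, an ID planted by the attacker before login never becomes an authenticated session.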

125 CLICKJACKING

▪ Clickjacking, or UI redressing, is an attack on the browser user interface ▪ Attacker’s website loads a target site in an iframe ▪ Overlays an invisible UI on top of the legitimate UI ▪ User believes they're interacting with the visible layer ▪ But, they're really interacting with the invisible one

126 CLICKJACKING

▪ Attack exploits the lack of a trusted path to the user ▪ Websites may mix (and overlay) content from different origins ▪ User has no reliable indicator to tell what origin they're interacting with ▪ Attack has been used in many ways ▪ Tricking users into disabling Flash security settings ▪ Twitter worm propagation ▪ Facebook likes (likejacking)

127 MITIGATING CLICKJACKING

▪ Attack depends on hidden overlays, usually using frames ▪ Can defend using framebusting JS

if (top != self) top.location.replace(location);

▪ Or, use anti-framing security headers ▪ Heuristic defenses ▪ Look for invisible, high z-index frames (ClearClick) ▪ Creating trusted paths ▪ Prevent windows/frames from occluding each other (Gazelle research browser)

128 ANTI-FRAMING

▪ The ability to frame a site confers power to an adversary ▪ X-Frame-Options is a recent security header that specifies a site's desired framing policy ▪ Introduced in IE8 RC1 ▪ Browser enforces the policy specified by the HTTP server

X-Frame-Options: DENY | SAMEORIGIN | ALLOW-FROM origin
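On the server side, the header can be attached to every response in one place. The following is a rough sketch as WSGI middleware; the name `deny_framing` is made up for this example, and DENY is chosen only for illustration.

```python
def deny_framing(app):
    """WSGI middleware that adds X-Frame-Options: DENY to every response."""

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Replace any existing framing policy with DENY.
            headers = [h for h in headers if h[0].lower() != "x-frame-options"]
            headers.append(("X-Frame-Options", "DENY"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped
```

Wrapping an application as `app = deny_framing(app)` leaves its response bodies unchanged and only touches the header list.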

129 RESPONSE SPLITTING

@app.route('/oldurl')
def do_redirect():
    # ...
    url = request.args.get('u', '')
    resp.headers['Location'] = url
    return resp

▪ Response splitting is an attack against the integrity of responses issued by a server ▪ Similar to, but not the same as, XSS ▪ Simplest example is redirect splitting ▪ Apps vulnerable when they don't filter delimiters from untrusted inputs that appear in Location headers

130 WORKING EXAMPLE

131 RESPONSE SPLITTING EXAMPLE

@app.route('/oldurl')
def do_redirect():
    # ...
    url = request.args.get('u', '')
    resp.headers['Location'] = url
    return resp

Moral of this story: always sanitize data from the user!
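For the redirect case, sanitizing means refusing header delimiters before the value reaches the Location header. A minimal sketch, with `safe_location` as a hypothetical helper name:

```python
def safe_location(url):
    """Return url if it is safe to place in a Location header, else raise."""
    # CR and LF end a header line, so they must never come from user input.
    if "\r" in url or "\n" in url:
        raise ValueError("header delimiter in redirect target")
    return url
```

In the handler above, the assignment would become `resp.headers['Location'] = safe_location(url)`. This check only addresses splitting; deciding which redirect targets are acceptable is a separate question.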

132 CROSS-SITE REQUEST FORGERY (CSRF)

▪ CSRF is another of the basic web attacks ▪ Attacker tricks victim into accessing URL that performs an unauthorized action ▪ Avoids the need to read private state (e.g. document.cookie) ▪ Abuses the SOP ▪ All requests to origin D* will include D*’s cookies ▪ … even if some other origin D sends the request to D*

133 VULNERABLE WEBSITE

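[Screenshot: Bank of Washington website showing "Welcome, Christo", navigation links (Account, Transfer, Invest, Learn, Locations, Contact), and a "Transfer Money" form with "To:" and "Amount:" fields and a "Transfer" button]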

134

1) GET the login page
   Client: GET /login_form.html HTTP/1.1
   Server: HTTP/1.1 200 OK

2) POST username and password, receive a session cookie
   Client: POST /login.php HTTP/1.1
   Server: HTTP/1.1 302 Found
           Set-Cookie: session=3#4fH8d%dA1; HttpOnly; Secure;

3) GET the money transfer page
   Client: GET /money_xfer.html HTTP/1.1
           Cookie: session=3#4fH8d%dA1;
   Server: HTTP/1.1 200 OK

4) POST the money transfer request
   Client: POST /xfer.php HTTP/1.1
           Cookie: session=3#4fH8d%dA1;
   Server: HTTP/1.1 302 Found

135 CSRF ATTACK

▪ Assume that the victim is logged-in to www.bofw.com

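[Figure: CSRF attack flow between the victim's browser, evil.com, and Bank of Washington. Labeled steps: 1) Send malicious link; 2) GET; 3) HTTP/1.1 200 OK. The browser holds the cookie session=3#4fH8d%dA1 for origin www.bofw.com]

136 LOGIN CSRF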

▪ Login CSRF is a special form of the more general case ▪ CSRF on a login form to log victim in as the attacker ▪ Attacker can later see what the victim did in the account ▪ Search history ▪ Items viewed ▪ Etc.

137 MITIGATING CSRF ATTACKS

▪ Two main types of defenses 1. Referer checking ▪ Leverages the Referer HTTP header, which is automatically added to all requests by the browser ▪ Imperfect solution, may lead to false positives 2. CSRF tokens ▪ Random tokens inserted into all forms by the server ▪ POSTed responses must include the corresponding random token

138 HTTP “REFERER” HEADER

▪ HTTP Referer header ▪ Sent automatically by the browser to indicate the origin of a request ▪ Example: if you click a link from

Referer: https://google.com/search?q=whatever

▪ Three possibilities in the previous CSRF example ▪ Referer: https://www.bofw.com/ ▪ Referer: http://evil.com/ ▪ Referer: (absent or empty) ▪ Leveraging Referer to mitigate CSRF ▪ Strict: only accept POST if Referer is present and has the expected value ▪ Lenient: only accept POST if Referer is absent or has the expected value
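The strict and lenient policies differ only in how a missing header is treated. A minimal sketch, assuming the expected site is www.bofw.com over HTTPS (the HTTPS requirement and the name `referer_ok` are assumptions for this example):

```python
from urllib.parse import urlsplit

EXPECTED_HOST = "www.bofw.com"  # site from the running example


def referer_ok(referer, strict=True):
    """Decide whether to accept a POST based on its Referer header.

    referer is the header value, or None if the header was absent.
    """
    if not referer:
        # Strict rejects a missing Referer; lenient accepts it.
        return not strict
    parts = urlsplit(referer)
    # Compare the parsed host exactly, never by substring or prefix.
    return parts.scheme == "https" and parts.hostname == EXPECTED_HOST
```

Parsing the URL first matters: a prefix check would accept a Referer such as https://www.bofw.com.evil.com/.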

139 REFERER PRIVACY

▪ Referer header can leak information from your browser to a website ▪ Imagine seeing an access log entry for a celebrity's Wikipedia entry where... Referer=http://int.cia.gov/projects/hitlist.html ▪ Common sources of blocking ▪ HTTP proxy at network perimeter ▪ HTTPS → HTTP transitions ▪ User blocking ▪ Broken user agents ▪ Using Referer to mitigate CSRF may lead to false positives ▪ Angry users, lost revenue

140 CSRF TOKENS

// ...

▪ Require POSTed responses to include a secret value ▪ Unguessability implies unforgeability ▪ SOP prevents attacker from downloading the target form using code in your browser ▪ Attacker can't feasibly predict correct value ▪ Requires rewriting app in most cases ▪ Many web frameworks include CSRF token modules ▪ Proxy could also be used to insert and check tokens
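Token handling can be sketched in a few lines. The in-memory `SESSION_TOKENS` dict and the helper names below are assumptions for illustration; real frameworks ship their own CSRF modules with different interfaces.

```python
import hmac
import secrets

SESSION_TOKENS = {}  # hypothetical store: session_id -> CSRF token


def issue_csrf_token(session_id):
    """Create a random token to embed in a hidden form field."""
    token = secrets.token_urlsafe(32)
    SESSION_TOKENS[session_id] = token
    return token


def check_csrf_token(session_id, submitted):
    """Accept the POST only if the submitted token matches the stored one."""
    expected = SESSION_TOKENS.get(session_id)
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

The server would call `issue_csrf_token` when rendering the form and `check_csrf_token` when handling the POST. A page on another origin cannot read the form, so it cannot supply the right value.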

141 CSRF WRAP-UP

▪ Attacks are blind ▪ Usually requires a one-step, side-effecting action ▪ e.g., POST /important_action ▪ CSRF is the dual of XSS in some ways ▪ In XSS, client trust in the server is violated ▪ In CSRF, server trust in the client is violated ▪ A classic example of a confused deputy attack ▪ The app is tricked into doing something it shouldn't do, even though it possesses the authority to do so

142