Data Privacy On The Web

McMaster Software Freedom

March 3, 2020

Outline

1 Introduction

2 Data Scraping Demo

3 Theory

4 Tutorial

5 Wrapping Up

1 Introduction

The Presenters

Sil Hamilton, 3rd Year English & Multimedia
S.M Mukarram Nainar, 2nd Year Mathematics & Physics

McMaster Software Freedom

Student group formed to promote software freedom and computer literacy on campus
Bi-monthly drop-in meetings to discuss a wide variety of topics:
  - data privacy
  - operating systems
  - current affairs
  - programming
See macswf.ca for more information and scheduled meetings

Privacy and Why It Matters

Privacy is a complicated topic; views on it depend mainly on personal politics
Companies track a lot, but the steps required to counteract it are easy
These steps do not need to negatively affect your experience!
Information is power!
  - advertising is (surprisingly) effective
  - that should worry you

2 Data Scraping Demo

Panopticlick

Go to https://panopticlick.eff.org/
  - note the items being gathered

3 Theory

Entropy

Measure of information
One bit of entropy cuts the number of possibilities in half
33 bits of entropy are enough to uniquely identify anyone globally
  - log2(7 billion) ≈ 32.7
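The arithmetic behind the 33-bit figure can be checked with a short sketch. The 7 billion population comes from the slide; the trait frequencies are made-up illustrative numbers, and adding their bits together assumes the traits are independent, which real fingerprinting data only approximates.

```python
import math


def bits_to_identify(population: int) -> float:
    """Bits of entropy needed to single out one member of a population."""
    return math.log2(population)


def surprisal_bits(share: float) -> float:
    """Bits revealed by a trait that the given fraction of people share."""
    return -math.log2(share)


world = 7_000_000_000  # population figure used on the slide
print(round(bits_to_identify(world), 1))  # 32.7

# Hypothetical traits: each value is an assumed fraction of users sharing it.
traits = {"timezone": 1 / 24, "screen size": 1 / 50, "font list": 1 / 1000}

# Assumption: traits are independent, so their bits simply add up.
total = sum(surprisal_bits(share) for share in traits.values())
print(round(total, 1))
```

Under these assumed frequencies, three ordinary-looking traits already account for well over half of the bits needed to single someone out.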

The Web

How does the internet work?
Servers
Addresses & DNS
HTTP & TLS
JavaScript

Servers

The Web follows a "client-server" model
You are a client; everything you do runs through a server
Servers are just other people's computers

Addresses & DNS

The internet primarily runs on the IPv4 protocol, which assigns addresses to connected devices
An address in this case is a unique series of numbers that differentiates devices
Limited space: 2^32 possible addresses
Rented out to countries, institutions, and companies in blocks; then rented to you (by your ISP)
Typically leased dynamically, but does not change often
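The size of the IPv4 space and the idea of address blocks can be explored with Python's standard `ipaddress` module. This is only a sketch: the block and address below come from a range reserved for documentation and are purely illustrative.

```python
import ipaddress

# The whole IPv4 space is a single /0 network with 2^32 addresses.
whole_space = ipaddress.ip_network("0.0.0.0/0")
print(whole_space.num_addresses == 2**32)  # True

# Illustrative block (reserved for documentation, not a real allocation).
block = ipaddress.ip_network("198.51.100.0/24")
print(block.num_addresses)  # 256

# An address is just a 32-bit number written as four bytes.
addr = ipaddress.ip_address("198.51.100.42")
print(addr in block)  # True
print(int(addr))
```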

HTTP & TLS

Base protocols
HTTP is stateless
  - the protocol doesn't store information
  - however, both the client and server can keep state themselves (cookies, localStorage)
HTTP verbs
  - GET, POST, etc.
TLS encrypts and authenticates the connection
  - covered in more detail in the next workshop
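A rough sketch of what "stateless" means in practice: two identical requests look exactly the same to the server unless the client attaches something, such as a cookie, that links them. The `build_request` helper, the host name, and the cookie value are all hypothetical and only assemble request text; nothing is sent over the network.

```python
from typing import Optional


def build_request(method: str, path: str, host: str,
                  cookie: Optional[str] = None) -> str:
    """Assemble the text of a minimal HTTP/1.1 request (illustrative only)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if cookie is not None:
        # State lives in a header the client chooses to send back.
        lines.append(f"Cookie: {cookie}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Two visits without a cookie are indistinguishable at the protocol level.
first = build_request("GET", "/", "example.org")
second = build_request("GET", "/", "example.org")
print(first == second)  # True

# With a cookie, the server can recognise a returning client.
returning = build_request("GET", "/", "example.org", cookie="session=abc123")
print("Cookie: session=abc123" in returning)  # True
```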

JavaScript

Arbitrary code running on your client
Huge risk, since you can't (usually) know what the code does until you run it
Blocking (at least some of) it is the best way to avoid tracking
It can also be quite heavy on your computer

Knowing Yourself

Moving on from the wider web: how do you fit in?
IP Address
Useragents
Cookies & localStorage
Referers
Passwords
Fonts & more

IP Address

Your IP is necessarily visible to everyone you connect to
Means you have a consistent identity when surfing
Primary method for tracking individuals over time
Your ISP will keep logs of your activity
VPNs and public networks may be used to mitigate this
  - Mullvad
  - Nord VPN (possibly compromised)
  - ExpressVPN, etc.
Public networks introduce their own security implications

Useragent

Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/firefoxversion

String read by websites to detect your browser version
Sent by your web browser in the header of an HTTP request
Contains information regarding your specifics
  - Compatibility
  - Rendering engine
  - Browser
Can be fairly unique depending on your context
Mitigated by spoofing (e.g. Useragent Switcher)
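To see how much a website can read from this one string, here is a minimal sketch that splits a Firefox-style useragent into the fields of the template above. The sample string and the `parse_firefox_ua` helper are illustrative assumptions; other browsers use different layouts, and real sites typically rely on more robust parsers.

```python
import re
from typing import Dict, Optional

# Group names mirror the placeholders in the Firefox useragent template.
UA_PATTERN = re.compile(
    r"Mozilla/5\.0 \((?P<platform>.+); rv:(?P<geckoversion>[\d.]+)\) "
    r"Gecko/(?P<geckotrail>\S+) Firefox/(?P<firefoxversion>[\d.]+)"
)


def parse_firefox_ua(useragent: str) -> Optional[Dict[str, str]]:
    """Return the template fields, or None if the string does not match."""
    match = UA_PATTERN.fullmatch(useragent)
    return match.groupdict() if match else None


# Illustrative useragent for Firefox on Linux.
sample = "Mozilla/5.0 (X11; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0"
parsed = parse_firefox_ua(sample)
print(parsed["platform"])        # X11; Linux x86_64
print(parsed["firefoxversion"])  # 73.0
```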

Cookies & localStorage

Cookies are the primary method for enabling persistence, manipulated with HTTP headers and used for...
  - login information
  - settings
SameSite cookies restrict cross-site use (locality), but advertising domains work around this
Complemented by localStorage (Web API)
  - accessed and modified via JS (client-side scripts)
  - supposed to be read only by the client
  - no expiration date, but only allows <10MB
  - essentially the same as cookies for tracking companies
Among other mitigations, Cookie AutoDelete is good
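The header mechanics can be sketched with Python's standard `http.cookies` module: the server sets a cookie in a `Set-Cookie` response header, and the browser sends it back in a `Cookie` request header. The cookie names and values are hypothetical, and the `samesite` attribute assumes Python 3.8 or newer.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header for a (hypothetical) login session.
response_cookie = SimpleCookie()
response_cookie["session"] = "abc123"
response_cookie["session"]["samesite"] = "Strict"  # only sent on same-site requests
response_cookie["session"]["httponly"] = True      # hidden from page scripts
header = response_cookie.output()
print(header)

# Client side: the browser returns stored cookies in one Cookie header,
# which the server parses back into name/value pairs.
request_cookie = SimpleCookie()
request_cookie.load("session=abc123; theme=dark")
received = {name: morsel.value for name, morsel in request_cookie.items()}
print(received)
```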

Referers

HTTP header often contains the address of the site visited immediately prior
Enables gathering information for analysis
HTTPS sites will not pass along referer data to non-secured sites
Danger crops up when websites receive referers linking to sensitive information
Referer is shared with third-party sites even without leaving a page, e.g. CDNs
Website can dictate Referrer-Policy (two Rs!)
Add-ons can delete the referer after the fact, e.g. uMatrix
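How a policy changes what gets sent can be sketched as a small decision function. This is a simplified model covering only three Referrer-Policy values; the `referer_for` helper and the URLs are hypothetical, and real browsers also strip fragments and credentials and support more policies.

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_for(source: str, destination: str,
                policy: str = "no-referrer-when-downgrade") -> Optional[str]:
    """Return the Referer value a browser would send, or None for no header."""
    src, dst = urlsplit(source), urlsplit(destination)
    if policy == "no-referrer":
        return None
    if policy == "origin":
        # Only scheme and host are revealed, never the path or query.
        return f"{src.scheme}://{src.netloc}/"
    if policy == "no-referrer-when-downgrade":
        # HTTPS pages do not pass the referer along to non-secured sites.
        if src.scheme == "https" and dst.scheme != "https":
            return None
        return source
    raise ValueError(f"policy not covered by this sketch: {policy}")


page = "https://clinic.example/results?patient=123"  # hypothetical sensitive URL
print(referer_for(page, "https://cdn.example/lib.js"))  # full URL leaks
print(referer_for(page, "http://plain.example/"))       # None
print(referer_for(page, "https://cdn.example/lib.js", policy="origin"))
```

The first call shows the danger named above: a third party receives the full address, query string included, unless the site sets a stricter policy.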

Passwords

Passwords do not need to be complicated (for us)!
  - Six "random" words with irregular capitalization and special characters are good enough
Best practice is to have a unique password for each service you use (don't re-use them)
  - https://haveibeenpwned.com/
Various convenient password managers exist
  - KeePassXC
  - Firefox Lockwise
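A minimal sketch of the six-word idea, using Python's `secrets` module for cryptographically secure choices. The tiny word list is a placeholder assumption; a real list should have thousands of words (Diceware-style lists have 7776). The sketch only picks and joins words, so irregular capitalization and special characters would be added on top.

```python
import math
import secrets

# Placeholder word list for illustration only; far too small for real use.
WORDS = [
    "maple", "river", "copper", "window", "lantern", "orbit", "velvet", "thunder",
    "pebble", "harbor", "saddle", "mirror", "cactus", "ribbon", "glacier", "tunnel",
]


def make_passphrase(words, count: int = 6, sep: str = "-") -> str:
    """Join `count` words, each chosen independently and uniformly at random."""
    return sep.join(secrets.choice(words) for _ in range(count))


def passphrase_bits(list_size: int, count: int = 6) -> float:
    """Entropy in bits of `count` random picks from a list of `list_size` words."""
    return count * math.log2(list_size)


print(make_passphrase(WORDS))
print(round(passphrase_bits(len(WORDS)), 1))  # 24.0 (toy list: far too weak)
print(round(passphrase_bits(7776), 1))        # 77.5 (Diceware-sized list)
```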

Fonts & Other "Features"

Websites can request locally installed fonts with JS
Leaks a lot of information (more entropy, etc.)

Do Not Track

Essentially useless!
Adds extra entropy

4 Tutorial

Content Blocking

Most add-ons are a one-time install, no configuration necessary
uMatrix: an excellent all-in-one filter & blocking tool

Pi-hole

Useful tool for DNS filtering
Demonstration
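The general idea of DNS filtering can be sketched in a few lines: look the requested name up in a blocklist and answer with a dead-end address instead of asking the real resolver. This is a conceptual sketch, not Pi-hole's implementation; the blocklist entries, the stand-in upstream resolver, and the choice to also block subdomains of listed names are all assumptions.

```python
from typing import Callable, Set

# Hypothetical blocklist entries.
BLOCKLIST = {"tracker.example", "ads.example.net"}


def is_blocked(domain: str, blocklist: Set[str]) -> bool:
    """True if the domain or any of its parent domains is on the blocklist."""
    labels = domain.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve(domain: str, blocklist: Set[str],
            upstream: Callable[[str], str]) -> str:
    """Answer blocked names with 0.0.0.0; forward everything else upstream."""
    if is_blocked(domain, blocklist):
        return "0.0.0.0"
    return upstream(domain)


def fake_upstream(domain: str) -> str:
    """Stand-in for a real DNS resolver; returns a documentation address."""
    return "203.0.113.7"


print(resolve("cdn.tracker.example", BLOCKLIST, fake_upstream))  # 0.0.0.0
print(resolve("macswf.ca", BLOCKLIST, fake_upstream))            # 203.0.113.7
```

Because the filtering happens at the DNS level, it applies to every device that uses the filter as its resolver, with no per-browser add-on required.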

5 Wrapping Up

Final Notes

Presentation slides will be available on our website
  - macswf.ca
Thank you!
