Web-browsing history

As seen from the client side

Michael Sonntag
Institute of Networks and Security

ELEMENTS OF WEB-BROWSING HISTORY

- History
  - The list of URLs visited (plus time of visit, intentionality…)
  - Provides general information on time and location of activity
  - URLs may also contain form information: GET request parameters
  - Might be very extensive, as a single page may consist of very many independent elements (= URLs)
    - Pictures, ads, JavaScript, CSS, iframes, redirections…
    - Depends on how “History” is interpreted/stored by the browser
- Cookies
  - Which websites were visited when, plus additional information
  - May allow determining whether the user was logged in
  - Can survive much longer than the history
    - Depends on the expiry date of the cookie and the configuration
  - Theoretically: The collection of third-party cookies can identify the site
  - Practically: Too much overlap; too many cookies

ELEMENTS OF WEB-BROWSING HISTORY

- Cache
  - The content of the pages visited
    - Incomplete: E.g. ads will rarely be cached (“no-cache” headers)
  - Provides the full content of what was seen, e.g. (in theory!) webmail content
    - More exactly: What was delivered by the server to the client and was marked as “cacheable”
  - Mostly not that interesting, as the relevant parts (e.g. the actual Facebook content, the chat messages etc.) are nowadays “dynamic” and therefore no longer cached
    - What you get is the “standard border” every page has, plus the icons, stylesheets, scripts, fonts etc.

INTENTIONALITY

- Did the user visit the webpage intentionally?
  - In general: If it is in the cache/history/cookie file: Yes
  - See also: Bookmarks!
- BUT:
  - What about pop-ups?
    - E.g. pornography ads (“no one views such content intentionally”)!
  - Password-protected pages?
    - Images/JavaScript can supply passwords too when opening a URL!
  - Plugins, prefetching…?
- Investigating other files, trying it out, inspecting the content etc. is needed to verify whether a page that was visited was actually intended to be visited (and whether the content was expected!)
- Usually this should not be a problem:
  - Logging in to the mail account
  - Visiting a website after entering log-in data (username + password)
  - Downloading files

INCREASING PROBLEM: DYNAMIC PAGES

- Web browsing history works well for “a page”
  - Server sends something, browser displays it, user clicks on a link
- But today content is often retrieved dynamically by JavaScript
  - One “page” with lots of different content over time
- Problems:
  - JavaScript requests are rarely cached
  - JavaScript requests do not show up in the browser history
  - The “page” might be completely empty at the start
- “Repeating” works less well
  - Many pages change very frequently
    - Not every few weeks, but every few minutes/seconds!
  - The “page” might be the same, but the content (see above) is not
  - Content may be adjusted to the viewer, i.e. dependent on language, browser, previous browsing (cookies!) etc.

WEB BROWSING PROCEDURE (classical!)

- User enters the URL
- Browser determines the IP address for the host part
- Browser connects to the IP address (+ port if specified)
- Sends request
  - With additional information, e.g. what compression is allowed
  - May contain cookie(s)
- Retrieves response
  - Headers and actual content
  - Header may contain a cookie
- Saved to memory (and perhaps to disk in the cache file)
  - Depends on headers, settings etc.
- Connection is closed
  - Note: HTTP 1.1 typically keeps the connection open for further requests (incl. pipelining). This is especially useful for images, stylesheets, scripts etc. from the same site!

THE HTTP PROTOCOL

- Basis of HTTP is a reliable stream protocol (usually TCP)
- The HTTP state diagram is very simple
  - With some exceptions, e.g. authorization
  - There is only a single request and a single response
- HTTP request methods:
  - GET: Retrieve some content
    - Should never change the state on the server!
    - Especially important if caching takes place somewhere
    - Parameters (optional) are encoded in the URL
  - POST: Send data for processing and retrieve the result
    - To be used for requests changing the server state!
    - Parameters are sent in the request body
  - HEAD, PUT, DELETE, TRACE, OPTIONS, CONNECT…
    - Of less importance and (very) rare
    - HEAD (check freshness), PUT (file upload)
    - Forget the rest (or: suspicious if present → investigate!)
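Because GET parameters are part of the URL, they end up in the history along with it. The following is a minimal sketch of pulling them out of a history entry; the URL, host name and parameter names are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical history entry: host, path and parameter names are invented.
url = "http://shop.example.org/search?q=usb+stick&lang=de&page=2"

parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values

print("host:", parts.hostname)
print("path:", parts.path)
for name, values in params.items():
    print(f"  {name} = {values}")
```

POST parameters travel in the request body, so they would not be visible this way.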

THE HTTP PROTOCOL

- The response always includes a status code
  - 1xx Informational
  - 2xx Success
  - 3xx Redirection (request should be sent again differently)
  - 4xx Client-side error (e.g. incorrect request, resource does not exist)
  - 5xx Server-side error (should not be retried)
- Caching of HTTP: Commonly performed through proxies
  - A cached response must either be validated with the source (→ HEAD)
  - Or it is "fresh enough" according to client, server, and cache
- Note: Browsers often ignore this
  - E.g. IE can be configured to never check for a newer version, even if the cached page has already expired!
  - This has no influence on what proxies on the transmission path do!

THE HTTP PROTOCOL

- Local (= browser) caches
  - If a page has expired, it is not necessarily deleted from the local cache
    - It might remain there for much longer
  - Can keep even pages marked as "no-cache" and "no-store"
    - "no-cache": Should not be cached for future requests
      - But might still be written to disk (e.g. …)
    - "no-store": Should only be held in memory
      - Users are still allowed to use "Save As"!
  - All dynamically generated pages are typically marked as expiring in the past, no-cache etc.
    - Why? Because previously there was no specification, so browsers did different things, and servers provided headers for every browser make and version to be on the safe side!
    - Such pages must normally be generated anew for every visitor (= no caches on the transmission path) and every visit (= no caching in the browser)

THE HTTP PROTOCOL

- Local (= browser) caches
  - This cache can be very large and contain very old files
    - Important for computer forensics!
  - Manual deletion or cleaner programs are simple and effective
    - But must be used every time after surfing
    - Attention: Many such programs just delete the files; only the more serious ones overwrite them securely!
    - Files already deleted by the browser before cleaning → no overwriting!
    - Also, fragments of files might remain in unused areas, so all free sectors and slack spaces would have to be cleaned every time!
    - See also swap file/partition, hibernation file (→ URLs)
  - See also “privacy mode”

THE HTTP PROTOCOL EXAMPLE: HTTPS://WWW.GOOGLE.AT/

Request:

GET / HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: de,en-US;q=0.7,en;q=0.3
Cache-Control: max-age=0
Connection: keep-alive
Cookie: NID=119=mg3iUhrNJHA5CfE6CG5KVd…017-12-14-7; OGPC=5061821-25:
DNT: 1
Host: www.google.at
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; W…) Gecko/20100101 Firefox/57.0

THE HTTP PROTOCOL EXAMPLE: HTTPS://WWW.GOOGLE.AT/

Response:

HTTP/1.1 200 OK
alt-svc: hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"
cache-control: private, max-age=0
content-encoding: br
content-type: text/html; charset=UTF-8
date: Thu, 14 Dec 2017 07:32:10 GMT
expires: -1
p3p: CP="This is not a P3P policy! See g.co/p3phelp for more info."
server: gws
set-cookie: …; expires=Fri, 15-Jun-2018 07:36:24 GMT; path=/; domain=.google.at; HttpOnly
strict-transport-security: max-age=3600
X-Firefox-Spdy: h2
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block

Annotations:
- cache-control: private, max-age=0 → “Do not cache this response”
- expires: -1 → Refers to the page, not the cookie!

HTTP/2

- The new(er) protocol introduces a few changes
  - Data compression: Not really a problem; the content stays the same, and compression was negotiable before too
    - Mostly an issue for wiretapping
  - Request pipelining: More pages over one connection
    - Was also part of HTTP/1.1
    - Wiretapping issue
  - Multiplexing requests: Concurrent/interleaved requests possible
    - Similar to pipelining: Wiretapping issue
  - Server push: Problem for intentionality
    - The server can send you data before you requested it, because you are going to need it anyway (e.g. stylesheets, scripts)
    - This can be “repurposed” to send you arbitrary data
    - Which might be cached (even if not shown, because it is not needed!)

COOKIES

- What is a "cookie"?
  - Small (max. 4 kB) text file with information
  - Originates from the server
  - Stored locally
  - Transmitted back to the server on "matching" requests
- Content (with exemplary data):
  - Name: "session-id"
  - Value: "303-1195544-4348244"
  - Domain: ".amazon.de"
  - Website path: "/"
    - Sent with all requests ("/") to subdomains of ".amazon.de"
  - Expiry date and time: 15.10.2007, 00:02:22
    - None → kept until the browser is closed ("session cookie")
  - Secure: *
    - Will also be sent on non-HTTPS connections
- The data may have any meaning
  - Very rarely this is some "plain-text data"
  - Some part of it might be the IP address or the user name
  - But usually it is just a (more or less!) random unique number
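As a rough illustration of these fields, the sketch below parses a cookie with Python's standard `http.cookies` module. The header string is assembled from the exemplary data above and is an assumption; it is not a captured header.

```python
from http.cookies import SimpleCookie

# Illustrative Set-Cookie value built from the slide's example data (not a real capture).
header = ("session-id=303-1195544-4348244; Domain=.amazon.de; Path=/; "
          "Expires=Mon, 15-Oct-2007 00:02:22 GMT")

jar = SimpleCookie()
jar.load(header)

for name, morsel in jar.items():
    print("name:   ", name)
    print("value:  ", morsel.value)
    print("domain: ", morsel["domain"])
    print("path:   ", morsel["path"])
    print("expires:", morsel["expires"] or "(session cookie)")
    print("secure: ", bool(morsel["secure"]))
```

An empty `expires` attribute would correspond to a session cookie, and an unset `secure` attribute means the cookie is also sent over non-HTTPS connections.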

THIRD-PARTY COOKIES

- “Real” third-party cookies: Site A sends a cookie with the Domain value set to site B
  - Would NOT be sent back to site A, but only to site B!
  - Browsers generally do not accept such cookies any more
- Third-party cookies: The page is from site A, but some iframe/picture/… is from site B, and the request for it sets a cookie for site B
  - The cookie originates at site B and is sent back to site B, but the “overall”/“encompassing”/“top-level” page is from site A
  - Many browsers block these now (optionally/by default); ad blockers often drop them too
- Forensically the difference is unimportant: All are either stored or not, and all are stored in the same way
  - Theoretically a unique collection of third-party cookies (= same time) could identify a website (= verify; must be known to compare)

IE: INTERESTING FILES/LOCATIONS

- Where can we find information on what users did with IE?
  - Attention: Locations change slightly with OS version/language!
- \Local Settings\Temporary Internet Files\Content.IE5
  - Also for later versions of IE (this is the version of the file format, not of the software)!
  - Cache (webpages, images, applets, Flash files, …)
  - Win >=7: \AppData\Local\Microsoft\Windows\Temporary Internet Files
- \Local Settings\History
  - Where the user had been (URLs)
  - Subdirectories for various time spans
  - Win >=7: \AppData\Local\Microsoft\Windows\History
- \Cookies
  - Win >=7: \AppData\Roaming\Microsoft\Windows\Cookies
- Note: Data is deleted from these locations independently!
  - What is (was) present in one is not necessarily still available in the other locations
  - We must search all three locations and assemble the results

IE: COOKIE FILE STRUCTURE

- Each cookie file contains all cookies for a single domain
- The information is stored line by line; 9 lines = 1 cookie
- Example:

  __utma                                                   Name
  36557369.378120483.1187701792.1189418701.1190710388.4    Value
  hotel.at/                                                Domain
  1088                                                     Flags
  2350186496                                               Expiration time (UTC; "LoVal","HiVal")
  32111674
  2116717664                                               Creation time (UTC; format as above)
  29884241
  *                                                        Secure (here: False)
  __utmb
  …

- Note: Additional information on the cookies is in the index.dat file in the same directory!
  - Number of hits, suspected as advertisement
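The 9-line layout and the split timestamps can be decoded with a few lines of code. This is a sketch under the assumption that "LoVal"/"HiVal" are the low and high 32-bit halves of a Windows FILETIME (100 ns ticks since 1601-01-01 UTC); the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(low, high):
    """Combine the two 32-bit halves and convert 100 ns ticks to a UTC datetime."""
    ticks = (int(high) << 32) | int(low)
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

def parse_ie_cookie_file(text):
    """Split a cookie file into records of 9 lines each (layout as on the slide)."""
    lines = text.splitlines()
    cookies = []
    for i in range(0, len(lines) - 8, 9):
        name, value, domain, flags, exp_lo, exp_hi, cre_lo, cre_hi, marker = lines[i:i + 9]
        cookies.append({
            "name": name,
            "value": value,
            "domain": domain,
            "flags": int(flags),
            "expires": filetime_to_datetime(exp_lo, exp_hi),
            "created": filetime_to_datetime(cre_lo, cre_hi),
            "marker": marker,
        })
    return cookies

# The example record from the slide.
sample = "\n".join([
    "__utma",
    "36557369.378120483.1187701792.1189418701.1190710388.4",
    "hotel.at/",
    "1088",
    "2350186496", "32111674",   # expiration: low, high
    "2116717664", "29884241",   # creation: low, high
    "*",
])

cookies = parse_ie_cookie_file(sample)
print(cookies[0]["name"], "created", cookies[0]["created"].isoformat())
```

With these values the creation time falls on 25 September 2007, which agrees with the last Unix timestamp (1190710388) inside the cookie value itself.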

IE: NEWER VERSIONS

- Additional information can be found under \AppData\Roaming\Microsoft\Windows
- IECompatCache, IECompatUACache: Sites to be shown in compatibility view
  - Independent kind of “history”
- IEDownloadHistory: Download history
  - Obviously interesting!
- PrivacIE: NOT InPrivate browsing sessions! These are things blocked during InPrivate browsing sessions to prevent third-party tracking (→ InPrivate Filtering).
  - Usually not interesting
  - Site X (= interesting) references content on site Y
  - Y might be in here (but Y might be referenced on A, B, C… too!)
- IETldCache: Help file for colouring the TLD (esp. second level)
  - Forensically probably not very useful

IE >=10

- Most files are now in separate directories named “Low”, located within the “old” directory structure
  - “Low” = “Low integrity files”
  - No problem if damaged/lost
- All index.dat files have been replaced by a single database
  - Located in “%USERDIR%\AppData\Local\Microsoft\Windows\WebCache”
  - V01.chk (checkpoint file), V01.log (transaction log), V01res*.jrs/V01*.log (reserved transaction logs; for clean shutdown in emergency cases, e.g. disk full)
  - WebCacheV01.dat (ESE database)
- Format: Windows ESE-DB (Extensible Storage Engine)
  - Same as Exchange, Active Directory, Desktop Search…
  - Problem: When Windows is running, it is locked
    - Shadow Copy needed (or image investigation)
- As before: Index only; actual content is unchanged!

ESEDatabaseView: http://www.nirsoft.net/utils/ese_database_view.html
Or others (libesedb → also reads damaged files)
Windows: esentutl.exe, e.g. for recovery of a crashed DB

IE >=10

- The database consists of several “containers”
  - Which are not necessarily consistently numbered
    - Probably a DB viewer limitation; use the name of the container!
  - Especially interesting: “History”/“MSHist01*”, “Cookies”, “Content”, “DOMStore”, “iedownload”
  - Attention: Some names occur several times (they also depend on a location, i.e. a directory!)
- InPrivate browsing data is stored there as well, but deleted at the end (= when the window is closed)
  - “Deletion” → carving for such entries is possible, as deleted records will not be reused until that part of the database is reorganized
    - B-tree rebalancing → parts will be overwritten (not necessarily all!)
- Attention: Contains other things besides IE history/…!
  - Example: Windows 8.1: All searches are logged here too
    - Searches: https://www.windowssearch.com/search?q=
    - While typing: https://www.windowssearch.com/suggestions?q=
    - Even if offline at that point in time!

IE >=10

- Content: Structure is identical for all containers
  - But different columns are used (= others remain empty)
- Common: FileSize, SecureDirectory (in which subdirectory), Filename, ResponseHeaders, AccessCount, CreationTime (when the entry was created!), ExpiryTime (how long valid), ModifiedTime (last change), AccessTime (last access)
- Specialized:
  - Cookies: URL = user@domain
  - History: No creation time
  - DOMStore: No expiry/modified time
  - iedownload: -
  - Content (= cache): -

IE >=10

- Accessing it is difficult: Always open
  - Solution: Offline or Shadow Copy
  - Or simply as a different user with appropriate permissions!
- Example: Use “Hobocopy” (= copy tool with VSS support)

D:\Temp>HoboCopy.exe C:\Users\sonntag\AppData\Local\Microsoft\Windows\WebCache d:\temp\hobocopy WebCacheV01.dat
HoboCopy (c) 2006 Wangdera Corporation.

Starting a full copy from C:\Users\sonntag.ADS2-FIM\AppData\Local\Microsoft\Windows\WebCache to d:\temp\hobocopy
Copied directory
Backup successfully completed.
Backup started at 2014-08-06 10:03:16, completed at 2014-08-06 10:03:21.
1 files (32.06 MB, 1 directories) copied, 16 files skipped

- Then open it with a viewer (e.g. ESEDatabaseView)
  - Select “Containers” from the combo box to identify which container is which data store
  - Select the container, select the interesting files, and use “Save as” to export, e.g. as CSV

Hobocopy: https://github.com/candera/hobocopy/


MICROSOFT EDGE

- Very similar to IE 10+
  - Also stored in an ESE-DB in the same format
  - Cache: Similar to IE 10+
  - Bookmarks: Similar to IE 10+
  - History, cookies: Similar to IE 10+
- New:
  - Cortana search: Stored in the ESE DB in container “DependencyEntry 5”
  - Reading list: Separate ESE DB, separate for each user
- InPrivate browsing: Still stored in the DB and deleted on exit
  - Data carving remains useful

http://www.dataforensics.org/microsoft-edge-browser-forensics/
https://articles.forensicfocus.com/2015/07/27/project-spartan-forensics/

MICROSOFT EDGE

- Base directory: C:\Users\\AppData\Local\Packages\Microsoft.MicrosoftEdge_
  - Cache files: \AC\#!001\MicrosoftEdge\Cache\
  - Cookies: \AC\#!001\MicrosoftEdge\Cookies\
  - History: C:\Users\\AppData\Local\Microsoft\Windows\WebCache\WebCacheV01.dat
  - Webnotes: \AC\#!001\MicrosoftEdge\User\Default\WebNotes\
  - Last active session: \AC\#!001\MicrosoftEdge\User\Default\Recovery\Active\
  - DOM store: \AC\#!001\MicrosoftEdge\User\Default\DOMStore\
  - Edge settings, reading list etc.: \AC\MicrosoftEdge\User\Default\DataStore\Data\nouser1\xxxxx\DBStore\spartan.edb
- Newer versions: These directories still exist, but most of the content has been moved into the ESE-DB (still in the file system: cache, DOM store)

FIREFOX: INTERESTING FILES/LOCATIONS

- Where can we find data on what users did with Firefox?
  - Profile ID is a random string generated once
- \AppData\Local\Mozilla\Firefox\Profiles\\cache2
  - Cache (webpages, images, applets, Flash files, …)
  - Since version 32
- \AppData\Roaming\Mozilla\Firefox\Profiles\\places.sqlite
  - History; stored in a SQLite database
- \AppData\Roaming\Mozilla\Firefox\Profiles\\cookies.sqlite
  - Cookies; stored in a SQLite database
- Easy cache access: URL "about:cache"
  - Extensions for directly viewing cached files are also available
  - Should only be used on write-protected disks/images!
  - Firefox has two caches: In memory and on disk

FIREFOX: SQLITE COOKIE DB

- Contains:
  - Organizational data: id, appId, inBrowserElement
  - BaseDomain: Domain name without host (→ searching!)
  - Name: Name of contained data
  - Value: Value of contained data
  - Host: Where to send it to
  - Path: Sub-path restriction for host (rarely used)
  - Expiry: Date of expiry
  - LastAccessed: Date of last access
  - CreationTime: Date of first access
  - isSecure: Send only over encrypted connections?
  - isHttpOnly: Access by JavaScript forbidden?

FIREFOX: SQLITE HISTORY DB

- places.sqlite; contains 13 tables
  - FavIcons, hosts, inputHistory, keywords, sequences, annotations, downloads (incl. destination filename)…
- Bookmarks: Hierarchical structure (“parent”), position, title, date added/modified, GUID
- “moz_places”: Main table; connects GUID to URL
  - Contains: URL, title, rev_host (= reversed hostname, for searching; www.mozilla.com → moc.allizom.www), visit_count, typed (= manually), last visit date, GUID, favIcon reference…
- “moz_historyvisits”: Entry created when visiting a page
  - Id, from_visit (= id of previous page), place_id (= see above), visit_date (= timestamp), visit_type (1=link, 2=typed, 3=…, 4=embedded, 7=download…), session
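How the two tables fit together can be sketched with an in-memory SQLite database. The schema below is reduced to the columns named above and is an assumption, as is the interpretation of visit_date as microseconds since the Unix epoch; the sample row is invented.

```python
import sqlite3
from datetime import datetime, timezone

con = sqlite3.connect(":memory:")
con.executescript("""
-- Simplified stand-in for places.sqlite (only the columns discussed above).
CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT, title TEXT,
                         visit_count INTEGER, typed INTEGER);
CREATE TABLE moz_historyvisits (id INTEGER PRIMARY KEY, from_visit INTEGER,
                                place_id INTEGER, visit_date INTEGER,
                                visit_type INTEGER);
INSERT INTO moz_places VALUES (1, 'http://www.mozilla.com/', 'Mozilla', 1, 1);
INSERT INTO moz_historyvisits VALUES (1, 0, 1, 1192174225718000, 2);
""")

rows = con.execute("""
    SELECT p.url, v.visit_date, v.visit_type
    FROM moz_historyvisits AS v
    JOIN moz_places AS p ON p.id = v.place_id
    ORDER BY v.visit_date
""").fetchall()

# Assumption: visit_date counts microseconds since 1970-01-01 UTC.
visits = [(url, datetime.fromtimestamp(ts / 1_000_000, tz=timezone.utc), vtype)
          for url, ts, vtype in rows]

for url, when, vtype in visits:
    print(when.isoformat(), vtype, url)
```

On a real case the same query would be run against a copy of places.sqlite taken from a write-protected image, never against the live profile.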

https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Places

FIREFOX: CACHE (>=32)

- „index“ file: Index of cache entries
  - 4 bytes version (32-47: 0x00000001)
  - 4 bytes last changed timestamp (Unix, 32 bit, hex, big endian)
  - 4 bytes dirty flag
  - Records of 36 bytes each
    - 20 bytes SHA1 hash of the URL
      - This is the name of the file in the subdirectories!
      - Format: “:http://………./…./…..”
    - 4 bytes “frecency” (= frequency + recency)
      - See https://developer.mozilla.org/en-US/docs/Mozilla/Tech/Places/Frecency_algorithm
      - “amount of revisitation, the type of those visits, how recent they were, and whether the URI was bookmarked or tagged”
    - 4 bytes expiration date (Unix, 32 bit, hex, big endian)
    - 1 byte flags
    - 3 bytes file size
- Subdirectories „doomed“ and „entries“: Actual cache files
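Since the file name is the SHA1 hash of the key, a known URL can be mapped to its expected cache file name. A minimal sketch follows; the helper name is hypothetical, and both the ":" prefix as the complete key prefix and the upper-case hex spelling of the file name are assumptions that should be checked against real cache directories.

```python
import hashlib

def cache2_entry_name(url, prefix=":"):
    """Return the expected cache file name for a URL.

    Assumptions: the key is the prefix followed by the URL, and the file
    name is the SHA1 digest written as upper-case hex.
    """
    key = prefix + url
    return hashlib.sha1(key.encode("utf-8")).hexdigest().upper()

name = cache2_entry_name("http://www.mozilla.com/")
print(name)
```

Comparing such computed names with the files found in „entries“ is one way to check whether a specific URL is (still) in the cache.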

FIREFOX: CACHE (>=32)

- Cache files contain the content, checksums, and some metadata
- File format:
  - File exactly as it was received, i.e. it might be compressed (or not, whatever the server decided to send!)
    - The very last 4 bytes of the cache file are the data length in this file
    - Other view: This is the offset of the beginning of the metadata
  - 4 bytes hash for the whole file
  - 2 bytes hash for every chunk (rounded up) of 262,144 bytes of data
  - 4 bytes version number (32-44: 0x00000001, >=45: 0x00000002)
  - 4 bytes fetch count
  - 4 bytes last fetch date, 4 bytes last modified date
  - 4 bytes frecency, 4 bytes expiration date
  - 4 bytes URL (“key”) length, 4 bytes flags, URL + ‘\0’
  - Attribute name + ‘\0’, attribute value + ‘\0’
    - Can be various; typically includes “request-method”, “response-head”
  - 4 bytes data length/metadata offset
- Added: Version 3 (>=51): Cache files can store alternative data (= processed, e.g. decompressed)
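The trailing offset makes it easy to separate content from metadata. The sketch below assumes the last 4 bytes are an unsigned big-endian integer (matching the byte order stated for the index file); the function name and the sample bytes are made up.

```python
import struct

def split_cache_entry(raw):
    """Split a cache file into (content, metadata) using the trailing offset.

    Assumption: the last 4 bytes hold the content length as a big-endian
    unsigned 32-bit integer, which is also the offset of the metadata.
    """
    if len(raw) < 4:
        raise ValueError("file too short to contain the trailing offset")
    (offset,) = struct.unpack(">I", raw[-4:])
    if offset > len(raw) - 4:
        raise ValueError("offset points beyond the file")
    return raw[:offset], raw[offset:-4]

# Synthetic example: 18 bytes of content, 10 bytes of dummy metadata, then the offset.
body = b"<html>hello</html>"
meta = b"\x00" * 10
raw = body + meta + struct.pack(">I", len(body))

content, metadata = split_cache_entry(raw)
print(len(content), len(metadata))
```

The content part may still be compressed, so it may need decompression according to the stored “response-head” before it can be viewed.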

https://hg.mozilla.org/mozilla-central/file/default/netwerk/cache2/CacheFileMetadata.h

FIREFOX: CACHE (>=32)

- Flags: Currently only a single one exists: “Pinned”
- Pinned flag (value “1”):
  - Resource will always be cached (regardless of size, headers etc.)
  - Will never be evicted from the cache, even if “old”
  - Will not be deleted when the cache is cleared
  - Will be overwritten if a new version becomes known
- Uses: Unclear!
  - Documentation: Pinned pages, offline web applications
  - Actual use: Pinned pages do NOT set this flag!

https://bugzilla.mozilla.org/show_bug.cgi?id=1032254

GOOGLE CHROME

- File locations: \AppData\Local\Google\Chrome\User Data
  - Most elements are in the subdirectory “Default” (or “Profile” etc.)
- Bookmarks: “Bookmarks” file in JSON format
  - Date when added, URL
  - Modification date for “directories”
- Cookies: SQLite database (no file extension!)
  - Table “cookies”: Creation, expiration and last access time (UTC), host key (= domain), name, value and path, flags (Secure, HTTP-Only, Persistent (= not session), Priority, FirstPartyOnly)
  - Cookie priority: Too many cookies per domain? → Delete low priority first
  - FirstPartyOnly (= SameSite): CSRF protection; send only with requests initiated from the same domain (“Cookie: key=value; First-Party-Only”)
- History: SQLite database

GOOGLE CHROME

- History: „History“ file; SQLite database
  - Most important: “downloads”, “urls”, “visits”
  - Downloads: Target path, start and end time, number of bytes (announced/actually downloaded), MIME type, URL of tab and site from which it was downloaded…
  - URLs (a single line for every URL visited): URL, title, visit count, typed count (= manually typing the URL), last visit time, hidden (???)
  - Visits: Actual “history”, i.e. one line for every single visit
    - URL (= reference to URL table), visit time, from-visit (= from which URL the user came), transition (how the user got there; link, typed, reload…), duration of visit
    - visit_source: If the URL came from a different device (= synchronization across multiple browsers on different devices)
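The relation between “urls” and “visits” can be sketched as follows. The schema is reduced to the columns mentioned above, the sample rows are invented, and the timestamp interpretation (microseconds since 1601-01-01 UTC) is stated here as an assumption because the slide does not give the format.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def chrome_time(microseconds):
    """Assumption: Chrome stores microseconds since 1601-01-01 UTC."""
    return CHROME_EPOCH + timedelta(microseconds=microseconds)

con = sqlite3.connect(":memory:")
con.executescript("""
-- Simplified stand-in for the "History" file (only the columns discussed above).
CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, title TEXT,
                   visit_count INTEGER, typed_count INTEGER, last_visit_time INTEGER);
CREATE TABLE visits (id INTEGER PRIMARY KEY, url INTEGER, visit_time INTEGER,
                     from_visit INTEGER, transition INTEGER, visit_duration INTEGER);
INSERT INTO urls VALUES (1, 'https://www.jku.at/', 'JKU', 1, 1, 12836647825000000);
INSERT INTO visits VALUES (1, 1, 12836647825000000, 0, 1, 5000000);
""")

rows = con.execute("""
    SELECT u.url, v.visit_time, v.from_visit, v.visit_duration
    FROM visits AS v
    JOIN urls AS u ON u.id = v.url
    ORDER BY v.visit_time
""").fetchall()

timeline = [(url, chrome_time(t), src, dur / 1_000_000) for url, t, src, dur in rows]
for url, when, src, seconds in timeline:
    print(when.isoformat(), url, "from visit", src, f"{seconds:.0f} s")
```

Following from_visit back through the “visits” table gives the chain of pages that led to a visit, which helps when judging intentionality.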

https://developer.chrome.com/extensions/history

GOOGLE CHROME

- Login data: „Login data“ file; SQLite database
  - Origin URL: URL where the form came from
  - Action URL: URL where the data was sent to
  - Username element/value: ID of element and content
  - Password element/value, submit element: See above!
    - Note: The password value is encrypted using DPAPI (which is based on the user password)
  - Creation date, how often used etc.
- Web data: “Web data” file; SQLite database
  - Autofill information (stored form content), credit cards, token services, keywords etc.
- Current/Last Session/Tabs: Information on tabs/sessions
- Top sites: Most frequently visited sites

https://digital-forensics.sans.org/summit-archives/dfirprague14/Give_Me_the_Password_and_Ill_Rule_the_World_Francesco_Picasso.pdf

GOOGLE CHROME

- Cache: Subdirectory „Cache“
  - Index file: Main hash table „index“
  - Data files: Blocks of four fixed sizes (1-4*256/1k/4k) „data_1/2/3“
  - Stream files for longer content (i.e. >16 kB): „f_“
  - Headers and response content are stored separately
  - Deletion is only logical: Entries are marked as “deleted”
- Different implementations:
  - Disk: Block (see above) or simple (Android)
    - Simple: One file per entry + index file
  - (Main) memory (= RAM)
- Media, App, Shader, PNaCl: Specialised caches (PNaCl: Portable Native Client; stores the translation of bitcode files to ELF)

https://www.chromium.org/developers/design-documents/network-stack/disk-cache

IE-EXAMPLE: RECONSTRUCTING WEBMAIL

- Cookies:
  - www.gmx.net/de/
    - Visits 1
    - moveinBrowser new%20MoveinData%28%29%2Eunpickle%28%7B%22viewed%22%3A%201%2C%20%22closed%22%3A%20false%2C%20%22latest%22%3A%20new%20Date%281192174225718%29%7D%29
      - Decoded: new MoveinData().unpickle({"viewed": 1, "closed": false, "latest": new Date(1192174225718)})
      - Decoded date (Unix): Fri, 12 October 2007 07:30:25 UTC
  - gmx.net/
    - GUD bMDEpJi1JPF9xN0JINkUyQkExJSIhJxweJBkeGyAvLjcsLDQpKzJCSzElIiEnHB8dGRwcIC83Ny8tNC0uMktBMSMtSzksIh0gGw==
      - MIME encoded, but it is just a binary value
      - Probably a unique ID for session handling
  - logout.gmx.net/
    - POPUPCHECK 1192260804812
      - Decoded date: Sat, 13 October 2007 07:33:24 UTC
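The decoding of the moveinBrowser cookie can be reproduced in a few lines. This sketch assumes the value is plain percent-encoding and that the number inside `Date(...)` is a JavaScript timestamp, i.e. milliseconds since the Unix epoch.

```python
import re
from datetime import datetime, timezone
from urllib.parse import unquote

# Cookie value as found in the cookie file (line wrap removed).
raw = ("new%20MoveinData%28%29%2Eunpickle%28%7B%22viewed%22%3A%201%2C%20"
       "%22closed%22%3A%20false%2C%20%22latest%22%3A%20new%20Date%28"
       "1192174225718%29%7D%29")

decoded = unquote(raw)
print(decoded)

# JavaScript Date values count milliseconds since 1970-01-01 UTC.
millis = int(re.search(r"Date\((\d+)\)", decoded).group(1))
when = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(when.strftime("%a, %d %B %Y %H:%M:%S UTC"))
```

The same conversion applied to the POPUPCHECK value 1192260804812 gives the logout time on the following day.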

IE-EXAMPLE: RECONSTRUCTING WEBMAIL

- History (pasco; adjusts for local time zone!):
  - Modified/access time: 10/12/2007 09:30 until 09:33 (= 12.10.2007 7:30-7:33 UTC!)
    - Local time of event: Central European Summer Time (= UTC+2)
    - But shown according to the time zone set at the moment of the analysis; physically stored as UTC time!
- URLs (selection):
  - sonntag@http://www.gmx.net/de
    → User visited the GMX homepage
  - sonntag@http://service.gmx.net/de/cgi/login
    → User logged in to GMX
  - sonntag@http://service.gmx.net/de/cgi/g.fcgi/mail/index?CUSTOMERNO=10333901&t=de1690301692.1192174366.c35ea10d&FOLDER=inbox
    → User visited his inbox
  - sonntag@http://service.gmx.net/de/cgi/derefer?TYPE=2&DEST=http%3A%2F%2Fwww.gmxattachments.net%2Fde%2Fcgi%2Fg.fcgi%2Fmail%2Fprint%2Fattachment%3Fmid%3Dbabgehj.1192174412.25124.s9vnnjbfon.74%26uid%3DKxs5Dm8bQEVsw%252FqY9HVpw45KNTg2NcIR%26frame%3Ddownload
    → User opened an attachment
  - sonntag@http://www.gmxattachments.net/de/cgi/g.fcgi/mail/print/attachment:/filename/Lebenslauf.doc?mid=babgehj.1192174412.25124.s9vnnjbfon.74&uid=Kxs5Dm8bQEVsw%2FqY9HVpw45KNTg2NcIR&frame=attachment
    → User downloaded an attachment called "Lebenslauf.doc"
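The difference between the stored and the displayed time is just a time zone conversion. A small sketch, assuming a fixed offset of +2 hours for the zone in which the event took place (a real analysis would use the zone rules valid on that date):

```python
from datetime import datetime, timedelta, timezone

# Timestamp as physically stored in the history (UTC).
stored = datetime(2007, 10, 12, 7, 30, tzinfo=timezone.utc)

# Assumption: fixed +2 h offset of the user's local time at the event.
local_zone = timezone(timedelta(hours=2), "UTC+2")
shown = stored.astimezone(local_zone)

print("stored:", stored.isoformat())
print("shown: ", shown.isoformat())
```

A tool that silently applies the analyst's current time zone can therefore shift every timestamp, so the zone used for display should always be recorded.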

IE-EXAMPLE: RECONSTRUCTING WEBMAIL

- Cache:
  - 282 entries
    - Images (GIF, JPG)
    - Stylesheets (CSS)
    - JavaScript (JS)
    - HTML files (HTML)
  - Only static files, login screen etc.!
- What is missing are the actual e-mails
  - These are not cached on disk
  - In previous versions they might have been cached
    - Depending on the server, not on the version of the browser!
  - So webmail is not necessarily recoverable, but perhaps in some instances (swap/hibernation file, deleted files…)
- Note: The cache only contains what is sent to the computer
  - Locally drafted e-mail is "form input", which is never cached!
  - Only chance: Sent back as preview

TYPED URLS

- To be found in the registry under HKCU\Software\Microsoft\Internet Explorer\TypedURLs
  - Number depends on OS version (later ones have more; 20/25/50)
- Contains “url1” to “url20”
  - url1 is the most recent, url20 the oldest
  - All are moved when a new one is typed!
- Since IE 10 there is a “parallel” key “TypedURLsTime”
  - FILETIME timestamps for the entries above
  - Most recent visit for the corresponding URL

OTHER INFORMATION

- Form history and stored passwords
  - For identifying visited sites and accessing them
  - Often encrypted, but decryption programs exist
- Search history: What was the person looking for?
- Blocked sites: If the popup host of a site was blocked, the site itself was probably visited!
  - Manually unblocked sites are obviously interesting!
- Compatibility settings (IE): Sites shown using old renderers
- Certificate store: To identify secured sites visited often
  - Might include client certificates, which act as a kind of key
- Download history: What file(name)s were downloaded
  - And where they were stored locally (name; for searching)
- Installed add-ons (browser controls)
- Language preferences and all other configuration options

Careful interpretation necessary!

PRIVACY MODE: INTERNET EXPLORER

- Allows browsing without leaving traces (but see below!)
- Additional feature: Prevent sites from sending data to other sites (InPrivate Filtering)
  - IE tracks third-party content; if it appears on more than 10 (can be modified from 3 to 30) sites visited, it is blocked in InPrivate Browsing mode (“ads” or similar)
  - Must be activated manually each time (works per session)!
  - Can also be activated in non-private browsing mode
  - Complete blocking (no third-party content) can be set manually; exceptions can be configured as well
- InPrivate Browsing does not store:
  - New cookies (existing ones can still be read!), history entries, form data, passwords, typed URLs, search queries, visited links
- Toolbars and extensions are disabled
- Will keep: Bookmarks, downloaded files, Flash cookies

PRIVACY MODE: INTERNET EXPLORER

- InPrivate Browsing previously stored files in the cache on the disk, but deleted them when closing the window
  - This meant traces WOULD remain on the disk!
  - But this is no longer the case
- Reconstructing the history:
  - Not available directly (not stored!)
    - Article unclear about this; some parts might remain
  - But possible through the cache, which contains the last access time of every stored element!
- No advantage regarding:
  - Eavesdropping: ISP, NSA… can still copy the connection
    - Use good encryption
  - Servers: May be able to identify that this mode is used

PRIVACY MODE: FIREFOX

- Firefox does not store
  - History entries (incl. intelligent …), search queries, download history, form data, cookies, cache, typed URLs, passwords, visited links
- Will keep: Bookmarks, downloaded files, Flash cookies
- Same features as IE
  - Except third-party elements
    - Cookies can be filtered
    - Images too, but not through the UI!
      - about:config → permissions.default.image=3 (no third-party images)
    - Scripts etc.: NoScript or other extensions
- Extensions generally remain active!
  - Configuration (e.g. third-party images) is the same
  - Whether an extension works in privacy mode can be set when installing it (and in the add-on configuration)

INCOGNITO MODE: CHROME

- Chrome does not store
  - History entries, search queries, download history, form data, cookies, cache, typed URLs, passwords, visited links
- Will keep: Bookmarks, downloaded files
- Prevents sites from identifying that incognito mode is used
  - But workarounds still exist…
- Extensions are disabled, but can be individually enabled manually
  - Configuration (e.g. third-party images) is the same
- Will not store anything on disk (except swap/hibernation!)

PRIVACY MODE: PAGEFILE

- If the system is examined immediately/soon after the privacy mode was used, content may remain in the pagefile
  - Records are not written to disk, but they were definitely in main memory!
- May contain:
  - Individual URLs: Easy to recognize
    - Problem: They need not be from the browser, but could also stem from other programs
  - Form content: Mostly unrecognizable (unless the content is known!)
  - Page data: HTML or fragments of it
    - Problem: How to assemble it into a complete webpage
- Main problem: Conclusions
  - We have fragments, but mostly no trace of how they were created
  - URLs could stem from loaded program code, strings etc.
  - Page data: A file that was copied somewhere
  - …
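Finding individual URLs in such a raw dump amounts to pattern matching over bytes. The following is a minimal sketch; the regular expression, the length limits and the sample bytes are assumptions, and it only covers 8-bit text (strings stored as UTF-16 would need a second pattern).

```python
import re

# Printable ASCII after the scheme; length limits are arbitrary choices.
URL_RE = re.compile(rb"https?://[\x21-\x7e]{4,200}")

def carve_urls(blob):
    """Return (offset, url) pairs for URL-like byte sequences in a raw dump."""
    return [(m.start(), m.group().decode("ascii")) for m in URL_RE.finditer(blob)]

# Synthetic stand-in for a pagefile fragment.
sample = (b"\x00\x13garbage\x00http://www.gmx.net/de\x00\xff\xfe"
          b"More https://www.jku.at/ \x00")

hits = carve_urls(sample)
for offset, url in hits:
    print(f"{offset:#06x}  {url}")
```

A hit only shows that the string was in memory at some point; as noted above, it does not show which program put it there.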

PRIVACY MODE: TRACKING

- No cookies are taken from the “normal” into the “private” mode
  - You are a “new” user
  - BUT: If you log in somewhere during the browsing session, the (previously anonymous) new cookies can be attributed to you
- Privacy mode is a “single mode”, i.e. it applies to all tabs/windows together
  - Normal windows → share cookies
  - Private mode windows → share cookies
  - Close all private mode windows and open a new one → no cookies
- Result: Private mode is far less private than you think
  - But today it is relatively good against local forensics

ALTERNATIVE: NETWORK FORENSICS

- Copying the network traffic allows reconstructing the page
  - This requires live access to a router, intercept station etc. at the moment the user browses the web
    - Wiretapping → very difficult to do legally!
    - Only very limited usability!
  - Result: Trace of the individual packets
- Requires:
  - Reassembly of the TCP connection (difficult → tool needed!)
  - Splitting into the individual requests (HTTP 1.1 pipelining!)
  - Manual reassembly into a “viewable” local page
    - Inspection of the HTML code is quite simple
- Following pages: Wireshark example
  - Left out: IPv6 DNS query, redirect to the actual homepage, detailed analysis of the individual packets (not interesting!)

[Wireshark screenshot] WWW.JKU.AT: DNS QUERY

[Wireshark screenshot] WWW.JKU.AT: DNS RESPONSE

[Wireshark screenshot] WWW.JKU.AT: HTTP QUERY

DNT = Do Not Track (Mozilla version of „X-Do-Not-Track“)

[Wireshark screenshot] WWW.JKU.AT: HTTP RESPONSE (START)

[Wireshark screenshot] HTTP RESPONSE (TCP STREAM)

CONCLUSIONS

- What a user did in a browser can typically be reconstructed quite well
  - Especially Internet Explorer: Deleting the index.dat/ESE files is almost impossible
    - Dedicated "cleaner" programs are needed
  - Information may be stored multiple times
  - Main problem: Separating the “human & interesting” things from the huge number of other automatic and subordinate elements
- Reconstructing the content of web-based e-mail is difficult
  - That webmail was used, which address etc. can be discovered
  - But the content is almost never cached and is therefore unavailable
- A variety of programs exist to investigate these files
  - Few of them are free
  - File formats are often badly documented or not documented at all

THANK YOU FOR YOUR ATTENTION!

Michael Sonntag
https://www.ins.jku.at
+43 (732) 2468 - 4137
S3 235 (Science Park 3, 2nd floor)

JOHANNES KEPLER UNIVERSITÄT LINZ
Altenberger Straße 69
4040 Linz, Österreich
www.jku.at

Literature

- Anderson, Keith: Firefox history exporter, https://bugzilla.mozilla.org/show_bug.cgi?id=241438 (entry at 2006-03-17 09:10:47 PDT)
- Malmström/Teveldal: Forensic analysis of the ESE database in Internet Explorer 10, http://hh.diva-portal.org/smash/get/diva2:635743/FULLTEXT02.pdf
