Finding & Investigating Digital Footprints with Open Source Intelligence Workshop

Dr Stephen Hill

The Web Explained

Search Engines

▪ To be truly effective at online research and investigation, it is important to understand the unique and combined qualities of each search engine and to use them effectively in conjunction with each other…

Search Engines (Index)

▪ Search engines use "robots" (also called crawlers or spiders) that crawl the web looking for new web pages

▪ These robots read the web pages and put the text (or parts of the text) into a large database or index that you can then access…

▪ Google - https://www.google.co.uk

▪ Bing - http://www.bing.com

▪ Yahoo - https://uk.yahoo.com

▪ Yandex - https://www.yandex.com

Index Search Explained

▪ Page A and Page B have equivalent location and frequency of keywords; however

▪ Page A has 20 external webpages linking to it and Page B has 40

▪ Based on the implication that Page B is more popular, it would achieve a higher page ranking within Google and Bing’s search results than Page A

▪ This information is significant to investigators as many of the webpages sought may be “hidden” or purposely forced to be “unpopular” by the owner due to the nature or intention of the site…
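
The Page A / Page B comparison above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Page` class, the `keyword_score` field and the `rank` function are hypothetical names, and real search engines weigh many more signals than keyword placement and inbound links.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    keyword_score: float  # stands in for location and frequency of keywords
    inbound_links: int    # number of external webpages linking to the page


def rank(pages):
    """Order pages by keyword score first, then by inbound links (most popular first)."""
    return sorted(pages, key=lambda p: (p.keyword_score, p.inbound_links), reverse=True)


# Page A and Page B have equivalent keywords, but B has twice the inbound links.
pages = [Page("Page A", 1.0, 20), Page("Page B", 1.0, 40)]

most_popular_first = rank(pages)
least_popular_first = list(reversed(most_popular_first))

print([p.name for p in most_popular_first])
print([p.name for p in least_popular_first])
```

An investigator is often interested in the second ordering, which the mainstream engines do not offer directly.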

Point to Remember!

This presents a challenge when using Google and Bing as both of these search engines focus on presenting the most popular pages at the top of their search results

When using these search engines, it may be necessary to locate the least popular sites within millions of search results, which is time-consuming and relatively ineffective…

Google – Index Search

https://www.google.com.au

Google – Index Search

https://www.google.co.nz

Google – Index Search (Regional)

https://www.google.co.uk

‘Bubbling & Tracking’

▪ Operating system version

▪ Resolution of computer screen

▪ Average amount of search requests per day

▪ Average amount of search requests per topic (to finish search)

▪ Distribution of search services used (web / images / videos)

▪ Average position of search results clicked on

▪ Time of the day

▪ Current date

▪ Search history

▪ Topics of ads clicked on

▪ Location

▪ Frequency of clicking advertising

▪ Browser

▪ Frequency of searches of domains on Google

▪ Browser version

▪ Computer being used

▪ Language being used

▪ Time to type in a query

▪ Time spent on the search result page

▪ Time between selecting different results for the same query

▪ Frequency of clicking on AdSense advertising on other websites

http://www.rene-pickhardt.de/google-uses-57-signals-to-filter

Google – Time Filter

Google – Cache

http://webcache.googleusercontent.com/search?q=cache:efj0Wj8fzxUJ:dfk.com/+&cd=1&hl=en&ct=clnk&gl=au

Google – Similar

Google Image Search

Google Image Search – Face Filter

Google Reverse Image Search

BEYOND GOOGLE

Bing

https://www.bing.com

Google & Bing

http://advangle.com

Search Directories

▪ Search directories are hierarchical databases with references to web sites

▪ The web sites that are included are hand-picked by individuals and classified according to the rules of that particular search service

▪ Yahoo Directory - https://business.yahoo.com

▪ BOTW - http://botw.org

▪ DMOZ - http://www.dmoz.org

DMOZ

http://www.dmoz.org

StartPage

https://startpage.com

Carrot2

http://search.carrot2.org

Yippy - Cluster Search

Formerly known as ‘Clusty’

http://www.yippy.com

DuckDuckGo

http://duckduckgo.com

DuckDuckGo Bangs

https://duckduckgo.com/bang

Semantic Search

www.cluuz.com

Qwant

https://www.qwant.com

Exalead - Advanced

http://www.exalead.com/search

Where to Find Search Engines?

www.searchenginecolossus.com

Advanced Search Techniques

▪ Phrase searching: “fraud in New Zealand”

▪ Boolean search: AND* fraud, NOT* scam

▪ Google Alternative: “fraud”, -scam

▪ Boolean search: fraud OR scam OR swindle

▪ Parentheses: ( ) also known as nesting…

* Will not work with Google; use the Google alternative (quotation marks and the minus sign) instead
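
The operators above can be combined programmatically. The following is a minimal sketch, assuming the Google-style syntax from the list (quoted phrases, OR groups in parentheses, a leading minus for exclusion); `build_query` and `search_url` are hypothetical helper names, and the only web detail relied on is the widely used `q` parameter of the Google search page.

```python
from urllib.parse import urlencode


def build_query(phrase=None, any_of=(), exclude=()):
    """Assemble a query string from a phrase, an OR group and excluded terms."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')                       # phrase searching
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")     # nested OR group
    parts.extend(f"-{term}" for term in exclude)          # Google-style NOT
    return " ".join(parts)


def search_url(query, base="https://www.google.co.uk/search"):
    """Return a search URL carrying the query in the `q` parameter."""
    return base + "?" + urlencode({"q": query})


phrase_query = build_query(phrase="fraud in New Zealand", exclude=["scam"])
or_query = build_query(any_of=["fraud", "scam", "swindle"])

print(phrase_query)
print(or_query)
print(search_url(phrase_query))
```

Keeping the query construction separate from the URL makes it easy to reuse the same query text in other engines.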

Check the Spelling

▪ Remember that words can be spelt differently, or there may be a misspelt word or typo on the website you are looking for, which is why some search engines fail to find the word or phrase

▪ Consider spelling variants and typos

▪ Tyres & Tires, colour & color

▪ Stephen Hill, Steven Hill, Steve Hill

▪ Serach Engine, Fraud Invesdigation...
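
Spelling variants like these can be generated and combined into a single OR search. This is a rough sketch under simple assumptions: `transpositions` only covers swapped adjacent letters (the "Serach" kind of typo), and `or_query` is a hypothetical helper that quotes multi-word terms.

```python
def transpositions(word):
    """Return variants of `word` made by swapping each pair of adjacent letters."""
    variants = set()
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word:  # skip swaps of identical letters
            variants.add(swapped)
    return sorted(variants)


def or_query(terms):
    """Join terms with OR, quoting any term that contains a space."""
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"


names = ["Stephen Hill", "Steven Hill", "Steve Hill"]
name_query = or_query(names)
typo_query = or_query(["search"] + transpositions("search"))

print(name_query)
print(typo_query)
```

Other typo classes (dropped letters, neighbouring keys, substitutions such as "Invesdigation") would need their own generators.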

Wildcards *

In most search engines and directories, a search for investigat* will give you pages with the words including:

investigate, investigated, investigation, investigator

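Note: Google does not support truncation with * inside a word; instead it uses a process called stemming, which automatically matches variants of a word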
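
As an illustration of truncation, the standard-library `fnmatch` module applies the same kind of trailing-`*` pattern to a word list. The word list is made up for the example, and this shows plain pattern matching only, not how any particular search engine implements wildcards or stemming.

```python
from fnmatch import fnmatch

# Hypothetical vocabulary, standing in for the words found on indexed pages.
words = ["investigate", "investigated", "investigation",
         "investigator", "invest", "investor"]

# The * stands for any run of characters after the stem.
matches = [word for word in words if fnmatch(word, "investigat*")]

print(matches)
```

Words that share only part of the stem, such as "invest" and "investor", are not returned.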

Truncation & Wildcards *

Other ways to search using the *

"* * director of HTC Parking and Security Limited" = ?

"Ms Anna Koltsova phone *" = ?

"the * population of Auckland is" = ?

Parentheses

▪ Require the terms and operations that occur inside the brackets to be searched first

▪ This is called "nesting"

“identity theft” ((organized OR organised) -crime)

▪ Parentheses MUST BE USED to group terms joined by OR when there is any other Boolean operator in the search…
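
The effect of nesting can be demonstrated by evaluating the example query against a few sample texts. This is a sketch only: the sample documents are invented, and Python's own `and`/`or` precedence stands in for a search engine's, which is an assumption because engines differ in how they treat ungrouped operators.

```python
def grouped(text):
    """“identity theft” ((organized OR organised) -crime), with nesting."""
    t = text.lower()
    return ("identity theft" in t
            and ("organized" in t or "organised" in t)
            and "crime" not in t)


def ungrouped(text):
    """Same terms without parentheses around the OR: `and` binds before `or`."""
    t = text.lower()
    return "identity theft" in t and "organized" in t or "organised" in t


# Hypothetical documents for illustration.
docs = {
    "doc1": "Identity theft by organised gangs",
    "doc2": "An organised filing system",
    "doc3": "Identity theft and organized crime",
}

grouped_hits = [name for name, text in docs.items() if grouped(text)]
ungrouped_hits = [name for name, text in docs.items() if ungrouped(text)]

print(grouped_hits)
print(ungrouped_hits)
```

Without the parentheses, a page that merely contains "organised" is returned even though it never mentions identity theft.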

Keyword Searching

Finding Archived Web Pages

https://archive.org/web

Internet Archive

http://archive.org/web
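
Archived copies can also be looked up by constructing Wayback Machine addresses directly. The sketch below only builds the URLs and makes no network request; the helper names are hypothetical, and it assumes the commonly documented `web.archive.org/web/<timestamp>/<url>` address form and the `archive.org/wayback/available` lookup endpoint.

```python
from urllib.parse import urlencode


def wayback_snapshot_url(url, timestamp="20160101"):
    """Address of the archived copy closest to `timestamp` (YYYYMMDD)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def wayback_availability_url(url, timestamp=None):
    """Address of the lookup that reports whether an archived copy exists."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


# example.com is a placeholder domain.
print(wayback_snapshot_url("http://example.com"))
print(wayback_availability_url("example.com", "20160101"))
```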

News Links

http://www.onlinenewspapers.com/

http://www.world-newspapers.com/

http://www.listofnewspapers.com/

http://www.refdesk.com/paper.html

http://www.allyoucanread.com/

http://www.actualidad.com/

http://www.thepaperboy.com/newspapers-by-country.cfm

http://news.silobreaker.com/

Real Time News

http://www.newsola.com

Classifieds - A Criminal Hotspot?

People Search

https://pipl.com

Company Search

https://opencorporates.com

Company Search

https://www.gov.uk/government/publications/overseas-registries/overseas-registries

Paste Sites – What Could You Find?

▪ Paste sites are websites allowing users to upload text for public viewing

▪ Originally designed for developers who needed a place to store large amounts of text

▪ Links would be created to the text and the user could share the link with other programmers to review the code

▪ Many hacking groups use this area of the Internet to store compromised data

▪ Most popular site – ‘Pastebin’

Tools for Social Media Intelligence

Facebook

Facebook Search

LinkedIn

LinkedIn Search

https://www.linkedin.com/help/linkedin/answer/76015

Twitter

Twitter Search

Social Searcher

http://www.social-searcher.com

Reverse Image & EXIF Extraction

Reverse Image Search

http://www.tineye.com

Metadata (EXIF)

▪ Exchangeable Image File Format

▪ Standard that specifies the formats for images, sound, and ancillary tags used by digital cameras (including smartphones), scanners etc

▪ Applied to JPEG & TIFF images and can include:

▪ Original image date & time, modified date & time

▪ Camera details including ‘geolocation’ settings…
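
Geolocation in EXIF is stored as degrees, minutes and seconds plus a hemisphere reference (the `GPSLatitude`/`GPSLatitudeRef` and `GPSLongitude`/`GPSLongitudeRef` tags), so it usually has to be converted before it can be pasted into a map. The sketch below shows only that conversion; the function name and the sample coordinates are hypothetical, and reading the tags from a file would need an EXIF tool such as those listed in this workshop.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an EXIF-style degrees/minutes/seconds value to decimal degrees.

    `ref` is the hemisphere letter: "N" or "E" is positive, "S" or "W" is negative.
    """
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if ref.upper() in ("S", "W") else decimal


# Hypothetical values as they might appear in a photo's GPS tags.
latitude = dms_to_decimal(36, 50, 54.6, "S")
longitude = dms_to_decimal(174, 45, 47.9, "E")

print(f"{latitude:.4f}, {longitude:.4f}")
```

The resulting pair can be entered directly into most online mapping services.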

EXIF Sites to Consider

Jeffrey’s EXIF Viewer

▪ http://regex.info/exif.cgi

Others

▪ http://www.takenet.or.jp/~ryuuji/minisoft/exifread/english/

▪ http://www.impulseadventure.com/photo/jpeg-snoop.html

▪ http://www.sno.phy.queensu.ca/~phil/exiftool

Camera Trace

▪ http://cameratrace.com/trace

▪ http://www.stolencamerafinder.com

Video Metadata

▪ https://mediaarea.net/en/MediaInfo

Where Was This Taken?

https://petapixel.com/assets/uploads/2012/12/fugitivemcafee.jpg

Tracing Location of a Photo

http://petapixel.com/assets/uploads/2012/12/fugitivemcafee.jpg

WHOIS

http://whois.domaintools.com/planethollywoodlondon.com

Hiding Your Identity Online

Disguising your ID

▪ Every time you surf the Internet, your IP address is visible to the websites and other network resources you access

▪ It is therefore important not to leave a digital footprint...

Sock (Finger) Puppets

4 steps to create a sock puppet:

▪ Create fake ID – use name generator

▪ Create fake profiles/user accounts on Facebook etc.

▪ Fake/disguised email, phone and IP details

▪ Consider payment method – pre-paid credit card…

http://www.fakenamegenerator.com

Disguising Your Online ID

Proxy and VPN services re-route your internet traffic and change your IP

A Proxy is like a web filter

▪ A proxy only re-routes traffic from the internet browser (or other application) that has been configured with the proxy server settings

A VPN encrypts all of your traffic

▪ VPNs route all traffic, including that of all programs and applications, through an encrypted tunnel to the VPN server, so your ISP cannot see its contents...
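
The per-application nature of a proxy can be seen in code. In this sketch, using Python's standard `urllib.request`, only requests made through `opener` go via the proxy; everything else on the machine is unaffected, which is the contrast with a VPN. The proxy address is a hypothetical placeholder and `fetch_via_proxy` is an illustrative name.

```python
import urllib.request

# Hypothetical proxy address; replace with a proxy you are authorised to use.
PROXY_ADDRESS = "http://127.0.0.1:8080"

# Only traffic sent through this opener is re-routed via the proxy.
proxy_handler = urllib.request.ProxyHandler(
    {"http": PROXY_ADDRESS, "https": PROXY_ADDRESS}
)
opener = urllib.request.build_opener(proxy_handler)


def fetch_via_proxy(url, timeout=10):
    """Fetch `url` through the configured proxy and return the raw bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling `fetch_via_proxy` requires the proxy to be reachable, so it is defined here but not invoked.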

TOR

https://www.torproject.org

“Tor protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world: it prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location. Tor works with many of your existing applications, including web browsers, instant messaging clients, remote login, and other applications based on the TCP protocol.”
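
The relay idea in the quotation can be sketched structurally. This toy model is not Tor's implementation: base64 encoding stands in for the per-relay encryption and offers no secrecy, and the relay names, `wrap` and `peel` are hypothetical. It only shows that each relay learns its next hop and nothing more.

```python
import base64
import json


def wrap(message, route, destination):
    """Wrap `message` in one layer per relay, building the innermost layer first."""
    next_hops = route[1:] + [destination]
    payload = message
    for relay, next_hop in reversed(list(zip(route, next_hops))):
        layer = json.dumps({"for": relay, "next": next_hop, "payload": payload})
        payload = base64.b64encode(layer.encode()).decode()  # stand-in for encryption
    return payload


def peel(relay, packet):
    """Remove the layer addressed to `relay`; return the next hop and inner packet."""
    layer = json.loads(base64.b64decode(packet))
    if layer["for"] != relay:
        raise ValueError("layer is not addressed to this relay")
    return layer["next"], layer["payload"]


route = ["guard", "middle", "exit"]          # hypothetical relay names
packet = wrap("GET /", route, "example.org")

hops = []
for relay in route:
    next_hop, packet = peel(relay, packet)
    hops.append(next_hop)

print(hops)
print(packet)
```

Only the last relay sees the destination and the original message, and only the first relay sees who sent the packet.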

Who is using Tor?

▪ Normal people (e.g. protect their browsing records)

▪ Militaries (e.g. military field agents)

▪ Journalists and their audiences (e.g. citizen journalists encouraging social change)

▪ Law enforcement officers (e.g. for online “undercover” operations)

▪ Activists and Whistleblowers (e.g. avoid persecution while still raising a voice)

▪ Bloggers

▪ IT professionals (e.g. during development and operational testing, access internet resources while leaving security policies in place)

Tor Project

Some of the software and services under the Tor Project umbrella:

▪ Torbutton

▪ Tor Browser Bundle

▪ Vidalia

▪ Onionoo

▪ Metrics Portal

▪ Tor Cloud

▪ Shadow

Tails

https://tails.boum.org

TOR to Web

https://tor2web.org

VPN Options

https://www.privateinternetaccess.com

How Safe is your Browser?

https://panopticlick.eff.org

Public Vote on Secure Browser

Source: Sensors Tech Forum (http://sensorstechforum.com)

The users voted that the most secure browsers are:

▪ Google Chrome - 49% (296 votes)

▪ Mozilla Firefox - 31% (187 votes)

▪ Internet Explorer - 7% (43 votes)

▪ Safari and Opera - 4% (25 votes) each

▪ Microsoft Edge - 3% (19 votes)

▪ Maxthon - 1% (9 votes)…

http://sensorstechforum.com/which-is-the-most-secure-browser-for-2016-firefox-chrome-internet-explorer-safari-2

Final Considerations

Other questions should also be taken into consideration in addition to securing your browser:

▪ Do you update your browser whenever a new version is available?

▪ Have you configured your browser updates as automatic?

▪ Do you use third-party browser add-ons and plugins, and if yes, are you familiar with their developers?

▪ Do you install third party software from unknown download pages, without paying attention to the Download Agreement?
