Finding & Investigating Digital Footprints with Open Source Intelligence Workshop Dr Stephen Hill
The Web Explained
Search Engines
▪ To be truly effective at online research and investigation, it is important to understand the unique and combined qualities of each search engine and to use them effectively in conjunction with each other…
Search Engines (Index)
▪ Search engines are "engines" or "robots" that crawl the web looking for new web pages
▪ These robots read the web pages and put the text (or parts of the text) into a large database or index that you can then access…
▪ Google - https://www.google.co.uk
▪ Bing - http://www.bing.com
▪ Yahoo - https://uk.yahoo.com
▪ Yandex - https://www.yandex.com
Index Search Explained
▪ Page A and Page B have equivalent location and frequency of keywords; however
▪ Page A has 20 external webpages linking to it and Page B has 40
▪ Based on the implication that Page B is more popular, it would achieve a higher page ranking within Google and Bing’s search results than Page A
▪ This information is significant to investigators as many of the webpages sought may be “hidden” or purposely forced to be “unpopular” by the owner due to the nature or intention of the site…
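The Page A / Page B comparison can be sketched as a toy ranking. This is only an illustration of the idea that, with keyword relevance equal, more inbound links means a higher position; it is not how Google or Bing actually rank pages. The `Page` class, its fields and the `keyword_score` values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    keyword_score: float  # assumed stand-in for keyword location and frequency
    inbound_links: int    # number of external pages linking to this page


def rank(pages):
    """Order pages by keyword score, then by inbound links, highest first."""
    return sorted(pages, key=lambda p: (p.keyword_score, p.inbound_links), reverse=True)


pages = [Page("Page A", 1.0, 20), Page("Page B", 1.0, 40)]
ranked = rank(pages)

# An investigator looking for deliberately "unpopular" pages is effectively
# interested in the other end of this list.
least_popular_first = list(reversed(ranked))
```

With equal keyword scores, Page B is placed above Page A; reversing the list puts the least-linked page first.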
Point to Remember!
This presents a challenge when using Google and Bing, as both search engines focus on presenting the most popular pages at the top of their search results
When using these search engines, it may be necessary to locate the least popular sites within millions of search results, which is time-consuming and relatively ineffective…
Google – Index Search
https://www.google.com.au
Google – Index Search
https://www.google.co.nz
Google – Index Search (Regional)
https://www.google.co.uk
‘Bubbling & Tracking’
▪ Operating system
▪ Operating system version
▪ Browser
▪ Browser version
▪ Computer being used
▪ Resolution of computer screen
▪ Language being used
▪ Location
▪ Time of the day
▪ Current date
▪ Search history
▪ Average number of search requests per day
▪ Average number of search requests per topic (to finish search)
▪ Distribution of search services used (web / images / videos)
▪ Average position of search results clicked on
▪ Frequency of searches of domains on Google
▪ Time to type in a query
▪ Time spent on the search result page
▪ Time between selecting different results for the same query
▪ Topics of ads clicked on
▪ Frequency of clicking advertising
▪ Frequency of clicking on AdSense advertising on other websites
http://www.rene-pickhardt.de/google-uses-57-signals-to-filter
Google – Time Filter
Google – Cache
http://webcache.googleusercontent.com/search?q=cache:efj0Wj8fzxUJ:dfk.com/+ &cd=1&hl=en&ct=clnk&gl=au
Google – Similar
Google Image Search
Google Image Search – Face Filter
Google Reverse Image Search
BEYOND GOOGLE
Bing
https://www.bing.com
Google & Bing
http://advangle.com
Search Directories
▪ Search directories are hierarchical databases with references to web sites
▪ The web sites that are included are hand picked by individuals and classified according to the rules of that particular search service
▪ Yahoo Directory - https://business.yahoo.com
▪ BOTW - http://botw.org
▪ DMOZ - http://www.dmoz.org
DMOZ
http://www.dmoz.org
StartPage
https://startpage.com
Carrot2
http://search.carrot2.org
Yippy - Cluster Search
Formerly known as ‘Clusty’
http://www.yippy.com
DuckDuckGo
http://duckduckgo.com
DuckDuckGo Bangs
https://duckduckgo.com/bang
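A bang is a shortcut such as `!w` (Wikipedia) typed in front of the search terms, which sends the query straight to that site's own search. The sketch below only builds the corresponding DuckDuckGo search URL; the helper name `bang_url` is an assumption made for the example, and which bangs exist should be checked on the bang page above.

```python
from urllib.parse import urlencode


def bang_url(bang: str, terms: str) -> str:
    """Build a DuckDuckGo search URL whose query starts with a bang shortcut."""
    query = f"!{bang.lstrip('!')} {terms}"
    # urlencode percent-encodes "!" as %21 and turns spaces into "+"
    return "https://duckduckgo.com/?" + urlencode({"q": query})


url = bang_url("w", "fraud")
```

Opening the resulting URL in a browser should behave the same as typing `!w fraud` into the DuckDuckGo search box.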
Semantic Search
www.cluuz.com
Qwant
https://www.qwant.com
Exalead - Advanced
http://www.exalead.com/search
Where to Find Search Engines?
www.searchenginecolossus.com
Advanced Search Techniques
▪ Phrase searching: “fraud in New Zealand”
▪ Boolean search: AND* fraud, NOT* scam
▪ Google Alternative: “fraud”, -scam
▪ Boolean search: fraud OR scam OR swindle
▪ Parentheses: ( ) also known as nesting…
* Will not work with Google
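The operators above can be combined into a query string programmatically, which helps when the same search has to be repeated across several engines. This is a minimal sketch: the helper names (`phrase`, `any_of`, `exclude`, `build_query`) are hypothetical, and it uses the "-" form for exclusion rather than NOT. Individual engines may interpret operators differently.

```python
def phrase(text: str) -> str:
    """Wrap text in quotes so it is searched as an exact phrase."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Join alternatives with OR and nest them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def exclude(term: str) -> str:
    """Prefix a term with '-' to exclude it from the results."""
    return f"-{term}"


def build_query(*parts: str) -> str:
    """Join the query parts with spaces."""
    return " ".join(parts)


simple = build_query("fraud", exclude("scam"))
nested = build_query(phrase("fraud in New Zealand"), any_of("scam", "swindle"))
```

Here `simple` is `fraud -scam` and `nested` is `"fraud in New Zealand" (scam OR swindle)`.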
Check the Spelling
▪ Remember that words can be spelt differently, or the website you are looking for may contain a misspelt word or typo, which is why some search engines fail to find the word/phrase
▪ Consider spelling and typos
▪ Tyres & Tires, colour & color
▪ Stephen Hill, Steven Hill, Steve Hill
▪ Serach Engine, Fraud Invesdigation...
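Typos such as "Serach" are often two neighbouring letters swapped. As a rough sketch, candidate misspellings of this one kind can be generated and joined with OR so that a single query covers them. The function names are assumptions made for the example, and other error types (missing letters, wrong letters, as in "Invesdigation") are not covered.

```python
def transpositions(word: str) -> list[str]:
    """Return the variants of word produced by swapping two adjacent letters."""
    variants = []
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        # Skip swaps of identical letters and any repeated variants
        if swapped != word and swapped not in variants:
            variants.append(swapped)
    return variants


def or_query(terms: list[str]) -> str:
    """Join search terms with OR."""
    return " OR ".join(terms)


typo_query = or_query(["search"] + transpositions("search"))
```

For "search" this yields five variants, including "serach".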
Wildcards *
In most search engines and directories, a search for investigat* will give you pages with the words including:
investigate, investigated, investigation, investigator
Note: Google uses a process called stemming
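To show what a trailing wildcard matches, the sketch below applies the same idea to local text, for example saved page content, using a regular expression. It does not reproduce any search engine's behaviour (Google's stemming in particular works differently), and the function name and sample sentence are assumptions made for the example.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Convert a pattern such as 'investigat*' into a whole-word regular expression."""
    escaped = re.escape(pattern).replace(r"\*", r"\w*")  # '*' matches any run of word characters
    return re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)


text = "The investigator investigated the investigation; invest wisely."
matches = wildcard_to_regex("investigat*").findall(text)
```

The word "invest" is not matched because it does not begin with the full stem "investigat".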
Truncation & Wildcards *
Other ways to search using the *
"* * director of HTC Parking and Security Limited" = ?
"Ms Anna Koltsova phone *" = ?
"the * population of Auckland is" = ?
Parentheses
▪ Require the terms and operations that occur inside the brackets to be searched first
▪ This is called "nesting"
“identity theft” ((organized OR organised) -crime)
▪ Parentheses MUST BE USED to group terms joined by OR when there is any other Boolean operator in the search…
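The nested example above can be read as: the phrase "identity theft" must appear, and within the brackets either spelling of organised must appear without the word crime. The sketch below spells out that reading as a test applied to a piece of text. It is a simplified model for illustration only, since real search engines parse and weight queries in their own ways, and the function name is an assumption.

```python
import re


def matches_nested_query(text: str) -> bool:
    """Apply "identity theft" ((organized OR organised) -crime) to a piece of text."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))

    has_phrase = "identity theft" in lowered
    # Innermost brackets are evaluated first: organized OR organised
    has_either_spelling = "organized" in words or "organised" in words
    # Outer brackets: the OR result combined with the exclusion of "crime"
    inner_result = has_either_spelling and "crime" not in words

    return has_phrase and inner_result


hit = matches_nested_query("Identity theft by an organised gang")
excluded = matches_nested_query("Identity theft and organized crime")
```

Without the inner brackets, the OR could be combined with the wrong neighbouring term, which is why the grouping matters.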
Keyword Searching
Finding Archived Web Pages
https://archive.org/web
Internet Archive
http://archive.org/web
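Archived copies can also be looked up programmatically. The sketch below builds a request to the Internet Archive's Wayback availability endpoint and a direct snapshot link. The helper names and the example domain and timestamp are assumptions, and `closest_snapshot` needs network access, so the JSON layout it expects should be checked against the live response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def availability_url(page_url: str, timestamp: str = "") -> str:
    """Build the availability query; timestamp is optional, in YYYYMMDD form."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Build a link to the capture closest to the given timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def closest_snapshot(page_url: str, timestamp: str = ""):
    """Ask the Wayback Machine for the closest capture (requires network access)."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as response:
        data = json.load(response)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None


query = availability_url("example.com", "20160101")
link = snapshot_url("example.com", "20160101")
```

Calling `closest_snapshot("example.com", "20160101")` returns the URL of the nearest capture, or `None` if nothing is reported.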
News Links
▪ http://www.onlinenewspapers.com/
▪ http://www.world-newspapers.com/
▪ http://www.listofnewspapers.com/
▪ http://www.refdesk.com/paper.html
▪ http://www.allyoucanread.com/
▪ http://www.actualidad.com/
▪ http://www.thepaperboy.com/newspapers-by-country.cfm
▪ http://news.silobreaker.com/
Real Time News
http://www.newsola.com
Classifieds - A Criminal Hotspot?
People Search
https://pipl.com
Company Search
https://opencorporates.com
Company Search
https://www.gov.uk/government/publications/overseas-registries/overseas-registries
Paste Sites – What Could You Find?
▪ Paste sites are websites allowing users to upload text for public viewing.
▪ Originally designed for software developers who needed a place to store large amounts of text
▪ Links would be created to the text and the user could share the link with other programmers to review the code.
▪ Many hacking groups use this area of the Internet to store compromised data.
▪ Most popular site – ‘Pastebin’
Tools for Social Media Intelligence
Facebook
Facebook Search
LinkedIn Search
https://www.linkedin.com/help/linkedin/answer/76015
Twitter Search
Social Searcher
http://www.social-searcher.com
Reverse Image & EXIF Extraction
Reverse Image Search
http://www.tineye.com
Metadata (EXIF)
▪ Exchangeable Image File Format
▪ Standard that specifies the formats for images, sound, and ancillary tags used by digital cameras (including smartphones), scanners etc
▪ Applied to JPEG & TIFF images and can include:
▪ Original image date & time, modified date & time
▪ Camera details including ‘geolocation’ settings…
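EXIF geolocation is stored as degrees, minutes and seconds plus a hemisphere reference (N/S/E/W), so it usually has to be converted to decimal degrees before it can be pasted into a mapping service. A minimal sketch of that conversion follows; the function name and the sample coordinates are assumptions, and reading the tags from an image file would need an EXIF tool such as those listed in this section.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert an EXIF-style degrees/minutes/seconds value to decimal degrees.

    ref is the hemisphere letter: 'N' or 'E' gives a positive value,
    'S' or 'W' gives a negative value.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref.upper() in ("S", "W") else value


# Sample values chosen for illustration only
latitude = dms_to_decimal(36, 50, 54, "S")
longitude = dms_to_decimal(174, 45, 48, "E")
```

The resulting pair can be entered as "latitude, longitude" in most online maps.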
EXIF Sites to Consider
Jeffrey’s EXIF Viewer
▪ http://regex.info/exif.cgi
Others
▪ http://www.takenet.or.jp/~ryuuji/minisoft/exifread/english/
▪ http://www.impulseadventure.com/photo/jpeg-snoop.html
▪ http://www.sno.phy.queensu.ca/~phil/exiftool
Camera Trace
▪ http://cameratrace.com/trace
▪ http://www.stolencamerafinder.com
Video Metadata
▪ https://mediaarea.net/en/MediaInfo
Where Was This Taken?
https://petapixel.com/assets/uploads/2012/12/fugitivemcafee.jpg
Tracing Location of a Photo
http://petapixel.com/assets/uploads/2012/12/fugitivemcafee.jpg
WHOIS
http://whois.domaintools.com/planethollywoodlondon.com
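Web front ends such as the one above sit on top of the WHOIS protocol, which is a plain-text query sent over TCP port 43. The sketch below shows the shape of such a query. The function names are assumptions, `whois.iana.org` is used only as a starting server that typically refers you on to the registry responsible for the domain, and `whois_query` needs network access. Note that a direct query is made from your own IP address.

```python
import socket


def build_whois_request(domain: str) -> bytes:
    """A WHOIS request is the query text followed by CRLF."""
    return domain.strip().lower().encode("ascii") + b"\r\n"


def whois_query(domain: str, server: str = "whois.iana.org", port: int = 43,
                timeout: float = 10.0) -> str:
    """Send a WHOIS query and return the raw text reply (requires network access)."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_whois_request(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it has finished
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


request = build_whois_request(" PlanetHollywoodLondon.com ")
```

Calling `whois_query("planethollywoodlondon.com")` returns the server's reply as text, which can then be searched for registrar and contact fields.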
Hiding Your Identity Online
Disguising your ID
▪ Every time you browse the Internet, your IP address is visible to the websites and network resources you connect to
▪ It is therefore important not to leave a digital footprint...
Sock (Finger) Puppets
4 steps to create a sock puppet:
▪ Create fake ID – use name generator
▪ Create fake profiles/user accounts on Facebook etc.
▪ Fake/disguised email, phone and IP details
▪ Consider payment method – pre-paid credit card…
http://www.fakenamegenerator.com
Disguising Your Online ID
Proxy and VPN services re-route your internet traffic and change the IP address that target sites see
A Proxy is like a web filter
▪ A proxy only covers traffic from the internet browser (or other application) configured with the proxy server settings
A VPN encrypts all of your traffic
▪ A VPN routes all traffic, including all programs and applications, through the VPN server, so your ISP sees only the encrypted connection to the VPN...
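The per-application nature of a proxy can be seen in code: only requests made through the object configured with the proxy settings are re-routed, while everything else on the machine connects directly. A minimal sketch using Python's standard library follows. The proxy address is a hypothetical placeholder, and the function name is an assumption.

```python
import urllib.request

# Hypothetical placeholder; replace with the address of a proxy you are entitled to use
PROXY_URL = "http://127.0.0.1:8080"


def make_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends its HTTP and HTTPS requests via the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxy_opener(PROXY_URL)
# Only requests made with opener.open(...) use the proxy; other programs,
# and other code that does not use this opener, are unaffected.
```

A VPN, by contrast, is configured at the operating system level, so no per-application setting like this is needed.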
TOR
https://www.torproject.org
“Tor protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world: It prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location. Tor works with many of your existing applications, including web browsers, instant messaging clients, remote login, and other applications based on the TCP protocol”.
Who is using Tor?
▪ Normal people (e.g. protect their browsing records)
▪ Militaries (e.g. military field agents)
▪ Journalists and their audiences (e.g. citizen journalists encouraging social change)
▪ Law enforcement officers (e.g. for online “undercover” operations)
▪ Activists and Whistleblowers (e.g. avoid persecution while still raising a voice)
▪ Bloggers
▪ IT professionals (e.g. during development and operational testing, access internet resources while leaving security policies in place)
Tor Project
Some of the software and services under the Tor project umbrella:
▪ Torbutton
▪ Tor Browser Bundle
▪ Vidalia
▪ Orbot
▪ Tails
▪ Onionoo
▪ Metrics Portal
▪ Tor Cloud
▪ Shadow
▪ Tor2web…
Tails
https://tails.boum.org
TOR to Web
https://tor2web.org
VPN Options
https://www.privateinternetaccess.com
How Safe is your Browser?
https://panopticlick.eff.org
Public Vote on Secure Browser
Source: Sensors Tech Forum (http://sensorstechforum.com)
The users voted that the most secure browsers are:
▪ Google Chrome - 49% or 296 votes
▪ Mozilla Firefox - 31% or 187 votes
▪ Internet Explorer - 7% or 43 votes
▪ Safari and Opera both got 4% or 25 votes
▪ Microsoft Edge - 3% or 19 votes
▪ Maxthon - 1% or 9 votes…
http://sensorstechforum.com/which-is-the-most-secure-browser-for-2016-firefox-chrome-internet-explorer-safari-2
Final Considerations
Other questions should also be taken into consideration in addition to securing your web browser:
▪ Do you update your browser whenever a new version is available?
▪ Have you configured your browser updates as automatic?
▪ Do you use third-party browser add-ons and plugins, and if yes, are you familiar with their developers?
▪ Do you install third party software from unknown download pages, without paying attention to the Download Agreement?