DEEP WEB IOTA Report for Tax Administrations

Intra-European Organisation of Tax Administrations (IOTA)
Budapest 2012

PREFACE

This report on deep Web investigation is the second report from the IOTA “E-Commerce” Task Team of the “Prevention and Detection of VAT Fraud” Area Group. The team started operations in January 2011 in Wroclaw, Poland, initially focusing its activities on problems associated with the audit of cloud computing; the report on that subject was published earlier in 2012. During the Task Team's second phase of work the focus has been on deep Web investigation.

What can a tax administration possibly gain from the deep Web? For many, the deep Web is something of a mystery, something for computer specialists, something they have heard about but do not understand. However, the depth of the Web should not be seen as a threat: the deep Web is an important source of valuable information for tax administrations. To understand, master and fully exploit the deep Web, you need to see the opportunities that arise from using information buried deep within the Web, how to work within that environment and what other tax administrations have already achieved.

This report is all about understanding, mastering and using the deep Web as the key to virtually all audits, not just those involving e-commerce. It shows what a tax administration can achieve by understanding the deep Web and how to use it to its advantage in every audit.

Special thanks go to all members of the Prevention and Detection of VAT Fraud Area Group e-commerce Task Team, Irina Andrejeva (Latvia), Massimo Chiappetta (Italy), Dag Hardyson (Sweden), Håvard Moldjord (Norway), Krzysztof Rozanski (Poland) and Jean-Luc Wichoud (Switzerland), who worked hard to prepare and bring this report to publication.

Figure 1. The Web (source: http://thehiddenwiki.info/)

TABLE OF CONTENTS

1. Introduction
   1.1 The deep Web may be classified into one or more of the following
   1.1. Tips for dealing with deep Web content
   1.2. What can you find within the deep Web?
2. Deep Web investigation for tax administration
   2.1. Examples
      1.1.1 Emission Trading System
      1.1.2 Car trading
      1.1.3 Used car parts
      1.1.4 Gambling area
      1.1.5 Translators Proz.com
      1.1.6 Wikileaks
      1.1.7 Auction sites - Allegro
      1.1.8 EC-Eyes
      1.1.9 Copernic Agent Professional: Wall Street Italia
      1.1.10 Italian example: Koobface
3. Problems for tax administrations
   3.1. Need for change in the audit culture
   3.2. Volumes of information
   3.3. Qualified personnel
   3.4. Proper equipment and Internet access
   3.5. Working on a market without borders
4. Recommendations
   4.1. Stand alone equipment & Internet access
   4.2. Working with external information
   4.3. Monitoring
   4.4. Dedicated Specialist Teams
   4.5. Right tools for the Right Job:
   4.6. Training of auditors and investigators
   4.7. Links between operational and technical centre/e-commerce specialists
   4.8. Cooperation between Tax Agencies

PREFACE

The Task Team on E-commerce received its mandate from the Area Group Prevention and Detection of VAT fraud. It was decided to focus on new trends in e-commerce, and in particular on cloud computing and deep Web investigation, because e-commerce issues in general have already been addressed by other international organisations (OECD, European Commission etc.).

Cloud computing was chosen because it is a new technology that will probably become essential in two or three years, and from a fiscal point of view it involves new risks and new ways of thinking for tax investigators.

Deep Web investigation was chosen because a great deal of information is available on the Web, but more and more often it is hidden within Web services. New tools have been developed that offer opportunities to access public information hidden under the surface of the Web.

It was agreed that the task team would divide its task and provide two reports, one on each selected subject.
The following is the second report, which concentrates on the subject of deep Web investigation and its impact on the work of tax administrations.

Aims

For the report on deep Web investigations the following objectives were set. To:

- Prepare a report on deep Web investigations focusing on:
   o Defining what the deep Web is
   o Presenting the opportunities available to tax administrations provided by information obtainable from the deep Web
   o Giving examples of the successful use of deep Web investigation in tax administrations
   o Making recommendations for tax administrations on best practice
- Provide a network of experts
- Provide a toolbox of software used in IOTA tax administrations to investigate on the deep Web

Methodology

Many remarks have been made about the way in which task teams collect information for their work. In particular, answering questionnaires produced by task teams can be complicated and burdensome for tax administration employees, because of the number and length of the questionnaires and the lack of resources available to answer all questions effectively. In addition, because country answers are published, many tax administrations require the answers to be validated by management or by a strategic team. This complicates and slows down the process.

Given the subject matter, the E-commerce task team decided instead to contact the network of IOTA specialists known to its members, which they had built up through participation in different workshops and during the work on the report on cloud computing.

The members of the task team are experts in e-commerce and have a good understanding of deep Web investigations. They believe that their collective knowledge is sufficient to fulfil the mandate of the Area Group on the subject of deep Web investigations and to provide an interesting and useful report to the members of IOTA.
1. INTRODUCTION

When you use a search engine like Google or Bing, the information you get back is sometimes referred to as the "Surface Web" or the "Visible Web." However, there is a lot more information out there. There are millions of web pages that Google and Bing cannot find. That is the deep Web (also called the Deepnet, the invisible Web, DarkNet, Undernet, or the hidden Web). The deep Web refers to World Wide Web [1] content that is not accessible through a search on general search engines.

Mike Bergman, founder of BrightPlanet, has said that searching on the Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed. Most of the Web's information is buried far down on dynamically generated sites, and standard search engines do not find it. Traditional search engines cannot "see" or retrieve content in the deep Web because those pages do not exist until they are created dynamically as the result of a specific search. The deep Web is several orders of magnitude larger than the surface Web. [2]

Figure 2. Layers of the Web (source: http://informationfreed.blogspot.ch/2011/11/searching-deep-dark-and-dead-web.html)

[1] The World Wide Web (abbreviated as WWW or W3, commonly known as the Web or the "Information Superhighway") is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks. Source Wikipedia: World Wide Web, http://en.wikipedia.org/wiki/World_Wide_Web
[2] Source Wikipedia:
