DEEP WEB

IOTA Report for Tax Administrations

Intra-European Organisation of Tax Administrations (IOTA)

Budapest 2012

1 IOTA Report for Tax Administrations – Deep Web

PREFACE

This report on deep Web investigation is the second report from the IOTA "E-Commerce" Task Team of the "Prevention and Detection of VAT Fraud" Area Group. The team started operations in January 2011 in Wroclaw, Poland, initially focusing its activities on problems associated with the audit of cloud computing; the report on that subject was published earlier in 2012. During the Task Team's second phase of work the focus has been on deep Web investigation.

What can a tax administration possibly gain from the deep Web? For many, the deep Web is something of a mystery: something for computer specialists, something they have heard about but do not understand. However, the depth of the Web should not represent a threat, as the deep Web offers a very important source of valuable information to tax administrations. To understand, master and fully exploit the deep Web, you need to see the opportunities that arise from using information buried deep within the Web, how to work within that environment, and what other tax administrations have already achieved.

This report is all about understanding, mastering and using the deep Web as a key to virtually all audits, not just those involving e-commerce. It shows what a tax administration can achieve by understanding the deep Web and how to use it to its advantage in every audit.

Special thanks go to all members of the Prevention and Detection of VAT Fraud Area Group E-commerce Task Team, Irina Andrejeva (Latvia), Massimo Chiappetta (Italy), Dag Hardyson (Sweden), Håvard Moldjord (Norway), Krzysztof Rozanski (Poland) and Jean-Luc Wichoud (Switzerland), who worked hard to prepare and bring this report to publication.

Figure 1. The Web source: http://thehiddenwiki.info/


TABLE OF CONTENTS

1. Introduction ...... 6
   1.1 The deep Web may be classified into one or more of the following ...... 7
   1.2 Tips for dealing with deep Web content ...... 8
   1.3 What can you find within the deep Web? ...... 8
2. Deep Web investigation for tax administration ...... 9
   2.1 Examples ...... 10
      2.1.1 Emission Trading System ...... 11
      2.1.2 Car trading ...... 13
      2.1.3 Used car parts ...... 14
      2.1.4 Gambling area ...... 15
      2.1.5 Translators Proz.com ...... 16
      2.1.6 Wikileaks ...... 17
      2.1.7 Auction sites - Allegro ...... 18
      2.1.8 EC-Eyes ...... 19
      2.1.9 Copernic Agent Professional: Wall Street Italia ...... 21
      2.1.10 Italian example: Koobface ...... 23
3. Problems for tax administrations ...... 25
   3.1 Need for change in the audit culture ...... 26
   3.2 Volumes of information ...... 26
   3.3 Qualified personnel ...... 27
   3.4 Proper equipment and Internet access ...... 27
   3.5 Working on a market without borders ...... 27
4. Recommendations ...... 28
   4.1 Stand-alone equipment & Internet access ...... 28
   4.2 Working with external information ...... 28
   4.3 Monitoring ...... 28
   4.4 Dedicated specialist teams ...... 29
   4.5 Right tools for the right job ...... 29
   4.6 Training of auditors and investigators ...... 29
   4.7 Links between operational and technical centre/e-commerce specialists ...... 29
   4.8 Cooperation between tax agencies ...... 30


BACKGROUND

The Task Team on E-commerce received its mandate from the Area Group "Prevention and Detection of VAT Fraud". It was decided to focus on new trends in e-commerce, in particular on cloud computing and deep Web investigation, because general e-commerce issues have already been addressed by other international organisations (OECD, European Commission, etc.).

• Cloud computing was chosen because it is a new technology that will probably become essential within two or three years and, from a fiscal point of view, it involves new risks and new ways of thinking for tax investigators.

• Deep Web investigation was chosen because a lot of information is available on the Web, but more and more often it is hidden within Web services. New tools have been developed offering opportunities to access public information hidden under the surface of the Web.

It was agreed that the Task Team would divide its work and provide a report on each selected subject. The following is the second report, which concentrates on deep Web investigation and its impact on the work of tax administrations.

Aims

For the report on deep Web investigations the following objectives were set:

• Prepare a report on deep Web investigations focusing on:
  o Defining what the deep Web is
  o Presenting the opportunities available to tax administrations provided by information obtainable from the deep Web
  o Giving examples of the successful use of deep Web investigation in tax administrations
  o Making recommendations for tax administrations on best practice
• Provide a network of experts
• Provide a toolbox of software used in IOTA tax administrations to investigate the deep Web

Methodology

Many remarks have been made about the way in which task teams collect information for their work, in particular because answering task-team questionnaires can be complicated and burdensome for tax administration employees, given the number and length of such questionnaires and the lack of resources available to answer all questions effectively. In addition, because country answers are published, many tax administrations require the answers to be validated by the administration's management or by a strategic team, which complicates and slows down the process. Given the subject matter, the E-commerce Task Team therefore decided to contact the network of IOTA specialists known to members of the team, built up through participation in different workshops and during the work on the cloud computing report.


The members of the task team are experts in e-commerce and have a good understanding of deep Web investigations. They believe that their collective knowledge is sufficient to fulfil the mandate of the Area Group on the subject of deep Web investigations and to provide an interesting and useful report to the members of IOTA.


1. INTRODUCTION

When you use a search engine like Google or Bing, the information you get back is sometimes referred to as the "Surface Web" or the "Visible Web". However, there is a lot more information out there: there are millions of web pages that Google and Bing cannot find. That is the deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet, or the hidden Web). The deep Web refers to World Wide Web1 content that is not accessible through a search on general search engines.

Mike Bergman, founder of BrightPlanet, has said that searching on the Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed. Most of the Web's information is buried far down on dynamically generated sites, and standard search engines do not find it. Traditional search engines cannot "see" or retrieve content in the deep Web; those pages do not exist until they are created dynamically as the result of a specific search. The deep Web is several orders of magnitude larger than the surface Web.2

Figure 2. Layers of the Web source: http://informationfreed.blogspot.ch/2011/11/searching-deep-dark-and- dead-web.html

1 The World Wide Web (abbreviated as WWW or W3, commonly known as the Web or the "Information Superhighway") is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks. Source Wikipedia: World Wide Web, http://en.wikipedia.org/wiki/World_Wide_Web
2 Source Wikipedia: Deep Web, http://en.wikipedia.org/wiki/Deep_Web


1.1 The deep Web may be classified into one or more of the following categories:

To discover content on the Web, search engines use web crawlers that follow hyperlinks. This technique is ideal for discovering resources on the surface Web but is often ineffective at finding deep Web resources. For example, these crawlers do not attempt to find dynamic pages that are the result of database queries, due to the infinite number of queries that are possible.

• Dynamic content: dynamic web pages3 which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
• Unlinked content: pages which are not linked to by other pages, which may prevent Web crawling4 programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).
• Private Web: sites that require registration and login (password-protected resources).
• Contextual Web: pages with varying content for different access contexts (e.g., ranges of client IP addresses or previous navigation sequences).
• Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma HTTP headers) which prohibits search engines from browsing them and creating cached5 copies.
• Scripted content: pages that are only accessible through links produced by JavaScript6 as well as content dynamically downloaded from Web servers via Flash7 or Ajax8 solutions.
• Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats9 not handled by search engines.

3 Dynamic web pages are web sites that are generated at the time of access by a user or change as a result of interaction with the user. Source Wikipedia: Dynamic web page, http://en.wikipedia.org/wiki/Dynamic_Web_page
4 A web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. Source Wikipedia: Web crawler, http://en.wikipedia.org/wiki/Web_crawling
5 A cache, in computer science, is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere. If requested data is contained in the cache (cache hit), this request can be served by simply reading the cache, which is comparatively faster. Otherwise (cache miss), the data has to be recomputed or fetched from its original storage location, which is comparatively slower. Hence, the greater the number of requests that can be served from the cache, the faster the overall system performance becomes. Source Wikipedia: Cache (computing), http://en.wikipedia.org/wiki/Cache_(computing)
6 JavaScript (sometimes abbreviated JS) is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles. Source Wikipedia: JavaScript, http://en.wikipedia.org/wiki/JavaScript


• Text content using the Gopher protocol10 and files hosted on FTP11 that are not indexed by most search engines. Engines such as Google do not index pages outside of the HTTP protocol.12
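The "limited access content" category above can be illustrated with the Robots Exclusion Standard it mentions. Python's standard library ships a parser for robots.txt rules, which well-behaved crawlers honour; the following minimal sketch uses hypothetical rules and URLs, not those of any real site.

```python
# Sketch: why "limited access content" stays out of search indexes.
# The robots.txt rules below are hypothetical.
from urllib import robotparser

rules = """
User-agent: *
Disallow: /database/
Allow: /public/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler will skip the query-driven database pages...
print(parser.can_fetch("*", "https://example.org/database/search?q=cars"))  # False
# ...but may still index the static, linked pages.
print(parser.can_fetch("*", "https://example.org/public/about.html"))       # True
```

Content excluded in this way is perfectly accessible to a human visitor (or a tax auditor) with a browser; it is only automated indexing that the site declines.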

1.2 Tips for dealing with deep Web content

• Vertical search can solve some of the problems with the deep Web. With vertical search, you can query a collection of data focusing on a specific topic, industry, type of content, geographical location, language, file type, web site, piece of data, etc. For example, on the social Web, there are search engines for blogs, RSS feeds, Twitter content, and so on.
• Use a general search engine to locate a vertical search engine. For example, a Google search on "stock market search" will retrieve sites that allow you to search for current stock prices, market news, etc. This may be thought of as split-level searching: for the first level, search for the database site; for the second level, go to the site and search the database itself for the information you want.
• Try to figure out which kind of information might be stored in a database. There is no general rule, but think about large listings of things with a common theme.13

1.3 What can you find within the deep Web?

Directories are part of the deep Web. These can include things like:

• items for sale in a Web store or on Web-based auctions
• phone books
• "people finders", such as lists of professionals like doctors or lawyers
• patents
• laws
• dictionary definitions
• digital exhibits

7 Adobe Flash (formerly Macromedia Flash) is a multimedia platform used to add animation, video, and interactivity to web pages. Flash is frequently used for advertisements, games and flash animations for broadcast. Source Wikipedia: Adobe Flash, http://en.wikipedia.org/wiki/Macromedia_Flash
8 Ajax (also AJAX; an acronym for Asynchronous JavaScript and XML) is a group of interrelated web development techniques used on the client side to create asynchronous web applications. With Ajax, web applications can send data to, and retrieve data from, a server asynchronously (in the background) without interfering with the display and behavior of the existing page. Source Wikipedia: Ajax (programming), http://en.wikipedia.org/wiki/Ajax_%28programming%29
9 A file format is a particular way that information is encoded for storage in a computer file, as files need a way to be represented as bits when stored on a disc drive or other digital storage medium. File formats can be divided into two types: proprietary and open formats. Source Wikipedia: File format, http://en.wikipedia.org/wiki/File_formats
10 The Gopher protocol is a TCP/IP application layer protocol designed for distributing, searching, and retrieving documents over the Internet. Strongly oriented towards a menu-document design, the Gopher protocol presented an attractive alternative to the World Wide Web in its early stages, but ultimately failed to achieve popularity. Source Wikipedia: Gopher (protocol), http://en.wikipedia.org/wiki/Gopher_protocol
11 File Transfer Protocol (FTP) is a standard network protocol used to transfer files from one host to another host over a TCP-based network, such as the Internet. It is often used to upload web pages and other documents from a private development machine to a public web-hosting server. Source Wikipedia: File Transfer Protocol, http://en.wikipedia.org/wiki/File_Transfer_Protocol
12 Source Wikipedia: Deep Web, http://en.wikipedia.org/wiki/Deep_Web
13 Source Internet Tutorials: The Deep Web, http://www.internettutorials.net/deepweb.asp


• multimedia and graphical files
• forums

Information that is new and constantly changing in content will appear on the deep Web. Look to the deep Web for late breaking items, such as:

• news
• job postings
• available airline flights, hotel rooms, etc.
• stock and bond prices, market averages, etc.14

Figure 3. Types of information available on the deep Web source: http://brightplanet.com/wp-content/uploads/2012/03/12550176481- deepwebwhitepaper1.pdf

2. DEEP WEB INVESTIGATION FOR TAX ADMINISTRATION

The deep Web is an incredible source of information, not only for customers and businesses, but also for tax administrations. Without knowing it, each tax auditor has already used a deep Web resource to prepare an audit, for example in

14 Source Internet Tutorials: The Deep Web, http://www.internettutorials.net/deepweb.asp

searching for a phone number online or in checking taxpayers' information on an online trade register. The deep Web is not only about e-commerce; it is much larger, it is about everything. Every business has information on the Web, and the biggest part of it is in the deep Web.

Considerable information can be found about "normal"15 transactions that generate traces on the deep Web. A typical example is car trading where, in many countries, the largest number of car sales are published on specialised Web sites whose content is located in the deep Web, but the actual transactions are done face to face. This is not an e-commerce business, but traces and information about the transactions are available online on the deep Web.

Everything is available on the Web. You do not need to go to the city centre for shopping; you can do it online. Even if you want to buy something in the normal face-to-face way, you can very often go to a Web resource first to find information and compare prices. The largest volume of information held on the Web is saved in databases in the deep Web. This is why it is crucial to explore this area, not just because it is a new trend, but because it is an incredible source of freely available information.

Why should tax administrations be interested in the subject? There is a lot to gain from the deep Web:

• It is a new source with information available on virtually all subjects
• No control project should start without a search of the Web (contact with the technical centre)
• It reduces the information gap between the tax administration and the private economy, because so much information is available on the deep Web
• It offers the opportunity of examining the electronic environment (Internet, social networks, media)
• It helps to identify networks
• It covers "real" businesses as well as e-commerce

2.1. Examples

Instead of presenting a long list of what is possible on the deep Web and how information collected there could be used by tax administrations, the Task Team decided to offer practical examples of best practice. From their perspective this has a better impact, because it shows what has already been achieved by IOTA member tax administrations and what could be adapted and extended into other countries. In this section examples are provided for various trade sectors and for different taxes. The team has tried to show a range of examples: some are technically more complicated than others, some are VAT-fraud oriented while others have nothing to do with VAT, and some focus on finding information on individuals or businesses while others concentrate on transactions. However, each example provides ideas on ways to acquire information from the deep Web to aid tax compliance.

15 Normal transactions are defined here as those which are unrelated to e-commerce transactions


2.1.1 Emission Trading System16

At the end of 2008 and the beginning of 2009 a number of European countries faced a new type of fraud involving greenhouse gas emission allowances (also known as carbon credits). This new type of fraud spread across Europe and caused huge losses to many countries' budgets. According to EU legislation, which is common in this particular field for all EU member states, all participants in the carbon credit trading market must be registered with the EU Emissions Trading System (EU ETS), where all transactions made with carbon credits are recorded. The ETS operates in 30 countries (the 27 EU Member States, Iceland, Liechtenstein and Norway). Therefore any person or company wishing to be active in the carbon credits trading market needs to open an account in the EU ETS, providing contact details of the account holder, plus a list and contact details of the representative persons. This information is public and available on the Website of the Community Independent Transaction Log (CITL).

Figure 4. Community Independent Transaction Log source: http://ec.europa.eu/environment/ets/account.do;jsessionid=3c1TTXJHxGT3yLf4Qy5xXMqBPJ8spMvBqCWWTHfQR3nzhbt5m3Wy!-2056334981

As the carbon credits trading market within the EU is an open market, any company from any EU member state can be registered in the national Emissions Trading Registry of another member state for carbon credits trading purposes. As the

16 Example provided by Irina Andrejeva from Latvia

authorities responsible for supervising the Emissions Trading Registries are not associated with the tax administrations, there is a risk that companies from one member state can perform activities in other member states without reporting the results of such activities to the tax administration of their resident country. Therefore, having access to information on persons registered in the Emissions Trading Registries of all EU member states is very important for a tax administration in identifying taxpayers under its jurisdiction.

The CITL Website provides opportunities to carry out simple searches on-line. However, the on-line database does not provide facilities for performing more complex searches. It is possible to download the database in .xml format for further processing. For example, MS Excel allows data stored in .xml format to be imported into a spreadsheet, from which searches, sorts and other MS Excel functions can be applied to the data. Depending on the IT tools available to the tax administration, the data can also be combined and even cross-referenced with its own data.

In the Emissions Trading Registry the following information on registered persons is available:

• account ID and name, which usually corresponds with the name of the company
• address (incl. street, city, country, postal code), phone number and email address
• names of the representatives
• contact information of the representatives (address, city, country, phone number, email address)
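The .xml processing described above can also be scripted rather than done in Excel. The following minimal Python sketch filters a downloaded registry export for accounts linked to one country; the element names and sample records are illustrative assumptions, not the actual CITL schema, which would need to be checked against a real download.

```python
# Sketch: filtering a downloaded registry export without Excel.
# The element and attribute names below are illustrative assumptions;
# the real CITL .xml layout must be checked against an actual download.
import xml.etree.ElementTree as ET

sample = """
<accounts>
  <account id="1001">
    <holder>Alpha Trading SIA</holder>
    <country>LV</country>
    <phone>+371 6700 0000</phone>
  </account>
  <account id="1002">
    <holder>Beta GmbH</holder>
    <country>DE</country>
    <phone>+49 30 000000</phone>
  </account>
</accounts>
"""

root = ET.fromstring(sample)

# Select every account whose contact details point to Latvia,
# either by registered country or by a Latvian phone prefix.
latvian = [
    acc.findtext("holder")
    for acc in root.iter("account")
    if acc.findtext("country") == "LV" or acc.findtext("phone", "").startswith("+371")
]
print(latvian)  # ['Alpha Trading SIA']
```

The same selection, run over the full export, yields the candidate list of resident account holders to cross-reference with the administration's own registers.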

Figure 5. Community Independent Transaction Log source: http://ec.europa.eu/environment/ets/singleAccount.do?accountID=82713&registryCode=GB&action=details&languageCode=en&returnURL=languageCode%3Den%26account.registryCodes%3DGB%26account.accountTypeCodes%3D121%26identifierInReg%3D%25252503%26accountHolder%3D%26primaryAuthRep%3D%26search%3DSearch%26searchType%3Daccount%26currentSortSettings%3D%26resultList.currentPageNumber%3D1

Using the information given in the Emissions Trading Registry, tax authorities can perform searches to identify persons who may have a residence, and consequently taxation obligations, in their country. The searches can be performed on addresses and city names as well as on phone numbers linked to a particular country.

According to the Latvian law on VAT, all VAT-liable persons are obliged to submit annexes on input tax and output tax in which detailed information on all incoming

and outgoing transactions must be declared to the tax administration together with their VAT returns. The Latvian tax administration therefore has at its disposal information on all movements of goods and services. The information from the Latvian Emissions Trading System on transactions made with carbon credits was cross-checked with information on transactions declared by taxpayers in their VAT returns. The cross-checks identified taxpayers that:

- had not declared transactions made with carbon credits in their VAT returns,
- had incorrectly declared on their VAT returns transactions made with carbon credits (e.g. declaring such transactions as being the supply of goods),
- in transactions involving carbon credits had declared a price lower than the market price.

To tackle the identified problems the Latvian tax administration performed a number of preventive measures. Taxpayers were informed of the irregularities and mistakes found and were asked to make corrections to their VAT returns or to give explanations (e.g. for performing transactions at prices under the market value).

2.1.2 Car trading17

Quite often car dealers have cars that they cannot sell to their normal customers, for example a new-car dealer who takes an old car from a customer in part exchange for a new one. Such a car would not be easy to sell to his usual clients, so having a large customer network is crucial. That is why, with the development of the Web, car trading platforms have grown very fast and have become extremely popular both with car traders and with potential customers, who use them to find the best offers. In Switzerland a small number of platforms cover the largest part of the market.

Car dealing has always been a risky area when it comes to VAT, with two major modus operandi being used. The first is simply undeclared turnover; because the buyer is often the final consumer, it is difficult to find evidence, as VAT is often not deductible. The second is the incorrect application of the margin scheme for second-hand cars. This problem has been partially resolved by the new VAT law in force since 1 January 2010.

To help tax auditors and to improve audit targeting, information on the car web platforms has been used to:

• find unregistered dealers and tackle the black market
• help auditors prepare their audits by collecting information on potential transactions.

As an example, one of the major players is www.autoscout24.ch, which covers many of the active dealers. It helps the Swiss tax administration to identify unregistered businesses, as well as being an interesting source of information for tax auditors preparing for an audit.
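The first of those uses, finding unregistered dealers, comes down to a set difference once a dealer list has been extracted from the platform. A minimal Python sketch, with invented dealer names (the report does not describe the Swiss implementation in code):

```python
# Sketch: matching dealers found on a trading platform against the VAT
# register to spot unregistered businesses. All names are invented.
scraped_dealers = {"Garage Muller", "AutoHaus Keller", "CarPoint Zug"}
vat_registered = {"Garage Muller", "CarPoint Zug"}

# Dealers advertising on the platform but absent from the VAT register.
unregistered = sorted(scraped_dealers - vat_registered)
print(unregistered)  # ['AutoHaus Keller']
```

In practice, name matching against a register needs normalisation (legal forms, spelling variants), but the principle is this comparison.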

17 Example provided by Jean-Luc Wichoud from Switzerland


Figure 6. AutoScout24 source: www.autoscout24.ch

This platform has a deep Web structure because the information published on the Website is located in a database. The database offers the opportunity to make online queries, for example to find a car from 2005 using less than 4 litres of fuel per 100 km from a dealer located no more than 20 km from the current position. To extract information from this deep Web database the Swiss tax administration has used scripts, written using iMacro software, which allow them to collect information from the Website automatically. Two scripts were prepared. The first goes through the users of the Website so that the tax administration can obtain a list of all the dealers, with names, phone numbers, etc. The second offers the option to put a particular dealer under investigation: every night the list of cars for sale is automatically extracted and a list of potential transactions for the auditors is prepared.

2.1.3 Used car parts18

Many traders with similar goods or services can work together to jointly build an Internet platform or Website to sell their products. On the Website the customer can search for a specific product regardless of which merchant owns it; all merchants on the Website appear as a single retail unit. When a number of merchants contribute to one Website, their goods (or services) will be structured in a database. The database is searchable, but only by a customer who visits the Website and uses the search box or search form on the site. Generally, when it comes to databases, it is possible for search engines to index some of the content but hardly ever the complete database. A search engine cannot be used to search through the content of a database and obtain complete, structured, organised results. To do that another tool is required.
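The parsing step of such an extraction tool can be sketched with Python's standard-library HTML parser. The markup and the class="listing" attribute below are hypothetical; a real script would first fetch the site's pages and adapt the parser to their actual structure.

```python
# Sketch of an extraction tool's parsing step, using only the standard
# library. The HTML structure below is hypothetical.
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Collects the text of every element marked class="listing"."""
    def __init__(self):
        super().__init__()
        self.listings = []
        self._in_listing = False

    def handle_starttag(self, tag, attrs):
        if ("class", "listing") in attrs:
            self._in_listing = True

    def handle_endtag(self, tag):
        self._in_listing = False

    def handle_data(self, data):
        if self._in_listing and data.strip():
            self.listings.append(data.strip())

page = """
<ul>
  <li class="listing">VW Golf 2005, CHF 8'900</li>
  <li class="listing">Audi A4 2007, CHF 12'500</li>
</ul>
"""

parser = ListingParser()
parser.feed(page)
print(parser.listings)
```

Run over every page of a merchant site, this yields the complete, structured result that a search-engine query cannot provide.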

18 Example provided by Dag Hardyson from Sweden


Figure 7. Bildelsbasen.se source: www.bildelsbasen.se

In the database Bildelsbasen.se more than 5.1 million used car parts belonging to 153 affiliated companies engaged in trading used car parts can be found. By way of comparison, there are three major databases of used car parts in Sweden, with 350 dealers contributing to them. Together the dealers offer over 12 million used car parts for sale. If all the data from these three databases could be collected through the Internet and stored in a common database, it would be possible to produce a list of used car parts sorted by merchant. If this could be done every month, an inventory list for each company connected to these Websites could be created, and it could easily be noted which products have disappeared from the database (probably sold) and which products have been added. Tax administrations have never previously been able to collect such accurate information (for control purposes) from companies without auditing them.

The Internet is full of information, often organised within databases. In practice, a few auditors with the support of a technical centre (or similar) could check a whole market within a country using public information held in databases on the Web, if they had the tools to retrieve the information automatically.

2.1.4 Gambling area19

If the winnings from poker games are taxable in a particular country, the questions arise of how many poker players are present in the country and what they have won. In searching for information about players and winnings the Internet is of course used, and users will certainly find the poker database called The Hendon Mob. Here players from all countries, and data about winnings divided into years, can be found. The following picture shows information on some Austrian players.

19 Example provided by Dag Hardyson from Sweden


Figure 8. The Hendon Mob source: www.thehendonmob.com

This database contains information about the players: names, countries, poker tournaments, dates, places and prize money. Getting all the information from the database gives tax administrations a very good basis from which to start enquiries into the taxation of poker players. Note that the database only contains winnings from tournaments, not from cash games on the Internet.

Search engines can help to find the Website that contains the database, but to extract the data from the database other tools are needed. Search engines can in some cases index some of the content, but hardly ever the complete content; when using a search engine to search the content of a database, never expect complete results. To do that, another tool such as EC-Eyes is needed. The Poker Database is in the deep Web, and the information in it is very important for tax administrations. This aspect must be considered when acquiring data from the Internet: all tax administrations require some form of technical centre that has the necessary tools to retrieve information from the Web, whether it is in the surface Web or in the deep Web.

2.1.5 Translators Proz.com20

This example is about translators associated with one web service, ProZ.com. The Website is registered to a person in New York. It can be used to search for a translator that fits a particular job, and it allows a translator to create a personal profile and to specify the type of translation work that suits them.
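Once profile data from such a directory has been retrieved in structured form, queries like "all translators offering English to Czech" become simple filters. A minimal Python sketch with invented profiles and an assumed language-pair field; the real data sits in ProZ.com's database behind its search form.

```python
# Sketch: filtering a structured collection of translator profiles.
# The profiles and the "pairs" field are invented for illustration.
profiles = [
    {"name": "J. Novotna", "country": "CZ", "pairs": ["en-cs"]},
    {"name": "P. Svoboda", "country": "CZ", "pairs": ["de-cs", "en-cs"]},
    {"name": "L. Berg",    "country": "SE", "pairs": ["en-sv"]},
]

# All profiles offering English-to-Czech translation.
english_to_czech = [p["name"] for p in profiles if "en-cs" in p["pairs"]]
print(english_to_czech)       # ['J. Novotna', 'P. Svoboda']
print(len(english_to_czech))  # 2
```

The count is the structured equivalent of the "number of hits" figure the Website's own search reports.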

20 Example provided by Dag Hardyson from Sweden


Figure 9. The translation workplace PROZ.com source: www.proz.com

On the Website the following information can be found: "With over 300,000 professional translators and translation companies – and no fees or commissions for clients – ProZ.com offers the largest directory of professional translation services". On the Website virtually all translators available in a particular country can be found. By searching for translators who translate from English into a particular language, an idea of how many translators are connected to this Website can be gained. It is possible to find a lot of information about a translator, such as: name, address, telephone number, e-mail address, country, own Website, expertise, credentials, native language, experience, references, KudoZ ProZ points (customer satisfaction, used to rank freelancers), an "about me" section and more. As an example, there are 2,574 hits on services that translate English into Czech.

All 300,000 translators are organised in a database. If there is an interest in creating a list of all the translators in a particular country that have created a profile on ProZ, a search engine like Google cannot be used if a complete, structured and organised result is required. The information on the Website is public and open to all users, including tax administrations, but it is in the deep Web.

2.1.6 Wikileaks21

At one time articles spread the news that WikiLeaks was about to acquire details of thousands of Swiss bank accounts and put them on its Website. In January 2011 John Christensen from the Tax Justice Network estimated that $20 trillion is held offshore with the intention of evading taxes. If tax administrations find it interesting to examine data from WikiLeaks, it is of course possible to visit the Website and search for the data of interest, but if the information needed must be complete, structured and organised then a special tool is necessary. In this case a tool such as DoRes (from the Dutch tax administration, the Belastingdienst) can be used.
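DoRes itself is an internal Dutch tool, but the general approach it implements (fetch every URL on a prepared list, extract records from each page, and convert them to a spreadsheet-readable file) can be sketched as follows. The fetch and extraction functions here are injected stubs, labelled as assumptions; a real run would plug in live HTTP requests and site-specific parsing.

```python
# Sketch of the generic pipeline behind tools like DoRes: fetch every
# URL on a prepared list, extract records, and render them as CSV for
# a spreadsheet. fetch() and extract() are injected so the pipeline
# can be demonstrated without network access or a real parser.
import csv, io

def collect(urls, fetch, extract):
    """Download each URL and pull structured records from each page."""
    records = []
    for url in urls:
        page = fetch(url)            # e.g. urllib.request.urlopen(url).read()
        records.extend(extract(url, page))
    return records

def to_csv(records, fieldnames):
    """Render records as CSV text, ready to open in a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

# Demonstration with stub functions instead of live requests:
fake_fetch = lambda url: f"<html>{url}</html>"
fake_extract = lambda url, page: [{"source": url, "length": len(page)}]
rows = collect(["http://a.example", "http://b.example"], fake_fetch, fake_extract)
csv_text = to_csv(rows, ["source", "length"])
```

Injecting the fetch and extract steps keeps the structuring logic reusable across different Websites, which is the essence of the "complete, structured and organised" requirement described above.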

21 Example provided by Dag Hardyson from Sweden


Figure 10. WikiLeaks source: www.wikileaks.com

The Dutch tax administration has downloaded data from the WikiLeaks database, using its own tool, DoRes, to download and structure the data. DoRes is described in the tool box annexed to this report. Technical performance: first, the URLs of each classification on the Website (e.g. Wikileaks.org or Cablesearch.org) need to be obtained. They are then entered into DoRes, all received files are copied to a computer, and the files are converted to a spreadsheet.

1.1.7 Auction sites - Allegro22

This example is about the auction site Allegro and its Application Programming Interface23. This Website is the biggest auction site in Poland, with twenty-five million registered users and three hundred thousand transactions per day. All of that data is organised in a big database which, under Polish tax regulations, needs to be retained for at least five years. Completed transactions are, after a defined time, moved to an archive and are no longer visible online. Unlike, for example, the well-known auction site eBay, the user page does not provide information such as prices and auction titles; only positive or negative comments are visible, which makes it very difficult to estimate turnover and select individuals for audit.
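As this example goes on to describe, Allegro's Application Programming Interface tolerates at most 5 concurrent processes, each downloading 50 auctions per query. A minimal sketch of batch collection under those limits is given below; the auction identifiers and the fetch function are hypothetical stand-ins, not Allegro's real API.

```python
# Sketch: splitting auction identifiers into API-compliant batches and
# downloading them with limited concurrency. The limits (50 auctions
# per query, at most 5 parallel workers) follow the Allegro example in
# this report; fetch_batch is a placeholder, not Allegro's real API.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 50     # auctions per query
MAX_WORKERS = 5     # concurrent processes tolerated without an IP block

def make_batches(auction_ids, size=BATCH_SIZE):
    """Split a list of auction IDs into query-sized chunks."""
    return [auction_ids[i:i + size] for i in range(0, len(auction_ids), size)]

def download_all(auction_ids, fetch_batch):
    """Fetch every batch, at most MAX_WORKERS at a time, preserving order."""
    results = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for chunk in pool.map(fetch_batch, make_batches(auction_ids)):
            results.extend(chunk)
    return results

# Stub standing in for a real API call:
fake_fetch = lambda ids: [{"id": i, "title": f"auction {i}"} for i in ids]
auctions = download_all(list(range(1, 124)), fake_fetch)
```

Keeping the rate limits in named constants makes it easy to adjust the collector if the platform's security policy changes.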

22 Example provided by Krzysztof Rozanski from Poland
23 Application Programming Interface is a specification intended to be used as an interface by software components to communicate with each other, in this case the user with the database. Source Wikipedia: Application programming interface, http://en.wikipedia.org/wiki/Application_programming_interface


Figure 11. Auction site Allegro source: www.allegro.pl

The Allegro Website provides an Application Programming Interface for users to access the database and build their own software for the automation of sales. Based on that interface and its built-in functions, it is possible to create software that collects all auctions (with prices, auction titles and HTML content). The Allegro security policy blocks too many HTTP requests from the same IP address, which makes it impossible to scrape the data that way. The Application Programming Interface, however, allows up to 5 concurrent processes, and each process can download 50 auctions in one query without the IP address being blocked. The information is public but it is in the deep Web, hidden in the database. The only way to get the data is by use of the Application Programming Interface.

1.1.8 EC-Eyes24

ECEyes is an analysis tool developed by Mr. Sven H Johansson and Mr. Jonas Björklund from the Swedish Tax Agency. The tool is free to download and use for all tax administrations connected with IOTA. ECEyes can be downloaded from the IOTA Website.

24 Example provided by Dag Hardyson from Sweden


Figure 14. ECEyes

ECEyes is a very broad tool with many features, able to support the process of analysing one or more Websites or a certain individual or company. ECEyes provides a variety of components that can be used for mapping Websites or people (searching, finding, identifying and analysing). The tool contains five fixed browsers, two text sheets with text tools, a spider/crawler, a database, different methods to utilise online resources, three different ways of extracting links, automated WhoIs requests, many Domain Name System (DNS) features and much more. Documentation of the work may be done in different ways and all work is stored in a session folder. ECEyes is a tool created by tax auditors for tax auditors.

ECEyes and the deep Web (Web resources)

ECEyes connects to many web resources divided into three search areas: national search, engine search and domain search. All together there are about 180 different web resources provided by default in ECEyes, and additional web resources that tax administrations find valuable in their work can easily be added. What those web resources show is mainly information from the deep Web. A small sample of the default resources found in ECEyes includes:
- WhoIs information
- traffic information – traffic ranks
- older versions of a Website
- hosting information
- hosting history
- Websites with the same IP number
- dedicated and co-located servers
- referrers directly linking to pages on the Website being analysed
- information from social media
- and much more.

ECEyes for the deep Web (Databases)

ECEyes can also be used in the deep Web context when it comes to databases, as demonstrated by the following steps:
1. Create URLs with the text tools in ECEyes
2. Copy the URLs to the processing list
3. Download the pages automatically
4. Join the downloaded pages into one single file
5. Edit this file using analysis and conversion software such as ACL or IDEA

On the IOTA Website there is a PowerPoint presentation, "ECEyes for the Deep Web", which explains step by step how to download and structure data from a database with the support of ECEyes. In the example a poker database is chosen and it is shown how to extract information about poker players and their winnings from the online database.
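The steps above can be sketched in a few lines of code. The URL template, record identifiers and injected fetch function below are illustrative assumptions; ECEyes itself performs these steps through its graphical text tools and processing list.

```python
# Minimal sketch of the workflow described above: build one URL per
# database record, download each page, and join the results into a
# single file body for later analysis in ACL or IDEA. The URL template
# and player identifiers are illustrative assumptions; fetch() is
# injected in place of live HTTP requests.
def build_urls(template, record_ids):
    """Step 1: create URLs with a text template."""
    return [template.format(record_id=r) for r in record_ids]

def download_and_join(urls, fetch, separator="\n<!-- next page -->\n"):
    """Steps 3-4: download every page and join them into one file body."""
    return separator.join(fetch(u) for u in urls)

urls = build_urls("http://poker-db.example/player/{record_id}", [101, 102, 103])
joined = download_and_join(urls, fetch=lambda u: f"<html>{u}</html>")
# 'joined' can now be written to disk and edited in ACL or IDEA (step 5)
```

The separator marks page boundaries so that the single joined file can still be split back into records during the conversion step.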

1.1.9 Copernic Agent Professional: Wall Street Italia 25

Figure 15. Wall Street Italia, Inc. webpage on Facebook source: http://www.facebook.com/pages/Wall-Street-Italia/132595830103271

This example shows how an OSINT analysis is carried out using the Copernic Agent Professional software.

25 Example provided by Massimo Chiappetta from Italy


Figure 16. Copernic Agent Professional

Foreword

For the purposes of this illustration the company Wall Street Italia, Inc. (WSI) is used (website: www.wallstreetitalia.com). The company is described on its Website as follows: "Wall Street Italia (WSI) is the no. 1 independent Website in Italy for economy, finance, politics and news, located in the United States (main office in Manhattan, New York), specialised in news and information for private and institutional investors that operate on the global and local markets."26

A request was received from the Ministry of Economy and Finance to search for any useful information on this Website, in particular the identity of its real administrator, the location of the Website and any personal data or address of the administrator. The request was prompted by the suspicion that the location of the Website and the residence of whoever managed it were not, in fact, outside Italy; according to the Website, everything officially appeared to be located in the city of New York. A technical analysis of the Website was performed to verify the virtual dimensions of its links on the Internet. The survey was necessary to establish who was actually behind the Website, because during that period possible manipulation of the financial market through an activity of disinformation appeared to be taking place through it.

The Software

Copernic Agent Professional is software that allows the user to search the Internet based on information from the databases of many search engines. Additionally, once it identifies the Website of interest, it allows constant monitoring of the information contained therein, alerting the user by e-mail to any information of interest that may emerge.

Info-investigative analysis

The analysis carried out was focused on an error made by one of the administrators who, writing on a typical forum for Calabrian cooking recipes, used the same nickname used by the editor of the WSI Website. Subsequent in-depth verifications produced further confirmation. It was ascertained that in the same forum

26 Source Wall Street Italia, Inc.: About Us, http://www.wallstreetitalia.com/chisiamo.aspx

previously identified, the WSI Website administrator provided his e-mail address to a user, and it was the same one advertised on the WSI Website. From conversations held under the topic, it emerged that the administrator said they visited New York only occasionally for work and that their operational base was in Italy. It is important to stress that the information present in the forum could not be obtained by a Google search alone, as this search engine did not list it. It was possible to obtain this information and to constantly monitor the forum only thanks to the use of the meta-search engine Copernic Agent Professional. In fact, as soon as the administrator of the WSI Website wrote in the forum, the software would send an e-mail with the updated contents, allowing the reconstruction of the actual residence of the administrator.

Technical analysis of the Website

Information regarding Website registration: the Website showed the following IP address: 62.85.170.2 (source: Netcraft). The domain was registered by the American company "Network Solutions" (website: www.networksolutions.com) and was hosted by the company Aconet S.r.l. (website: www.aconet.it), located in Via Luigi Bodio, 58 – Rome.
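Registration details of this kind come from WhoIs databases. A minimal sketch of the underlying protocol (RFC 3912: a plain-text request over TCP port 43) and of parsing the reply into fields is shown below; the server name and the sample response are illustrative assumptions, not real registration data.

```python
# Sketch: querying a WhoIs server over TCP port 43 (RFC 3912) and
# parsing the plain-text reply into key/value pairs. The server name
# and the sample response below are illustrative assumptions.
import socket

def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Send a WhoIs request and return the raw text response."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode())
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def parse_whois(text):
    """Turn 'Key: value' lines into a dictionary (last value wins)."""
    fields = {}
    for line in text.splitlines():
        if ":" in line and not line.lstrip().startswith(("%", "#")):
            key, _, value = line.partition(":")
            if value.strip():
                fields[key.strip()] = value.strip()
    return fields

sample = """% Sample response for illustration only
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLE.NET"""
info = parse_whois(sample)
```

In practice different registries return differently formatted responses, so the parser above should be treated as a starting point rather than a universal solution.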

Figure 17. Site report for wallstreetitalia.com source: http://toolbar.netcraft.com/site_report?url=http://wallstreetitalia.com

Links to other Websites: "wallstreetitalia.com" is no. 57,572 on the list of the most visited Websites (source: www.urldogg.com). Querying the Google database, the following emerges:
- there are about 80,000 web pages that contain a reference to the Website www.wallstreetitalia.com;
- the Website contains about 8,770 pages;
- there are about 103 web pages that contain a link to the Website www.wallstreetitalia.com.

Conclusions

The operational experience illustrated provides a real example of how it is possible, through an in-depth and accurate search of open source information using advanced OSINT software, to identify individuals and to obtain a real profile of their life and business.

1.1.10 Italian example: Koobface27

In this study, an OSINT analysis is performed on the famous Koobface botnet28.


This is a real example of how it is possible, through thorough and careful research on open sources, to identify in detail a single person responsible for a computer crime. Of course, as with other crimes not related to the Internet, a little mistake on the part of the person under investigation is needed.

Premise

"Koobface" is a computer worm that targets users of the social network Facebook. After infection, users see Internet pages of "fake antivirus" (scareware) or advertising. Moreover, in a later stage, "Koobface" attempts to obtain sensitive information from the victims, such as credit card numbers or login information for forums, social networks and e-mail boxes. Passwords are often replaced by "Koobface". The malware (malicious software)29 spreads through the social network by sending messages with friend requests or invitations to view videos such as "Look how funny you are ... here." In a second stage, the user is asked to upgrade Adobe Flash Player; by downloading this fake file, the user's computer is infected.

Analysis

The analysis is based on a single mistake that the creator of the botnet made: the use of his personal e-mail address for the registration of a domain "parked" within the infrastructure of "Koobface". When building a botnet for malicious purposes, Internet domains that were previously registered and then abandoned are often used. Suppose a particular IP number is used in this network; its infrastructure can be determined using the online tool Robtex.

Figure 12. Network infrastructure source: http://ddanchev.blogspot.it/2012/01/whos-behind-koobface-botnet-osint.html

With this tool a set of checks can be performed on the domain to be analysed, or simply on an IP number. In this case access to the Internet Protocol address 78.110.175.15 is under examination. It often happens that a particular IP number matches multiple domains; the diagram above, for example, shows

27 Example provided by Massimo Chiappetta from Italy
28 A botnet is a collection of compromised computers, each known as a "bot", connected to the Internet. Botnets are formed when computers are targeted by code within malware (malicious software). The controller of a botnet directs these compromised computers via standards-based network protocols such as IRC (Internet Relay Chat) and HTTP (Hypertext Transfer Protocol). Source Wikipedia: Botnet, http://en.wikipedia.org/wiki/Botnet
29 Malware, short for malicious software, is software used or created by hackers to disrupt computer operation, gather sensitive information, or gain access to private computer systems. Source Wikipedia: Malware, http://en.wikipedia.org/wiki/Malicious_software

graphically how 5 domains can share the same IP number. Next, all domains sharing the examined IP were analysed with the help of the online tool Wepawet. This tool allows identification of malicious code disguised within a specific Website and then used for various purposes. In this case, the URL "zaebalinax.com/the/Pid=14010" performs a simple "redirection" of the user to the "Koobface" botnet. After it was confirmed that this domain was an integral part of this criminal structure, it was sufficient to execute a simple query of the WhoIs databases for the domain to obtain the owner of the domain and an e-mail address: [email protected].

In-depth

The same e-mail address was used to advertise the sale of Egyptian Sphinx kittens on 05.09.2007:

Figure 13. Advertisement for sale of Egyptian Sphinx kittens source: http://www.britancat.ru/brd/index.php?p=shop&start=30

Through this advert it was possible to determine a name and a phone number: Anton and +79219910190. The same telephone number was also used in another advertisement, for the sale of a BMW. After further analysis, again based entirely on open sources, the real name of the creator of the botnet and other useful data, such as his real address, phone numbers and accounts on various social networks (Facebook, Twitter, Flickr, etc.), were recovered.

3. PROBLEMS FOR TAX ADMINISTRATIONS

It is impossible to imagine modern society without the use of the Internet for both private and business purposes. This means the Internet is an enormous source of information that can be useful for tax administrations, as it contains a lot of business information that is hidden from the eyes of tax officials but still publicly available. The main aims of modern tax administrations are to raise compliance levels, reduce the tax gap and increase the amount of revenue to the budget, thus raising the living standards and wealth of all citizens. Tax administrations already have access to a wide range of databases owned both by public institutions and private companies. However, this data only identifies those persons who are already within the system. Of the greatest interest to tax administrations are those persons who are outside the system. But how can those who want to stay hidden from the eyes of controlling institutions be found? It is obviously not appropriate to use official databases to search for someone wanting to stay outside them, so a new source of information outside the tax administration is needed. This source is the Internet or, to be more precise, the deep Web. It is vital for tax administrations to use information from the deep Web

as it is one of the most powerful tools for searching for and identifying non-registered businesses and persons, as well as for collecting information on the extent of their activities. Deep Web investigation offers new opportunities for tax administrations but also creates new problems for them, as getting data from the deep Web is a difficult job.

3.1. Need for change in the audit culture

To reach their target and bring into the system those persons who are still outside it, tax administrations must change their way of thinking. There is a vital need to move from using internal to external sources of information. Staying up to date with the modern economic world and following market trends is a big challenge for tax administrations, as they must adapt their culture and organisation in accordance with the new rules. Tax administrations need to understand that the success and productivity of their work depend on the working methods they use. To get good results, modern technologies and modern approaches need to be used; it is impossible to stay competitive in a modern world without development and improvement. Changes are needed both in the way tax administrations target their goals and in their culture. Extending information sources to include the Internet, and especially the deep Web, must be accompanied by an understanding of the organisational changes necessary to work with this external information. The work needs to be done professionally, and it needs to be well structured and organised, as it is not enough just to find and download data from the deep Web once. Dynamic data should be the objective for tax administrations, as it contains information on the amounts of sales and the frequency of activities, as well as changes in the amount of goods available in stock, etc. Therefore, monitoring changes in the data available from the deep Web is an important source of external information. This clearly indicates the need for dedicated teams of specialists constantly working in this area. The problems mentioned below are closely connected with the need for changes in the audit culture.

3.2. Volumes of information

A lack of information has always been considered a problem but, on the other hand, an excess of information is a problem as well, for the processing of all this information requires a structured approach with sufficient technical and human resources. The large amounts of information available on the deep Web could be a problem for tax administrations if there are no proper tools and qualified personnel to retrieve and process the data. Another important aspect to consider is the necessity to retrieve and process this data in a way that allows it to be used effectively by tax administrations, i.e. so the information can be linked to existing tax administration data. As the deep Web contains an enormous amount of information, it is vital for tax administrations to define the scope of the information they want to search for in the deep Web. Collecting all kinds of data from the deep Web is not appropriate, as it is a

time-, labour- and cost-consuming process, and much of the data may prove to be worthless for tax administration purposes as it cannot be linked to the tax administration's internal data. After information is collected from the Web it is necessary to analyse it and find ways of linking it with the tax administration's data to make effective use of it. It is therefore very important for a tax administration to have a unified approach for dealing with this information, from collecting it by scraping the Web to further processing and analysing it and combining it with their own information sources. As these procedures are complicated, qualified personnel with special knowledge and expertise, as well as proper equipment and IT tools, are of major importance.

3.3. Qualified personnel

As already mentioned in the previous section, appropriate human resources are necessary to achieve good results from acquiring and using information from the deep Web. Personnel involved in searching for, collecting and processing data from the deep Web need to have:
- a good knowledge and understanding of how to search the surface Web (visible Web), and knowledge of the structure of the Internet, in order to perform the complicated searches needed to acquire deep Web information;
- an understanding of the deep Web, the information available, and the way it is structured and stored.
As most tax administration employees, especially those working in the tax control field, are more tax oriented and do not necessarily have deep technical or IT knowledge, tax administrations could be faced with a lack of qualified personnel to be involved in deep Web investigation work.

3.4. Proper equipment and Internet access

To be effective in deep Web investigation, employees must be provided with the appropriate equipment to do the work. For security reasons, many tax administrations only provide their employees with limited access to the Internet within their IT networks. To ensure the efficiency of the deep Web investigation process, the people involved need to be equipped with stand-alone computers with unrestricted Internet access so that they can surf and search the Web for relevant information without any limitations. This may of course be a problem for some tax administrations, as limited financial resources could restrict investment in the appropriate equipment.

3.5. Working on a market without borders

The Internet provides businesses with an environment without any physical borders, making it easy for them to be seen on the international market and to trade in any country of the world. However, the jurisdiction of tax administrations is still bound by the legislation within the geographical boundaries of their country. This presents tax administrations with another problem. With more businesses going

international, tax administrations need to extend and improve their international cooperation and exchange of information to be able to follow the economic activities of taxpayers.

4. RECOMMENDATIONS

The Internet is a complex technical environment. However, almost everyone can use the Internet if they only use its more common functions. When there is a need to go beneath the surface it becomes much more complicated, especially if there is a need to access information hidden in the deep Web. Online information is expanding exponentially, and a lot of important information about companies and individuals is located outside tax administrations' databases. All tax administrations must consider how to organise the collection and use of this external information. The Task Team has the following recommendations:

4.1. Stand-alone equipment & Internet access

While carrying out investigations on the Web, in particular on the deep Web, tax auditors need the ability to hide the fact that they are representatives of a tax administration, because in this virtual world, as soon as a user is identified, evidence could be suppressed or a distorted view could be served to IP addresses belonging to the tax administration. Additionally, the computers used during investigations on the Web are exposed to viruses, trojans, etc. There is also a need to bypass the security restrictions of an internal network in order to gain access to certain sites that may be restricted, for example gambling sites, social networks, etc. For those reasons it is crucial to have stand-alone equipment with separate Internet access, in order to have the same view as an ordinary, anonymous Web user and to be able to see everything.

4.2. Working with external information

Using deep Web investigation is ultimately working with external information. As a matter of equality of treatment, it is very important to have guidelines on the treatment of information collected on the deep Web. For example, collecting information from only one specific platform creates a distortion of competition with other platforms. Another important consideration is the fact that there is a lot of wrong information on the Web; such data should be used carefully by tax administrations, which need rules on how to treat it.

4.3. Monitoring

By searching for information on the deep Web only on a case-by-case basis, tax administrations will miss a lot of valuable information, because information on the deep Web is dynamic and frequently offers only a picture of current transactions or discussions. Monitoring offers the opportunity to acquire the whole history. For example, on an auction site what is interesting is not really what a dealer has to sell now, but what they have sold in the past. Monitoring social media is interesting in order to identify trends on specific subjects, but to be effective it requires monitoring over a period of time.


4.4. Dedicated Specialist Teams

Within a tax administration there needs to be a group of specialists who have extensive knowledge about the Internet and the technology behind it. These specialists also need to be equipped with the appropriate tools (software) to allow them to do an effective job. It is important to realise that the IT area, including the Internet, is undergoing constant development and reorganisation. One tool is capable of handling and structuring more information than ten or a hundred people could achieve. It is important to have both skilled personnel and skilled tools.

4.5. Right tools for the right job

Living in a modern world with rapidly changing technologies requires the use of appropriate tools that allow tax administrations to reach the desired result in the most efficient and optimal way. Having highly qualified employees does not negate the need for proper tools. Therefore, tax administrations need to consider, as part of their strategic objectives, ways of using information collected from deep Web investigation, and should also consider acquiring the appropriate tools to support work in this field. Different types of information found on the Web require different tools to access them. Tax administrations therefore need to have at their disposal information about the tools that can be used for deep Web investigation, and an understanding of their main functions.

4.6. Training of auditors and investigators

All auditors and investigators must have at least basic skills in accessing information from the Web in order to do a proper job. Tax administrations should not be examining what has already been accounted for and reported; instead, they should focus their efforts on finding what has not been accounted for and not reported. It is on the Internet, both in the surface Web and in the deep Web, that tax administrations can find vital external information about the companies and individuals they control or intend to control. The combination of rapid globalisation and the rapid development of cloud services will result in a rapid increase in micro multinational companies. Information about micro multinationals can always be found on the Internet, but not necessarily within the information tax administrations have in their databases. If the ultimate conclusion is that all or most auditors and investigators need to be trained to use the Internet in an efficient manner for their work, the issue of education and training will arise. If tax administrations build a unit of dedicated IT/Internet specialists, they will automatically lay a good foundation on which to build an appropriate training and educational programme. Technology has made it easy to produce interactive training that can be developed rapidly and used widely within an organisation.

4.7. Links between operational and technical centre/e-commerce specialists

Today almost every company has some presence on the Web. There is frequently a lot of (or at least some) information available about individuals,

especially from social networks. No matter what control project is being planned, a technical centre can provide a lot of information about the subject and about those who are active in the area. A good policy could be for every planned control project to be reported to the technical centre so that it can provide feedback about the companies and individuals active in that specific control area. This would improve the quality of the project. It is also vital that there is a clear path for feedback from auditors working in the field to the technical centre. It is the cooperation and understanding between management, investigators and specialists working in the technical centres that will produce the best results.

4.8. Cooperation between Tax Agencies

Many IOTA tax administrations have formed technology centres or Internet service centres in order to solve many Internet-related issues. In the Task Team's opinion, setting up this type of specialist team is a step in the right direction but, in order to exploit these investments even further, it is important to increase international cooperation between these technical centres. It is essential that new experiences and new tools can spread quickly between all technical centres. Dealing with information from the Internet means dealing with public information, not in any way confidential information. It should therefore be simple and straightforward to collaborate and disseminate information between administrations.
