Master Thesis

Total Page:16

File Type:pdf, Size:1020Kb

Master Thesis MASTER THESIS Bc. Michal Hušek Text mining in social network analysis Department of Theoretical Computer Science and Mathematical Logic Supervisor of the master thesis: doc. RNDr. Iveta Mrázová, CSc. Study programme: Informatics Specialization: Software Systems Prague 2018 2 3 I declare that I carried out this master thesis independently, and only with the cited sources, literature and other professional sources. I understand that my work relates to the rights and obligations under the Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that the Charles University has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 paragraph 1 of the Copyright Act. In…...... date............ signature 4 Title: Text mining in social network analysis Author: Bc. Michal Hušek Department: Department of Theoretical Computer Science and Mathematical Logic Supervisor: doc. RNDr. Iveta Mrázová, CSc., Department of Theoretical Computer Science and Mathematical Logic Abstract: Nowadays, social networks represent one of the most important sources of valuable information. This work focuses on mining the data provided by social networks. Multiple data mining techniques are discussed and analysed in this work, namely, clustering, neural networks, ranking algorithms and histogram statistics. Most of the mentioned algorithms have been implemented and tested on real-world social network data and the obtained results have been mutually compared against each other whenever it made sense. For computationally demanding tasks, graphic processing units have been used in order to speed up calculations for vast amounts of data, e.g., during clustering. The performed tests have confirmed lower time requirements. All the performed analyses are, however, independent of the actually involved type of social network. Keywords: data mining, social networks, clustering, neural networks, ranking algorithms, CUDA 5 Názov: Text mining in social network analysis Autor: Bc. Michal Hušek Katedra: Katedra teoretické informatiky a matematické logiky Vedúca práce: doc. RNDr. Iveta Mrázová, CSc., Katedra teoretické informatiky a matematické logiky Abstrakt: Sociálne siete sú v súčastnosti veľmi hodnotným zdrojom informácií. Táto práca sa zameriava na dolovanie dát pochádzajúcich zo sociálnych sietí. Pojednáva a analyzuje rôzne techniky dolovania dát, konkrétne klastrovanie, neurónové siete, hodnotiace algoritmy a histogramovú štatistiku. Väčšina zmienených algoritmov bola implementovaná a testovaná s dátami z reálnej sociálnej siete. V prípade, že to bolo zmyslupné, výsledky boli vzájomne porovnané. Pre výpočtovo náročné úlohy, konkrétne klastrovanie, boli použité grafické procesory na ich zrýchlenie. Testy takto upravených programov potvrdili nižšie časové nároky. Všetky vykonané analýzy sú však nezávislé na konkrétnej použitej sociálnej sieti. Kľúčové slová: dolovanie dát, sociálne siete, klastrovanie, neurónové siete, hodnotiace algoritmy, CUDA 6 Acknowledgements In the first place, I would like to thank my supervisor doc. RNDr. Iveta Mrázová, CSc for all the help and time spend with my work. I would like to especially thank her for her patience with my not always sufficient progress and my very limited time schedule. In the second place, I would like to thank my faculty for providing me with the necessary software and hardware. Without them, it would be difficult to finish this work. 7 Table of Contents 1. INTRODUCTION ........................................................................................................... 11 1.1 MOTIVATION ............................................................................................................... 11 1.2 METHODOLOGY ............................................................................................................... 12 1.3 THESIS STRUCTURE ............................................................................................................ 12 1.4 OBJECTIVES OF THIS WORK ................................................................................................. 13 2. K-MEANS ALGORITHM ................................................................................................ 14 2.1 MOTIVATION ................................................................................................................... 14 2.2 DESCRIPTION OF THE ALGORITHM ........................................................................................ 14 2.3 CALCULATION OF THE CENTROID .......................................................................................... 15 2.4 CONCLUSION ................................................................................................................... 16 3. ENRON EMAIL DATA SET ............................................................................................. 17 3.1 MOTIVATION ................................................................................................................... 17 3.2 ENRON COMPANY ............................................................................................................. 17 3.3 ENRON EMAIL CORPUS ....................................................................................................... 18 3.4 THE FOLDER STRUCTURE OF ENRON EMAIL CORPUS ................................................................ 18 3.5 THE STRUCTURE OF THE EMAIL MESSAGE .............................................................................. 19 3.6 ENRON EMAIL CORPUS AS A SOCIAL NETWORK ....................................................................... 21 3.7 EMAILS CONTAINING POSSIBLY IMPORTANT INFORMATION ....................................................... 22 3.8 CONCLUSION ................................................................................................................... 23 4. EMAIL - BASED CLUSTERING OF THE ENRON DATA SET ................................................ 24 4.1 MOTIVATION ................................................................................................................... 24 4.2 VECTOR SPACE MODEL ....................................................................................................... 24 4.3 DESCRIPTION OF THE USED APPROACH .................................................................................. 25 4.4 TEST OF THE CLUSTERING ON A SMALL SUBSET OF THE INPUT DATA ............................................ 29 4.5 RESULTS OF THE CLUSTERING USING A SUBSET OF THE INPUT DATA ............................................ 29 4.6 CONCLUSION ................................................................................................................... 37 5. SPEEDING UP THE COMPUTATIONS USING GPUS ........................................................ 38 5.1 MOTIVATION ................................................................................................................... 38 5.2. GPU COMPUTING ............................................................................................................ 38 5.3. NVIDIA CUDA ............................................................................................................... 39 5.4 TASKS SUITABLE FOR GPU CALCULATIONS ............................................................................. 39 5.5. SUMMARY ...................................................................................................................... 40 6 NVIDIA CUDA BASED CLUSTERING ............................................................................... 41 6.1. MOTIVATION .................................................................................................................. 41 6.2 DESCRIPTION OF THE USED APPROACH .................................................................................. 41 6.3 CLUSTERING TESTS WITH THE FIRST VERSION OF CUDA BASED PROGRAM ................................... 43 6.4 CUDA CLUSTERING WITH PARALLEL CALCULATION OF DISTANCES TO CENTROIDS .......................... 44 6.5 MODIFIED CUDA KERNEL .................................................................................................. 46 6.6 SPEED COMPARISON ......................................................................................................... 48 6.7 CONCLUSION ................................................................................................................... 49 7. PERSON - BASED CLUSTERING OF THE ENRON DATA SET ............................................. 51 8 7.1 MOTIVATION ................................................................................................................... 51 7.2 THE IDEA OF THE FIRST PERSON-BASED CLUSTERING EXPERIMENT .............................................. 51 7.3 DESCRIPTION OF THE USED APPROACH IN THE FIRST PERSON-BASED CLUSTERING EXPERIMENT. ....... 51 7.4 RETHINKING THE IDEAS OF PERSON BASED CLUSTERING ........................................................... 54 7.5 PERSON- COMMUNICATION BASED CLUSTERING ..................................................................... 54 7.6 TIME BASED PERSON CLUSTERING ........................................................................................ 58 7.7 CONCLUSION ................................................................................................................... 60 8. HISTOGRAM STATISTICS .............................................................................................. 62 8.1 MOTIVATION ..................................................................................................................
Recommended publications
  • Essays on Financial Communication in Earnings Conference Calls
    Essays on Financial Communication in Earnings Conference Calls Xiaoxi Wu This dissertation is submitted for the degree of Doctor of Philosophy September 2019 Department of Accounting and Finance Abstract Earnings conference calls are an important platform of financial communication. They provide researchers with unique opportunities to observe firm managers’ and financial analysts’ interactions and natural communication style in a daily-task environment. Relying on multidisciplinary theories and methods, this dissertation studies financial communication in conference calls from both the managers’ and the sell-side analysts’ perspectives. It consists of three self-contained studies. Chapter 2 focuses on managers’ communication strategies in conference calls. It explores, in the small non-negative earnings surprises setting, whether non-manipulators design communication strategies to separate themselves from earnings manipulators, and whether manipulators pool through obfuscation. Chapters 3 and 4 focus on sell-side analysts’ communication behaviour in conference calls. Chapter 3 examines how analysts’ people skills affect their communication behaviour and relationships with firm management. Chapter 4 applies both qualitative and quantitative discourse analyses and investigates how analysts use linguistic politeness strategies to establish socially desirable identities in publicly accessible analyst-manager interactions. The three studies combined contribute to the accounting literature by furthering our understanding of managers’ and analysts’
    [Show full text]
  • A Case of Corporate Deceit: the Enron Way / 18 (7) 3-38
    NEGOTIUM Revista Científica Electrónica Ciencias Gerenciales / Scientific e-journal of Management Science PPX 200502ZU1950/ ISSN 1856-1810 / By Fundación Unamuno / Venezuela / REDALYC, LATINDEX, CLASE, REVENCIT, IN-COM UAB, SERBILUZ / IBT-CCG UNAM, DIALNET, DOAJ, www.jinfo.lub.lu.se Yokohama National University Library / www.scu.edu.au / Google Scholar www.blackboard.ccn.ac.uk / www.rzblx1.uni-regensburg.de / www.bib.umontreal.ca / [+++] Cita / Citation: Amol Gore, Guruprasad Murthy (2011) A CASE OF CORPORATE DECEIT: THE ENRON WAY /www.revistanegotium.org.ve 18 (7) 3-38 A CASE OF CORPORATE DECEIT: THE ENRON WAY EL CASO ENRON. Amol Gore (1) and Guruprasad Murthy (2) VN BRIMS Institute of Research and Management Studies, India Abstract This case documents the evolution of ‘fraud culture’ at Enron Corporation and vividly explicates the downfall of this giant organization that has become a synonym for corporate deceit. The objectives of this case are to illustrate the impact of culture on established, rational management control procedures and emphasize the importance of resolute moral leadership as a crucial qualification for board membership in corporations that shape the society and affect the lives of millions of people. The data collection for this case has included various sources such as key electronic databases as well as secondary data available in the public domain. The case is prepared as an academic or teaching purpose case study that can be utilized to demonstrate the manner in which corruption creeps into an ambitious organization and paralyses the proven management control systems. Since the topic of corporate practices and fraud management is inherently interdisciplinary, the case would benefit candidates of many courses including Operations Management, Strategic Management, Accounting, Business Ethics and Corporate Law.
    [Show full text]
  • Conspiracy of Fools”
    Submitted version of review published in GARP Risk Review Review of “Conspiracy of Fools” Joe Pimbley Kurt Eichenwald’s Conspiracy of Fools (Broadway Books, 2005) is a spellbinding account of the rise and fall of Enron. In nearly 700 pages the reader finds answers to “what happened?” and “how did it happen?” Based on retrospective interviews with more than a hundred primary and secondary actors in this drama, the author creates multiple, parallel story lines. He jumps back and forth between these sub-plots in a manner that maintains energy and gives the reader many natural stopping points. The great strengths of Conspiracy are that it’s thorough, extremely well- written, captivating, and, finally, it rings true. The author avoids the easy, simple conclusions that all the executives are “guilty” of crimes or plain greed and that the media-lionized whistle-blower is pure of heart. We see the ultimate outcome as personal tragedies for Jeff Skilling (President) and Ken Lay (CEO) even though they are undeniably culpable. Culpability and guilt are not synonymous, however, and different readers will have widely different judgments to render on these two men. The view of Andrew Fastow is not so murky. He and a handful of his associates did indeed lie, cheat, and steal for personal gain. Fastow’s principle “contribution” to Enron was the creation of structured finance transactions to skirt accounting rules. This one-sentence description doesn’t tell the reader much. Eichenwald gives many examples to flesh out the concept. The story of “Alpine Investors” provides the simplest case. The company wished to sell the Zond Corporation, a wind-farm operator, prior to the closing of Enron’s purchase of Portland General.
    [Show full text]
  • The Enron Failure and the State of Corporate Disclosure George Benston, Michael Bromwich, Robert E
    Following the Moneythe The Enron Failure and the State of Corporate Disclosure George Benston, Michael Bromwich, Robert E. Litan, and Alfred Wagenhofer AEI-Brookings Joint Center for Regulatory Studies 00-0890-FM 1/30/03 9:33 AM Page i Following the Money 00-0890-FM 1/30/03 9:33 AM Page iii Following the Money The Enron Failure and the State of Corporate Disclosure George Benston Michael Bromwich Robert E. Litan Alfred Wagenhofer - Washington, D.C. 00-0890-FM 1/30/03 9:33 AM Page iv Copyright © 2003 by AEI-Brookings Joint Center for Regulatory Studies, the American Enterprise Institute for Public Policy Research, Washington, D.C., and the Brookings Institution, Washington, D.C. All rights reserved. No part of this publication may be used or reproduced in any manner whatsoever without per- mission in writing from the AEI-Brookings Joint Center, except in the case of brief quotations embodied in news articles, critical articles, or reviews. Following the Money may be ordered from: Brookings Institution Press 1775 Massachusetts Avenue, N.W. Washington, D.C. 20036 Tel.: (800) 275-1447 or (202) 797-6258 Fax: (202) 797-6004 www.brookings.edu Library of Congress Cataloging-in-Publication data Following the money : the Enron failure and the state of corporate disclosure / George Benston . [et al.]. p. cm. Includes bibliographical references and index. ISBN 0-8157-0890-4 (cloth : alk. paper) 1. Disclosure in accounting—United States. 2. Corporations—United States—Accounting. 3. Corporations—United States—Auditing. 4. Accounting—Standards—United States. 5. Financial statements—United States. 6. Capital market—United States.
    [Show full text]
  • The Book of Broken Promises:$400 Billion Broadband Scandal
    THE BOOK OF BROKEN PROMISES: $400 BILLION BROADBAND SCANDAL & FREE THE NET FOR ERIC LEE, AUNT ETHEL, ARNKUSH, AND THE TEAM Author: Bruce Kushnick, Executive Director New Networks Institute February, 2015 Cover Art: Ferrari Wall Paper1, Broken Skateboard by Pr0totyp2 Disclaimer: AT&T, Verizon and CenturyLink are the progeny of the original AT&T. The AT&T logo is the property of AT&T Inc. and the use has not been authorized, sponsored by, or endorsed by the trademark owner. The Verizon logo is the property of Verizon Communications, Inc, and the use has not been authorized, sponsored by, or endorsed by the trademark owner. The CenturyLink logo is the property of CenturyLink, and the use has not been authorized, sponsored by, or endorsed by the trademark owner. All rights reserved. This book has been prepared by New Networks Institute. All rights reserved. Reproduction or further distribution of this report without written authorization is prohibited by law. For additional copies or information please contact [email protected]. © 1997, 2004, 2015 New Networks Institute The Book of Broken Promises 1 What others have said about Bruce Kushnick’s research and previous books: 3 David Cay Johnston, Recipient of the Pulitzer Prize, Author of The Fine Print, 2012 “Kushnick’s estimate comes from his meticulous analysis of disclosure document filed with the Securities and Exchange Commission and other regulatory agencies… Kushnick’s estimate might significantly understate how much extra money people paid for an electronic highway they did not get. It seems very likely that Kushnick’s numbers are uncomfortably close to the truth.” Dr.
    [Show full text]
  • Online Peer-To-Peer Payments: Paypal Primes the Pump, Will Banks Follow Carl Kaminski
    NORTH CAROLINA BANKING INSTITUTE Volume 7 | Issue 1 Article 20 2003 Online Peer-to-Peer Payments: PayPal Primes the Pump, Will Banks Follow Carl Kaminski Follow this and additional works at: http://scholarship.law.unc.edu/ncbi Part of the Banking and Finance Law Commons Recommended Citation Carl Kaminski, Online Peer-to-Peer Payments: PayPal Primes the Pump, Will Banks Follow, 7 N.C. Banking Inst. 375 (2003). Available at: http://scholarship.law.unc.edu/ncbi/vol7/iss1/20 This Comments is brought to you for free and open access by Carolina Law Scholarship Repository. It has been accepted for inclusion in North Carolina Banking Institute by an authorized administrator of Carolina Law Scholarship Repository. For more information, please contact [email protected]. Online Peer-to-Peer Payments: PayPal Primes the Pump, Will Banks Follow? As more businesses and individuals turn to the Internet to buy and sell goods, new peer-to-peer payment systems have developed to make these transactions possible.' A peer-to-peer payment system allows one person or entity to transfer money to another.2 The most common of these payment systems are checks and credit cards, but the growth of Internet commerce and the unique demands of the online marketplace have spurred the development of new Internet payment systems.3 While credit cards are useful for making purchases from online merchants, individuals and many small businesses cannot accept credit card payments.4 Checks are not useful in the online market place where buyers and sellers are often unable to determine the reliability or even the identity of each other.5 Making payments by check often causes delays as shipments are held up until a check clears.6 Therefore, a market for Internet peer-to-peer payment systems that are convenient, fast, reliable and safe has emerged Online auctions, in particular the online auction giant eBay, Inc.
    [Show full text]
  • Letter from Berlin
    LETTER FROM BERLIN THE MAGAZINE OF INTERNATIONAL ECONOMIC POLICY 220 I Street, N.E., Suite 200 Washington, D.C. 20002 202-861-0791 www.international-economy.com Germany’s Wirecard Scandal [email protected] The largest accounting fraud in the country’s postwar history. By Klaus C. Engelen hile all of Europe broke. When Wirecard took the place of the world electronic payment trans- is still in the grip of Commerzbank AG in the DAX in action services as well as the issuing of of the worst pan- September 2018, the fintech’s shares physical cards. According to its promo- demic in a century, were worth about €20 billion. tion material, Wirecard authorized and Germany’s po- Its Austrian CEO Markus Braun, processed payments for about 280,000 Wlitical and financial establishments are who owned 7 percent of Wirecard, merchants, issued credit and prepaid also haunted by the Wirecard AG scan- was a billionaire. Now Braun (51) is in cards, and provided technology for dal. It is turning out to be the largest detention awaiting trial with two other contactless smartphone payments. case of accounting fraud in the coun- company executives. His second-in- Clients included German discounters try’s post-war history. The sad story is command Jan Marsalek (40), also Aldi and Lidl as well as nearly one that most of the political and financial Austrian, who was in charge of the hundred airlines. Since January 2006, establishments at all levels aided and company’s Asian business, has van- the group included a bank with a full abetted the mega-fraud.
    [Show full text]
  • Enron Scandal: the Fall of a Wall Street Darling
    Enron Scandal: The Fall of a Wall Street Darling The story of Enron Corp. is the story of a company that reached dramatic heights, only to face a dizzying fall. Its collapse affected thousands of employees and shook Wall Street to its core. At Enron's peak, its shares were worth $90.75; when it declared bankruptcy on December 2, 2001, they were trading at $0.26. To this day, many wonder how such a powerful business, at the time one of the largest companies in the U.S, disintegrated almost overnight and how it managed to fool the regulators with fake holdings and off-the-books accounting for so long. Enron's Energy Origins Enron was formed in 1985, following a merger between Houston Natural Gas Co. and Omaha-based InterNorth Inc. Following the merger, Kenneth Lay, who had been the chief executive officer (CEO) of Houston Natural Gas, became Enron's CEO and chairman and quickly rebranded Enron into an energy trader and supplier. Deregulation of the energy markets allowed companies to place bets on future prices, and Enron was poised to take advantage. In 1990, Lay created the Enron Finance Corp. To head it, he appointed Jeffrey Skilling, whose work as a McKinsey & Co consultant had impressed Lay. Skilling was at the time one of the youngest partners at McKinsey. Why Enron Collapsed Skilling joined Enron at an auspicious time. The era's regulatory environment allowed Enron to flourish. At the end of the 1990s, the dot-com bubble was in full swing, and the Nasdaq hit 5,000.
    [Show full text]
  • SOX 404 Dashboard Year 6 Update
    SOX 404 Dashboard Year 6 Update October 2010 Mark Cheffers, CPA, ABV, CEO [email protected] 508.476.7007 x223 Don Whalen, Esq. Research Director [email protected] 508.476.7007 x222 Maggie Thrun, Research Analyst [email protected] 508.476.7007 x236 ® Audit Analytics October 2010 Table of Contents Page A. Summary x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x 1 B. Introduction x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x 2 C. SOX 404 Requirement History: The Staggered and Two-Tiered Implementation of SOX 404 x 2 D. Executive Summary – Year 6 Section SOX 404 Update x x x x x x x x x x x x x x x 3 E. SOX 404 Year 5 Tables x x x x x x x x x x x x x x x x x x x x x x x x x x x 7 F. SOX 404 Year 6 Tables (Partial Year Data)x x x x x x x x x x x x x x x x x x x x 20 G. Definitions for the Internal Control Issuesx x x x x x x x x x x x x x x x x x x x x 33 H. Definitions for the GAAP/Accounting Areas of Failure x x x x x x x x x x x x x x x x 35 I.
    [Show full text]
  • Ebay Toolkit
    Screw-PayPal.com Toolkit Pro 2014 ALL The Secret Information That eBay & PayPal Doesn't Want You To Know! Disclaimer The materials contained on this website are provided for general information purposes only and do not constitute legal or other professional advice on any subject matter. Our ebook does not accept any responsibility for any loss which may arise from reliance on information contained on this site. Permission is given for the downloading and temporary storage of one or more of these pages for the purpose of viewing on a personal computer. The contents of this ebook are protected by copyright under international conventions and, apart from the permission stated, the reproduction, permanent storage, or retransmission of the contents of this site is prohibited without the prior written consent of screw-paypal.com. Some links within this website may lead to other websites, including those operated and maintained by third parties. Our ebook includes these links solely as a convenience to you, and the presence of such a link does not imply a responsibility for the linked site or an endorsement of the linked site, its operator, or its contents. This ebook and its contents are provided "AS IS" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. While the information contained within the ebook is periodically updated, no guarantee is given that the information provided in this website is correct, complete, and/or up-to-date. Reproduction, distribution, republication, and/or retransmission of material contained within this ebook are prohibited unless the prior written permission of screw-paypal.com of Toolkit Pro 2014 has been obtained.
    [Show full text]
  • No. 06-20856 UNITED STATES COURT of APPEALS for the FIFTH CIRCUIT in Re ENRON CORPORATION SECURITIES, DERIVATIVE & “ERISA” LITIGATION
    No. 06-20856 UNITED STATES COURT OF APPEALS FOR THE FIFTH CIRCUIT In re ENRON CORPORATION SECURITIES, DERIVATIVE & “ERISA” LITIGATION REGENTS OF THE UNIVERSITY OF CALIFORNIA, et al., Plaintiffs-Appellees, vs. CREDIT SUISSE FIRST BOSTON (USA), INC., et al., Defendants-Appellants. On Appeal from the Southern District of Texas Newby v. Enron Corp., No. H-01-CV-3624 The Honorable Melinda Harmon APPELLEES’ BRIEF RESPONDING TO MERRILL LYNCH AND CREDIT SUISSE APPEALS LERACH COUGHLIN STOIA GELLER LERACH COUGHLIN STOIA GELLER RUDMAN & ROBBINS LLP RUDMAN & ROBBINS LLP WILLIAM S. LERACH PATRICK J. COUGHLIN DARREN J. ROBBINS 100 Pine Street, Suite 2600 HELEN J. HODGES San Francisco, CA 94111 G. PAUL HOWES Telephone: 415/288-4545 ERIC ALAN ISAACSON SPENCER A. BURKHOLZ JAMES I. JACONETTE ANNE L. BOX 655 West Broadway, Suite 1900 San Diego, CA 92101 Telephone: 619/231-1058 Lead Counsel for Appellees and for Lead Plaintiff The Regents of the University of California [Additional counsel appear on signature page.] Regents of the University of California, et al., Plaintiffs/Appellees v. Credit Suisse First Boston (USA), Inc., et al., Defendants/Appellants Fifth Circuit No. 06-20856 CERTIFICATE OF INTERESTED PERSONS/ CORPORATE DISCLOSURE STATEMENT The undersigned counsel of record certifies that the following listed persons and entities as described in the fourth sentence of Rule 28.2.1 have an interest in the outcome of this case. These representations are made in order that the judges of this Court may evaluate possible disqualification or recusal. Plaintiffs: Counsel for Plaintiffs: Lead Plaintiff: William S. Lerach, Regents of the University of California Attorney-in-Charge Darren J.
    [Show full text]
  • The Enron Corpus: Where the Email Bodies Are Buried?
    The Enron Corpus: Where the Email Bodies are Buried? Dr. David Noever Sr. Technical Fellow, PeopleTec, Inc. www.peopletec.com 4901-D Corporate Drive Huntsville, AL 35805 USA [email protected] Abstract To probe the largest public-domain email database for indicators of fraud, we apply machine learning and accomplish four investigative tasks. First, we identify persons of interest (POI), using financial records and email, and report a peak accuracy of 95.7%. Secondly, we find any publicly exposed personally identifiable information (PII) and discover 50,000 previously unreported instances. Thirdly, we automatically flag legally responsive emails as scored by human experts in the California electricity blackout lawsuit, and find a peak 99% accuracy. Finally, we track three years of primary topics and sentiment across over 10,000 unique people before, during and after the onset of the corporate crisis. Where possible, we compare accuracy against execution times for 51 algorithms and report human-interpretable business rules that can scale to vast datasets. Introduction The 2002 Enron fraud case uncovered financial deception in the world’s largest energy trading company and at the time, triggered the largest US bankruptcy and its most massive audit failure [1]. For the previous six years, Fortune magazine had named Enron “America’s most innovative company.” By 1999, Enron was brokering a quarter of all electricity and natural gas deals [1]. Piggybacking on the internet bubble, Enron devised methods to sell everything and own nothing. Problematically, the company could assign its own revenue (mark-to-market) and then bury its losses using off-book debt in shell companies (or Special Purpose Entities called Raptor and Chewco).
    [Show full text]