Deep Text Is an Approach to Text Analytics That Adds Depth and Intelligence to Our Ability to Utilize a Growing Mass of Unstructured Text
Total Page:16
File Type:pdf, Size:1020Kb
DEEP “One of the most thorough walkthroughs of text analytics ever provided.” —Jeff Catlin, CEO, Lexalytics TEXT Deep text is an approach to text analytics that adds depth and intelligence to our ability to utilize a growing mass of unstructured text. In this book, author Tom Reamy explains what deep text is and surveys its many uses and benefits. DEEP Reamy describes applications and development best practices, discusses business issues including ROI, provides how-to advice and instruction, and offers Data to Big Big(ger) Text and Add Media, Social From Get RealValue Overload, to Conquer Information USING TEXT ANALYTICS guidance on selecting software and building a text analytics capability within an organization. Whether you need to harness a flood of social media content or turn a mountain of business information into an organized and useful asset, Deep Text will supply the TEXTUSING TEXT ANALYTICS insights and examples you’ll need to do it effectively. to Conquer Information Overload, Get Real Value From Social Media, “Sheds light on all facets of text analytics. Comprehensive, entertaining and Add Big(ger) Text to Big Data and enlightening.” —Fiona R. McNeill, Global Product Marketing Manager, SAS “Reamy takes the text analytics bull by the horns and gives it the time and exposure it deserves.” —Bryan Bell, VP of Global Marketing, Expert System REAMY “I highly recommend Deep Text as required reading for those whose work involves any form of unstructured content.” —Jim Wessely, President, Advanced Document Sciences $59.50 TOM REAMY DEEP TEXT Praise for Deep Text “Remarkably useful—a must-read for anyone trying to understand text analytics and how to apply it in the real world.” —Jeff Fried, CTO, BA Insight “A much-needed publication around the largely misunderstood field of text analytics … I highly recommend Deep Text as required reading for those whose work involves any form of unstructured content.” —Jim Wessely, President, Advanced Document Sciences “Sheds light on all facets of text analytics. Comprehensive, entertaining, and enlightening. Reamy brings together the philosophy, value, and virtue of harnessing text data, making this volume a welcome addition to any professional’s library.” —Fiona R. McNeill, Global Product Marketing Manager, Cloud & Platform Technology, SAS “One of the most thorough walkthroughs of text analytics ever provided.” —Jeff Catlin, CEO Lexalytics “Reamy takes the text analytics bull by the horns and gives it the time and exposure it deserves. A detailed explanation of the complex challenges, the various industry approaches, and the variety of options available to move forward.” —Bryan Bell, Executive VP, Market Development, Expert System “Written in a breezy style, Deep Text is filled with advice on the role of text analytics within the enterprise, from information architecture to inter- face. Practitioners differ on questions of learning systems vs. rule-based systems, types of categorizers, or the need for relationship extraction, but the precepts in Tom Reamy’s book are worth exploring regardless of your philosophical bent.” —Sue Feldman, CEO, Synthexis “A must read for anybody in the text analytics business. A lifetime’s worth of knowledge and experience bottled up for us to drink at our own pace. Enjoy!” —Jeremy Bentley, CEO, Smartlogic DEEP TEXT USING TEXT ANALYTICS to Conquer Information Overload, Get Real Value From Social Media, and Add Big(ger) Text to Big Data TOM REAMY Medford, New Jersey First Printing Deep Text: Using Text Analytics to Conquer Information Overload, Get Real Value From Social Media, and Add Big(ger) Text to Big Data Copyright © 2016 by Tom Reamy All rights reserved. No part of this book may be reproduced in any form or by any electronic or mechanical means, including information storage and retrieval systems, without permission in writing from the publisher, except by a reviewer, who may quote brief passages in a review. Published by Information Today, Inc., 143 Old Marlton Pike, Medford, New Jersey 08055. Publisher’s Note: The author and publisher have taken care in the preparation of this book but make no expressed or implied warranty of any kind and assume no respon- sibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. Many of the designations used by manufacturers and sellers to distinguish their prod- ucts are claimed as trademarks. Where those designations appear in this book and Information Today, Inc. was aware of a trademark claim, the designations have been printed with initial capital letters. Library of Congress Cataloging-in-Publication Data Names: Reamy, Tom. Title: Deep text : using text analytics to conquer information overload, get real value from social media, and add big(ger) text to big data / by Tom Reamy. Description: Medford, New Jersey : Information Today, Inc., 2016. Identifiers: LCCN 2016010975 | ISBN 9781573875295 Subjects: LCSH: Data mining. | Text processing (Computer science) | Text files—Analysis. | Big data. | Social media. | Electronic data processing. Classification: LCC QA76.9.D343 R422 2016 | DDC 006.3/12—dc23 LC record available at https://lccn.loc.gov/2016010975 Printed and bound in the United States of America President and CEO: Thomas H. Hogan, Sr. Editor-in-Chief and Publisher: John B. Bryans Project Editor: Alison Lorraine Production Manager: Tiffany Chamenko Production Coordinator: Johanna Hiegl Marketing Coordinator: Rob Colding Indexer: Nan Badgett Interior Design by Amnet Systems Cover Design by Denise Erickson infotoday.com Contents Tables and Figures x Foreword, by Patrick Lambe xiii Acknowledgments xv Introduction 1 Baffling, Isn’t It? ...........................................2 Text Analytics and Text Mining ...............................2 What Is Text Analytics? ......................................5 Is Deep Learning the Answer? . 6 Text Analytics, Deep Text, and Context .........................7 Who Am I? ...............................................8 Plan of the Book ..........................................10 PART 1 TEXT ANALYTICS BASICS Chapter 1 What Is Text Analytics? 21 And Why Should You Care?..................................21 What Is Text Analytics? .....................................21 Text Analytics Applications ..................................32 Chapter 2 Text Analytics Functionality 37 Text Mining ..............................................38 Entity (etc.) Extraction .....................................41 v vi Deep Text Summarization ............................................46 Sentiment Analysis // Social Media Analysis . 48 Auto-Categorization / Auto-Classification . 50 Supplemental Auto-Functions ................................56 Chapter 3 The Business Value of Text Analytics 57 The Basic Business Logic of Text Analytics .....................58 Benefits in Three Major Areas ................................59 Why Isn’t Everyone Doing It? ...............................70 Selling the Benefits of Text Analytics ..........................73 Summary ................................................77 PART 2 GETTING StartED IN TEXT ANALYTICS Chapter 4 Current State of Text Analytics Software 83 A Short (Personal) History of Text Analytics . 83 Current State of Text Analytics ...............................87 Current Trends in Text Analytics .............................101 Text Analytics Arrives .....................................106 Chapter 5 Text Analytics Smart Start 107 Getting Started with Text Analytics ..........................107 Smart Start Methodology ..................................111 Design of the Text Analytics Team ...........................116 Summary ...............................................119 Chapter 6 Text Analytics Software Evaluation 123 The Software Evaluation Process ............................123 Text Analytics Evaluation: Stage One .........................126 Text Analytics Evaluation: Stage Two—Proof of Concept (POC) ..........................................131 Example Proof of Concept / Pilot Projects .....................137 General POC / Pilot Observations and Issues ...................143 Summary ...............................................146 Contents vii PART 3 TEXT ANALYTICS DEVELOPMENT Chapter 7 Enterprise Development 151 Categorization Development Phases ..........................151 Preliminary Phase ........................................152 Development Process .....................................156 Testing: Relevance Ranking and Scoring ......................167 Maintenance / Governance: All is Flux ........................171 Entity / Noun Phrase Extraction .............................177 Chapter 8 Social Media Development 185 Sentiment Analysis Development ............................187 Sentiment Project: Loving / Hating Your Phone .................188 Social Media Project ......................................197 Issues, We’ve Got Lots and Lots of Issues . .202 Chapter 9 Development: Best Practices and Case Studies 207 Achieving New Inx(s)ights into the News . .207 A Tale of Two Taxonomies .................................214 Success = Multiple Projects + Approaches + Deep Integration .....215 Failure = Wrong Approach / Mindset + Lack of Integration ........220 PART 4 TEXT ANALYTICS APPLicatiONS Chapter 10 Text Analytics Applications—Search 229 The Enterprise Search Dance . .230 Essentials of Faceted Navigation .............................236 Advantages of Faceted Navigation