Best Practices in Text Analytics and Natural Language Processing
24  Everything Old Is New Again
    Marydee Ojala
    I’m entranced by old technologies being rediscovered, repurposed, and reinvented. Just think, the term artificial intelligence (AI) entered the language in 1956 and you can trace natural language processing (NLP) back to Alan Turing’s work starting in 1950…

25  Text Analytics and Natural Language Processing: Knowledge Management’s Next Frontier
    Jen Snell, Verint
    Text analytics and natural language processing are not new concepts. Most knowledge management professionals have been grappling with these technologies for years…

26  Keeping It Personal With Natural Language Processing
    Susan Kahler, SAS Institute, Inc.
    Consumers are increasingly using conversational AI devices (e.g., Amazon Echo and Google Home) and text-based communication apps (e.g., Facebook Messenger and Slack) to engage with brands and each other…

27  Data Uncertainty, Model Uncertainty, and the Perils of Overfitting
    Daniel Vasicek, Access Innovations, Inc.
    Why should you be interested in artificial intelligence (AI) and machine learning? Any classification problem where you have a good source of classified examples is a candidate for AI…

28  5 Ways Text Analytics and NLP Make Internal Search Better
    Sean Coleman, BA Insight
    Implementing AI-driven internal search can significantly impact employee productivity by improving the overall enterprise search experience. It can make internal search as easy and user friendly as internet search, ensuring personalized and relevant results…

Produced by:
KMWorld Magazine Specialty Publishing Group

For information on participating in the next white paper in the “Best Practices” series, contact:

Stephen Faig
Group Sales Director
908.795.3702
[email protected]

LaShawn Fugate
Account Executive
859.361.0667
[email protected]

Access Innovations, Inc.
6301 Indian School Rd., NE
Albuquerque, NM 87110
PH: 505.998.0800
Contact: [email protected]
Web: www.accessinn.com

BA Insight
7 Liberty Square, Floor 3
Boston, MA 02109
PH: 339.368.7234
Contact: [email protected]
Web: www.bainsight.com

SAS Institute, Inc.
100 SAS Campus Drive
Cary, NC 27513-2414, USA
PH: 1.800.727.0025
Fax: 1.919.677.4444
Web: https://www.sas.com

Verint Systems Inc.
175 Broadhollow Road, Suite 100
Melville, NY 11747
PH: 1.800.4VERINT (1.800.483.7468)
Contact: [email protected]
Web: www.verint.com

KMWorld.com

Everything Old Is New Again

By Marydee Ojala, Conference Program Director, Information Today, Inc.

Marydee Ojala is conference program director for Information Today, Inc. She works on conferences such as Enterprise Search & Discovery, which is co-located with KMWorld, and WebSearch University, among others. She is a frequent speaker at U.S. and international information professional events. In addition, she moderates the popular KMWorld webinar series. Ojala is based in Indianapolis, Indiana, and can be reached at [email protected].

I’m entranced by old technologies being rediscovered, repurposed, and reinvented. Just think, the term artificial intelligence (AI) entered the language in 1956 and you can trace natural language processing (NLP) back to Alan Turing’s work starting in 1950. Text analytics has its antecedents in data mining. Data mining itself has a long history, all the way back to Thomas Bayes, who died in 1761, and his eponymous theorem that still informs algorithms regarding inference, probability, and predictions.

Even when the concepts and the phrases are the same, the increased bandwidth and sheer computing power available today change the applications significantly. Whenever I read about the pattern-matching feats of NLP, machine learning (ML), and AI, I’m reminded of a student I knew at university. He manually counted how many times certain words appeared in a Shakespeare play. As you can imagine, this consumed an enormous amount of his time, and I’ve never been convinced that his professor thought it was worth the effort. With today’s technology, this task would take maybe a minute, maybe even less time.

Data Quality Matters

Jen Snell, VP of Product Marketing at Verint, understands that the fundamental purpose of text analytics and NLP remains constant: “They help get the right information to employees at the right time.” Sounds simple, right? It’s certainly an admirable goal, but one too often thwarted by the amount of information available. Sifting through zettabytes of data is not a task for humans; it has to be done by computers.

For computers to do this well, regardless of how robust their text analytics and NLP capabilities, the quality of the underlying data needs to be assessed and optimized. It doesn’t matter whether that data is structured or unstructured; its quality determines the success or failure of a KM project.

Snell also cautions that not all NLP engines are the same. Two approaches to using an NLP engine are statistical and symbolic. In the former, you train the system to identify patterns, generate a model, and predict word meanings based on a large data corpus. The latter uses hard-coded linguistic rules, which originate with people and are then taught to machines. Neither, according to Snell, is sufficient by itself. Regardless of how you deploy NLP, data quality makes all the difference.

Ask NLP

One major change from the early days of NLP is the increasing reliance on more informal language rather than the written word of structured letters or memos. Customers expect to interact with companies in a conversational way, either by actual phone calls or by text messages. As Susan Kahler, SAS’s Global AI Product Marketing Manager, points out, you can use NLP engines to teach and guide machines to examine these types of audio and text data.

Identifying relationships and patterns in the data puts you in a better position to meet customer needs and deliver better experiences, personalized to the individual customer. Additionally, pulling together data from all your customer communications channels and feeding it into an NLP engine gives you insights that will streamline future customer communications.

Oh See Our Data

Optical character recognition (OCR) is another of those old-time technologies that has morphed into a new world of AI. Daniel Vasicek, Senior Data Scientist, Access Innovations, Inc., sees huge improvements in OCR performance and in automatic translation from one language to another. However, real data has a degree of uncertainty that causes it to be non-conforming.

Because data isn’t always perfect (a person’s job affiliation from several years ago, for example), models built to reflect and draw inferences from that data aren’t always perfect, either. Thus, fitting data to the model requires some balancing of measurement errors with model errors. This enhances model predictions.

This may sound somewhat heretical, but Vasicek thinks that exact fits are not only no longer possible, but also not desirable. As he writes, “An exact fit to noisy data means that the model is fitting the noise.” Although you’d like to reduce the uncertainty in predictions, models should still be able to predict from noisy data even without a perfect fit. He warns, too, about overfitting, where the ML algorithms get overly specific, weeding out information that is actually germane and thus giving erroneous or misleading results.

Improving Internal Search

Internal search is one area deeply affected by text analytics, NLP, and ML. Sean Coleman, CTO and Chief Customer Officer at BA Insight, suggests adding semantic search to the list of “new” technologies. In his view, semantic search lets people “search as they speak,” creates a single unified index, and applies conceptual knowledge to search queries. This leads to higher relevance of search results.

Associated technologies for better search include content intelligence, which gives employees detailed information about files, multimedia content, and taxonomies, along with the textual content. A single index incorporating ML features autosuggestion and autocorrection. Personalization is gained by showing what other people viewed, based on location, department, or interests. Sentiment analysis, to be meaningful internally, can focus on social media inside the enterprise and on emails and other electronic communications. Coleman also mentions search bots as coming of age for internal search.

Not everything that was old is new again, and with good reason. Few people miss Clippy or rotary dial phones. AI in 1956 played checkers, and that was the extent of its “intelligence.” We’ve moved on, and I think we can all agree that the evolution of AI, NLP, ML, and text analytics benefits us both personally and professionally.

S24 KMWorld September/October 2019

Text Analytics and Natural Language Processing: Knowledge Management’s Next Frontier

By Jen Snell, Vice President, Product Marketing, Verint

Jen Snell is VP of product marketing at Verint, where she leads a product strategy team focused on intelligent self-service, conversational AI, automation, and analytics. She is a frequent speaker and a leading contributor on topics shaping the development and design of interactive technologies.

linguistic rules that are developed by people and taught to machines. At a high level, symbolic NLP seeks to teach the meaning of words to the machine, while statistical seeks to predict appropriate responses to inputs based

Of course, with both text analytics and

Text analytics and natural language processing are not new concepts.