Intelligent Document Processing (IDP) State of the Market Report 2021 – Key to Unlocking Value in Documents June 2021: Complimentary Abstract / Table of Contents

Total pages: 16

File type: PDF, Size: 1020Kb

Service Optimization Technologies
State of the Market Report

Copyright © 2021 Everest Global, Inc. | EGR-2021-38-CA-4432
We encourage you to share these materials internally within your company and its affiliates. In accordance with the license granted, however, sharing these materials outside of your organization in any form (electronic, written, or verbal) is prohibited unless you obtain the express, prior, and written consent of Everest Global, Inc. It is your organization’s responsibility to maintain the confidentiality of these materials in accordance with your license of them.

Our research offerings

This report is included in the following research program(s): Service Optimization Technologies

► Application Services
► Banking & Financial Services BPS
► Banking & Financial Services ITS
► Catalyst™
► Clinical Development Technology
► Cloud & Infrastructure
► Conversational AI
► Contingent Workforce Management
► Cost Excellence
► Customer Experience Management Services
► Cybersecurity
► Data & Analytics
► Digital Adoption Platforms (DAP)
► Digital Services
► Engineering Services
► Enterprise Platform Services
► Finance & Accounting
► Financial Services Technology (FinTech)
► Global Business Services
► Healthcare BPS
► Healthcare ITS
► Human Resources
► Insurance BPS
► Insurance ITS
► Insurance Technology (InsurTech)
► Insurance Third-Party Administration (TPA) Services
► Intelligent Document Processing (IDP)
► Interactive Experience (IX) Services
► IT Services Executive Insights™
► Life Sciences BPS
► Life Sciences ITS
► Locations Insider™
► Market Vista™
► Mortgage Operations
► Multi-country Payroll
► Network Services & 5G
► Outsourcing Excellence
► Pricing-as-a-Service
► Process Mining
► Procurement
► Recruitment Process Outsourcing
► Rewards & Recognition
► Service Optimization Technologies
► Supply Chain Management (SCM) Services
► Talent Excellence GBS
► Talent Excellence ITS
► Workplace Services
► Work at Home Agent (WAHA) Customer Experience Management (CXM)

If you want to learn whether your organization has a membership agreement or request information on pricing and membership options, please contact us at [email protected]

Learn more about our custom research capabilities:
– Benchmarking
– Contract assessment
– Peer analysis
– Market intelligence
– Tracking: service providers, locations, risk, technologies
– Locations: costs, skills, sustainability, portfolios

Proprietary & Confidential. © 2021, Everest Global, Inc. | EGR-2021-38-CA-4432

Contents

1. Introduction and overview
   – Research methodology
   – Key information on the report
   – Background and/or introduction
   – IDP supplier landscape
2. Summary of key messages
3. Introduction to IDP
4. IDP market size and adoption trends
   – IDP market size
   – IDP adoption trends by industry
   – IDP adoption trends by geography
   – IDP adoption trends by process area
   – IDP adoption trends by buyer size
5. Buyer expectations
   – Drivers for IDP adoption
   – Overall buyer satisfaction
   – Buyer satisfaction with vendor capabilities
   – Key strengths and areas of improvement
6. IDP product capabilities and trends
   – Key IDP capabilities and features
   – Languages supported and adoption trends
   – IDP product capabilities – future trends
   – Key implications for enterprises
7. IDP solution characteristics
   – Deployment model and hosting options
   – Client training and support
   – Talent
   – Partner ecosystem
   – Go-to-market strategy
   – Commercial model
8. IDP vendor landscape
   – Overall market share by revenue
   – Vendors with the largest share of IDP license revenue in the top industry verticals
   – Vendors with the largest share of IDP license revenue in the top business functions
   – Vendors’ share of IDP license revenue in major geographies
   – Key investment themes by IDP vendors
9. Challenges and best practices
10. Appendix
   – Glossary
   – Research calendar

For more information on this and other research published by Everest Group, please contact us:
Anil Vijayan, Vice President
Ashwin Gopakumar, Practice Director
Samikshya Meher, Senior Analyst
Utkarsh Shahdeo, Senior Analyst
Shiven Mittal, Senior Analyst

Background of the research

Everest Group defines Intelligent Document Processing (IDP) as any software product or solution that captures data from documents (e.g., email, text, PDF, and scanned documents), categorizes it, and extracts relevant data for further processing using AI technologies such as computer vision, Optical Character Recognition (OCR), Natural Language Processing (NLP), and machine/deep learning. These solutions are typically non-invasive and can be integrated with internal applications, systems, and other automation platforms.

IDP products find a wide variety of use cases from different business functions and verticals. Adoption of IDP solutions can not only help enterprises achieve cost savings, but also improve their workforce productivity and employee & customer experience. These products are also rapidly evolving in the sophistication of their capabilities, features, and functionalities.

In this study, we assess IDP software products in the market that leverage AI/cognitive capabilities and are available for independent licensing. They are offered as either platforms that allow enterprises to deploy as out-of-the-box solutions using pre-built modules, or custom solutions to the buyers with the intent of classifying and extracting data from documents.

In this study, we analyze the IDP technology landscape across various dimensions:
– IDP market size and adoption trends
– Buyer expectations
– IDP product capabilities and trends
– IDP solution characteristics
– IDP vendor landscape
– Challenges to IDP adoption
– Outlook for 2020-21

Scope of this report:
– Geography: Global
– Products: Intelligent Document Processing (IDP)
– Technology vendors: 27 leading IDP technology vendors

Our research methodology is based on four pillars of strength to produce actionable and insightful research for the industry

01 Robust definitions and frameworks: Function-specific pyramids, Total Value Equation, PEAK Matrix®, and market maturity
02 Primary sources of information: Annual contractual and operational RFIs, service provider briefings and buyer interviews, web-based surveys
03 Diverse set of market touchpoints: Ongoing interactions across key stakeholders, input from a mix of perspectives and interests, supports both data analysis and thought leadership
04 Fact-based research: Data-driven analysis with expert perspectives, trend-analysis across market adoption, contracting, and service providers

– Proprietary database on Intelligent Document Processing (IDP) capabilities of 27 technology vendors
– Repository of existing research in IDP
– Dedicated team for IDP research
– Executive-level relationships with buyers, service providers, technology providers, and industry associations

Everest Group’s SOT research is based on multiple sources of proprietary information

Vendors assessed

Proprietary database of 27 IDP technology vendors
The database tracks the vendors’ offering/capabilities for:
– Document processing and software learning features
– Product-related training and support services
– Interoperability, monitoring, and improvement features
– Availability and adoption of commercial model(s)
– Deployment and hosting options
– IT governance and security
– Partnerships with service providers and other technology vendors

Proprietary operational information database of technology vendors (updated annually)
The database tracks the following operational information for each vendor:
– Revenue and number of FTEs
– Number of clients
– FTE split by different lines of business
– Portfolio coverage in terms of industry, geography, process areas, and buyer size

Demonstrations and interactions with technology vendors and other industry stakeholders
Detailed demos for a comprehensive product view and executive-level discussions with IDP vendors that cover:
– Current state of the market
– Opportunities and challenges
– Vision and strategy
– Emerging areas of investment
– Annual performance and outlook

Buyer reference interviews, ongoing buyer surveys, and interactions
Interviews with technology vendors’ reference clients and enterprise IDP buyers to get the buyer perspective around:
– Drivers and objectives for adopting IDP
– Emerging priorities / buying criteria
– Apprehensions and challenges
– Outcomes achieved
– Assessment of vendors’ performance
– Lessons learnt and best practices

The source of all content is Everest Group unless otherwise specified

Confidentiality: Everest Group takes its confidentiality pledge very seriously. Any information we collect that is contract specific will only be presented back to the industry in an aggregated fashion
Recommended publications
  • Your Intelligent Digital Workforce: How RPA and Cognitive Document
    Work Like Tomorrow. YOUR INTELLIGENT DIGITAL WORKFORCE: HOW RPA AND COGNITIVE DOCUMENT AUTOMATION DELIVER THE PROMISE OF DIGITAL BUSINESS
    CONTENTS
    – Maximizing the Value of Data
    – Data Driven, Document Driven
    – Capture Your Content, Capture Control
    – Kofax Knows Capture: An Innovation Timeline
    – Artificial Intelligence: The Foundation of CDA
    – AI: Context is Everything
    – AI: Learning by Doing
    – Connecting Paper, People and Processes: RPA + CDA
    – RPA + CDA: Driving ROI Across Many Industries
    – 6 Business Benefits of RPA + CDA
    – All CDA Solutions Are Not Created Equal
    – Additional Resources
    MAXIMIZING THE VALUE OF DATA
    Data drives modern business. This isn’t surprising when you consider that 90 percent of the world’s data has been created in the last two years alone. The question is: how can you make this profusion of data work for your business and not against it? Enter
  • Optical Image Scanners and Character Recognition Devices: a Survey and New Taxonomy
    OPTICAL IMAGE SCANNERS AND CHARACTER RECOGNITION DEVICES: A SURVEY AND NEW TAXONOMY
    Amar Gupta, Sanjay Hazarika, Maher Kallel, Pankaj Srivastava
    Working Paper #3081-89, Massachusetts Institute of Technology, Cambridge, MA 02139
    ABSTRACT
    Image scanning and character recognition technologies have matured to the point where these technologies deserve serious consideration for significant improvements in a diverse range of traditionally paper-oriented applications, in areas ranging from banking and insurance to engineering and manufacturing. Because of the rapid evolution of various underlying technologies, existing techniques for classifying and evaluating alternative concepts and products have become largely irrelevant. A new taxonomy for classifying image scanners and optical recognition devices is presented in this paper. This taxonomy is based on the characteristics of the input material, rather than on speed, technology or application domain.
    1. INTRODUCTION
    The concept of automated transfer of information from paper documents to computer-accessible media dates back to 1954 when the first Optical Character Recognition (OCR) device was introduced by Intelligent Machines Research Corporation [1]. By 1970, approximately 1000 readers were in use and the volume of sales had grown to one hundred million dollars per annum [3]. In spite of these early developments, through the seventies and early eighties scanning technology was utilized only in highly specialized applications. The lack of popularity of automated reading systems stemmed from the fact that commercially available systems were unable to handle documents as prepared for human use. The constraints placed by such systems served as barriers, severely limiting their applicability. In 1982, Ullmann [2] observed: "A more plausible view is that in the area of character recognition some vital computational principles have not yet been discovered or at least have not been fully mastered.
  • Shreddr: Pipelined Paper Digitization for Low-Resource Organizations
    Shreddr: pipelined paper digitization for low-resource organizations
    Kuang Chen (Dept. of EECS, UC Berkeley), Akshay Kannan (Captricity, Inc.), Yoriyasu Yano (Captricity, Inc.), Joseph M. Hellerstein (Dept. of EECS, UC Berkeley), Tapan S. Parikh (School of Information, UC Berkeley)
    ABSTRACT
    For low-resource organizations working in developing regions, infrastructure and capacity for data collection have not kept pace with the increasing demand for accurate and timely data. Despite continued emphasis and investment, many data collection efforts still suffer from delays, inefficiency and difficulties maintaining quality. Data is often still “stuck” on paper forms, making it unavailable for decision-makers and operational staff. We apply techniques from computer vision, database systems and machine learning, and leverage new infrastructure – online workers and mobile connectivity – to redesign
    [...] able remote agents to directly enter information at the point of service, replacing data entry clerks and providing near immediate data availability. However, mobile direct entry usually replaces existing paper-based workflows, creating significant training and infrastructure challenges. As a result, going “paperless” is not an option for many organizations [16]. Paper remains the time-tested and preferred data capture medium for many situations, for the following reasons:
    • Resource limitations: lack of capital, stable electricity, IT-
  • Document and Form Processing Automation with Document AI Using Machine Learning to Automate Document and Form Processing ___
    Document and Form Processing Automation with Document AI: Using Machine Learning to Automate Document and Form Processing
    Unleash the value of your unstructured document data or speed up manual document processing tasks with the help of Google AI. We will help you build an end-to-end, production-capable document processing solution with Google’s industry-leading Document AI tools, customized to your case.
    Business Challenge
    Most business transactions begin, involve, or end with a document. However, approximately 80% of enterprise data is unstructured which historically has made it expensive and difficult to harness that data. The inability to understand unstructured data can decrease operational efficiency, impact decision making, and even increase compliance costs. Decision makers today need the ability to quickly and cost effectively process and make use of their rapidly growing unstructured datasets.
    – Document Workflows: RPA vendors estimate that ~50% of their workflows begin with a document
    – Unstructured Data: Approximately 80% of enterprise data is unstructured including machine and human generated
    – Free Form Text: 70% is free-form text such as written documents and emails
    Solution Overview
    Google’s mission is “to organize the world's information and make it universally accessible and useful”. This has led Google to create a comprehensive set of technologies to read (Optical Character Recognition), understand (Natural Language Processing) and make useful (data warehousing, analytics, and visualization) documents, forms, and handwritten text. Google’s Document AI technologies provide OCR (optical character recognition) capabilities that deliver unprecedented accuracy by leveraging advanced deep-learning neural network algorithms. Document AI has support for 200 languages and handwriting recognition of 50 languages.
  • CNN-Based Page Segmentation and Object Classification for Counting
    Journal of Imaging Article CNN-Based Page Segmentation and Object Classification for Counting Population in Ottoman Archival Documentation Yekta Said Can * and M. Erdem Kabadayı College of Social Sciences and Humanities, Koc University, Rumelifeneri Yolu, 34450 Sarıyer, Istanbul, Turkey; [email protected] * Correspondence: [email protected] Received: 31 March 2020; Accepted: 11 May 2020; Published: 14 May 2020 Abstract: Historical document analysis systems gain importance with the increasing efforts in the digitalization of archives. Page segmentation and layout analysis are crucial steps for such systems. Errors in these steps will affect the outcome of handwritten text recognition and Optical Character Recognition (OCR) methods, which increase the importance of the page segmentation and layout analysis. Degradation of documents, digitization errors, and varying layout styles are the issues that complicate the segmentation of historical documents. The properties of Arabic scripts such as connected letters, ligatures, diacritics, and different writing styles make it even more challenging to process Arabic script historical documents. In this study, we developed an automatic system for counting registered individuals and assigning them to populated places by using a CNN-based architecture. To evaluate the performance of our system, we created a labeled dataset of registers obtained from the first wave of population registers of the Ottoman Empire held between the 1840s and 1860s. We achieved promising results for classifying different types of objects and counting the individuals and assigning them to populated places. Keywords: page segmentation; historical document analysis; convolutional neural networks; Arabic script layout analysis 1. Introduction Historical documents are valuable cultural resources that provide the examination of the historical, social, and economic aspects of the past.
  • Pre-Training of Text and Layout for Document Image Understanding
    LayoutLM: Pre-training of Text and Layout for Document Image Understanding
    Yiheng Xu∗ (Harbin Institute of Technology), Minghao Li∗ (Beihang University), Lei Cui (Microsoft Research Asia), Shaohan Huang (Microsoft Research Asia), Furu Wei (Microsoft Research Asia), Ming Zhou (Microsoft Research Asia)
    ABSTRACT
    Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words’ visual information into LayoutLM.
    1 INTRODUCTION
    Document AI, or Document Intelligence, is a relatively new research topic that refers to techniques for automatically reading, understanding, and analyzing business documents. Business documents are files that provide details related to a company’s internal and external transactions, which are shown in Figure 1. They may be digital-born, occurring as electronic files, or they may be in scanned form that comes from written or printed on paper. Some common examples of business documents include purchase orders, financial reports, business emails, sales agreements, vendor contracts, letters, invoices, receipts, resumes, and many others. Business documents
  • Intelligent Document Processing (IDP) – Technology Vendor Landscape with Products PEAK Matrix® Assessment 2021
    Intelligent Document Processing (IDP) – Technology Vendor Landscape with Products PEAK Matrix® Assessment 2021
    May 2021
    Copyright © 2021 Everest Global, Inc. This document has been licensed for exclusive use and distribution by IBM
    Contents
    1. Introduction and overview
       – Research methodology
       – Background of the research
       – Scope of the research
    2. Summary of key messages
    3. Overview of IDP software products
       – Understanding enterprise grade IDP solutions
       – OCR vs. IDP
       – Drivers of IDP Solution
       – Types of IDP solution
       – Partner ecosystem
    4. IDP Product PEAK Matrix® characteristics
       – PEAK Matrix positions – summary
       – PEAK Matrix framework
       – Everest Group PEAK Matrix for IDP
       – Characteristics of Leaders, Major Contenders, and Aspirants
       – Technology vendors’ capability summary dashboard
    5. IDP market – competitive landscape
    6. Profiles of 27 technology vendors
       Leaders
       – ABBYY
       – AntWorks
       – Automation Anywhere
       – IBM
       – Kofax
       – WorkFusion
       Major Contenders
       – BIS
       – Celaton
       – Datamatics
       – EdgeVerve
       – Evolution AI
       – HCL Technologies
       – Hypatos
       – Hyperscience
       – Indico
       – Infrrd
       – JIFFY.ai
       – Nividous
       – Parascript
       – Rossum
       – Singularity Systems
       – UST SmartOps
       Aspirants
       – GuardX
       – i3systems
       – qBotica
       – SortSpoke
       – TAIGER
    7.
    For more information on this and other research published by Everest Group, please contact us:
    Anil Vijayan, Vice President
    Ashwin Gopakumar, Practice Director
    Samikshya Meher, Senior Analyst
    Shiven Mittal, Senior Analyst
    Utkarsh Shahdeo, Senior Analyst
    Proprietary & Confidential. © 2021, Everest Global, Inc. | This document has been licensed for exclusive use and distribution by IBM
  • Capturing Data Intelligently
    Capturing data intelligently: AN EASY WAY OF ENSURING COMPLIANCE WHEN COMMUNICATING WITH YOUR CUSTOMER
    White paper. Capturing data intelligently, A Docbyte whitepaper
    TABLE OF CONTENT
    1. THE UNSTRUCTURED DATA AND PAPER CONUNDRUM
       1.1. Unfortunately, paper often reigns supreme…
       1.2. Why are companies still using paper?
       1.3. The other side of the problem: unstructured information
    2. CONQUERING THE PAPER MOUNTAIN AND DIGITALLY UNSTRUCTURED WITH CAPTURE TECHNOLOGY
       2.1. What to look for in a capture product
    3. INTELLIGENTLY CAPTURING YOUR CONTENT
       3.1. What is intelligent capture?
       3.2. Why you should capture content intelligently
       3.3. Compliance without the headaches
       3.4. Two types of capture
       3.5. The power of mobile intelligent capture
    4. INCREASING YOUR INTELLIGENT CAPTURE SOFTWARE’S RELIABILITY
       4.1. Natural language processing
       4.2. Liveness detection and face recognition
       4.3. Pattern recognition through machine learning
    5. STRUCTURING THE UNSTRUCTURED
       5.1. Why don’t we just structure all incoming information?
       5.2. Getting the help of customers and partners thanks to upload portals
    6. ROBOTS VERSUS ALGORITHMS: IS RPA ENOUGH?
       6.1. Reaching the limits
       6.2. Machine learning to the rescue
    7. ONCE YOU GO INTELLIGENT CAPTURE, YOU NEVER GO BACK
    1. THE UNSTRUCTURED DATA AND PAPER CONUNDRUM
    Capturing customer data from incoming mail or messaging channels and for account creation is crucial in many businesses. Yet surprisingly, quite a lot of data is still captured on paper or is being sent by and to companies in an unorganized manner.
  • Historical Document Processing
    Historical Document Processing: A Survey of Techniques, Tools, and Trends James P. Philips1*, Nasseh Tabrizi1 1 East Carolina University, United States of America *Corresponding author: James P. Philips [email protected] Abstract Historical Document Processing is the process of digitizing written material from the past for future use by historians and other scholars. It incorporates algorithms and software tools from various subfields of computer science, including computer vision, document analysis and recognition, natural language processing, and machine learning, to convert images of ancient manuscripts, letters, diaries, and early printed texts automatically into a digital format usable in data mining and information retrieval systems. Within the past twenty years, as libraries, museums, and other cultural heritage institutions have scanned an increasing volume of their historical document archives, the need to transcribe the full text from these collections has become acute. Since Historical Document Processing encompasses multiple sub-domains of computer science, knowledge relevant to its purpose is scattered across numerous journals and conference proceedings. This paper surveys the major phases of, standard algorithms, tools, and datasets in the field of Historical Document Processing, discusses the results of a literature review, and finally suggests directions for further research. keywords historical document processing, archival data, handwriting recognition, OCR, digital humanities INTRODUCTION Historical Document Processing is the process of digitizing written and printed material from the past for future use by historians. Digitizing historical documents preserves them by ensuring a digital version will persist even if the original document is destroyed or damaged. Moreover, since an extensive number of historical documents reside in libraries and other archives, access to them is often hindered.
  • Cognitive Document Processing
    Capgemini’s Cognitive Document Processing
    A new platform harnesses cognitive capabilities such as artificial intelligence and machine learning to ease the burden of processing documents and extracting data from them. It can reduce costs, improve customer experience, and help to ensure regulatory compliance.
    Over the past few years, the financial services industry has experienced a steep rise in the volume of digital documents it has to deal with. These include a wide range of items spanning application or claim forms, checks, passports, bills, and many others, and can be in a range of formats such as JPG, PNG, PDF, and HTML.
    Processing these items and extracting data from them is labor intensive and costly. It involves complex operations that can easily go wrong, with the risk of damage to the business of the financial institution (FI) and its customers. Problems with document processing, such as lost documents and missing signatures, can also lead to regulatory breaches,
    How the solution works
    Let’s consider a typical scenario. The bank sends scanned or handwritten documents to the platform; these may have been uploaded by customers themselves. At this point, the solution executes OCR or ICR services on the documents to extract information in text format. OCR is used to transform scanned images into machine-encoded text; ICR performs similar functions, typically where handwritten documents need to be identified – for example, to carry out signature recognition or validation. Facial detection is used to extract ID photos if these are included (for example, in a passport); the photos can then be used for profile verification.
  • Accelerate Your Digital Transformation with Intelligent Document Processing
    ACCELERATE YOUR DIGITAL TRANSFORMATION WITH INTELLIGENT DOCUMENT PROCESSING
    Not seen the return on your digital transformation efforts? Or struggling to get digital transformation projects off the ground? This eBook offers practical and pragmatic advice on how Sypht’s simple, smart and scalable intelligent document processing can help super-charge your digital transformation journey.
    Contents
    1. Introduction
    2. How Sypht can help
    3. The journey
    4. Sypht in action
    5. Future-proofing your business
    6. Summary
    1. Introduction
    How we interpret data can sometimes be a matter of life or death. As World War II unfolded, the US Air Force had a problem. American planes needed armour to protect them in combat, but too much weighed the planes down, making them less maneuverable. Fortunately, the Air Force had the data to solve the problem. The planes that returned from combat were riddled with bullet holes – and the damage was far from uniform. If they simply concentrated the armour around the areas where the planes were being hit, it would make them safer and lighter. But how much more armour should they use – and where?
    Section of plane     Bullet holes per square foot
    Engine               1.11
    Fuselage             1.73
    Fuel system          1.55
    Rest of plane        1.8
    For that, they turned to Columbia University’s Statistical Research Group, which provided a surprising answer. The armour doesn’t go where the bullet holes are, it goes where they aren’t: the engines. The reason? Planes that were hit in the engines weren’t coming back to base at all[1].
  • Unfolding the Structure of a Document Using Deep Learning
    Unfolding the Structure of a Document using Deep Learning
    Muhammad Mahbubur Rahman and Tim Finin
    Abstract: Understanding and extracting of information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be multi-themed, complex, noisy and cover diverse topics. We describe a framework that can analyze large documents and help people and computer systems locate desired information in them. We aim to automatically identify and classify different sections of documents and understand their purpose within the document. A key contribution of our research is modeling and extracting the logical and semantic structure of electronic documents using deep learning techniques. We evaluate the effectiveness and robustness of our framework through extensive experiments on two collections: more than one million scholarly articles from arXiv and a collection of requests for proposal documents from government sources.
    Index Terms: Document Structure, Deep Learning, Document Understanding, Semantic Annotation
    1 INTRODUCTION
    Current language understanding approaches are mostly focused on small documents, such as newswire articles, blog posts, and product reviews. Understanding and extracting information from large documents like legal documents, reports, proposals, technical manuals, and research articles is still a challenging task. The reason behind this challenge is that the documents may be multi-themed, complex, and cover diverse topics.
    The semantic organization of the sections and subsections of documents across all vertical domains is not the same. For example, business documents typically have completely different structures than user manuals or scholarly papers. Even research articles from Computer Science and Social Science may have different structures. For example, Social Science articles usually have sections named method-