Enterprise Search in the European Union


Enterprise Search in the European Union: A Techno-economic Analysis

Authors: Martin White, Stavri G Nikolov
Editors: Shara Monteleone, Ramon Compaño, Ioannis Maghiros

2013

Report EUR 26000 EN

European Commission
Joint Research Centre
Institute for Prospective Technological Studies

Contact information
Address: Edificio Expo. c/ Inca Garcilaso, 3. E-41092 Seville (Spain)
E-mail: [email protected]
Tel.: +34 954488318
Fax: +34 954488300
http://ipts.jrc.ec.europa.eu
http://www.jrc.ec.europa.eu

This publication is a Scientific and Policy Report by the Joint Research Centre of the European Commission.

Legal Notice
Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication.

Europe Direct is a service to help you find answers to your questions about the European Union.
Freephone number (*): 00 800 6 7 8 9 10 11
(*) Certain mobile telephone operators do not allow access to 00 800 numbers or these calls may be billed.

A great deal of additional information on the European Union is available on the Internet. It can be accessed through the Europa server http://europa.eu/.

JRC78202
EUR 26000 EN
ISBN 978-92-79-30493-4 (pdf)
ISSN 1831-9424 (online)
doi:10.2791/17809

Luxembourg: Publications Office of the European Union, 2013
© European Union, 2013
Reproduction is authorised provided the source is acknowledged.
Printed in Spain

Preface

This report contributes to the work being carried out by IPTS on the potential of Search, providing a techno-economic analysis of Enterprise Search in the EU and a discussion of the main challenges and opportunities related to the current state of the Enterprise Search market in Europe. This study is part of CHORUS+ - an initiative supported by the Directorate General Information Society and Media.
Information about CHORUS+ and its related activities is available at http://avmediasearch.eu

Table of Contents

Preface
Executive Summary
Methodology
Part I: Managing Enterprise Information
  1.1 The enterprise repository
  1.2 Reasons for complexity of ES repository
  1.3 The technology of enterprise search
Part 2: Market Considerations
  2.1 The value chain for enterprise search
  2.2 The enterprise search business structure
  2.3 The EU market for enterprise search
  2.4 Making a business case for enterprise search
Part 3: The Choice of Enterprise Search Solutions
  3.1 Selecting and implementing enterprise search applications
  3.2 Search implementation and user satisfaction
  3.3 Enterprise search skills availability
Part 4: Analysis and Policy Considerations
  4.1 Technology assessment and forecast
  4.2 SWOT analysis
  4.3 Opportunities for EC support actions. Some policy briefs.
Appendix A: List of enterprise search vendors – alphabetical
Appendix B: Corporate profiles of selected enterprise search vendors
Appendix C: Delphi summary tables
Appendix D: Enterprise search industry analysis consultancies
Appendix E: Workshop "Exploring the future of enterprise search"
References
Executive Summary

The value of enterprise search

The term ‘enterprise search’ (ES) is used as a generic description for information retrieval applications that use a range of different core technologies to search enterprise repositories. For the purpose of this report, it includes the search of organisations' external web sites, intranets and other electronic text held by the organisations in the form of email, database records, and documents on file shares. This is often referred to as ‘unstructured’ information.

Enterprise search technologies date back to the late 1960s when they were developed to search large online databases of scientific, commercial and legal information and to support the legal teams working on a number of large anti-trust suits in the USA – the breakup of AT&T being one example. There are three main technical approaches to ES: Boolean, vector space and probabilistic. Though there are some differences between the requirements of searching web sites and searching other enterprise applications, primarily around security management, it is possible to use the same enterprise search application for both purposes. The development of enterprise search applications requires a wide range of specialised skills, in particular mathematical approaches to set theory, probability and computational linguistics.

Enterprise repositories of unstructured information are growing rapidly because of the widespread adoption of social media, increased compliance and regulatory requirements and a lack of resources to remove redundant information. According to research in the USA, large companies (i.e. with more than 1,000 employees) have accumulated over 100 terabytes of information, and many have more than 1 petabyte. Surveys indicate that senior managers are aware of the importance of unstructured information but few are taking action to provide employees with adequate tools to access this information.
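As a rough illustration of two of the three technical approaches named in the Executive Summary, the sketch below contrasts Boolean matching with vector space ranking on a toy corpus. It is a minimal sketch under stated assumptions, not a description of any vendor's implementation: the corpus, the query and every function name are hypothetical, and the TF-IDF weighting with cosine similarity is just one common way to realise a vector space model. The probabilistic approach is not shown.

```python
import math
from collections import Counter

# Hypothetical toy corpus; documents and ids are illustrative only.
DOCS = {
    "d1": "enterprise search for intranet documents",
    "d2": "email archive and database records",
    "d3": "search email and documents on file shares",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberate simplification)."""
    return text.lower().split()


def boolean_and(query, docs):
    """Boolean approach: return ids of documents containing every query term."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))


def tf_idf_vectors(docs):
    """Vector space approach: build one TF-IDF weighted vector per document."""
    tokens = {d: tokenize(text) for d, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = {
        d: {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for d, toks in tokens.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query, docs):
    """Rank all document ids by cosine similarity to the query, best first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unseen in the corpus get weight 0 and do not affect ranking.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = {d: cosine(q_vec, vec) for d, vec in vectors.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    print(boolean_and("search email", DOCS))
    print(vector_rank("search email", DOCS))
```

With the query "search email", the Boolean function returns only the document that contains both terms, whereas the vector space function returns every document in ranked order, so partial matches still surface below the full match. That difference between exact set membership and graded relevance is the main practical distinction between the two approaches.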
Motivators

The following motivators for the development of an enterprise search market emerged from the surveys mentioned in this report and from the workshop organized by JRC-IPTS, “Exploring the future of Enterprise Search”, in Seville in October 2011:

• There is increasing information everywhere: more than 200 billion emails per day; 80% of enterprise information is unstructured.
• Digital data growth is enormous: it is expected to be 35 zettabytes in 10 years' time. In particular, it seems that 94% of organizations are collecting and managing more business data than just a few years ago, and the business information collected and managed has increased by 86% in the last few years. [1]
• The cost of poor data management: organizations are seemingly losing revenue each year (on average, 14%) as a result of not being able to fully leverage the information they collect. That translates to circa $130 million in lost opportunity each year for a $1 billion organization. [2]
• Legal compliance of the enterprise: the obligation to store and find all enterprise documents and business communications for legal reasons.
• Enterprise data is all over the place. ES has to federate all the information existing in both structured data (databases) and unstructured data (text, reports, mail).

[1] Source: Oracle Survey, From Overload to Impact: an industry scorecard on big data business challenges, 2012.
[2] Ibid.

In other words, if one reason for adopting ES is the growth in data generation, a more worrying reason is the fact that this huge amount of information is largely unstructured. It is estimated that about 80% of the information stored is either unstructured or has no adequate metadata for the needs of employees. As noted by Findwise in its recent Enterprise Search and Findability Survey (2012), quick access to information is of strategic importance in the Information Economy: "The fault does not lie