Pretrained Transformers for Text Ranking: BERT and Beyond
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Intelligent Document Processing (IDP) State of the Market Report 2021 – Key to Unlocking Value in Documents June 2021: Complimentary Abstract / Table of Contents
State of the Market Report, Service Optimization Technologies. Intelligent Document Processing (IDP) State of the Market Report 2021 – Key to Unlocking Value in Documents. June 2021: Complimentary Abstract / Table of Contents. EGR-2021-38-CA-4432. Copyright © 2021 Everest Global, Inc. We encourage you to share these materials internally within your company and its affiliates. In accordance with the license granted, however, sharing these materials outside of your organization in any form (electronic, written, or verbal) is prohibited unless you obtain the express, prior, and written consent of Everest Global, Inc. It is your organization’s responsibility to maintain the confidentiality of these materials in accordance with your license of them. Our research offerings: This report is included in the following research program(s): Service Optimization Technologies ► Application Services ► Finance & Accounting ► Market Vista™ ► Banking & Financial Services BPS ► Financial Services Technology (FinTech) ► Mortgage Operations ► Banking & Financial Services ITS ► Global Business Services ► Multi-country Payroll ► Catalyst™ ► Healthcare BPS ► Network Services & 5G ► Clinical Development Technology ► Healthcare ITS ► Outsourcing Excellence ► Cloud & Infrastructure ► Human Resources ► Pricing-as-a-Service ► Conversational AI ► Insurance BPS ► Process Mining. If you want to learn whether your organization has a membership agreement or request information on pricing and membership options, please contact us at [email protected] Learn more about our custom -
Sociedade Brasileira De Cardiologia • ISSN-0066-782X • Volume 104, Nº 2, February 2015
www.arquivosonline.com.br Sociedade Brasileira de Cardiologia • ISSN-0066-782X • Volume 104, Nº 2, February 2015 Figure 1 – Illustration of cell interactions during atheroma formation. Page 170 Editorial Fatty Acid and Cholesterol Concentrations in Usually Consumed The representativeness of the Arquivos Brasileiros de Cardiologia for Fish in Brazil Brazilian Cardiology Science Assessment of Myocardial Infarction by Cardiac Magnetic Resonance Special Article Imaging and Long-Term Mortality Multivariate Analysis for Animal Selection in Experimental Research Review Article Original Articles Microparticles as Potential Biomarkers of Cardiovascular Disease Reproducibility of left ventricular mass by echocardiogram in the Letter to the Editor ELSA-Brasil Acute Effects of Continuous Positive Airway Pressure on Pulse Pressure in CHF Relationship between Neutrophil-To-Lymphocyte Ratio and Electrocardiographic Ischemia Grade in STEMI Sudden Cardiac Death in Brazil: A Community-Based Autopsy Series Electronic Pages (2006-2010) Clinicoradiological Session Impact of Light Salt Substitution for Regular Salt on Blood Pressure of Case 2/2015 A 33-Year-Old Woman with Double Right Ventricular Hypertensive Patients Chamber and Ventricular Septal Defect Effects of Ischemic Postconditioning on the Hemodynamic Parameters Case Report and Heart Nitric Oxide Levels of Hypothyroid Rats Giant Left Atrial Thrombus with Double Coronary Vascularization Neural Mechanisms and Delayed Gastric Emptying of Liquid Induced Image Through Acute Myocardial Infarction in Rats Mitral -
Arvedon, Sessler Win Mixed on Score Change
Daily Bulletin, Friday, March 18, 2011, Volume 54, Number 8. NABC National Tournament • Louisville • March 10-20, 2011. 54th Spring North American Bridge Championships. Editors: Brent Manley and Dave Smith. Arvedon, Sessler win Mixed on score change: Just as they had apparently lost out on Wednesday night to a score correction, Lloyd Arvedon and Carolyn Sessler were returned to first place in the Rockwell Mixed Pairs on Thursday by another score change. The pair in first place when everyone went to bed on Wednesday – Gail Greenberg and Jeff Hand – dropped to second. The original margin of victory for Arvedon and Sessler was four-tenths of a matchpoint. It appeared that Greenberg and Hand had won by six-tenths. The new standings put Arvedon-Sessler back in first by their original margin. The winners were playing together for the first time in about five years. No. 1 seed falls in Vanderbilt: The top-seeded Team Monaco, with four members of last year’s winning squad, are on the sidelines today after being trounced by the original No. 17 seed, captained by Seymon Deutsch. They ousted the top seed 151-87. The No. 2 seed, Nick Nickell, had to rally in the fourth quarter for a 108-105 victory, while No. 3 John Diamond cruised against No. 14 Richard Schwartz. No. 4 James Cayne routed original No. 20 Barbara Sonsini. The No. 5 seed, led by Aubrey Strul, was ousted by Joe Grue, originally seeded No. 21, and No. 7 Carolyn Lynch was defeated by No. 22 Mary Ann Berg. As this issue went to press, one match – Meng -
Introduction to Information Retrieval
DRAFT! © April 1, 2009 Cambridge University Press. Feedback welcome. 1 Boolean retrieval The meaning of the term information retrieval can be very broad. Just getting a credit card out of your wallet so that you can type in the card number is a form of information retrieval. However, as an academic field of study, information retrieval might be defined thus: Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). As defined in this way, information retrieval used to be an activity that only a few people engaged in: reference librarians, paralegals, and similar professional searchers. Now the world has changed, and hundreds of millions of people engage in information retrieval every day when they use a web search engine or search their email. Information retrieval is fast becoming the dominant form of information access, overtaking traditional database-style searching (the sort that is going on when a clerk says to you: “I’m sorry, I can only look up your order if you can give me your Order ID”). IR can also cover other kinds of data and information problems beyond that specified in the core definition above. The term “unstructured data” refers to data which does not have clear, semantically overt, easy-for-a-computer structure. It is the opposite of structured data, the canonical example of which is a relational database, of the sort companies usually use to maintain product inventories and personnel records. -
Your Intelligent Digital Workforce: How RPA and Cognitive Document
Work Like Tomorrow. YOUR INTELLIGENT DIGITAL WORKFORCE: HOW RPA AND COGNITIVE DOCUMENT AUTOMATION DELIVER THE PROMISE OF DIGITAL BUSINESS. CONTENTS: Maximizing the Value of Data; Data Driven, Document Driven; Capture Your Content, Capture Control; Kofax Knows Capture: An Innovation Timeline; Artificial Intelligence: The Foundation of CDA; AI: Context is Everything; AI: Learning by Doing; Connecting Paper, People and Processes: RPA + CDA; RPA + CDA: Driving ROI Across Many Industries; 6 Business Benefits of RPA + CDA; All CDA Solutions Are Not Created Equal; Additional Resources. MAXIMIZING THE VALUE OF DATA Data drives modern business. This isn’t surprising when you consider that 90 percent of the world’s data has been created in the last two years alone. The question is: how can you make this profusion of data work for your business and not against it? Enter -
Optical Image Scanners and Character Recognition Devices: a Survey and New Taxonomy
OPTICAL IMAGE SCANNERS AND CHARACTER RECOGNITION DEVICES: A SURVEY AND NEW TAXONOMY. Amar Gupta, Sanjay Hazarika, Maher Kallel, Pankaj Srivastava. Working Paper #3081-89, Massachusetts Institute of Technology, Cambridge, MA 02139. ABSTRACT Image scanning and character recognition technologies have matured to the point where these technologies deserve serious consideration for significant improvements in a diverse range of traditionally paper-oriented applications, in areas ranging from banking and insurance to engineering and manufacturing. Because of the rapid evolution of various underlying technologies, existing techniques for classifying and evaluating alternative concepts and products have become largely irrelevant. A new taxonomy for classifying image scanners and optical recognition devices is presented in this paper. This taxonomy is based on the characteristics of the input material, rather than on speed, technology or application domain. 1. INTRODUCTION The concept of automated transfer of information from paper documents to computer-accessible media dates back to 1954 when the first Optical Character Recognition (OCR) device was introduced by Intelligent Machines Research Corporation [1]. By 1970, approximately 1000 readers were in use and the volume of sales had grown to one hundred million dollars per annum [3]. In spite of these early developments, through the seventies and early eighties scanning technology was utilized only in highly specialized applications. The lack of popularity of automated reading systems stemmed from the fact that commercially available systems were unable to handle documents as prepared for human use. The constraints placed by such systems served as barriers, severely limiting their applicability. In 1982, Ullmann [2] observed: "A more plausible view is that in the area of character recognition some vital computational principles have not yet been discovered or at least have not been fully mastered. -
Shreddr: Pipelined Paper Digitization for Low-Resource Organizations
Shreddr: pipelined paper digitization for low-resource organizations. Kuang Chen (Dept. of EECS, UC Berkeley), Akshay Kannan (Captricity, Inc.), Yoriyasu Yano (Captricity, Inc.), Joseph M. Hellerstein (Dept. of EECS, UC Berkeley), Tapan S. Parikh (School of Information, UC Berkeley). ABSTRACT For low-resource organizations working in developing regions, infrastructure and capacity for data collection have not kept pace with the increasing demand for accurate and timely data. Despite continued emphasis and investment, many data collection efforts still suffer from delays, inefficiency and difficulties maintaining quality. Data is often still “stuck” on paper forms, making it unavailable for decision-makers and operational staff. We apply techniques from computer vision, database systems and machine learning, and leverage new infrastructure – online workers and mobile connectivity – to redesign [excerpt from the adjacent column:] able remote agents to directly enter information at the point of service, replacing data entry clerks and providing near immediate data availability. However, mobile direct entry usually replaces existing paper-based workflows, creating significant training and infrastructure challenges. As a result, going “paperless” is not an option for many organizations [16]. Paper remains the time-tested and preferred data capture medium for many situations, for the following reasons: • Resource limitations: lack of capital, stable electricity, IT- -
Information Retrieval of Text, Structure and Sequential Data in Heterogeneous XML Document Collections Eugen Popovici
Information Retrieval of Text, Structure and Sequential Data in Heterogeneous XML Document Collections. Eugen Popovici. To cite this version: Eugen Popovici. Information Retrieval of Text, Structure and Sequential Data in Heterogeneous XML Document Collections. Computer Science [cs]. Université de Bretagne Sud; Université Européenne de Bretagne, 2008. English. tel-00511981. HAL Id: tel-00511981 https://tel.archives-ouvertes.fr/tel-00511981 Submitted on 26 Aug 2010. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Thesis defended before the Université Européenne de Bretagne to obtain the degree of Doctor of the Université Européenne de Bretagne, specialization: Information and Communication Sciences and Technologies, by Eugen-Costin Popovici. Information Retrieval of Text, Structure and Sequential Data in Heterogeneous XML Document Collections (French title: Recherche et filtrage d’information multimédia (texte, structure et séquence) dans des collections de documents XML hétérogènes). Presented on 10 January 2008 before the examination committee composed of: M. Boughanem, Professor, Université Paul Sabatier, Toulouse III (Reviewer); P. Gros, Research Director, INRIA, Rennes (Examiner); M. Lalmas, Professor, Queen Mary University of London (Reviewer); P.-F. Marteau, Professor, Université de Bretagne-Sud (Supervisor); G. -
Document and Form Processing Automation with Document AI Using Machine Learning to Automate Document and Form Processing ___
Document and Form Processing Automation with Document AI: Using Machine Learning to Automate Document and Form Processing. Unleash the value of your unstructured document data or speed up manual document processing tasks with the help of Google AI. We will help you build an end-to-end, production-capable document processing solution with Google’s industry-leading Document AI tools, customized to your case. Business Challenge: Most business transactions begin, involve, or end with a document. However, approximately 80% of enterprise data is unstructured, which historically has made it expensive and difficult to harness that data. The inability to understand unstructured data can decrease operational efficiency, impact decision making, and even increase compliance costs. Decision makers today need the ability to quickly and cost effectively process and make use of their rapidly growing unstructured datasets. Document Workflows: RPA vendors estimate that ~50% of their workflows begin with a document. Unstructured Data: Approximately 80% of enterprise data is unstructured, including machine and human generated. Free Form Text: 70% is free-form text such as written documents and emails. Solution Overview: Google’s mission is “to organize the world's information and make it universally accessible and useful”. This has led Google to create a comprehensive set of technologies to read (Optical Character Recognition), understand (Natural Language Processing) and make useful (data warehousing, analytics, and visualization) documents, forms, and handwritten text. Google’s Document AI technologies provide OCR (optical character recognition) capabilities that deliver unprecedented accuracy by leveraging advanced deep-learning neural network algorithms. Document AI has support for 200 languages and handwriting recognition of 50 languages. -
CNN-Based Page Segmentation and Object Classification for Counting
Journal of Imaging Article CNN-Based Page Segmentation and Object Classification for Counting Population in Ottoman Archival Documentation Yekta Said Can * and M. Erdem Kabadayı College of Social Sciences and Humanities, Koc University, Rumelifeneri Yolu, 34450 Sarıyer, Istanbul, Turkey; [email protected] * Correspondence: [email protected] Received: 31 March 2020; Accepted: 11 May 2020; Published: 14 May 2020 Abstract: Historical document analysis systems gain importance with the increasing efforts in the digitalization of archives. Page segmentation and layout analysis are crucial steps for such systems. Errors in these steps will affect the outcome of handwritten text recognition and Optical Character Recognition (OCR) methods, which increase the importance of the page segmentation and layout analysis. Degradation of documents, digitization errors, and varying layout styles are the issues that complicate the segmentation of historical documents. The properties of Arabic scripts such as connected letters, ligatures, diacritics, and different writing styles make it even more challenging to process Arabic script historical documents. In this study, we developed an automatic system for counting registered individuals and assigning them to populated places by using a CNN-based architecture. To evaluate the performance of our system, we created a labeled dataset of registers obtained from the first wave of population registers of the Ottoman Empire held between the 1840s and 1860s. We achieved promising results for classifying different types of objects and counting the individuals and assigning them to populated places. Keywords: page segmentation; historical document analysis; convolutional neural networks; Arabic script layout analysis 1. Introduction Historical documents are valuable cultural resources that provide the examination of the historical, social, and economic aspects of the past. -
On the Collaboration Support in Information Retrieval
Open Archive TOULOUSE Archive Ouverte (OATAO). OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. This is an author-deposited version published in: http://oatao.univ-toulouse.fr/ Eprints ID: 19042 To link to this article: DOI: 10.1145/3092696 URL: http://doi.org/10.1145/3092696 To cite this version: Soulier, Laure and Tamine-Lechani, Lynda. On the collaboration support in Information Retrieval. (2017) ACM Computing Surveys, vol. 50 (n° 4). pp. 1. ISSN 0360-0300 Any correspondence concerning this service should be sent to the repository administrator: [email protected] On the Collaboration Support in Information Retrieval. LAURE SOULIER, Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6; LYNDA TAMINE, IRIT, Université de Toulouse, UPS. Collaborative Information Retrieval (CIR) is a well-known setting in which explicit collaboration occurs among a group of users working together to solve a shared information need. This type of collaboration has been deemed as beneficial for complex or exploratory search tasks. With the multiplicity of factors impacting on the search effectiveness (e.g., collaborators’ interactions or the individual perception of the shared information need), CIR gives rise to several challenges in terms of collaboration support through algorithmic approaches. More particularly, CIR should allow us to satisfy the shared information need by optimizing the collaboration within the search session over all collaborators, while ensuring that both mutually beneficial goals are reached and that the cognitive cost of the collaboration does not impact the search effectiveness. -
Developing a Personalized Article Retrieval System for Pubmed
DEVELOPING A PERSONALIZED ARTICLE RETRIEVAL SYSTEM FOR PUBMED By Sachintha Pitigala. A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in Computational Science, Middle Tennessee State University, August 2016. Dissertation Committee: Dr. Cen Li, Committee Chair; Dr. Suk Seo; Dr. John Wallin; Dr. Qiang Wu. ACKNOWLEDGEMENTS Firstly, I thank my advisor Dr. Cen Li, Professor of Computer Science, for her unrelenting advice and support throughout the past six years. I would like to thank Dr. John Wallin, Director of the Computational Science Program, for his invaluable advice and support throughout the years. My special gratitude also goes to Dr. Suk Seo and Dr. Qiang Wu for their advice in presenting this thesis. Further, I would like to take the opportunity to thank all the professors in the computational science program for teaching me new and valuable materials to successfully finish my PhD studies. My heartfelt gratitude also goes to all who participated in the study of the PARS system by volunteering their time to evaluate the search outputs. I would also like to thank all the academic and nonacademic staff of the Department of Computer Science, College of Basic and Applied Sciences and the Graduate School for supporting me in various ways with regards to my studies throughout the past six years. Finally, I thank my parents for their support and encouragement over the years. ABSTRACT PubMed keyword based search often results in many citations not directly relevant to the user information need. Personalized Information Retrieval (PIR) systems aim to improve the quality of the retrieval results by letting the users supply information other than keywords.