Data Search using Deep Learning

The Web is a rich source for tables:
- Many tables describe the same real-world entities
- Most tables contain only partial information
- Tables are scattered across different websites

Example input table (? = unknown value):
Movie | Premiere | Director | Actors
Biutiful | 05/17/2010 | ? | ?
True Grit | 12/22/2010 | ? | ?
 | 10/01/2010 | ? | ?
The King's Speech | 1/07/2010 | ? | ?
127 Hours | 09/04/2010 | ? | ?
The Fighter | 12/10/2010 | ? | ?

Figure: WDC Schema.org Table Corpus

Table Augmentation
- Augment an input table with information from a table corpus
- The task is not trivial due to the heterogeneity and size of the table corpus

Figure: example corpus tables (Actor/Movie and Movie/Director) with entries such as Andrew Garfield and Helena Bonham Carter for "Social Network, The" and "King's Speech, The", Joel Coen and Ethan Coen for True Grit, and Tom Hooper for "The King's Speech"; the same movie appears under different title formats ("Social Network, The" vs. "The Social Network")
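The slides leave the augmentation method open. As a point of reference, here is a minimal non-learned baseline, assuming tables are lists of Python dicts and that a single key column (here `Movie`) identifies the entity. `normalize_title` and `augment` are hypothetical helpers, and the corpus is toy data modeled on the slide example, not the WDC corpus. Even this baseline has to reconcile title variants such as "Social Network, The" and "The Social Network", which illustrates the heterogeneity problem.

```python
import re


def normalize_title(title):
    """Map title variants such as 'Social Network, The' and
    'The Social Network' to one lookup key (hypothetical heuristic)."""
    title = title.strip().lower()
    # Move a trailing article (", the" / ", a" / ", an") back to the front.
    match = re.match(r"^(.*),\s*(the|a|an)$", title)
    if match:
        title = f"{match.group(2)} {match.group(1)}"
    # Replace punctuation (except apostrophes) with spaces, collapse whitespace.
    title = re.sub(r"[^\w\s']", " ", title)
    return " ".join(title.split())


def augment(input_rows, key, target, corpus):
    """Fill missing `target` cells of input_rows with values found in
    corpus tables that share the same (normalized) `key` value."""
    # Index: normalized key value -> distinct target values seen in the corpus.
    index = {}
    for table in corpus:
        for row in table:
            if row.get(key) and row.get(target):
                values = index.setdefault(normalize_title(row[key]), [])
                if row[target] not in values:
                    values.append(row[target])

    augmented = []
    for row in input_rows:
        new_row = dict(row)  # copy so the caller's rows are not modified
        if new_row.get(target) is None:
            found = index.get(normalize_title(row[key]), [])
            new_row[target] = ", ".join(found) if found else None
        augmented.append(new_row)
    return augmented


# Toy data modeled on the slide example; None stands for an unknown ("?") cell.
input_table = [
    {"Movie": "Biutiful", "Premiere": "05/17/2010", "Director": None},
    {"Movie": "True Grit", "Premiere": "12/22/2010", "Director": None},
    {"Movie": "The King's Speech", "Premiere": "1/07/2010", "Director": None},
]
corpus = [
    [
        {"Actor": "Andrew Garfield", "Movie": "Social Network, The"},
        {"Actor": "Helena Bonham Carter", "Movie": "King's Speech, The"},
    ],
    [
        {"Movie": "True Grit", "Director": "Joel Coen"},
        {"Movie": "True Grit", "Director": "Ethan Coen"},
        {"Movie": "The King's Speech", "Director": "Tom Hooper"},
    ],
]

for row in augment(input_table, "Movie", "Director", corpus):
    print(row["Movie"], "->", row["Director"])
```

Biutiful stays empty because no corpus table mentions it. Calling `augment(input_table, "Movie", "Actor", corpus)` finds Helena Bonham Carter for The King's Speech only because the normalization maps "King's Speech, The" to the same key. Exact-key lookup like this breaks down quickly on real Web tables, which is where learned matching becomes interesting.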

How can we use transformer-based models for Table Augmentation?
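One possible starting point, sketched under stated assumptions: transformer models consume text, so table rows are usually serialized into strings before encoding. The sketch uses a `[COL] attribute [VAL] value` serialization, similar to formats used in some transformer-based entity-matching work, and ranks corpus rows against an input row. Token-overlap (Jaccard) similarity stands in here for the cosine similarity of transformer embeddings, so no model is loaded. `serialize_row`, `jaccard`, and `rank_rows` are hypothetical names.

```python
def serialize_row(row):
    """Turn a row (dict) into '[COL] attr [VAL] value ...' text, skipping unknown cells."""
    parts = [f"[COL] {col} [VAL] {val}" for col, val in row.items() if val is not None]
    return " ".join(parts)


def jaccard(a, b):
    """Token-overlap similarity; a stand-in for comparing transformer embeddings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_rows(query_row, corpus_rows, top_k=3):
    """Return the top_k (score, row) pairs, most similar first."""
    query_text = serialize_row(query_row)
    scored = [(jaccard(query_text, serialize_row(row)), row) for row in corpus_rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy rows modeled on the slide example; None marks an unknown ("?") cell.
query = {"Movie": "The King's Speech", "Premiere": "1/07/2010", "Director": None}
corpus_rows = [
    {"Movie": "True Grit", "Director": "Joel Coen"},
    {"Movie": "The King's Speech", "Director": "Tom Hooper"},
    {"Actor": "Helena Bonham Carter", "Movie": "King's Speech, The"},
]

for score, row in rank_rows(query, corpus_rows):
    print(f"{score:.2f}", serialize_row(row))
```

Because "speech," and "speech" are different tokens, the reordered title "King's Speech, The" scores below the exact match. Replacing `jaccard` with similarity between transformer embeddings of the serialized rows is one way the project question could be explored, on the hope that learned representations are more robust to such variation.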

Universität Mannheim – Bizer/Brinkmann: Team Project FSS2021 – Slide 1

Project Goal
Experiment with state-of-the-art NLP transformer models and use them to search for tabular data.

Involves
− Data profiling, data preprocessing
− Model training, evaluation, selection

Learning Targets
− Gain technical experience with state-of-the-art data search technologies
− Gain work experience as a data scientist

Requirements
− Data science & engineering skills, programming experience (Python)
− Relevant courses: Web Data Integration, Data Mining I & II, Information Retrieval & Web Search

Organization: 4-6 people, 6 months, work as a complete team and in subgroups
Instructors: Alexander Brinkmann, Christian Bizer
