Tasos Arvanitis
E-mail: [email protected]
LinkedIn: http://gr.linkedin.com/in/tasosarvanitis

Summary of Qualifications

✓ Extensive experience in engineering and performance tuning of database systems and big data platforms.
✓ A 13+ year industry and research track record of designing and implementing scalable software products, with a diligent approach and a focus on quality.
✓ Passionate about challenging data management problems and related technologies, including large-scale data processing, data pipelines and distributed databases.

Technical Skills

✓ Programming languages: Java, Python, C, PL/SQL
✓ Databases: Oracle RDBMS, SQL Server, IBM Netezza, PostgreSQL, MySQL
✓ Big Data: Impala, Hive, Tez, Hadoop (Cloudera, Hortonworks, MapR), AWS EMR, Azure HDInsight
✓ Linux command line utilities, shell scripting, development tools (git, maven, gdb)

Education

PhD in Computer Science, National Technical University of Athens (NTUA), Greece, Jan 2013
Dissertation Title: “Personalized Data Management Systems”

Masters in Electrical and Computer Engineering – Major: Computer Science, NTUA, Greece, Jul 2005
Diploma Thesis: “System and Application Integration using Web Services”

Work Experience

Unravel Data, Palo Alto, CA, USA (Java, Python, Hadoop, Impala, Hive, Tez)
Principal Software Engineer, Aug 2019 – present
Senior Software Engineer, Mar 2018 – Jul 2019
✓ Tech lead for Unravel’s application monitoring and performance tuning engine for Impala and Hive/Tez.
✓ Designed and developed a unified framework that powers data insights and tuning recommendations for various big data and cloud-native SQL engines (e.g. Hive, Impala, SparkSQL, Redshift).
✓ Developed several SQL tuning and data layout recommendations for Impala, Hive and Tez.
✓ Improved the stability and scalability of message-driven ingestion and processing pipelines that extract job execution details and metrics from data sources such as Cloudera Manager, YARN Resource Manager, ATS and Hive Metastore. These improvements allow processing hundreds of thousands of jobs per day with single-digit message lag on production deployments monitored by an Unravel instance.
✓ Assumed operational responsibility for the above areas (support for customer escalations and CRs).
✓ Responsible for task prioritization, code review and mentoring of junior engineers in the above areas.

Oracle Corporation, Database Manageability Group, Redwood Shores, CA, USA, Oct 2014 – Mar 2018
Senior Member of Technical Staff - C, PL/SQL, Oracle RDBMS
✓ Development point of contact (feature owner) for SQL Monitor, SQL Tuning and Advisor Framework.
✓ Security lead and code reviewer for the SQL Manageability area.
✓ Implemented new features and provided bug fixes for Oracle RDBMS 12.2 and 18c releases, including:
  - Tuning of SQL workloads from remote and pluggable databases in multi-tenant (CDB) architectures
  - Integration of SQL Tuning and Statistics Advisor to expose system and SQL-level recommendations
  - Support for SQL Tuning and SQL Performance Analyzer features in Database Vault
✓ Discovered privilege escalation vulnerabilities in the SQL Analysis infrastructure service and managed the delivery of security fixes across various client database components.
✓ Developed various security patches for privilege escalation, resource and SQL injection vulnerabilities.

Vanderbilt Institute for Clinical and Translational Research, Nashville, TN, USA, Sep 2013 – Sep 2014
Health Systems Software Engineer II - Java, Perl, Python, IBM Netezza
✓ Responsible for scaling, migrating and maintaining the unstructured ETL pipeline for the Research Derivative (RD) data warehouse of Vanderbilt University and Medical Center.
✓ Modified NLP tools so that they can run in a distributed fashion on the IBM Netezza MPP Data Warehouse. The changes allowed the NLP pipeline, which was previously run as a batch job once a month, to process more than 100K daily incoming medical records in a few hours.
✓ Designed and developed a data cleaning and normalization pipeline for medication annotations found in medical text. Implemented a suite of tools that expose relationships between medication-related terms (e.g. brand names, generics, ingredients) and retrieve associated drug names and indications.
✓ Built various tools for extracting medications, procedures and other medical terms from medical records.
✓ Collaborated on the implementation of a smoking status identification algorithm using machine learning (SVM) and NLP tools (Apache cTAKES and UIMA).

University of California, Riverside, Riverside, CA, USA, Jun 2012 – Aug 2013
Research Scholar/Postdoctoral Researcher - UCR Database Lab
✓ Designed algorithms for similarity searching in large biomedical databases [EDBT’14], ranking methods for temporal web search queries [CIKM’13] and diversified searching of microblogging posts [EDBT’14].

National Technical University of Athens (NTUA), Athens, Greece, May 2007 – May 2012
Graduate Research Assistant/PhD Student - Knowledge & Database Systems Lab
✓ Designed and implemented the PrefDB personalization system. PrefDB is built on top of PostgreSQL and includes a profile manager, a query parser, optimizer and execution engine, as well as a graphical monitoring tool. (implemented in Java, PL/pgSQL, PostgreSQL) [ICDE’12, SIGMOD’12, TKDE’13]
✓ Developed novel algorithms for contextual search and product positioning based on skyline queries. The proposed algorithms outperform the state-of-the-art approaches in terms of efficiency, scalability and progressiveness of result calculation. (implemented in C++ STL, Boost) [ICDE’10, CIKM’12]

Inst. for the Management of Information Systems, “Athena” RC, Athens, Greece, Dec 2008 – Jun 2010
Graduate Research Assistant - TALOS research project
✓ Designed and developed a web application, API and a backend application server for task modeling, content annotation and task-based content retrieval. (JPA, JSP, jQuery, JSON, SQL Server, Tomcat)

Satways LTD, Athens, Greece - subcontractor for Siemens SBT/SES/CCS, Jun 2007 – Nov 2008
Software Engineer - Java (JPA/Hibernate), SQL Server, JBoss
✓ Responsible for the design, development and deployment of AIS (Automatic Identification System) as part of the Athens C4I project.
  AIS allows real-time monitoring of commercial and coast guard vessels and is used by the Greek Coast Guard.
✓ Contributed to the development of the AutoTrack AVL fleet management system.

EXUS, Business Process Management Department, Athens, Greece, Jun 2006 – May 2007
Software Engineer - Java EE (JSF, Struts, JSP), JavaScript, SQL Server, JBoss
✓ Contributed to the development of iPerform workflow engine v2.0.
✓ Implemented CRs for iPerform customers including Piraeus Bank and Wind Hellas.
✓ Responsible for the migration, deployment and testing of iPerform v2.0 at Wind Hellas.

Greek Army General Staff, Athens, Greece - Military service and as a contractor, Oct 2005 – Jul 2006
Software Engineer - Java EE (EJB 2.1), Oracle RDBMS, BEA WebLogic
✓ Worked on the development and support of the middleware components of the Greek Army C4I.

Selected Publications

✓ A. Arvanitis, S. Babu, E. Chu, A. Popescu, A. Simitsis, K. Wilkinson, Automated Performance Management for the Big Data Stack, CIDR’19
✓ A. Arvanitis, M. Wiley, V. Hristidis, Efficient Concept-based Document Ranking, EDBT’14 (2 citations)
✓ S. Cheng, A. Arvanitis, M. Chrobak, V. Hristidis, Multi-query Diversity in Microblogging Posts, EDBT’14 (14 citations)
✓ S. Cheng, A. Arvanitis, V. Hristidis, How Fresh Do You Want Your Search Results?, CIKM’13 (17 citations)
✓ A. Arvanitis, G. Koutrika, PrefDB: Supporting Preferences as First-Class Citizens in Relational Databases, IEEE Transactions on Knowledge and Data Engineering (TKDE) (11 citations)
✓ A. Arvanitis, A. Deligiannakis, Y. Vassiliou, Efficient Influence-based Processing of Market Research Queries, CIKM’12 (25 citations)
✓ A. Arvanitis, G. Koutrika, PrefDB: Bringing Preferences closer to the DBMS, SIGMOD’12 (12 citations)
✓ A. Arvanitis, G. Koutrika, Towards Preference-aware Relational Databases, ICDE’12 (28 citations)
✓ D. Sacharidis, A. Arvanitis, T. Sellis, Probabilistic Contextual Skylines, ICDE’10 (23 citations)