Web Application for Data Import from XLSX Into a Relational Database


Masaryk University
Faculty of Informatics

Web application for Data Import from XLSX into a Relational Database

Bachelor's Thesis

Samuel Toman

Brno, Spring 2021

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Samuel Toman

Advisor: Mgr. Luděk Bártek, Ph.D.

Acknowledgements

I would like to express gratitude towards my advisor Mgr. Luděk Bártek, Ph.D., for always being available to patiently answer all my questions. Likewise, I would like to thank my consultants JUDr. Ing. František Kasl, Ph.D. and JUDr. Pavel Loutocký, Ph.D., BA.

Abstract

Spreadsheets are often used in office environments due to their user-friendliness coupled with their practicality. The majority of spreadsheet users are non-professional programmers, and as such, keeping spreadsheets user-friendly remains a high priority. Their intuitiveness comes at a price, however. Due to their design, they are not well suited for storing and querying large, structured data. They are nevertheless often relegated to precisely that role. The conversion process from a spreadsheet to a relational database can often be problematic and requires some level of technical knowledge. The main objective of this thesis is to provide a semi-automatic means of importing spreadsheets into a relational database, easing the process of conversion while still providing enough modularity to design a suitable database schema.
The thesis examines existing solutions and addresses their shortcomings in a resulting web application. As part of the thesis, the application was incorporated into an existing system called "CyQualf."

Keywords

database systems, MySQL, PHP, web application, Docker

Contents

Introduction
1 Data representation in XLSX documents and SQL databases
  1.1 Office Open XML Workbook (XLSX)
    1.1.1 Data representation
  1.2 Relational database
    1.2.1 Relational model
2 Project requirements
  2.1 Functional requirements
    2.1.1 Mapping schema
    2.1.2 HTTP API
  2.2 Non-functional requirements
3 Exploration of existing XLSX to SQL conversion tools
  3.1 Web converters
    3.1.1 SQLizer
    3.1.2 Other web converters
  3.2 Desktop application converters
  3.3 Conclusion
    3.3.1 Missing functionality
4 Technology stack and frameworks
  4.1 PHP language
  4.2 PHP spreadsheet parser
    4.2.1 PhpSpreadsheet
  4.3 JavaScript
    4.3.1 React
  4.4 Docker
5 Implementation and project structure
  5.1 Project structure
  5.2 Server-side
    5.2.1 Server-side file structure
    5.2.2 Parsing the mapping schema
    5.2.3 Mapping relationships
  5.3 Client-side
    5.3.1 Client-side file structure
    5.3.2 Front-end design components
6 Deployment
  6.1 Docker project structure
    6.1.1 Mariadb service
    6.1.2 Adminer service
    6.1.3 Php service
    6.1.4 Server service
    6.1.5 React-frontend service
  6.2 Summary of configuration files
7 Conclusion
Bibliography
A Usage example
  A.1 Running the application
    A.1.1 Service configurations
  A.2 Using the GUI
  A.3 Using the API
B Graphical user interface design

List of Tables

1.1 One-to-many relationship represented in XLSX.
5.1 An example worksheet Employee

List of Figures

4.1 A comparison of download counts (from NPM package manager) of the three most popular JavaScript front-end frameworks/libraries. Downloads measured from April 2019 to April 2021.
5.1 A class diagram of the mapping schema data structure.
5.2 Component decomposition of the webpage GUI.
A.1 A single table of the mapping schema.
A.2 A mapping schema containing two tables.
B.1 The webpage GUI on a desktop-sized screen width.
B.2 The webpage GUI on a smartphone-sized screen width.

Introduction

Spreadsheet programs are often considered to be a significant factor in the introduction and establishment of personal computers (PCs), because spreadsheets were one of the main use cases for the early PCs [1, p. C-177]. The first spreadsheet application for PCs, VisiCalc, originally released for the Apple II in 1979, was considered a huge commercial success. It was often referred to as the Apple II's first "killer app" [2], meaning a program so essential that one would buy a computer just to be able to use it.

As their continued success shows, spreadsheets provide essential services, often considered irreplaceable by their users. However, as is the case with any software, they are not the tool for everything. Spreadsheets work well enough when manipulating or analyzing manageably small data, but they begin to struggle once the data gets sufficiently big. Among the many problems exacerbated by a growing dataset are poor performance, data redundancy, and error proliferation. Their structure does not allow them to link and cross-reference data between tables easily, enforce data integrity rules, or retrieve data using complex querying functions. All of the above-mentioned are desirable traits for a system maintaining a sizeable or a critical dataset.
In conclusion, a spreadsheet is not a database; it is not designed for long-term storage of large or essential data. Thus, a problem of conversion to a proper database emerges. The existing web applications for importing spreadsheets into relational databases do not offer enough functionality to design a relational schema and subsequently map the data into it. The only available option is a simple import of the entire worksheet into the database as-is, without the option of establishing relations. To fully leverage the advantages of the relational model, the imported tables would have to be further processed into a new schema, which can be a difficult procedure. This thesis aims to develop a web application capable of importing spreadsheet data into a database according to a user-defined schema, using a pleasant graphical interface.

The first chapter of the thesis contains a quick overview and comparison of data representation in spreadsheets and relational databases. The second chapter describes the project requirements, detailing what the application should be capable of and how it should behave. The third chapter explores existing solutions, compares both web and desktop variants, and draws a conclusion based on this analysis. The selected technologies and the reasoning for their selection are outlined in the fourth chapter. The fifth chapter details the project structure and selected parts of the implementation. The sixth chapter explains the deployment of the application using Docker, detailing the individual Docker services comprising the project. The seventh chapter concludes and summarizes the thesis. Additionally, two appendices concerning the usage of the application and its graphical design are appended at the end of the thesis.

1 Data representation in XLSX documents and SQL databases

This chapter offers an overview of the XLSX format in comparison to relational databases.
It describes the differences in data representation, structure, and functionality between the two paradigms. A short insight into the file structure of XLSX is also given to deepen the understanding of the format.

1.1 Office Open XML Workbook (XLSX)

XLSX is a spreadsheet format designed by Microsoft, introduced together with Microsoft Excel 2007 and standardized by Ecma International, ISO (International Organization for Standardization), and IEC (International Electrotechnical Commission). The format was designed to comply with the Office Open XML specification [3] and served as a successor to the previous proprietary Excel Binary File Format (XLS) used by earlier versions of Microsoft Excel. Since its inception in December 2006, XLSX has become widely supported by most modern spreadsheet programs because it is a standardized format.

In contrast to the previous XLS, which is a binary format, XLSX is a ZIP-compressed archive containing several XML files. (ZIP is an archive file format supporting lossless compression; XML, the Extensible Markup Language, is a format designed to be both human-readable and machine-readable.) Compared to its predecessor, XLSX offers a significant file size reduction [4, p. 324]. As a ZIP archive, the file can be unpacked, revealing the underlying structure of the format:

• [Content_Types].xml - Contains references to all XML files included in the package.
• _rels/ - A folder containing a single XML file that stores package-level relationships.
• docProps/ - A folder containing XML files with overall document properties, such as author, last modification date, and metadata about the file's content.
• xl/ - The main folder, branching into further subfolders and XML files. As a whole, it contains the details about the workbook contents and the data itself.

Table 1.1: One-to-many relationship represented in XLSX.

Department Tag  Department Name  Employee Name   Employee Wage
ENG             Engineering      Julian Johnson  35000
ENG             Engineering      Jane Jones      39000
ACC             Accounting       Martin Moore    28000
ACC             Accounting       Larry Lewis     46000
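
The package layout described in the list above can be inspected with any ZIP library. The following is a minimal Python sketch using only the standard library; it is an illustration and not part of the thesis implementation. The function names group_package_parts and build_placeholder_package are hypothetical, and the part names inside the placeholder archive (such as xl/workbook.xml) are assumptions based on what Excel typically writes. The placeholder archive only mimics the folder layout and is not a valid workbook.

```python
import io
import zipfile
from collections import defaultdict


def group_package_parts(xlsx_file):
    """Group the entries of an XLSX (ZIP) package by their top-level folder."""
    groups = defaultdict(list)
    with zipfile.ZipFile(xlsx_file) as package:
        for name in package.namelist():
            # Entries at the archive root (e.g. [Content_Types].xml) are
            # stored under the empty-string key.
            top, sep, _rest = name.partition("/")
            groups[top + "/" if sep else ""].append(name)
    return dict(groups)


def build_placeholder_package():
    """Build an in-memory ZIP that mimics the layout only (assumed part names)."""
    buffer = io.BytesIO()
    part_names = [
        "[Content_Types].xml",
        "_rels/.rels",
        "docProps/app.xml",
        "docProps/core.xml",
        "xl/workbook.xml",
        "xl/worksheets/sheet1.xml",
    ]
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name in part_names:
            # Placeholder content; a real workbook holds schema-conformant XML.
            package.writestr(name, "<placeholder/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    grouped = group_package_parts(build_placeholder_package())
    for folder, parts in grouped.items():
        print(folder or "(root)", parts)
```

To inspect a real workbook, a file path can be passed to group_package_parts instead of the in-memory buffer; the exact set of parts will depend on the program that produced the file.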
