Finding Tailored Educational Paths Using a Graph Database
UPTEC IT 20032
Examensarbete 30 hp (degree project, 30 credits), September 2020
Emil Stolpe
Institutionen för informationsteknologi (Department of Information Technology)

Teknisk-naturvetenskaplig fakultet, UTH-enheten
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

The Swedish educational system is full of possibilities but is also rather complicated because of that fact. There exist several different paths to reach the same goal, but how do you find them and which one is the quickest?

This project has tried to make it easier for students to find the right path from start to finish by presenting possible study paths. It has been done by collecting information about schools and programs and inserting it into a graph database, which has then been traversed to extract the fastest paths from the starting point of a student (e.g. elementary school) to their goal (e.g. doctor) based on a few arguments.

Interviews with student counselors have been conducted in order to evaluate how practical the system is. A conclusion from these interviews is that the system is useful but held back by the fact that the database contains too little information. The idea is good, but the system would need to be scaled up to be more useful, which is expected of a prototype. Filling the database with all necessary information is left as future work since it would be too time-consuming.

Supervisor: Andreas Samuelsson
Subject reviewer: Anna Eckerdal
Examiner: Lars-Åke Nordén
Printed by: Reprocentralen ITC

Sammanfattning (summary, translated from Swedish)

The Swedish educational system offers many possibilities but is, as a consequence, also complicated. One can take many different paths and reach the same goal, but how does one find them, and which is the quickest?

This project has tried to make it easier for students to find the right path from start to finish by presenting possible study paths. This has been done by collecting information about schools and programs in a graph database, which has then been traversed to extract the fastest path or paths from the student's starting point (for example elementary school) to their goal (for example doctor) based on a number of arguments.

Interviews with student counselors were conducted to investigate how practical the system actually is. One conclusion from these interviews is that the system is useful but is hampered by the fact that it holds too little information for the user. It is a good idea but would need to be scaled up with more data to become more useful, which is expected since it is a prototype. Filling the database with all information is left to future work since it is too time-consuming.

Contents

1 Introduction
  1.1 Delimitations
2 Background
  2.1 Graph databases and Neo4j
    2.1.1 Neo4j
  2.2 The data
    2.2.1 Skolverket and EMIL
    2.2.2 Jobtech
  2.3 Preconditions and motivations
  2.4 Finding a path today
  2.5 Gestalt principles
  2.6 Shortest path algorithms
3 Method
  3.1 Example case
  3.2 Design
  3.3 Population of database
  3.4 Forming the path
    3.4.1 Collecting nodes
    3.4.2 Finding shortest path
    3.4.3 Creating the path
  3.5 Prototype
  3.6 Evaluation method
4 Results and Discussion
  4.1 Answers from evaluation
5 Conclusion
6 Future work
  6.1 Improvements on the database
  6.2 Improvements on the path finding
  6.3 Improving collection of data
  6.4 Further evaluation

1 Introduction

Education opens up endless opportunities for the individual, and an educated population is a prerequisite for a range of benefits for a country [8].
A high percentage (43.3%) of Sweden’s population possesses a tertiary education, ranking it 13th among countries in this category according to the OECD [1]. However, the many educational paths available to students can be overwhelming, and there often exist several ways to reach the same goal. According to Swedish and international research, students' choice of studies is strongly affected by a lack of information about what education and schools are available [23].

In Sweden, students have to take their first step towards higher education at the early age of 15, when they choose a program at an upper secondary school, which will from here on be referred to as gymnasium. This is when they stake out a path that will eventually lead to their working career. A student might know what job they want but not the different choices they can make to reach it. They may also not know where the program they chose will lead, might regret the choice later, and might have to compensate for it in the future. A change of program at gymnasium level can add extra years before a degree is obtained, which can in turn be costly.

This thesis is conducted at Ava, a company based in Uppsala that works with digitalized solutions for automating education and work coaching [3]. It is important for them to guide students through inventive solutions, which is why the goal of this project is to create a system that helps students find their way by acting as a guide in the educational jungle. This will be done by building a path finder that finds the optimal paths for students.

For example, student A has just finished compulsory school and is going to choose a program at a gymnasium. The student provides the path finder with their goal, which is to become an electrical engineer.
They receive a path that shows which programs at which gymnasiums the student can pick that lead directly to a program at Uppsala University where the student receives a degree in electrical engineering. Student B has the same end goal but lacks the grades to enter a program that leads directly there. This student is provided with the possible programs and, additionally, the courses they have to complement with in order to get the shortest path (in terms of years) to becoming an electrical engineer.

With the help of this system, the student should get a clear view of their options and of the optimal path or paths to reach the goal. Hopefully, by providing students with a clear end goal and intermediate goals along the way, this system can help increase motivation for education, highlight possibilities and help in choosing an effective path.

1.1 Delimitations

To keep the project within the time frame of a master thesis, the following limitations have been set.

• The system will only be large enough for the path finding to yield interesting results. This includes a geographical limitation: the data will only come from within Uppsala and only include a handful of schools.
• Only higher education that can be obtained at Uppsala University is included. Higher education from other schools will not be part of the solution.
• The Swedish education system is designed with no dead ends: a student can become anything through many means. To keep the project within limits, some opportunities have been omitted while trying to keep the “no dead ends” principle.
• Only two starting points for a student are supported: from elementary school and from gymnasium.

2 Background

This project covers several areas, from databases and algorithms to UI design and the data being used.

2.1 Graph databases and Neo4j

Databases are used to hold and store data used by the application.
They can be of various scales and types, such as relational, NoSQL or graph databases. This project uses a graph database, and the more traditional relational database is explained here to highlight the differences between the two types. In a typical relational database (RDB), data is stored as rows in tables. Every row in a table has to be of the same type, so in a table holding data about employees all rows need to correspond to an employee, and the same goes for other entities such as movie reviews, reviewers and employers.

These databases work well for storing large amounts of data, especially homogeneous data, and when the relations between entities in different tables are not of high importance. A database with one table of employees and one table of movies works well, since the two do not have much to do with each other. However, if you add another table that contains employers, you suddenly need a relation between employers and employees. This link can be made by just adding a key attribute to the tables, but if information about the relation itself is of interest, for example why or when the employee was hired, then you need a table for the relation as well. Continuing, you might want to add friends of each employee and employer, and movie reviews. All of these new tables have relations, and subsequently you may need tables for those as well. It becomes complicated quickly and results in complex and inefficient retrieval of data when relations are concerned. Say that a developer wants to find a movie, its review, who reviewed it and an employee who is a friend of the reviewer. This requires a large JOIN operation to put together data from the different tables, which is not good for performance [7].
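The relational query described above can be made concrete with a minimal sketch. It uses Python's built-in sqlite3 module and an in-memory database. The schema, the table and column names, the sample rows and the helper functions `build_example_db` and `find_movie_review_and_friend` are all assumptions invented for illustration; none of them come from the thesis or from the system it describes.

```python
import sqlite3

# Hypothetical schema, invented purely for illustration.
SCHEMA = """
CREATE TABLE movie    (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE reviewer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
-- Relations that carry their own data (or are many-to-many) get their own tables.
CREATE TABLE review (
    movie_id    INTEGER REFERENCES movie(id),
    reviewer_id INTEGER REFERENCES reviewer(id),
    rating      INTEGER
);
CREATE TABLE friendship (
    reviewer_id INTEGER REFERENCES reviewer(id),
    employee_id INTEGER REFERENCES employee(id)
);
"""


def build_example_db():
    """Create an in-memory database with one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO movie VALUES (1, 'Example Movie')")
    conn.execute("INSERT INTO reviewer VALUES (1, 'Reviewer R')")
    conn.execute("INSERT INTO employee VALUES (1, 'Employee E')")
    conn.execute("INSERT INTO review VALUES (1, 1, 4)")
    conn.execute("INSERT INTO friendship VALUES (1, 1)")
    return conn


def find_movie_review_and_friend(conn, title):
    """Return (title, rating, reviewer name, employee name) rows for a movie.

    Answering this single question touches five tables and needs four JOINs.
    """
    query = """
        SELECT m.title, r.rating, rv.name, e.name
        FROM movie AS m
        JOIN review     AS r  ON r.movie_id = m.id
        JOIN reviewer   AS rv ON rv.id = r.reviewer_id
        JOIN friendship AS f  ON f.reviewer_id = rv.id
        JOIN employee   AS e  ON e.id = f.employee_id
        WHERE m.title = ?
    """
    return conn.execute(query, (title,)).fetchall()


if __name__ == "__main__":
    conn = build_example_db()
    print(find_movie_review_and_friend(conn, "Example Movie"))
```

Even in this tiny sketch, two of the five tables exist only to represent relations, and every additional hop in the question adds another JOIN to the query. This is the kind of growth in complexity that the paragraph above attributes to relation-heavy retrievals in a relational database.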