The Relationship Between Student Interests and Student Retention

Retaining interests: the relationship between student interests and student retention

Melody Sabo

Submitted in partial fulfillment of the requirements for graduation from the Malone University Honors Program

Adviser: Kyle Calderhead, Ph.D.

26th April 2016

Acknowledgements:

My family: For their ever-present help and support, for cheesy mail and mathematical socks, and their great love for me.

My friends: For allowing me various meltdowns, making me hot chocolate, and scolding me when I needed it.

My committee, Drs. Calderhead, Hahn, and Waalkes: For their patience and support.

My adviser, Dr. Calderhead: For his support and pep talks throughout this project and my mathematical career, and for convincing me to stay a math major.

Most especially: My roommate, Miss Katherine Karkoska: For her selfless nature, love, and support throughout all kinds of moments in my life this past year.

Table of Contents

Abstract
I. Background
   A. Social Network Analysis
   B. Homophily
II. Model Development
   A. Definitions
   B. Model
   C. Analysis
III. Conclusion
   A. Future Research
Appendix 1: Interests and their weights*
Appendix 2: Clustering Layout comparison
Appendix 3: Fruchterman Reingold graphs side-by-side
Bibliography

Abstract:

Social network analysis is becoming increasingly popular in our world of ever-expanding user data sets. Surprising relationships have been found between seemingly unrelated attributes of human behavior. At Malone University all prospective students fill out admissions applications, including a section identifying future activities in which they are interested. Potential university students can be defined as belonging to network communities with other trait-sharing students. Perhaps these students do not actually become involved in the activities, but do their choices reveal something about their future retention status?
This study looks at student-interest data from the admissions applications of the 477 incoming freshmen at Malone University in 2009. The involvement interests marked are studied along with the retention rates of those students. The goal of this study is to use social network analysis to determine whether there is a relationship between retention rates and the interests chosen by incoming freshmen.

I. Background

The method we will use to evaluate our data comes from the field of social network analysis. We specifically use the concept of homophily to justify some of our decisions regarding the model. In the following section, we provide a brief overview of social network analysis, with specific attention given to homophily.

A. Social Network Analysis

Social networks are a kind of network studied within graph theory, which is the study of discrete graphs. A discrete graph is a collection of vertices, also called nodes, together with a collection of edges that join vertices based on some defined relationship. More complex graphs add weights to the edges, which can be seen as representing either a stronger edge or a larger edge (as in a graph of a sewer system with larger and smaller pipes). Some graphs also add direction to the edges, so that each edge goes specifically from one node to another.

Social networks are graphs where nodes represent people or groups of people and edges represent some kind of relationship that exists between those people. For instance, a popular network has been studied where the nodes are actors and actresses and the edges represent the relation “has been in a movie with”. This network became popular when three Kevin Bacon fans began to study it [4]. The object of their study was to find the “Bacon number” of any actor or actress: the least number of edges on a path from that actor or actress to Kevin Bacon in the graph.
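Since the Bacon number is defined as a least number of edges, it can be computed as a shortest-path length in an unweighted, undirected graph. The following is a minimal sketch using breadth-first search. The helper names `build_graph` and `bacon_number` and the actors "Actor A" through "Actor E" are made up for illustration and are not taken from the study cited above.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (person, person) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bacon_number(graph, start, target="Kevin Bacon"):
    """Return the fewest edges from start to target, or None if no path exists."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, dist = queue.popleft()
        for other in graph.get(actor, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


# Hypothetical "has been in a movie with" edges (illustrative only).
edges = [
    ("Kevin Bacon", "Actor A"),
    ("Actor A", "Actor B"),
    ("Actor B", "Actor C"),
    ("Actor D", "Actor E"),  # a separate component with no link to Bacon
]
graph = build_graph(edges)

for name in ["Kevin Bacon", "Actor A", "Actor B", "Actor C", "Actor D"]:
    print(name, bacon_number(graph, name))
```

In this toy graph, "Actor B" has Bacon number two because the shortest path runs through "Actor A", while "Actor D" has no Bacon number at all because no path reaches Kevin Bacon.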
A website was then developed by Patrick Reynolds to find the Bacon number of any actor. For instance, the Bacon number for Johnny Depp is one, as he was in the movie “Black Mass” with Kevin Bacon. The Bacon number for Basil Rathbone is two: Rathbone was in “Sherlock Holmes in Washington” with John Archer, who was then in “The Little Sister” with Kevin Bacon [8]. This study helps to demonstrate that we can talk about the centrality of nodes in a graph. Because Kevin Bacon has appeared in so many movies, he can be used to study a subgraph, or sub-community, of our larger actor/actress graph. As a very well-connected actor, Bacon has a significant community (or cluster) around him: “those who have acted with Bacon”, “those who acted with those who acted with Bacon”, and so on.

B. Homophily

Homophily is a helpful and central theme in the analysis of social networks. The term was first used in the sociological work Freedom and Control in Modern Society [2]. The chapter “Friendship as a Social Process: a substantive and methodological analysis”, written by Paul Lazarsfeld and Robert Merton, introduces both the terms homophily and heterophily [2]. In their studies on friendship, Lazarsfeld and Merton hypothesized that we are more likely to be friends with those who are similar to us, a tendency they termed homophily; its opposite, heterophily, is the tendency to be friends with those who are not like us. Homophily has gone on from this work to become an important concept both in sociology and in the modeling of social networks. Homophily has been termed “one of the basic notions governing the structure of social networks” [3]. This can be seen in the traditional social network model: two people are alike in some way, so we think of them as sharing some kind of connection and, if we are studying that kind of connection, we form an edge between the two people as a result.
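The edge-forming rule just described can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the people, their traits, and the function name `shared_trait_edges` are hypothetical, and the rule "one edge per pair sharing at least one trait" is the generic idea from the paragraph above, not the model developed later in this paper.

```python
from itertools import combinations


def shared_trait_edges(traits):
    """Map each pair of people with a common trait to the traits they share."""
    edges = {}
    for a, b in combinations(sorted(traits), 2):
        common = traits[a] & traits[b]
        if common:  # form an edge only when the two people are alike in some way
            edges[(a, b)] = common
    return edges


# Hypothetical people and traits (illustrative only).
people = {
    "Ana": {"choir", "soccer"},
    "Ben": {"soccer"},
    "Cal": {"choir", "theatre"},
    "Dee": {"chess"},
}

edges = shared_trait_edges(people)
for (a, b), common in sorted(edges.items()):
    print(a, b, sorted(common))
```

Here Ana is joined to Ben (both chose soccer) and to Cal (both chose choir), while Dee shares no trait with anyone and remains an isolated node. Keeping the set of shared traits on each edge leaves room to weight edges later, for example by how many traits two people have in common.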
As we study homophily in a graph, we want to find a good way of quantifying this property: to what extent are the nodes of a particular network “alike”? If we can devise several ways of answering this question, then we want to determine which are the most effective and most representative.

For a simple model of homophily, we will look at an example of grade school friendships from Easley and Kleinberg’s book Networks, Crowds, and Markets [3]. This network models the friendships between students in a grade school classroom. In this example, elementary students are asked to identify who their friends are. From this identification, a network of the children is made in which the edges between students signify their friendships (see Figure 1).

Figure 1: Grade school friendships [3]

The grade school network is drawn so that the nodes representing girls are circles and those representing boys are squares. We are interested in whether or not friendships are formed more frequently between children of the same sex. Thus we compare the probability of an edge existing between a boy and a girl in our actual graph with the probability of an edge existing between a boy and a girl if the same number of edges are assigned randomly [3]. Six of the nine students are of one sex and three are of the other, and a randomly assigned edge can join a boy to a girl in either order, so the probability of an edge being randomly assigned between a boy and a girl is:

\[
\left(\frac{6}{9}\right)\left(\frac{3}{9}\right) + \left(\frac{3}{9}\right)\left(\frac{6}{9}\right) = \frac{4}{9} \approx 0.44.
\]

This would mean that we would expect about 7 or 8 of the 17 edges in our graph (17 × 4/9 ≈ 7.6) to be between a boy and a girl. In reality we see that 5 of the 17 edges, or 29% of edges, in the graph are between a boy and a girl. While what is observed is less than what is expected, deciding what amount of difference makes this significant is beyond the scope of this paper. If we decide that this is significantly less, then we would conclude there may be a presence
