Social Network Analysis of Affiliation Networks
Analysis of Affiliation Networks to Promote Online Communities of Practice for Science Education
Kathleen Perez-Lopez, Ph.D.; Darren Cambridge, Ph.D.; Al Byers, Ph.D.; Sherry Booth, Ph.D.
Sunbelt XXXII, San Diego, CA, 3/17/2012

Outline
• Issue: organizing forums of topics in an online community
• Current context
• Our approach
• Initial stages: Implementation and progress
• Next Steps

• NSTA LC supports science teachers increasing their knowledge of science and of pedagogy
• Hosts an online community through its Community Forums
• Members can initiate topics within a number of forums or post to existing topics
http://learningcenter.nsta.org/

Year of NSTA LC Posts
9/24/2010 - 9/28/2011
6792 posts, 20 forums, 307 members, 556 topics
SNA using NodeXL http://nodexl.codeplex.com/

Current NSTA Approach to Decomposing Overly Large Forums
Heuristics
• 25 moderators make recommendations
• Review topics for thematic coherence
• Manually move topics and aggregate under forum header (e.g., STEM)
http://learningcenter.nsta.org/

Approach to Repartitioning Topics
• Goal: Repartition topics without creating member islands
• Alternative topic partitions: Fn, n = 0, 1, 2, …
  – F0 = the original NSTA Forums
• Bimodal networks
  – TM: Topic-Member network of posts to topics by members
  – MFn: Member-Forum networks for Fn
• Consider derived unimodal networks
  – Topic-Topic: T = TM * MT, where MT = (TM)^T
  – Member-Member: Mn = MFn * FnM, where FnM = (MFn)^T
• Goal restated: Find a natural partition Fn of T such that Clustering(Mn) is LOW

Methodology
• Annual postings data
  – Ignored topics and members with < 2 posts
    o 6792 posts by 307 members to 556 topics in 20 forums
  – On 2nd pass, omitted 26 "online advisors"
    o 2815 posts by 281 members to 474 topics in 20 forums
• Unimodal networks derived from bimodal affiliation networks are often very dense
• Partition topic networks using alternative grouping approaches
• Measure resulting clustering on member network

Repartitioning Topics
Find Fn, a partition of topics, that yields:
1. VERY segregated topic network, Tn
   Tn (474 x 474) = Topic-Member (474 x 281) X Member-Topic (281 x 474)
2. UN-segregated member network, Mn
   Mn (281 x 281) = Member-Fn (281 x 20+) X Fn-Member (20+ x 281)

Related Work
• Rodríguez, Sicilia, Sanchez-Alonso, Lezcano (2011), "Exploring affiliation network models as a collaborative filtering mechanism in e-learning"
  – Very similar concept to what we do here
  – Create a 1-mode topic network from a learner-topic affiliation network, using m-slices to cluster topics
  – Smaller data set
  – More pre-filtering of topics
  – Blockmodeling to describe learner clusters (our intent also)
• Recommender systems
  – Conceptually similar; could provide some insights to the current task

Clustering Algorithms
• F0: Original NSTA LC forums
• F1: Clauset-Newman-Moore groups (NodeXL)
• F2: Wakita-Tsurumi groups (NodeXL)
• F3: m-slices and k-cores (Pajek)
• F4: Wakita-Tsurumi on a reduced dataset
• M4: Wakita-Tsurumi on member network from F4

Member Network from NSTA Forums
• Derived from members posting to NSTA forums
• 20,576 edges
• 20,573 in connected component
• Density 0.44
• Average degree 134

Member Network from NSTA Forums: Wakita-Tsurumi Groups
• 12 groups
• Strong inter-group ties
• Want the derived partition to also be very dense

Topic Network
• Derived from topic postings from members
• 74,709 edges
• 74,707 in connected component
• Density 0.49
• Average degree 271

F1: Topic Network, Clauset-Newman-Moore (CNM) Groups
[Figures: CNM groups with lines, without lines, and boxed]
• Only 6 groups
• 2 dominant
• Split the graph in half
• Halves tightly connected
• Very POOR partition

F2: Topic Clustering, Wakita-Tsurumi
• Somewhat better
• 8 + 6 groups
• 2-3 dominant
• G14 has 42% of nodes, 38% of edges
• Group tightly connected
• POOR partition

F3: Topic Clustering with m-Slices
• Derive cohesive subgroups using line multiplicity m
  – Topics linked by ≥ m mutually posting members
• m-slice
  – Maximal subnetwork containing the lines with multiplicity ≥ m, and their incident vertices
  – Usually leaves disjoint, highly connected clusters
  – But not in this case!
• Used Pajek, following de Nooy et al.
  – 1 to 20 slices output; created partition from the 8-slice
  – Considered m = 4 and tried to cluster with k-cores; still had one dominant cluster of 212 out of 556 nodes

F3: Topic Clustering Using m-Slices
[Figures, using Pajek: 1-, 3-, 5-, 6-, 7-, and 8-slices, colored with the original NSTA forum definitions; 8-slice groups]
• 8-slice groups don't align with the NSTA forums

F3-1: 4-Slice Followed by k-Cores
[Figures: 4-slice; 4-slice with nodes removed; 4-slice components; 4-slice largest component and its k-cores; 20-59-core of the largest component (212 nodes), shown as a single component (=1) and expanded]

Repartitioning Topics: Poor Results
The partitions tried so far yield:
1. VERY UNsegregated topic network, Tn
   Tn (474 x 474) = Topic-Member (474 x 281) X Member-Topic (281 x 474)
2. UN-segregated member network, Mn
   Mn (307 x 307) = Member-Fn (307 x 20+) X Fn-Member (20+ x 307)

F4: Wakita-Tsurumi on Refined Dataset
Ignoring posts from 26 online advisors: 2815 posts, 20 forums, 281 members, 474 topics
[Figures: Wakita-Tsurumi groups on the refined dataset, with and without inter-group lines]
SNA using NodeXL http://nodexl.codeplex.com/

M4: Member Network from F4
• More clustered than desired
• 7 groups
  – 1 dominates, but
  – 5 are significant

Next Steps
• Consider realistic restrictions on forum definitions
• Find different data to represent the natural clustering of topics
  – Textual content analysis
• Filtering out non-contextual content
  – Friendly banter
  – Might be useful for other purposes, but interference here
• More iterative approach
• Consider time
  – Not a static phenomenon; analyze over time

References
• de Nooy, W., Mrvar, A., & Batagelj, V. (2005). Exploratory Social Network Analysis with Pajek. New York: Cambridge University Press.
• Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications. New York and Cambridge, Eng.: Cambridge University Press.
• Borgatti, S. 2-Mode Concepts in Social Network Analysis. Forthcoming in Encyclopedia of Complexity and System Science.
• Rodríguez, D., Sicilia, M., Sanchez-Alonso, S., Lezcano, L., & García-Barriocanal, E. (2011, in press). Exploring affiliation network models as a collaborative filtering mechanism in e-learning. Interactive Learning Environments.
• Hansen, D., Shneiderman, B., & Smith, M. (2011). Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Burlington, MA: Morgan Kaufmann.
• Su, X., & Khoshgoftaar, T. (2009). A Survey of Collaborative Filtering Techniques. Advances in Artificial Intelligence.
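Illustrative sketch (not part of the original slides): the projection from a bimodal network to a unimodal one (T = TM * MT), the density figure quoted for the derived networks, and the m-slice filter from the F3 slides can be written in a few lines of plain Python. The names `project`, `density`, and `m_slice` and the toy `posts` data are hypothetical and chosen only for illustration. The sketch assumes a binary affiliation (a member either posted to a topic or did not), which may differ from how NodeXL or Pajek weighted the actual data.

```python
from collections import defaultdict
from itertools import combinations


def project(pairs):
    """One-mode projection of a two-mode (affiliation) network.

    pairs: iterable of (row_node, column_node), e.g. (topic, member).
    Returns {frozenset({a, b}): weight}, where weight is the number of
    distinct column nodes shared by row nodes a and b. These are the
    off-diagonal entries of B * B^T for the binary incidence matrix B.
    """
    rows_by_col = defaultdict(set)
    for row, col in pairs:
        rows_by_col[col].add(row)  # a set ignores repeat posts (binary assumption)
    weights = defaultdict(int)
    for rows in rows_by_col.values():
        for a, b in combinations(sorted(rows), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


def density(n_nodes, n_edges):
    """Density of a simple undirected graph: edges / possible edges."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def m_slice(weights, m):
    """Keep only the lines whose multiplicity is at least m."""
    return {edge: w for edge, w in weights.items() if w >= m}


# Hypothetical toy data: (topic, member) pairs, one per post.
posts = [
    ("t1", "alice"), ("t1", "bob"),
    ("t2", "alice"), ("t2", "bob"), ("t2", "carol"),
    ("t3", "carol"), ("t3", "dave"),
]

topic_net = project(posts)                           # topic-topic network
member_net = project((m, t) for t, m in posts)       # member-member network
strong_topic_ties = m_slice(topic_net, 2)            # 2-slice of the topic network

print(topic_net)
print(strong_topic_ties)
print(round(density(4, len(member_net)), 2))
```

As a check on the formula, `density(307, 20576)` rounds to 0.44, matching the density reported for the member network derived from the NSTA forums.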