Analysis of Kinship Relations with Pajek 3
Total Page:16
File Type:pdf, Size:1020Kb
Social Science Computer Review Volume XX Number X Month XXXX xx-xx © 2007 Sage Publications Analysis of Kinship Relations 10.1177/0894439307299587 http://ssc.sagepub.com hosted at With Pajek http://online.sagepub.com Vladimir Batagelj Andrej Mrvar University of Ljubljana, Slovenia In the article, two general approaches to analysis of large sparse networks are presented: frag- ment searching and matrix multiplication. These two approaches are applied to analysis of large genealogies. Genealogies can be represented as graphs in different ways: as Ore graphs, p-graphs, or bipartite p-graphs. We show that p-graphs are more suitable for searching for relinking patterns, whereas Ore graphs are better for computing kinship relations using matrix multiplication. Algorithms described in this article are implemented in the program Pajek. Keywords: genealogy; Ore graph; p-graph; bipartite p-graph; calculating kinship relations; relinking marriages; relinking index; large networks; social network analysis; Pajek Introduction People collect genealogical data for several different reasons or purposes: • Researchers in history, sociology, and anthropology (Hamberger, Houseman, Daillant, White, & Barry, 2005; White, Batagelj, & Mrvar, 1999) use genealogies to compare dif- ferent cultures. In these researches, they consider kinship as a fundamental social relation. • Individuals collect records about their families or about people living in a longer period on a selected territory, for example: • Mormon genealogy (MyFamily.com, 2004) • Genealogy of the S¸kofja Loka District (Hawlina, 2004) • Genealogy of American presidents (Tompsett, 1993) • There exist special genealogies where the relation is “nonbiological”: • Students and their Ph.D. thesis advisors (Theoretical Computer Science Genealogy; Johnson & Parberry, 1993) • Genealogies of gods of antiquity (Hawlina, 2004). Many programs for data entry and maintenance of genealogical records can be found on the market (GIM, Brother’s Keeper, Family Tree Maker, etc.), but only few analyses can be done using Authors’ Note: This work was partially supported by the Slovenian Research Agency, Project J1-6062-0101. It is a detailed version of a part of the talks presented at “Dagstuhl Seminar 05361: Algorithmic Aspects of Large and Complex Networks,” September 4–9, 2005, Dagstuhl, Germany; and at the meeting “Algebraic Combinatories and Theoretical Computer Science,” February 12–15, 2006, Bled, Slovenia. 1 2 Social Science Computer Review these programs. This was the reason to expand the program Pajek (Batagelj & Mrvar, 2006; White et al., 1999) with some procedures for the analysis and visualization of large genealogies. Pajek is a general program for the analysis and visualization of large networks. It is free for noncommercial use. GEDCOM Standard GEDCOM (Family History Department, 1996) is a standard for storing and exchanging genealogical data which is used to interchange and combine data from different programs which were used for entering the data. The lines in Table 1 are extracted from the GEDCOM file of European royal families (Royal Genealogies, 1992). From data represented in the described way, we can generate several networks, as explained in the following section. Table 1 Part of the GEDCOM File of European Royal Families 0 HEAD 0@I115@INDI 1 FILE ROYALS.GED 1 NAME William Arthur Philip/Windsor/ . 1 TITL Prince 0 @I58@ INDI 1 SEX M 1 NAME Charles Philip Arthur/Windsor/ 1 BIRT 1 TITL Prince 2 DATE 21 JUN 1982 1 SEX M 2 PLAC St. Mary’s Hospital, Paddington 1 BIRT 1 CHR 2 DATE 14 NOV 1948 2 DATE 4 AUG 1982 2 PLAC Buckingham Palace, London 2 PLAC Music Room, Buckingham Palace 1 CHR 1 FAMC @F16@ 2 DATE 15 DEC 1948 . 2 PLAC Buckingham Palace Music Room 0 @I116@ INDI 1 FAMS @F16@ 1 NAME Henry Charles Albert/Windsor/ 1 FAMC @F14@ 1 TITL Prince . 1 SEX M . 1 BIRT 0 @I65@ INDI 2 DATE 15 SEP 1984 1 NAME Diana Frances /Spencer/ 2 PLAC St. Mary’s Hospital, Paddington 1 TITL Lady 1 FAMC @F16@ 1 SEX F . 1 BIRT 0 @F16@ FAM 2 DATE 1 JUL 1961 1 HUSB @I58@ 2 PLAC Park House, Sandringham 1 WIFE @I65@ 1 CHR 1 CHIL @I115@ 2 PLAC Sandringham, Church 1 CHIL @I116@ 1 FAMS @F16@ 1 DIV N 1 FAMC @F78@ 1 MARR . 2 DATE 29 JUL 1981 . 2 PLAC St. Paul’s Cathedral, London Batagelj, Mrvar / Analysis of Kinship Relations With Pajek 3 Representation of Genealogies Using Networks Genealogies can be represented using networks in different ways: as an Ore-graph,a p-graph, and a bipartite p-graph. Ore-graph In an Ore graph of genealogy, every person (INDI tag in the GEDCOM file) is represented by a vertex, and they are linked with relations: Marriage relation is a spouse of (FAMS or FAM ϩ HUSB ϩ WIFE) is represented with edges; and relation is a parent of (FAMC or FAM ϩ CHIL) is represented by arcs pointing from each of the parents to their children—partitioned into the relations is a mother of (red dotted) and is a father of (blue solid) (see Figure 1). p-graph In a p-graph, vertices represent individuals (INDI not in FAM ϩ HUSB ϩ WIFE) or couples (FAM ϩ HUSB ϩ WIFE). In case that person is not married, she or he is represented by a Figure 1 Ore Graph m-grandmotherm-grandmother f-grandfather f-grandmother mother stepmother sister-in-law brother wife sister daughter-in-law son son-in-law daughter grandson 4 Social Science Computer Review Figure 2 p-Graph grandson son & daughter-in-law son-in-law & daughter sisterI & wife brother & sister-in-law father & stepmother father & mother f-grandfather & f-grandmother m-grandfather & m-grandmother vertex; otherwise, the person is represented with the partner in a common vertex. There are only arcs in p-graphs—they point from children (CHIL) to their parents (FAM). (See Figure 2.) The solid arcs represent the relation is a son of, and the dotted arcs represent the relation is a daughter of. p-graphs are usually also used for visual representation of genealogies. Because they are acyclic graphs, the vertices can be assigned to levels. Special algorithms for drawing genealogies are included in Pajek. As an example, part of the Bouchard genealogy (Beauregard, 1995) and its most relinked part are presented in Figures 4 and 5. Bipartite p-graph A bipartite p-graph has two kinds of vertices—vertices representing couples (rectangles) and ver- tices representing individuals (circles for women and triangles for men)—therefore, each married Batagelj, Mrvar / Analysis of Kinship Relations With Pajek 5 Figure 3 Bipartite p-graph grandson son & daughter-in-law son-in-law & daughter daughter-in-law Son daughter son-in-law I & wife brother & sister-in-law sister wife brother sister-in-law father & stepmother father & mother stepmother father mother f-grandfather & f-grandmother m-grandfather & m-grandmother f-grandfather f-grandmother m-grandfather m-grandmother person is involved in two kinds of vertices (or even more if he or she is involved in multiple mar- riages). Arcs, again, point from children to their parents (see Figure 3). Genealogies Are Sparse Networks We shall call a genealogy regular if every person in it has at most two parents. Genealogies are sparse networks—the number of lines is of the same order as the number of vertices. In this section, some bounds on the number of lines in different kinds of regular genealogies are given (Mrvar & Batagelj, 2004). In a regular Ore genealogy (V, E, A), the set of vertices V is partitioned into two subsets: Vi – set of individuals – single persons, and Vm – set of married persons. Therefore, V = Vi ∪ Vm and Vi ∩ Vm = 1 Figure 4 6 SocialScienceComputerReview The Bouchard Genealogy Joseph/MARTEL/ Genevi‘eve/GAGNE/ Isaac/MARTEL/ Jean-Louis/BOUCHARD/ Christine/BOUCHARD/ Madeleine/TREMBLAY/ Basile/MARTEL/ Joseph/BOUCHARD/ He’l‘ene/SAVARD/ Marie-Anne/TREMBLAY/ Joseph/SAVARD/ Louis-Rock-Augustin/TREMBLAY/ Franc,oise/BERGERON/ Ursule-Constance/SIMARD/ Louis/HUDON=BEAULIEU/ Marie-Anne/LEBEL/ The’ophile/MARTEL/ Jean-Baptiste/HUDON=BEAULIEU/ Claudia/HUDON/ Th‘ecle/MIVILLE=DESCHENES/ Benoi^t/MIVILLE=DESCHENES/ Josephte/PELLETIER/ Octave/HUDON/ Arthe’mise/LABRI/ Charles/LABRI=MIGNAULT/ Dorothe’e/CORDEAU=DESLAURIERS/ Amable/LABRI=MIGNAULT/ Marguerite/COLLIN/ Joseph/COLLIN/ Marie-Rose/ROY/ Joseph/BOUCHARD/ Louis/LAROUCHE=GAUTHIER/ Lydia/MARTEL/ Dorothe’e/BOUCHARD/ Michel/LAROUCHE=GAUTHIER/ Marie/TREMBLAY/ Vincent/TREMBLAY/ Catherine/LABRANCHE/ Vildebond/LAROUCHE=GAUTHIER/ Mon./LAFORGE=PRADET/ J.Thierry/FORTIN/ Sylvie/TREMBLAY/ Jean-Baptiste/LAFORGE=PRADET/ Constance/FORTIN/ Joseph-Marie/PRADET/ Genevi‘eve/BELLEY/ Sixte/BOUCHARD/ Marie-Anne/LAROUCHE/ Jean-Baptiste/CHEVALIER/ Marie-The’r‘ese/POULIN/ Epiphane/TREMBLAY/ Marie/CHEVALIER/ J.-M.-Moi’’se/TREMBLAY/ Philibert/TREMBLAY/ Elisabeth/POTVIN/ Jehanne/CONGNET/ Omer/BOUCHARD/ Charles/TREMBLAY/ Nicolas/TREMBLAY/ Pierre/TREMBLAY/ Pierre/TREMBLAY/ Mathilde/TREMBLAY/ Fe’licite’/DUCHESNE/ Louise/SIMARD/ Madeleine/ROUSSIN/ Ozanne/ACHON/ Lucien/BOUCHARD/ Philippe/BOUCHARD/ Hon.Sauveur/TREMBLAY/ Jean/ACHON/ Audrey/BEST/ Alice/SIMARD/ Ange’lique/LAVOIE/ He’l‘ene/REGOURDE/ Jean-Baptiste/BOUCHARD/ Rosalie/TREMBLAY/ Jean-Baptiste/BOUCHARD/ Ursule/TREMBLAY/ Louis/BOUCHARD/ Franc,oise/DUFOUR/ Roch/BOIVIN/ Pe’lagie/FORTIN/ Colomban/TREMBLAY/ Ros./BOIVIN/ Ignace/TREMBLAY/ Appoline/SIMARD/ Alexis/TREMBLAY/ Olympe/TREMBLAY/ Antoine/TREMBLAY/ Marguerite/LAVOIE/ Andre’/TREMBLAY/ Marie-Reine/TREMBLAY/ The’odule/BOUCHARD/ Andre’/TREMBLAY/ Louise/TREMBLAY/ Eme’rentienne/LAVOIE/ Basile/SIMARD/ Monique/GRAVEL/ Etienne/MERCIER/ Antoine/BOUCHARD/ Claude/BOUCHARD/ Jacques/BOUCHARD/ Monique/SIMARD/ Madeleine/SIMARD/ Louise/GAGNE/ Noe’’lle/TOUSCHARD/