Graph-Theoretic Properties of the Class of Phonological Neighbourhood Networks

Graph-theoretic Properties of the Class of Phonological Neighbourhood Networks Rory Turnbull Newcastle University [email protected] Abstract flan clan This paper concerns the structure of phono- plant logical neighbourhood networks, which are a graph-theoretic representation of the phono- plane logical lexicon. These networks represent each word as a node and links are placed be- plan planner tween words which are phonological neighbours, usually defined as a string edit distance plaque of one. Phonological neighbourhood networks have been used to study many aspects of the planned mental lexicon and psycholinguistic theories pan of speech production and perception. This pa- plans per offers preliminary graph-theoretic observa- tions about phonological neighbourhood net- Figure 1: Example phonological neighbourhood networks considered as a class. To aid this ex- work centred around the English word plan. Note that ploration, this paper introduces the concept of some neighbours of a word are neighbours of each the hyperlexicon, the network consisting of all other. Adapted from Turnbull and Peperkamp(2017). possible words for a given symbol set and their neighbourhood relations. The construction of the hyperlexicon is discussed, and basic prop- 0 0 erties are derived. This work is among the first of w), intransitive (if w is a neighbour of w , and w to directly address the nature of phonological is a neighbour of w00, it is not necessarily the case neighbourhood networks from an analytic per- that w is a neighbour of w00), and anti-reflexive (w spective. cannot be a neighbour of itself). Figure1 shows an abbreviated phonological 1 Motivation neighbourhood network for some words of English. Recent work in phonological psycholinguistics One advantage of this representation is that it per- has investigated the structure of the lexicon mits analysis with the methods of network science through the use of phonological neighbourhood and graph theory, and work so far has shown a good networks (Chan and Vitevitch, 2010; Turnbull and deal of promise in modeling psycholinguistic prop- Peperkamp, 2017; Siew, 2013; Siew and Vitevitch, erties of the lexicon with these methods (Chan and 2020; Shoemark et al., 2016). A phonological Vitevitch, 2010; Vitevitch, 2008). A common analy- neighbourhood network is a representation of the sis technique within network science is to compare lexicon where each word is treated as a node and a given network with a randomly generated one a link is placed between nodes if and only if those that has the same number of nodes and links. No- two nodes are phonological neighbours. Two words table features of the target network relative to the are neighbours if their string edit distance, in terms random network are likely due to intrinsic proper- of phonological representation, is one. In other ties of the target network, rather than chance. From words, the neighbours of a word w are all the this structure one can then infer details about the words that can be formed by the addition, dele- organising principles that generated the network tion, or substitution of a single phoneme from w. originally. The neighbourhood relation is symmetric (if w is a For phonological neighbourhood networks, how- neighbour of w0, then w0 is necessarily a neighbour ever, this method is often inappropriate, as many 233 Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 233–240 Online Event, June 10, 2021. ©2021 Association for Computational Linguistics logically possible network structures are not pos- if the the edges and vertices of H are subsets of sible phonological neighbourhood networks. This those of G. A subgraph H is an induced subgraph fact is because the links between nodes—the neigh- of G if every edge in E(G) whose endpoints are bourhood relations—are intrinsic to the definitions both in V (H) is present in E(H). In other words, of the nodes themselves. Changing a link between an induced subgraph can be obtained by the pro- nodes necessarily means changing the content of cess of removing vertices (and any incident edges) a node, which then could entail other changes to from a graph, but not removing edges on their own. other links. This problem was highlighted by Turn- Figure2 provides illustrative examples. 1 bull and Peperkamp(2017), who instead chose to The diamond is K4 with one edge removed. A randomly generate lexicons and derive networks circle Ck has the set of nodes f1; 2; :::; kg and edge from those lexicons. However, randomly gener- set ff1; 2g; f2; 3g; :::; f(k − 1); kg; fk; 1gg. (Cir- ated lexicons do not guarantee the same number cle graphs that are induced subgraphs of a larger of links will be present in the resulting network, graph are also known as k-holes.) Figure3 depicts making it difficult to compare like with like. For the diamond and C5. this reason, studying phonological neighbourhood A star Sk is a graph with one central vertex networks as a class, and discovering their defin- which is connected to k other unique vertices. No ing characteristics, is an important methodological other vertices or edges exist. Figure4 depicts the goal for psycholinguists. stars S3 (also known as a claw), S4, and S6. This research therefore seeks to answer the fol- The Cartesian product A × B of two sets A and lowing broad questions: What are the distinctive B is defined as characteristics of phonological neighbourhood networks, including their definitions in terms of edge A × B = f(a; b)ja 2 A; b 2 Bg; (1) sets and vertex sets, their extremal properties, and characterization of forbidden subgraphs? Is there that is, the Cartesian product of A and B is an effective and efficient method by which phono- the set of all ordered pairs where the first el- logical neighbourhood graphs can be distinguished ement is a member of A and the second el- from other graphs? The present paper lays the ement is a member of B. For example, the mathematical foundations for future investigations Cartesian product of fa; b; cg and fx; yg is of both of these questions. f(a; x); (a; y); (b; x); (b; y); (c; x); (c; y)g. The Cartesian product GH of two graphs G 2 Preliminaries and H has the vertex set This section briefly defines the basic mathematical V (GH) = V (G) × V (H): (2) definitions and operations used in the remainder of the paper. The reader is referred to standard text- A given vertex (a; x) is linked with another vertex books in graph theory, such as Trudeau(1993) or (b; y) if a = b (the first elements are identical) and Diestel(2005), for more details. As mathematical fx; yg 2 E(H) (the second elements are linked in terminology and notation can vary between sub- H), or if x = y (the second elements are identical) fields, alternative names and characterizations of and fa; bg 2 E(G) (the first elements are linked some objects are mentioned in the ensuing sections, in G). To aid understanding, Figure5 depicts an but they are not strictly necessary to understand the example of the Cartesian product of two graphs, arguments of this paper. G and H. Graph G has V (G) = fa; b; cg and Networks can be modeled as mathematical ob- E(G) = ffa; bg; fb; cgg. Graph H has V (H) = jects known as graphs, which consist of vertices fx; yg and E(H) = ffx; ygg. Observe how G (nodes) and edges (links). Let G be an undirected and H can be seen in GH as two orthogonal graph with no self-loops with vertex set V (G) and dimensions. Note also that the total number of edge set E(G). Let Kn denote the complete graph vertices in GH is equal to the product of the with n vertices and all possible edges. number of vertices in G and H. A graph H is said to be a subgraph of a graph We further denote the Cartesian exponent of a G if V (H) ⊆ V (G) and E(H) ⊆ V (G), that is, graph G as 1 n See also Gruenenfelder and Pisoni(2009) for related con- G = GGG:::G; (3) cerns. | {z } n 234 G H J a b a b a b c d d d Figure 2: Three graphs. H is an induced subgraph of G formed through the removal of vertex c and its incident edges. J is also a subgraph of G, but it is not an induced subgraph due to the fact that the edge between vertices a and d is missing. diamond C5 Figure 3: The diamond graph and C5. S3 S4 S6 Figure 4: Star graphs S3, S4, and S6. G H GH a x (a, x) (a, y) b (b, x) (b, y) c y (c, x) (c, y) Figure 5: Graphs G and H and the Cartesian product GH. 235 3 K2 K2 0 ((0, 0), 0) ((0, 1), 0) ((0, 0), 1) ((0, 1), 1) 1 ((1, 0), 0) ((1, 1), 0) ((1, 0), 1) ((1, 1), 1) 3 Figure 6: The complete graph K2 and Cartesian exponent K2 , i.e. K2K2K2. 0 000 010 001 011 00 01 1 100 110 101 111 10 11 Figure 7: The hyperlexicon H(2; f1; 2; 3g). Here the alphabet is defined as f0; 1g but any set of two symbols is possible. Edges between layers (i.e. phoneme additions/deletions) are drawn in grey; edges within layers (i.e. phoneme substitutions) are drawn in black. that is, the Cartesian product of G with itself n − 1 ture can be clearly seen in Figure7. Edges within 3 times.

Load more