A Hierarchical Tree Distance Measure for Classification

Total Page:16

File Type:pdf, Size:1020Kb

A Hierarchical Tree Distance Measure for Classification A Hierarchical Tree Distance Measure for Classification Kent Munthe Caspersen, Martin Bjeldbak Madsen, Andreas Berre Eriksen and Bo Thiesson Department of Computer Science, Aalborg University, Aalborg, Denmark {swallet, martinbmadsen}@gmail.com, {andreasb, thiesson}@cs.aau.dk Keywords: Machine Learning, Multi-class Classification, Hierarchical Classification, Tree Distance Measures, Multi- output Regression, Multidimensional Scaling, Process Automation, UNSPSC. Abstract: In this paper, we explore the problem of classification where class labels exhibit a hierarchical tree structure. Many multiclass classification algorithms assume a flat label space, where hierarchical structures are ignored. We take advantage of hierarchical structures and the interdependencies between labels. In our setting, labels are structured in a product and service hierarchy, with a focus on spend analysis. We define a novel distance measure between classes in a hierarchical label tree. This measure penalizes paths though high levels in the hierarchy. We use a known classification algorithm that aims to minimize distance between labels, given any symmetric distance measure. The approach is global in that it constructs a single classifier for an entire hierarchy by embedding hierarchical distances into a lower-dimensional space. Results show that combining our novel distance measure with the classifier induces a trade-off between accuracy and lower hierarchical distances on misclassifications. This is useful in a setting where erroneous predictions vastly change the context of a label. 1 INTRODUCTION the UNSPSC standard, see (Programme, 2016). Many classification problems have a hierarchical With the increasing advancement of technologies de- structure, but few multiclass classification algorithms veloped to gather and store vast quantities of data, in- take advantage of this fact. Traditional multiclass clas- teresting applications arise. Many kinds of business sification algorithms ignore any hierarchical structure, processes are supported by classifying data into one essentially flattening the hierarchy such that the clas- of multiple categories. In addition, as the quantity of sification problem is solved as a multiclass classifi- data grows, structured organizations of assigned cate- cation problem. Such problems are often solved by gories are often created to describe interdependencies combining the output of multiple binary classifiers, us- between categories. Spend analysis systems are an ex- ing techniques such as One-vs-One and One-vs-Rest ample domain where such a hierarchical structure can to provide predictions (Bishop, 2006). be beneficial. Hierarchical multiclass classification (HMC) algo- In a spend analysis system, one is interested in rithms are a variant of multiclass classification algo- being able to drill down on the types of purchases rithms which take advantage of labels organized in a across levels of specificity to aid in planning of pro- hierarchical structure. Depending on the label space, curements. Such tools also provide processes to gain hierarchical structures can be in the shape of a tree insights in how much and to whom spending is going or directed acyclic graph (DAG). Figure 1 shows an towards, supporting spend visibility. For example, in example of a tree-based label structure. In this paper, the UNSPSC1 taxonomy, a procured computer mouse we focus on tree structures. would belong to the following categories of increasing Silla and Freitas (Silla and Freitas, 2011) de- specificity: “Information Technology Broadcasting scribe hierarchical classification problems as a 3-tuple Υ, Ψ, Φ , where Υ is the type of graph represent- and Telecommunications”, “Computer Equipment and h i Accessories”, “Computer data input devices”, “Com- ing the hierarchical structure of classes, Ψ specifies puter mouse or trackballs”. For more information on whether a datapoint can have multiple labels in the class hierarchy, and Φ specifies whether the labeling 1 R United Nations Standard Products and Services Code , of datapoints only includes leaves or if nodes within a cross-industry taxonomy for product and service classifi- the hierarchy are included as well. Using this defini- cation. tion, we are concerned with problems of the form 502 Caspersen, K., Madsen, M., Eriksen, A. and Thiesson, B. A Hierarchical Tree Distance Measure for Classification. DOI: 10.5220/0006198505020509 In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2017), pages 502-509 ISBN: 978-989-758-222-6 Copyright c 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved A Hierarchical Tree Distance Measure for Classification Υ = T (tree), meaning classes are organized in a used in conjunction with active learning. This devi- • tree structure. ates from the method introduced in this paper in that Ψ = SPL (single path of labels), meaning the model we introduce does not support DAGs and can be • problems we consider are not hierarchically multi- used without the aid of active learning, with promising label. results. Φ = PD (partial depth) labeling, meaning data- (Wang et al., 1999) identify issues in local- • points do not always have a leaf class. approach hierarchical classification and propose a In this paper, a novel distance measure is introduced, global-classifier based approach, aiming for closeness with respect to the label tree. The purpose of the of hierarchy labels. The authors realize that the con- distance measure is to capture similarity between la- cern of simply being correct or wrong in hierarchical bels and penalize errors at high levels in the hierarchy, classification is not enough, and that only focusing on more than errors at lower levels. This distance mea- the broader, higher levels is where the structure, and sure leads to a trade-off between accuracy and the dis- thus accuracy, diminishes. To mitigate these issues, tance of misclassifications. Intuitively, this trade-off the authors implement a multilabel classifier based makes sense for UNSPSC codes as, for example, clas- upon rules from features to classes found during train- sifying an apple as a fresh fruit should be penalized ing. These rules minimize a distance measure between less than classifying an apple as toxic waste. Training two classes, and are deterministically found. Their a classifier for such distance measures is not straight- distance measure is application-dependent, and the au- forward, therefore a classification method is presented, thors use the shortest distance between two labels. In which copes with a distance measure defined between this paper, we also construct a global classifier which two labels. aims to minimize distances between hierarchy labels. The rest of this paper is structured as follows. Sec- (Weinberger and Chapelle, 2009) introduces a la- tion 2 discusses existing HMC approaches in the lit- bel embedding with respect to the hierarchical struc- erature. Section 3 introduces hierarchical classifica- ture of the label tree. They build a global multiclass tion. In Section 4, we define properties a hierarchical classifier based on the embedding. We utilize their tree distance measure should comply to, and describe method of classification with our novel distance mea- our concrete implementation of these properties. Sec- sure. tion 5 details how to embed the distance measure in a hierarchical multiclass classifier. The experiments of Section 6 compares this classifier with other classi- 3 HIERARCHICAL fiers. Finally, Section 7 presents our ideas for further research. Section 8 concludes. CLASSIFICATION The hierarchical structure among labels allows us to reason about different degrees of misclassification. 2 RELATED WORK We are concerned with predicting the label of data- points within a hierarchical taxonomy. We define the (Dumais and Chen, 2000) explore hierarchical classi- input data as a set of tuples, such that a dataset D is fication of web content by ordering SVMs in a hierar- defined by chical fashion, and classifying based on user-specified D = (x, y) x X, y Y , (1) thresholds. The authors focus on a two-level label hi- { | ∈ ∈ } erarchy, as opposed to the 4-level UNSPSC hierarchy where x is a q-dimensional datapoint in feature space we utilize on in this paper. Assigning an instance to X and y is a label in a hierarchically structured set of a class requires using the posterior probabilities prop- labels Y = 1, 2, . , m . agated from the SVMs through the hierarchy. The Assume{ we have a datapoint} x with label y = U authors conclude that exploiting the hierarchical struc- from the label tree in Figure 1. It makes sense that ture of an underlying problem can, in some cases, pro- a prediction yˆ = V should be penalized less than a duce a better classifier, especially in situations with a prediction yˆ0 = Z, since it is closer to the true label large number of labels. y in the label tree. We capture this notion of distance (Labrou and Finin, 1999) use a global classifier between any two labels with our hierarchy embracing based system to classify web pages into a 2-level DAG- distance measure, properties of which are defined in based hierarchy of Yahoo! categories by computing Section 4. the similarity between documents. The authors con- One commonly used distance measure is to count clude that their system is not accurate enough to be the number of edges on a path between two labels in suitable for automatic classification, and should be the node hierarchy. We call this method the Edges 503 ICPRAM 2017 - 6th International Conference on Pattern Recognition Applications and Methods R 0, or 1, depending on whether x is smaller than, equal to, or larger than 0, respectively. Finally, we define a notion of structural equivalence S X between two nodes in a label tree, denoted A B, such that the root is structurally equivalent to itself,≡ T W Y Z and A B ( sib(A) = sib(B) ρ(A) ρ(B)) . ≡ ⇐⇒ | | | | ∧ ≡ U V This recursive definition causes two nodes A and B to Figure 1: An example of a 3-level label tree. be structurally equivalent if all nodes met on the path from A to the root, pair-wise have the same number Between (EB) distance.
Recommended publications
  • Bayes Nets Conclusion Inference
    Bayes Nets Conclusion Inference • Three sets of variables: • Query variables: E1 • Evidence variables: E2 • Hidden variables: The rest, E3 Cloudy P(W | Cloudy = True) • E = {W} Sprinkler Rain 1 • E2 = {Cloudy=True} • E3 = {Sprinkler, Rain} Wet Grass P(E 1 | E 2) = P(E 1 ^ E 2) / P(E 2) Problem: Need to sum over the possible assignments of the hidden variables, which is exponential in the number of variables. Is exact inference possible? 1 A Simple Case A B C D • Suppose that we want to compute P(D = d) from this network. A Simple Case A B C D • Compute P(D = d) by summing the joint probability over all possible values of the remaining variables A, B, and C: P(D === d) === ∑∑∑ P(A === a, B === b,C === c, D === d) a,b,c 2 A Simple Case A B C D • Decompose the joint by using the fact that it is the product of terms of the form: P(X | Parents(X)) P(D === d) === ∑∑∑ P(D === d | C === c)P(C === c | B === b)P(B === b | A === a)P(A === a) a,b,c A Simple Case A B C D • We can avoid computing the sum for all possible triplets ( A,B,C) by distributing the sums inside the product P(D === d) === ∑∑∑ P(D === d | C === c)∑∑∑P(C === c | B === b)∑∑∑P(B === b| A === a)P(A === a) c b a 3 A Simple Case A B C D This term depends only on B and can be written as a 2- valued function fA(b) P(D === d) === ∑∑∑ P(D === d | C === c)∑∑∑P(C === c | B === b)∑∑∑P(B === b| A === a)P(A === a) c b a A Simple Case A B C D This term depends only on c and can be written as a 2- valued function fB(c) === === === === === === P(D d) ∑∑∑ P(D d | C c)∑∑∑P(C c | B b)f A(b) c b ….
    [Show full text]
  • Balanced Trees Part One
    Balanced Trees Part One Balanced Trees ● Balanced search trees are among the most useful and versatile data structures. ● Many programming languages ship with a balanced tree library. ● C++: std::map / std::set ● Java: TreeMap / TreeSet ● Many advanced data structures are layered on top of balanced trees. ● We’ll see several later in the quarter! Where We're Going ● B-Trees (Today) ● A simple type of balanced tree developed for block storage. ● Red/Black Trees (Today/Thursday) ● The canonical balanced binary search tree. ● Augmented Search Trees (Thursday) ● Adding extra information to balanced trees to supercharge the data structure. Outline for Today ● BST Review ● Refresher on basic BST concepts and runtimes. ● Overview of Red/Black Trees ● What we're building toward. ● B-Trees and 2-3-4 Trees ● Simple balanced trees, in depth. ● Intuiting Red/Black Trees ● A much better feel for red/black trees. A Quick BST Review Binary Search Trees ● A binary search tree is a binary tree with 9 the following properties: 5 13 ● Each node in the BST stores a key, and 1 6 10 14 optionally, some auxiliary information. 3 7 11 15 ● The key of every node in a BST is strictly greater than all keys 2 4 8 12 to its left and strictly smaller than all keys to its right. Binary Search Trees ● The height of a binary search tree is the 9 length of the longest path from the root to a 5 13 leaf, measured in the number of edges. 1 6 10 14 ● A tree with one node has height 0.
    [Show full text]
  • Trees: Binary Search Trees
    COMP2012H Spring 2014 Dekai Wu Binary Trees & Binary Search Trees (data structures for the dictionary ADT) Outline } Binary tree terminology } Tree traversals: preorder, inorder and postorder } Dictionary and binary search tree } Binary search tree operations } Search } min and max } Successor } Insertion } Deletion } Tree balancing issue COMP2012H (BST) Binary Tree Terminology } Go to the supplementary notes COMP2012H (BST) Linked Representation of Binary Trees } The degree of a node is the number of children it has. The degree of a tree is the maximum of its element degree. } In a binary tree, the tree degree is two data } Each node has two links left right } one to the left child of the node } one to the right child of the node Left child Right child } if no child node exists for a node, the link is set to NULL root 32 32 79 42 79 42 / 13 95 16 13 95 16 / / / / / / COMP2012H (BST) Binary Trees as Recursive Data Structures } A binary tree is either empty … Anchor or } Consists of a node called the root } Root points to two disjoint binary (sub)trees Inductive step left and right (sub)tree r left right subtree subtree COMP2012H (BST) Tree Traversal is Also Recursive (Preorder example) If the binary tree is empty then Anchor do nothing Else N: Visit the root, process data L: Traverse the left subtree Inductive/Recursive step R: Traverse the right subtree COMP2012H (BST) 3 Types of Tree Traversal } If the pointer to the node is not NULL: } Preorder: Node, Left subtree, Right subtree } Inorder: Left subtree, Node, Right subtree Inductive/Recursive step } Postorder: Left subtree, Right subtree, Node template <class T> void BinaryTree<T>::InOrder( void(*Visit)(BinaryTreeNode<T> *u), template<class T> BinaryTreeNode<T> *t) void BinaryTree<T>::PreOrder( {// Inorder traversal.
    [Show full text]
  • Q-Tree: a Multi-Attribute Based Range Query Solution for Tele-Immersive Framework
    Q-Tree: A Multi-Attribute Based Range Query Solution for Tele-Immersive Framework Md Ahsan Arefin, Md Yusuf Sarwar Uddin, Indranil Gupta, Klara Nahrstedt Department of Computer Science University of Illinois at Urbana Champaign Illinois, USA {marefin2, mduddin2, indy, klara}@illinois.edu Abstract given in a high level description which are transformed into multi-attribute composite range queries. Some of the Users and administrators of large distributed systems are examples include “which site is highly congested?”, “which frequently in need of monitoring and management of its components are not working properly?” etc. To answer the various components, data items and resources. Though there first one, the query is transformed into a multi-attribute exist several distributed query and aggregation systems, the composite range query with constrains (range of values) on clustered structure of tele-immersive interactive frameworks CPU utilization, memory overhead, stream rate, bandwidth and their time-sensitive nature and application requirements utilization, delay and packet loss rate. The later one can represent a new class of systems which poses different chal- be answered by constructing a multi-attribute range query lenges on this distributed search. Multi-attribute composite with constrains on static and dynamic characteristics of range queries are one of the key features in this class. those components. Queries can also be made by defining Queries are given in high level descriptions and then trans- different multi-attribute ranges explicitly. Another mention- formed into multi-attribute composite range queries. Design- able property of such systems is that the number of site is ing such a query engine with minimum traffic overhead, low limited due to limited display space and limited interactions, service latency, and with static and dynamic nature of large but the number of data items to store and manage can datasets, is a challenging task.
    [Show full text]
  • Tree Structures
    Tree Structures Definitions: o A tree is a connected acyclic graph. o A disconnected acyclic graph is called a forest o A tree is a connected digraph with these properties: . There is exactly one node (Root) with in-degree=0 . All other nodes have in-degree=1 . A leaf is a node with out-degree=0 . There is exactly one path from the root to any leaf o The degree of a tree is the maximum out-degree of the nodes in the tree. o If (X,Y) is a path: X is an ancestor of Y, and Y is a descendant of X. Root X Y CSci 1112 – Algorithms and Data Structures, A. Bellaachia Page 1 Level of a node: Level 0 or 1 1 or 2 2 or 3 3 or 4 Height or depth: o The depth of a node is the number of edges from the root to the node. o The root node has depth zero o The height of a node is the number of edges from the node to the deepest leaf. o The height of a tree is a height of the root. o The height of the root is the height of the tree o Leaf nodes have height zero o A tree with only a single node (hence both a root and leaf) has depth and height zero. o An empty tree (tree with no nodes) has depth and height −1. o It is the maximum level of any node in the tree. CSci 1112 – Algorithms and Data Structures, A.
    [Show full text]
  • A Simple Linear-Time Algorithm for Finding Path-Decompostions of Small Width
    Introduction Preliminary Definitions Pathwidth Algorithm Summary A Simple Linear-Time Algorithm for Finding Path-Decompostions of Small Width Kevin Cattell Michael J. Dinneen Michael R. Fellows Department of Computer Science University of Victoria Victoria, B.C. Canada V8W 3P6 Information Processing Letters 57 (1996) 197–203 Kevin Cattell, Michael J. Dinneen, Michael R. Fellows Linear-Time Path-Decomposition Algorithm Introduction Preliminary Definitions Pathwidth Algorithm Summary Outline 1 Introduction Motivation History 2 Preliminary Definitions Boundaried graphs Path-decompositions Topological tree obstructions 3 Pathwidth Algorithm Main result Linear-time algorithm Proof of correctness Other results Kevin Cattell, Michael J. Dinneen, Michael R. Fellows Linear-Time Path-Decomposition Algorithm Introduction Preliminary Definitions Motivation Pathwidth Algorithm History Summary Motivation Pathwidth is related to several VLSI layout problems: vertex separation link gate matrix layout edge search number ... Usefullness of bounded treewidth in: study of graph minors (Robertson and Seymour) input restrictions for many NP-complete problems (fixed-parameter complexity) Kevin Cattell, Michael J. Dinneen, Michael R. Fellows Linear-Time Path-Decomposition Algorithm Introduction Preliminary Definitions Motivation Pathwidth Algorithm History Summary History General problem(s) is NP-complete Input: Graph G, integer t Question: Is tree/path-width(G) ≤ t? Algorithmic development (fixed t): O(n2) nonconstructive treewidth algorithm by Robertson and Seymour (1986) O(nt+2) treewidth algorithm due to Arnberg, Corneil and Proskurowski (1987) O(n log n) treewidth algorithm due to Reed (1992) 2 O(2t n) treewidth algorithm due to Bodlaender (1993) O(n log2 n) pathwidth algorithm due to Ellis, Sudborough and Turner (1994) Kevin Cattell, Michael J. Dinneen, Michael R.
    [Show full text]
  • AVL Trees, Page 1 ' $
    CS 310 AVL Trees, Page 1 ' $ Motives of Balanced Trees Worst-case search steps #ofentries complete binary tree skewed binary tree 15 4 15 63 6 63 1023 10 1023 65535 16 65535 1,048,575 20 1048575 1,073,741,823 30 1073741823 & % CS 310 AVL Trees, Page 2 ' $ Analysis r • logx y =anumberr such that x = y ☞ log2 1024 = 10 ☞ log2 65536 = 16 ☞ log2 1048576 = 20 ☞ log2 1073741824 = 30 •bxc = an integer a such that a ≤ x dxe = an integer a such that a ≥ x ☞ b3.1416c =3 ☞ d3.1416e =4 ☞ b5c =5=d5e & % CS 310 AVL Trees, Page 3 ' $ • If a binary search tree of N nodes happens to be complete, then a search in the tree requires at most b c log2 N +1 steps. • Big-O notation: We say f(x)=O(g(x)) if f(x)isbounded by c · g(x), where c is a constant, for sufficiently large x. ☞ x2 +11x − 30 = O(x2) (consider c =2,x ≥ 6) ☞ x100 + x50 +2x = O(2x) • If a BST happens to be complete, then a search in the tree requires O(log2 N)steps. • In a linked list of N nodes, a search requires O(N)steps. & % CS 310 AVL Trees, Page 4 ' $ AVL Trees • A balanced binary search tree structure proposed by Adelson, Velksii and Landis in 1962. • In an AVL tree of N nodes, every searching, deletion or insertion operation requires only O(log2 N)steps. • AVL tree searching is exactly the same as that of the binary search tree. & % CS 310 AVL Trees, Page 5 ' $ Definition • An empty tree is balanced.
    [Show full text]
  • 7. Hierarchies & Trees Visualizing Topological Relations
    7. Hierarchies & Trees Visualizing topological relations Vorlesung „Informationsvisualisierung” Prof. Dr. Andreas Butz, WS 2011/12 Konzept und Basis für Folien: Thorsten Büring LMU München – Medieninformatik – Andreas Butz – Informationsvisualisierung – WS2011/12 Folie 1 Outline • Hierarchical data and tree representations • 2D Node-link diagrams – Hyperbolic Tree Browser – SpaceTree – Cheops – Degree of interest tree – 3D Node-link diagrams • Enclosure – Treemap – Ordererd Treemaps – Various examples – Voronoi treemap – 3D Treemaps • Circular visualizations • Space-filling node-link diagram LMU München – Medieninformatik – Andreas Butz – Informationsvisualisierung – WS2011/12 Folie 2 Hierarchical Data • Card et al. 1999: data repository in which data cases are related to subcases • Many data collections have an inherent hierarchical organization – Organizational Charts – Websites (approximately hierarchical) Yee et al. 2001 – File system – Family tree – OO programming • Hierarchies are usually represented as tree visual structures • Trees tend to be easier to lay out and interpret than networks (e.g. no cycles) • But: as shown in the example, networks may in some cases be visualized as a tree LMU München – Medieninformatik – Andreas Butz – Informationsvisualisierung – WS2011/12 Folie 3 Tree Representations • Two kinds of representations • Node-link diagram (see previous lecture): represent connections as edges between vertices (data cases) http://www.icann.org • Enclosure: space-filling approaches by visually nesting the hierarchy LMU
    [Show full text]
  • Lowest Common Ancestors in Trees and Directed Acyclic Graphs1
    Lowest Common Ancestors in Trees and Directed Acyclic Graphs1 Michael A. Bender2 3 Martín Farach-Colton4 Giridhar Pemmasani2 Steven Skiena2 5 Pavel Sumazin6 Version: We study the problem of finding lowest common ancestors (LCA) in trees and directed acyclic graphs (DAGs). Specifically, we extend the LCA problem to DAGs and study the LCA variants that arise in this general setting. We begin with a clear exposition of Berkman and Vishkin’s simple optimal algorithm for LCA in trees. The ideas presented are not novel theoretical contributions, but they lay the foundation for our work on LCA problems in DAGs. We present an algorithm that finds all-pairs-representative : LCA in DAGs in O~(n2 688 ) operations, provide a transitive-closure lower bound for the all-pairs-representative-LCA problem, and develop an LCA-existence algorithm that preprocesses the DAG in transitive-closure time. We also present a suboptimal but practical O(n3) algorithm for all-pairs-representative LCA in DAGs that uses ideas from the optimal algorithms in trees and DAGs. Our results reveal a close relationship between the LCA, all-pairs-shortest-path, and transitive-closure problems. We conclude the paper with a short experimental study of LCA algorithms in trees and DAGs. Our experiments and source code demonstrate the elegance of the preprocessing-query algorithms for LCA in trees. We show that for most trees the suboptimal Θ(n log n)-preprocessing Θ(1)-query algorithm should be preferred, and demonstrate that our proposed O(n3) algorithm for all- pairs-representative LCA in DAGs performs well in both low and high density DAGs.
    [Show full text]
  • Trees a Tree Is a Graph Which Is (A) Connected and (B) Has No Cycles (Acyclic)
    Trees A tree is a graph which is (a) Connected and (b) has no cycles (acyclic). 1 Lemma 1 Let the components of G be C1; C2; : : : ; Cr, Suppose e = (u; v) 2= E, u 2 Ci; v 2 Cj. (a) i = j ) !(G + e) = !(G). (b) i 6= j ) !(G + e) = !(G) − 1. (a) v u (b) u v 2 Proof Every path P in G + e which is not in G must contain e. Also, !(G + e) ≤ !(G): Suppose (x = u0; u1; : : : ; uk = u; uk+1 = v; : : : ; u` = y) is a path in G + e that uses e. Then clearly x 2 Ci and y 2 Cj. (a) follows as now no new relations x ∼ y are added. (b) Only possible new relations x ∼ y are for x 2 Ci and y 2 Cj. But u ∼ v in G + e and so Ci [ Cj becomes (only) new component. 2 3 Lemma 2 G = (V; E) is acyclic (forest) with (tree) components C1; C2; : : : ; Ck. jV j = n. e = (u; v) 2= E, u 2 Ci; v 2 Cj. (a) i = j ) G + e contains a cycle. (b) i 6= j ) G + e is acyclic and has one less com- ponent. (c) G has n − k edges. 4 (a) u; v 2 Ci implies there exists a path (u = u0; u1; : : : ; u` = v) in G. So G + e contains the cycle u0; u1; : : : ; u`; u0. u v 5 (a) v u Suppose G + e contains the cycle C. e 2 C else C is a cycle of G. C = (u = u0; u1; : : : ; u` = v; u0): But then G contains the path (u0; u1; : : : ; u`) from u to v – contradiction.
    [Show full text]
  • Arxiv:1901.04560V1 [Math.CO] 14 Jan 2019 the flexibility That Makes Hypergraphs Such a Versatile Tool Complicates Their Analysis
    MINIMALLY CONNECTED HYPERGRAPHS MARK BUDDEN, JOSH HILLER, AND ANDREW PENLAND Abstract. Graphs and hypergraphs are foundational structures in discrete mathematics. They have many practical applications, including the rapidly developing field of bioinformatics, and more generally, biomathematics. They are also a source of interesting algorithmic problems. In this paper, we define a construction process for minimally connected r-uniform hypergraphs, which captures the intuitive notion of building a hypergraph piece-by-piece, and a numerical invariant called the tightness, which is independent of the construc- tion process used. Using these tools, we prove some fundamental properties of minimally connected hypergraphs. We also give bounds on their chromatic numbers and provide some results involving edge colorings. We show that ev- ery connected r-uniform hypergraph contains a minimally connected spanning subhypergraph and provide a polynomial-time algorithm for identifying such a subhypergraph. 1. Introduction Graphs and hypergraphs provide many beautiful results in discrete mathemat- ics. They are also extremely useful in applications. Over the last six decades, graphs and their generalizations have been used for modeling many biological phenomena, ranging in scale from protein-protein interactions, individualized cancer treatments, carcinogenesis, and even complex interspecial relationships [12, 15, 16, 19, 22]. In the last few years in particular, hypergraphs have found an increasingly prominent position in the biomathematical literature, as they allow scientists and practitioners to model complex interactions between arbitrarily many actors [22]. arXiv:1901.04560v1 [math.CO] 14 Jan 2019 The flexibility that makes hypergraphs such a versatile tool complicates their analysis. Because of this, many different algorithms and metrics have been devel- oped to assist with their application [18, 23].
    [Show full text]
  • Section 11.1 Introduction to Trees Definition
    Section 11.1 Introduction to Trees Definition: A tree is a connected undirected graph with no simple circuits. : A circuit is a path of length >=1 that begins and ends a the same vertex. d d Tournament Trees A common form of tree used in everyday life is the tournament tree, used to describe the outcome of a series of games, such as a tennis tournament. Alice Antonia Alice Anita Alice Abigail Abigail Alice Amy Agnes Agnes Angela Angela Angela Audrey A Family Tree Much of the tree terminology derives from family trees. Gaea Ocean Cronus Phoebe Zeus Poseidon Demeter Pluto Leto Iapetus Persephone Apollo Atlas Prometheus Ancestor Tree An inverted family tree. Important point - it is a binary tree. Iphigenia Clytemnestra Agamemnon Leda Tyndareus Aerope Atreus Catreus Forest Graphs containing no simple circuits that are not connected, but each connected component is a tree. Theorem An undirected graph is a tree if and only if there is a unique simple path between any two of its vertices. Rooted Trees Once a vertex of a tree has been designated as the root of the tree, it is possible to assign direction to each of the edges. Rooted Trees g a e f e c b b d a d c g f root node a internal vertex parent of g b c d e f g leaf siblings h i a b c d e f g h i h i ancestors of and a b c d e f g subtree with b as its h i root subtree with c as its root m-ary trees A rooted tree is called an m-ary tree if every internal vertex has no more than m children.
    [Show full text]