A Tree-based Summarization Framework for Differences Between Two Data Sets


A thesis submitted to Kent State University in partial fulfillment of the requirements for the degree of Master of Science

by Dong Wang
May 2009

Thesis written by Dong Wang
B.S., University of Science and Technology of China, 1997
M.S., University of Science and Technology of China, 2000
M.S., Kent State University

Approved by
Dr. Ruoming Jin, Advisor
Dr. Robert Walker, Chair, Department of Computer Science
Dr. John R.D. Stalvey, Dean, College of Arts and Sciences

Table of Contents

List of Figures
List of Tables
List of Algorithms
1 Introduction
  1.1 Ubiquitous Change
  1.2 Hypothesis Testing
  1.3 Test Statistics Properties
  1.4 Test Statistic Methods
  1.5 Utilizing the χ² Test
2 Related Work
  2.1 Change Detection
  2.2 Measuring Differences in Data Sets
  2.3 Describing Differences Between Multidimensional Data Sets
3 Problem Definition
  3.1 Histogram-based Analysis
  3.2 Multi-dimensional Contingency Table
  3.3 Decision-tree-like Structure Approach
4 Dynamic Programming Algorithm
5 Greedy Recursive Scheme
6 Experimental Results
  6.1 Data Preparation
  6.2 Testing Environment
  6.3 Running Costs
  6.4 Test Results
  6.5 Accuracy of the Greedy Algorithm
7 Conclusion and Future Work
Bibliography
Appendices
A Dynamic Programming Algorithm Results for the Real Data Sets

List of Figures

3.1 Building of a one-dimensional contingency table
3.2 Illustration of a 2-d grid
3.3 Final result of the optimal cuts
4.1 Possible cuts of an 8 × 6 2-d hypercube
4.2 Illustration of the decision tree building for a 1-d hypercube
4.3 Illustration of the decision tree building for a 2-d hypercube
5.1 A simple 1-d test hypercube
6.1 A simple tree-structure explanation
6.2 Running time of both algorithms vs. grid sizes
A.1 Cutting result for the first kind of change of Abalone
A.2 Cutting result for the second kind of change of Abalone
A.3 Cutting result for the first kind of change of Auto MPG
A.4 Cutting result for the second kind of change of Auto MPG
A.5 Cutting result for the first kind of change of Clouds
A.6 Cutting result for the second kind of change of Clouds
A.7 Cutting result for the first kind of change of Cement
A.8 Cutting result for the second kind of change of Cement
A.9 Cutting result for the first kind of change of Credit
A.10 Cutting result for the second kind of change of Credit
A.11 Cutting result for the first kind of change of Vowel Context
A.12 Cutting result for the second kind of change of Vowel Context
A.13 Cutting result for the first kind of change of Cylinder
A.14 Cutting result for the second kind of change of Cylinder

List of Tables

6.1 List of data sets
6.2 Comparison of the two algorithms

List of Algorithms

1 Dynamic programming algorithm to find the maximum independence
2 Greedy algorithm to find the maximum independence
3 Third kind of change generation
4 Hypercube generation

Chapter 1
Introduction

1.1 Ubiquitous Change

One of the fundamental problems in data mining is comparing two data sets that share the same set of attributes and characterizing the changes between them. Detecting and describing such changes has great potential in many research areas. Some examples:

• Every month a retail store generates a sales report. The store manager wants to compare the reports from different periods and find out which factors contribute most to the differences.
• At several locations in the tropical ocean, water temperatures are collected at multiple depths. By comparing the data sets collected at different times of the year, scientists want to know which area contributes most to the differences between two collection times.
• The log files of two popular web sites may show very different patterns. The administrators want to know what causes the differences; the reason might be a certain point in time or a certain group of visitors.

In these examples, the two data sets under study have identical sets of attributes; what differs is the distribution of the data. A summary of these differences could help managers find trends in the consumer market and make the right decisions. Widely used OLAP tools [6] can drill down or roll up to different levels of aggregation and so help the user locate differences, but a method that describes and explains these differences is still required.

1.2 Hypothesis Testing

To describe differences, differences must first be statistically defined. Suppose there are two data sets, say two transactional data sets D1 and D2 from a chain store in two different locations. The underlying distribution function of D1 is F1, and that of D2 is F2. The question is whether there is any difference between F1 and F2, which gives two hypotheses, a null hypothesis H0 and an alternative hypothesis H1:

H0: F1 = F2
H1: F1 ≠ F2

A test statistic s(F1, F2) is needed, which is a function of F1 and F2 (and therefore a function of D1 and D2); from the value of s it can be decided, with a certain degree of confidence, whether to reject the null hypothesis.
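Section 1.5 turns to the χ² test for exactly this purpose. As an illustration of what such a test statistic looks like in practice, the sketch below (ours, not code from the thesis; the function name chi2_two_sample and the binning choices are assumptions) bins two samples on a shared grid and computes the two-sample χ² statistic from the resulting counts:

```python
import numpy as np
from scipy.stats import chi2


def chi2_two_sample(d1, d2, bins=10):
    """Illustrative two-sample chi-squared statistic on binned counts.

    Bins both samples on a shared grid and compares the observed counts
    in each bin against the counts expected under H0: F1 = F2.
    """
    edges = np.histogram_bin_edges(np.concatenate([d1, d2]), bins=bins)
    o1, _ = np.histogram(d1, edges)
    o2, _ = np.histogram(d2, edges)
    n1, n2 = o1.sum(), o2.sum()
    pooled = o1 + o2
    keep = pooled > 0                          # ignore empty bins
    e1 = n1 * pooled[keep] / (n1 + n2)         # expected counts under H0
    e2 = n2 * pooled[keep] / (n1 + n2)
    stat = ((o1[keep] - e1) ** 2 / e1).sum() + ((o2[keep] - e2) ** 2 / e2).sum()
    dof = int(keep.sum()) - 1                  # (bins - 1) * (2 - 1)
    return stat, chi2.sf(stat, dof)            # statistic and p-value


rng = np.random.default_rng(0)
s, p = chi2_two_sample(rng.normal(0.0, 1.0, 5000), rng.normal(0.2, 1.0, 5000))
print(f"chi2 = {s:.1f}, p = {p:.3g}")          # small p => reject H0
```

A small p-value rejects H0 at the corresponding confidence level; the binned, contingency-table view of the data used here is the same one that Chapter 3 builds on.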
1.3 Test Statistics Properties

A good test statistic must have several properties:

1. Consistency: when min(|D1|, |D2|) → ∞, the correct decision is always reached. For example, suppose D1 is drawn from a Bernoulli distribution with p = 0.5, D2 is also drawn from a Bernoulli distribution but with p = 0.49, and the difference of the two sample means, s = |μ1 − μ2|, is used as the test statistic. The right decision (to reject H0) might not be reached when the sample size is small; however, as the sample size goes to infinity, even though the difference between D1 and D2 is very small, s becomes so significant that H0 can be rejected with high confidence.

2. Distribution-free: for most real data sets it is difficult to determine the distribution functions F1 and F2 in advance, and even when they have been determined once, they could change over time. It is therefore necessary for the test statistic to be distribution-free.

3. Power: if more than one test statistic meets the requirements above, which one should be used? There are two types of errors in a statistical decision process: a type I error is rejecting a hypothesis that should have been accepted, and a type II error is accepting a hypothesis that should have been rejected. The power of a test statistic is the probability of NOT committing a type II error. A good test statistic should have a power close to 1, so that the null hypothesis is rejected whenever it is false.

The goal, then, is to find a test statistic that is consistent, distribution-free, and has power as close to 1 as possible.
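To make the consistency and power discussion concrete, here is a small Monte Carlo sketch (ours, not from the thesis; the helper name mean_diff and the 5% significance level are our choices) of the Bernoulli example above. It calibrates a critical value for s = |μ1 − μ2| under H0 and then estimates the power against p = 0.49 as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(1)


def mean_diff(n, p1, p2, trials=2000):
    """Draw `trials` replications of s = |mu1 - mu2| for two Bernoulli
    samples of size n with success probabilities p1 and p2."""
    m1 = rng.binomial(n, p1, trials) / n       # sample means of D1
    m2 = rng.binomial(n, p2, trials) / n       # sample means of D2
    return np.abs(m1 - m2)


for n in (100, 10_000, 1_000_000):
    crit = np.quantile(mean_diff(n, 0.50, 0.50), 0.95)  # 5%-level cutoff under H0
    power = (mean_diff(n, 0.50, 0.49) > crit).mean()    # P(reject H0 | H1 true)
    print(f"n = {n:>9,}  critical value = {crit:.5f}  power = {power:.2f}")
```

At n = 100 the power barely exceeds the 5% error rate, while by n = 1,000,000 it is essentially 1: exactly the consistency behavior described above.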
1.4 Test Statistic Methods

Many real-world data sets have more than one attribute, so besides the properties listed in Section 1.3, a test statistic method must also be multivariate. Listed below are some qualifying methods (illustrative sketches of the first two follow the list):

1. Friedman's MST method [8]. As an extension of the univariate Wald-Wolfowitz test [17], Friedman's MST method finds the number of homogeneous regions in a mixture of two types of multivariate points. First, a complete graph is built from the distances δij between every pair of points i and j. Then a minimal spanning tree (MST) is built from the complete graph, and all edges that connect points from different groups are removed. Since the graph is an MST, each edge removal increases the number of isolated subgraphs by one. After all removals, a set of homogeneous, isolated subgraphs remains, and the number of these subgraphs is used as the test statistic.

2. The k-nearest neighbors [10]. In a multi-dimensional space, once the distance δij between every pair of points i and j is calculated (under any metric, such as the Euclidean distance), the k nearest neighbors of each point can be determined. Each point pi, together with its k nearest neighbors, forms a sphere centered at pi. The number of homogeneous spheres is then used as the test statistic.

3. Cross-matching [14]. First, the Kullback-Leibler divergence [16] is calculated between every pair of points drawn from the two types. Supposing the total number of points of both types is N, a minimum-distance non-bipartite matching is performed, generating N/2 pairs (if N is odd, a dummy point d with δid = 0 for i ∈ {1, 2, ..., N} is added, and the pair containing the dummy point is discarded). The test statistic is the number of homogeneous pairs among these N/2 pairs.
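Below is a compact sketch (ours, not from the thesis; the function names friedman_mst_stat and knn_stat are assumptions) of the first two statistics, assuming Euclidean distances and standard numpy/scipy routines. The cross-matching statistic is omitted because it additionally requires a minimum-weight perfect-matching solver.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import distance_matrix


def friedman_mst_stat(x1, x2):
    """Friedman's MST statistic: the number of homogeneous subgraphs left
    after removing every MST edge that joins points from different samples."""
    pts = np.vstack([x1, x2])
    labels = np.r_[np.zeros(len(x1)), np.ones(len(x2))]
    mst = minimum_spanning_tree(distance_matrix(pts, pts)).tocoo()
    cross = labels[mst.row] != labels[mst.col]   # edges spanning both samples
    # An MST on N points has N - 1 edges; each removal adds one subgraph.
    return 1 + int(cross.sum())


def knn_stat(x1, x2, k=3):
    """k-NN statistic: the number of points whose k nearest neighbors
    all come from the same sample as the point itself."""
    pts = np.vstack([x1, x2])
    labels = np.r_[np.zeros(len(x1)), np.ones(len(x2))]
    d = distance_matrix(pts, pts)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]          # indices of the k closest points
    homogeneous = (labels[nbrs] == labels[:, None]).all(axis=1)
    return int(homogeneous.sum())


rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, size=(200, 3))          # sample from F1
b = rng.normal(0.5, 1.0, size=(200, 3))          # sample from F2 (shifted)
print(friedman_mst_stat(a, b), knn_stat(a, b))
```

Note that both statistics depend only on pairwise distances between points, which is what makes them distribution-free in the sense of Section 1.3.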