UNIVERSITY of CALIFORNIA SAN DIEGO Efficient Learning In

Total Page:16

File Type:pdf, Size:1020Kb

UNIVERSITY of CALIFORNIA SAN DIEGO Efficient Learning In UNIVERSITY OF CALIFORNIA SAN DIEGO Efficient Learning in Heterogeneous Internet of Things Ecosystems A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science (Computer Engineering) by Yeseong Kim Committee in charge: Professor Tajana Simunic Rosing, Chair Professor Chung-Kuan Cheng Professor Ryan Kastner Professor Farinaz Koushanfar Professor Dean Tullsen 2020 Copyright Yeseong Kim, 2020 All rights reserved. The dissertation of Yeseong Kim is approved, and it is ac- ceptable in quality and form for publication on microfilm and electronically: Chair University of California San Diego 2020 iii DEDICATION To my wife, Jeeyoon, my dauther, Olivia Daon, and my family, Youngju, Haeja, Arie, and Evie. iv EPIGRAPH O give thanks unto the Lord; for he is good: because his mercy endureth for ever. — Psalm 118:1 v TABLE OF CONTENTS Signature Page . iii Dedication . iv Epigraph . .v Table of Contents . vi List of Figures . ix List of Tables . xi Acknowledgements . xii Vita ............................................. xiv Abstract of the Dissertation . xviii Chapter 1 Introduction . .1 1.1 Cross-Platform Energy Prediction and Task Allocation . .4 1.2 Efficient Learning Based on HD Computing . .5 1.3 Collaborative Learning with HD Computing . .6 1.4 Beyond Classical Learning: DNA Pattern Matching using HD Com- puting . .7 Chapter 2 Intelligent Cross-Platform Task Characterization and Allocation . .9 2.1 Introduction . 10 2.2 Related Work . 12 2.3 Overview of P4 ............................ 14 2.4 Automated System Modeling . 15 2.4.1 Full-System Power Modeling . 15 2.4.2 Cross-Platform Application Phase Recognition . 19 2.5 Cross-Platform Prediction . 21 2.5.1 Phase-Based Training Data Generation . 22 2.5.2 Cross-Platform Prediction Model Training . 24 2.5.3 Online Prediction . 26 2.6 Cross-Platform Management for ML tasks . 28 2.6.1 Cross-Platform Management Framework . 28 2.6.2 Application Task Extraction . 30 2.6.3 Task Allocation Case Study . 31 2.7 Experimental Setup . 32 2.7.1 Benchmarks . 33 vi 2.7.2 Model Training Parameters . 34 2.7.3 Overhead . 35 2.8 Evaluation of P4 Models . 36 2.8.1 Full-System Power Estimation . 36 2.8.2 Cross-Platform Prediction . 41 2.9 Evaluation of Model-Based ML Task Allocation . 43 2.9.1 Energy Use Optimization . 44 2.9.2 Energy Cost Reduction . 45 2.10 Conclusion . 46 Chapter 3 Hyperdimensional Computing for Efficient Learning in IoT Systems . 48 3.1 Introduction . 49 3.2 Related Work . 51 3.3 HD Computing Primitives . 52 3.4 HD-Based Classification . 54 3.4.1 Design Overview . 54 3.4.2 Sensor Data Encoding . 55 3.4.3 Model Training . 56 3.4.4 Model-Based Inference . 58 3.5 Evaluation . 59 3.5.1 Experimental Setup . 59 3.5.2 Classification Accuracy . 61 3.5.3 Efficiency Comparison . 61 3.6 Conclusion . 63 Chapter 4 Collaborative Learning with Hyperdimensional Computing . 64 4.1 Introduction . 65 4.2 Motivational Scenario . 68 4.3 Related Work . 69 4.4 Secure Learning in HD Space . 71 4.4.1 Security Model . 71 4.4.2 Proposed Framework . 71 4.4.3 Secure Key Generation and Distribution . 72 4.5 SecureHD Encoding and Decoding . 74 4.5.1 Encoding in HD Space . 75 4.5.2 Decoding in HD Space . 78 4.6 Collaborative Learning in HD Space . 82 4.6.1 Hierarchical Learning Approach . 82 4.6.2 HD Model-Based Inference . 85 4.7 Evaluation . 85 4.7.1 Experimental Setup . 85 4.7.2 Encoding and Decoding Performance . 86 4.7.3 Evaluation of SecureHD Learning . 87 vii 4.7.4 Data Recovery Trade-offs . 89 4.7.5 Metadata Recovery Trade-offs . 92 4.8 Conclusion . 93 Chapter 5 HD Computing Beyond Classical Learning: DNA Pattern Matching . 94 5.1 Introduction . 95 5.2 Related Work . 96 5.3 GenieHD Overview . 97 5.4 DNA Pattern Matching Using HD Computing . 98 5.4.1 DNA Sequence Encoding . 99 5.4.2 Pattern Matching . 102 5.5 Hardware Acceleration Design . 105 5.5.1 Acceleration Architecture . 105 5.5.2 Implementation on Parallel Computing Platforms . 107 5.6 Evaluation . 108 5.6.1 Experimental Setup . 108 5.6.2 Efficiency Comparison . 109 5.6.3 Pattern Matching for Multiple Queries . 110 5.6.4 Dimensionality Exploitation . 112 5.7 Conclusion . 113 Chapter 6 Summary and Future Work . 114 6.1 Thesis Summary . 115 6.2 Future Directions . 117 6.2.1 Efficient Cognitive Processing with HD Computing . 117 6.2.2 Software Infrastructure for HD Computing . 118 Bibliography . 119 viii LIST OF FIGURES Figure 1.1: Computing Nodes on Heterogeneous IoT Systems . .2 Figure 2.1: An overview of P4 framework . 14 Figure 2.2: Power and performance with PMC events for Linpack benchmark on Intel SR1560SF server at maximum frequency . 20 Figure 2.3: Application phases for multi-threaded bzip2 benchmark independently iden- tified for Intel SR1560SF and Sun X4270 servers at the maximum frequency 21 Figure 2.4: Cross-platform phase matching (Intel SR1560SF to SUN X4270, splash2x.lu cb) 22 Figure 2.5: Identified phases of four benchmarks running for 60 seconds (IntelH, IntelM and IntelL: Intel server running at highest, medium, and lowest frequency settings. SunH, DellH, and A15H: Sun server, Dell server, and ARM Cortex- 15 processor running at highest frequency.) . 23 Figure 2.6: Cumulative distribution of instructions for two clusters . 24 Figure 2.7: Feed-forward neural networks for online prediction . 27 Figure 2.8: Overview of model-driven management on Spark environment . 28 Figure 2.9: Task group identification for two Spark applications . 31 Figure 2.10: Cumulative distribution of execution times for two task groups . 31 Figure 2.11: Cross-platform NN model configurations . 35 Figure 2.12: Overhead of model-driven management . 36 Figure 2.13: Processor power estimation errors (Intel SR1560SF) . 37 Figure 2.14: Power estimation error of subcomponents (Intel SR1560SF) . 38 Figure 2.15: Runtime subcomponent power estimation (Intel SR1560SF) . 38 Figure 2.16: Average error of single-machine supply power estimation. ARM A15 and A7 represents respectively either Cortex A15 or A7 processor . 40 Figure 2.17: Summary of time-variant power prediction accuracy . 40 Figure 2.18: Time-variant power level prediction for four heterogeneous platform combi- nations . 41 Figure 2.19: Cross-platform energy prediction accuracy. The error for each case shown in (a) is the average error cross-validated for all benchmark applications. 43 Figure 2.20: Summary of energy use optimization . 44 Figure 2.21: Energy breakdown comparison between Spark default and model-driven policy 45 Figure 2.22: Summary of cluster-level energy cost reduction . 46 Figure 2.23: Energy breakdown over different price ratios between clusters . 46 Figure 3.1: Overview of HD-Based Classification (Example: Human Activity Recognition) 54 Figure 3.2: Encoding of Sensor Measurements . 57 Figure 3.3: Accuracy Comparison for Different Modeling Methods . 60 Figure 3.4: Efficiency Comparison for Training and Inference . 61 Figure 4.1: Motivational scenario . 68 ix Figure 4.2: Execution time of homomorphic encryption and decryption over MNIST dataset . 69 Figure 4.3: Overview of SecureHD . 70 Figure 4.4: MPC-based key generation . 72 Figure 4.5: Illustration of SecureHD encoding and decoding procedures . 73 Figure 4.6: Value extraction example . 76 Figure 4.7: Iterative error correction procedure . 79 Figure 4.8: Relationship between the number of metavector.
Recommended publications
  • ARTIFICIAL INTELLIGENCE and ALGORITHMIC LIABILITY a Technology and Risk Engineering Perspective from Zurich Insurance Group and Microsoft Corp
    White Paper ARTIFICIAL INTELLIGENCE AND ALGORITHMIC LIABILITY A technology and risk engineering perspective from Zurich Insurance Group and Microsoft Corp. July 2021 TABLE OF CONTENTS 1. Executive summary 03 This paper introduces the growing notion of AI algorithmic 2. Introduction 05 risk, explores the drivers and A. What is algorithmic risk and why is it so complex? ‘Because the computer says so!’ 05 B. Microsoft and Zurich: Market-leading Cyber Security and Risk Expertise 06 implications of algorithmic liability, and provides practical 3. Algorithmic risk : Intended or not, AI can foster discrimination 07 guidance as to the successful A. Creating bias through intended negative externalities 07 B. Bias as a result of unintended negative externalities 07 mitigation of such risk to enable the ethical and responsible use of 4. Data and design flaws as key triggers of algorithmic liability 08 AI. A. Model input phase 08 B. Model design and development phase 09 C. Model operation and output phase 10 Authors: D. Potential complications can cloud liability, make remedies difficult 11 Zurich Insurance Group 5. How to determine algorithmic liability? 13 Elisabeth Bechtold A. General legal approaches: Caution in a fast-changing field 13 Rui Manuel Melo Da Silva Ferreira B. Challenges and limitations of existing legal approaches 14 C. AI-specific best practice standards and emerging regulation 15 D. New approaches to tackle algorithmic liability risk? 17 Microsoft Corp. Rachel Azafrani 6. Principles and tools to manage algorithmic liability risk 18 A. Tools and methodologies for responsible AI and data usage 18 Christian Bucher B. Governance and principles for responsible AI and data usage 20 Franziska-Juliette Klebôn C.
    [Show full text]
  • Text Mining Course for KNIME Analytics Platform
    Text Mining Course for KNIME Analytics Platform KNIME AG Copyright © 2018 KNIME AG Table of Contents 1. The Open Analytics Platform 2. The Text Processing Extension 3. Importing Text 4. Enrichment 5. Preprocessing 6. Transformation 7. Classification 8. Visualization 9. Clustering 10. Supplementary Workflows Licensed under a Creative Commons Attribution- ® Copyright © 2018 KNIME AG 2 Noncommercial-Share Alike license 1 https://creativecommons.org/licenses/by-nc-sa/4.0/ Overview KNIME Analytics Platform Licensed under a Creative Commons Attribution- ® Copyright © 2018 KNIME AG 3 Noncommercial-Share Alike license 1 https://creativecommons.org/licenses/by-nc-sa/4.0/ What is KNIME Analytics Platform? • A tool for data analysis, manipulation, visualization, and reporting • Based on the graphical programming paradigm • Provides a diverse array of extensions: • Text Mining • Network Mining • Cheminformatics • Many integrations, such as Java, R, Python, Weka, H2O, etc. Licensed under a Creative Commons Attribution- ® Copyright © 2018 KNIME AG 4 Noncommercial-Share Alike license 2 https://creativecommons.org/licenses/by-nc-sa/4.0/ Visual KNIME Workflows NODES perform tasks on data Not Configured Configured Outputs Inputs Executed Status Error Nodes are combined to create WORKFLOWS Licensed under a Creative Commons Attribution- ® Copyright © 2018 KNIME AG 5 Noncommercial-Share Alike license 3 https://creativecommons.org/licenses/by-nc-sa/4.0/ Data Access • Databases • MySQL, MS SQL Server, PostgreSQL • any JDBC (Oracle, DB2, …) • Files • CSV, txt
    [Show full text]
  • A Survey of Clustering with Deep Learning: from the Perspective of Network Architecture
    Received May 12, 2018, accepted July 2, 2018, date of publication July 17, 2018, date of current version August 7, 2018. Digital Object Identifier 10.1109/ACCESS.2018.2855437 A Survey of Clustering With Deep Learning: From the Perspective of Network Architecture ERXUE MIN , XIFENG GUO, QIANG LIU , (Member, IEEE), GEN ZHANG , JIANJING CUI, AND JUN LONG College of Computer, National University of Defense Technology, Changsha 410073, China Corresponding authors: Erxue Min ([email protected]) and Qiang Liu ([email protected]) This work was supported by the National Natural Science Foundation of China under Grant 60970034, Grant 61105050, Grant 61702539, and Grant 6097003. ABSTRACT Clustering is a fundamental problem in many data-driven application domains, and clustering performance highly depends on the quality of data representation. Hence, linear or non-linear feature transformations have been extensively used to learn a better data representation for clustering. In recent years, a lot of works focused on using deep neural networks to learn a clustering-friendly representation, resulting in a significant increase of clustering performance. In this paper, we give a systematic survey of clustering with deep learning in views of architecture. Specifically, we first introduce the preliminary knowledge for better understanding of this field. Then, a taxonomy of clustering with deep learning is proposed and some representative methods are introduced. Finally, we propose some interesting future opportunities of clustering with deep learning and give some conclusion remarks. INDEX TERMS Clustering, deep learning, data representation, network architecture. I. INTRODUCTION property of highly non-linear transformation. For the sim- Data clustering is a basic problem in many areas, such as plicity of description, we call clustering methods with deep machine learning, pattern recognition, computer vision, data learning as deep clustering1 in this paper.
    [Show full text]
  • A Tree-Based Dictionary Learning Framework
    A Tree-based Dictionary Learning Framework Renato Budinich∗ & Gerlind Plonka† June 11, 2020 We propose a new outline for adaptive dictionary learning methods for sparse encoding based on a hierarchical clustering of the training data. Through recursive application of a clustering method the data is organized into a binary partition tree representing a multiscale structure. The dictionary atoms are defined adaptively based on the data clusters in the partition tree. This approach can be interpreted as a generalization of a discrete Haar wavelet transform. Furthermore, any prior knowledge on the wanted structure of the dictionary elements can be simply incorporated. The computational complex- ity of our proposed algorithm depends on the employed clustering method and on the chosen similarity measure between data points. Thanks to the multi- scale properties of the partition tree, our dictionary is structured: when using Orthogonal Matching Pursuit to reconstruct patches from a natural image, dic- tionary atoms corresponding to nodes being closer to the root node in the tree have a tendency to be used with greater coefficients. Keywords. Multiscale dictionary learning, hierarchical clustering, binary partition tree, gen- eralized adaptive Haar wavelet transform, K-means, orthogonal matching pursuit 1 Introduction arXiv:1909.03267v2 [cs.LG] 9 Jun 2020 In many applications one is interested in sparsely approximating a set of N n-dimensional data points Y , columns of an n N real matrix Y = (Y1;:::;Y ). Assuming that the data j × N ∗R. Budinich is with the Fraunhofer SCS, Nordostpark 93, 90411 Nürnberg, Germany, e-mail: re- [email protected] †G. Plonka is with the Institute for Numerical and Applied Mathematics, University of Göttingen, Lotzestr.
    [Show full text]
  • Practical Homomorphic Encryption and Cryptanalysis
    Practical Homomorphic Encryption and Cryptanalysis Dissertation zur Erlangung des Doktorgrades der Naturwissenschaften (Dr. rer. nat.) an der Fakult¨atf¨urMathematik der Ruhr-Universit¨atBochum vorgelegt von Dipl. Ing. Matthias Minihold unter der Betreuung von Prof. Dr. Alexander May Bochum April 2019 First reviewer: Prof. Dr. Alexander May Second reviewer: Prof. Dr. Gregor Leander Date of oral examination (Defense): 3rd May 2019 Author's declaration The work presented in this thesis is the result of original research carried out by the candidate, partly in collaboration with others, whilst enrolled in and carried out in accordance with the requirements of the Department of Mathematics at Ruhr-University Bochum as a candidate for the degree of doctor rerum naturalium (Dr. rer. nat.). Except where indicated by reference in the text, the work is the candidates own work and has not been submitted for any other degree or award in any other university or educational establishment. Views expressed in this dissertation are those of the author. Place, Date Signature Chapter 1 Abstract My thesis on Practical Homomorphic Encryption and Cryptanalysis, is dedicated to efficient homomor- phic constructions, underlying primitives, and their practical security vetted by cryptanalytic methods. The wide-spread RSA cryptosystem serves as an early (partially) homomorphic example of a public- key encryption scheme, whose security reduction leads to problems believed to be have lower solution- complexity on average than nowadays fully homomorphic encryption schemes are based on. The reader goes on a journey towards designing a practical fully homomorphic encryption scheme, and one exemplary application of growing importance: privacy-preserving use of machine learning.
    [Show full text]
  • Comparison of Dimensionality Reduction Techniques on Audio Signals
    Comparison of Dimensionality Reduction Techniques on Audio Signals Tamás Pál, Dániel T. Várkonyi Eötvös Loránd University, Faculty of Informatics, Department of Data Science and Engineering, Telekom Innovation Laboratories, Budapest, Hungary {evwolcheim, varkonyid}@inf.elte.hu WWW home page: http://t-labs.elte.hu Abstract: Analysis of audio signals is widely used and this work: car horn, dog bark, engine idling, gun shot, and very effective technique in several domains like health- street music [5]. care, transportation, and agriculture. In a general process Related work is presented in Section 2, basic mathe- the output of the feature extraction method results in huge matical notation used is described in Section 3, while the number of relevant features which may be difficult to pro- different methods of the pipeline are briefly presented in cess. The number of features heavily correlates with the Section 4. Section 5 contains data about the evaluation complexity of the following machine learning method. Di- methodology, Section 6 presents the results and conclu- mensionality reduction methods have been used success- sions are formulated in Section 7. fully in recent times in machine learning to reduce com- The goal of this paper is to find a combination of feature plexity and memory usage and improve speed of following extraction and dimensionality reduction methods which ML algorithms. This paper attempts to compare the state can be most efficiently applied to audio data visualization of the art dimensionality reduction techniques as a build- in 2D and preserve inter-class relations the most. ing block of the general process and analyze the usability of these methods in visualizing large audio datasets.
    [Show full text]
  • Information Guide
    INFORMATION GUIDE 7 ALL-PRO 7 NFL MVP LAMAR JACKSON 2018 - 1ST ROUND (32ND PICK) RONNIE STANLEY 2016 - 1ST ROUND (6TH PICK) 2020 BALTIMORE DRAFT PICKS FIRST 28TH SECOND 55TH (VIA ATL.) SECOND 60TH THIRD 92ND THIRD 106TH (COMP) FOURTH 129TH (VIA NE) FOURTH 143RD (COMP) 7 ALL-PRO MARLON HUMPHREY FIFTH 170TH (VIA MIN.) SEVENTH 225TH (VIA NYJ) 2017 - 1ST ROUND (16TH PICK) 2020 RAVENS DRAFT GUIDE “[The Draft] is the lifeblood of this Ozzie Newsome organization, and we take it very Executive Vice President seriously. We try to make it a science, 25th Season w/ Ravens we really do. But in the end, it’s probably more of an art than a science. There’s a lot of nuance involved. It’s Joe Hortiz a big-picture thing. It’s a lot of bits and Director of Player Personnel pieces of information. It’s gut instinct. 23rd Season w/ Ravens It’s experience, which I think is really, really important.” Eric DeCosta George Kokinis Executive VP & General Manager Director of Player Personnel 25th Season w/ Ravens, 2nd as EVP/GM 24th Season w/ Ravens Pat Moriarty Brandon Berning Bobby Vega “Q” Attenoukon Sarah Mallepalle Sr. VP of Football Operations MW/SW Area Scout East Area Scout Player Personnel Assistant Player Personnel Analyst Vincent Newsome David Blackburn Kevin Weidl Patrick McDonough Derrick Yam Sr. Player Personnel Exec. West Area Scout SE/SW Area Scout Player Personnel Assistant Quantitative Analyst Nick Matteo Joey Cleary Corey Frazier Chas Stallard Director of Football Admin. Northeast Area Scout Pro Scout Player Personnel Assistant David McDonald Dwaune Jones Patrick Williams Jenn Werner Dir.
    [Show full text]
  • Robust Hierarchical Clustering∗
    Journal of Machine Learning Research 15 (2014) 4011-4051 Submitted 12/13; Published 12/14 Robust Hierarchical Clustering∗ Maria-Florina Balcan [email protected] School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213, USA Yingyu Liang [email protected] Department of Computer Science Princeton University Princeton, NJ 08540, USA Pramod Gupta [email protected] Google, Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043, USA Editor: Sanjoy Dasgupta Abstract One of the most widely used techniques for data clustering is agglomerative clustering. Such algorithms have been long used across many different fields ranging from computational biology to social sciences to computer vision in part because their output is easy to interpret. Unfortunately, it is well known, however, that many of the classic agglomerative clustering algorithms are not robust to noise. In this paper we propose and analyze a new robust algorithm for bottom-up agglomerative clustering. We show that our algorithm can be used to cluster accurately in cases where the data satisfies a number of natural properties and where the traditional agglomerative algorithms fail. We also show how to adapt our algorithm to the inductive setting where our given data is only a small random sample of the entire data set. Experimental evaluations on synthetic and real world data sets show that our algorithm achieves better performance than other hierarchical algorithms in the presence of noise. Keywords: unsupervised learning, clustering, agglomerative algorithms, robustness 1. Introduction Many data mining and machine learning applications ranging from computer vision to biology problems have recently faced an explosion of data. As a consequence it has become increasingly important to develop effective, accurate, robust to noise, fast, and general clustering algorithms, accessible to developers and researchers in a diverse range of areas.
    [Show full text]
  • Space-Time Hierarchical Clustering for Identifying Clusters in Spatiotemporal Point Data
    International Journal of Geo-Information Article Space-Time Hierarchical Clustering for Identifying Clusters in Spatiotemporal Point Data David S. Lamb 1,* , Joni Downs 2 and Steven Reader 2 1 Measurement and Research, Department of Educational and Psychological Studies, College of Education, University of South Florida, 4202 E Fowler Ave, Tampa, FL 33620, USA 2 School of Geosciences, University of South Florida, 4202 E Fowler Ave, Tampa, FL 33620, USA; [email protected] (J.D.); [email protected] (S.R.) * Correspondence: [email protected] Received: 2 December 2019; Accepted: 27 January 2020; Published: 1 February 2020 Abstract: Finding clusters of events is an important task in many spatial analyses. Both confirmatory and exploratory methods exist to accomplish this. Traditional statistical techniques are viewed as confirmatory, or observational, in that researchers are confirming an a priori hypothesis. These methods often fail when applied to newer types of data like moving object data and big data. Moving object data incorporates at least three parts: location, time, and attributes. This paper proposes an improved space-time clustering approach that relies on agglomerative hierarchical clustering to identify groupings in movement data. The approach, i.e., space–time hierarchical clustering, incorporates location, time, and attribute information to identify the groups across a nested structure reflective of a hierarchical interpretation of scale. Simulations are used to understand the effects of different parameters, and to compare against existing clustering methodologies. The approach successfully improves on traditional approaches by allowing flexibility to understand both the spatial and temporal components when applied to data. The method is applied to animal tracking data to identify clusters, or hotspots, of activity within the animal’s home range.
    [Show full text]
  • Chapter G03 – Multivariate Methods
    g03 – Multivariate Methods Introduction – g03 Chapter g03 – Multivariate Methods 1. Scope of the Chapter This chapter is concerned with methods for studying multivariate data. A multivariate data set consists of several variables recorded on a number of objects or individuals. Multivariate methods can be classified as those that seek to examine the relationships between the variables (e.g., principal components), known as variable-directed methods, and those that seek to examine the relationships between the objects (e.g., cluster analysis), known as individual-directed methods. Multiple regression is not included in this chapter as it involves the relationship of a single variable, known as the response variable, to the other variables in the data set, the explanatory variables. Routines for multiple regression are provided in Chapter g02. 2. Background 2.1. Variable-directed Methods Let the n by p data matrix consist of p variables, x1,x2,...,xp,observedonn objects or individuals. Variable-directed methods seek to examine the linear relationships between the p variables with the aim of reducing the dimensionality of the problem. There are different methods depending on the structure of the problem. Principal component analysis and factor analysis examine the relationships between all the variables. If the individuals are classified into groups then canonical variate analysis examines the between group structure. If the variables can be considered as coming from two sets then canonical correlation analysis examines the relationships between the two sets of variables. All four methods are based on an eigenvalue decomposition or a singular value decomposition (SVD)of an appropriate matrix. The above methods may reduce the dimensionality of the data from the original p variables to a smaller number, k, of derived variables that adequately represent the data.
    [Show full text]
  • Homomorphic Encryption Library
    Ludwig-Maximilians-Universit¨at Munchen¨ Prof. Dr. D. Kranzlmuller,¨ Dr. N. gentschen Felde Data Science & Ethics { Microsoft SEAL { Exercise 1: Library for Matrix Operations Microsoft's Simple Encrypted Arithmetic Library (SEAL)1 is a publicly available homomorphic encryption library. It can be found and downloaded at http://sealcrypto.codeplex.com/. Implement a library 13b MS-SEAL.h supporting matrix operations using homomorphic encryption on basis of the MS SEAL. due date: 01.07.2018 (EOB) no. of students: 2 deliverables: 1. Implemenatation (including source code(s)) 2. Documentation (max. 10 pages) 3. Presentation (10 { max. 15 minutes) 13b MS-SEAL.h #i n c l u d e <f l o a t . h> #i n c l u d e <s t d b o o l . h> typedef struct f double ∗ e n t r i e s ; unsigned int width; unsigned int height; g matrix ; /∗ Initialize new matrix: − reserve memory only ∗/ matrix initMatrix(unsigned int width, unsigned int height); /∗ Initialize new matrix: − reserve memory − set any value to 0 ∗/ matrix initMatrixZero(unsigned int width, unsigned int height); /∗ Initialize new matrix: − reserve memory − set any value to random number ∗/ matrix initMatrixRand(unsigned int width, unsigned int height); 1https://www.microsoft.com/en-us/research/publication/simple-encrypted-arithmetic-library-seal-v2-0/# 1 /∗ copy a matrix and return its copy ∗/ matrix copyMatrix(matrix toCopy); /∗ destroy matrix − f r e e memory − set any remaining value to NULL ∗/ void freeMatrix(matrix toDestroy); /∗ return entry at position (xPos, yPos), DBL MAX in case of error
    [Show full text]
  • A Study of Hierarchical Clustering Algorithm
    International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 11 (2013), pp. 1225-1232 © International Research Publications House http://www. irphouse.com /ijict.htm A Study of Hierarchical Clustering Algorithm Yogita Rani¹ and Dr. Harish Rohil2 1Reseach Scholar, Department of Computer Science & Application, CDLU, Sirsa-125055, India. 2Assistant Professor, Department of Computer Science & Application, CDLU, Sirsa-125055, India. Abstract Clustering is the process of grouping the data into classes or clusters, so that objects within a cluster have high similarity in comparison to one another but these objects are very dissimilar to the objects that are in other clusters. Clustering methods are mainly divided into two groups: hierarchical and partitioning methods. Hierarchical clustering combine data objects into clusters, those clusters into larger clusters, and so forth, creating a hierarchy of clusters. In partitioning clustering methods various partitions are constructed and then evaluations of these partitions are performed by some criterion. This paper presents detailed discussion on some improved hierarchical clustering algorithms. In addition to this, author have given some criteria on the basis of which one can also determine the best among these mentioned algorithms. Keywords: Hierarchical clustering; BIRCH; CURE; clusters ;data mining. 1. Introduction Data mining allows us to extract knowledge from our historical data and predict outcomes of our future situations. Clustering is an important data mining task. It can be described as the process of organizing objects into groups whose members are similar in some way. Clustering can also be define as the process of grouping the data into classes or clusters, so that objects within a cluster have high similarity in comparison to one another but are very dissimilar to objects in other clusters.
    [Show full text]