CROSS-MODAL PREDICTION USING CANONICAL CORRELATION ANALYSIS WITH PRIVACY PRESERVATION

by

Hoa T. Nguyen

A thesis presented to the Department of Computer Science and the Graduate School of the University of Central Arkansas in partial fulfillment of the requirements for the degree of

Master of Science in Computer Science

Conway, Arkansas

May 2021

TO THE OFFICE OF GRADUATE STUDIES:

The members of the Committee approve the thesis of Hoa T. Nguyen presented on

April 9, 2021.

Olcay Kursun, Ph.D., Committee Chairperson

Ahmad Alsharif, Ph.D., Committee Co-Chairperson

Bernard Chen, Ph.D.

Tansel Halic, Ph.D.

PERMISSION

Title Cross-Modal Prediction Using Canonical Correlation Analysis With Privacy Preservation

Department Computer Science

Degree Master of Science, Computer Science

In presenting this thesis/dissertation in partial fulfillment of the requirements for a graduate degree from the University of Central Arkansas, I agree that the Library of this

University shall make it freely available for inspection. I further agree that permission for extensive copying for scholarly purposes may be granted by the professor who supervised my thesis/dissertation work, or, in the professor’s absence, by the Chair of the

Department or the Dean of the Graduate School. It is understood that due recognition shall be given to me and to the University of Central Arkansas in any scholarly use which may be made of any material in my thesis/dissertation.

Hoa T. Nguyen

April 16, 2021

© 2021 Hoa T. Nguyen

ACKNOWLEDGEMENT

I wish to express my appreciation to my thesis advisor, Dr. Olcay Kursun, professor of the Department of Computer Science at the University of Central Arkansas.

He was always supportive and encouraging during our time working on the thesis. He taught me much about the methods and materials needed for the thesis study. Dr. Kursun also provided me with some of the deep learning programs he developed under the DART project. This work was supported, in part, by the National

Science Foundation under Award No. OIA-1946391.

I would also like to extend my gratitude to the faculty members of the Department of Computer Science at the University of Central Arkansas. With their guidance, I was able to complete my undergraduate and graduate studies. The lessons that I learned from them have enabled me to discover, continuously pursue, and turn my passion for

Computer Science into a career after graduation.

Finally, I would like to thank my family and friends for being by my side and supporting me through the years of my studies. They have been most encouraging and always had faith in the academic and career path that I have chosen. Everything that I have achieved would not have been possible without them.

VITA

Hoa T. Nguyen studied towards her Bachelor’s degree first at Arkansas State University

– Beebe and then at the University of Central Arkansas (UCA), Conway, AR. In May

2020, she graduated from UCA, Magna Cum Laude, with a Bachelor of Science in two majors, Mathematics and Computer Science. She was enrolled in the 4+1 program at

UCA and managed to complete her Master’s studies with the thesis option in May 2021.

ABSTRACT

Canonical Correlation Analysis (CCA) is a multi-view feature extraction method that aims at finding correlated features (similarities) across multiple datasets (also called views or modalities). CCA characterizes these similarities by learning linear transformations of each view such that their extracted features have a maximal mutual correlation. As CCA is a linear method, the features are computed by a weighted sum of each view's variables. With the learned weights, CCA can be applied to test examples and serve in cross-modal prediction by inferring the target-view variables of an example from its given variables in a source (query) view. Being a linear method, CCA’s applicability to unstructured datasets in cross-modal prediction is limited. Although kernel extensions of CCA theoretically generalize it to learn nonlinear transformations, combining CCA with deep learning is the state-of-the-art method for mining cross-modal correlations among unstructured data, such as images, audio, and text/tags. Moreover, as the traditional cryptographic tools block the ability of CCA to explore the similarities among encrypted views, with multiple correlated datasets involving multiple organizations, privacy-preserving extensions of CCA have been recently proposed. This thesis proposes a CCA-based method for cross-modal prediction on two datasets: Multi-View Digits, a structured dataset used as a proof-of-concept, and CIFAR-100, an unstructured dataset used to demonstrate the mining of image-tag correlations. The proposed method is designed to take advantage of deep learning solutions for feature extraction to deal with unstructured data. This thesis also describes a procedure that incorporates the solutions to

Yao’s Millionaires’ problem, which is studied in the cybersecurity field to support privacy preservation, into the proposed CCA-based cross-modal prediction method.

TABLE OF CONTENTS

ACKNOWLEDGEMENT ...... v

VITA ...... vi

ABSTRACT ...... vii

LIST OF TABLES ...... ix

LIST OF FIGURES ...... x

LIST OF SYMBOLS AND/OR ABBREVIATIONS ...... xi

CHAPTER 1 INTRODUCTION ...... 1

CHAPTER 2 MATERIALS & METHODS ...... 4

2.1 CCA For Cross-Modal Prediction ...... 4

2.2 Multi-View Datasets ...... 6

2.3 Convolutional Neural Networks ...... 10

CHAPTER 3 CCA-BASED CROSS-MODAL PREDICTION PROCEDURE ...... 14

3.1 Application of CCA & Nearest Neighbors Model ...... 15

3.2 Classification Accuracy Evaluation Method ...... 17

3.3 Privacy Preservation Based On Millionaires' Problem & Alternating

Regression ...... 18

CHAPTER 4 EXPERIMENTAL RESULTS ...... 21

4.1 Multi-View Digits Dataset ...... 21

4.2 CIFAR-100 Dataset ...... 28

CHAPTER 5 CONCLUSION AND FUTURE WORK ...... 34

REFERENCES ...... 36

LIST OF TABLES

Table 2.2.1. Five exemplary superclasses and their corresponding classes in the CIFAR-100 dataset ...... 8

Table 3.1.1. Steps to train a CCA model and obtain the canonical space for two views ..14

Table 3.3.1. Proposed CCA-privacy-preserving algorithm in cross-modal prediction .....18

Table 4.1.1. Test errors computed for Multi-View Digits modality pairs when performing nearest neighbors search (a) within the canonical space and (b) after reconstructing the space ...... 22

Table 4.2.1. Test errors produced by the (a) proposed and (b) pseudoinverse methods of cross-modal prediction, using tag representations and image features extracted by AlexNet, ResNet, and VGG ...... 30

LIST OF FIGURES

Figure 2.1.1. A general schematic for CCA learning of two-view data ...... 4

Figure 2.2.1. Some exemplary images of the CIFAR-100 dataset ...... 8

Figure 2.3.1. An example of CNN architecture ...... 11

Figure 3.1.1. Proposed cross-modal prediction process with CCA and evaluation method using within-/cross-modal intersection error, cross-modal classification error, and cross-modal-top-3 classification error...... 15

Figure 4.1.1. Test errors for 30 query/target view pairs of the Multi-View Digits dataset using the proposed and pseudoinverse cross-modal prediction methods...... 25

Figure 4.1.2. Test errors produced by using different correlation thresholds when applying the proposed and CCA-pseudoinverse cross-modal prediction methods with pixel averages as the query modality and Fourier coefficients as the target modality ...... 27

Figure 4.2.1. The number of CCA components versus the test errors on the CIFAR dataset (image features extracted by ResNet in the query modality) ...... 33

LIST OF SYMBOLS AND/OR ABBREVIATIONS

Symbol   Meaning                                                                          Page
X        p-dimensional dataset                                                            4
Y        q-dimensional dataset                                                            4
W_X      Canonical weight matrix of dataset X                                             4
W_Y      Canonical weight matrix of dataset Y                                             4
w_X      A single weight vector, a column of the canonical weight matrix of dataset X     5
w_Y      A single weight vector, a column of the canonical weight matrix of dataset Y     5
z_X      Projection scores/canonical variates of a CCA component for dataset X            5
z_Y      Projection scores/canonical variates of a CCA component for dataset Y            5
K        Number of CCA components to extract                                              5
ρ_k      Correlation of the k-th CCA component pair                                       5
W_Y^+    The pseudoinverse of the canonical weights of dataset Y                          5

Abbreviation   Meaning                                       Page
CCA            Canonical Correlation Analysis                2
fou            Fourier coefficients of digit shapes          7
fac            Profile correlations                          7
kar            Karhunen-Loève coefficients                   7
pix            Pixel averages in 2 x 3 windows               7
zer            Zernike moments                               7
mor            Morphological features                        7
CNNs           Convolutional Neural Networks                 10
AlexNet        AlexNet pre-trained deep learning model       12
ResNet         ResNet101 pre-trained deep learning model     12
VGG            VGG11_bn pre-trained deep learning model      12
RGB            Red-Green-Blue                                12
AR             Alternating Regression for CCA training       17
PCA            Principal Component Analysis                  29

CHAPTER 1: INTRODUCTION

Cross-modal learning refers to the synthesis/prediction of information in one view/modality from the information in another. Based on emphasizing mutual information among modalities, cross-modal prediction has straightforward applications in machine learning and recommendation systems (Chen et al., 2017; Chen et al., 2010;

Hardoon et al., 2004; Sakar & Kursun, 2017; Zhou et al., 2020) as well as more fundamental/theoretical importance in deep learning at the intersection of AI and neuroscience (Becker & Hinton, 1992; Favorov & Ryder, 2004; Körding & König, 2000;

Kursun et al., 2021; Kursun & Favorov, 2019; Phillips & Singer, 1997). Cross-modal learning is an extremely crucial component in daily living. Humans are very good at integrating various sources of information, and the cortex is a powerful discoverer of regularities reflected in multiple modalities (Favorov & Ryder, 2004; Hawkins &

Blakeslee, 2004). Actions such as learning to grasp and manipulate objects, to speak and understand a language all require the integration of visual, auditory, tactile, and other modalities. In many real-world problems, where data is observed/measured in multiple modalities, cross-modal learning plays an important role in feature extraction and prediction. Throughout the thesis, the terms modality and view are used interchangeably:

A modality/view is a specific representation of a phenomenon.

In the context of cross-modal prediction, the thrust of machine learning is multi- view feature extraction. Cross-modal learning involves the extraction of descriptive and discriminative features from multiple modalities. These features can be used for subsequent learning (e.g. classification and clustering), better visualization, and interpretation. These features can be powerful as part of predictive models because they

can help eliminate view-specific noise (Sakar & Kursun, 2017). Moreover, as the instances that belong to the same class can reflect their class-specific features in all modalities, extraction of mutual information between the modalities helps feature extraction methods to tune to class-specific features. Simply merging the features of all views and then performing a single-view feature extraction is not a promising direction.

Although extracting/preserving complementary information from different sources/modalities is also important in machine learning, extracting a common space of features for all modalities/views allows one view’s samples to be predicted by projecting other views’ samples onto the space, which would be heavily related to the underlying sources of correlations and class-related features (Bilenko & Gallant, 2016; Hardoon et al., 2004; Kursun et al., 2011). Unlike the single-view feature extraction, the theory and application areas of multi-view feature extraction are more complex and less understood.

One approach to model the correlation between different modalities is Canonical

Correlation Analysis (CCA). CCA aims at maximizing the correlation between modalities (Gong et al., 2014), which makes it possible to represent different modalities in a common feature subspace that can be computed from either view. The original formulation of CCA handles two views and maximizes linear correlations between them.

Many novel CCA approaches have been introduced to tackle more than two views and complex nonlinear relationships, for example, multi-view CCA (Kettenring, 1971), tensor

CCA (Luo et al., 2015), kernelized CCA (Hardoon et al., 2004), discriminative CCA (Sakar &

Kursun, 2017), ensemble CCA (Sakar et al., 2014), deep neural networks based generalized CCA (Benton et al., 2017; Guo & Wu, 2019; Sun et al., 2008).

Although, as reviewed by Bilenko and Gallant (2016), the (pseudo) inverse of the

CCA’s transformation can be used for cross-modal prediction, in this thesis, a novel framework is proposed and tested for using the features learned by CCA to perform a search for highly similar examples in the target view. Two experimental datasets are used in this thesis: A multi-view digits dataset with modalities corresponding to different sets of features extracted from digit images; and a labeled image dataset, where one modality was image features extracted by a deep learning method, and the other was the set of noisy tags describing the class/superclass of the images. The proposed method is designed to achieve cross-modal recommendation without having to reconstruct the target view (as explained in Section 2.1). Taking advantage of the mutually correlated features that CCA learns to extract, the proposed cross-modal method performs its nearest neighbors search in the canonical subspace common to both the query and the target views. The thesis also looked into feature extraction with deep learning for preprocessing image features and touched on the issue of privacy preservation during the CCA training.

The thesis is organized as follows. The materials and methods used in the thesis are reviewed in Chapter 2. The proposed CCA-based method for cross-modal prediction is described in Chapter 3. The experimental results on some exemplary benchmark datasets are presented in Chapter 4. Finally, the conclusions and potential future work are discussed in Chapter 5.

CHAPTER 2: MATERIALS & METHODS

This chapter provides brief descriptions of the materials and methods used in the thesis: The explanation of the role CCA plays in the cross-modal prediction method, the definition of a multi-view dataset, an introduction of the experimental datasets, and an overview of the convolutional neural networks used for image feature extraction.

2.1 CCA For Cross-Modal Prediction

The standard approach of CCA is a linear method that requires two views as inputs, which are used to guide each other in the feature extraction process (Hotelling, 1992; Sakar & Kursun, 2017; Sakar et al., 2014; Yuan & Sun, 2013), as illustrated in Figure 2.1.1. The two input views have different sets of features and N examples, $X \in \mathbb{R}^{p \times N}$ and $Y \in \mathbb{R}^{q \times N}$. They can be of different dimensionality (View-1 can have p features, and View-2 can have q features). CCA aims at finding component pairs to link these views.

Figure 2.1.1. A general schematic for Canonical Correlation Analysis (CCA) learning of two-view data.

As CCA is linear, extracting maximally correlated features of the views can be expressed as finding K pairs of canonical weights $W_X \in \mathbb{R}^{p \times K}$ and $W_Y \in \mathbb{R}^{q \times K}$, such that when the two sets of feature vectors are projected onto the canonical space, their K component pairs, or canonical variates/projection scores, are maximally correlated. For example, take $w_X \in \mathbb{R}^{p \times 1}$ and $w_Y \in \mathbb{R}^{q \times 1}$ as the canonical weight vectors of View-1 and View-2, respectively. CCA maximizes the correlation between the canonical variates $z_X = w_X^T X$ and $z_Y = w_Y^T Y$, where the superscript T denotes transpose. The pair of canonical variates with the maximal correlation is the principal component, and if multiple components are sought (K > 1), within each view the canonical weight vectors must be orthogonal. Theoretically, the maximum number of components is equal to the minimum of the ranks of the two views; however, K components can be extracted such that the correlation coefficient $\rho_K$ is greater than a given correlation threshold T while $\rho_{K+1}$ is not, or such that $\rho_{K+1}$ is sufficiently low (refer to Eq. 1).

$$\rho_k = \max_{w_{X,k},\, w_{Y,k}} \operatorname{corr}\!\left(w_{X,k}^{T} X,\; w_{Y,k}^{T} Y\right), \quad \text{for } 1 \le k \le K,$$

$$\text{subject to } w_{X,k}^{T} w_{X,j} = w_{Y,k}^{T} w_{Y,j} = 0, \quad \text{for } 1 \le k, j \le K,\; k \ne j,$$

$$\rho_K \ge T > \rho_{K+1} \qquad (1)$$

After using CCA to transform two or more datasets, Bilenko and Gallant (2016) suggested that one dataset’s samples could be predicted as the dot product between the pseudoinverse of its canonical weights and the projection of the remaining dataset’s samples onto the canonical space. This thesis refers to this prediction method as the pseudoinverse method, because:

$$Y^{pred} = W_Y^{+} \cdot \left(W_X^{T} X\right) \qquad (2)$$

where $Y^{pred}$ is View-2’s feature vectors to be predicted; $W_Y^{+}$ is the pseudoinverse of the canonical weights of View-2; X is View-1’s query feature vectors; and $W_X$ is the canonical weights of View-1.

Bilenko and Gallant (2016) assessed the accuracy of this method by the total correlation coefficient of the predicted and the actual samples (sum of q correlation coefficients, as View-2 is q-dimensional):

$$\text{total correlation coefficient} = \sum_{i=1}^{q} \operatorname{corr}\!\left(Y_i^{pred},\; Y_i^{actual}\right) \qquad (3)$$
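As a concrete illustration of Eq. 2 and Eq. 3, the sketch below implements the pseudoinverse prediction with scikit-learn's CCA and NumPy's pseudoinverse; the function names, the choice of library, and the simplified handling of centering are assumptions for illustration, not the exact code of Bilenko and Gallant (2016).

import numpy as np
from sklearn.cross_decomposition import CCA

# X_train: N x p view-1 data, Y_train: N x q view-2 data (rows are examples).
def pseudoinverse_predict(X_train, Y_train, X_test, n_components=5):
    cca = CCA(n_components=n_components)
    cca.fit(X_train, Y_train)

    # Project the query view onto the canonical space (N_test x K scores).
    Z_x = cca.transform(X_test)

    # y_weights_ holds View-2's canonical weights (q x K); its pseudoinverse
    # maps canonical scores back to the original q-dimensional target space.
    W_y_pinv = np.linalg.pinv(cca.y_weights_)      # K x q
    Y_pred = Z_x @ W_y_pinv                        # N_test x q (Eq. 2, up to centering)
    return Y_pred

def total_correlation(Y_pred, Y_actual):
    # Eq. 3: sum of per-dimension correlations between predicted and actual Y.
    return sum(np.corrcoef(Y_pred[:, j], Y_actual[:, j])[0, 1]
               for j in range(Y_actual.shape[1]))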

2.2 Multi-View Datasets

A dataset is commonly defined as a tabular collection of data used for an analysis question. In real-world applications, however, it is not possible to include every aspect of interest in a single table simply because of the vast number of features that might be involved.

Instead, there can be multiple datasets that provide diverse and complementary information from distinct perspectives of the same phenomenon. A collection of such individual datasets makes a multi-view dataset. Depending on the subject of analysis, publicly available datasets may come with an intentional separation of views, while others only present one table of data, which can then be split into views based on the variables and data information given.

The multi-view datasets used in this thesis were Multi-View Digits and CIFAR-

100. These choices were motivated by two considerations. First, both datasets are publicly available. Second, they include rich data: the former has six different views, and the latter is a well-known dataset for deep learning image classification.

Multi-View Digits dataset: Originally named Multiple Feature in the UC Irvine

Machine Learning Repository, this dataset is referred to as Multi-View Digits here to better represent the information that it provides. The dataset includes features of handwritten numerals (0-9), which are extracted from a collection of Dutch utility maps and digitized into binary images. For each digit, 200 patterns are available, totaling 2,000 patterns. The digits are represented by six feature sets: 76 Fourier coefficients of the character shapes (fou), 216 profile correlations (fac), 64 Karhunen-Loève coefficients (kar), 240 pixel averages in 2 x 3 windows (pix), 47 Zernike moments (zer), and 6 morphological features (mor) (Dua & Graff, 2019).

The Multi-View Digits dataset was used for the demonstration of querying one modality to acquire similar examples from another. Among the feature sets, some display a much higher correlation than others. For example, the relationship between Karhunen-Loève coefficients and the pixel averages is prominent because both of them are linear weighted sums of the pixels and thus are easily convertible to each other (the former keeps the leading components and the latter simply takes averages of local groups of pixels). Therefore, CCA could easily find linear transformations of these views that are highly correlated. For this thesis, trials were performed for all 15 pairwise combinations of the feature sets.

CIFAR-100 dataset (Krizhevsky & Hinton, 2009): This dataset contains labeled images collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton (Figure 2.2.1). The images are categorized into 100 classes with 600 images each, and the classes are further grouped into 20 superclasses (Table 2.2.1).


Figure 2.2.1. Some exemplary images of the CIFAR-100 dataset.

Superclass         Classes
aquatic mammals    beaver, dolphin, otter, seal, whale
fish               aquarium fish, flatfish, ray, shark, trout
flowers            orchids, poppies, roses, sunflowers, tulips
food containers    bottles, bowls, cans, cups, plates
vehicles 2         lawn-mower, rocket, streetcar, tank, tractor

Table 2.2.1. Five exemplary superclasses and their corresponding classes in the CIFAR-100 dataset.

Theoretically, CIFAR-100 does come with two views, where one view is the image-view and the second view is the class label of that image. Typically, such datasets are used for supervised classification/discrimination. CCA and linear discriminant analysis (a form of linear dimensionality reduction that can also be carried out by linear classifiers) are equivalent when the class label is used as the second view (Bartlett, 1938; Kursun et al., 2011). With that background and the existence of classes and superclasses in the CIFAR dataset, a more sophisticated version of the class-label information was created. Each image belongs to a superclass, but the superclass does not directly serve as its tag. With the class and superclass information, a binary 60,000x100

dataset was constructed, where a 0 value in the dataset indicates that an image does not belong to a particular class and a 1 indicates that it does. The encoding scheme was not simply a one-hot encoding operation – a probabilistic approach was used to preserve the uncertainty of an image’s true tag and enforce the superclass information. In other words, for a particular image, the highest probability of getting a 1 was applied for its true class, a fairly high probability for classes in the same superclass, and a low probability for the rest of the classes. With these specifications, for instance, if an image was known to be labeled beaver (aquatic mammals superclass), a possible set of entries for it in the binary dataset would be:

Beaver  Dolphin  Otter  Seal  Whale  …  Bottle  …  Rabbit  …  Tractor
1       0        1      1     0      …  0       …  1       …  0

Although ideally such a tagging task would be performed as part of a survey with human subjects, for simplicity, the described noisy encoding scheme was used. If the beaver entry had a 1 and everything else was 0, the data would be more suitable for a supervised classification problem than for a multi-view feature extraction and/or cross-modal prediction problem. It has been shown that when the class label is used as the second view, CCA is equivalent to linear discriminant analysis (Bartlett, 1938; Kursun et al., 2011). When using the proposed noisy encoding scheme, which favors but does not exactly identify the class and the superclass, the demand on CCA is to find the linear combinations of image features that best match linear combinations of these noisy tags. As multiple 1’s are associated with an image (with a higher probability of being 1 for the classes that belong to the superclass of the image), CCA captures superclass information, and the proposed CCA-based cross-modal prediction learns to suggest which superclass the given query image belongs to. Note that this cross-modal prediction

is achieved without explicitly training a supervised classifier for predicting the superclasses. Furthermore, this dataset offers a controlled setting that can be used for sensitivity analysis, for example, to explore how much noise can be tolerated, or to test the robustness of various deep learning methods against factors such as noise and the number of CCA components.
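A minimal sketch of the noisy tag encoding described above is given below; the specific probabilities (0.9 for the true class, 0.5 for same-superclass classes, 0.05 for the rest), the function name, and the input formats are illustrative assumptions, since the exact values are not fixed in this section.

import numpy as np

def encode_noisy_tags(class_ids, class_to_superclass, n_classes=100,
                      p_true=0.9, p_same_super=0.5, p_other=0.05, seed=0):
    """class_ids: NumPy array of true class indices, one per image.
    class_to_superclass: NumPy array mapping each class index to its superclass index."""
    rng = np.random.default_rng(seed)
    tags = np.zeros((len(class_ids), n_classes), dtype=np.int8)
    for i, c in enumerate(class_ids):
        # Classes sharing the true class's superclass get a fairly high probability.
        same_super = class_to_superclass == class_to_superclass[c]
        probs = np.where(same_super, p_same_super, p_other).astype(float)
        probs[c] = p_true                       # the true class is most likely tagged
        tags[i] = rng.random(n_classes) < probs # draw the noisy binary tag vector
    return tags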

In this thesis, the tag-related dataset is referred to as CIFAR-100-tag and the image dataset as CIFAR-100-image. Deep learning models were used as feature extractors for images in the CIFAR-100-image dataset. The CIFAR-100-tag data and the deep-learning features were the two modalities fed to the cross-modal prediction model.

2.3 Convolutional Neural Networks

Part of the analysis using CIFAR-100-image is feature extraction with deep learning models based on convolutional neural networks (CNNs). Like other neural networks, CNNs consist of neurons with learnable weights and biases. A neuron is connected to and receives inputs from other neurons in the network. It then takes a weighted sum over the inputs, passes the result through an activation function, and outputs its response.

O'Shea and Nash (2015) provided a thorough explanation of how CNNs differ from ordinary neural networks. A CNN processes an input image with a set of convolutional layers, each of which comprises independent filters. Convolving the whole image with a filter results in a feature map, and the feature maps produced by all the filters together form the output of a convolutional layer. The filters are initialized randomly and learned by the CNN during training. CNNs also involve pooling layers, which progressively reduce the spatial size of an image’s representation to decrease the number of parameters and the amount of computation in the network. CNNs use the rectified linear activation function (ReLU), which returns the input directly if it is positive and zero otherwise. Figure 2.3.1 shows the typical architecture of a CNN: An input image goes into the network where its features are extracted at each convolutional layer, and the outputs are inputs to the next layer. The fully-connected (FC) layer has connections to all activations from the previous layer, and its outputs are fed into a softmax activation function to calculate the probability of the input image being labeled as a specific tag, hence the classification of the image.

Figure 2.3.1. An example of CNN architecture.
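The sketch below spells out the convolution–ReLU–pooling–fully-connected pipeline described above as a minimal PyTorch model; the layer sizes are illustrative choices for 32x32 RGB inputs such as CIFAR images and are not taken from the thesis.

import torch
import torch.nn as nn

tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 16 filters -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 100),                   # fully connected layer, 100 classes
)

logits = tiny_cnn(torch.randn(1, 3, 32, 32))      # one dummy RGB image
probs = torch.softmax(logits, dim=1)              # class probabilities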

In this thesis, the image feature extractors used were pre-trained CNN models.

Their abilities in discovering distinguishable patterns in images were compared. The application of pre-trained CNNs is part of transfer learning, a subfield of machine learning that applies the knowledge gained from a source task to a different but similar target task, which is one of the benefits of deep learning systems (Goodfellow et al., 2016; Kursun et al., 2021; Yosinski et al., 2014). Pre-trained CNNs are models that are already built on very large datasets for image classification. These models are then made public to be repurposed and fine-tuned with regard to the learned layers, features, weights, and biases, thus achieving higher accuracy and generating the intended output format (Shin et al., 2016). Pre-trained CNN models provide a shortcut to training a CNN from scratch, which may require substantial time and resources depending on the size of the dataset.

Three pre-trained CNNs were investigated: AlexNet, ResNet, and VGG (Paszke et al., 2019). All three models were trained on the ImageNet database, which has more than 14 million images grouped into about 22 thousand classes (according to statistics recorded on ImageNet’s homepage). The AlexNet architecture includes eight layers: five convolutional layers and three fully connected layers. The input to

AlexNet must be RGB images of size 256×256 (Krizhevsky et al., 2012). ResNet (short for Residual Network) was built and trained on one million 224x224 colored images from

ImageNet (He et al., 2016). There have been multiple versions of ResNet depending on the number of layers. For this thesis, ResNet101 was used (101 layers). VGG requires

RGB images of dimensions 224x224 (Simonyan & Zisserman, 2014). Similar to ResNet, there are multiple versions of VGG. This thesis used VGG11_bn, which is a batch-norm version.
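A minimal sketch of using one of these pre-trained models as a fixed feature extractor is shown below, assuming torchvision and the standard ImageNet preprocessing; the exact layers, preprocessing, and batch handling used in the thesis experiments may differ.

import torch
import torchvision.models as models
import torchvision.transforms as T

# Standard ImageNet preprocessing that would be applied to each PIL image
# before feature extraction (an assumption; the thesis may have used other settings).
preprocess = T.Compose([
    T.Resize(224),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load a pre-trained ResNet101 and drop its classifier so the forward pass
# returns penultimate-layer features instead of class scores.
resnet = models.resnet101(pretrained=True)
resnet.fc = torch.nn.Identity()
resnet.eval()

with torch.no_grad():
    batch = torch.randn(4, 3, 224, 224)   # stand-in for a batch of preprocessed images
    features = resnet(batch)              # one feature vector per image
print(features.shape)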

CHAPTER 3: CCA-BASED CROSS-MODAL PREDICTION PROCEDURE

Section 2.1 reviewed a cross-modal prediction approach that worked by inverting the canonical weight matrix of the target view. This essentially means transforming the query view to the canonical subspace and then, using the inverted weights, converting from the canonical subspace to obtain the most fitting (but artificial) representations in the target view. This thesis proposes that for cross-modal prediction, performing the search within the canonical subspace is better than generating artificial representations. Even if the sample reconstruction generates a realistic target-view example, it is not a real example in the target-view dataset. To find real examples, a nearest neighbors search is needed to retrieve relevant items from the target view. Therefore, instead of inverting the CCA features of the query to obtain a reasonable representation in the target-view space, the proposed approach applies a nearest neighbors search within the canonical subspace. As the CCA features are generally much fewer than the original dimensionality, the proposed approach also offers higher prediction accuracy, as reported in Chapter 4.

The proposed cross-modal prediction process involved transforming the data with

CCA and applying an unsupervised nearest neighbors model to acquire similar examples from one view given the features from another view. The performance of the proposed prediction process was evaluated using three metrics: cross-modal classification error, cross-modal-top-3 classification error, and within-/cross-modal intersection error. These errors also provided a basis to assess the benefits of using canonical variates in the nearest neighbors model, as opposed to reconstructing them back to their original space (the pseudoinverse method). This chapter explains the procedure to demonstrate the proposed method of cross-modal prediction with CCA.

3.1 Application of CCA & Nearest Neighbors Model

Cross-modal learning was performed with an n-component CCA model: CCA found the weights that maximize the correlation between the two views while limiting to n components in the final model equation. The steps to train a CCA model and obtain the canonical space for two views appear in Table 3.1.1.

Step  Description

1     Split the two views' data and their corresponding labels into training and test sets.

2     Train a CCA model using the training data of the two views.

3     Transform the training and test data of both views into canonical variates using the trained CCA model.

Table 3.1.1. Steps to train a CCA model and obtain the canonical space for two views.

An unsupervised nearest neighbors model (Ball Tree algorithm) was then fit on the training canonical variates of the target view and queried with the test canonical variates of the query view. The model returned target-view representations (neighbors) whose features correlated most with the query's features. A summary of this cross-modal prediction process is illustrated in Figure 3.1.1.
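A minimal sketch of the steps in Table 3.1.1 followed by the Ball Tree search is given below, assuming scikit-learn and synthetic stand-in data; the split ratio, the 20 components, and the 20 neighbors are illustrative values.

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-ins for two paired views and their labels (illustrative only).
rng = np.random.default_rng(0)
X_query  = rng.normal(size=(2000, 240))   # e.g., the pix view of Multi-View Digits
X_target = rng.normal(size=(2000, 76))    # e.g., the fou view
labels   = np.repeat(np.arange(10), 200)

# Step 1: split both views (and labels) into matching training and test sets.
Xq_tr, Xq_te, Xt_tr, Xt_te, y_tr, y_te = train_test_split(
    X_query, X_target, labels, test_size=0.4, random_state=0)

# Step 2: train a CCA model on the training data of the two views.
cca = CCA(n_components=20)
cca.fit(Xq_tr, Xt_tr)

# Step 3: transform both views into canonical variates.
Zq_tr, Zt_tr = cca.transform(Xq_tr, Xt_tr)
Zq_te, Zt_te = cca.transform(Xq_te, Xt_te)

# Fit the Ball Tree on the target view's training variates and query it with
# the canonical variates of query-view test examples.
nn_model = NearestNeighbors(n_neighbors=20, algorithm='ball_tree').fit(Zt_tr)
distances, neighbor_idx = nn_model.kneighbors(Zq_te)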


Figure 3.1.1. Proposed cross-modal prediction process with CCA and evaluation method using within-/cross-modal intersection error, cross-modal classification error, and cross-modal-top-3 classification error.

The within-/cross-modal intersection error evaluated how much CCA helped in the cross-modal learning. This is a value adapted from the Jaccard similarity coefficient - a similarity measure between finite sample sets (Jaccard, 1912):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (4)$$

As used in this thesis, A was the set of target-view examples recommended by

CCA, and B represented the target-view examples that would be returned by a search that queried the target view directly if the true target-domain examples were available. In other words, B represented the ideal neighbors of a test example found by a standard single-view nearest neighbors search if the target view of the test example were actually known. With both sets defined, the within-/cross-modal intersection error is the percentage of test examples whose A and B sets have an empty intersection. For example, if most test examples had an empty intersection between their A and B sets, then the recommendations and the ideal neighbors did not match at all, leading to a high within-/cross-modal intersection error.
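The sketch below computes this intersection error under the reading above (the fraction of test examples whose A and B sets do not intersect); the function and array names are illustrative.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def intersection_error(Zq_te, Zt_te, Zt_tr, n_neighbors=20):
    nn_target = NearestNeighbors(n_neighbors=n_neighbors,
                                 algorithm='ball_tree').fit(Zt_tr)

    # A: neighbors retrieved by the cross-modal query (query-view test variates).
    _, A = nn_target.kneighbors(Zq_te)
    # B: ideal neighbors, found by querying with the true target-view test variates.
    _, B = nn_target.kneighbors(Zt_te)

    # Error = fraction of test examples with no overlap between A and B.
    empty = [len(set(a) & set(b)) == 0 for a, b in zip(A, B)]
    return float(np.mean(empty))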

3.2 Classification Accuracy Evaluation Method

Good recommended neighbors would share similar features with the query examples, but their class labels might not match the expected recommendations due to biases in the data. For example, the extracted features of handwritten numerals in the Multi-View Digits dataset might be affected by noise such as handwriting style (some digits might be written by the same person, so they have the same stroke strength, straightness, etc.). As CCA was projecting the features onto a common latent space, its results might be biased by this noise. The nearest neighbors model would then return neighbors that share the same biased features but are not the same digits.

Figure 3.1.1 also illustrates the proposed method to verify CCA's ability to retain enough classification-related information (i.e., features or dimensions). The cross-modal classification error was computed to report the rate of mismatched recommended labels when querying with examples from a different view. After the search within the canonical space returned recommended target-view representations, their labels were compared to the target view's actual test labels to obtain the cross-modal classification error. Additionally, the top-3 most recommended labels (based on their frequency among the retrieved neighbors in the target view) gave the cross-modal-top-3 classification error, which offers a softer measure of performance to evaluate the accuracy of the cross-modal prediction.
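A minimal sketch of the two classification-based errors, assuming the retrieved neighbors' indices and the corresponding training/test labels are available as arrays, is given below; the names are illustrative.

from collections import Counter

def classification_errors(neighbor_idx, train_labels, test_labels):
    top1_wrong, top3_wrong = 0, 0
    for idx, true_label in zip(neighbor_idx, test_labels):
        # Rank candidate labels by how often they occur among the retrieved neighbors.
        counts = Counter(train_labels[i] for i in idx)
        ranked = [label for label, _ in counts.most_common()]
        top1_wrong += ranked[0] != true_label       # majority label mismatch
        top3_wrong += true_label not in ranked[:3]  # true label missing from top 3
    n = len(test_labels)
    return top1_wrong / n, top3_wrong / n           # cross-modal and top-3 errors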

3.3 Privacy Preservation Based On Millionaires' Problem & Alternating

The training process of the CCA model requires data from all modalities involved. However, this might not be practical in real-world use cases because the availability of data depends on privacy constraints. The growth of e-commerce platforms comes with a continuously increasing volume of sensitive information. Inevitably, means for secure and private information transactions and computation are in great demand. This is where Yao's Millionaires' problem finds its application. Introduced in 1982 by Andrew Yao, a computer scientist and computational theorist, the problem illustrates secure multi-party computation (SMC) with an example of two millionaires who want to know which of them is richer without revealing their actual wealth. In a larger sense, SMC asks for protocols that allow several parties to jointly compute a function without exposing their inputs (Yao, 1982). The Millionaires' problem is an important consideration in commercial applications, where comparisons between confidential parties occur.

Any solution to the Millionaires' problem can be used in the training of the canonical correlation algorithm when considering the concept of alternating regression (AR) for CCA's implementation, as in (Lai & Fyfe, 1998; Sakar & Kursun, 2017; Wold, 1966). The AR method for CCA training randomly initializes $w_X$ and $w_Y$ of Eq. 1. It then uses an iterative, gradient-based approach to update $w_X$ and $w_Y$, as opposed to an analytic solution based on cross-covariance matrix computation. At every iteration of alternating regression, the neural networks of each view try to produce an output that maximally agrees with the outputs of the other views. The weights, $w_X$ and $w_Y$, are updated via backpropagation (Alpaydin, 2014; Favorov & Ryder, 2004; Sakar & Kursun, 2017). Although the error to be minimized is based on the differences of the outputs, a privacy-preserving algorithm can be obtained by making small updates that increase the agreement. Without loss of generality, the extraction of one CCA variate is outlined here; the other variates can be found by deflation or symmetric lateral inhibition (Alpaydin, 2014; Girolami & Fyfe, 1997). The proposed procedure for the privacy-preserving algorithm appears in Table 3.3.1.

Randomly initialize $w_X$ and $w_Y$
Repeat
    Compute the projection scores for each training example as
        $z_X = w_X^T x$ and $z_Y = w_Y^T y$ (as in Eq. 1)
    Feed the projection scores to the privacy-preserving solution of the Millionaires' problem
    If $|z_X| > |z_Y|$
        Apply the backpropagation algorithm to update $w_Y$ such that a similar example would produce a $z_Y$ with a higher magnitude
    Else
        Apply the backpropagation algorithm to update $w_X$ such that a similar example would produce a $z_X$ with a higher magnitude
Until convergence

Table 3.3.1. Proposed CCA-privacy-preserving algorithm in cross-modal prediction.
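A minimal sketch of this loop is given below; the secure_compare stub stands in for an actual Millionaires'-problem protocol (only the comparison outcome crosses the party boundary), and the learning rate, normalization, and exact update rule are illustrative assumptions rather than the thesis's precise formulation.

import numpy as np

def secure_compare(a, b):
    # Placeholder for a secure two-party comparison: returns True if |a| > |b|
    # without either party revealing its actual score to the other.
    return abs(a) > abs(b)

def train_private_cca_component(X, Y, lr=0.01, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w_x = rng.normal(size=X.shape[1])
    w_y = rng.normal(size=Y.shape[1])
    for _ in range(n_epochs):
        for x, y in zip(X, Y):
            z_x, z_y = w_x @ x, w_y @ y              # projection scores (Eq. 1)
            if secure_compare(z_x, z_y):
                # The Y-side score is smaller in magnitude: nudge w_y so a
                # similar example produces a larger-magnitude z_y.
                w_y += lr * np.sign(z_y) * y
            else:
                w_x += lr * np.sign(z_x) * x
        w_x /= np.linalg.norm(w_x)                   # keep weights bounded
        w_y /= np.linalg.norm(w_y)
    return w_x, w_y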

Some solutions to the Millionaires' problem are based on homomorphic encryption (Kumar & Gupta, 2013; Lin & Tzeng, 2005). Homomorphic encryption

(Gentry, 2009) is a form of public-key encryption that allows computations to be carried out over encrypted data, thus generating an encrypted result which, when decrypted, matches the result of operations performed on the original data. Homomorphic encryption can be expressed as follows:

$$E(m_1) \times E(m_2) = E(m_1 \otimes m_2) \qquad (5)$$

where $E(m)$ represents the encryption of a message $m$; the operation $\times$ is the multiplication of two encrypted messages; and $\otimes$ is the equivalent operation in the plaintext domain, which could be addition (Paillier, 1999) or multiplication (ElGamal, 1985).
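The toy sketch below demonstrates the additive version of Eq. 5 using a textbook Paillier scheme with deliberately tiny primes; it is for illustration only, is not a secure implementation, and is not necessarily the scheme used by the cited protocols.

import math, random

p, q = 61, 53                                  # tiny primes (insecure, for demonstration)
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1                                      # standard simplified generator choice
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)    # modular inverse used in decryption

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

m1, m2 = 42, 57
c1, c2 = encrypt(m1), encrypt(m2)
# Multiplying ciphertexts adds the plaintexts: E(m1) * E(m2) = E(m1 + m2).
assert decrypt((c1 * c2) % n2) == (m1 + m2) % n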

CHAPTER 4: EXPERIMENTAL RESULTS

This chapter reports the results for the cross-modal prediction procedure as described in Chapter 3 on the Multi-View Digits and CIFAR-100 datasets. The within-

/cross-modal intersection error, cross-modal classification error, and cross-modal-top-3 classification error were interpreted to reflect on the performance of the prediction process. For both datasets, the errors were computed for the two cross-modal prediction methods explained in this thesis: The pseudoinverse method, where a nearest neighbors search was performed after the reconstruction of the canonical space, and the proposed method, where the search for the most fitting representations of the target view was performed within the canonical space. With the Multi-View Digits dataset, all 15 modality pairs were investigated, and trials were performed for different correlation thresholds that were used to select the number of CCA components to extract. With the

CIFAR-100 dataset, the efficiency of three pre-trained CNNs (AlexNet, ResNet, and

VGG) in extracting image features to be fed into CCA was addressed. Additionally, different numbers of CCA components were also used to reflect on the change in the prediction accuracy.

4.1 Multi-View Digits Dataset

The Multi-View Digits dataset was used for the demonstration of cross-modal prediction by a nearest neighbors search in the canonical subspace of two modalities.

This thesis worked with all 15 pairs among the six feature sets provided by the dataset.

For each modality pair, three test errors were calculated: within-/cross-modal intersection error, cross-modal classification error, and cross-modal-top-3 classification error. Each modality pair had two options for the target and query view. As a result, there were 30

values found for each of the three errors. Furthermore, the three errors were also computed for the pseudoinverse cross-modal prediction method.

As each modality has a different number of features, the initial number of components used for the CCA model was chosen to be the minimum rank of a modality pair. Then, the first K learned components were selected such that $\rho_K \ge 0.5 > \rho_{K+1}$, to create the CCA model that would be used for the cross-modal prediction process. All errors computed are reported in Table 4.1.1.
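A minimal sketch of this threshold-based selection of K, assuming scikit-learn and training-set correlations, is given below; the function and variable names are illustrative.

import numpy as np
from sklearn.cross_decomposition import CCA

def components_above_threshold(Xq_tr, Xt_tr, threshold=0.5):
    # Start from the maximum possible number of components for the pair.
    max_k = min(Xq_tr.shape[1], Xt_tr.shape[1])
    cca = CCA(n_components=max_k).fit(Xq_tr, Xt_tr)
    Zq, Zt = cca.transform(Xq_tr, Xt_tr)

    # Per-component training correlation between the two views' variates.
    rho = np.array([np.corrcoef(Zq[:, k], Zt[:, k])[0, 1] for k in range(max_k)])

    # Keep the leading components whose correlation meets the threshold.
    K = int(np.sum(rho >= threshold))
    return K, rho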

                   Within-/cross-modal     Cross-modal            Cross-modal-top-3
                   intersection error      classification error   classification error
Query   Target     (a)       (b)           (a)       (b)          (a)       (b)
fac     fou        0.68125   0.69000       0.90000   0.74000      0.67000   0.61125
fac     kar        0.08000   0.04500       0.57125   0.29750      0.20500   0.08375
fac     mor        0.77250   0.96125       0.78375   0.92500      0.55250   0.69500
fac     pix        0.22875   0.21500       0.49375   0.43875      0.29000   0.25125
fac     zer        0.01125   0.03375       0.22500   0.32750      0.03125   0.09625
fou     fac        0.09250   0.70375       0.19625   0.62750      0.04125   0.28375
fou     kar        0.11375   0.26250       0.21375   0.27500      0.05625   0.15375
fou     mor        0.47125   0.92375       0.33875   0.70375      0.10500   0.38375
fou     pix        0.07500   0.27000       0.17625   0.29750      0.04250   0.15250
fou     zer        0.21375   0.39625       0.26875   0.38375      0.07000   0.12875
kar     fac        0.00000   0.14500       0.06125   0.31000      0.00500   0.06250
kar     fou        0.11375   0.16750       0.20250   0.25750      0.03500   0.06000
kar     mor        0.42625   0.92875       0.35750   0.77750      0.15625   0.43375
kar     pix        0.00000   0.00000       0.06250   0.06875      0.00375   0.00375
kar     zer        0.05875   0.39375       0.21875   0.55000      0.06500   0.23125
mor     fac        0.37250   0.60375       0.48375   0.57375      0.14250   0.22250
mor     fou        0.34500   0.54375       0.37625   0.70375      0.06500   0.20500
mor     kar        0.30750   0.47500       0.46500   0.47000      0.11125   0.30000
mor     pix        0.22500   0.48000       0.37750   0.49750      0.11625   0.28250
mor     zer        0.29250   0.82125       0.38125   0.82500      0.09375   0.51250
pix     fac        0.00125   0.20500       0.08125   0.40875      0.00875   0.09125
pix     fou        0.13750   0.17625       0.32000   0.26125      0.10875   0.05750
pix     kar        0.00000   0.00000       0.05500   0.12125      0.00250   0.00875
pix     mor        0.41250   0.95375       0.38500   0.82875      0.14750   0.54625
pix     zer        0.10250   0.56000       0.43875   0.62750      0.13875   0.30875
zer     fac        0.00500   0.30000       0.18500   0.47250      0.01375   0.20625
zer     fou        0.76500   0.70875       0.89500   0.75125      0.64500   0.61500
zer     kar        0.64750   0.58750       0.90000   0.78250      0.43250   0.49375
zer     mor        0.79250   0.96125       0.89125   0.91000      0.67000   0.60625
zer     pix        0.66625   0.56125       0.90000   0.68000      0.60750   0.51500

Table 4.1.1. Test errors computed for Multi-View Digits modality pairs when performing nearest neighbors search (a) within the canonical space and (b) after reconstructing the space.

Performing the search for 20 nearest neighbors within the canonical space yielded small errors for most target-query pairs. For several modality pairs, the choice of which view was the query/target view had a significant impact on the errors.

For example, when Zernike moments were used to query for the most fitting representations of Karhunen-Loève coefficients, all three errors found were relatively high. With a within-/cross-modal intersection error of 0.6475, only 35.25% of the time was there an agreement between the true neighbors and the recommendations by cross-modal learning. The class labels of the recommended neighbors had only a 10% match rate with the true ones when using a cross-modal query, while the top-3 match rate was 56.75%.

However, if the Karhunen-Loève coefficients were the query view and Zernike moments were the target, all three rates improved significantly to 94.13%, 78.13%, and 93.5%, respectively.

It was also worth noting a few modality pairs with near-zero errors. One such pair was the Karhunen-Loève coefficients and pixel averages. No matter which of the two views was the query/target view, the errors were close to or exactly zero. As explained in Section 2.2, this was because Karhunen-Loève coefficients are obtained by a linear method that takes weighted averages of the pixels. In other words, the relationship between these two modalities was so prominent that CCA could easily transform their examples linearly and return the correct representations between them.

Compared with the pseudoinverse method of reconstructing from the canonical space and generating the most fitting (and artificial) representations in the target view, the proposed method of performing the nearest neighbors search within the canonical space was more accurate. As shown in Figure 4.1.1, most errors computed from using the proposed method were smaller than those of the pseudoinverse method. For more insight into how much improvement the search within the canonical space had on the prediction accuracy, the error difference between the two methods was investigated. For each pair of corresponding errors (based on the modality pair used), the difference was calculated.

Whichever method the smaller error belonged to, the average difference (Eq. 6) was computed for that method and compared to the average difference of the other method.

For the proposed method, the average error difference was 0.2579 for the within-/cross-modal intersection error. This meant that for all modality pairs that received a smaller within-/cross-modal intersection error, the average error drop compared to using the pseudoinverse method was 0.2579. The proposed method had an average difference of 0.2051 for the cross-modal classification error and 0.1557 for the cross-modal-top-3 classification error. On the other hand, with the pseudoinverse method, the average differences were 0.0540, 0.1470, and 0.0652, respectively. The fact that all three error differences of the proposed method were bigger than those of the pseudoinverse method showed that cross-modal prediction using canonical variates directly was able to decrease the errors, hence improving the accuracy of the prediction.

$$\text{average error difference} = \frac{\sum (\text{larger error} - \text{smaller error})}{\text{number of smaller errors}} \qquad (6)$$


Figure 4.1.1. Test errors for 30 query/target view pairs of the Multi-View Digits dataset using the proposed and pseudoinverse cross-modal prediction methods.

An investigation into the effect of different correlation thresholds (used to choose the number of extracted CCA components) on the test errors was also conducted. This was performed with pixel averages as the query modality and Fourier coefficients as the target modality. These two views were chosen because, with a correlation threshold of 0.5, although the proposed cross-modal prediction method produced a smaller within-/cross-modal intersection error than that of the pseudoinverse method, the accuracy of its recommendations was not as good. Experimenting with other correlation thresholds would provide insight into a better threshold value.

Figure 4.1.2 compares the test errors found using the two cross-modal prediction methods with correlation thresholds ranging from 0.1 to 0.9. The proposed method showed a downward trend for all three errors as the thresholds increased from 0.1 to approximately 0.7, at which point the errors of the recommendations started to increase almost exponentially. On the other hand, the test errors produced by the pseudoinverse method gradually increased with higher correlation thresholds and escalated significantly after the correlation threshold exceeded 0.8.


Figure 4.1.2. Test errors produced by using different correlation thresholds when applying the proposed and CCA-pseudoinverse cross-modal prediction methods with pixel averages as the query modality and Fourier coefficients as the target modality.

For thresholds ranging from 0.1 to 0.7, the two different error trends observed for the two prediction methods could be explained by their prediction mechanisms. The higher the correlation threshold, the fewer but more correlated CCA components were extracted between the modalities. This was more beneficial for the proposed method because these components better represented the correlation between the two modalities, which supported the search for the most fitting representations of the target view within the canonical subspace. Fewer CCA components negatively affected the pseudoinverse method because it was attempting to reconstruct them into their original space. Its results were artificial and also prone to be too distinct from the real data if the available number of CCA components to reconstruct dropped too low. Additionally, the fact that both methods did not perform well when using thresholds exceeding 0.8 could be because

CCA was overfitted by the training data.

4.2 CIFAR-100 Dataset

In addition to the Multi-View Digits dataset, CCA-based cross-modal prediction was also applied to the CIFAR-100 datasets. Deep learning feature extraction was performed to obtain one modality, the CIFAR-100-image dataset. The second modality,

CIFAR-100-tag, was created based on the class and superclass information from CIFAR-

100. This dataset was created based on a probabilistic approach such that for a particular image, the entry corresponding to its true class had the highest probability of getting a value of 1, a fairly high probability for classes in the same superclass, and a low probability for the rest of the classes.

Three pre-trained CNN models – AlexNet, ResNet, and VGG – were used as image feature extractors and compared with respect to their abilities in discovering distinguishable patterns in images, as addressed by the corresponding CCA-based cross-modal prediction performance. Both AlexNet and VGG extract 4,096 image features at their last layer and feed them into the final classifier (softmax) layer, while ResNet extracts 512 features. These features are highly descriptive and are suitable to be transferred to this CIFAR image-domain classification task. For this thesis, they were transferred and used in the CCA-based cross-modal prediction. To help CCA's convergence and to reduce its training runtime, principal component analysis (PCA) was applied to the features extracted by all three models to reduce the dimensionality and eliminate collinearities within these features: The number of PCA features was fixed at

500 for comparisons among models. 500 PCA features covered more than 80% of the total variance for all deep learning models.
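A minimal sketch of this PCA-then-CCA preprocessing, assuming scikit-learn and synthetic stand-ins for the deep features and tag vectors, is given below; the array names and sizes are illustrative.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

# Synthetic stand-ins for CNN features and noisy tag vectors (illustrative only).
rng = np.random.default_rng(0)
deep_features_train = rng.normal(size=(1000, 2048))
deep_features_test  = rng.normal(size=(200, 2048))
tags_train          = rng.integers(0, 2, size=(1000, 100)).astype(float)

# Reduce the deep features to 500 principal components before CCA.
pca = PCA(n_components=500)
image_feats_tr = pca.fit_transform(deep_features_train)
image_feats_te = pca.transform(deep_features_test)
print(f"variance explained: {pca.explained_variance_ratio_.sum():.2%}")

# Fit a 20-component CCA between the reduced image features and the tag view.
cca = CCA(n_components=20).fit(image_feats_tr, tags_train)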

As opposed to selecting K learned components such that the correlation coefficient between the modalities’ components was over a given threshold (as done for the Multi-View Digits dataset), the number of CCA components used in the cross-modal prediction process with the CIFAR-100 datasets was chosen to be 20. This helped to cut down the runtime that would have been required to find the minimum rank between the high-dimensional training sets of CIFAR-100-image and CIFAR-100-tag (50,000x500 and 50,000x100, respectively). The highest correlation learned between the modalities’

20 components for the testing data was 0.5279, 0.5355, and 0.5350, respectively for

AlexNet, ResNet, and VGG.

The cross-modal prediction aimed at recommending the 100 most fitting examples, with the deep-learning image features and tag representations taking turns as the query/target modality. Table 4.2.1 compares the test errors produced by the prediction process using canonical variates and those output by the pseudoinverse method.

                        Within-/cross-modal    Cross-modal            Cross-modal-top-3
                        intersection error     classification error   classification error
Query   Target  CNN     (a)       (b)          (a)       (b)          (a)       (b)
Image   Tag     AlexNet 0.1762    0.0186       0.3465    0.3553       0.1664    0.1793
Image   Tag     ResNet  0.1611    0.0094       0.2925    0.3015       0.1336    0.1469
Image   Tag     VGG     0.1483    0.0182       0.2989    0.3039       0.1271    0.1359
Tag     Image   AlexNet 0.5216    0.4751       0.2438    0.5045       0.1106    0.2914
Tag     Image   ResNet  0.4490    0.3852       0.2437    0.4459       0.1027    0.2440
Tag     Image   VGG     0.4853    0.4447       0.2495    0.4861       0.1040    0.2688

Table 4.2.1. Test errors produced by the (a) proposed and (b) pseudoinverse methods of cross-modal prediction, using tag representations and image features extracted by AlexNet, ResNet, and VGG.

With the query and target views as images and tags, respectively, the cross-modal prediction was analogous to finding the tag representations that might label a given image. With AlexNet features, the within-/cross-modal intersection errors were small.

The error of the proposed method was 0.1762, which means that about 82% of the time, there was an intersection between the set of recommended target-view representations and the set that would have been returned by a single-view nearest neighbors search with a target-view query. On the other hand, the within-/cross-modal intersection error of the pseudoinverse method was smaller (0.0186). This could be due to the noise in the tag-view to be predicted. Similarly, for the prediction processes using query features extracted by ResNet and VGG, higher within-/cross-modal intersection errors were produced by the proposed method. Therefore, the accuracy of the recommendations is best measured by the percentage of matching classifications (the cross-modal classification error and cross-modal-top-3 classification error reported in Table 4.2.1).

Although the proposed method produced a higher within-/cross-modal intersection error regardless of the pre-trained CNN used, it was more accurate in predicting the superclasses of the recommended tag representations. With AlexNet image features as the query, the proposed method output a cross-modal classification error of 0.3465, which means that about 65 out of 100 recommended tag representations were correct. This error was comparable to that of the pseudoinverse method. However, the proposed method was more beneficial in the way that it performed its search in a low-dimensional space: It eliminated the additional computation to reconstruct the 500-dimensional image features.

Moreover, the search in a low dimensional space was slightly more accurate than the pseudoinverse method. When considering the top-3 classification error, the proposed method had a smaller error. Furthermore, for both the proposed and pseudoinverse methods, cross-modal prediction using image features extracted by ResNet and VGG produced better results than those by AlexNet. With ResNet and VGG, the top-3 tag representations contained the true superclass approximately 87% of the time (the top-3 error of VGG was slightly lower). Overall, when comparing the pre-trained CNNs based on the corresponding CCA-based cross-modal prediction performance, ResNet and VGG were comparable in their ability to extract discriminative image features, and they were better than AlexNet.

If the tag representations were used as the query view, the cross-modal recommendation aimed at retrieving the most fitting images associated with the given noisy tag vector. For this cross-modal prediction process, the improvement in accuracy using the proposed method was more apparent. The within-/cross-modal intersection errors of both methods were moderately high regardless of which pre-trained CNN was used. However, the proposed method performed significantly better than the pseudoinverse method, cutting the classification errors down by almost half. Furthermore, the image features extracted by the three pre-trained CNNs were all beneficial to either cross-modal prediction method, with ResNet having the best performance in terms of all three test errors.

To investigate the effect of the number of CCA components on the test errors of the cross-modal prediction, an analysis was performed using the ResNet image features as the query modality and tag representations as the target modality. The ResNet model was selected as a representative because it was the most accurate of the deep learning models. Figure 4.2.1 reports the test errors versus the number of CCA components used. The cross-modal classification errors showed that the predictions made by the proposed method were more accurate regardless of the number of components. There was a similar downward trend in the classification errors for both methods, and the proposed method's performance remained better. There was not much improvement in the cross-modal classification errors when the number of CCA components became greater than 20, which was expected because there are 20 superclasses.


Figure 4.2.1. The number of CCA components versus the test errors on the CIFAR dataset (image features extracted by ResNet in the query modality).

CHAPTER 5: CONCLUSION AND FUTURE WORK

In this thesis, a novel CCA-based cross-modal prediction method was proposed and investigated. For the two-view experimental datasets, first, CCA was used as a feature extractor to find a common (canonical) subspace between the two views of the training examples. One of the views was designated to be the query view and the other one to be the target view. During the test phase, the cross-modal prediction was performed by computing the canonical variates using the given data in the query view and then applying a nearest neighbors search to retrieve the most matching examples in the target view. The thesis also investigated an alternative method referred to as the pseudoinverse method (Bilenko & Gallant, 2016), which reconstructed the representations in the target view from the canonical space and then performed a nearest neighbors search in a much higher dimensional space. Experiments were conducted on two multi-view datasets, Multi-View Digits and CIFAR-100. The experimental results showed that the proposed approach was more accurate than applying the pseudoinverse method.

The thesis also used and compared pre-trained deep learning models for preprocessing the image modality of the CIFAR-100 dataset. The models used included

AlexNet, VGG, and ResNet. Among these models, ResNet extracted the most discriminative image features. Moreover, as deep learning models extracted high-dimensional representations of the images, the proposed nearest neighbor search in the canonical space was shown to be much more effective than the pseudoinverse method.

The effectiveness of the canonical space was due to its compactness and its richness in discriminative features, which was also suggested by Kursun et al. (2011).

Additionally, this thesis proposed a procedure for privacy preservation in the training phase of CCA based on the alternating regression method. Each iteration of alternating regression applies a solution to the Millionaires' problem (Kumar & Gupta, 2013; Lin & Tzeng, 2005) and then updates the canonical weights of the views to maximize their agreement via the backpropagation algorithm. The procedure was designed so that the canonical components of the modalities maximally agree while preserving the privacy of each modality's data from the other. Future work includes implementing the proposed cross-modal prediction method, with and without privacy preservation, on larger datasets and testing its real-time performance.
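The alternating-regression core of such a procedure can be sketched in a few lines: each view keeps its own weight vector and takes gradient steps so that its one-dimensional projection regresses onto the other view's current projection. The sketch below is an illustrative, non-private version of this idea; the secure-comparison steps based on the Millionaires' problem are omitted entirely, and the function name and hyperparameters are assumptions, not the thesis implementation.

```python
import numpy as np

def alternating_regression_cca(X, Y, n_iter=500, lr=0.1):
    """Sketch of learning one canonical component by alternating regression.
    Each party holds its own data (X or Y) and its own weight vector; only the
    one-dimensional projections would need to be shared between the parties."""
    rng = np.random.default_rng(0)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    wx = rng.normal(size=X.shape[1])
    wy = rng.normal(size=Y.shape[1])
    for _ in range(n_iter):
        # Current projections, rescaled to unit norm so neither view collapses.
        zx = X @ wx
        zx /= np.linalg.norm(zx)
        zy = Y @ wy
        zy /= np.linalg.norm(zy)
        # View 1 takes a gradient step to regress its projection onto view 2's
        # output, then view 2 does the same toward view 1's output.
        wx += lr * X.T @ (zy - X @ wx) / len(X)
        wy += lr * Y.T @ (zx - Y @ wy) / len(Y)
    zx, zy = X @ wx, Y @ wy
    return wx, wy, float(np.corrcoef(zx, zy)[0, 1])
```

The returned correlation is scale-invariant, so the per-iteration normalization only serves to keep the gradient targets well conditioned; when each regression is solved to convergence, the coupled updates settle on the first canonical pair.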

REFERENCES

Alpaydin, E. (2014). Introduction to Machine Learning (3rd ed.). Cambridge, MA: MIT Press.
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. In Mathematical Proceedings of the Cambridge Philosophical Society (Vol. 34, No. 1, pp. 33-40). Cambridge University Press.
Becker, S., & Hinton, G. E. (1992). Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature, 355(6356), 161-163.
Benton, A., Khayrallah, H., Gujral, B., Reisinger, D. A., Zhang, S., & Arora, R. (2017). Deep generalized canonical correlation analysis. arXiv preprint arXiv:1702.02519.
Bilenko, N. Y., & Gallant, J. L. (2016). Pyrcca: Regularized kernel canonical correlation analysis in Python and its applications to neuroimaging. Frontiers in Neuroinformatics, 10, 49.
Chen, Z., Lu, F., Yuan, X., & Zhong, F. (2017). TCMHG: Topic-based cross-modal hypergraph learning for online service recommendations. IEEE Access, 6, 24856-24865.
Chen, N., Zhu, J., & Xing, E. P. (2010). Predictive subspace learning for multi-view data: A large margin approach. In Proceedings of the Neural Information Processing Systems (pp. 361-369).
Dua, D., & Graff, C. (2019). UCI Machine Learning Repository. Retrieved from http://archive.ics.uci.edu/ml
ElGamal, T. (1985). A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31(4), 469-472.
Favorov, O. V., & Ryder, D. (2004). SINBAD: A neocortical mechanism for discovering environmental variables and regularities hidden in sensory input. Biological Cybernetics, 90(3), 191-202.
Gentry, C. (2009). Fully homomorphic encryption using ideal lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing (pp. 169-178).
Girolami, M., & Fyfe, C. (1997). Stochastic ICA contrast maximisation using Oja's nonlinear PCA algorithm. International Journal of Neural Systems, 8(5-6), 661-678.
Gong, Y., Ke, Q., Isard, M., & Lazebnik, S. (2014). A multi-view embedding space for modeling internet images, tags, and their semantics. International Journal of Computer Vision, 106(2), 210-233.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge, MA: MIT Press.
Guo, C., & Wu, D. (2019). Canonical correlation analysis (CCA) based multi-view learning: An overview. arXiv preprint arXiv:1907.01693.
Hardoon, D. R., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), 2639-2664.
Hawkins, J., & Blakeslee, S. (2004). On Intelligence. Macmillan.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
Hotelling, H. (1992). Relations between two sets of variates. In Breakthroughs in Statistics (pp. 162-190). Springer, New York, NY.
Jaccard, P. (1912). The distribution of the flora in the alpine zone. 1. New Phytologist, 11(2), 37-50.
Kettenring, J. R. (1971). Canonical analysis of several sets of variables. Biometrika, 58, 433-451.
Körding, K. P., & König, P. (2000). Learning with two sites of synaptic integration. Network: Computation in Neural Systems, 11(1), 25-39.
Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105.
Kumar, A., & Gupta, A. (2013). Some efficient solutions to Yao's millionaire problem. arXiv preprint arXiv:1310.8063.
Kursun, O., Alpaydin, E., & Favorov, O. V. (2011). Canonical correlation analysis using within-class coupling. Pattern Recognition Letters, 32(2), 134-144.
Kursun, O., Dinc, S., & Favorov, O. V. (2021). Contextually guided convolutional neural networks for learning most transferable representations. arXiv preprint arXiv:2103.01566.
Kursun, O., & Favorov, O. V. (2019). Suitability of features of deep convolutional neural networks for modeling somatosensory information processing. In Pattern Recognition and Tracking XXX (Vol. 10995, p. 109950G). International Society for Optics and Photonics.
Lai, P. L., & Fyfe, C. (1998, April). Canonical correlation analysis using artificial neural networks. In ESANN (pp. 363-368).
Lin, H. Y., & Tzeng, W. G. (2005). An efficient solution to the millionaires' problem based on homomorphic encryption. In International Conference on Applied Cryptography and Network Security (pp. 456-466). Springer, Berlin, Heidelberg.
Luo, Y., Tao, D., Ramamohanarao, K., Xu, C., & Wen, Y. (2015). Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Transactions on Knowledge and Data Engineering, 27(11), 3111-3124.
O'Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458.
Paillier, P. (1999). Public-key cryptosystems based on composite degree residuosity classes. In International Conference on the Theory and Applications of Cryptographic Techniques (pp. 223-238). Springer, Berlin, Heidelberg.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703.
Phillips, W. A., & Singer, W. (1997). In search of common foundations for cortical computation. Behavioral and Brain Sciences, 20(4), 657-683.
Sakar, C. O., & Kursun, O. (2017). Discriminative feature extraction by a neural implementation of canonical correlation analysis. IEEE Transactions on Neural Networks and Learning Systems, 28(1), 164-176.
Sakar, C. O., Kursun, O., & Gurgen, F. (2014). Ensemble canonical correlation analysis. Applied Intelligence, 40(2), 291-304.
Shin, H. C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., & Summers, R. M. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5), 1285-1298.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Sun, L., Ji, S., & Ye, J. (2008). A least squares formulation for canonical correlation analysis. In Proceedings of the 25th International Conference on Machine Learning (pp. 1024-1031).
Wold, H. (1966). Nonlinear estimation by iterative least squares procedures. In F. N. David (Ed.), Festschrift for J. Neyman: Research Papers in Statistics. London.
Yao, A. C. (1982). Protocols for secure computations. In 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982) (pp. 160-164). IEEE.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? arXiv preprint arXiv:1411.1792.
Yuan, Y. H., & Sun, Q. S. (2013). Multiset canonical correlations using globality preserving projections with applications to feature extraction and recognition. IEEE Transactions on Neural Networks and Learning Systems, 25(6), 1131-1146.
Zhou, Y., Mishra, S., Verma, M., Bhamidipati, N., & Wang, W. (2020, April). Recommending themes for ad creative design via visual-linguistic representations. In Proceedings of The Web Conference 2020 (pp. 2521-2527).