2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2019)

Singapore
1 – 4 October 2019

IEEE Catalog Number: CFP19MCO-POD
ISBN: 978-1-7281-4883-0

Copyright © 2019 by the Institute of Electrical and Electronics Engineers, Inc. All Rights Reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

*** This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

ISBN (Print-On-Demand): 978-1-7281-4883-0
ISBN (Online): 978-1-7281-4882-3

Additional Copies of This Publication Are Available From:
Curran Associates, Inc
57 Morehouse Lane
Red Hook, NY 12571 USA
Phone: (845) 758-0400
Fax: (845) 758-2633
E-mail: [email protected]
Web: www.proceedings.com

Table of Contents

Message from the Chairs (p. xi)
Committee Members (p. xii)
Keynotes (p. xvii)

Session 1: Auto-Tuning for Multicore and GPU (ATMG2019)

Distributed O(N) Linear Solver for Dense Symmetric Hierarchical Semi-Separable Matrices (p. 1)
  Chenhan D. Yu (The University of Texas at Austin), Severin Reiz (Technische Universität München), and George Biros (The University of Texas at Austin)

Optimization of Numerous Small Dense-Matrix–Vector Multiplications in H-Matrix Arithmetic on GPU (p. 9)
  Satoshi Ohshima (Kyushu University), Ichitaro Yamazaki (University of Tennessee), Akihiro Ida (The University of Tokyo), and Rio Yokota (Tokyo Institute of Technology)

An Automatic MPI Process Mapping Method Considering Locality and Memory Congestion on NUMA Systems (p. 17)
  Mulya Agung (Tohoku University), Muhammad Alfian Amrizal (Tohoku University), Ryusuke Egawa (Tohoku University), and Hiroyuki Takizawa (Tohoku University)

Performance Tuning of Tile Matrix Decomposition (p. 25)
  Tomohiro Suzuki (University of Yamanashi)

Session 2: Low-power Solutions for Future SoC design

A System Delay Monitor Exploiting Automatic Cell-Based Design Flow and Post-Silicon Calibration (p. 32)
  Hayate Okuhara (Keio University), Ryosuke Kazami (Keio University), and Hideharu Amano (Keio University)

Multicore Power Estimation using Independent Component Analysis Based Modeling (p. 38)
  Mark Sagi (Technical University of Munich), Nguyen Anh Vu Doan (Technical University of Munich), Thomas Wild (Technical University of Munich), and Andreas Herkersdorf (Technical University of Munich)

Building Scalable and Highly Efficient Accelerators Near the End of Conventional Scaling (p. 46)
  Johannes Maximilian Kühn (Preferred Networks Inc.)

Session 3-A: Digital Circuit & FPGA-based Design - I

FPGA/Python Co-Design for Lane Line Detection on a PYNQ-Z1 Board (p. 53)
  Koki Honda (Keio University), Kaijie Wei (Keio University), and Hideharu Amano (Keio University)

Design of Asynchronous CNN Circuits on Commercial FPGA from Synchronous CNN Circuits (p. 61)
  Hayato Kato (The University of Aizu, Japan) and Hiroshi Saito (The University of Aizu, Japan)

Modular Memory System for RISC-V Based MPSoCs on Xilinx FPGAs (p. 68)
  Ahmed Kamaleldin (Technische Universität Dresden), Muhammad Ali (Technische Universität Dresden), Pedram Amini Rad (Technische Universität Dresden), Marcus Gottschalk (Technische Universität Dresden), and Diana Göhringer (Technische Universität Dresden)

A Novel SLM-Based Virtual FPGA Overlay Architecture (p. 74)
  Theingi Myint (Kumamoto University), Motoki Amagasaki (Kumamoto University), Qian Zhao (Kyushu Institute of Technology), Masahiro Iida (Kumamoto University), and Masato Kiyama (Kumamoto University)

Session 3-B: Machine Learning

Deep Learning Framework with Arbitrary Numerical Precision (p. 81)
  Masato Kiyama (Kumamoto University), Motoki Amagasaki (Kumamoto University), and Masahiro Iida (Kumamoto University)

Tumour Detection using Convolutional Neural Network on a Lightweight Multi-Core Device (p. 87)
  T. Hui Teo (Singapore University of Technology & Design), Wei Ming Tan (Singapore University of Technology & Design), and Yi Shu Tan (Singapore University of Technology & Design)

Many Universal Convolution Cores for Ensemble Sparse Convolutional Neural Networks (p. 93)
  Ryosuke Kuramochi (Tokyo Institute of Technology), Youki Sada (Tokyo Institute of Technology), Masayuki Shimoda (Tokyo Institute of Technology), Shimpei Sato (Tokyo Institute of Technology), and Hiroki Nakahara (Tokyo Institute of Technology)

Distributed Neural Networks using TensorFlow over Multicore and Many-Core Systems (p. 101)
  Jagadish Kumar Ranbirsingh (East Stroudsburg University), Hanke Kimm (East Stroudsburg University), and Haklin Kimm (East Stroudsburg University)

Session 4-A: Digital Circuit & FPGA-based Design - II

An Efficient Implementation of a TAGE Branch Predictor for Soft Processors on FPGA (p. 108)
  Katsunoshin Matsui (Tokyo Institute of Technology), Md Ashraful Islam (Tokyo Institute of Technology), and Kenji Kise (Tokyo Institute of Technology)

Prototype of FPGA Dynamic Reconfiguration Based-on Context-Oriented Programming (p. 116)
  Takeshi Ohkawa (Tokai University), Ikuta Tanigawa (Kyushu University), Mikiko Sato (Tokai University), Kenji Hisazumi (Kyushu University), Nobuhiko Ogura (Tokyo City University), and Harumi Watanabe (Tokai University)

Implementation of Content-Based Anonymization Edge Router on NetFPGA (p. 123)
  Akihiro Fukuhara (Keio University), Tomomu Iwai (Keio University), Yuiko Sakuma (Keio University), and Hiroaki Nishi (Keio University)

Session 4-B: Intelligent Systems and Learning Technologies: Models, Methods, and Applications - II

Unified Symbol Framework to Improve UI Comprehension (p. 129)
  Rentaro Yoshioka (University of Aizu) and Naoyuki Murata (University of Aizu)

Smart Ontology-Based Event Identification (p. 135)
  Sarika Jain (National Institute of Technology, Kurukshetra) and Archana Patel (National Institute of Technology, Kurukshetra)

A Semi-Lossless Image Compression Procedure using a Lossless Mode of JPEG (p. 143)
  Md. Atiqur Rahman (University of Aizu) and Mohamed Hamada (University of Aizu)

Session 5-A: Interconnection Networks - I

A Low-Latency and Flexible TDM NoC for Strong Isolation in Security-Critical Systems (p. 149)
  Miguel Gorgues Alonso (Universidad Politecnica de Valencia), José Flich (Universidad Politecnica de Valencia), Meriem Turki (University of Ferrara), and Davide Bertozzi (University of Ferrara)

Low-Cost Congestion Detection Mechanism for Networks-on-Chip (p. 157)
  Zhengqian Han (Waseda University), Michael Conrad Meyer (Waseda University), Xin Jiang (Kitakyushu College), and Takahiro Watanabe (Waseda University)

A Machine Learning Enabled Long-Term Performance Evaluation Framework for NoCs (p. 164)
  Jie Hou (University of Stuttgart), Qi Han (University of Stuttgart), and Martin Radetzki (University of Stuttgart)

Fault-Tolerant Traffic-Aware Routing Algorithm for 3-D Photonic Networks-on-Chip (p. 172)
  Michael Conrad Meyer (Waseda University), Yu Wang (The University of Aizu), and Takahiro Watanabe (Waseda University)

Session 5-B: Intelligent Systems and Learning Technologies: Models, Methods, and Applications - II

Algorithm to Determine Extended Edit Distance between Program Codes (p. 180)
  Kazuki Anzai (University of Aizu) and Yutaka Watanobe (University of Aizu)

Automatic Generation of Fill-in-the-Blank Programming Problems (p. 187)
  Kenta Terada (University of Aizu) and Yutaka Watanobe (University of Aizu)

Convolutional Neural Network for Classification of Source Codes (p. 194)
  Hiroki Ohashi (University of Aizu) and Yutaka Watanobe (University of Aizu)

Design of Knowledge Templates and Multi-View Symbols for Experiential Learning (p. 201)
  Takayuki Hoshino (University of Aizu) and Rentaro Yoshioka (University of Aizu)

Session 6-A: Interconnection Networks - II

A Traffic-Robust Routing Algorithm for Network-on-Chip Systems (p. 209)
  Siying Xu (Waseda University, Japan), Michael Conrad Meyer (Waseda University, Japan), Xin Jiang (Waseda University, Japan), and Takahiro Watanabe (Waseda University, Japan)

Fault Detection and Localization for Network-on-Chips in Mixed-Criticality Systems (p. 217)
  Adele Maleki (University of Siegen), Hamidreza Ahmadian (University of Siegen), and Roman Obermaisser (University of Siegen)

An on-Communication Multiple-TSV Defects Detection and Localization for Real-Time 3D-ICs (p. 223)
  Khanh N. Dang (Vietnam National University, Hanoi), Akram Ben Ahmed (Keio University), and Xuan-Tu Tran (Vietnam National University, Hanoi)

A Hotspot-Pattern-Aware Routing Algorithm for Networks-on-Chip (p. 229)
  Yaoying Luo (Waseda University, Japan), Michael Conrad Meyer (Waseda University, Japan), Xin Jiang (Waseda University, Japan), and Takahiro Watanabe (Waseda University, Japan)

Session 6-B: Applications and Architectures designed for energy efficient hardware

Integrating Intra-and Intercellular Simulation of a 2D HL-1 Cardiac Model Based on Embedded GPUs (p. 236)
  Baohua Liu (Shanghai University), Wenfeng Shen (Shanghai University), Xin Zhu (The University of Aizu), and Xingyu Wangchen (Shanghai University)

Exploiting Model-Level Parallelism in Recurrent Neural Network Accelerators (p. 241)
  Lu Peng (LSU), Wentao Shi (LSU), Jian Zhang (LSU), and Samuel