2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2019)

Singapore

1 – 4 October 2019

IEEE Catalog Number: CFP19MCO-POD ISBN: 978-1-7281-4883-0

Copyright © 2019 by the Institute of Electrical and Electronics Engineers, Inc. All Rights Reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limit of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

For other copying, reprint or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08854. All rights reserved.

*** This is a print representation of what appears in the IEEE Digital Library. Some format issues inherent in the e-media version may also appear in this print version.

IEEE Catalog Number: CFP19MCO-POD ISBN (Print-On-Demand): 978-1-7281-4883-0 ISBN (Online): 978-1-7281-4882-3

Additional Copies of This Publication Are Available From:

Curran Associates, Inc.
57 Morehouse Lane
Red Hook, NY 12571 USA
Phone: (845) 758-0400
Fax: (845) 758-2633
E-mail: [email protected]
Web: www.proceedings.com

Table of Contents

Message from the Chairs xi
Committee Members xii
Keynotes xvii

Session 1: Auto-Tuning for Multicore and GPU (ATMG2019)

Distributed O(N) Linear Solver for Dense Symmetric Hierarchical Semi-Separable Matrices 1
Chenhan D. Yu (The University of Texas at Austin), Severin Reiz (Technische Universität München), and George Biros (The University of Texas at Austin)

Optimization of Numerous Small Dense-Matrix–Vector Multiplications in H-Matrix Arithmetic on GPU 9
Satoshi Ohshima (), Ichitaro Yamazaki (University of Tennessee), Akihiro Ida (The ), and Rio Yokota (Tokyo Institute of Technology)

An Automatic MPI Process Mapping Method Considering Locality and Memory Congestion on NUMA Systems 17
Mulya Agung (), Muhammad Alfian Amrizal (Tohoku University), Ryusuke Egawa (Tohoku University), and Hiroyuki Takizawa (Tohoku University)

Performance Tuning of Tile Matrix Decomposition 25
Tomohiro Suzuki (University of Yamanashi)

Session 2: Low-power Solutions for Future SoC design

A System Delay Monitor Exploiting Automatic Cell-Based Design Flow and Post-Silicon Calibration 32
Hayate Okuhara (), Ryosuke Kazami (Keio University), and Hideharu Amano (Keio University)

Multicore Power Estimation using Independent Component Analysis Based Modeling 38
Mark Sagi (Technical University of Munich), Nguyen Anh Vu Doan (Technical University of Munich), Thomas Wild (Technical University of Munich), and Andreas Herkersdorf (Technical University of Munich)

Building Scalable and Highly Efficient Accelerators Near the End of Conventional Scaling 46
Johannes Maximilian Kühn (Preferred Networks Inc.)

Session 3-A: Digital Circuit & FPGA-based Design - I

FPGA/Python Co-Design for Lane Line Detection on a PYNQ-Z1 Board 53
Koki Honda (Keio University), Kaijie Wei (Keio University), and Hideharu Amano (Keio University)

Design of Asynchronous CNN Circuits on Commercial FPGA from Synchronous CNN Circuits 61
Hayato Kato (The University of Aizu, ) and Hiroshi Saito (The University of Aizu, Japan)

Modular Memory System for RISC-V Based MPSoCs on Xilinx FPGAs 68
Ahmed Kamaleldin (Technische Universität Dresden), Muhammad Ali (Technische Universität Dresden), Pedram Amini Rad (Technische Universität Dresden), Marcus Gottschalk (Technische Universität Dresden), and Diana Göhringer (Technische Universität Dresden)

A Novel SLM-Based Virtual FPGA Overlay Architecture 74
Theingi Myint (), Motoki Amagasaki (Kumamoto University), Qian Zhao (Kyushu Institute of Technology), Masahiro Iida (Kumamoto University), and Masato Kiyama (Kumamoto University)

Session 3-B: Machine Learning

Deep Learning Framework with Arbitrary Numerical Precision 81
Masato Kiyama (Kumamoto University), Motoki Amagasaki (Kumamoto University), and Masahiro Iida (Kumamoto University)

Tumour Detection using Convolutional Neural Network on a Lightweight Multi-Core Device 87
T. Hui Teo (Singapore University of Technology & Design), Wei Ming Tan (Singapore University of Technology & Design), and Yi Shu Tan (Singapore University of Technology & Design)

Many Universal Convolution Cores for Ensemble Sparse Convolutional Neural Networks 93
Ryosuke Kuramochi (Tokyo Institute of Technology), Youki Sada (Tokyo Institute of Technology), Masayuki Shimoda (Tokyo Institute of Technology), Shimpei Sato (Tokyo Institute of Technology), and Hiroki Nakahara (Tokyo Institute of Technology)

Distributed Neural Networks using TensorFlow over Multicore and Many-Core Systems 101
Jagadish Kumar Ranbirsingh (East Stroudsburg University), Hanke Kimm (East Stroudsburg University), and Haklin Kimm (East Stroudsburg University)

Session 4-A: Digital Circuit & FPGA-based Design - II

An Efficient Implementation of a TAGE Branch Predictor for Soft Processors on FPGA 108
Katsunoshin Matsui (Tokyo Institute of Technology), Md Ashraful Islam (Tokyo Institute of Technology), and Kenji Kise (Tokyo Institute of Technology)

Prototype of FPGA Dynamic Reconfiguration Based-on Context-Oriented Programming 116
Takeshi Ohkawa (Tokai University), Ikuta Tanigawa (Kyushu University), Mikiko Sato (Tokai University), Kenji Hisazumi (Kyushu University), Nobuhiko Ogura (Tokyo City University), and Harumi Watanabe (Tokai University)

Implementation of Content-Based Anonymization Edge Router on NetFPGA 123
Akihiro Fukuhara (Keio University), Tomomu Iwai (Keio University), Yuiko Sakuma (Keio University), and Hiroaki Nishi (Keio University)

Session 4-B: Intelligent Systems and Learning Technologies: Models, Methods, and Applications - II

Unified Symbol Framework to Improve UI Comprehension 129
Rentaro Yoshioka (University of Aizu) and Naoyuki Murata (University of Aizu)

Smart Ontology-Based Event Identification 135
Sarika Jain (National Institute of Technology, Kurukshetra) and Archana Patel (National Institute of Technology, Kurukshetra)

A Semi-Lossless Image Compression Procedure using a Lossless Mode of JPEG 143
Md. Atiqur Rahman (University of Aizu) and Mohamed Hamada (University of Aizu)

Session 5-A: Interconnection Networks - I

A Low-Latency and Flexible TDM NoC for Strong Isolation in Security-Critical Systems 149
Miguel Gorgues Alonso (Universidad Politecnica de Valencia), José Flich (Universidad Politecnica de Valencia), Meriem Turki (University of Ferrara), and Davide Bertozzi (University of Ferrara)

Low-Cost Congestion Detection Mechanism for Networks-on-Chip 157
Zhengqian Han (), Michael Conrad Meyer (Waseda University), Xin Jiang (Kitakyushu College), and Takahiro Watanabe (Waseda University)

A Machine Learning Enabled Long-Term Performance Evaluation Framework for NoCs 164
Jie Hou (University of Stuttgart), Qi Han (University of Stuttgart), and Martin Radetzki (University of Stuttgart)

Fault-Tolerant Traffic-Aware Routing Algorithm for 3-D Photonic Networks-on-Chip 172
Michael Conrad Meyer (Waseda University), Yu Wang (The University of Aizu), and Takahiro Watanabe (Waseda University)

Session 5-B: Intelligent Systems and Learning Technologies: Models, Methods, and Applications - II

Algorithm to Determine Extended Edit Distance between Program Codes 180
Kazuki Anzai (University of Aizu) and Yutaka Watanobe (University of Aizu)

Automatic Generation of Fill-in-the-Blank Programming Problems 187
Kenta Terada (University of Aizu) and Yutaka Watanobe (University of Aizu)

Convolutional Neural Network for Classification of Source Codes 194
Hiroki Ohashi (University of Aizu) and Yutaka Watanobe (University of Aizu)

Design of Knowledge Templates and Multi-View Symbols for Experiential Learning 201
Takayuki Hoshino (University of Aizu) and Rentaro Yoshioka (University of Aizu)

Session 6-A: Interconnection Networks - II

A Traffic-Robust Routing Algorithm for Network-on-Chip Systems 209
Siying Xu (Waseda University, Japan), Michael Conrad Meyer (Waseda University, Japan), Xin Jiang (Waseda University, Japan), and Takahiro Watanabe (Waseda University, Japan)

Fault Detection and Localization for Network-on-Chips in Mixed-Criticality Systems 217
Adele Maleki (University of Siegen), Hamidreza Ahmadian (University of Siegen), and Roman Obermaisser (University of Siegen)

An on-Communication Multiple-TSV Defects Detection and Localization for Real-Time 3D-ICs 223
Khanh N. Dang (Vietnam National University, Hanoi), Akram Ben Ahmed (Keio University), and Xuan-Tu Tran (Vietnam National University, Hanoi)

A Hotspot-Pattern-Aware Routing Algorithm for Networks-on-Chip 229
Yaoying Luo (Waseda University, Japan), Michael Conrad Meyer (Waseda University, Japan), Xin Jiang (Waseda University, Japan), and Takahiro Watanabe (Waseda University, Japan)

Session 6-B: Applications and Architectures designed for energy efficient hardware

Integrating Intra- and Intercellular Simulation of a 2D HL-1 Cardiac Model Based on Embedded GPUs 236
Baohua Liu (Shanghai University), Wenfeng Shen (Shanghai University), Xin Zhu (The University of Aizu), and Xingyu Wangchen (Shanghai University)

Exploiting Model-Level Parallelism in Recurrent Neural Network Accelerators 241
Lu Peng (LSU), Wentao Shi (LSU), Jian Zhang (LSU), and Samuel Irving (LSU)

Session 7-A: System Design

Towards an Efficient Hardware Architecture for Odd-Even Based Merge Sorter 249
Elsayed A. Elsayed (Tokyo Institute of Technology, Aswan University) and Kenji Kise (Tokyo Institute of Technology)

Energy and Performance Analysis of STTRAM Caches for Mobile Applications 257
Kyle Kuan (University of Arizona) and Tosiron Adegbija (University of Arizona)

Designing Application-Specific Heterogeneous Architectures from Performance Models 265
Thanh Cong (Univ Rennes, INRIA, CNRS, IRISA) and François Charot (Univ Rennes, INRIA, CNRS, IRISA)

Efficient Search-Space Encoding for System-Level Design Space Exploration of Embedded Systems 273
Valentina Richthammer (Ulm University) and Michael Glaß (Ulm University)

Session 7-B: Multicore/Manycore SoCs Programming

A Cloud Based Super-Optimization Method to Parallelize the Sequential Code's Nested Loops 281
Amin Majd (Åbo Akademi University, Finland), Mohammad Loni (Mälardalen University, Sweden), Golnaz Sahebi (University of Turku, Finland), Masoud Daneshtalab (Mälardalen University, Sweden), and Elena Troubitsyna (KTH Royal Institute of Technology, Sweden)

Real-Time Implementation of Time-Space Continuous Dynamic Programming for Air-Drawn Character Recognition Using GPUs 288
Aki Nakamura (The University of Aizu), Yuichi Okuyama (The University of Aizu), and Ryuichi Oka (The University of Aizu)

Graph Transformations and Derivation of Scheduling Constraints Applied to the Mapping of Real-Time Distributed Applications 295
Stephane Louise (CEA, LIST)

Session 8-A: Digital Circuit & FPGA-based Design - III

MITRACA: A Next-Gen Heterogeneous Architecture 304
Riadh Ben Abdelhamid (, Japan), Yoshiki Yamaguchi (University of Tsukuba, Japan), and Taisuke Boku (University of Tsukuba, Japan)

A Preliminary Evaluation of Building Block Computing Systems 312
Sayaka Terashima (Keio University), Takuya Kojima (Keio University), Hayate Okuhara (Keio University), Kazusa Musha (Keio University), Hideharu Amano (Keio University), Ryuichi Sakamoto (The University of Tokyo), Masaaki Kondo (The University of Tokyo), and Mitaro Namiki (Tokyo University of Agriculture and Technology)

Enhanced ID Authentication Scheme Using FPGA-Based Ring Oscillator PUF 320
Van-Toan Tran (Le Quy Don Technical University), Quang-Kien Trinh (Le Quy Don Technical University), and Van-Phuc Hoang (Le Quy Don Technical University)

A STDM (Static Time Division Multiplexing) Switch on a Multi-FPGA System 328
Keita Azegami (Keio University), Kazusa Musha (Keio University), Kazuei Hironaka (Keio University), Akram Ben Ahmed (Keio University), Michihiro Koibuchi (National Institute of Informatics), Yao Hu (National Institute of Informatics), and Hideharu Amano (Keio University)

Session 8-B: Scalable and Flexible Many-Core Mapping and Runtime Techniques

Data-Driven Scenario-Based Application Mapping for Heterogeneous Many-Core Systems 334
Jan Spieck (Friedrich-Alexander-Universität Erlangen-Nürnberg), Stefan Wildermann (Friedrich-Alexander-Universität Erlangen-Nürnberg), Tobias Schwarzer (Friedrich-Alexander-Universität Erlangen-Nürnberg), Jürgen Teich (Friedrich-Alexander-Universität Erlangen-Nürnberg), and Michael Glaß (Friedrich-Alexander-Universität Erlangen-Nürnberg)

Real-Time Attitude Estimation of Sigma-Point Kalman Filter via Matrix Operation Accelerator 342
Zeyang Dai (University of Aizu) and Lei Jing (University of Aizu)

Design-Time Memory Subsystem Optimization for Low-Power Multi-Core Embedded Systems 347
Manuel Strobel (University of Stuttgart) and Martin Radetzki (University of Stuttgart)

Session 9: Reliable and Real-time Multicore/Manycore SoCs

A Real-Time Fault-Tolerant and Power-Efficient Multicore System on Chip 354
Alexander Gruzlikov (Concern CSRI Elektropribor, JSC), Nikolai Kolesov (Concern CSRI Elektropribor, JSC), Dmitrii Kostygov (Concern CSRI Elektropribor, JSC), and Marina Tolmacheva (Concern CSRI Elektropribor, JSC)

Statistical Analysis for Shared Resources Effects with Multi-Core Real-Time Systems 362
Julien Durand (CPT), Youcef Bouchebaba (ONERA), and Luca Santinelli (ONERA)

Lightweight Semantics-Preserving Communication for Real-Time Automotive Software 372
Eugene Yip (University of Bamberg, Germany), Erjola Lalo (Vector Informatik GmbH, Germany), Gerald Lüttgen (University of Bamberg, Germany), and Andreas Sailer (Vector Informatik GmbH, Germany)

Author Index 381
