A Bdi Agent Based Framework for Modeling and Simulation of Cyber Physical Systems

Total Page:16

File Type:pdf, Size:1020Kb

A Bdi Agent Based Framework for Modeling and Simulation of Cyber Physical Systems A BDI AGENT BASED FRAMEWORK FOR MODELING AND SIMULATION OF CYBER PHYSICAL SYSTEMS A Thesis Submitted to Temple University Graduate Board In Partial Fulfillment Of the Requirements for the Degree MASTER OF SCIENCE IN ELECTRICAL ENGINEERING By Qiangguo Ren August, 2011 Thesis Approval(s): __________________________________________ __________________ Dr. Li Bai, ECE (Thesis Advisor) Date __________________________________________ __________________ Dr. Saroj Biswas, ECE (Committee Member) Date __________________________________________ __________________ Dr. Frank Ferrese, ECE (Committee Member) Date ABSTRACT Cyber-physical systems refer to a new generation of synergy systems with integrated computational and physical processes which interact with one other. The development and simulation of cyber-physical systems (CPSs) are obstructed by the complexity of the subsystems of which they are comprised, fundamental differences in the operation of cyber and physical elements, significant correlative dependencies among the elements, and operation in dynamic and open environments. The Multiple Belief-Desire-Intention (BDI) agent system (BDI multi-agent system) is a promising choice for overcoming these challenges, since it offers a natural way to decompose complex systems or large scale problems into decentralized, autonomous, interacting, more or less intelligent entities. In particular, BDI agents have the ability to interact with, and expand the capabilities of, the physical world through computation, communication, and control. A BDI agent has its philosophical grounds on intentionality and practical reasoning, and it is natural to combine a philosophical model of human practical reasoning with the physical operation and any cyber infrastructure. In this thesis, we introduce the BDI Model, discuss implementations of BDI agents from an ideal theoretical perspective as well as from a more practical perspective, and show how they can be used to bridge the cyber infrastructure and the physical operation using the framework. We then strengthen the framework’s performance using the state-of-the-art parallel computing architecture and eventually propose a BDI agent based software framework to enable the efficient modeling and simulation of heterogeneous CPS systems in an integrated manner. ii TABLE OF CONTENTS ABSTRACT ................................................................................................................................................. II LIST OF TABLES ........................................................................................................................................ V LIST OF FIGURES .................................................................................................................................... VI CHAPTER 1 INTRODUCTION ................................................................................................................................... 1 Cyber-Physical System .................................................................................................................... 2 BDI Multi-Agent System ................................................................................................................. 4 GPGPU............................................................................................................................................. 8 Proposed Research ........................................................................................................................... 9 Organization of the Thesis ............................................................................................................. 10 2 BDI MULTI-AGENT SYSTEM .......................................................................................................... 11 Belief-Desire-Intention Model ....................................................................................................... 12 BDI-Agent ...................................................................................................................................... 16 Beliefs ................................................................................................................................... 16 Desires .................................................................................................................................. 17 Intentions .............................................................................................................................. 17 BDI Multi-Agent System ............................................................................................................... 19 Fundamental Platform - JADEX........................................................................................... 21 BDI in JADEX ...................................................................................................................... 21 Reasoning Mechanism in JADEX ........................................................................................ 26 Standard Compliance in JADEX .......................................................................................... 29 Implementation in JADEX ................................................................................................... 30 Summary ............................................................................................................................... 35 3 GPU COMPUTING .............................................................................................................................. 37 Compute Unified Device Architecture ........................................................................................... 38 Parallel Computing ............................................................................................................... 39 CPUs ..................................................................................................................................... 39 GPU Computing ................................................................................................................... 41 CUDA ................................................................................................................................... 44 Parallel Computing with CUDA .................................................................................................... 46 Parallel Programming Model ................................................................................................ 51 Kernels .................................................................................................................................. 53 Thread Model ....................................................................................................................... 54 Memory Model ..................................................................................................................... 56 Hardware Distinctions .......................................................................................................... 60 Programming on a CUDA-Enabled Device .......................................................................... 61 Programming Interface ......................................................................................................... 64 Typical Applications of CUDA ...................................................................................................... 67 Medical Imaging ................................................................................................................... 67 Computational Fluid Dynamics ............................................................................................ 69 Environmental Science ......................................................................................................... 71 Summary ............................................................................................................................... 73 iii 4 PROPOSED RESEARCH .................................................................................................................... 76 BDI Agent Based Modeling ........................................................................................................... 82 Technology ........................................................................................................................... 83 Methodology ......................................................................................................................... 85 Implementation ..................................................................................................................... 87 Summary ............................................................................................................................... 90 BDI Reasoning ............................................................................................................................... 90 Constitution .......................................................................................................................... 91 Mechanism ........................................................................................................................... 92 Implementation ..................................................................................................................... 94 Summary ............................................................................................................................... 97 CUDA Capabilities .......................................................................................................................
Recommended publications
  • Parallel Rendering on Hybrid Multi-GPU Clusters
    Eurographics Symposium on Parallel Graphics and Visualization (2012) H. Childs and T. Kuhlen (Editors) Parallel Rendering on Hybrid Multi-GPU Clusters S. Eilemann†1, A. Bilgili1, M. Abdellah1, J. Hernando2, M. Makhinya3, R. Pajarola3, F. Schürmann1 1Blue Brain Project, EPFL; 2CeSViMa, UPM; 3Visualization and MultiMedia Lab, University of Zürich Abstract Achieving efficient scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Framerates of up to 60hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the 16ms time budget available for each rendered frame. Furthermore, modern commodity hardware embraces more and more a NUMA architecture, where multiple processor sockets each have their locally attached memory and where auxiliary devices such as GPUs and network interfaces are directly attached to one of the processors. Such so called fat NUMA processing and graphics nodes are increasingly used to build cost-effective hybrid shared/distributed memory visualization clusters. In this paper we present a thorough analysis of the asynchronous parallelization of the rendering stages and we derive and implement important optimizations to achieve highly interactive framerates on such hybrid multi-GPU clusters. We use both a benchmark program and a real-world scientific application used to visualize, navigate and interact with simulations of cortical neuron circuit models. Categories and Subject Descriptors (according to ACM CCS): I.3.2 [Computer Graphics]: Graphics Systems— Distributed Graphics; I.3.m [Computer Graphics]: Miscellaneous—Parallel Rendering 1. Introduction Hybrid GPU clusters are often build from so called fat render nodes using a NUMA architecture with multiple The decomposition of parallel rendering systems across mul- CPUs and GPUs per machine.
    [Show full text]
  • Fast Calculation of the Lomb-Scargle Periodogram Using Graphics Processing Units 3
    Draft version August 7, 2018 A Preprint typeset using LTEX style emulateapj v. 11/10/09 FAST CALCULATION OF THE LOMB-SCARGLE PERIODOGRAM USING GRAPHICS PROCESSING UNITS R. H. D. Townsend Department of Astronomy, University of Wisconsin-Madison, Sterling Hall, 475 N. Charter Street, Madison, WI 53706, USA; [email protected] Draft version August 7, 2018 ABSTRACT I introduce a new code for fast calculation of the Lomb-Scargle periodogram, that leverages the computing power of graphics processing units (GPUs). After establishing a background to the newly emergent field of GPU computing, I discuss the code design and narrate key parts of its source. Benchmarking calculations indicate no significant differences in accuracy compared to an equivalent CPU-based code. However, the differences in performance are pronounced; running on a low-end GPU, the code can match 8 CPU cores, and on a high-end GPU it is faster by a factor approaching thirty. Applications of the code include analysis of long photometric time series obtained by ongoing satellite missions and upcoming ground-based monitoring facilities; and Monte-Carlo simulation of periodogram statistical properties. Subject headings: methods: data analysis — methods: numerical — techniques: photometric — stars: oscillations 1. INTRODUCTION of the newly emergent field of GPU computing; then, Astronomical time-series observations are often charac- Section 3 reviews the formalism defining the L-S peri- terized by uneven temporal sampling (e.g., due to trans- odogram, and Section 4 presents a GPU-based code im- formation to the heliocentric frame) and/or non-uniform plementing this formalism. Benchmarking calculations coverage (e.g., from day/night cycles, or radiation belt to evaluate the accuracy and performance of the code passages).
    [Show full text]
  • Parallel Texture-Based Vector Field Visualization on Curved Surfaces Using GPU Cluster Computers
    Eurographics Symposium on Parallel Graphics and Visualization (2006) Alan Heirich, Bruno Raffin, and Luis Paulo dos Santos (Editors) Parallel Texture-Based Vector Field Visualization on Curved Surfaces Using GPU Cluster Computers S. Bachthaler1, M. Strengert1, D. Weiskopf2, and T. Ertl1 1Institute of Visualization and Interactive Systems, University of Stuttgart, Germany 2School of Computing Science, Simon Fraser University, Canada Abstract We adopt a technique for texture-based visualization of flow fields on curved surfaces for parallel computation on a GPU cluster. The underlying LIC method relies on image-space calculations and allows the user to visualize a full 3D vector field on arbitrary and changing hypersurfaces. By using parallelization, both the visualization speed and the maximum data set size are scaled with the number of cluster nodes. A sort-first strategy with image-space decomposition is employed to distribute the workload for the LIC computation, while a sort-last approach with an object-space partitioning of the vector field is used to increase the total amount of available GPU memory. We specifically address issues for parallel GPU-based vector field visualization, such as reduced locality of memory accesses caused by particle tracing, dynamic load balancing for changing camera parameters, and the combination of image-space and object-space decomposition in a hybrid approach. Performance measurements document the behavior of our implementation on a GPU cluster with AMD Opteron CPUs, NVIDIA GeForce 6800 Ultra GPUs, and Infiniband network connection. Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Viewing algorithms I.3.3 [Three-Dimensional Graphics and Realism]: Color, shading, shadowing, and texture 1.
    [Show full text]
  • Evolution of the Graphical Processing Unit
    University of Nevada Reno Evolution of the Graphical Processing Unit A professional paper submitted in partial fulfillment of the requirements for the degree of Master of Science with a major in Computer Science by Thomas Scott Crow Dr. Frederick C. Harris, Jr., Advisor December 2004 Dedication To my wife Windee, thank you for all of your patience, intelligence and love. i Acknowledgements I would like to thank my advisor Dr. Harris for his patience and the help he has provided me. The field of Computer Science needs more individuals like Dr. Harris. I would like to thank Dr. Mensing for unknowingly giving me an excellent model of what a Man can be and for his confidence in my work. I am very grateful to Dr. Egbert and Dr. Mensing for agreeing to be committee members and for their valuable time. Thank you jeffs. ii Abstract In this paper we discuss some major contributions to the field of computer graphics that have led to the implementation of the modern graphical processing unit. We also compare the performance of matrix‐matrix multiplication on the GPU to the same computation on the CPU. Although the CPU performs better in this comparison, modern GPUs have a lot of potential since their rate of growth far exceeds that of the CPU. The history of the rate of growth of the GPU shows that the transistor count doubles every 6 months where that of the CPU is only every 18 months. There is currently much research going on regarding general purpose computing on GPUs and although there has been moderate success, there are several issues that keep the commodity GPU from expanding out from pure graphics computing with limited cache bandwidth being one.
    [Show full text]
  • Literature Review and Implementation Overview: High Performance
    LITERATURE REVIEW AND IMPLEMENTATION OVERVIEW: HIGH PERFORMANCE COMPUTING WITH GRAPHICS PROCESSING UNITS FOR CLASSROOM AND RESEARCH USE APREPRINT Nathan George Department of Data Sciences Regis University Denver, CO 80221 [email protected] May 18, 2020 ABSTRACT In this report, I discuss the history and current state of GPU HPC systems. Although high-power GPUs have only existed a short time, they have found rapid adoption in deep learning applications. I also discuss an implementation of a commodity-hardware NVIDIA GPU HPC cluster for deep learning research and academic teaching use. Keywords GPU · Neural Networks · Deep Learning · HPC 1 Introduction, Background, and GPU HPC History High performance computing (HPC) is typically characterized by large amounts of memory and processing power. HPC, sometimes also called supercomputing, has been around since the 1960s with the introduction of the CDC STAR-100, and continues to push the limits of computing power and capabilities for large-scale problems [1, 2]. However, use of graphics processing unit (GPU) in HPC supercomputers has only started in the mid to late 2000s [3, 4]. Although graphics processing chips have been around since the 1970s, GPUs were not widely used for computations until the 2000s. During the early 2000s, GPU clusters began to appear for HPC applications. Most of these clusters were designed to run large calculations requiring vast computing power, and many clusters are still designed for that purpose [5]. GPUs have been increasingly used for computations due to their commodification, following Moore’s Law (demonstrated arXiv:2005.07598v1 [cs.DC] 13 May 2020 in Figure 1), and usage in specific applications like neural networks.
    [Show full text]
  • High-Performance and Energy-Efficient Irregular Graph Processing on GPU Architectures
    High-Performance and Energy-Efficient Irregular Graph Processing on GPU architectures Albert Segura Salvador Doctor of Philosophy Department of Computer Architecture Universitat Politècnica de Catalunya Advisors: Jose-Maria Arnau Antonio González July, 2020 2 Abstract Graph processing is an established and prominent domain that is the foundation of new emerging applications in areas such as Data Analytics, Big Data and Machine Learning. Appli- cations such as road navigational systems, recommendation systems, social networks, Automatic Speech Recognition (ASR) and many others are illustrative cases of graph-based datasets and workloads. Demand for higher processing of large graph-based workloads is expected to rise due to nowadays trends towards increased data generation and gathering, higher inter-connectivity and inter-linkage, and in general a further knowledge-based society. An increased demand that poses challenges to current and future graph processing architectures. To effectively perform graph processing, the large amount of data employed in these domains requires high throughput architectures such as GPGPU. Although the processing of large graph-based workloads exhibits a high degree of parallelism, the memory access patterns tend to be highly irregular, leading to poor GPGPU efficiency due to memory divergence. Graph datasets are sparse, highly unpredictable and unstructured which causes the irregular access patterns and low computation per data ratio, further lowering GPU utilization. The purpose of this thesis is to characterize the bottlenecks and limitations of irregular graph processing on GPGPU architectures in order to propose architectural improvements and extensions that deliver improved performance, energy efficiency and overall increased GPGPU efficiency and utilization. In order to ameliorate these issues, GPGPU graph applications perform stream compaction operations which process the subset of active nodes/edges so subsequent steps work on compacted dataset.
    [Show full text]
  • GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications
    University of Central Florida STARS HIM 1990-2015 2014 GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications Adam Phillips University of Central Florida Part of the Mathematics Commons Find similar works at: https://stars.library.ucf.edu/honorstheses1990-2015 University of Central Florida Libraries http://library.ucf.edu This Open Access is brought to you for free and open access by STARS. It has been accepted for inclusion in HIM 1990-2015 by an authorized administrator of STARS. For more information, please contact [email protected]. Recommended Citation Phillips, Adam, "GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications" (2014). HIM 1990-2015. 1613. https://stars.library.ucf.edu/honorstheses1990-2015/1613 GPU ACCELERATED APPROACH TO NUMERICAL LINEAR ALGEBRA AND MATRIX ANALYSIS WITH CFD APPLICATIONS by ADAM D. PHILLIPS A thesis submitted in partial fulfilment of the requirements for the Honors in the Major Program in Mathematics in the College of Sciences and in The Burnett Honors College at the University of Central Florida Orlando, Florida Spring Term 2014 Thesis Chair: Dr. Bhimsen Shivamoggi c 2014 Adam D. Phillips All Rights Reserved ii ABSTRACT A GPU accelerated approach to numerical linear algebra and matrix analysis with CFD applica- tions is presented. The works objectives are to (1) develop stable and efficient algorithms utilizing multiple NVIDIA GPUs with CUDA to accelerate common matrix computations, (2) optimize these algorithms through CPU/GPU memory allocation, GPU kernel development, CPU/GPU communication, data transfer and bandwidth control to (3) develop parallel CFD applications for Navier Stokes and Lattice Boltzmann analysis methods.
    [Show full text]
  • Introduction to Gpus for HPC
    CSC 391/691: GPU Programming Fall 2011 Introduction to GPUs for HPC Copyright © 2011 Samuel S. Cho High Performance Computing • Speed: Many problems that are interesting to scientists and engineers would take a long time to execute on a PC or laptop: months, years, “never”. • Size: Many problems that are interesting to scientists and engineers can’t fit on a PC or laptop with a few GB of RAM or a few 100s of GB of disk space. • Supercomputers or clusters of computers can make these problems practically numerically solvable. Scientific and Engineering Problems • Simulations of physical phenomena such as: • Weather forecasting • Earthquake forecasting • Galaxy formation • Oil reservoir management • Molecular dynamics • Data Mining: Finding needles of critical information in a haystack of data such as: • Bioinformatics • Signal processing • Detecting storms that might turn into hurricanes • Visualization: turning a vast sea of data into pictures that scientists can understand. • At its most basic level, all of these problems involve many, many floating point operations. Hardware Accelerators • In HPC, an accelerator is hardware component whose role is to speed up some aspect of the computing workload. • In the olden days (1980s), Not an accelerator* supercomputers sometimes had array processors, which did vector operations on arrays • PCs sometimes had floating point accelerators: little chips that did the floating point calculations in hardware rather than software. Not an accelerator* *Okay, I lied. To Accelerate Or Not To Accelerate • Pro: • They make your code run faster. • Cons: • They’re expensive. • They’re hard to program. • Your code may not be cross-platform. Why GPU for HPC? • Graphics Processing Units (GPUs) were originally designed to accelerate graphics tasks like image rendering.
    [Show full text]
  • Background on GPGPU Programming
    Background on GPGPU Programming • Development of powerful 3D graphics chips – Consumer demand was accelerated by immersive first person games such as Doom and Quake – Microsoft DirectX 8.0 standard released in 2001 required hardware to handle both programmable vertex and programmable pixel shading standards – This meant developers had some control over computations on a GPU – The NVIDIA GeForce 3 series were examples of such chips Early Attempts at GPGPU Programming • Calculated a color value for every pixel on screen – Used (x,y) position, input colors, textures, etc. – Input color and textures were controlled by programmer – Any data value could be used as input and, after some computation, the final data value could be used for any purpose, not just the final pixel color • Restrictions that would prevent widespread use – Only a handful of input color and texture values could be used; there were restrictions on where results could be read from and written to memory – There was no reasonable way to debug code – Programmer would need to learn OpenGL or DirectX NVDIA CUDA Architecture • Introduced in November 2006 – First used on GeForce 8800 GTX – Introduced many features to facilitate GP programming • Some Features – Introduced a unified shader pipeline that allowed every ALU to be used for GP programming – Complied with IEEE single-precision floating point – Allowed arbitrary read and write access to memory via shared memory – Developed a free compiler, CUDA C, based on the widely used C language Selected Applications - 1 • Medical
    [Show full text]
  • NVIDIA Index Accelerated Computing for Visualizing Cholla's Galactic
    NVIDIA IndeX Accelerated Computing for Visualizing Cholla’s Galactic Winds Evan Schneider Brant Robertson Alexander Kuhn Christopher Lux Marc Nienhaus Princeton University University of California NVIDIA NVIDIA NVIDIA [email protected] Santa Cruz [email protected] [email protected] [email protected] [email protected] Abstract Galactic winds – outflows of gas driven out of galaxies by the combined effects of thousands of supernovae – are a crucial feature of galaxy evolution. By removing gas from galaxies, they regulate future star formation, and distribute the dust and heavy elements formed in stars throughout the Universe. Despite their importance, a complete theoretical picture of these winds has been elusive. Simulating the complicated interaction between the hot, high pressure gas created by supernovae and the cooler, high density gas in the galaxy disk requires massive computational resources and highly sophisticated software. In addition, galactic wind simulations generate terabytes of output posing additional challenges regarding the effective analysis of the simulated physical processes. In order to address those challenges, we present NVIDIA IndeX as a scalable framework to visualize the simulation output. The framework features a streaming- based architecture to interactively explore simulation results in distributed multi-GPU environments. We demonstrate how to customize specialized sampling programs for volume and surface rendering to cover specific analysis questions of galactic wind simulations. This provides an extensive level of control over the visualization while efficiently using available resources to achieve high levels of performance and visual accuracy. Galactic Winds. When a galaxy experiences a period of rapid star formation, the combined effect of thousands of supernovae exploding within the disk drives gas to flow out of the galaxy and into its surroundings, as seen in the simulation rendered here.
    [Show full text]
  • Time Complexity Parallel Local Binary Pattern Feature Extractor on a Graphical Processing Unit
    ICIC Express Letters ICIC International ⃝c 2019 ISSN 1881-803X Volume 13, Number 9, September 2019 pp. 867{874 θ(1) TIME COMPLEXITY PARALLEL LOCAL BINARY PATTERN FEATURE EXTRACTOR ON A GRAPHICAL PROCESSING UNIT Ashwath Rao Badanidiyoor and Gopalakrishna Kini Naravi Department of Computer Science and Engineering Manipal Institute of Technology Manipal Academy of Higher Education Manipal, Karnataka 576104, India f ashwath.rao; ng.kini [email protected] Received February 2019; accepted May 2019 Abstract. Local Binary Pattern (LBP) feature is used widely as a texture feature in object recognition, face recognition, real-time recognitions, etc. in an image. LBP feature extraction time is crucial in many real-time applications. To accelerate feature extrac- tion time, in many previous works researchers have used CPU-GPU combination. In this work, we provide a θ(1) time complexity implementation for determining the Local Binary Pattern (LBP) features of an image. This is possible by employing the full capa- bility of a GPU. The implementation is tested on LISS medical images. Keywords: Local binary pattern, Medical image processing, Parallel algorithms, Graph- ical processing unit, CUDA 1. Introduction. Local binary pattern is a visual descriptor proposed in 1990 by Wang and He [1, 2]. Local binary pattern provides a distribution of intensity around a center pixel. It is a non-parametric visual descriptor helpful in describing a distribution around a center value. Since digital images are distributions of intensity, it is helpful in describing an image. The pattern is widely used in texture analysis, object recognition, and image description. The local binary pattern is used widely in real-time description, and analysis of objects in images and videos, owing to its computational efficiency and computational simplicity.
    [Show full text]
  • Locality-Centric Data and Threadblock Management for Massive Gpus
    2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) Locality-Centric Data and Threadblock Management for Massive GPUs Mahmoud Khairy Vadim Nikiforov David Nellans Timothy G. Rogers Purdue University Purdue University NVIDIA Purdue University [email protected] [email protected] [email protected] [email protected] Abstract—Recent work has shown that building GPUs with Single Logical GPU Controller chip hundreds of SMs in a single monolithic chip will not be practical (Kernel/TB Scheduler, ..) due to slowing growth in transistor density, low chip yields, and GPU .….. GPU GPU GPU ..... HBM photoreticle limitations. To maintain performance scalability, HBM Chiplet Chiplet proposals exist to aggregate discrete GPUs into a larger virtual GPU and decompose a single GPU into multiple-chip-modules Switch with increased aggregate die area. These approaches introduce Inter-Chiplet Connection Network non-uniform memory access (NUMA) effects and lead to .….. GPU GPU decreased performance and energy-efficiency if not managed HBM GPU ..... HBM GPU Chiplet appropriately. To overcome these effects, we propose a holistic Chiplet Locality-Aware Data Management (LADM) system designed to Silicon Interposer or Package Substrate operate on massive logical GPUs composed of multiple discrete Fig. 1: Future massive logical GPU containing multiple dis- devices, which are themselves composed of chiplets. LADM has three key components: a threadblock-centric index analysis, a crete GPUs, which are themselves composed of chiplets in a runtime system that performs data placement and threadblock hierarchical interconnect. scheduling, and an adaptive cache insertion policy. The runtime combines information from the static analysis with topology 1 information to proactively optimize data placement, threadblock systems, chiplet based architectures are desirable because scheduling, and remote data caching, minimizing off-chip traffic.
    [Show full text]