Police Services Policy Manual


ACKNOWLEDGEMENT OF RECEIPT OF, AND COMPLIANCE WITH, STANDARD OPERATING PROCEDURES MANUAL

I, the undersigned, acknowledge that I have received a copy of the Socorro Independent School District Police Department’s Standard Operating Procedures Manual. I understand that, as a condition of my employment, I am responsible for understanding and complying with the contents of this manual as well as with all other policies and procedures, laws, rules, and regulations issued by the Socorro Independent School District, the Police Department, and the State of Texas.

(EMPLOYEE) (DATE)

This signed form shall be retained in the employee’s Department personnel file. Copies will be maintained in the District’s personnel file and this manual.

Police Department Policy Manual
Socorro ISD
County-District No. 071-909
Development Date: August 2008

Effective the 2014-15 school year, in compliance with Policy CKE(LOCAL), the Chief of Police and the Superintendent shall review the manual annually and make any appropriate revisions. The Board shall be notified of any revisions to the manual.

Compliance
Superintendent/Chief of Police Review: 03/16/2015
Board Notified of Revisions: 03/24/2015

MISSION STATEMENT

It is the mission of the Socorro ISD Police Department to provide services with integrity and dedication, to preserve life, to enforce the law, and to work in partnership with the Socorro Independent School District, the City of El Paso, and other law enforcement agencies in order to enhance the quality of education and life in our community.

CORE VALUES

We respect life: We hold the preservation of life as our sacred duty; our value of human life sets our priorities.

We revere the truth: We accept nothing less than truth, honesty, and integrity in our profession.

We enforce the law: We recognize that our basic responsibility is to enforce the law. Our role is to resolve problems through the law, not to judge or punish.
We seek community partnership: We view the people of our school district as partners who deserve our concern, care, and attention. We are committed to reducing the fear of crime in our schools and community. We endeavor to do this by creating partnerships in our schools and neighborhoods.

We honor our police powers: We understand that our police powers are derived from the people we serve. We do not tolerate the abuse of our police authority.

We conduct ourselves with dignity: We recognize that our personal conduct, both on and off duty, is inseparable from the professional reputation of the officer, the police department, and the school district.

DEFINITIONS

CHAIN OF COMMAND: The unbroken line of communication and authority from the Superintendent to the lowest-ranking employee.

CIVILIAN: Any person other than a sworn officer in the Department.

COMMISSIONED OFFICER: A person who performs law enforcement duties in the field and possesses an approved State of Texas commission.

DEPARTMENT: The Socorro Independent School District Police Department.

DIRECTIVES: Written rules, regulations, orders, or procedures.

EMPLOYEE: All personnel of the Department unless otherwise indicated.

FIELD SUPERVISOR: A person in a command capacity who is responsible for the proper work performance of subordinates.

GENDER: Use of the masculine gender also includes, where applicable, the female gender.

INCOMPETENT: Incapable of satisfactorily performing assigned duties.

I.D.: Identification.

INSPECTION: Examination to ensure that appearance, uniforms, buildings, offices, equipment, duties, and operations conform to established standards.

INSUBORDINATION: Willful disobedience of any order lawfully issued.

JUVENILE: A person under 17 years of age.

LINE OF DUTY: Only those actions taken while in the lawful and proper discharge of duty.

LINE OFFICER: A person who performs enforcement/security duties in the field at the execution level; may be commissioned or non-commissioned.
MALFEASANCE: An act or instance of wrongdoing that is legally unjustified or contrary to law.

MAY: When used, the word “may” is permissive.

MISFEASANCE: The performance of a lawful action in an illegal or improper manner.

NEGLECT OF DUTY: A knowing failure to perform assigned duties or to respond to a situation requiring appropriate action.

NONFEASANCE: The omission of an act that should have been done.

OFF DUTY: That period of time when not scheduled for duty.

OFFICER: Includes line, supervisory, and command personnel unless otherwise indicated.

ON DUTY: That period of time when scheduled to perform official duties, or when required to act in an official capacity while off duty.

ORDER: An instruction or directive, written or oral, issued by a supervisor to a subordinate or group of subordinates in the course of duty.

PERSONNEL: All employees of the Department unless otherwise indicated.

POLICY: A general oral or written statement that guides the Department and its employees in the direction of Department goals and objectives.

PROBATIONARY EMPLOYEE: A full-time employee who has not yet completed the trial phase of employment with this Department.

PROCEDURE: An officially approved method of handling, responding to, or dealing with any given situation. May be enacted verbally, in writing, or by approved custom.

RELIEVED OF DUTY: Not permitted to perform any official duty.

SHALL: When used, the word “shall” is mandatory.

SHIFT SUPERVISOR: A person in a command capacity who is responsible for the proper work performance of subordinates.

SHOULD: A highly recommended course of action.

SUSPENSION: An action placing an employee in a non-duty status for a period of time.
TERMINATION: The resignation, retirement, dismissal, or death of an employee.

CHAPTER ONE
GENERAL PROVISIONS

1.00 ESTABLISHMENT OF STANDARD GENERAL MANUAL
The General Manual for the Socorro Independent School District (SISD) Police Department is hereby established and is hereafter referred to as “The General Manual” or the “Manual.” This manual is adopted pursuant to, and subject to, SISD Board Policies CKE (Legal) and (Local). It is also subject to applicable state and federal laws and to other applicable Board policies of the Socorro Independent School District.

1.01 CONTENT OF GENERAL MANUAL
The General Manual describes the organizational structure of the Department, projects the police purpose, and sets forth policies and procedures in keeping with that purpose.

1.02 AUTHORITY OF MANUAL
The General Manual is an official instrument of the SISD Police Department. All commissioned and non-commissioned employees are required to comply with the provisions of the General Manual unless, by their nature, the provisions are inapplicable to the person’s position or assigned duties. All existing orders and regulations that are in conflict with the General Manual are hereby revoked. All, or portions of, existing orders and regulations which have not been included herein, and which do not conflict with the provisions of the General Manual, remain in effect.

1.03 VARIATION OF TERMS AND GENDER
The definition of any term used in this manual applies to each grammatical variation and gender of that term.

1.04 NUMBERING SYSTEM
A modified numerical decimal system is used to number each chapter, section, and subsection of the manual to provide quick reference to all or selected materials.

1.04.001 PROCEDURE
The numbering system designates the chapter and section, followed by the procedure number in the accepted outline form; for example, 5.08.003 is Chapter 5, Section 8, Procedure 3. Exceptions are noted as such.
1.05 DISTRIBUTION OF MANUAL
The SISD Police Department distributes the General Manual to all commissioned and non-commissioned officers and personnel, and a record of receipt by signature is maintained.

1.05.001 CARE AND MAINTENANCE OF THE MANUAL
Each employee is responsible for the care and updating of the Manual. The Office of the Chief of Police will furnish revisions and additions and will maintain distribution records. Supervisors will periodically inspect the manual of each subordinate to ensure completeness.

1.05.002 CONTENTS OF MANUAL CONFIDENTIAL
All employees are reminded that copies of the General Manual are solely concerned with the policies and procedures of the SISD Police Department. The General Manual is not for distribution to, or use by, any non-department employee without the express written authority of the Chief of Police.

1.05.003 DISPOSITION OF MANUAL WHEN EMPLOYEE LEAVES DEPARTMENT
Upon transfer, termination, resignation, retirement, etc., from the Department, it is the employee’s responsibility to return the Manual to the Office of the Chief of Police.

1.05.004 LOST OR STOLEN MANUAL
In cases of lost or stolen Manuals, appropriate documentation in the form of an Incident Report must be prepared. If the loss of the Manual is found to be due to employee negligence, the employee will purchase a Manual from the school district. The cost of the Manual is $25.00, and the Office of the Chief of Police will retain records.

1.06 OPERATIONS MANUALS
“Operations Manuals” may be developed for the implementation of the policies and procedures set forth in the General Manual. Operations Manuals may be developed at the direction of the Chief of Police for each division or section in the Department. These manuals are extensions of the General Manual and must not conflict with it.

1.06.001 COORDINATION AND CONTROL OF OPERATIONS MANUALS
Operations Manuals may be developed at the direction of the Chief of Police.
Prior to approval by the Chief of Police, the manuals will be checked for conflict with the General Manual. The Office of the Chief of Police will handle printing and distribution of the Operations Manuals.

1.07 SUGGESTIONS FOR CHANGES AND CHANGES IN THE GENERAL MANUAL
Department employees may submit suggestions for improvement of the provisions of the General Manual in the form of an inter-office memorandum, via the chain of command. These communications are forwarded to the Chief of Police for consideration. If approved, a notification of change will be issued by the Office of the Chief of Police to incorporate the change or revision into the field Manuals.