Best Practice Guide Intel Xeon Phi V2.0

Total Pages: 16

File Type: PDF, Size: 1020 KB

Emanouil Atanassov, IICT-BAS, Bulgaria; Michaela Barth, KTH, Sweden; Mikko Byckling, CSC, Finland; Vali Codreanu, SURFsara, Netherlands; Nevena Ilieva, NCSA, Bulgaria; Tomas Karasek, IT4Innovations, Czech Republic; Jorge Rodriguez, BSC, Spain; Sami Saarinen, CSC, Finland; Ole Widar Saastad, University of Oslo, Norway; Michael Schliephake, KTH, Sweden; Martin Stachon, IT4Innovations, Czech Republic; Janko Strassburg, BSC, Spain; Volker Weinberg (Editor), LRZ, Germany

31-01-2017

Table of Contents

1. Introduction
2. Intel MIC architecture & system overview
   2.1. The Intel MIC architecture
      2.1.1. Intel Xeon Phi coprocessor architecture overview
      2.1.2. The cache hierarchy
   2.2. Network configuration & system access
3. Programming Models
4. Native compilation
5. Offloading
   5.1. Introduction
   5.2. Simple C example
   5.3. Simple Fortran-90 example
      5.3.1. User function offload
      5.3.2. Offloading regions in program
      5.3.3. Offloading functions or subroutines
      5.3.4. Load distribution and balancing, host CPUs, coprocessors (single and multiple)
   5.4. Obtaining information about the offloading
   5.5. Syntax of pragmas
   5.6. Recommendations
      5.6.1. Explicit worksharing
      5.6.2. Persistent data on the coprocessor
      5.6.3. Optimizing offloaded code
   5.7. Intel Cilk Plus parallel extensions
6. OpenMP, hybrid and OpenMP 4.x Offloading
   6.1. OpenMP
      6.1.1. OpenMP basics
      6.1.2. Threading and affinity
      6.1.3. Loop scheduling
      6.1.4. Scalability improvement
   6.2. Hybrid OpenMP/MPI
      6.2.1. Programming models
      6.2.2. Threading of the MPI ranks
   6.3. OpenMP 4.x Offloading
      6.3.1. Execution Model
      6.3.2. Overview of the most important device constructs
      6.3.3. The target construct
      6.3.4. The teams construct
      6.3.5. The distribute construct
      6.3.6. Composite constructs and shortcuts in OpenMP 4.5
      6.3.7. Examples
      6.3.8. Runtime routines and environment variables
      6.3.9. Best Practices
7. MPI
   7.1. Setting up the MPI environment
   7.2. MPI programming models
      7.2.1. Coprocessor-only model
      7.2.2. Symmetric model
      7.2.3. Host-only model
   7.3. Simplifying launching of MPI jobs
8. Intel MKL (Math Kernel Library)
   8.1. Introduction
   8.2. MKL usage modes
      8.2.1. Automatic Offload (AO)
      8.2.2. Compiler Assisted Offload (CAO)
      8.2.3. Native Execution
   8.3. Example code
   8.4. Intel Math Kernel Library Link Line Advisor
9. TBB: Intel Threading Building Blocks
   9.1. Advantages of TBB
   9.2. Using TBB natively
   9.3. Offloading TBB
   9.4. Examples
      9.4.1. Exposing parallelism to the processor
      9.4.2. Vectorization and Cache-Locality
      9.4.3. Work-stealing versus Work-sharing
10. IPP: The Intel Integrated Performance Primitives
   10.1. Overview of IPP
   10.2. Using IPP
      10.2.1. Getting Started
      10.2.2. Linking of Applications
   10.3. Multithreading
11. Further programming models
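
As a flavor of the offloading chapters above (Sections 5 and 6.3), here is a minimal sketch of the OpenMP 4.x device constructs (target, teams, distribute) applied to a vector update. This illustration is written for this page, not excerpted from the guide:

    #include <stdio.h>

    #define N 1024

    int main(void)
    {
        float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 2.0f * i; }

        /* Offload the loop to the default device (e.g. a Xeon Phi
         * coprocessor), mapping x in and y both ways. */
        #pragma omp target teams distribute parallel for \
                map(to: x[0:N]) map(tofrom: y[0:N])
        for (int i = 0; i < N; i++)
            y[i] += 3.0f * x[i];

        printf("y[1] = %f\n", y[1]);   /* expect 2 + 3 = 5 */
        return 0;
    }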
Recommended publications
  • Microbenchmarks in Big Data
    Nicolas Poggi, Databricks Inc., Amsterdam, NL; BarcelonaTech (UPC), Barcelona, Spain
    Synonyms: Component benchmark; Functional benchmark; Test
    Definition: A microbenchmark is either a program or routine to measure and test the performance of a single component or task. Microbenchmarks are used to measure simple and well-defined quantities such as elapsed time, rate of operations, bandwidth, or latency. Typically, microbenchmarks were associated with the testing of individual software subroutines or lower-level hardware components such as the CPU, and for a short period of time. However, in the BigData scope, the term microbenchmarking is broadened to include the cluster – group of networked computers – acting as a single system, as well as the testing of frameworks, algorithms, logical and distributed components, for a longer period and larger data sizes.
    Overview: Microbenchmarks constitute the first line of performance testing. Through them, we can ensure the proper and timely functioning of the different individual components that make up our system. The term micro, of course, depends on the problem size. In BigData we broaden the concept to cover the testing of large-scale distributed systems and computing frameworks. This chapter presents the historical background, evolution, central ideas, and current key applications of the field concerning BigData.
    Historical Background: Origins and CPU-Oriented Benchmarks. Microbenchmarks are closer to both hardware and software testing than to competitive benchmarking, as opposed to application-level – macro – benchmarking. For this reason, we can trace microbenchmarking influence to the hardware testing discipline as can be found in Sumne (1974). Furthermore, we can also find influence in the origins of software testing methodology during the 1970s, including works such as Chow (1978). One of the first examples of a microbenchmark clearly distinguishable from software testing is the Whetstone benchmark, developed during the late 1960s and published
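    In the spirit of this definition, a minimal single-component microbenchmark times one well-defined task and reports elapsed time and operation rate. The sketch below is ours for illustration (array size and repetition count are arbitrary choices), measuring a simple summation loop:

      /* Sketch only: times one well-defined task (summing an array) and
       * reports elapsed time and operation rate. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>

      #define N (1 << 20)   /* elements */
      #define REPS 100      /* repetitions, to get a measurable interval */

      int main(void)
      {
          double *a = malloc(N * sizeof *a);
          if (!a) return 1;
          for (size_t i = 0; i < N; i++) a[i] = 1.0;

          volatile double sink = 0.0;  /* defeat dead-code elimination */
          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (int r = 0; r < REPS; r++) {
              double s = 0.0;
              for (size_t i = 0; i < N; i++) s += a[i];
              sink += s;
          }
          clock_gettime(CLOCK_MONOTONIC, &t1);

          double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
          printf("elapsed %.3f s, %.1f M additions/s (checksum %.0f)\n",
                 secs, (double)N * REPS / secs / 1e6, sink);
          free(a);
          return 0;
      }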
  • Overview of the SPEC Benchmarks
    Kaivalya M. Dixit, IBM Corporation
    "The reputation of current benchmarketing claims regarding system performance is on par with the promises made by politicians during elections." Standard Performance Evaluation Corporation (SPEC) was founded in October 1988 by Apollo, Hewlett-Packard, MIPS Computer Systems and Sun Microsystems in cooperation with E. E. Times. SPEC is a nonprofit consortium of 22 major computer vendors whose common goals are "to provide the industry with a realistic yardstick to measure the performance of advanced computer systems" and to educate consumers about the performance of vendors' products. SPEC creates, maintains, distributes, and endorses a standardized set of application-oriented programs to be used as benchmarks.
    Historical Perspective: Traditional benchmarks have failed to characterize the system performance of modern computer systems. Some of those benchmarks measure component-level performance, and some of the measurements are routinely published as system performance. Historically, vendors have characterized the performance of their systems in a variety of confusing metrics. In part, the confusion is due to a lack of credible performance information, agreement, and leadership among competing vendors. Many vendors characterize system performance in millions of instructions per second (MIPS) and millions of floating-point operations per second (MFLOPS). All instructions, however, are not equal. Since CISC machine instructions usually accomplish a lot more than those of RISC machines, comparing the instructions of a CISC machine and a RISC machine is similar to comparing Latin and Greek.
    Simple CPU Benchmarks: Truth in benchmarking is an oxymoron because vendors use benchmarks for marketing purposes.
  • Introduction to Intel Xeon Phi Programming Models
    F. Affinito, F. Salvadore, SCAI - CINECA
    Part I: Introduction to the Intel Xeon Phi architecture
    Trends (transistors, clock rates, cores and threads, summarized): the number of transistors keeps increasing, but the power consumption must not increase and the density on a single chip cannot increase. The solution is to increase the number of cores. GP-GPUs and the Intel Xeon Phi are coupled to the CPU to accelerate highly parallel kernels, within the limits of Amdahl's law.
    What is Intel Xeon Phi? The 7100 / 5100 / 3100 series are available. The 5110P has: 1053 MHz clock; 60 in-order cores; ~1 TFlop/s DP peak performance (2 TFlop/s SP); 4 hardware threads per core; 8 GB GDDR5 memory; 512-bit SIMD vectors (32 registers); fully coherent L1 and L2 caches; PCIe bus (rev. 2.0); max theoretical memory bandwidth 320 GB/s; max TDP 225 W.
    MIC vs GPU comparison (admittedly naïve):
                               K20s                  5110P
      # cores                  2496                  60 (x4 threads)
      Memory size              5 GB                  8 GB
      Peak performance (SP)    3.52 TFlops           2 TFlops
      Peak performance (DP)    1.17 TFlops           1 TFlops
      Clock rate               0.706 GHz             1.053 GHz
      Memory bandwidth         208 GB/s (ECC off)    320 GB/s
    Terminology: MIC (Many Integrated Cores) is the name of the architecture; Xeon Phi is the commercial name of the Intel product based on the MIC architecture; Knights Ferry, Knights Corner and Knights Landing are development names of MIC architectures. We will often refer to the CPU as HOST and to the Xeon Phi as DEVICE.
    Is it an accelerator? Yes: it can be used to "accelerate" hot-spots of the code that are highly parallel and computationally intensive. In this sense, it works alongside the CPU and can be used as an accelerator via the "offload" programming model. An important bottleneck is the communication between host and device (through PCIe); in this respect, it is very similar to a GPU. But not only: the Intel Xeon Phi can also behave as a many-core x86 node.
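    The "offload" model mentioned here is what the guide's Chapter 5 covers through Intel's offload pragmas; as a hedged sketch (our illustration, not code from the slides), the marked region below runs on the MIC device, and the in/out clauses make the PCIe transfers explicit:

      #include <stdio.h>

      #define N 4096

      int main(void)
      {
          float a[N], b[N];
          for (int i = 0; i < N; i++) a[i] = (float)i;

          /* With the Intel compiler, this region is compiled for and run on
           * the coprocessor when present; a is copied host->device and b
           * device->host over PCIe, the bottleneck noted above. */
          #pragma offload target(mic) in(a) out(b)
          {
              #pragma omp parallel for
              for (int i = 0; i < N; i++)
                  b[i] = 2.0f * a[i];
          }

          printf("b[10] = %f\n", b[10]);   /* expect 20 */
          return 0;
      }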
  • Hypervisors vs. Lightweight Virtualization: a Performance Comparison
    2015 IEEE International Conference on Cloud Engineering
    Roberto Morabito, Jimmy Kjällman, and Miika Komu, Ericsson Research, NomadicLab, Jorvas, Finland
    [email protected], [email protected], [email protected]
    Abstract — Virtualization of operating systems provides a common way to run different services in the cloud. Recently, the lightweight virtualization technologies claim to offer superior performance. In this paper, we present a detailed performance comparison of traditional hypervisor based virtualization and new lightweight solutions. In our measurements, we use several benchmark tools in order to understand the strengths, weaknesses, and anomalies introduced by these different platforms in terms of processing, storage, memory and network. Our results show that containers achieve generally better performance when compared with traditional virtual machines and other recent solutions. Albeit containers offer clearly more dense deployment of virtual machines, the performance difference with other technologies is in many cases relatively small.
    ... container and alternative solutions. The idea is to quantify the level of overhead introduced by these platforms and the existing gap compared to a non-virtualized environment. The remainder of this paper is structured as follows: in Section II, a literature review and a brief description of all the technologies and platforms evaluated is provided. The methodology used to realize our performance comparison is introduced in Section III. The benchmark results are presented in Section IV. Finally, some concluding remarks and future work are provided in Section V.
    II. BACKGROUND AND RELATED WORK: In this section, we provide an overview of the different technologies included in the performance comparison.
  • Power Measurement Tutorial for the Green500 List
    R. Ge, X. Feng, H. Pyla, K. Cameron, W. Feng. June 27, 2007
    Contents: 1. The Metric for Energy-Efficiency Evaluation; 2. How to Obtain $\bar{P}(R_{\max})$? (2.1 The Definition of $\bar{P}(R_{\max})$; 2.2 Deriving $\bar{P}(R_{\max})$ from Unit Power; 2.3 Measuring Unit Power); 3. The Measurement Procedure (3.1 Equipment Check List; 3.2 Software Installation; 3.3 Hardware Connection; 3.4 Power Measurement Procedure); 4. Appendix (4.1 Frequently Asked Questions; 4.2 Resources)
    1. The Metric for Energy-Efficiency Evaluation: This tutorial serves as a practical guide for measuring the computer system power that is required as part of a Green500 submission. It describes the basic procedures to be followed in order to measure the power consumption of a supercomputer. A supercomputer that appears on The TOP500 List can easily consume megawatts of electric power. This power consumption may lead to operating costs that exceed acquisition costs as well as intolerable system failure rates. In recent years, we have witnessed an increasingly stronger movement towards energy-efficient computing systems in academia, government, and industry. Thus, the purpose of the Green500 List is to provide a ranking of the most energy-efficient supercomputers in the world and serve as a complementary view to the TOP500 List. However, as pointed out in [1, 2], identifying a single objective metric for energy efficiency in supercomputers is a difficult task. Based on [1, 2] and given the already existing use of the "performance per watt" metric, the Green500 List uses "performance per watt" (PPW) as its metric to rank the energy efficiency of supercomputers.
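    As a worked illustration of the PPW metric (the $R_{\max}$ and power figures below are invented for the example, not taken from the tutorial):

      \[
        \mathrm{PPW} = \frac{R_{\max}}{\bar{P}(R_{\max})}
                     = \frac{1\ \mathrm{PFLOPS}}{2\ \mathrm{MW}}
                     = 5 \times 10^{8}\ \mathrm{FLOPS/W}
                     = 500\ \mathrm{MFLOPS/W}
      \]

    So a machine sustaining 1 PFLOPS on Linpack while drawing 2 MW on average would be ranked at 500 MFLOPS/W.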
  • OpenImageIO 1.7 Programmer Documentation (In Progress)
    OpenImageIO 1.7 Programmer Documentation (in progress)
    Editor: Larry Gritz, [email protected]. Date: 31 Mar 2016
    The OpenImageIO source code and documentation are: Copyright (c) 2008-2016 Larry Gritz, et al. All Rights Reserved. The code that implements OpenImageIO is licensed under the BSD 3-clause (also sometimes known as "new BSD" or "modified BSD") license: Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
    • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
    • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
    • Neither the name of the software's owners nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Accelerators for HP ProLiant Servers Enable Scalable and Efficient High-Performance Computing
    Family data sheet: Accelerators for HP ProLiant servers enable scalable and efficient high-performance computing. November 2014
    HP high-performance computing has made it possible to accelerate innovation at any scale. But traditional CPU technology is no longer capable of sufficiently scaling performance to address the skyrocketing demand for compute resources. HP high-performance computing solutions are built on HP ProLiant servers using industry-leading accelerators to dramatically increase performance with lower power requirements.
    What is hybrid computing? A hybrid computing model is one where accelerators (known as GPUs or coprocessors) work together with CPUs to perform computing tasks. As parallel processors, accelerators can split computations into hundreds or thousands of pieces and calculate them simultaneously. Offloading the most compute-intensive portions of applications to accelerators dramatically increases both application performance and computational efficiency.
    Innovation is the foundation for success: accelerators are revolutionizing high-performance computing. High-performance computing (HPC) is being used to address many of modern society's biggest challenges, such as designing new vaccines and genetically engineering drugs to fight diseases, finding and extracting precious oil and gas resources, improving financial instruments, and designing more fuel-efficient engines. This rapid pace of innovation has created an insatiable demand for compute power. At the same time, multiple strict requirements are placed on system performance, power consumption, size, response, reliability, portability, and design time. Modern HPC systems are rapidly evolving, already reaching petaflop and targeting exaflop performance. All of these challenges lead to a common set of requirements: a need for more computing ...
  • Quick-Reference Guide to Optimization with Intel® Compilers
    Quick Reference Guide to Optimization with Intel® C++ and Fortran Compilers v19.1
    For IA-32 processors, Intel® 64 processors, Intel® Xeon Phi™ processors and compatible non-Intel processors.
    Contents: Application Performance; General Optimization Options and Reports**; Parallel Performance**; Recommended Processor-Specific Optimization Options**; Optimizing for the Intel® Xeon Phi™ x200 product family; Interprocedural Optimization (IPO) and Profile-Guided Optimization (PGO) Options; Fine-Tuning (All Processors)**; Floating-Point Arithmetic Options; Processor Code Name With Instruction Set Extension Name Synonym; Frequently Used Processor Names in Compiler Options
  • Broadwell / Skylake / Next Gen*: NEW Intel Microarchitectures
    15 years of availability: IOTG is extending the product availability for IOTG roadmap products from a minimum of 7 years to a minimum of 15 years when both processor and chipset are on 22nm and newer process technologies.
    - Xeon Scalable (w/ chipsets)
    - E3-12xx/15xx v5 and later (w/ chipsets)
    - 6th gen Core and later (w/ chipsets)
    - Bay Trail (E3800) and later products (Braswell, N3xxx)
    - Atom C2xxx (Rangeley) and later
    - Does not include Xeon-D (7 years) or E5-26xx v4 (7 years)
    IOTG Product Availability Life-Cycle: 15 year product availability will start with the following products:
    • Intel® Xeon® Processor Scalable Family codenamed Skylake-SP and later with associated chipsets
    • Intel® Xeon® E3-12xx/15xx v5 series (Skylake) and later with associated chipsets
    • 6th Gen Intel® Core™ processor family (Skylake) and later (includes Intel® Pentium® and Celeron® processors) with associated chipsets
    • Intel Pentium processor N3700 (Braswell) and later, and Intel Celeron processors N3xxx (Braswell) and J1900/N2xxx family (Bay Trail) and later
    • Intel® Atom® processor C2xxx (Rangeley) and E3800 family (Bay Trail) and later
    Product Discontinuance Notification (PDN)†: PDNs will typically be issued no later than 13.5 years after the component family introduction date. PDNs are published at https://qdms.intel.com/
    [Timeline: component family introduction at year L, followed by last-order/last-ship periods under the 7-year and 15-year product availability models, spanning years L-1 through L+15.]
    † Intel may support this extended manufacturing using reasonably feasible means deemed by Intel to be appropriate.
  • Energy Efficient Spin-Locking in Multi-Core Machines
    Facoltà di Ingegneria dell'Informazione, Informatica e Statistica, Master's degree (LM) in Engineering in Computer Science, Chair of Advanced Operating Systems and Virtualization. Candidate: Salvatore Rivieccio (1405255). Advisor: Francesco Quaglia; co-advisor: Pierangelo Di Sanzo. A/A 2015/2016
    Abstract: In this thesis I will show an implementation of spin-locks that works in an energy-efficient fashion, exploiting the capabilities of last-generation hardware and new software components in order to raise or reduce the CPU frequency when running spin-lock operations. In particular, this work consists of a Linux kernel module and a user-space program that make it possible to run at the lowest admissible frequency while a thread is spin-locking, waiting to enter a critical section. These changes are thread-grained, which means that only the interested threads are affected whereas the system keeps running as usual. Standard libraries' spin-locks do not provide energy-efficiency support; that kind of optimization is left to application behavior or to kernel-level solutions, like governors.
    Table of contents: 1. Energy-efficient Computing (1.1 TDP and Turbo Mode; 1.2 RAPL; 1.3 Combined Components); 2. The Linux Architecture (2.1 The Kernel; 2.2 System Libraries; 2.3 System Tools); 3. Into the Core (3.1 The Ring Model; 3.2 Devices); 4. Low Frequency Spin-lock (4.1 Spin-lock vs.
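    A much simplified user-space sketch of the idea: drop to the lowest admissible frequency while spinning and restore full speed once the lock is acquired. The thesis implements this with a kernel module; driving the cpufreq "userspace" governor through sysfs, as below, is our own assumption for illustration (it needs root and that governor active), and the frequency arguments are placeholders:

      #include <stdio.h>
      #include <stdatomic.h>

      static atomic_int lock_flag;           /* 0 = free, 1 = taken */

      /* Hypothetical helper: set cpu0's frequency (kHz) via the cpufreq
       * sysfs interface; only works under the "userspace" governor. */
      static void set_cpu_khz(long khz)
      {
          FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed", "w");
          if (f) { fprintf(f, "%ld", khz); fclose(f); }
      }

      static void spin_lock_lowfreq(long low_khz, long high_khz)
      {
          if (atomic_exchange(&lock_flag, 1) == 0)
              return;                        /* uncontended: nothing to scale */
          set_cpu_khz(low_khz);              /* contended: spin at low frequency */
          while (atomic_exchange(&lock_flag, 1) != 0)
              ;                              /* busy-wait for the lock */
          set_cpu_khz(high_khz);             /* acquired: restore full speed */
      }

      static void spin_unlock_lowfreq(void)
      {
          atomic_store(&lock_flag, 0);
      }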
  • On the Performance of MPI-OpenMP on a 12 Nodes Multi-Core Cluster
    Abdelgadir Tageldin Abdelgadir (1), Al-Sakib Khan Pathan (1), Mohiuddin Ahmed (2)
    (1) Department of Computer Science, International Islamic University Malaysia, Gombak 53100, Kuala Lumpur, Malaysia; (2) Department of Computer Network, Jazan University, Saudi Arabia
    [email protected], [email protected], [email protected]
    Abstract. With the increasing number of Quad-Core-based clusters and the introduction of compute nodes designed with large memory capacity shared by multiple cores, new problems related to scalability arise. In this paper, we analyze the overall performance of a cluster built with nodes having a dual Quad-Core processor on each node. Some benchmark results are presented and some observations are mentioned when handling such processors on a benchmark test. A Quad-Core-based cluster's complexity arises from the fact that both local communication and network communications between the running processes need to be addressed. The potential of an MPI-OpenMP approach is pinpointed because of its reduced communication overhead. In the end, we come to the conclusion that an MPI-OpenMP solution should be considered in such clusters, since optimizing network communications between nodes is as important as optimizing local communications between processors in a multi-core cluster.
    Keywords: MPI-OpenMP, hybrid, Multi-Core, Cluster.
    1 Introduction. The integration of two or more processors within a single chip is an advanced technology for tackling the disadvantages exposed by a single core when it comes to increasing the speed, as more heat is generated and more power is consumed by those single cores.
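    A minimal hedged sketch of the hybrid style the paper advocates (one MPI rank per node communicating over the network, OpenMP threads sharing memory within the node); written for this summary, not taken from the paper:

      #include <mpi.h>
      #include <omp.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int provided, rank, nranks;
          /* FUNNELED: only the main thread makes MPI calls. */
          MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nranks);

          /* Local communication: OpenMP threads reduce in shared memory. */
          double local = 0.0;
          #pragma omp parallel for reduction(+:local)
          for (int i = 0; i < 1000000; i++)
              local += 1.0 / (i + 1.0 + rank);

          /* Network communication: one message per rank, not per thread. */
          double global = 0.0;
          MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0)
              printf("global sum = %f (%d ranks x %d threads)\n",
                     global, nranks, omp_get_max_threads());
          MPI_Finalize();
          return 0;
      }

    Launched with, for example, one rank per node and OMP_NUM_THREADS set to the cores per node, network traffic scales with ranks rather than threads, which is the reduced communication overhead the paper pinpoints.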
  • OpenImageIO Release 2.4.0
    OpenImageIO Release 2.4.0
    Larry Gritz. Sep 30, 2021
    Contents: 1. Introduction; 2. Image I/O API Helper Classes; 3. ImageOutput: Writing Images; 4. ImageInput: Reading Images; 5. Writing ImageIO Plugins; 6. Bundled ImageIO Plugins; 7. Cached Images; 8. Texture Access: TextureSystem; 9. ImageBuf: Image Buffers; 10. ImageBufAlgo: Image Processing; 11. Python Bindings; 12. oiiotool: the OIIO Swiss Army Knife; 13. Getting Image Information With iinfo; 14. Converting Image Formats With iconvert; 15. Searching Image Metadata With igrep; 16. Comparing Images With idiff; 17. Making Tiled MIP-Map Texture Files With maketx or oiiotool; 18. Metadata conventions; 19. Glossary; Index
    The code that implements OpenImageIO is licensed under the BSD 3-clause (also sometimes known as "new BSD" or "modified BSD") license (https://github.com/OpenImageIO/oiio/blob/master/LICENSE.md): Copyright (c) 2008-present by Contributors to the OpenImageIO project. All Rights Reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.