The GROMOS96 Benchmarks for Molecular Simulation

Total Page:16

File Type:pdf, Size:1020Kb

The GROMOS96 Benchmarks for Molecular Simulation Computer Physics Communications 128 (2000) 550–557 www.elsevier.nl/locate/cpc The GROMOS96 benchmarks for molecular simulation Alexandre M.J.J. Bonvin b;1,AlanE.Markc;2, Wilfred F. van Gunsteren a;∗ a Laboratory of Physical Chemistry, Swiss Federal Institute of Technology Zürich, ETH Zentrum, CH-8092 Zürich, Switzerland b Bijvoet Center for Biomolecular Research, NMR Spectroscopy, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands c Laboratory of Biophysical Chemistry, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands Received 22 November 1999 Abstract A set of biomolecular systems is presented, which can be used to benchmark the performance of simulation programs and computers. It is applied, using the GROMOS96 biomolecular simulation software, to a variety of computers. The dependence of computing time on a number of model and computational parameters is investigated. An extended pair list technique to select non-bonded interaction pairs and long-range interactions is shown to increase the efficiency by a factor 1.5 to 3 when compared to standard procedures. The benchmark results can be used to estimate the computer time required for simulation studies, and to evaluate the efficiency of various computers regarding molecular simulations. 2000 Elsevier Science B.V. All rights reserved. Keywords: Molecular dynamics; Non-bonded interactions; Cut-offs; Extended pair list 1. Introduction der to extend a simulation period as long as possible for a molecular system of a given size within a given Simulation of the behaviour of biomolecular sys- time span, it is of paramount importance to use a very tems on a computer is a steadily expanding area of fast computer and a very efficient simulation program. theoretical and computational biochemistry. In mole- The time needed to complete a simulation project cular dynamics (MD) simulation, Newton’s equations depends on a number of factors: of motion for thousands of atoms are integrated for- (1) Size of the molecular system, i.e. the number of ward in time using small, i.e. femtosecond size, time atoms or particles or interaction sites. steps during which the forces on the atoms may be (2) The length of the simulation that is needed to assumed to be nearly constant. Today, a simulation sufficiently sample the relevant motions of the study of a small protein containing about 100 amino system. acid residues in water involves about 106 of such time (3) The complexity of the potential energy function steps covering a simulation period of 1 nsec. Although used, from which the forces on the atoms are this seems sufficiently long to sample local, fast re- laxing properties, it is too short to study slow, global derived. processes such as protein folding and unfolding. In or- (4) The spatial range of the forces between atoms. (5) The settings of various interaction or computa- ∗ Corresponding author. E-mail: [email protected]. tional parameters, such as the non-bonded inter- 1 E-mail: [email protected]. action cut-off or the frequency of updating spe- 2 E-mail: [email protected]. cific interactions. 0010-4655/00/$ – see front matter 2000 Elsevier Science B.V. All rights reserved. PII:S0010-4655(99)00540-8 A.M.J.J. Bonvin et al. / Computer Physics Communications 128 (2000) 550–557 551 (6) The handling of spatial boundary conditions, cut-off radius Rcp. The atoms that belong to a charge e.g., periodic versus vacuum. group are chosen such that their partial atomic charges (7) The spatial homogeneity of the system, i.e. add up to zero, except for fully charged groups like whether the atom density is a smooth function the side-chains of Arg or Asp where the partial atomic of position or not. charges may add up to Ce or −e. When the (partial) (8) The characteristics of the computer used, e.g., atomic charges of a group of atoms add up to exactly its CPU, clock rate, memory handling, cache zero, the leading term of the electrostatic interaction size. between two such groups is of dipolar .1=r3/ char- (9) The structure of the program and algorithms acter and the error due to the application of a cut-off involved. radius is reduced compared to the 1=r monopole con- (10) The programming language, compiler and com- tributions. Therefore in the GROMOS96 non-bonded piler options used to produce an executable pro- interaction routines the cut-off radius is used to select gram. nearest-neighbor charge groups. The simplest way to In order to give an impression of the computing find the neighboring charge groups of a charge group, effort required to simulate a biomolecular system a that is, the charge groups that lie within Rcp, is to scan benchmark has been formulated and carried out on a all possible charge group pairs in the system. For a variety of computers using the Groningen Molecular system consisting of Ncg charge groups, the number 1 2 Simulation GROMOS96 software [1,2]. A number of of pairs amounts to 2 Ncg, which makes the computer the factors mentioned above, which determine the time time required for finding the neighbors in this way pro- 2 required for a simulation, have been investigated: portional to Ncg. Once the neighbors have been found, – size of the system (1), the time required for calculating the non-bonded in- – range of the forces (4), teraction is proportional to Ncg. We note that the non- – frequency of non-bonded force update (5), bonded interaction within a charge group may need to – vacuum versus periodic boundaries (6), be calculated, when the charge group contains more – computer architecture (8), than a few atoms. – use of an extended pair list technique to select non- In order to evaluate the non-bonded interaction bonded interaction pairs (9). with sufficient accuracy, a long cut-off radius Rcl has The results of the GROMOS96 benchmark can be to be used; for biomolecular systems a value of at used to estimate the CPU time required for a typical least 1.4 nm seems necessary [3]. But such a range biomolecular simulation project and to evaluate the ef- is very expensive if pair interactions are evaluated ficiency or performance-to-priceratio for various com- at every MD step; the number of neighbor atoms puters regarding simulation of biomolecular systems. within 1.4 nm will exceed 900. Therefore in GRO- MOS the non-bonded interaction can be evaluated using a triple-range method. The electrostatic inter- 2. Methods actions beyond the long-range cut-off Rcl – typi- cally 1.4 nm – can be approximated by a Poisson– The bulk of the computational time required by a Boltzmann generalized reaction field term [4]. The simulation time step is used for calculating the non- non-bonded interactions are evaluated at every sim- bonded interactions, that is, for finding the nearest ulation step using the charge group pair list that is neighbor atoms and subsequently evaluating the van generated with a short-range cut-off radius Rcp.The der Waals and electrostatic interaction terms for the longer-range non-bonded interactions, that is, those obtained atoms pairs. Since the non-bonded interac- between charge groups at a distance longer than Rcp tion between atoms decreases with the distance be- and smaller than Rcl, are evaluated less frequently, tween them, only interactions between atoms closer viz. only at every nth simulation step when also the to each other than a certain cut-off distance Rcp are pair list is updated. They are kept unchanged between generally taken into account in simulations. The GRO- these updates. In this way the long-range non-bonded MOS96 force field makes use of the concept of charge forces can be approximately taken into account, with- groups to reduce the errors due to the application of a out increasing the computing effort significantly, at the 552 A.M.J.J. Bonvin et al. / Computer Physics Communications 128 (2000) 550–557 expense of neglecting the fluctuation of the forces bey- which is done every nth MD step (typically 5 to 10) ond Rcp during n simulation steps. and allows further reduction of the computing time. The efficiency of the pair list generation on serial computers can be improved by using grid-search tech- niques [5–9], since the computational effort to select 3. Results nearest-neighbor charge-groups using a spatial grid is proportional to Ncg. The improvement over the scan- The six systems used for benchmarking are defined ning of all charge group pair is only significant, how- in Table 1. These are representative of the range of sys- ever, for systems in which the non-bonded interaction tems typically studied by MD simulations. The first two benchmarks are for a short cyclic peptide, cy- cut-off radius Rcl is an order of magnitude shorter than the size of the simulated molecular system. For closporin A (11 residues), in vacuum taking all in- teractions into account .R D1/ (I), and in wa- simulation of proteins in solution with Rcl ≈ 1:4nm cp and periodic box lengths of about 5.0 nm, contain- ter in a truncated octahedron under periodic bound- ing typically 104 atoms, the computational gain is very ary conditions (II). The next two are for a protein of modest. Another method to speed-up the non-bonded 295 residues, thrombin, in vacuum (III) and in water in pair list generation consists of keeping a second, “ex- a truncated octahedron under periodic boundary con- tended pair-list” generated using an extended cut-off ditions (IV). The last two benchmarks correspond to medium- (V) and large-sized (VI) pure water systems radius R >R > R and searching through this cx cl cp in a cubic box under periodic boundary conditions.
Recommended publications
  • Mipspro C++ Programmer's Guide
    MIPSproTM C++ Programmer’s Guide 007–0704–150 CONTRIBUTORS Rewritten in 2002 by Jean Wilson with engineering support from John Wilkinson and editing support from Susan Wilkening. COPYRIGHT Copyright © 1995, 1999, 2002 - 2003 Silicon Graphics, Inc. All rights reserved; provided portions may be copyright in third parties, as indicated elsewhere herein. No permission is granted to copy, distribute, or create derivative works from the contents of this electronic documentation in any manner, in whole or in part, without the prior written permission of Silicon Graphics, Inc. LIMITED RIGHTS LEGEND The electronic (software) version of this document was developed at private expense; if acquired under an agreement with the USA government or any contractor thereto, it is acquired as "commercial computer software" subject to the provisions of its applicable license agreement, as specified in (a) 48 CFR 12.212 of the FAR; or, if acquired for Department of Defense units, (b) 48 CFR 227-7202 of the DoD FAR Supplement; or sections succeeding thereto. Contractor/manufacturer is Silicon Graphics, Inc., 1600 Amphitheatre Pkwy 2E, Mountain View, CA 94043-1351. TRADEMARKS AND ATTRIBUTIONS Silicon Graphics, SGI, the SGI logo, IRIX, O2, Octane, and Origin are registered trademarks and OpenMP and ProDev are trademarks of Silicon Graphics, Inc. in the United States and/or other countries worldwide. MIPS, MIPS I, MIPS II, MIPS III, MIPS IV, R2000, R3000, R4000, R4400, R4600, R5000, and R8000 are registered or unregistered trademarks and MIPSpro, R10000, R12000, R1400 are trademarks of MIPS Technologies, Inc., used under license by Silicon Graphics, Inc. Portions of this publication may have been derived from the OpenMP Language Application Program Interface Specification.
    [Show full text]
  • Microprocessors History of Computing Nouf Assaid
    MICROPROCESSORS HISTORY OF COMPUTING NOUF ASSAID 1 Table of Contents Introduction 2 Brief History 2 Microprocessors 7 Instruction Set Architectures 8 Von Neumann Machine 9 Microprocessor Design 12 Superscalar 13 RISC 16 CISC 20 VLIW 23 Multiprocessor 24 Future Trends in Microprocessor Design 25 2 Introduction If we take a look around us, we would be sure to find a device that uses a microprocessor in some form or the other. Microprocessors have become a part of our daily lives and it would be difficult to imagine life without them today. From digital wrist watches, to pocket calculators, from microwaves, to cars, toys, security systems, navigation, to credit cards, microprocessors are ubiquitous. All this has been made possible by remarkable developments in semiconductor technology enabling in the last 30 years, enabling the implementation of ideas that were previously beyond the average computer architect’s grasp. In this paper, we discuss the various microprocessor technologies, starting with a brief history of computing. This is followed by an in-depth look at processor architecture, design philosophies, current design trends, RISC processors and CISC processors. Finally we discuss trends and directions in microprocessor design. Brief Historical Overview Mechanical Computers A French engineer by the name of Blaise Pascal built the first working mechanical computer. This device was made completely from gears and was operated using hand cranks. This machine was capable of simple addition and subtraction, but a few years later, a German mathematician by the name of Leibniz made a similar machine that could multiply and divide as well. After about 150 years, a mathematician at Cambridge, Charles Babbage made his Difference Engine.
    [Show full text]
  • Preparing for an Installation of a 33 to 256 Processor System HR-04122-0B Origin ™ Systems Last Modified: August 1999
    Preparing for an Installation of a 33 to 256 Processor System HR-04122-0B Origin ™ Systems Last Modified: August 1999 Record of Revision . 4 Overview . 5 System Components and Configurations . 6 Equipment Separation Limits . 8 Site Requirements . 10 Planning Your Access Route . 10 Environmental Requirements . 14 Facility Power Requirements . 15 Remote Support . 20 Network Connections . 20 Raised-floor Installations . 21 Securing the Cabinets . 25 Physical Specifications . 28 Origin 2000 Systems . 34 Onyx2 InfiniteReality2 Systems . 38 Onyx2 InfiniteReality2 Rack System . 38 Onyx2 InfiniteReality Multirack Systems . 38 Onyx2 InfiniteReality2 Multirack System Layout Options . 41 SCSI RAID Rack . 42 Origin Fibre Channel Rack . 43 O2 Workstation . 44 Site Planning Checklist . 46 Summary . 48 HR-04122-0B SGI Proprietary 1 Preparing for an Installation Figures Figure 1. Origin 2000 128- and 256-Processor Multirack Systems: Standard and Optional Floor Layouts Placed on 24 in. x 24 in. Floor Panels . 7 Figure 2. Distance between Racks (Standard Layout) . 8 Figure 3. Separation Limits . 9 Figure 4. Origin 2000 Rack, Onyx2 InfiniteReality2 Rack, and MetaRouter Shipping Configuration . 11 Figure 5. SCSI RAID Rack Shipping Configuration . 12 Figure 6. Origin Fibre Channel Rack Shipping Configuration . 13 Figure 7. Origin 2000 Rack and Onyx2 InfiniteReality2 Rack Floor Cutout 22 Figure 8. MetaRouter Floor Cutout . 23 Figure 9. SCSI RAID Rack Floor Cutout . 24 Figure 10. Origin Fibre Channel Rack Floor Cutout . 24 Figure 11. Securing the Origin 2000 Rack and Onyx2 Rack . 25 Figure 12. Securing the MetaRouter . 26 Figure 13. Securing the Origin Fibre Channel Rack . 27 Figure 14. Origin 2000 Rack . 35 Figure 15. MetaRouter . 36 Figure 16.
    [Show full text]
  • MIPS IV Instruction Set
    MIPS IV Instruction Set Revision 3.2 September, 1995 Charles Price MIPS Technologies, Inc. All Right Reserved RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and / or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights reserved under the Copyright Laws of the United States. Contractor / manufacturer is MIPS Technologies, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94039-7311. R2000, R3000, R6000, R4000, R4400, R4200, R8000, R4300 and R10000 are trademarks of MIPS Technologies, Inc. MIPS and R3000 are registered trademarks of MIPS Technologies, Inc. The information in this document is preliminary and subject to change without notice. MIPS Technologies, Inc. (MTI) reserves the right to change any portion of the product described herein to improve function or design. MTI does not assume liability arising out of the application or use of any product or circuit described herein. Information on MIPS products is available electronically: (a) Through the World Wide Web. Point your WWW client to: http://www.mips.com (b) Through ftp from the internet site “sgigate.sgi.com”. Login as “ftp” or “anonymous” and then cd to the directory “pub/doc”. (c) Through an automated FAX service: Inside the USA toll free: (800) 446-6477 (800-IGO-MIPS) Outside the USA: (415) 688-4321 (call from a FAX machine) MIPS Technologies, Inc.
    [Show full text]
  • SGI® Octane III®
    Making ® ® Supercomputing SGI Octane III Personal™ KEY FEATURES Supercomputing Gets Personal Octane III takes high-performance computing out Office Ready of the data center and puts it at the deskside. It A pedestal, one foot by two combines the immense power and performance broad HPC application support and arrives ready foot form factor and capabilities of a high-performance cluster with the for immediate integration for a smooth out-of-the- quiet operation portability and usability of a workstation to enable box experience. a new era of personal innovation in strategic science, research, development and visualization. Octane III allows a wide variety of single and High Performance dual-socket node choices and a wide selection of Up to 120 high- In contrast with standard dual-processor performance, storage, integrated networking, and performance cores and workstations with only eight cores and moderate graphics and compute GPU options. The system nearly 2TB of memory memory capacity, the superior design of is available as an up to ten node deskside cluster Broad HPC Application Octane III permits up to 120 high-performance configuration or dual-node graphics workstation Support cores and nearly 2TB of memory. Octane III configurations. Accelerated time-to-results significantly accelerates time-to-results for over with support for over 50 50 HPC applications and supports the latest Supported operating systems include: ® ® ® HPC applications Intel processors to capitalize on greater levels SUSE Linux Enterprise Server and Red Hat of performance, flexibility and scalability. Pre- Enterprise Linux. All configurations are available configured with system software, cluster set up is with pre-loaded SGI® Performance Suite system Ease of Use a breeze.
    [Show full text]
  • 10º Encontro Português De Computação Gráfica
    Actas do 10º Encontro Português de Computação Gráfica 1 – 3 de Outubro 2001 Lisboa – Portugal Patrocinadores de Honra Patrocinadores Organização 10º Encontro Português de Computação Gráfica 1-3 de Outubro 2001 PREFÁCIO A investigação, o desenvolvimento e o ensino na área da Computação Gráfica constituem, em Portugal, uma realidade positiva e de largas tradições. O Encontro Português de Computação Gráfica (EPCG), realizado no âmbito das actividades do Grupo Português de Computação Gráfica (GPCG), tem permitido reunir regularmente, desde o 1º EPCG realizado também em Lisboa, mas no já longínquo mês de Julho de 1988, todos os que trabalham nesta área abrangente e com inúmeras aplicações. Pela primeira vez no historial destes Encontros, o 10º EPCG foi organizado em ligação estreita com as comunidades do Processamento de Imagem e da Visão por Computador, através da Associação Portuguesa de Reconhecimento de Padrões (APRP), salientando-se, assim, a acrescida colaboração, e a convergência, entre essas duas áreas e a Computação Gráfica. Tal como nos Encontros anteriores, o programa está estruturado ao longo de três dias, sendo desta vez o primeiro dia dedicado a seminários por conferencistas convidados e os dois últimos à apresentação de comunicações e de "posters", decorrendo em simultâneo o Concurso para Jovens Investigadores, uma Exibição Comercial e, pela primeira vez, um Atelier Digital. Como novidade essencialmente dedicada aos jovens, realiza-se ainda em paralelo com o Encontro um torneio de jogos de computador. Em resposta ao apelo às comunicações para este 10º EPCG foram submetidos 38 trabalhos, na sua maioria de grande qualidade, tendo sido seleccionadas pela Comissão de Programa, após um cuidadoso processo de avaliação, apenas 19 comunicações; aos autores de 14 dos restantes trabalhos, considerados suficientemente promissores, foi sugerida a sua reformulação e uma nova submissão como "posters".
    [Show full text]
  • CXFSTM Administration Guide for SGI® Infinitestorage
    CXFSTM Administration Guide for SGI® InfiniteStorage 007–4016–025 CONTRIBUTORS Written by Lori Johnson Illustrated by Chrystie Danzer Engineering contributions to the book by Vladmir Apostolov, Rich Altmaier, Neil Bannister, François Barbou des Places, Ken Beck, Felix Blyakher, Laurie Costello, Mark Cruciani, Rupak Das, Alex Elder, Dave Ellis, Brian Gaffey, Philippe Gregoire, Gary Hagensen, Ryan Hankins, George Hyman, Dean Jansa, Erik Jacobson, John Keller, Dennis Kender, Bob Kierski, Chris Kirby, Ted Kline, Dan Knappe, Kent Koeninger, Linda Lait, Bob LaPreze, Jinglei Li, Yingping Lu, Steve Lord, Aaron Mantel, Troy McCorkell, LaNet Merrill, Terry Merth, Jim Nead, Nate Pearlstein, Bryce Petty, Dave Pulido, Alain Renaud, John Relph, Elaine Robinson, Dean Roehrich, Eric Sandeen, Yui Sakazume, Wesley Smith, Kerm Steffenhagen, Paddy Sreenivasan, Roger Strassburg, Andy Tran, Rebecca Underwood, Connie Woodward, Michelle Webster, Geoffrey Wehrman, Sammy Wilborn COPYRIGHT © 1999–2007 SGI. All rights reserved; provided portions may be copyright in third parties, as indicated elsewhere herein. No permission is granted to copy, distribute, or create derivative works from the contents of this electronic documentation in any manner, in whole or in part, without the prior written permission of SGI. LIMITED RIGHTS LEGEND The software described in this document is "commercial computer software" provided with restricted rights (except as to included open/free source) as specified in the FAR 52.227-19 and/or the DFAR 227.7202, or successive sections. Use beyond
    [Show full text]
  • OCTANE Technical Report
    OCTANE Technical Report Silicon Graphics, Inc. ANY DUPLICATION, MODIFICATION, DISTRIBUTION, PUBLIC PERFORMANCE, OR PUBLIC DISPLAY OF THIS DOCUMENT WITHOUT THE EXPRESS WRITTEN CONSENT OF SILICON GRAPHICS, INC. IS STRICTLY PROHIBITED. THE RECEIPT OR POSSESSION OF THIS DOCUMENT DOES NOT CONVEY ANY RIGHTS TO REPRODUCE, DISCLOSE OR DISTRIBUTE ITS CONTENTS, OR TO MANUFACTURE, USE, OR SELL ANYTHING THAT IT MAY DESCRIBE, IN WHOLE OR IN PART. Cop VED Table of Contents Section 1 Introduction .....................................................................................................1-1 1.1 Manufacturing.............................................................................................................. 1-3 1.1.1 Industrial Design .....................................................................................1-3 1.1.2 CAD/CAM Solid Modeling ....................................................................1-4 1.1.3 Analysis...................................................................................................1-5 1.1.4 Digital Prototyping..................................................................................1-6 1.2 Entertainment............................................................................................................... 1-8 1.2.1 3D Animation..........................................................................................1-8 1.2.2 Film/Video/Audio ...................................................................................1-9 1.2.3 Publishing..............................................................................................1-10
    [Show full text]
  • Imusic: Inessential Guide to Listening to Music on Athena
    iMusic: Inessential Guide to Listening to Music on Athena The Student Information Processing Board Richard Tibbetts <[email protected]> May 27, 2001 Contents 1 Introduction 3 2 Identifying Athena Platforms 3 3 Connecting Headphones and Speakers 3 3.1 Headphones . 3 3.1.1 Dell GX1 . 4 3.1.2 Dell GX110 . 4 3.1.3 SGI O2 . 5 3.1.4 SGI Indy . 5 3.1.5 Sun Sparcstation 5 . 6 3.1.6 Sun Ultra 5 . 6 3.1.7 Sun Ultra 10 . 7 3.2 Attaching Speakers . 7 4 Setting Audio Device and Volume 7 4.1 Linux . 7 4.2 SGI . 7 4.3 Sun . 7 5 Playing Music 7 5.1 Compact Disc . 8 5.2 MP3 . 8 5.3 Ogg Vorbis . 8 5.4 Real Audio . 8 6 Miscellaneous Topics 8 6.1 Accessing Files on an NFS Shared Volume . 8 6.2 Accessing Files on a Windows Shared Drive . 8 1 A Identifying Athena Machines 8 2 1 Introduction These days most Athena workstations have suitable sound hardware and are quite able to play most music formats. However, each platform has its own peculiarities in the way sound is played, the way volume is controlled, where the headphone jack is located, and various other problems. This document clarifies all of these issues. If you notice any problems with this document, or have any questions which it doesn't answer, please let us know. You can send email to [email protected], drop by the office in W20-557, or call us at x3-7788.
    [Show full text]
  • Interactive Rendering with Coherent Ray Tracing
    EUROGRAPHICS 2001 / A. Chalmers and T.-M. Rhyne Volume 20 (2001), Number 3 (Guest Editors) Interactive Rendering with Coherent Ray Tracing Ingo Wald, Philipp Slusallek, Carsten Benthin, and Markus Wagner Computer Graphics Group, Saarland University Abstract For almost two decades researchers have argued that ray tracing will eventually become faster than the rasteri- zation technique that completely dominates todays graphics hardware. However, this has not happened yet. Ray tracing is still exclusively being used for off-line rendering of photorealistic images and it is commonly believed that ray tracing is simply too costly to ever challenge rasterization-based algorithms for interactive use. However, there is hardly any scientific analysis that supports either point of view. In particular there is no evidence of where the crossover point might be, at which ray tracing would eventually become faster, or if such a point does exist at all. This paper provides several contributions to this discussion: We first present a highly optimized implementation of a ray tracer that improves performance by more than an order of magnitude compared to currently available ray tracers. The new algorithm makes better use of computational resources such as caches and SIMD instructions and better exploits image and object space coherence. Secondly, we show that this software implementation can challenge and even outperform high-end graphics hardware in interactive rendering performance for complex environments. We also provide an brief overview of the benefits of ray tracing over rasterization algorithms and point out the potential of interactive ray tracing both in hardware and software. 1. Introduction Ray tracing is famous for its ability to generate high-quality images but is also well-known for long rendering times due to its high computational cost.
    [Show full text]
  • On the Efficacy of Source Code Optimizations for Cache-Based Systems
    On the efficacy of source code optimizations for cache-based systems Rob F. Van der Wijngaart, MRJ Technology Solutions, NASA Ames Research Center, Moffett Field, CA 94035 William C. Saphir, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 Abstract. Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize tem- poral and spatial locality of memory references by reusing data. and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive im- pact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates--as reported by a cache simulation tool, and con- firmed by hardware counters--only partially explain the results. By contrast, the compiler- generated assembly code provides more insight by revealing the importance of processor- specific instructions and of compiler maturity, both of which strongly, and sometimes unex- pectedly, influence performance.
    [Show full text]
  • Computing @SERC Resources,Services and Policies
    Computing @SERC Resources,Services and Policies R.Krishna Murthy SERC - An Introduction • A state-of-the-art Computing facility • Caters to the computing needs of education and research at the institute • Comprehensive range of systems to cater to a wide spectrum of computing requirements. • Excellent infrastructure supports uninterrupted computing - anywhere, all times. SERC - Facilities • Computing - – Powerful hardware with adequate resources – Excellent Systems and Application Software,tools and libraries • Printing, Plotting and Scanning services • Help-Desk - User Consultancy and Support • Library - Books, Manuals, Software, Distribution of Systems • SERC has 5 floors - Basement,Ground,First,Second and Third • Basement - Power and Airconditioning • Ground - Compute & File servers, Supercomputing Cluster • First floor - Common facilities for Course and Research - Windows,NT,Linux,Mac and other workstations Distribution of Systems - contd. • Second Floor – Access Stations for Research students • Third Floor – Access Stations for Course students • Both the floors have similar facilities Computing Systems Systems at SERC • ACCESS STATIONS *SUN ULTRA 20 Workstations – dual core Opteron 4GHz cpu, 1GB memory * IBM INTELLISTATION EPRO – Intel P4 2.4GHz cpu, 512 MB memory Both are Linux based systems OLDER Access stations * COMPAQ XP 10000 * SUN ULTRA 60 * HP C200 * SGI O2 * IBM POWER PC 43p Contd... FILE SERVERS 5TB SAN storage IBM RS/6000 43P 260 : 32 * 18GB Swappable SSA Disks. Contd.... • HIGH PERFORMANCE SERVERS * SHARED MEMORY MULTI PROCESSOR • IBM P-series 690 Regatta (32proc.,256 GB) • SGI ALTIX 3700 (32proc.,256GB) • SGI Altix 350 ( 16 proc.,16GB – 64GB) Contd... * IBM SP3. NH2 - 16 Processors WH2 - 4 Processors * Six COMPAQ ALPHA SERVER ES40 4 CPU’s per server with 667 MHz.
    [Show full text]