Download (633Kb)
Total Page:16
File Type:pdf, Size:1020Kb
Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. Intel, Intel Core and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. Ct: A New Paradigm for Data Parallel Computing *Other names and brands may be claimed as the property of others. Hans–Christian Hoppe Intel Visual Computing Institute, Intel Labs Copyright © 2009. Intel Corporation. using material from http://intel.com/software/products Anwar Ghuloum, CJ Newburn, Michael McCool and Stefanus Du Toit Performance and Productivity Libraries, Developer Products Division, Software and Services Group Software & Services Group, Developer Products Division Software & Services Group, Developer Products Division Copyright © 2009, Intel Corporation. All rights reserved. Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. *Other brands and names are the property of their respective owners. 2 Contents Challenge: Multiple Parallelism Mechanisms Ct 101 Today’s parallel platforms have many kinds of – What is Ct, and what value does it provide? parallelism: – Basic language elements and examples • Pipelining • SIMD within a register (SWAR) vectorization Ct Next Steps – Where to go from here • Superscalar instruction issue or VLIW • Overlapping memory access with computation (prefetch) – Towards a parallel virtual machine • Simultaneous multithreading (hyperthreading) on one core •Multiple cores How to learn more • Multiple processors • Asynchronous host and accelerator execution Software & Services Group, Developer Products Division Software & Services Group, Developer Products Division Copyright © 2009, Intel Corporation. All rights reserved. Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 3 *Other brands and names are the property of their respective owners. 4 Automatically Select the Right Mechanisms What is Ct Technology? Solution: take a single abstract specification of latent parallelism and data • A generalized data parallel programming solution that locality and use automation to transform it into multiple implementations frees application developers from dependencies on that can exploit all these mechanisms. particular hardware architectures. User specifies: • A system that integrates with existing development tools to allow parallel algorithms to be specified at a high level. • A dynamic compiler and runtime that translates high-level Platform implements: specifications of computations into efficient parallel Example implementation uses: •Two cores implementations that can take advantage of both SIMD •Four-way vectorization and thread-level parallelism, as well as accelerators. • Memory latency hiding with streaming • A system that allows an application developer to combine performance, portability, and productivity. Actual distribution of work depends on hardware Software & Services Group, Developer Products Division Software & Services Group, Developer Products Division Copyright © 2009, Intel Corporation. All rights reserved. Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 5 *Other brands and names are the property of their respective owners. 6 What value does it provide? The Ct Runtime C++ APIs Application Other Language Single source calling Ct APIs Single source Productivity •Intel Ct Technology Bindings ..IntegratesIntegrates withwith existingexisting toolstools offers a standards ..Applicable to many problem domains compliant C++ library… ..Safe by default: maintainable …backed by a runtime CPU with CPU Compute Performance Co-processor SSESSE 44 Co-processor ..Efficient and scalable •Runtime generates and LRBniLRBni ..Harnesses both vectors and threads manages threads and ..Eliminates modularity overhead of C++ vector code, via – Machine independent optimization AVXAVX Hybrid Portability ProcessorProcessor ..High-level abstraction – Offload management ..Hardware independent – Machine specific code ..ForwardForward scalingscaling generation and optimizations – Scalable threading runtime (based on TBB!) CPU Accelerator Future Software & Services Group, Developer Products Division Software & Services Group, Developer Products Division Copyright © 2009, Intel Corporation. All rights reserved. Copyright © 2009, Intel Corporation. All rights reserved. 7 8 *Other brands and names are the property of their respective owners. *Other brands and names are the property of their respective owners. 8 Ct Language Introduction What can it be used for? Bioinformatics Visual computing • Genomics and sequence analysis • Digital content creation (DCC) • Molecular dynamics • Physics engines and advanced rendering Engineering design • Visualization • Finite element and finite difference simulation • Compression/decompression • Monte Carlo simulation Signal and image processing Financial analytics • Computer vision • Option and instrument pricing • Radar and sonar processing • Risk analysis • Microscopy and satellite image processing Oil and gas Science and research • Seismic reconstruction • Machine learning and artificial intelligence • Reservoir simulation • Climate and weather simulation • Planetary exploration and astrophysics Medical imaging Enterprise • Image and volume reconstruction Database search • Analysis and computer aided detection (CAD) • • Business information Software & Services Group, Developer Products Division Software & Services Group, Developer Products Division Copyright © 2009, Intel Corporation. All rights reserved. Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 9 *Other brands and names are the property of their respective owners. 10 Data Spaces Ct Data Objects The basic type in Ct is the vector, named as Vec C/C++ space Ct space – Vec-s are managed by the Ct runtime – Vec-s are single-assignment vectors – Vec-s are (opaquely) flat, multidimensional, sparse, or nested – Vec values are created & manipulated exclusively through Ct API copyin Declared Vecs are simply references to immutable values C(++) Vec<F64> doubleVec; // doubleVec can refer to any vector of Ct code // doubles … doubleVec = src1 + src2; … copyout doubleVec = src3 * src4; Software & Services Group, Developer Products Division Software & Services Group, Developer Products Division Copyright © 2009, Intel Corporation. All rights reserved. Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 11 *Other brands and names are the property of their respective owners. 12 Moving Data In and Out of Ct Ct Operators - Vector Element-wise operators Bind data with Ct name using Vec constructors Unary Operators Vec<F32> prices(options, numOptions); // copy in from a C array A = ~B; // bitwise not of each element of B Vec<I8> red(image, length, 4); // copy element with a stride of 4 A = exp(B); // compute the exp() of each element of B Vec2D<I32> intVec( img, width, height); // A vector initialized to all -1s Binary Operators A = B + C; // an element-wise sum of B & C Define the data behavior in the kernel’s signature D = max(E, F); // an element-wise maximum of E and F G = 2*H; // element-wise multiplication of H and the scalar 2 • Pass-by-value – means copying in • Pass-by-reference – means both copying in and copying out Ternary Operators A = select( mask, B, C ); • Dynamic Compiler tries to recognize pure copying out A = select( mask, B, 0.f ); Software & Services Group, Developer Products Division Software & Services Group, Developer Products Division Copyright © 2009, Intel Corporation. All rights reserved. Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 13 *Other brands and names are the property of their respective owners. 14 Ct Operators - Vector Reduce / Scan Ct Operators - Vector Permutation Operators Reductions (e.g. aggregation, collective communication) Shift Default Value // Sum all the element of B A = shift(B, 1); A = addReduce(B); A = shiftSticky(B, 1); // Some common cases for BOOLEAN Vec<Bool> B; //TRUE if all elements of B are TRUE Rotate Bool alltrue = all(B); //TRUE if at least one element of B is TRUE A = rotate(B, 1); Bool nonzero = any(B); 0 Scans Gather / Scatter // calculate the prefix sum of B A = B [ vIndex ]; A = addScan(B); A = scatter( B, vIndex, C ); Software & Services Group, Developer Products Division Software & Services Group,