C-DAC Activities on Many Cores and Accelerators
C-DAC HPC Activities & Experience on Accelerators

Goldi Misra
Group Coordinator & Head, HPC Solutions Group
C-DAC, India

Thematic R & D Areas of C-DAC
• High Performance Computing & Grid Computing
  • Hardware, Software, Systems, Applications, Research, Technology, Infrastructure
• Multilingual Computing and Heritage Computing
  • Tools, Fonts, Products, Solutions, Research, Technology Development
• Health Informatics
  • Hospital Information System, Telemedicine, Decision Support System, Tools, Traditional Knowledge-base and DSS for Medicine
• Software Technologies including FOSS
  • FOSS, Multimedia, ICT for masses, E-Governance, Geomatics, ICT4D
• Professional Electronics including VLSI & Embedded Systems
  • Digital Broadband and Wireless Systems, Network Technologies, Power Electronics, Real-Time Systems, Control Electronics, Embedded Systems, VLSI/ASIC Design, Agri Electronics, Strategic Electronics
• Cyber Security & Cyber Forensics
  • Cyber Security tools, technologies & solution development, Research & Training

Education and Training form an important component of C-DAC activities, cutting across the above Thematic Areas.

Kaleidoscope of C-DAC Products

Spectrum of HPC Activities
• Technologies
• Trainings
• Systems
• Solutions
• National Facilities
• Applications

PARAM Series of Supercomputers
PARAM YUVA

Indian Supercomputing Scenario
• 1990: C-DAC’s PARAM 8000, India’s first Gigascale Supercomputer
• 1990-2000: Parallel Initiatives: ANUPAM (BARC), PARAM (C-DAC), ANURAG (DRDO), Flowsolver (NAL)
• 2000-2007: Several HPC facilities set up, including those at C-DAC, IISc, BARC, NAL, CMMACS, DRDO, NCMRWF
• 2002: C-DAC’s PARAM Padma, India’s first Terascale Supercomputer, launched (Rank 171 in Top 500 List)
• 2007: CRL’s EKA ranks 4th in Top 500 List; 9 Terascale systems from India in Top 500 List
• 2008: C-DAC’s PARAM Yuva launched (Rank 68 in Top 500 List)
• 2010: Only 4 systems from India in Top 500 List, as against 41 systems from China; India’s best system ranked 47 in Top 500 List
• 2012: Govt. of India takes initiative for a big leap in Supercomputing

HPC Activities @ C-DAC

National HPC Facilities
• NPSF @ Pune (1998): PARAM 10000 system
• CTSF @ Bangalore (2003): PARAM Padma system
• NPSF @ Pune (2009): PARAM Yuva system

HPC Applications: 1988-2011
[Figure: timeline of HPC application areas across the 1st to 4th Missions (’88, ’91, ’95, ’98, ’00, ’02, ’08, ’11). Areas shown: CFD (launch vehicle, IC engine); electromagnetics; weather forecasting (T80, T172, RTWS, WRF); seismic modelling (inversion, pre- and post-stack migration, 1D to 2D & 3D models); bioinformatics (protein structure, protein folding from 1 ns to 300 ns, REMD, MEME); evolutionary computing; fracture mechanics (FRP, smart composites); structural engineering; Garuda; 54 TF.]

• InClus – HPC cluster building toolkit
• CHReME – HPC Resource Management Engine
• ONAMA – HPC package for academic institutions
• Parallel File System

InClus: Integrated Cluster Solution
InClus addresses the technical challenges of HPC cluster deployment and makes cluster building easy.
• Web and desktop based GUI
• Provisioning of operating systems on physical as well as virtual machines: RHEL 5.x, RHEL 6, CentOS 5.x, CentOS 6.x
• Development platform: compilers, debuggers
• Scheduler and resource manager
• Policy based accounting
• High availability support
• Remote console and powerful shell support
• Quickly set up and control management node services: DNS, HTTP, DHCP, TFTP
• User management
• GPU support
• Log monitoring
• Critical error/warning reporting via web interface
• SMS/mail alerts for checking job status

CHReME: C-DAC’s HPC Resource Management Engine
CHReME addresses the challenge of efficient and easy usage and management of the resources of HPC systems.
• The CHReME portal is an end-user job submission, management and monitoring tool that works with various schedulers or workload managers such as Torque, OpenPBS, Sun Grid Engine, Moab, LoadLeveler, etc.
• Timely e-mail notification of job status; personalized job list and job status information
• Secure, credential-specific access on the web through HTTPS
• Allows users to configure their execution environment through selection of compilers and libraries, scheduling parameters, etc.
• Portals specific to scientific & research applications

ONAMA
Mission: “Equipping Premier Academic Institutions with top of the class HPC solutions from C-DAC packaged with open source software and world class services. This would enable the Premier Academic Institutions to benefit in terms of service delivery and affordability.”
ONAMA is an integrated package that opens a new door for future technocrats, giving them a quantum leap in developing a firm understanding of several engineering disciplines through HPC. It comprises a well-selected set of parallel and serial applications and tools across engineering disciplines such as Computer Science, Mechanical, Electronics and Communication, Electrical, Civil and Chemical engineering. It also includes a number of NVIDIA CUDA enabled applications in domains such as molecular dynamics and physics.
• Parallel, Multicore and Manycore Programming
• System Administration & Management
• Network Security & Audits
• Storage Management Technologies
• Facility Operations Management and Maintenance
• GPU based Programming
• HPC User Symposiums
• C-DAC Certified HPC Professional

Indigenous capability in
• Engineering & managing of large supercomputing systems and national supercomputing facility
• HW & SW skills in designing System Area Network
  • Chip/PCB/system design skills (HW)
  • Networking stack & system software (SW)
  • Prototyping/Validation/Certification/Benchmarking/Training
• HW & SW skills in designing RC accelerators
  • RC HW having up to 12 million logic gates for computing, with different host interfaces
  • Porting applications/algorithms as HW circuits to achieve large speed-ups
  • SW ecosystem design for various operating systems

Indigenous capability in (contd…)
• Porting and scaling applications on large clusters
• Several collaborative projects in Science & Engineering research
• Increase of HPC user community in the nation
• Publications

Activities on Intel Many Integrated Core (MIC) Architecture

Knights Ferry Co-Processor Card
• 1.2 GHz, up to 32 cores, 2 GB GDDR5, 4 threads/core, 300 W, 45 nm process
• MIC Platform Software Stack (MPSS) 1.0 and 2.0
• Development tools: Intel Fortran & C++ Compilers, Intel MPI and OpenMP, Intel MKL, IPP, TBB, ArBB, Cilk Plus, support for Eclipse IDE, OpenCL support in future

Expected Specifications of Knights Corner
• More than 50 cores per chip
• 22 nm process size
• ~1 TF

• Mathematical Algorithms: Mandelbrot Set (an example of a simple mathematical definition leading to complex behavior)
• Molecular Dynamics: MD_OPENMP
• Oceanography: Tsunami-N2 (numerical simulation program using linear theory in the deep sea and shallow water theory in shallow sea and on land, with constant grid length in the whole region)
• Astrophysics: CAMB (Code for Anisotropies in the Microwave Background; computes cosmic microwave background spectra given a set of input cosmological parameters)

• Scalability is linear and results are encouraging
• Based on familiar x86 architecture
• Runs with standard, existing programming tools and methods
• Minimal porting effort
• No reprogramming for native compilation and execution
• Directive based offloading
• Availability of tools (profilers, debugging, monitoring etc.)

We intend to work on the commercial product KNC to get an exact idea of performance, scalability etc.

Application Accelerators Facts
• Long-term viability of the technology
  • Technology is changing at a fast pace
  • Many technology providers do not stay in business long enough
• Application development methodology and tools
  • Differ substantially from conventional multi-core programming
  • Applications can be tuned to achieve good performance
  • Require a deep study of the underlying hardware architecture to achieve good performance
  • Codes are platform dependent
• Emerging standards
  • OpenCL for many-core architectures and GPUs
  • OpenFPGA efforts for reconfigurable computing

• 2nd and 3rd gen Reconfigurable Computing (RC) platform
• Uses RC hardware with state of the art FPGAs
• RC hardware has up to 12 million logic gates for computing, with different host interfaces
• Avatars – HW routines/libraries
• Varada – APIs, kernel agent, Linux support
• Eco-friendly HPC solution

[Diagram: C-DAC’s Expertise in C-DAC RC: Digital Systems Design; System Software; Scientific Application Design; Hardware Library Design (Avatars); PCB Design & Assembly.]

One of the enabling technologies useful in RC is the field-programmable gate array (FPGA). Putting FPGAs on add-on cards or motherboards allows them to serve as compute-intensive co-processors. FPGAs can be re-configured over and over again to perform a multitude of operations. This enables application-specific, dynamically "programmable" hardware accelerators.
Application: Smith-Waterman Sequence Search
Nodes: HP DL580G5, Quad Core Quad Socket Xeon 2.93 GHz
Database: Protein

Query          Software (256 cores)   RC (16 Cards)    Speed-Up per card in terms of cores
AAN10358       2 hr 18 min 31 sec     22 min 8 sec     100.1
NP_597681      2 hr 41 min 2 sec      25 min 39 sec    100.4
XP_001065955   3 hr 8 min 28 sec      29 min 47 sec    101.2

[Figure: bar chart of Time: 3 hrs 8 min on 16 nodes (256 Cores) vs. 29 min on 16 RC.]

[Figure: FPGA vs. CPU Frequency (MHz), 0 to 3000, by Year, 1990 to 2008. Source: Intel, IBM, Xilinx, Altera datasheets]

• Scientific and engineering applications
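
The "speed-up per card in terms of cores" figures in the Smith-Waterman results can be checked with a few lines of arithmetic. The sketch below assumes (this is a reading of the slide, not something it states) that the column is the wall-clock speed-up multiplied by the number of CPU cores per RC card, i.e. 256 cores / 16 cards = 16. The function names and the `RUNS` dictionary are illustrative only; the timings are the ones read from the table.

```python
# Assumption: "speed-up per card in terms of cores" is taken to mean
#   (software time / RC time) * (CPU cores / RC cards).
# With 256 cores and 16 cards, each card is compared against 16 cores.


def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an hours/minutes/seconds wall-clock time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def core_equivalents_per_card(software_s, rc_s, cpu_cores=256, rc_cards=16):
    """Estimate how many CPU cores one RC card stands in for on this run."""
    wall_clock_speedup = software_s / rc_s
    cores_per_card = cpu_cores / rc_cards
    return wall_clock_speedup * cores_per_card


# Timings as read from the table: (software on 256 cores, RC on 16 cards).
RUNS = {
    "AAN10358": (to_seconds(2, 18, 31), to_seconds(0, 22, 8)),
    "NP_597681": (to_seconds(2, 41, 2), to_seconds(0, 25, 39)),
    "XP_001065955": (to_seconds(3, 8, 28), to_seconds(0, 29, 47)),
}

if __name__ == "__main__":
    for query, (software_s, rc_s) in RUNS.items():
        value = core_equivalents_per_card(software_s, rc_s)
        print(f"{query}: {value:.1f} core-equivalents per card")
```

Under this assumption the three queries come out at about 100.1, 100.4 and 101.2, matching the slide: the 16 RC cards finish roughly 6.3 times faster than the 256-core software run, so each card does the work of about 100 cores.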