Large Scale Simulations for Carbon Nanotubes
Author: Syogo Tejima, Noejung Park, *Yoshiyuki Miyamoto, Kazuo Minami, Mikio Iizuka, Hisashi Nakamura and **David Tomanek
Research Organization for Information Science & Technology, 2-2-54 Naka-Meguro, Meguro-ku, Tokyo 153-0061, Japan
*Nanotube Technology Center, NEC Corporation, 34 Miyukigaoka, Tsukuba 305-8501, Japan
**Michigan State University

CARBON NANOTUBE RESEARCH GROUP: Morinobu Endo, Eiji Osawa, Atushi Oshiyama, Yasumasa Kanada, Susumu Saito, Riichiro Saito, Hisanori Shinohara, David Tomanek, Tsuneo Hirano, Shigeo Maruyama, Kazuyuki Watanabe, Takahisa Ohno, Yoshiyuki Miyamoto, Mari Ohfuti, Hisashi Nakamura

Proceedings of the Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region (HPCAsia'04), 0-7695-2138-X/04 $20.00 © 2004 IEEE

ABSTRACT

Computational approaches have brought powerful new techniques for understanding chemical reactions and materials physics, alongside experimental and theoretical methods, and they are playing a crucial role in nanotechnology. The growth in applications of such codes has enabled us to reveal the properties of molecules and atoms interacting with complex surrounding environments and to find new materials.

We developed molecular modeling codes based on the tight-binding (TB) approach, the conventional density functional method (DFT), and time-dependent DFT. These codes have been used for many phenomena as the need arose.

For nano-material design, we have challenged large-scale simulations of up to ten thousand atoms without assuming spatial symmetry or homogeneous conditions. The TB method is suitable for treating a large number of atoms; thus we first concentrated on optimizing the TB code.

The Carbon-Recursive-Technique-Molecular-Dynamics code (CRTMD) is tight-binding software specialized for carbon systems with order-N scaling. Through efficient optimization by parallelization and vectorization on the Earth Simulator, we achieved a performance of 7.1 Tera flops on 435 nodes (3480 processors) in a simulation of the thermal conductivity of a carbon nanotube with 48,600 atoms. The Earth Simulator thus opens the possibility of large-scale, realistic simulations for finding and creating novel nano materials.

Keywords: Large scale simulation, TB theory, ab initio theory, DFT, Carbon Nanotube

1. INTRODUCTION

Carbon nanotubes and fullerenes have been expected to make a breakthrough in new material development for nanotechnology. A considerable number of potential applications have been reported, including semiconductor devices, nano-diamond, batteries, super-strong threads and fibers, and so on.

For the material design of nano-carbon systems, it is required to grasp material properties at the nano scale. Microscopic experiments are quite limited by the difficulty of measurement in nano-scale space and by the high cost of advanced microscopes such as HRTEM or STM. Numerical simulation based on quantum mechanics has therefore turned out to be a very efficient methodology. Trial simulations under many different conditions give us valuable information on material properties and design routes. However, to simulate nano-scale phenomena over realistic time and length scales, it is necessary to deal with heavy computation. According to our estimation, simulating material formation from the atomic scale up to the sub-micron scale, which would be applicable to sensor-class devices, would require peta-flops-scale computing to treat a 10^8-atom system with the tight-binding method. Recent high-performance computing provides large-scale simulations of up to 10^4 atoms. However, high-performance computational facilities are at present available only to a limited number of researchers and designers, and their capability for materials science is also restricted in performance.
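The peta-flops estimate above can be illustrated with a rough back-of-the-envelope calculation. Every constant below is an assumption chosen for illustration, not a figure from this paper; the only structural input is that an order-N tight-binding MD step costs a roughly fixed number of floating-point operations per atom, so total cost scales linearly with system size.

```python
# Rough scale estimate for the peta-flops claim in the Introduction.
# All constants are assumed for illustration, not taken from the paper.
flops_per_atom_step = 1.0e6     # assumed cost of one order-N TB-MD step per atom
n_atoms = 1.0e8                 # sub-micron, sensor-class system size (from the text)
n_steps = 1.0e6                 # assumed number of MD steps for a realistic time span

total_flops = flops_per_atom_step * n_atoms * n_steps   # 1e20 operations
petaflops_machine = 1.0e15      # assumed sustained flop/s of a peta-scale system
runtime_seconds = total_flops / petaflops_machine       # 1e5 s, roughly a day
```

Under these assumed constants, a 10^8-atom tight-binding run lands at about 10^20 operations, i.e. around a day of sustained peta-flops computing, which is the scale the text argues for.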
Thus, a collaboration among experimental, theoretical, and simulation researchers launched this challenge of large-scale simulation for nano-carbon materials. In this project we developed and optimized a parallelized and vectorized MD code for a massively parallel vector supercomputer, the Earth Simulator. Furthermore, using the CRTMD, we conducted simulations of the thermal properties of nanotubes and of some nano-manufacturing processes.

The CRTMD is an order-N method. Speeding it up by parallelization and optimization on more than 1000 processors of the Earth Simulator requires new algorithms to reduce the heavy computation time and cost. By optimizing the code, we achieved a performance of 7.1 Tera flops in a thermal conduction simulation. Furthermore, some example cases are reported to show the applicability of large-scale simulation to materials science and design.

2. PERFORMANCE OF CODE

2.1 Outline of the CRTMD Code

In this section, an outline of the CRTMD code, whose flow chart is given in Fig. 1, is explained. The main parts of the code are as follows.

(1) Cluster
To search for the pairs of interacting atoms, the CRTMD introduces the concept of a cluster. All atoms within the cut-off radius around a central atom are called the first-cluster atoms of that atom. The central atom interacts only with its first-cluster atoms; thus the tight-binding Hamiltonian matrix elements in terms of the central atom are non-zero only for the first-cluster atoms.

Figure 1. Flow chart of CRTMD

(2) Matrix element
The tight-binding Hamiltonian and overlap matrices are calculated using the Slater-Koster method for the four orbital states (s, px, py, pz) [2]. The matrix elements coupling two atoms can be expressed as a 4x4 matrix. Therefore, the total Hamiltonian forms a 4Nx4N matrix, where N is the number of cluster atoms, not the total number of atoms.

(3) Green's Function
In the process of calculating the band energy, it is necessary to obtain the local density of states, D(i,E). D(i,E) is given by the imaginary part of the Green's function matrix G(E):

  G(E) = (E - H_TB)^(-1)
       = [ E-a_0   -b_1     0      0    ... ]^(-1)
         [ -b_1    E-a_1   -b_2    0    ... ]
         [  0      -b_2    E-a_2  -b_3  ... ]
         [  ...    ...     ...    ...       ]          ...(1)

where a_n and b_n are the elements of the tridiagonalized Hamiltonian and the recursion technique is applied.

Since almost all physical information is given by the lower orders of the electronic density of states (the first and second orders), it is sufficient to obtain the recursion coefficients a_n and b_n from n = 0 up to n = 3 in Eq. (1). The cluster concept and the recursion technique make the computational load of the TB method fall off.

(4) Integration
The electronic states are derived from the Green's function in Eq. (1). The Fermi energy is determined as the maximum occupied energy. The total band energy of the system is given by integration over the occupied energies from the zero point up to the Fermi energy:

  E_band = Sum_i Integral_0^{E_F} E D(i,E) dE          ...(2)
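To make the recursion scheme of Eqs. (1) and (2) concrete, the sketch below evaluates a diagonal Green's function element from a continued fraction truncated at n = 3 and then integrates the band energy over the occupied states. The coefficients a_n, b_n, the broadening eta, and the Fermi level are placeholder values for illustration, not data from the paper.

```python
import numpy as np

def local_dos(E, a, b, eta=0.1):
    """D(E) = -(1/pi) Im G00(E), with G00 from the truncated continued
    fraction G00(E) = 1 / (E - a0 - b1^2 / (E - a1 - b2^2 / ...)).
    a, b are illustrative recursion coefficients, not values from the paper."""
    z = E + 1j * eta                  # small imaginary part off the real axis
    g = z - a[-1]                     # innermost level of the fraction
    for n in range(len(a) - 2, -1, -1):
        g = z - a[n] - b[n + 1] ** 2 / g
    return -(1.0 / g).imag / np.pi

# Recursion truncated at n = 3, as in the text: coefficients a0..a3, b1..b3.
a = [0.0, 0.2, -0.2, 0.0]             # hypothetical a_n
b = [0.0, 1.0, 1.0, 1.0]              # hypothetical b_n; b[0] is unused

energies = np.linspace(-4.0, 4.0, 801)
dE = energies[1] - energies[0]
dos = np.array([local_dos(E, a, b) for E in energies])

# Eq. (2) for one site: band energy as the integral of E * D(E)
# over occupied energies up to an assumed Fermi level.
E_fermi = 0.0
occ = energies <= E_fermi
E_band = float(np.sum(energies[occ] * dos[occ]) * dE)
```

Truncating the fraction at n = 3 corresponds to keeping the four recursion levels the text says suffice; the broadening eta stands in for the infinitesimal imaginary part that moves the poles off the real axis.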
2.2 Parallelization

The parallelization of the CRTMD code was first developed on the SR2201 and SR8000 at the Information Technology Center (ITC), the University of Tokyo.

The particle-decomposition method is suitable for parallelizing a tight-binding code based on the atomic picture. The atomic loop that calculates the energy and force on each atom was parallelized with MPI. Using mpi_allgather, only the force data obtained by each processor are transmitted among all processors. After this communication, all atomic positions and velocities are calculated from the previous data and the new force data, with no further exchange between processors. To attain high parallel performance on 1000 processors, it is necessary to reduce both the volume and the number of transmissions to the minimum.

2.3 Vectorization

The main targets for vectorization are as follows:
* Tridiagonalization of the Hamiltonian matrix
* Integration of band energy

(1) Tridiagonalized Hamiltonian
The tridiagonalization procedure could be vectorized after a simple subroutine expansion in the outer loop. However, we could not obtain good performance, because the loop contains an inner-product calculation with a short vector length. Therefore, the loop expansion explained here was also applied to other loops in the tridiagonalization process of the Hamiltonian calculation.

(2) Band Energy by Integration
The loop structure of the integration process comprises seven nested DO loops and GO TO loops (the latter judging convergence), as shown on the left side of Fig. 2. This structure is coded as a subroutine (function) call hierarchy four levels deep. The loop over the xyz directions is outermost, with the loop over cluster atoms inside it. Inside the cluster-atom loop, a DO loop and a GO TO loop judge convergence, and the Gauss-integration loop lies further inside. Within the convergence-judging loop are the loop over orbitals (= 4) and the loop of the recursive method.

Among these loops, three, namely those over the directions, the cluster atoms, and the Gauss integration, have sufficient loop length and no data dependency. However, since four loops including the GO TO loop remain inside them, vector processing cannot be performed as is. The programs were therefore largely restructured as follows; the rearranged loop structure is shown on the right side of Fig. 2. First, the recursive-method loop at the innermost level was expanded into the higher-rank loop. Next, the convergence-judging part was moved out of the vectorized loop. Finally, the loops over the Gauss integration, the cluster atoms, and the directions were unified and taken as the vectorization loop.

Figure 2. Loop structure before and after optimization (loop expansion of the recursive method; exchange and combination of the direction, cluster-atom, and Gauss-integration loops; convergence GO TO loop moved outside).

2.5 Execution Profile before optimization.
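The restructuring described in (2) can be sketched in a language-neutral way. The example below fuses three independent loops into one long vector axis and hoists the convergence judgment out of the vectorized update, mirroring the Fig. 2 transformation; the data sizes and the update rule are placeholders, not the CRTMD computation.

```python
import numpy as np

# Schematic of the Fig. 2 restructuring: the direction, cluster-atom, and
# Gauss-integration loops (independent, with ample combined length) are
# fused into one long axis updated as a single vector operation, while the
# convergence test is judged once per sweep, outside the vectorized loop.
# The fixed-point update x <- (x + target) / 2 is a stand-in for the real one.
rng = np.random.default_rng(0)
n_dir, n_atoms, n_gauss = 3, 64, 20
size = n_dir * n_atoms * n_gauss          # fused loop length: 3 * 64 * 20 = 3840
x = rng.random(size)
target = rng.random(size)

converged = False
for _ in range(200):                      # scalar outer iteration
    x_new = 0.5 * (x + target)            # vector update over the fused axis
    converged = np.max(np.abs(x_new - x)) < 1e-12   # judged outside the vector op
    x = x_new
    if converged:
        break
```

The design point is the same one the paper makes: the long fused axis gives the vector units full-length operations, and removing the per-element GO TO convergence branch is what makes the inner loop vectorizable at all.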