Computational Techniques for Solving the Eigenvalue Problem for Semiconductor Bandstructure Calculation

Samuel SMITH University of Florida samuelsmith@ufl.edu

Abstract

The tight binding model, used to efficiently model nanoscale systems, is computationally bound by the symmetric matrix eigenvalue problem. We present an overview of semiconductor crystallography and computational techniques to efficiently model these crystal structures. Using a generated crystal structure, we calculate the tight binding Hamiltonian matrix that governs the system. We then explore the computational tools for calculating the eigenvalues of this matrix and benchmark several popular packages. Finally, we use these tools to compute the bandgap for quantum dots of various dimensions.

CONTENTS

I Crystal Properties and Generation
    I-A Basis and primitive vectors
    I-B Schrödinger equation and the Bloch theorem
    I-C Zincblende lattice
    I-D Wurtzite lattice
    I-E Adjacency matrix generation
    I-F Asymptotically optimal connectivity mapping algorithm

II Generation of the Tight Binding Hamiltonian
    II-A Same atom terms
    II-B Passivation of surface effects
    II-C Nearest neighbor terms

III Eigenvalue Computation
    III-A Overview of the eigenvalue problem for sparse symmetric matrices
    III-B Possible solutions and selection criteria
    III-C ARPACK
    III-D Trilinos and Anasazi
    III-E Other Python features for future work

IV System and benchmarks
    IV-A Overview of test system
    IV-B High Performance LINPACK Benchmark
    IV-C Comparison of eigenvalue solvers

V Bandgap Calculation for Quantum Dots

VI Conclusions and Future Work

References

Appendix

I. CRYSTAL PROPERTIES AND GENERATION

A. Basis and primitive vectors

All major semiconductors used in industry today are crystalline materials [1]. The key feature of crystals that differentiates them from other solid matter is that they are spatially periodic. This allows the structure of the crystal to be completely described by a single unit cell. A crystal can be mathematically described using primitive vectors $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$. The full lattice can be determined by integral combinations of these primitive vectors, such that:

$$\mathbf{R}' = \mathbf{R} + m_1\mathbf{a}_1 + m_2\mathbf{a}_2 + m_3\mathbf{a}_3,$$

where $\mathbf{R}$ is any known lattice point and $m_1, m_2, m_3 \in \mathbb{Z}$. For more complicated crystals like the zincblende structure, described later, two simple lattices are in superposition. For structures like these, we can define basis vectors $\mathbf{b}_1$ and $\mathbf{b}_2$, which describe the relative offsets of the two lattices.

B. Schrödinger equation and the Bloch theorem

In quantum mechanics, the Schrödinger wave equation is used to describe the behavior of systems. For the time-independent case, we write this equation as:

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right]\psi = E\psi,$$

where $\hbar$ is the reduced Planck constant, $m$ is the mass of the particle, $V(\mathbf{r})$ is the spatially dependent potential energy, $\psi$ is the wavefunction, and $E$ is the energy eigenvalue. We define the operator acting on the wavefunction on the left-hand side as the Hamiltonian, $\hat{H}$. Using this definition, we can rewrite the Schrödinger equation as:

$$\hat{H}\psi = E\psi.$$

A useful result for the Schrödinger equation for particles in a periodic potential structure like a crystal is the Bloch theorem [1]. The Bloch theorem states that for particles in a periodic potential, the eigenfunctions of the Hamiltonian will be the product of a plane wave $e^{i\mathbf{k}\cdot\mathbf{r}}$ and some function $u_{\mathbf{k}}(\mathbf{r})$ with the same periodicity as the lattice. We can write this as:

$$\psi_{\mathbf{k}}(\mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\,u_{\mathbf{k}}(\mathbf{r}).$$

We note that:

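$$u_{\mathbf{k}}(\mathbf{r}) = u_{\mathbf{k}}(\mathbf{r} + \mathbf{R}),$$

where $\mathbf{R}$ is the periodicity of the lattice, that is, any lattice translation vector.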
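One consequence follows directly from the Bloch form and the periodicity of the cell function: translating a Bloch state by a lattice vector changes it only by a phase factor, so the probability density has the periodicity of the lattice. A short derivation sketch, using only the two relations stated in this subsection:

```latex
\begin{aligned}
\psi_{\mathbf{k}}(\mathbf{r}+\mathbf{R})
  &= e^{i\mathbf{k}\cdot(\mathbf{r}+\mathbf{R})}\,u_{\mathbf{k}}(\mathbf{r}+\mathbf{R}) \\
  &= e^{i\mathbf{k}\cdot\mathbf{R}}\,e^{i\mathbf{k}\cdot\mathbf{r}}\,u_{\mathbf{k}}(\mathbf{r}) \\
  &= e^{i\mathbf{k}\cdot\mathbf{R}}\,\psi_{\mathbf{k}}(\mathbf{r}),
\qquad\text{so}\qquad
\left|\psi_{\mathbf{k}}(\mathbf{r}+\mathbf{R})\right|^2
  = \left|\psi_{\mathbf{k}}(\mathbf{r})\right|^2 .
\end{aligned}
```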

C. Zincblende lattice

The zincblende structure and the closely related diamond structure are perhaps the most important crystal structures in the semiconductor industry. Silicon and germanium (group IV semiconductors) have a diamond structure, and gallium arsenide (a III-V semiconductor) has a zincblende structure [2]. The only major difference between these structures is that the anion and cation species are the same for the diamond structure and different for the zincblende structure. The primitive vectors for the zincblende structure with lattice constant $a$ and orthogonal basis $[\hat{x}, \hat{y}, \hat{z}]$ are:

$$\begin{aligned}
\mathbf{a}_1 &= \tfrac{1}{2}a\,\hat{y} + \tfrac{1}{2}a\,\hat{z} \\
\mathbf{a}_2 &= \tfrac{1}{2}a\,\hat{x} + \tfrac{1}{2}a\,\hat{z} \\
\mathbf{a}_3 &= \tfrac{1}{2}a\,\hat{x} + \tfrac{1}{2}a\,\hat{y}.
\end{aligned}$$

The basis vectors are:

$$\begin{aligned}
\mathbf{b}_1 &= 0 \\
\mathbf{b}_2 &= \tfrac{1}{4}\mathbf{a}_1 + \tfrac{1}{4}\mathbf{a}_2 + \tfrac{1}{4}\mathbf{a}_3,
\end{aligned}$$

where $\mathbf{b}_1$ is the basis for the cation sites and $\mathbf{b}_2$ is the basis for the anion sites. A silicon quantum dot is shown in figure 1. A quantum dot is a structure that is fully confined in all three dimensions. While it is locally periodic, it has well defined boundary conditions. This leads to it displaying dramatically different properties from bulk material. The dimensions shown in the figure caption refer to the number of iterations for each sublattice (anionic and cationic). A 3 × 3 × 3 crystal has 54 atoms: 27 anion sites and 27 cation sites.
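The atom count quoted above can be reproduced with a few lines of Python. The sketch below is illustrative rather than the generator used in this work: it assumes coordinates are expressed in units of a/4 so that every site is an integer triple, and the names `zincblende_sites`, `B_CATION`, and `B_ANION` are hypothetical.

```python
from itertools import product

# Zincblende primitive vectors in units of a/4 (assumed scaling), so that
# a1 = (a/2)(y + z) becomes (0, 2, 2), and so on.
A1, A2, A3 = (0, 2, 2), (2, 0, 2), (2, 2, 0)
B_CATION = (0, 0, 0)   # b1 = 0
B_ANION = (1, 1, 1)    # b2 = (a1 + a2 + a3) / 4 = (a/4)(1, 1, 1)


def zincblende_sites(n1, n2, n3, basis):
    """Return the sites of one sublattice for n1 x n2 x n3 iterations."""
    sites = []
    for i, j, k in product(range(n1), range(n2), range(n3)):
        sites.append(tuple(basis[d] + i * A1[d] + j * A2[d] + k * A3[d]
                           for d in range(3)))
    return sites


cations = zincblende_sites(3, 3, 3, B_CATION)
anions = zincblende_sites(3, 3, 3, B_ANION)
print(len(cations), len(anions), len(cations) + len(anions))  # 27 27 54
```

Working in integer coordinates keeps site comparisons exact; a physical position is recovered by multiplying by a/4.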

Fig. 1. Silicon quantum dot (3 × 3 × 3)

D. Wurtzite lattice

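In the early stages of the project, the wurtzite structure was also considered along with the similar hexagonal diamond structure. Gallium nitride, a common wide bandgap semiconductor, has this structure. It is possible to produce silicon in this structure [3], but this is not commonly done. The wurtzite lattice has more complicated primitive vectors than the zincblende lattice:

$$\begin{aligned}
\mathbf{a}_1 &= \tfrac{1}{2}a\,\hat{x} - \tfrac{1}{2}3^{1/2}a\,\hat{y} \\
\mathbf{a}_2 &= \tfrac{1}{2}a\,\hat{x} + \tfrac{1}{2}3^{1/2}a\,\hat{y} \\
\mathbf{a}_3 &= c\,\hat{z},
\end{aligned}$$

where $c/a = (8/3)^{1/2}$. The basis vectors are:

$$\begin{aligned}
\mathbf{b}_1 &= \tfrac{1}{3}\mathbf{a}_1 + \tfrac{2}{3}\mathbf{a}_2 \\
\mathbf{b}_2 &= \tfrac{2}{3}\mathbf{a}_1 + \tfrac{1}{3}\mathbf{a}_2 + \tfrac{1}{2}\mathbf{a}_3 \\
\mathbf{b}_3 &= \tfrac{1}{3}\mathbf{a}_1 + \tfrac{2}{3}\mathbf{a}_2 + u\,\mathbf{a}_3 \\
\mathbf{b}_4 &= \tfrac{2}{3}\mathbf{a}_1 + \tfrac{1}{3}\mathbf{a}_2 + \left(\tfrac{1}{2} + u\right)\mathbf{a}_3,
\end{aligned}$$

where $u = 3/8$. For the wurtzite lattice, $\mathbf{b}_1$ and $\mathbf{b}_2$ are basis vectors for the cation sites, and $\mathbf{b}_3$ and $\mathbf{b}_4$ are basis vectors for the anion sites. A wurtzite structure generated by the old MATLAB code used at the start of the project is shown in figure 2.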

Fig. 2. Wurtzite quantum dot

E. Adjacency matrix generation

For computational modeling of a crystal system [4], we begin by iterating through integral combinations of the primitive vectors, added to the appropriate basis vectors where applicable. The number of iterations is bounded by the variables xcells, ycells, and zcells, which specify the number of iterations (largest multiple) allowed for each primitive vector. The sites for the anions and the cations are each stored in an n × 3 array for fast access. A matrix A over GF(2) of dimension n × m is created to store the connections between the n anions and the m cations (usually, m = n). Each element $A_{ij}$ of the matrix is defined as 1 if anion i is a nearest neighbor of cation j and 0 otherwise. This matrix is generated by iterating over every cation site for every anion site.
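The brute-force construction described above can be sketched as follows. This is a minimal illustration, not the project code: it assumes a zincblende crystal with integer coordinates in units of a/4 (so nearest neighbors are separated by a squared distance of exactly 3), and the helper names `sublattice` and `adjacency_matrix` are hypothetical.

```python
from itertools import product

# Assumption: coordinates in units of a/4, so zincblende nearest neighbors
# are separated by a squared distance of exactly 3.
NN_DIST_SQ = 3


def sublattice(cells, offset):
    """Sites offset + i*a1 + j*a2 + k*a3 with a1=(0,2,2), a2=(2,0,2), a3=(2,2,0)."""
    return [(offset[0] + 2 * (j + k), offset[1] + 2 * (i + k), offset[2] + 2 * (i + j))
            for i, j, k in product(range(cells), repeat=3)]


def adjacency_matrix(anions, cations):
    """Brute-force n x m 0/1 matrix: one distance test per (anion, cation) pair."""
    matrix = [[0] * len(cations) for _ in anions]
    for i, an in enumerate(anions):
        for j, cat in enumerate(cations):
            dist_sq = sum((p - q) ** 2 for p, q in zip(an, cat))
            if dist_sq == NN_DIST_SQ:
                matrix[i][j] = 1
    return matrix


cations = sublattice(2, (0, 0, 0))
anions = sublattice(2, (1, 1, 1))
A = adjacency_matrix(anions, cations)
print(len(A), len(A[0]), sum(map(sum, A)))  # rows, columns, number of bonds
```

The nested loop makes the cost explicit: every anion is tested against every cation, so doubling the number of atoms quadruples the work.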

F. Asymptotically optimal connectivity mapping algorithm

Generating or iterating over a connectivity matrix is inherently inefficient, as we must perform an operation for every cation for every anion. This has $O(n^2)$ complexity, where n is the number of atoms in the system. As n grows large, the calculation quickly becomes intractable. This looping structure is made even worse by the slow loop execution of interpreted programming languages like MATLAB and Python. We present a new crystal generation algorithm that avoids these problems.

We begin by finding the coordinates of all the atoms in the usual manner, by forming linear combinations of the primitive vectors and applying an affine transformation to offset either the anionic or the cationic sites (this analysis was performed with a simple zincblende structure, but could easily be extended to other crystal types of arbitrary shape). We improved this step slightly by multiplying all the vectors by a constant so that all lattice points have integer coordinates, which allows fast comparisons and eliminates round-off difficulties. There is no harm in doing so because whenever a real distance is needed, another proportionality factor can be applied.

After generating a list of all the sites, we sort the list of cation sites using Timsort, an $O(n \log n)$ sort that can take advantage of any ordering already present in the list. We then iterate over all the anion sites. For each anion, we apply the translation for each of the four possible bond directions and perform an efficient $O(\log n)$ binary search on the sorted list of cations to determine whether that cation is actually present in the system. If the atom is found, it is recorded in an adjacency list.

Adjacency lists are used because low degree graphs (crystals with this sort of connectivity are essentially isomorphic to low degree non-planar graphs) are more efficiently represented as lists than as matrices. This includes sparse matrices, as it is easier to create an iterator for a list than for a sparse matrix in most programming languages. The list stores only connected sites and can thus be iterated over in linear time. The overall asymptotic time complexity of the adjacency list generation is $O(n \log n)$. Using this data structure, it is possible to generate the sparse Hamiltonian (described later) much faster.

Using a single processor, connectivity lists (including the bonding directions for each site) for a 101306 atom system were generated in just under 40 seconds. Using the old iterative connectivity matrix generation method, this would have taken hours, if not days or weeks, even if it were parallelized to run on many processors.
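The sort-then-search procedure can be sketched in a few lines using Python's built-in `sorted` (Timsort) and the standard `bisect` module. This is an illustrative sketch under stated assumptions, not the project code: coordinates are integers in units of a/4, cations sit at offset (0, 0, 0) and anions at (1, 1, 1), and the names `sublattice`, `find_site`, and `adjacency_lists` are hypothetical.

```python
from bisect import bisect_left
from itertools import product

# Anion -> cation bond displacements in units of a/4 (assumed convention:
# cations at offset (0, 0, 0), anions at offset (1, 1, 1)).
BOND_DIRECTIONS = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]


def sublattice(cells, offset):
    """Sites offset + i*a1 + j*a2 + k*a3 with a1=(0,2,2), a2=(2,0,2), a3=(2,2,0)."""
    return [(offset[0] + 2 * (j + k), offset[1] + 2 * (i + k), offset[2] + 2 * (i + j))
            for i, j, k in product(range(cells), repeat=3)]


def find_site(sorted_sites, key):
    """Binary search; return the index of key in sorted_sites, or -1."""
    pos = bisect_left(sorted_sites, key)
    if pos < len(sorted_sites) and sorted_sites[pos] == key:
        return pos
    return -1


def adjacency_lists(anions, cations):
    """For each anion, list (direction index, cation index) of bonded cations."""
    cations_sorted = sorted(cations)                  # Timsort, O(n log n)
    result = []
    for an in anions:                                 # one pass over the anions
        neighbors = []
        for d, step in enumerate(BOND_DIRECTIONS):    # four candidate bonds
            target = tuple(c + s for c, s in zip(an, step))
            idx = find_site(cations_sorted, target)   # O(log n) lookup
            if idx >= 0:
                neighbors.append((d, idx))
        result.append(neighbors)
    return cations_sorted, result


cations_sorted, lists = adjacency_lists(sublattice(3, (1, 1, 1)),
                                        sublattice(3, (0, 0, 0)))
print(len(lists[0]), len(lists[-1]))  # bonds on the first and last anion
```

Recording the direction index alongside each neighbor means that missing directions identify dangling bonds directly, without a second pass over the crystal.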

II. GENERATION OF THE TIGHT BINDING HAMILTONIAN

The tight binding model, known as the Linear Combination of Atomic Orbitals (LCAO) method in quantum chemistry, is a semiempirical method for calculating the electronic structure of solids [5]. The method is based on creating an orthogonal basis of atomic-like orbitals for each atom in the system and building a Hamiltonian from symmetric interactions between these orbitals. The tight binding model usually considers only interactions between nearest neighbors [6]. Some more advanced simulations based on the model also take into account interactions from second nearest neighbors, but we do not consider these. We build the tight binding Hamiltonian using an $sp^3s^*$ atomic basis. This gives five bands for each atom. The size of the Hamiltonian is thus $5(m + n) \times 5(m + n)$, where m is the number of anions and n is the number of cations. Again, m = n for virtually all cases. Parameters for the $sp^3s^*$ basis are taken from [7]. As most terms in the matrix are zero, we use a sparse matrix representation. The entire Hamiltonian can be represented as a block matrix of the form:

$$H = \begin{bmatrix} H_{aa} & H_{ac} \\ H_{ca} & H_{cc} \end{bmatrix}.$$

A. Same atom terms

As demonstrated in [6], the same atom terms are in the form of a 5 × 5 diagonal matrix. The same atom term $[H_{aa}]$ for each anion can be expressed as:

$$[H_{aa}] = \begin{bmatrix}
E_s & & & & \\
 & E_p & & & \\
 & & E_p & & \\
 & & & E_p & \\
 & & & & E_{s^*}
\end{bmatrix}.$$

The [Hcc] same atom terms for each cation are calculated in exactly the same manner. While this representation works perfectly for most atoms in the crystal, it does not take into account the effects encountered at the boundaries of the crystal.

B. Passivation of surface effects

For quantum dots, there is a very high surface area to volume ratio. It is therefore important to account for surface effects in these very small systems. When an atom on the surface of the crystal does not have four nearest neighbors, the remaining bonds are left dangling under the above definition of $[H_{aa}]$. These dangling bonds can lead to serious errors when computing the energy bandstructure of the crystal. Specifically, a large number of trap states will appear in the middle of what should be an otherwise empty bandgap region. Determining whether passivation is needed for each individual atom was greatly accelerated by the adjacency list method described above. In generating the adjacency lists, which bond directions are connected or left dangling is precalculated for every atom.

There are two common approaches to passivation. Hydrogen atoms could be attached to all the dangling bonds [6]. This approach requires computing a large number of interactions between the crystal atoms and the passivating atoms. Instead, we employ the explicit hybrid orbital passivation (EHOP) method [8]. This method involves converting the same atom terms with dangling bonds to an $sp^3$ molecular orbital basis, adding energy to passivate the bonds in all required directions, and transforming back to the atomic orbital basis. For anions, this linear transformation is:

$$[V]_{A\to H} = \begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & -1 & -1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 1 & -1
\end{bmatrix}.$$

For cations, this transformation is:

$$[V]_{C\to H} = \begin{bmatrix}
1 & -1 & 1 & 1 \\
1 & 1 & -1 & 1 \\
1 & -1 & -1 & -1 \\
1 & 1 & 1 & -1
\end{bmatrix}.$$

We transform from the atomic basis to the hybridized basis by applying the transformation:

$$[H]_{\mathrm{Hybrid}} = [V]_{A/C\to H}\,[H]_{\mathrm{Atom}}\,[V]_{A/C\to H}^{\dagger}.$$

To passivate the bond, we add some term $\delta_{sp^3}$ to the diagonal element corresponding to the direction of the bond to be passivated, as in [6]. We reverse the transformation to go back to the atomic basis:

$$[H]_{\mathrm{Atom}} = [V]_{A/C\to H}^{\dagger}\,[H]_{\mathrm{Hybrid}}\,[V]_{A/C\to H}.$$

It is important to note that these operations are performed only on $H_{1:4,1:4}$ and that the $H_{5,5} = E_{s^*}$ term is unaffected, as this high energy term does not contribute to any trap states. For numerical performance, the transformed matrix is forced to be symmetric to guarantee that a symmetric solver can be used on the problem. There is usually some slight roundoff that prevents this matrix from being completely symmetric in this step. This problem was not encountered elsewhere in the Hamiltonian generation.
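The passivation steps for a single anion can be sketched with NumPy as follows. This is an illustrative sketch, not the project code: the function name `passivate` and its arguments are hypothetical, the 1/2 normalization follows the appendix listing (it makes the transformation orthogonal), and the sketch adds the shift as described in the text, whereas the appendix listing overwrites the hybrid diagonal element with 30 eV. Symmetry is enforced here by averaging with the transpose, which is one of several reasonable choices.

```python
import numpy as np

# Anion atomic -> hybrid transformation. The 1/2 factor (as in the appendix
# listing) makes V orthogonal, so V.T undoes V.
V_ANION = 0.5 * np.array([[1, 1, 1, 1],
                          [1, -1, -1, 1],
                          [1, 1, -1, -1],
                          [1, -1, 1, -1]], dtype=float)


def passivate(e_s, e_p, e_s_star, dangling, delta_sp3, v=V_ANION):
    """Return the 5x5 same atom block with the listed hybrid bonds passivated.

    dangling:  indices (0..3) of hybrid orbitals that have no neighbor.
    delta_sp3: energy shift in eV (illustrative value chosen by the caller).
    """
    h_atom = np.diag([e_s, e_p, e_p, e_p]).astype(float)
    h_hybrid = v @ h_atom @ v.T               # atomic -> hybrid basis
    for b in dangling:
        h_hybrid[b, b] += delta_sp3           # raise the dangling orbital
    h_back = v.T @ h_hybrid @ v               # hybrid -> atomic basis
    h_back = 0.5 * (h_back + h_back.T)        # remove round-off asymmetry
    block = np.zeros((5, 5))
    block[:4, :4] = h_back
    block[4, 4] = e_s_star                    # s* term is left untouched
    return block


# Valence band same atom energies from the appendix listing (eV).
block = passivate(-3.31789, 1.67862, 8.23164, dangling=[0], delta_sp3=30.0)
print(np.round(block, 3))
```

Because the transformation is orthogonal, the trace of the 4 × 4 block increases by exactly the added shift for each passivated bond, which is a convenient sanity check.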

C. Nearest neighbor terms

The nearest neighbor terms in the matrix are computed using the Slater-Koster overlap integrals. To build the matrix, we iterate through the adjacency lists and place the overlap matrix block into the Hamiltonian for each connected pair of atoms. This matrix depends on the direction of the bond between the two atoms through directional cosine terms, which appear in the complete expansion for the terms in the matrix (see [9] or [4] for a complete list of terms). We can write a simplified form of the overlap matrix as:

$$[H]_{ac/ca} = \begin{bmatrix}
V_{ss} & V_{sx} & V_{sy} & V_{sz} & V_{ss^*} \\
V_{xs} & V_{xx} & V_{xy} & V_{xz} & V_{xs^*} \\
V_{ys} & V_{yx} & V_{yy} & V_{yz} & V_{ys^*} \\
V_{zs} & V_{zx} & V_{zy} & V_{zz} & V_{zs^*} \\
V_{s^*s} & V_{s^*x} & V_{s^*y} & V_{s^*z} & V_{s^*s^*}
\end{bmatrix}.$$
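To make the role of the directional cosines concrete, the sketch below builds only the 3 × 3 p-p sub-block of the overlap matrix for a given bond vector. It is an illustration under stated assumptions: the function names are hypothetical, and the two-center form used, $V_{ij} = l_i l_j (V_{pp\sigma} - V_{pp\pi}) + \delta_{ij} V_{pp\pi}$, is the one that appears in the appendix listing, with the valence band parameters taken from there.

```python
import numpy as np


def direction_cosines(bond):
    """Return the direction cosines (l, m, n) of a bond vector."""
    bond = np.asarray(bond, dtype=float)
    return bond / np.linalg.norm(bond)


def pp_block(bond, v_pp_sigma, v_pp_pi):
    """3x3 p-p two-center block: V_ij = l_i l_j (sigma - pi) + delta_ij pi."""
    c = direction_cosines(bond)
    return np.outer(c, c) * (v_pp_sigma - v_pp_pi) + np.eye(3) * v_pp_pi


# Valence band p-p parameters from the appendix listing (eV); the bond
# points along the (1, 1, 1) direction, so l = m = n = 1/sqrt(3).
block = pp_block((1, 1, 1), 2.81127, -0.77005)
print(np.round(block, 5))
```

For a bond along a coordinate axis, the block reduces to a diagonal matrix with the sigma integral along the bond and the pi integral in the two perpendicular directions.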

The original MATLAB/Octave code was used to generate the Hamiltonian. With the natural ordering of the atoms from the crystal generator, before any sorting was applied, the resulting matrix had nonzero terms on only a few diagonals. This matrix structure is shown in figure 3. When the code was ported to SciPy with the optimized crystal generator, the structure of the Hamiltonian changed because of the way the sites were reordered. Although this matrix has elements on almost every diagonal, there was no observed performance penalty for performing operations on it. The SciPy Hamiltonian matrix is shown in figure 4. Should it be necessary, it would be easy to modify the crystal generator to reorder the cation site list and generate a matrix of the original structure without sacrificing any significant amount of performance.

Fig. 3. Zincblende Hamiltonian generated by MATLAB

Fig. 4. Zincblende Hamiltonian generated by SciPy with fast crystal generator

III. EIGENVALUE COMPUTATION

A. Overview of the eigenvalue problem for sparse symmetric matrices

The eigenvalue problem for large matrices has applications all across science and engineering. While the work in this thesis focuses on applications to semiconductor physics, the concepts can be applied across the broad field of scientific computing. Most algorithms for solving extreme eigenvalues for large matrices are based on the concept of Krylov subspace iterations. A Krylov subspace can be constructed by starting with a random vector b and iterating the powers of A:

$$\mathcal{K}_r(A, b) = \operatorname{span}\{b, Ab, A^2 b, \ldots, A^{r-1} b\}.$$

Performing this iteration, it is possible to generate the largest eigenvalues of A and the corresponding eigenvectors. The details of how this works are beyond the scope of this thesis, but a detailed treatment can be found in [10]. The Krylov subspace iteration eigensolvers are designed to extract a small number of eigenvalues and eigenvectors instead of a complete set. The rate of convergence is strongly dependent on the condition number of the matrix. For the tight binding Hamiltonian matrix we generated earlier, the condition number is usually well above 20 for any crystal configuration. While our symmetric matrix is well-conditioned [10], the eigenvalues may not be. Eigenvalues within close proximity of each other can affect solver performance. This is thought to be the fundamental reason for some of the performance issues discussed later. The eigenenergies for our Hamiltonian are likely to be tightly clustered because of the high density of states in the valence and conduction bands, where we are looking for eigenvalues.
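The idea can be illustrated with a small NumPy sketch that builds an orthonormal basis of the Krylov subspace and extracts approximate eigenvalues (Ritz values) from the projected matrix. This is a teaching sketch, not the algorithm implemented in any of the packages discussed here: the function name `krylov_ritz_values` is hypothetical, the test matrix is an arbitrary diagonal matrix, and each new vector is orthogonalized against the previous ones, which spans the same subspace as the raw powers of A but is much better conditioned.

```python
import numpy as np


def krylov_ritz_values(a, b, r):
    """Ritz values of symmetric a on the Krylov subspace K_r(a, b)."""
    n = b.shape[0]
    q = np.zeros((n, r))
    q[:, 0] = b / np.linalg.norm(b)
    for j in range(1, r):
        w = a @ q[:, j - 1]
        for _ in range(2):                       # Gram-Schmidt, repeated for stability
            w -= q[:, :j] @ (q[:, :j].T @ w)
        q[:, j] = w / np.linalg.norm(w)
    projected = q.T @ a @ q                      # r x r Rayleigh-Ritz matrix
    projected = 0.5 * (projected + projected.T)  # remove round-off asymmetry
    return np.linalg.eigvalsh(projected)


rng = np.random.default_rng(0)
# 50 x 50 test matrix with eigenvalues 1, 2, ..., 49 and a well separated 100.
a = np.diag(np.concatenate([np.arange(1.0, 50.0), [100.0]]))
ritz = krylov_ritz_values(a, rng.standard_normal(50), r=20)
print(ritz[-1])  # largest Ritz value, close to the isolated eigenvalue 100
```

With only 20 basis vectors, the well separated extreme eigenvalue is reproduced to high accuracy, while the closely spaced interior eigenvalues are approximated far less well. This mirrors the clustering concern raised above.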

B. Possible solutions and selection criteria

In selecting an eigensolver and the surrounding programming framework, a number of considerations had to be taken into account. The eventual goal is to integrate the eigensolver with a high-level programming language, such as MATLAB, Octave, or Python, so that knowledge of a lower-level programming language, such as C/C++ or Fortran, is not required to build simulations of electronic devices. The greatest concern was the performance of the eigenvalue calculation in a parallel computing environment.

An initial proposal was to use graphics processing units (GPUs) to accelerate the calculation. While GPUs are revolutionizing the scientific computing field in general, it was determined that the technology was not yet mature enough for practical implementation of sparse matrix operations, where random access to the data structure is required. The latest compute solutions from AMD and Nvidia are just beginning to address this issue and should be ready within the next few years to solve this type of problem.

An easy, albeit not particularly interesting, way to parallelize the problem for the specific application of bandstructure computation is to solve different k (wave vector) points on different nodes. For each k point, there is a separate Hamiltonian on which a node can run a serial eigensolver. This is fairly trivial to implement in Python, Octave, and MATLAB (with the purchase of the rather expensive MATLAB Parallel Computing Toolbox). This method only works for structures with enough long range periodicity for the bandstructure to depend on the k point. For the quantum dots examined in this thesis, the energy states are delocalized, and thus this method cannot be implemented. A true parallel solution to the eigenvalue problem must be implemented.

C. ARPACK

The ARnoldi PACKage (ARPACK) is the most popular eigensolver package for large matrices. It is used in the implementation of the eigs function in MATLAB and Octave as well as the equivalent function in SciPy. ARPACK implements an algorithm called the Implicitly Restarted Arnoldi Iteration (IRAI). This is essentially a numerically stabilized version of Krylov subspace iteration. The BSD License used by ARPACK has allowed MathWorks to make their own improvements to ARPACK without contributing these improvements back to the community. As a result, ARPACK development has mostly stagnated. A parallel version of the library exists, but does not appear to be actively maintained. ARPACK is written in Fortran 77, and while it is available for use in many high-level languages, the parallel version is not.
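As an illustration of calling ARPACK from Python, the sketch below uses SciPy's `scipy.sparse.linalg.eigsh` to find the eigenvalues of a sparse symmetric matrix that lie closest to a chosen energy. The matrix is a toy diagonal stand-in with an artificial gap, not a tight binding Hamiltonian, and the use of shift-invert mode (the `sigma` argument) is an assumption made for this example rather than a description of how the calculations in this work were configured.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Toy stand-in for a Hamiltonian: a sparse symmetric matrix whose spectrum
# has a gap between -0.5 and +0.6 (illustrative values, not physical).
valence = -np.linspace(0.5, 5.0, 40)
conduction = np.linspace(0.6, 6.0, 40)
h = diags(np.concatenate([valence, conduction]), 0).tocsc()

# Shift-invert about an energy inside the gap: ARPACK then converges to the
# eigenvalues closest to sigma instead of the extreme ones.
sigma = 0.0
values = eigsh(h, k=6, sigma=sigma, which='LM', return_eigenvectors=False)
values = np.sort(values)

# Gap between the lowest state above sigma and the highest state below it.
gap = values[values > sigma].min() - values[values < sigma].max()
print(values, gap)
```

Shift-invert requires a sparse factorization of the shifted matrix, which trades memory for much faster convergence on interior eigenvalues.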

D. Trilinos and Anasazi

Trilinos is a package developed at Sandia National Laboratories for solving large scale scientific and engineering problems [11]. It is written in C++ and provides an object-oriented framework for all of its subpackages. It was chosen to implement the eigensolver for the bandstructure calculator because it readily interfaces with the NumPy and SciPy packages for Python using the PyTrilinos interface [12]. The Anasazi package in Trilinos contains multiple algorithms for solving eigenvalue problems. Trilinos is available under the GNU LGPL, which offers copyleft protection to the library.

E. Other Python features for future work

Python was chosen for this project as it offers a number of packages that readily enable future work in the area. The NumPy and SciPy packages provide almost a complete replacement for MATLAB. There are also bindings for CUDA and OpenCL, which might enable a GPU-based solver at some point in the future. The Matplotlib package can be used to create very impressive graphics. The tools for making graphical interfaces in Python could be used in education to build easy-to-use programs that simulate quantum dots for quick demonstrations. Python is also freely redistributable, unlike MATLAB.

IV. SYSTEM AND BENCHMARKS

A. Overview of test system

All calculations were performed on a high-performance workstation with two 12-core AMD Opteron Magny-Cours CPUs and 64 GB of DDR3 memory. The workstation also has an Nvidia Fermi architecture GPU for future work in this area, but it was not used in this computation. The operating system is Ubuntu Linux 10.10 Maverick Meerkat.

B. High Performance LINPACK Benchmark

As an initial test of the workstation's performance, the High-Performance LINPACK (HPL) benchmark was executed on the test machine. HPL is the standard benchmark in high-performance computing. It is the metric used by the Top500 supercomputer rankings for the fastest computers in the world. The standard implementation of ATLAS from the Ubuntu package manager was used. For future work, the AMD Core Math Library (ACML) could be used. The benchmark was executed on all 24 cores with a problem size N=14000. The computer's performance was measured at 54.09 GFLOPS (billions of floating point operations per second). This was determined to be consistent with the specifications of the machine.

C. Comparison of eigenvalue solvers

The eigenvalue solvers described earlier were executed on the test system. The problem size was a 10 × 10 × 10 (2000 atom) silicon crystal (shown in figure 5). The ARPACK-based solver was used in GNU Octave. MATLAB was not used due to licensing costs. Comparisons of the eigensolver speed between Octave and MATLAB on other systems revealed no substantial performance gap, as both programs use the same underlying library. The tolerance for the ARPACK solver was set to the machine epsilon, reported to be 2.2204e-16. This was later changed to reflect the tolerance used by the Anasazi solver. The Anasazi Block Krylov-Schur [13] solver was used through the PyTrilinos interface. It was run with 180 blocks, a tolerance of 1.0e-7, and a restart limit of 100. The Block Krylov-Schur algorithm is closely related to the IRAI algorithm used by ARPACK. The Anasazi eigensolver was run on a varying number of cores using MPI.

Eigensolver                        Execution Time (s)   Bandgap (eV)
ARPACK (np = 1, tol = epsilon)     45.76                1.5464
ARPACK (np = 1, tol = 1.0e-7)      38.14                1.5464
Anasazi (np = 1)                   112.7151899          1.546366618
Anasazi (np = 2)                   66.80645108          1.546991891
Anasazi (np = 4)                   37.64097404          1.613904834
Anasazi (np = 6)                   24.90681005          1.546366618
Anasazi (np = 8)                   20.09551597          1.546366618
Anasazi (np = 12)                  16.27054             1.546366618
Anasazi (np = 16)                  14.8668232           1.546991891
Anasazi (np = 20)                  15.559304            1.546366618
Anasazi (np = 24)                  Does not work        N/A

For each calculation, performance was measured using simple wall clock time. In each case, the 60 smallest magnitude eigenvalues were solved for both the conduction and valence band Hamiltonians. By using a priori knowledge of the approximate location of the middle of the bandgap, the bandgap was calculated by finding the difference of the conduction band minimum and the valence band maximum.

Fig. 5. Silicon quantum dot (10 × 10 × 10)

There were a number of issues with the Block Krylov-Schur solver in Anasazi. A full set of 60 eigenvalues was rarely returned due to convergence issues. For a single process, it was significantly slower than the solver in Octave. The reasons for this are not completely understood. One possibility is that the problem is simply not large enough to overcome the overhead of using the distributed data structures provided by the Epetra package in Trilinos. Sometimes, the Anasazi solver would not converge for enough eigenvalues to generate an accurate bandgap result, particularly with the conduction band parameters. This can be seen for the execution with np=4, where the calculated bandgap is not the correct value. This is thought to be a result of tightly clustered eigenvalues in the conduction band. A plot of the calculated eigenvalues (from ARPACK) is shown in figure 6.
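The bandgap extraction step can be sketched as a small helper. The function name `bandgap`, the mid-gap estimate, and the eigenvalue lists below are all hypothetical illustration values, not results from this work. The second call shows how a missing band-edge eigenvalue leads to an overestimated gap, which is the kind of failure discussed above.

```python
def bandgap(cb_eigenvalues, vb_eigenvalues, midgap):
    """Bandgap = conduction band minimum - valence band maximum (eV)."""
    cbm_candidates = [e for e in cb_eigenvalues if e > midgap]
    vbm_candidates = [e for e in vb_eigenvalues if e < midgap]
    if not cbm_candidates or not vbm_candidates:
        raise ValueError("no converged eigenvalue on one side of the gap")
    return min(cbm_candidates) - max(vbm_candidates)


# Hypothetical converged eigenvalues (eV) around an assumed mid-gap energy.
cb = [1.20, 1.30, 1.90]
vb = [-0.90, -0.40, -0.35]
full = bandgap(cb, vb, midgap=0.5)

# If the solver fails to return the true conduction band minimum (1.20 here),
# the next eigenvalue is used instead and the gap comes out too large.
missing_edge = bandgap(cb[1:], vb, midgap=0.5)
print(full, missing_edge)
```

Since an unconverged band-edge state can only be replaced by a state further from the gap, this kind of error always biases the computed bandgap upward.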

Fig. 6. Hamiltonian eigenenergies for a 10 × 10 × 10 quantum dot

V. BANDGAP CALCULATION FOR QUANTUM DOTS

To demonstrate the physical properties of quantum dots, the PyTrilinos bandstructure calculator was run on quantum dots of various sizes and shapes. The results are shown below:

Dimensions      Number of Atoms   Bandgap (eV)
3 × 3 × 3       54                3.38
4 × 4 × 4       128               2.73
5 × 5 × 5       250               2.33
6 × 6 × 6       432               2.05
7 × 7 × 7       686               1.86
8 × 8 × 8       1024              1.73
9 × 9 × 9       1458              1.62
10 × 10 × 10    2000              1.55
15 × 15 × 15    6750              1.56*
5 × 5 × 10      500               1.97
5 × 10 × 5      500               1.97
10 × 5 × 5      500               1.97
1 × 5 × 5       50                3.39
2 × 5 × 5       100               2.91
2 × 2 × 10      80                2.95
20 × 20 × 20    16000             No convergence

The results show a general trend. As the level of quantum confinement increases, so does the bandgap. The results also highlight the difficulty of getting the solver to converge for very large matrices. The 15 × 15 × 15 result is likely invalid, as it does not follow the clearly observed trend of decreasing bandgap with increasing crystal size. There is no obvious physical reason why this might occur. The solver likely did not converge for an eigenvalue near the band edge. No eigenvalues were found at all for the 20 × 20 × 20 case.
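The atom counts and the trend discussed above can be checked with a short script. The data are the cubic-dot rows of the results table; the helper names `atom_count` and `trend_violations` are hypothetical.

```python
# (dimensions, bandgap in eV) for the cubic dots in the results table.
CUBIC_RESULTS = [((3, 3, 3), 3.38), ((4, 4, 4), 2.73), ((5, 5, 5), 2.33),
                 ((6, 6, 6), 2.05), ((7, 7, 7), 1.86), ((8, 8, 8), 1.73),
                 ((9, 9, 9), 1.62), ((10, 10, 10), 1.55), ((15, 15, 15), 1.56)]


def atom_count(nx, ny, nz):
    """Two sublattices (anion and cation) with nx * ny * nz sites each."""
    return 2 * nx * ny * nz


def trend_violations(results):
    """Return dimensions whose bandgap is not below the previous, smaller dot's."""
    flagged = []
    for (_, prev_gap), (dims, gap) in zip(results, results[1:]):
        if gap >= prev_gap:
            flagged.append(dims)
    return flagged


print(atom_count(10, 10, 10))           # 2000
print(trend_violations(CUBIC_RESULTS))  # [(15, 15, 15)]
```

A monotonicity check of this kind is a cheap way to flag results that may come from an unconverged solve before they are interpreted physically.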

VI. CONCLUSIONS AND FUTURE WORK

We presented an overview of the eigenvalue problem and its applications to semiconductor physics. We summarized the features of two software packages that help solve the problem, ARPACK and Trilinos/Anasazi, and demonstrated the relative performance and accuracy of these programs. Using a parallel solver, substantial performance gains were achieved, but issues with making the solvers work reliably prevent use of the work in its current form. Future work should focus on gaining a better understanding of the inner workings of parallel eigensolver algorithms in an effort to generate useful solutions to the bandstructure calculation for large crystals. A further exploration of GPU computing and other hardware-accelerated solutions should also be considered.

REFERENCES

[1] J. Singh, Modern Physics for Engineers. Weinheim, Germany: Wiley-VCH, 2004.
[2] "Crystal lattice structures," 2008.
[3] Y. Zhang, Z. Iqbal, S. Vijayalakshmi, and H. Grebel, "Stable hexagonal-wurtzite silicon phase by laser ablation," Applied Physics Letters, vol. 75, pp. 2758–2760, Nov. 1999.
[4] R. N. Sajjad, "Full band simulation of silicon nanowire field effect transistor," 2008.
[5] D. Vasileska, "Tutorial on semiempirical bandstructure methods," Jul. 2008.
[6] A. Rahman, "Exploring new channel materials for nanoscale CMOS," May 2006.
[7] G. Klimeck, R. C. Bowen, T. B. Boykin, C. Salazar-Lazaro, T. A. Cwik, and A. Stoica, "Si tight-binding parameters from genetic algorithm fitting," Superlattices and Microstructures, vol. 27, no. 2-3, pp. 77–88, 2000.
[8] N. Bernstein, "Surface passivation for tight-binding calculations of covalent solids," Journal of Physics: Condensed Matter, vol. 19, no. 26, 2007.
[9] J. C. Slater and G. F. Koster, "Simplified LCAO method for the periodic potential problem," Phys. Rev., vol. 94, pp. 1498–1524, Jun. 1954.
[10] E. de Sturler, "Eigenvalues and singular values."
[11] M. Heroux, R. Bartlett, V. H. R. Hoekstra, J. Hu, T. Kolda, R. Lehoucq, K. Long, R. Pawlowski, E. Phipps, A. Salinger, H. Thornquist, R. Tuminaro, J. Willenbring, and A. Williams, "An Overview of Trilinos," Tech. Rep. SAND2003-2927, Sandia National Laboratories, 2003.
[12] M. Sala, M. A. Heroux, and D. M. Day, "Trilinos Tutorial," Tech. Rep. SAND2004-2189, Sandia National Laboratories, 2004.
[13] Y. Zhou and Y. Saad, "Block Krylov-Schur method for large symmetric eigenvalue problems," Tech. Rep., 2004.

APPENDIX

Python Code for Bandstructure Calculator

#!/usr/bin/python

# Silicon crystal bandstructure calculator
# This program generates a silicon crystal and calculates its bandgap
# using the tight binding model. The eigensolver is implemented using
# the Anasazi package from Trilinos.
# Samuel Smith 2011

import time
from numpy import *
# from ... import *  # module name missing in the source listing
from scipy.sparse import *
from scipy.io import mmwrite
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d.art3d import Line3D
from matplotlib.backends.backend_agg import FigureCanvasAgg
from PyTrilinos import Epetra, EpetraExt, Anasazi

__author__ = "Samuel J. Smith"
__date__ = "$Apr 7, 2011 7:55:23 PM$"

def generate_crystal(basis, A1, A2, A3, A1iter, A2iter, A3iter):
    # Generate a crystal from a basis vector and a set of primitive vectors
    # Can set the number of iterations in each direction
    volume = A1iter * A2iter * A3iter
    sites = zeros((volume, 3))
    index = 0
    # Calculate the position of all the atoms
    for i in range(A1iter):
        for j in range(A2iter):
            for k in range(A3iter):
                sites[index, :] = basis + i * A1 + j * A2 + k * A3
                index = index + 1
    return sites

def plot_crystal(anSites, catSites, anAdjacency):
    fig = plt.figure()
    ax = Axes3D(fig)
    ax.scatter(anSites[:, 0], anSites[:, 1], anSites[:, 2], marker='s', c='b')
    ax.scatter(catSites[:, 0], catSites[:, 1], catSites[:, 2], marker='o', c='r')
    # Draw one line per anion-cation bond
    for i in range(len(anAdjacency)):
        for j in anAdjacency[i]:
            ax.add_line(Line3D([anSites[i, 0], catSites[j, 0]],
                               [anSites[i, 1], catSites[j, 1]],
                               [anSites[i, 2], catSites[j, 2]],
                               linewidth=0.5))
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    fig.suptitle('Silicon Quantum Dot')
    plt.show()
    canvas = FigureCanvasAgg(fig)
    canvas.print_figure("siliconDot.png")

def compare_atoms(atom1, atom2):
    # Lexicographic comparison of two sites: negative, zero, or positive
    if atom1[0] == atom2[0]:
        if atom1[1] == atom2[1]:
            if atom1[2] == atom2[2]:
                return 0
            return atom1[2] - atom2[2]
        return atom1[1] - atom2[1]
    return atom1[0] - atom2[0]

def binary_find_atom(key, sites):
    # Binary search for key in the sorted site array; returns its row or -1
    high = sites.shape[0] - 1
    low = 0
    while low < high:
        pivot = (low + high) // 2
        compare = compare_atoms(sites[pivot, :], key)
        if compare == 0:
            return pivot
        elif compare > 0:
            high = pivot - 1
        elif compare < 0:
            low = pivot + 1
    if compare_atoms(sites[low, :], key) == 0:
        return low
    return -1  # not found

def overlap_matrix(l, m, n, band):
    if band == "Cb":
        s_s_sig = -1.99285
        se_se_sig = 0
        se_s_sig = 0
        s_p_sig = 3.84284
        se_p_sig = 2.34336
        p_p_sig = 12.085945
        p_p_pi = -5.40713
    elif band == "Vb":
        s_s_sig = -2.39974
        se_se_sig = 0
        se_s_sig = 0
        s_p_sig = 3.0927
        se_p_sig = 3.139567
        p_p_sig = 2.81127
        p_p_pi = -0.77005

    # Calculate overlap integral matrix
    overlap = zeros((5, 5))
    # This section of code from modified code from Mehmet
    # energy matrix elements from two-center integrals after 1954 Slater and
    # Koster (UNSTRAINED)
    overlap[0, 0] = s_s_sig  # ith atom s orbital overlap with all jth atom orbitals
    overlap[0, 1] = l * s_p_sig
    overlap[0, 2] = m * s_p_sig
    overlap[0, 3] = n * s_p_sig
    overlap[0, 4] = se_s_sig

    overlap[1, 0] = -l * s_p_sig  # ith atom px orbital overlap with all jth atom orbitals
    overlap[1, 1] = l**2 * p_p_sig + (1 - l**2) * p_p_pi
    overlap[1, 2] = l * m * (p_p_sig - p_p_pi)
    overlap[1, 3] = n * l * (p_p_sig - p_p_pi)
    overlap[1, 4] = -l * se_p_sig

    overlap[2, 0] = -m * s_p_sig  # ith atom py orbital overlap with all jth atom orbitals
    overlap[2, 1] = m * l * (p_p_sig - p_p_pi)
    overlap[2, 2] = m**2 * p_p_sig + (1 - m**2) * p_p_pi
    overlap[2, 3] = m * n * (p_p_sig - p_p_pi)
    overlap[2, 4] = -m * se_p_sig

    overlap[3, 0] = -n * s_p_sig  # ith atom pz orbital overlap with all jth atom orbitals
    overlap[3, 1] = n * l * (p_p_sig - p_p_pi)
    overlap[3, 2] = n * m * (p_p_sig - p_p_pi)
    overlap[3, 3] = n**2 * p_p_sig + (1 - n**2) * p_p_pi
    overlap[3, 4] = -n * se_p_sig

    overlap[4, 0] = se_s_sig  # ith atom se orbital overlap with all jth atom orbitals
    overlap[4, 1] = l * se_p_sig
    overlap[4, 2] = m * se_p_sig
    overlap[4, 3] = n * se_p_sig
    overlap[4, 4] = se_se_sig

    return overlap

def hamiltonian(anSites, catSites, anAdjacency, catAdjacency, anBonds, catBonds, band):
    atoms = anSites.shape[0] + catSites.shape[0]
    numOrbitals = 5
    # Start with a list-of-lists (LIL) sparse matrix, which can be filled block by block
    spHamiltonian = lil_matrix((numOrbitals * atoms, numOrbitals * atoms))

    # Same atom terms
    if band == "Cb":
        E_s = -3.65866  # s-like orbital same atom Si
        E_p = 1.67889   # p-like orbital same atom Si
        E_se = 3.87567  # excited s-like orbital same atom Si
    elif band == "Vb":
        E_s = -3.31789  # s-like orbital same atom Si
        E_p = 1.67862   # p-like orbital same atom Si
        E_se = 8.23164  # excited s-like orbital same atom Si
    else:
        raise ValueError("band must be 'Cb' or 'Vb'")

    # Anions same atom terms
    sameAtomAn = diag([E_s, E_p, E_p, E_p, E_se])
    # Rows are the hybrid orbitals along the 111 ; -1-11 ; 1-1-1 ; -11-1 directions
    anHybridTrans = (1.0/2.0) * array([[1,  1,  1,  1],
                                       [1, -1, -1,  1],
                                       [1,  1, -1, -1],
                                       [1, -1,  1, -1]])

    # Same-atom (s, px, py, pz) block expressed in the hybrid-orbital basis
    anHybrid = dot(dot(anHybridTrans, diag([E_s, E_p, E_p, E_p])), anHybridTrans.T)

    for i in range(anSites.shape[0]):
        if (anBonds[i,0] and anBonds[i,1] and anBonds[i,2] and anBonds[i,3]):
            # All four bonds present: use the plain diagonal block
            spHamiltonian[(5*i):(5*(i+1)), (5*i):(5*(i+1))] = sameAtomAn
        else:
            anHybridLocal = anHybrid.copy()  # local loop variable
            if (not anBonds[i,0]):  # 111 direction dangling bond
                anHybridLocal[0,0] = 30  # 30 eV to passivate bond
            if (not anBonds[i,1]):  # -1-11 direction dangling bond
                anHybridLocal[1,1] = 30  # 30 eV to passivate bond
            if (not anBonds[i,2]):  # 1-1-1 direction dangling bond
                anHybridLocal[2,2] = 30  # 30 eV to passivate bond
            if (not anBonds[i,3]):  # -11-1 direction dangling bond
                anHybridLocal[3,3] = 30  # 30 eV to passivate bond
            anHybridLocal = dot(dot(anHybridTrans.T, anHybridLocal),
                                anHybridTrans)  # Reverse transformation

            # Ensure matrix is symmetric
            anHybridLocal[1,0] = anHybridLocal[0,1]
            anHybridLocal[2,0] = anHybridLocal[0,2]
            anHybridLocal[3,0] = anHybridLocal[0,3]
            anHybridLocal[2,1] = anHybridLocal[1,2]
            anHybridLocal[3,1] = anHybridLocal[1,3]
            anHybridLocal[3,2] = anHybridLocal[2,3]

            # 4x4 (s, px, py, pz) block, then the excited s orbital on the diagonal
            spHamiltonian[(5*i):(5*(i+1)-1), (5*i):(5*(i+1)-1)] = anHybridLocal
            spHamiltonian[(5*(i+1)-1), (5*(i+1)-1)] = E_se

    # Cation same atom terms
    sameAtomCat = diag([E_s, E_p, E_p, E_p, E_se])
    # Rows are the hybrid orbitals along the -111 ; 1-11 ; -1-1-1 ; 11-1 directions
    catHybridTrans = (1.0/2.0) * array([[1, -1,  1,  1],
                                        [1,  1, -1,  1],
                                        [1, -1, -1, -1],
                                        [1,  1,  1, -1]])

    catHybrid = dot(dot(catHybridTrans, diag([E_s, E_p, E_p, E_p])), catHybridTrans.T)

    for i in range(anSites.shape[0], anSites.shape[0] + catSites.shape[0]):
        j = i - anSites.shape[0]  # cation index
        if (catBonds[j,0] and catBonds[j,1] and catBonds[j,2] and catBonds[j,3]):
            spHamiltonian[(5*i):(5*(i+1)), (5*i):(5*(i+1))] = sameAtomCat
        else:
            catHybridLocal = catHybrid.copy()  # local loop variable
            if (not catBonds[j,0]):  # -111 direction dangling bond
                catHybridLocal[0,0] = 30  # 30 eV to passivate bond
            if (not catBonds[j,1]):  # 1-11 direction dangling bond
                catHybridLocal[1,1] = 30  # 30 eV to passivate bond
            if (not catBonds[j,2]):  # -1-1-1 direction dangling bond
                catHybridLocal[2,2] = 30  # 30 eV to passivate bond
            if (not catBonds[j,3]):  # 11-1 direction dangling bond
                catHybridLocal[3,3] = 30  # 30 eV to passivate bond
            catHybridLocal = dot(dot(catHybridTrans.T, catHybridLocal),
                                 catHybridTrans)  # Reverse transformation

            # Ensure matrix is symmetric
            catHybridLocal[1,0] = catHybridLocal[0,1]
            catHybridLocal[2,0] = catHybridLocal[0,2]
            catHybridLocal[3,0] = catHybridLocal[0,3]
            catHybridLocal[2,1] = catHybridLocal[1,2]
            catHybridLocal[3,1] = catHybridLocal[1,3]
            catHybridLocal[3,2] = catHybridLocal[2,3]

            spHamiltonian[(5*i):(5*(i+1)-1), (5*i):(5*(i+1)-1)] = catHybridLocal
            spHamiltonian[(5*(i+1)-1), (5*(i+1)-1)] = E_se

    # [Anion, Cation] interactions (top right block of Hamiltonian matrix)
    catOffset = anSites.shape[0] * numOrbitals
    for i in range(len(anAdjacency)):
        for j in anAdjacency[i]:
            # Direction cosines of the bond from anion i to cation j
            bond = catSites[j,:] - anSites[i,:]
            l, m, n = bond / linalg.norm(bond)
            overlap = overlap_matrix(l, m, n, band)
            spHamiltonian[(5*i):(5*(i+1)), (catOffset+5*j):(catOffset+5*(j+1))] = overlap

    # [Cation, Anion] interactions (bottom left block of Hamiltonian matrix)
    for j in range(len(catAdjacency)):
        for i in catAdjacency[j]:
            # Direction cosines of the bond from cation j to anion i (sign reversed)
            bond = anSites[i,:] - catSites[j,:]
            l, m, n = bond / linalg.norm(bond)
            overlap = overlap_matrix(l, m, n, band)
            spHamiltonian[(catOffset+5*j):(catOffset+5*(j+1)), (5*i):(5*(i+1))] = overlap

    # Convert to compressed sparse column format before returning
    H = spHamiltonian.tocsc()
    return H

def parallel_eigensolver(comm, matrixDimension, mm_filename):
    # Code based on exAnasazi_BlockDavidson.src from PyTrilinos examples
    rowMap = Epetra.Map(matrixDimension, 0, comm)  # renamed from map to avoid shadowing the builtin

    # Setup problem
    nev = 60
    blockSize = 1
    # numBlocks and maxRestarts are defined here but not passed to the solver below
    numBlocks = 3 * nev
    maxRestarts = 100
    tol = 1.0e-7
    ivec = Epetra.MultiVector(rowMap, blockSize)
    ivec.Random()

    matrix = EpetraExt.MatrixMarketFileToCrsMatrix(mm_filename, comm, rowMap)

    # Create the eigenproblem (element 1 of the returned value is used as the matrix)
    myProblem = Anasazi.BasicEigenproblem(matrix[1], ivec)

    # Inform the eigenproblem that the matrix is symmetric
    myProblem.setHermitian(True)

    # Set the number of eigenvalues requested
    myProblem.setNEV(nev)

    # All done defining problem
    if not myProblem.setProblem():
        print("Anasazi.BasicEigenProblem.setProblem() returned an error")
        return -1

    # Define the parameter list
    myPL = {"Which": "SM",  # least magnitude eigenvalues
            "Convergence Tolerance": tol}

    # Create the solver manager
    mySolverMgr = Anasazi.BlockKrylovSchurSolMgr(myProblem, myPL)

    # Solve the problem
    returnCode = mySolverMgr.solve()

    # Get the eigenvalues
    sol = myProblem.getSolution()
    evals = sol.Evals()
    assert(isinstance(evals, ndarray))
    return evals

if __name__ == "__main__":
    comm = Epetra.PyComm()
    iAmRoot = comm.MyPID() == 0

    # Primitive vectors for zincblende crystal
    A1 = array([0,2,2])  # multiplied by 4 (not 2) to normalize everything to integers
    A2 = array([2,0,2])
    A3 = array([2,2,0])

    # Basis for anion and cation lattices
    B_an = array([0,0,0])
    B_cat = array([1,1,1])  # again, multiplied by 4

    # Number of iterations to grow the crystal in each direction
    xcells = 20  # Not necessarily x direction, actually direction of A1
    ycells = 20
    zcells = 20
    volume = xcells * ycells * zcells

    anSites = generate_crystal(B_an, A1, A2, A3, xcells, ycells, zcells)
    catSites = generate_crystal(B_cat, A1, A2, A3, xcells, ycells, zcells)
    # Sort catSites in place by x; ties are broken by the remaining fields (y, then z)
    catSites.view('i8,i8,i8').sort(order=['f0'], axis=0)

    # Crystal direction definitions
    #anDirections = ['111', '-1-11', '1-1-1', '-11-1']
    #catDirections = ['-111', '1-11', '-1-1-1', '11-1']
    anBonds = zeros((volume,4), dtype=bool)
    catBonds = zeros((volume,4), dtype=bool)

    anAdjacency = []
    catAdjacency = []  # cation adjacency list
    for i in range(volume):
        anAdjacency.append([])  # populate with empty lists
        catAdjacency.append([])

    for i in range(volume):  # Very fast, O(n log n)
        j0 = binary_find_atom(anSites[i,:] + [1,1,1], catSites)
        j1 = binary_find_atom(anSites[i,:] + [-1,-1,1], catSites)
        j2 = binary_find_atom(anSites[i,:] + [1,-1,-1], catSites)
        j3 = binary_find_atom(anSites[i,:] + [-1,1,-1], catSites)
        # Each anion bond direction is the opposite of one cation bond direction,
        # which fixes the cation bond index recorded alongside it.
        if j0 != -1:
            anAdjacency[i].append(j0)
            catAdjacency[j0].append(i)
            anBonds[i,0] = True
            catBonds[j0,2] = True
        if j1 != -1:
            anAdjacency[i].append(j1)
            catAdjacency[j1].append(i)
            anBonds[i,1] = True
            catBonds[j1,3] = True
        if j2 != -1:
            anAdjacency[i].append(j2)
            catAdjacency[j2].append(i)
            anBonds[i,2] = True
            catBonds[j2,0] = True
        if j3 != -1:
            anAdjacency[i].append(j3)
            catAdjacency[j3].append(i)
            anBonds[i,3] = True
            catBonds[j3,1] = True

    if iAmRoot:
        #plot_crystal(anSites, catSites, anAdjacency)
        print("System has " + str(2*volume) + " atoms.")
        HCb = hamiltonian(anSites, catSites, anAdjacency, catAdjacency, anBonds, catBonds, "Cb")
        HVb = hamiltonian(anSites, catSites, anAdjacency, catAdjacency, anBonds, catBonds, "Vb")
        mmwrite("conductionBand.mtx", HCb)
        mmwrite("valenceBand.mtx", HVb)
        print("Generated Hamiltonians")
    # Wait until the root has written both files before any process reads them
    comm.Barrier()
    startTime = time.time()
    vbVals = parallel_eigensolver(comm, 5*2*volume, "valenceBand.mtx")
    cbVals = parallel_eigensolver(comm, 5*2*volume, "conductionBand.mtx")
    endTime = time.time()
    if iAmRoot:
        print("Solved eigenvalues in " + str(endTime - startTime) + " s.")
    cbVals = real(cbVals)
    cbVals.sort()
    vbVals = real(vbVals)
    vbVals.sort()

    if iAmRoot:
        print("Conduction Band Eigenvalues")
        print(cbVals)
        print("Valence Band Eigenvalues")
        print(vbVals)
        # Bandgap: smallest positive conduction level minus largest negative valence level
        vbMax = max(filter(lambda x: x < 0, vbVals))
        cbMin = min(filter(lambda x: x > 0, cbVals))
        bandgap = cbMin - vbMax
        #plt.scatter(cbVals, ones((1,len(cbVals))), color='blue')
        #plt.scatter(vbVals, zeros((1,len(vbVals))), color='red')
        print("The calculated bandgap is " + str(bandgap) + " eV.")
        #plt.spy(HCb, markersize=1)  # show sparsity
        #plt.show()
        print("Done!")
    Epetra.Finalize()
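
The connectivity mapping in the listing above sorts the cation sites once and then locates each of the four candidate neighbours of every anion by binary search, for O(n log n) work overall. The following is a minimal, standard-library-only sketch of that idea, not the listing's implementation: the names `generate_sites`, `find_site`, and `map_bonds` are hypothetical and introduced only for illustration, `generate_sites` is an assumed stand-in for `generate_crystal` (which is not reproduced here), and Python tuples with `bisect` take the place of the NumPy arrays.

```python
from bisect import bisect_left

# Offsets from an anion to its four candidate cation neighbours, in the same
# scaled integer coordinates used by the listing.
OFFSETS = [(1, 1, 1), (-1, -1, 1), (1, -1, -1), (-1, 1, -1)]

def generate_sites(basis, vectors, cells):
    """Hypothetical stand-in for generate_crystal: basis + i*A1 + j*A2 + k*A3."""
    a1, a2, a3 = vectors
    return [tuple(basis[d] + i * a1[d] + j * a2[d] + k * a3[d] for d in range(3))
            for i in range(cells) for j in range(cells) for k in range(cells)]

def find_site(key, sorted_sites):
    """Index of key in a lexicographically sorted list of tuples, or -1."""
    idx = bisect_left(sorted_sites, key)
    if idx < len(sorted_sites) and sorted_sites[idx] == key:
        return idx
    return -1

def map_bonds(an_sites, cat_sites_sorted):
    """For each anion, the indices of the cations bonded to it."""
    adjacency = []
    for (x, y, z) in an_sites:
        neighbours = []
        for (dx, dy, dz) in OFFSETS:
            j = find_site((x + dx, y + dy, z + dz), cat_sites_sorted)
            if j != -1:  # a missing neighbour means a dangling surface bond
                neighbours.append(j)
        adjacency.append(neighbours)
    return adjacency

# Small 2 x 2 x 2 example with the zincblende primitive vectors of the listing.
A = ((0, 2, 2), (2, 0, 2), (2, 2, 0))
an = generate_sites((0, 0, 0), A, 2)
cat = sorted(generate_sites((1, 1, 1), A, 2))  # tuples sort lexicographically
adj = map_bonds(an, cat)
```

Sorting tuples gives the same x, then y, then z ordering that `compare_atoms` assumes, so `find_site` plays the role of `binary_find_atom`. An anion whose adjacency list has fewer than four entries lies on the surface and would need passivation.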