arXiv:2003.09536v2 [physics.comp-ph] 21 Jul 2020 approximations Keywords ihcosscin agrta nm 5 than nanowires larger resolved cross-sections atomically with in phonons transport electron on of scattered solution the for allows This shown. ini civdwt arxrn eutoso more of than reductions 200 more of rank improvements Time-to-solution matrix representa- 97%. with than original achieved the is of tion results with per- agreement Virtually self- fect scattering. scattering of retarded impact the the increases of energy part real Kramers-Kronig the the for of relation inclusion Kramers- The the relations. using self-energies representation Kronig scattering basis retarded reduced calcula- of the exact part in real the silicon the includes in of also tion phonons work acoustic This and nanowires. inelastic optical include to on [1] Ref. scattering of atom- is approach the space extends phonons mode work istic this on load, scattering this ease nu- To inelastic included. immense when an atomisti- load yields in merical and transport nanodevices predict resolved to cally used often is method Abstract date Accepted: / date Received: Kubis Tillmann Lemus A. Daniel implementations atomistic function Green’s in nonequilibrium scattering inelastic Mode-space-compatible iu ies,Pru nvriy etLfyte N47907 IN Lafayette, West University, USA Purdue Disease, tious USA 47907, IN Lafayette, 4 West University, 47907, due IN Lafayette, West 3 Dr, Jischke USA Martin 207 versity, 47907 IN Lafayette, West Ave, 2 Northwestern 465 University, 1 [email protected] E-mail: Lemus D. wl eisre yteeditor) the by inserted be (will No. manuscript Noname udeIsiueo namto,Imnlg n Infec- and Immunology Inflammation, of Pur- Institute Devices, Purdue and Materials Predictive for Center Purdue Uni- Purdue , Computational for Network colo lcrcladCmue niern,Purdue Engineering, Computer and Electrical of School × n ekmmr eutoso oeta 7 than more of reductions memory peak and h oeulbimGensfnto (NEGF) function Green’s nonequilibrium The · nlsi scattering Inelastic 1,2,3,4 1 (orcid.org/0000-0002-0759-8848) · NEGF × · oespace Mode · nm. 5 Low-rank × are , ietypootoa otedgeso freedom is of method degrees RGF the the with to solved device, proportional blocks nanowire directly typical the a of In size device. the the of length RGF, cross-section With and the 23]. on 22, depends 21, complexity [20, computational sets basis repre- realistic nanodevices In in for solved 19]. sented been 18, discretized has NEGF [17, be case, matrices that can sparse that block-tridiagonal equations with NEGF recursive for block-wise a solution provides func- [16] Green’s (RGF) recursive method the tion load, 15]. numerical 14, this [13, ease columns To and com- rows matrices of be required thousands of the can consisting to basis due binding cumbersome tight putationally a the in Solving [12]. equations orbitals NEGF atomic elements matrix representing of atom tens binding per contain tight usually ba- empirical 11] Accurate [10, the method nanometers. as cubic such the representations few in sis a atoms only tran- of typical of thousands a volume to but hundreds atoms, contains of sistor number countable a tain nanode- for 9]. in model [8, transport vices well-accepted electron incoherent a and is func- coherent formalism Green’s (NEGF) nonequilibrium Understand- tion The requires 7]. always models. almost 6, effects predictive these 5, optimizing [4, and ing performance drastically such effects device confinement quantum and change scale, interference this At tunneling, 3]. as num- [2, countable atoms a of with ber dimensions logic reached state-of-the-art has of devices scale length characteristic The Introduction 1 rxoeain nteebok clso h re of order the on scales blocks these O ma- on of operations Time-to-solution trix device. the of cross-section the · ( N hrceitcnneetoi eiedmnin con- dimensions device nanoelectronic Characteristic ae Charles James 3 .Mmr clso h re of order the on scales Memory ). 1 · O ( N 2 ). N in 2 Daniel A. Lemus1 (orcid.org/0000-0002-0759-8848) et al.

The NEGF equations must be self-consistently solved often called the mode space approach [14, 17, 45, 53]. with the Poisson equation that represents the electro- Since scattering phenomena are important to retain in static effects caused by the quantum mechanical evolu- quantum transport simulations, the goal of this work is tion of the system [8, 24, 25]. This introduces a degree to introduce a low rank approximation that accurately of complexity to the solution of NEGF, since solving retains scattering phenomena and is still based on an the equations is required multiple times. atomistic device representation. An advantage of the NEGF and RGF methods is the ability to introduce incoherent scattering through self- energies, which represent device structure uncertainties 2 Method such as roughness, alloy disorder and geometric errors, and temperature fluctuations through phonons [8, 26, 2.1 Mode space approach in tight binding 27, 28, 29, 30, 31, 32, 33, 34, 35]. However, the intro- Low-rank approximations such as the mode space method duction of incoherent scattering into the RGF solution [14, 50, 53] follow a common process: The system’s introduces yet another degree of complexity through Hamiltonian is transformed into a basis representation the self-consistent solution of retarded (GR) and lesser that allows for filtering of degrees of freedom that are (G<) Green’s functions. Their equations read symboli- unlikely to contribute to device operation. This reduces cally the rank of the system’s Hamiltonian and thus the com- GR = (EI − H − ΣR)−1, (1) plexity of the NEGF equations. Choosing the eigen- vectors of the Hamiltonian according to their eigen- energies often provides a good measure of filtering empty < R < R† G = G Σ G , (2) states [1, 52]. Unfortunately, this direct filtering fails in and the respective scattering self-energies tight binding due to the appearance of spurious states [1, 53]. The method developed by Mil’nikov et al. [1] R R R R < < R Σ = G D + G D + G D , (3) removes these spurious states. For completeness we repeat this method here: The first step of the method is to obtain the eigenvectors Σ< = G

R,< R,< this work are homogeneous, and only require one basis of Σacoustic and Σoptical of Eqs. 4-6 of Ref. [2] as Σi,j to transform all portions of the nanowire, a heteroge- and each element of a Green’s function matrix GR,< neous device of varying cross-sections or materials may as Gk,l. We also define C as the product of all scalar be transformed by use of multiple basis transformation factors involved in each of the Eqs. 4-6 of Ref. [2]. The matrices Φ. It is therefore possible to introduce explicit form factor elements Fi,j,k,l are applied to the Green’s defects such as roughness and impurities when Φ is ob- function elements Gk,l as follows: tained with such defects. Σi,j = CFi,j,k,lGk,l. (9) Xl Xk 2.2 Mode generation in NEMO5 In this way, all matrices remain in mode space. In this work, the mode space basis states are deter- mined by following Mil’nikov et al. [1] with the Mod- eSpace solver [53] of the multipurpose nanodevice sim- 2.5 Approximation of form factor ulation tool NEMO5 [54, 55]. Details of this algorithm can be found in Refs. [1] and [53]. Ratios of the re- The form factor F is four-dimensional and scales rapidly 4 duced n and original N Matrix ranks n/N ≤ 10% are with the number of modes in terms of memory (O(n )), 4 regularly achieved with this NEMO5 solver while the time for construction (O(n N)) and time for applica- 4 transport physics are preserved [14, 53]. This has en- tion (O(n )). This can easily result in the form factor abled speedups for ballistic NEGF simulations of up to construction taking a significant amount of time and 10,000 times [17]. memory and application taking a significant amount of time. Similarly to Ref. [57], we have observed that eliminating off-diagonal elements of the form factor F , 2.3 Expanding atomistic mode space to incoherent such that Fi,j,k,l = 0 for i =6 j and k =6 l, provides scattering simulations reasonable physical results. This approximation corre- sponds to the lack of interaction between modes which In this work, this method is augmented to handle inco- are uncoupled in real space. Therefore, no inter-mode herent scattering that allows for intermode transitions. scattering takes place when the form factor is diago- Scattering self-energies for the scattering of electrons nal. This does not restrict inelastic scattering, since on phonons are originally defined in a real space rep- the electronic energy is a mode-independent parame- resentation. Electron scattering on acoustic and optical ter. This approximation provides a memory-thin form phonons via deformation potentials is considered fol- factor with memory scaling on the order of O(n2). The lowing Ref. [2] (cf. Eqs. 1-6 of Ref. [2]). Calculations construction complexity of the form factor is also re- in polar materials (such as InAs) include scattering of duced to O(n2N), while the application complexity is electrons on polar optical phonons as well [56]. Since reduced to O(n2). Note that although this yields an ac- these self-energies are formulated in real space and re- R,< R,< curate calculation of self-energies Σacoustic and Σoptical, quire position information, an issue arises as this in- mode coupling terms (Gk,l for k =6 l) must remain for formation is no longer directly available after a mode an accurate calculation of electron density [57]. space basis transformation.

2.6 Inclusion of real part of retarded scattering 2.4 Form factor transformation self-energies using Kramers-Kronig relations To make position information available for the solution The general form of the retarded scattering self-energy of scattering self-energies while limiting the number of ΣR includes a principal value integral P of large com- transformations, a form factor is introduced. This form putational burden [1, 2, 58, 59, 60, 61]. ΣR(E) can be factor is fully explained by Ref. [57]. The form factor F obtained by its separate real and imaginary parts [58, contains all modes involved in the respective scattering 59, 60] such that process: R ′ R i ′ Im[Σ (E )] Fi,j,k,l = φi(ν)φj (ν)φk(ν)φl(ν) (8) Re[Σ (E)] = P dE ′ . (10) Xν π Z E − E where i,j,k,l are indices of the n real modes (columns) Typically, the real part of the retarded self-energy is of the transformation matrix Φ. The index ν is iter- entirely excluded, and although the approximation of- ated through the N rows of Φ. We define each element ten yields reasonable physical results [1, 60], it is known 4 Daniel A. Lemus1 (orcid.org/0000-0002-0759-8848) et al. that excluding the real part causes deviations. In par- ticular, off-state current densities are underestimated in this approximation [2, 58, 59]. Note that the real part of retarded self-energies shifts resonance energies and thus influences band edges and threshold voltages [37]. In this work, the exact real part of the retarded scatter- ing self-energies is obtained using the Kramers-Kronig Fig. 1 Schematic of the nanowire devices considered in this R work with a w × w cross-section and a 1 nm gate oxide relations [62]. For each matrix element Σi,j of a re- R layer surrounding the center of the device. s labels the source tarded self-energy, its real part Σ(E)i,j,real is obtained length, c the channel length and d the drain length of the by applying the Kramers-Kronig relation on its imag- device R inary part Σ(E)i,j,imag . Using a Hilbert transform H, the real part becomes:

R R [2, 28]. For polar materials, scattering on polar optical Σ(E)i,j,real = H(Σ(E)i,j,imag). (11) phonons was included as well. The inhomogeneous en- ergy grid was generated using an adaptive grid genera- This Hilbert transform is performed using a fast Fourier tor in NEMO5 [2]. Due to the high numerical load of the transform (FFT), a multiplication in the Fourier space, Kramers-Kronig relation for scattering in tight bind- and an inverse FFT afterwards [63]. ing representations, the real parts of all scattering self- energies in the benchmarking scenarios of Secs. 3.2, 3.3 and 3.4 were neglected. This is not the case in Sec. 3.5, 3 Results and discussion where a non-zero real part of the scattering self-energy will be included. 3.1 Simulation setup An assessment of the real part of the retarded self- To ensure the validity of the presented low-rank ap- energies was done by comparison of the resulting current- proximation for transport in nanowire devices includ- voltage (I-V) characteristics shown in Sec. 3.5. For this ing inelastic scattering, multiple tests were performed assessment, the material of the transistor in Fig. 1 was with NEMO5 [54, 64, 65]. First, for validation, results chosen to be InAs, with two tested device widths w = of simulations in a mode space basis were benchmarked 2.42 nm and w = 3.63 nm. Both devices had an s = against calculations in the original tight binding ba- 5.97 nm p-type source doped at 5×1019 cm-3, an n-type sis. These result comparisons are shown in Sec. 3.2. d = 9.66 nm drain doped at 2 × 1019 cm-3 and a c = Second, multiple performance tests comparing time-to- 14.66 nm central undoped region. A source-drain bias of solution and peak memory improvements in mode space 0.3 V was applied. Since TFETs require the occupation are shown in Sec. 3.3 for various device widths w. The of both electrons and holes, the method of Ref. [1] was device used for both validation and performance tests applied to obtain modes for a wide energy window that was a w × w × 20.65 nm silicon nanowire as shown included bands near the conduction and valence band in Fig. 1, where w is the variable width in nm of the edges. The inclusion of holes also necessitates a proper square cross-section of the device. The device had a definition of electrons and holes as states tunnel from 1 nm gate oxide layer surrounding the central region. valence band to conduction band in the TFET. An in- The original basis was a 10-band sp3d5s∗ tight binding terpolation method was applied as defined by Ref. [2] model using the parameter set of Ref. [66]. A source- to avoid sharp transitions from holes to electrons or drain bias of 0.2 V was applied to the device. Note that vice versa. Simulations included optical phonon, acous- the applied source-drain bias does not affect the valid- tic phonon and polar-optical phonon scattering to rep- ity of the presented method, and mode space calcula- resent the polar nature of InAs. Due to the non-local tions with higher source-drain voltages can be found nature of polar-optical phonon scattering, such a calcu- in Refs. [17, 53, 67]. The device was NIN doped, with lation would be very expensive even in a reduced basis. the s = 5.97 nm source and d = 6.66 nm drain re- To avoid this, a local scattering calculation was per- gions having a 1020 cm-3 doping density and the central formed using a cross-section-dependent compensation c = 8.02 nm intrinsic region having a 1015 cm-3 dop- factor defined in Ref. [29]. Compensating scaling fac- ing density. The lengths s, d and c are labeled in Fig. 1. tors of 30.0 and 26.56 were used in the calculation of Simulations of Si devices included both inelastic optical polar-optical phonon scattering for the w = 2.42 nm phonon and elastic acoustic phonon deformation poten- and w = 3.63 nm devices respectively. Note, the form tial scattering, applied to the NEGF equations through factor approximation as described in Sec. 2.5 was not self-energies in the self-consistent Born approximation performed in this case. Mode-space-compatible inelastic scattering in atomistic nonequilibrium Green’s function implementations 5

Fig. 2 Current-gate-voltage (I-V) characteristic curve of a Fig. 3 I-V curve of the 3.26 nm × 3.26 nm × 20.65 nm 3.26 nm × 3.26 nm × 20.65 nm silicon nanowire. The agree- silicon nanowire of Fig. 2 with an approximate form factor. ing results prove the mode space approach provides a valid The agreeing results justify the form factor approximation physical model. All simulations include inelastic scattering on phonons

3.2 Validation of mode space simulation results

For validation, a silicon nanowire of width w =3.26 nm was used (see Fig. 1). The mode space simulation had a reduction ratio n/N of 2.8%, transforming matrix blocks from 2880 × 2880 matrices to 81 × 81 matrices. NEGF was solved using the scattering-compatible RGF algorithm [28]. Fig. 2 shows the current-voltage (I-V) characteristic curves of both the original tight binding basis and mode space basis for sweeping gate biases ranging from -0.1 V to 0.5 V. The mode space scatter- ing results of Fig. 2 were obtained using the full form factor as described in Sec. 2.4. The virtually identical results of mode space and tight binding show that the < mode space low-rank approximation provides a valid and highly efficient model for quantum transport sim- ulations with inelastic scattering. Fig. 3 shows that the Fig. 4 Potential profile (contour plot) of the center cross- × × mode space approach with approximate form factors, section of the simulated 3.26 nm 3.26 nm 20.65 nm silicon nanowire device in original tight binding basis. Contour lines as discussed in Sec. 2.5, also yields results very close to represent the relative error of the potential in mode space those of the original basis calculations. Fig. 4 shows a compared to tight binding representation contour plot of the potential profile of the center cross- section of the device for a tight binding simulation at the applied gate bias of 0.5 V. Contour lines show the 3.3 Assessment of computational performance relative error of the mode space potential profile re- sults relative to the original tight binding data. Note The device in Fig. 1 was used with varying widths w to that the mode space method agrees with NEGF cal- measure performance improvements in NEMO5 time- culations in the original tight binding representation to-solution and peak memory. Each width also had a for many wire cross-sections as similarly well as those corresponding mode space transformation matrix with shown in Figs. 2 and 3. Similar benchmark data can be its respective number of modes. Correspondingly, the found in Refs. [1, 53, 67]. reduction ratios n/N in Figs. 5 and 6 vary. The exact 6 Daniel A. Lemus1 (orcid.org/0000-0002-0759-8848) et al. width values simulated were 4, 6, 8, 10 and 12 silicon unit cells and the respective reduction ratios n/N were 5.6%, 2.8%, 2.9%, 2.8% and 3.0%. The lattice param- eter of silicon was assumed to be 0.54 nm. All perfor- mance simulations were performed with the same in- puts of Sec. 3.2, with the exception that a fixed number of 256 energies was used. Since results for the approxi- mate form factor have been shown in Fig. 3 to closely match those of the full form factor, mode space data for performance comparisons in this section were gen- erated using the approximate form factor. The Green’s functions were solved for 256 energies with 1 energy per MPI process. Each MPI process was designated to a 32- core node on the Blue Waters petascale supercomputer at the University of Illinois at Urbana-Champaign [68]. Fig. 5 Time-to-solution for a single self-consistent Born iter- Each MPI process was assigned 32 OpenMP threads ation (left) and speedup ratio (right) with low-rank approxi- for multithreaded matrix operations and form factor mations for the 20.65 nm silicon nanowire of Fig. 1 for various w construction and application. Fig. 5 shows the average widths . The tight binding timing data was extrapolated be- yond w = 5.43 nm using a power fitting function shown as a time (of 6 iterations) to compute a single self-consistent dashed line. All simulations include inelastic scattering Born iteration. Each self-consistent Born iteration in- cludes the time to compute the RGF algorithm as well as the time to compute lesser scattering self-energies Σ< and retarded scattering self-energies ΣR for opti- cal and acoustic deformation potential inelastic scatter- ing. The calculation of scattering self-energies involves a large degree of communication between MPI processes as discussed in Ref. [2].

The timing shown does not include the calculation of other aspects of quantum transport such as the so- lution of the Poisson’s equation and the generation of the adaptive energy grid. This exclusion of such cal- culations can be justified by the fact that the time-to- solution is negligible when compared to the solution of NEGF. In production runs, those calculations are per- Fig. 6 Peak memory (left) and memory improvement ra- formed only a small fraction of times when compared to tio (right) with low-rank approximations for 20.65 nm silicon the multiple self-consistent Born iterations per Poisson nanowires of Fig. 1 for various widths w. All simulations in- iteration. The maximum speedup obtained with low- clude inelastic scattering rank approximations for an iteration in this work was of 209.5 times. Due to computational limitations, the of w = 6.52 nm, which results in a predicted peak mem- tight binding simulation for the point w = 6.52 nm was ory reduction of 5.67×. not assessed, since a single iteration would have taken about 38,000 seconds according to a power fitting func- tion of the existing data. By extrapolating the data, the 3.4 Simulating beyond existing capabilities speedup for w = 6.52 nm is predicted to be of 187.5 times, as is shown in Fig. 5. It can be noted that this is With the time-to-solution and memory footprint sig- lower than the speedup of w = 5.43 nm. This is likely nificantly reduced, the opportunity to simulate larger due to the fact that the reduction ratio for w = 6.52 nm devices with complex physical phenomena such as in- is slightly higher at 3.0% than for w = 5.43 nm at 2.8%. coherent scattering of multiple types (phonons, rough- Fig. 6 shows the peak memory of the same simulations ness, impurities) is now accessible. Ref. [2] describes run for Fig. 5. The maximum peak memory reduction the simulation of a circular nanowire, with acoustic and was of 7.14×. Similarly to Fig. 5, a power fitting func- optical deformation potential scattering and a 10-band tion was used to predict the peak memory for a device tight binding basis. The diameter of the cross-section Mode-space-compatible inelastic scattering in atomistic nonequilibrium Green’s function implementations 7

-5 1.2x10 charge-self-consistent calculations is misleading, how-

ballistic ever, since scattering impacts the density of states: The

scattering

-5 1.0x10 Poisson potential would compensate some of the density of state differences to accommodate the device’s dop-

-6 8.0x10 ing profile. Therefore, for this comparison only, scat- tering self-energies and Green’s functions were solved

-6 6.0x10 self-consistently with a fixed Poisson potential. That potential was deduced from a converged ballistic trans- Current(A)

-6 4.0x10 port solution of the same device. The calculations were performed for the on-state bias of 0.4 V. Table 1 shows

-6

2.0x10 the 2-norm values of the real and imaginary parts of the ΣR when the Kramers-Kronig relation is observed and

0.0 when the real part is set to 0. In both of the simulated -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 cross-sections, the norm of the real part is comparable Gate bias (V) to the norm of the imaginary part. Due to the reduced Fig. 7 Comparison of I-V characteristics for a 5.43 nm × size of self-energy matrices, it was possible to perform 5.43 nm × 20.65 nm n-type FET device for simulations with the Hilbert transform on all energies without introduc- and without inelastic scattering. The reduction ratio n/N ing memory issues and long computation times. There- for this simulation was 2.8%. This device size significantly exceeds the largest nanowires possible to resolve in a scattered fore in this work, all energy points generated by the NEGF calculation in the original atomic representation adaptive energy grid in NEMO5 [2] were included in the Hilbert transform of the self-energies. of this device was 3 nm, and the device length was 27 nm. Solution of an I-V characteristic curve took ap- Figs. 8 and 9 show the I-V characteristics of the proximately 275 hours on 330 cores on the Blue Waters w = 2.42 nm and w = 3.64 nm devices respectively. petascale supercomputer. The peak memory was 60 GB Both figures show the differences of the two scattering R per node, which is close to the maximum node memory models (with and without the real part of Σ ) when of 64 GB. This device therefore approaches the limit compared to the ballistic transport. Incoherent scatter- of what can be simulated in a full basis representation ing increases the off-current density due to scattering- such as tight binding. To demonstrate the capability supported gate leakage and decreases the on-current of solving larger devices in a reduced basis, a full I-V density due to stronger back-scattering. This is in agree- curve was generated for a square nanowire of Fig. 1 with ment with findings in literature [2, 38, 58, 60, 69]. It w = 5.43. Due to the different cross-sectional geometry should also be noted that the real part of scattering this nanowire has over 4 times more atoms in the cross- self-energies has notable effects on devices of any di- section than the circular nanowire of Ref. [2]. The re- mension, e.g., 1D, 2D and 3D [29, 58]. The impact of R duction ratio n/N for the square nanowire was of 2.8%. the real part of Σ becomes more apparent in situa- Fig. 7 shows an I-V characteristic curve for optical and tions with larger scattering strengths, e.g. when higher acoustic phonon deformation potential scattering com- temperatures, impurity scattering, or surface roughness pared to that of a ballistic simulation. As expected, the scattering are present. Fig. 10 shows the I-V character- on-current density is reduced by the inelastic scattering istics of the device in Fig. 9 solved with NEGF when all on phonons [2, 36, 58]. The scattered transport simula- electron-phonon scattering self-energies were multiplied tion of the w = 5.43 nm device took approximately 160 by 2. More significant gate leakage and back-scattering total hours on 16,384 cores (2.62 million core hours) effects can be observed than that shown in Fig. 9. More R on the Blue Waters supercomputer. We estimate that importantly, however, Fig. 10 shows that the exact Σ the same I-V calculation would take about 550 million with a non-zero real part provides even higher scat- core hours and 168 GB of memory in the original tight tering strengths than the approximate, zero real part binding basis representation. case. This impact is particularly visible if the charge self-consistency does not compensate deviations from the doping profile. This is exemplified in Fig. 1 of Sup- 3.5 Assessment of real part of retarded self-energies plementary Material 1 which shows the charge density in a transistor after a single scattering iteration both The 2-norms of the real and imaginary parts of the re- in mode space and full tight binding representations. tarded self-energy ΣR can show the relative amplitude This figure also confirms the physics of the real part of of their contributions. Comparing the 2-norms of fully scattering is correctly covered in mode space. 8 Daniel A. Lemus1 (orcid.org/0000-0002-0759-8848) et al.

w R width (nm) zero real Σ Kramers-Kronig -6 10

real imag. real imag. ballistic

R 2.42 0 0.1184 0.0965 0.1130 zero real part

-7 Kram ers-Kronig relation 3.64 0 0.1080 0.0920 0.1104 10 R

-7

Table 1 2-norms of the retarded scattering self-energies Σ 1.8x10

-7

1.6x10

-7 solved in NEGF simulations of two InAs TFETs with a width -8

1.4x10 10 w -7 and an applied gate bias of 0.4 V. The norm of the real part, 1.2x10

-7 calculated using the Kramers-Kronig relations, is comparable 10

0.34 0.36 0.38 0.40 to the norm of the imaginary part, and must have a similar -9 10

significance to simulation results Current(A)

-10

10

-8

10 -11

ballistic 10

R

zero real part -11

10

Kram ers-Kronig relation -9 -0.20 -0.18

10

-0.2 -0.1 0.0 0.1 0.2 0.3 0.4

-9

2x10

-10 Gate bias (V)

10

-9 10 Fig. 10 Similar to Fig. 9, I-V characteristics of a 3.64 nm ×

-12 0.34 0.36 0.38 0.40

10 -11 3.64 nm × 30.29 nm InAs TFET device, but with scattering 10

Current(A) self-energies multiplied by 2

-12

10

-13 10 4 Conclusion

-13

10

0.08 0.10 In this work, the atomistic mode space approach of

0.10 0.15 0.20 0.25 0.30 0.35 0.40 Ref. [1] has been augmented to handle inelastic scatter-

Gate bias (V) ing on various types of phonons. The method was veri- Fig. 8 I-V characteristics for a 2.42 nm × 2.42 nm × fied and benchmarked against results solved in the orig- 30.29 nm InAs TFET device solved in NEGF including inco- inal representation for silicon nanowires of various sizes. herent scattering on polar optical phonons, acoustic phonons Valid results were achieved with matrix ranks reduced and optical deformation potential phonons. Scattering, even R down to 2.8% of their original rank. Time-to-solution without a real part of Σ , increases the off-current densities and lowers on-current densities. When the real part of the re- was improved by up to 209.5 times, and peak memory tarded self-energy ΣR is included, the Kramers-Kronig rela- was improved by up to 7.14 times. A full I-V calculation tions are obeyed and scattering shows an even larger impact. was performed in mode space for a 5.43 nm × 5.43 nm The insets zoom into the first two and the last two points of × 20.65 nm silicon nanowire in a sp3d5s∗ tight binding the curves basis, which represents a system size larger than can

-6 normally be atomically simulated including inelastic 10 ballistic phonon scattering. The solution of the real part of the

R zero real part retarded scattering self-energies ΣR with the Kramers- -7 Kram ers-Kronig relation 10 Kronig relations ensures the exact treatment of incoher-

-7

1.8x10

-7

1.6x10 ent scattering. It is demonstrated with calculations of

-7 -8

1.4x10 10 R

-7 various nanowires that the real part of Σ contributes

1.2x10

-7 10 to transport similarly to the imaginary part. Therefore,

0.34 0.36 0.38 0.40

-9

10 a reliable prediction of transport in NEGF must solve Current(A) for the total complex ΣR.

-10

10

-11 Acknowledgements The authors acknowledge financial sup- 10 port by Silvaco Inc. This research is part of the Blue Waters -11 10 sustained-petascale computing project, which is supported by

-0.20 -0.18 the National Science Foundation (award number ACI 1238993)

-0.2 -0.1 0.0 0.1 0.2 0.3 0.4 and the state of Illinois. Blue Waters is a joint effort of the

Gate bias (V) University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. This work is also Fig. 9 Similar to Fig. 8, I-V characteristics of a 3.64 nm part of the Advanced Nanoelectronic Device Design with Atom- × 3.64 nm × 30.29 nm InAs TFET device. The effects of R istic Simulations PRAC allocation support by the National scattering with and without a real part of Σ are larger than Science Foundation (award number OCI-1640876). The au- in the smaller wire of Fig. 8 thors acknowledge the Texas Advanced Computing Center Mode-space-compatible inelastic scattering in atomistic nonequilibrium Green’s function implementations 9

(TACC) at The University of Texas at Austin for providing ism: From boundary conditions to strain calcula- Stampede2 resources that have contributed to the research re- tions. Physical Review B, 74(20):205323, 2006. sults reported within this paper. URL: http://www.tacc.utexas.edu13. Robert Andrawis, Jose David Bermeo, James Charles, Jianbin Fang, Jim Fonseca, Yu He, Ger- hard Klimeck, Zhengping Jiang, Tillmann Ku- References bis, Daniel Mejia, et al. Nemo5: Achieving high-end internode communication for performance 1. Gennady Milnikov, Nobuya Mori, and Yoshi- projection beyond moore’s law. arXiv preprint nari Kamakura. Equivalent transport models in arXiv:1510.04686, 2015. atomistic quantum wires. Physical Review B, 14. Aryan Afzalian, J Huang, H Ilatikhameneh, 85(3):035317, 2012. J Charles, D Lemus, J Bermeo Lopez, S Perez Ru- 2. James Charles, Prasad Sarangapani, Roksana biano, T Kubis, M Povolotskyi, G Klimeck, et al. Golizadeh-Mojarad, Robert Andrawis, Daniel Mode space tight binding model for ultra-fast simu- Lemus, Xinchen Guo, Daniel Mejia, James E lations of iii-v nanowire mosfets and heterojunction Fonseca, Michael Povolotskyi, Tillmann Kubis, tfets. In Computational Electronics (IWCE), 2015 et al. Incoherent transport in nemo5: realistic International Workshop on, pages 1–3. IEEE, 2015. and efficient scattering on phonons. Journal of 15. Hong Guo. Nano molecular modeling method, Computational Electronics, 15(4):1123–1129, 2016. April 19 2005. US Patent App. 11/568,103. 3. International technology roadmap for semiconduc- 16. Roger Lake, Gerhard Klimeck, R Chris Bowen, and tors. https://www.itrs2.net/, 2017. [Online]. Dejan Jovanovic. Single and multiband model- 4. Robert F Pierret. device fundamen- ing of quantum electron transport through layered tals. Pearson Education India, 1996. semiconductor devices. Journal of Applied Physics, 5. Supriyo Datta. Quantum transport: atom to tran- 81(12):7845–7869, 1997. sistor. Cambridge University Press, 2005. 17. Aryan Afzalian, Tim Vasen, Peter Ramvall, and 6. S Ciraci, A Buldum, and Inder P Batra. Quantum Matthias Passlack. An efficient tight-binding mode- effects in electrical and thermal transport through space negf model enabling up to million atoms iii- nanowires. Journal of Physics: Condensed Matter, v nanowire mosfets and tfets simulations. arXiv 13(29):R537, 2001. preprint arXiv:1705.00909, 2017. 7. Guo Jing, Ali Javey, Hongjie Dai, and Mark Lund- 18. Quan Chen, Jun Li, Chiyung Yam, Yu Zhang, Ngai storm. Performance analysis and design optimiza- Wong, and Guanhua Chen. An approximate frame- tion of near ballistic carbon nanotube field-effect work for quantum transport calculation with model transistors. IEDM Technical Digest. IEEE Interna- order reduction. Journal of Computational Physics, tional Electron Devices Meeting, 2004., pages 703– 286:49–61, 2015. 706, 2004. 19. Andrey Kuzmin, Mathieu Luisier, and Olaf Schenk. 8. Supriyo Datta. Nanoscale device modeling: the Fast methods for computing selected elements of greens function method. Superlattices and mi- the greens function in massively parallel nanoelec- crostructures, 28(4):253–278, 2000. tronic device simulations. In European Conference 9. Mathieu Luisier, Andreas Schenk, Wolfgang Ficht- on Parallel Processing, pages 533–544. Springer, ner, and Gerhard Klimeck. Atomistic simulation of 2013. nanowires in the s p 3 d 5 s* tight-binding formal- 20. Daniel Valencia, Evan Wilson, Prasad Sarangapani, ism: From boundary conditions to strain calcula- Gustavo A Valencia-Zapata, Gerhard Klimeck, tions. Physical Review B, 74(20):205323, 2006. Michael Povolotskyi, and Zhengping Jiang. Grain 10. Timothy B Boykin, Jan PA Van der Wagt, and boundary resistance in nanoscale copper intercon- James S Harris Jr. Tight-binding model for nections. In Simulation of Semiconductor Processes gaas/alas resonant-tunneling diodes. Physical Re- and Devices (SISPAD), 2016 International Confer- view B, 43(6):4777, 1991. ence on, pages 105–108. IEEE, 2016. 11. Yaohua P Tan, Michael Povolotskyi, Tillmann Ku- 21. Kuang-Chung Wang, Teodor K Stanev, Daniel Va- bis, Timothy B Boykin, and Gerhard Klimeck. lencia, James Charles, Alex Henning, Vinod K Tight-binding analysis of si and gaas ultrathin bod- Sangwan, Aritra Lahiri, Daniel Mejia, Prasad ies with subatomic wave-function resolution. Phys- Sarangapani, Michael Povolotskyi, et al. Control ical Review B, 92(8):085301, 2015. of interlayer delocalization in 2h transition metal 12. Mathieu Luisier, Andreas Schenk, Wolfgang Ficht- dichalcogenides. arXiv preprint arXiv:1703.02191, ner, and Gerhard Klimeck. Atomistic simulation of 2017. nanowires in the s p 3 d 5 s* tight-binding formal- 10 Daniel A. Lemus1 (orcid.org/0000-0002-0759-8848) et al.

22. Hesameddin Ilatikhameneh, Yaohua Tan, Bozidar Applied Physics, 113(1), 2013. Novakovic, Gerhard Klimeck, Rajib Rahman, and 33. R. Valin, M. Aldegunde, A. Martinez, and J. R. Joerg Appenzeller. Tunnel field-effect transistors Barker. Quantum transport of a nanowire field- in 2-d transition metal dichalcogenide materials. effect transistor with complex phonon self-energy. IEEE Journal on Exploratory Solid-State Compu- Journal of Applied Physics, 116(8), 2014. tational Devices and Circuits, 1:12–18, 2015. 34. Yu He, Yu Wang, Gerhard Klimeck, and Tillmann 23. Fan W Chen, Michael Manfra, Gerhard Klimeck, Kubis. Non-equilibrium Green’s functions method: and Tillmann Kubis. Nemo5: Why must we treat Non-trivial and disordered leads. Applied Physics topological insulator nanowires atomically? In Letters, 105(21), 2014. Proc. IWCE, 2015. 35. E. B. Ramayya, D. Vasileska, S. M. Goodnick, 24. Supriyo Datta. The non-equilibrium green’s func- and I. Knezevic. Electron transport in silicon tion (negf) formalism: An elementary introduction. nanowires: The role of acoustic phonon confinement In Electron Devices Meeting, 2002. IEDM’02. In- and surface roughness scattering. Journal of Ap- ternational, pages 703–706. IEEE, 2002. plied Physics, 104(6), 2008. 25. A Trellakis, AT Galick, A Pacelli, and U Ra- 36. T Kubis, C Yeh, P Vogl, A Benz, G Fasching, vaioli. Iteration scheme for the solution of the and C Deutsch. Theory of nonequilibrium quan- two-dimensional schr¨odinger-poisson equations in tum transport and energy dissipation in tera- quantum structures. Journal of Applied Physics, hertz quantum cascade lasers. Physical Review B, 81(12):7880–7884, 1997. 79(19):195323, 2009. 26. T Kubis and P Vogl. Assessment of approximations 37. Tillmann Christoph Kubis. Quantum transport in in nonequilibrium greens function theory. Physical semiconductor nanostructures. PhD thesis, Techni- Review B, 83(19):195304, 2011. cal University of Munich, 2009. 27. Yu He, Lang Zeng, Tillmann Kubis, Michael Po- 38. Mathieu Luisier and Gerhard Klimeck. Simula- volotskyi, and Gerhard Klimeck. Efficient solu- tion of nanowire tunneling transistors: From the tion algorithm of non-equilibrium greens functions wentzel–kramers–brillouin approximation to full- in atomistic tight binding representation. In Proc. band phonon-assisted tunneling. Journal of Applied 15th Int. Workshop Comput. Electron, pages 1–3, Physics, 107(8):084507, 2010. 2012. 39. Y. Lee, M. Bescond, N. Cavassilas, D. Logoteta, 28. MP Anantram, Mark S Lundstrom, and Dmitri E L. Raymond, M. Lannoo, and M. Luisier. Quan- Nikonov. Modeling of nanoscale devices. Proceed- tum treatment of phonon scattering for modeling ings of the IEEE, 96(9):1511–1550, 2008. of three-dimensional atomistic transport. Physical 29. Prasad Sarangapani, Yuanchen Chu, James Review B, 95(20):1–6, 2017. Charles, Gerhard Klimeck, and Tillmann Kubis. 40. Y. Lee, M. Lannoo, N. Cavassilas, M. Luisier, and Band-tail Formation and Band-gap Narrowing M. Bescond. Efficient quantum modeling of inelas- Driven by Polar Optical Phonons and Charged tic interactions in nanodevices. Physical Review B, Impurities in Atomically Resolved III-V Semi- 93(20):1–14, 2016. conductors and Nanodevices. Physical Review 41. H. Mera, M. Lannoo, C. Li, N. Cavassilas, and Applied, 12(4):1, 2019. M. Bescond. Inelastic scattering in nanoscale de- 30. Wenxing Zhang, Christophe Delerue, Yann Michel vices: One-shot current-conserving lowest-order ap- Niquet, Guy Allan, and Enge Wang. Atomistic proximation. Physical Review B - Condensed Mat- modeling of electron-phonon coupling and trans- ter and Materials Physics, 86(16):2–5, 2012. port properties in n -type [110] silicon nanowires. 42. K Miao, S Sadasivam, J Charles, G Klimeck, Physical Review B - Condensed Matter and Mate- TS Fisher, and T Kubis. B¨uttiker probes for dis- rials Physics, 82(11):2–8, 2010. sipative phonon quantum transport in semicon- 31. Viet Hung Nguyen, Fran¸cois Triozon, Fr´ed´eric D.R. ductor nanostructures. Applied Physics Letters, Bonnet, and Yann Michel Niquet. Performances 108(11):113107, 2016. of strained nanowire devices: Ballistic versus 43. Yuanchen Chu, Jingjing Shi, Kai Miao, Yang scattering-limited currents. IEEE Transactions on Zhong, Prasad Sarangapani, Timothy S. Fisher, Electron Devices, 60(5):1506–1513, 2013. Gerhard Klimeck, Xiulin Ruan, and Tillmann Ku- 32. M. Aldegunde, A. Martinez, and J. R. Barker. bis. Thermal boundary resistance predictions with Study of individual phonon scattering mechanisms non-equilibrium Green’s function and molecular and the validity of Matthiessen’s rule in a gate- dynamics simulations. Applied Physics Letters, all-around silicon nanowire transistor. Journal of 115(23), 2019. Mode-space-compatible inelastic scattering in atomistic nonequilibrium Green’s function implementations 11

44. Junzhe Geng, Prasad Sarangapani, Kuang Chung 55. Nemo5. https://engineering.purdue.edu/gekc Wang, Erik Nelson, Ben Browne, Carl Wordel- ogrp/software-projects/nemo5/, 2017. [Online]. man, James Charles, Yuanchen Chu, Tillmann Ku- 56. F. A. Riddoch and B. K. Ridley. On the scattering bis, and Gerhard Klimeck. Quantitative Multi- of electrons by polar optical phonons in quasi-2D Scale, Multi-Physics Quantum Transport Modeling quantum wells. Journal of Physics C: Solid State of GaN-Based Light Emitting Diodes. Physica Sta- Physics, 16(36):6971–6982, 1983. tus Solidi (A) Applications and Materials Science, 57. Aryan Afzalian. Computationally efficient self- 215(9):1–7, 2018. consistent born approximation treatments of 45. R Venugopal, Magnus Paulsson, S Goasguen, phonon scattering for coupled-mode space non- Supriyo Datta, and MS Lundstrom. A simple equilibrium greens function. Journal of Applied quantum mechanical treatment of scattering in Physics, 110(9):094517, 2011. nanoscale transistors. Journal of Applied Physics, 58. Martin Frey. Scattering in nanoscale devices. PhD 93(9):5613–5625, 2003. thesis, ETH Zurich, 2010. 46. D Mamaluy, M Sabathil, and P Vogl. Efficient 59. Aniello Esposito, Martin Frey, and Andreas method for the calculation of ballistic quantum Schenk. Quantum transport including nonparabol- transport. Journal of Applied Physics, 93(8):4628– icity and phonon scattering: Application to silicon 4633, 2003. nanowires. Journal of Computational Electronics, 47. Anisur Rahman, Jing Guo, Supriyo Datta, and 8(3-4):336–348, 2009. Mark S Lundstrom. Theory of ballistic nanotran- 60. Alexei Svizhenko and M. P. Anantram. Role of sistors. IEEE Transactions on Electron Devices, scattering in nanotransistors. IEEE Transactions 50(9):1853–1864, 2003. on Electron Devices, 50(6):1459–1466, 2003. 48. Xue Shao and Zhiping Yu. Nanoscale finfet simu- 61. Andreas Wacker. Semiconductor superlattices: A lation: A quasi-3d quantum mechanical model us- model system for nonlinear transport. Physics Re- ing negf. Solid-State Electronics, 49(8):1435–1445, ports, 357(1):1–111, 2002. 2005. 62. R. Kronig. Optical Society of America Review of 49. Ivan Markovsky. Low rank approximation: algo- Scientific Instruments. Journal of the Optical Soci- rithms, implementation, applications. Springer Sci- ety of America, 12(6):459–463, 1925. ence & Business Media, 2011. 63. V´aclav C´ıˇzek.ˇ Discrete Hilbert Transform. 50. Lang Zeng, Yu He, Michael Povolotskyi, XiaoYan IEEE Transactions on Audio and Electroacoustics, Liu, Gerhard Klimeck, and Tillmann Kubis. Low 18(4):340–343, 1970. rank approximation method for efficient green’s 64. James E Fonseca, T Kubis, Michael Povolot- function calculation of dissipative quantum trans- skyi, Bozidar Novakovic, Arvind Ajoy, G Hegde, port. Journal of Applied Physics, 113(21):213707, Hesameddin Ilatikhameneh, Zhengping Jiang, Pari- 2013. jat Sengupta, Y Tan, et al. Efficient and realistic de- 51. Ulrich Hetmaniuk, Dong Ji, Yunqi Zhao, and Man- vice modeling from atomic detail to the nanoscale. jeri P Anantram. A reduced-order method for Journal of Computational Electronics, 12(4):592– coherent transport using greens functions. IEEE 600, 2013. Transactions on Electron Devices, 62(3):736–742, 65. Jean Sellier, Jim Fonseca, Tillmann C Ku- 2015. bis, Michael Povolotskyi, Yu He, Hesameddin 52. Stefan Birner, Christoph Schindler, Peter Greck, Ilatikhameneh, Zhengping Jiang, S Kim, Daniel Matthias Sabathil, and Peter Vogl. Ballistic quan- Mejia, Parijat Sengupta, et al. Nemo5, a parallel, tum transport using the contact block reduction multiscale, multiphysics nanoelectronics modeling (cbr) method. Journal of computational electron- tool. In Proc. SISPAD, pages 1–4, 2012. ics, 8(3):267–286, 2009. 66. Timothy B Boykin, Gerhard Klimeck, and Fabiano 53. Jun Z. Huang, Hesameddin Ilatikhameneh, Michael Oyafuso. Valence band effective-mass expressions Povolotskyi, and Gerhard Klimeck. Robust Mode in the sp 3 d 5 s* empirical tight-binding model Space Approach for Atomistic Modeling of Real- applied to a si and ge parametrization. Physical istically Large Nanowire Transistors. Journal of Review B, 69(11):115201, 2004. Applied Physics, 044303:1–9, 2017. 67. A Afzalian, T Vasen, P Ramvall, T. M. Shen, J. Wu, 54. Sebastian Steiger and Michael Povolotskyi. Nemo5: and M. Passlack. Physics and performances of III-V a parallel multiscale nanoelectronics modeling tool. nanowire broken-gap heterojunction TFETs using Nanotechnology, . . . , 10(6):1464–1474, 2011. an efficient tight-binding mode-space NEGF model enabling million-atom nanowire simulations. Jour- 12 Daniel A. Lemus1 (orcid.org/0000-0002-0759-8848) et al.

nal of Physics Condensed Matter, 30(25), 2018. 68. About blue waters. http://www.ncsa.illinois. edu/enabling/bluewaters, 2017. [Online]. 69. James Charles. Modeling Nonlocality in Quantum Systems. PhD thesis, , 2018.