Parallel FDTD Modelling of Nonlocality in Plasmonics Joshua Baxter, Antonino Cala` Lesina, and Lora Ramunno

Parallel FDTD modelling of nonlocality in plasmonics Joshua Baxter, Antonino Cala` Lesina, and Lora Ramunno

Abstract—As nanofabrication techniques become more precise, expected spread of charge density near plasmonic surfaces as with ever smaller feature sizes, the ability to model nonlocal well as predict the blueshift in plasmon resonance with de- effects in plasmonics becomes increasingly important. While creasing particle size [13], [17], [19]. More recently, the effects nonlocal models based on hydrodynamics have been implemented using various computational electromagnetics techniques, the of electron diffusion were incorporated within a generalized finite-difference time-domain (FDTD) version has remained elu- nonlocal optical response (GNOR) model, correctly predicting sive. Here we present a comprehensive FDTD implementation of a broadened line-shape that was observed experimentally [20]. nonlocal hydrodynamics, including for parallel computing. As a Nonlocal hydrodynamic models have been implemented us- sub-nanometer step size is required to resolve nonlocal effects, a ing several computational electromagnetic methods including parallel implementation makes the computational cost of nonlocal FDTD more affordable. We first validate our algorithms for small the finite element method [21], the discontinuous Galerkin spherical metallic particles, and find that nonlocality smears out time domain method [22], and the boundary element method staircasing artifacts at metal surfaces, increasing the accuracy [23]. A finite-difference time-domain (FDTD) implementation over local models. We find this also for a larger nanostructure would be advantageous due to the wide-spread use of FDTD in with sharp extrusions. The large size of this simulation, where the photonics community, its relative ease of implementation, nonlocal effects are clearly present, highlights the importance and impact of a parallel implementation in FDTD. its broadband capabilities, and its ability to produce time- domain movies. Only two attempts have been made [24], Index Terms—FDTD, plasmonics, nonlocality, hydrodynamic [25], where one is based on an erroneous approximation, plasma model, GNOR, parallel computing neither consider high performance computing and parallel implementations, and neither include electron diffusion. A cor- I.INTRODUCTION rect and comprehensive FDTD implementation has remained ABRICATING objects with nanoscale precision is pos- unreported. F sible due to significant progress in nanofabrication tech- In this paper, we present a parallel FDTD implementation of niques over the last few decades [1]. As a result, plasmonic nonlocal hydrodynamics, including a GNOR implementation. nanostructures and metamaterials [2] are having tremendous A parallel implementation in FDTD is especially important impact in many fields including biosensing [3], quantum for simulating nonlocality, as the grid cell size needs to cryptography [4], nonlinear optics [5], photovoltaics [6], light be smaller than the Fermi wavelength λF ∼ 0.5 nm to emitting devices [7], and precision medicine [8]. capture the spread of electron density [26]. This can require Numerical modelling in plasmonics is typically based on a large amount of memory, and can present a prohibitive optical models for bulk permittivity. These models – such as computational load. High-performance computing represents the Drude model for free electron response, and the Lorentz a viable solution. The FDTD rectangular-meshing scheme, and critical points models for bound electron response [9], where a field update at one location requires fields only in [10] – are based on the local response approximation (LRA), adjacent Yee-cells [27], lends itself well to parallel computing; which assumes that the induced polarization or current at a indeed, message passing interface (MPI)-based FDTD solvers given location depends only on the electromagnetic field at that are well reported for LRA models [28]. Furthermore, high- same location. While appropriate for many applications, the performance computing is becoming more accessible with Drude model is insufficient for modelling plasmonic structures cloud computing, computing consortiums, increased high- smaller than 10 nm [11]–[14], as well as those containing performance computing investments, and new, higher-level sharp features or nanoscale gaps [15], [16]. The charge density parallel programming languages such as Chapel [29]. arXiv:2005.06997v1 [physics.optics] 14 May 2020 near the surface of plasmonic objects is spread over a finite The structure of this paper is as follows. We review nonlocal thickness on the order of the a few angstroms [16], [17], and hydrodynamics in Section II, including the most common thus for small features, cannot be treated as localized at the version without electron diffusion, as well as GNOR, which surface, as is implicit in LRA models. does include electron diffusion. In Section III, we derive The hydrodynamic plasma model treats the conduction band from the nonlocal models FDTD update equations for the electrons as a free electron gas [5], [12], [16]–[18], accounting polarization field via the auxiliary differential equation (ADE) for free electron fluctuations via a pressure term. Unlike the method. In Section IV, we discuss the implementation of the Drude model, it does not make an LRA and thus nonlocality nonlocal FDTD update equations for parallel computing within is incorporated. It has been found to correctly predict the a MPI framework. In Section V, we test our implementations by simulating the optical response of small metallic J. Baxter, A. Cala` Lesina, and L. Ramunno are with the Department of Physics and Center for Research in Photonics, University of Ottawa, Ottawa, nanospheres and comparing our results to analytic solutions Ontaria, Canada e-mail: [email protected]. and experimental results. We find an unexpected benefit of 2

2 2 nonlocal versus LRA modelling: a marked decrease in stair- where β = 1/3vF and vF is the Fermi velocity. The free casing artifacts at the metal boundary. Due to the rectangular electron density fluctuations are accounted for in the last term discretization inherent in most FDTD approaches, fields can of Eq. 5. This is often referred to as the pressure term, and build up in an unphysical manner at plasmonic surfaces, and is responsible for the known blue shift in the surface plasmon this can be particularly problematic in applications that rely on resonance with decreasing particle size [13], [17]. Along with plasmonic near-field enhancement [30]. The incorporation of this formula for the velocity field, we require the continuity nonlocality significantly decreases this unphysical field build equation given by up. In Section VI, we simulate the response from a spherical nanoparticle containing sharp extrusions as a demonstration of ∂n = −∇ · (nv). (6) a large-scale simulation that requires both parallel computing ∂t and nonlocal modelling. We find that despite the nanoparticle From here we consider two approaches. The first is the most being larger, the sharp extrusions exhibit plasmonic features widely used, and assumes a current density given by J = not accessible to the LRA. Finally, in Section VII we give −env. As we use a polarization field formulation in this paper, concluding remarks. ∂PNL we set J = ∂t , where we have defined PNL to be the (nonlocal) free electron polarization field. The velocity field II.HYDRODYNAMICMODELSFORNONLOCALITY 1 ∂PNL is thus v = − ne ∂t , and Eq. 6 becomes The most general response of a linear optical material to incident radiation is described by 1 n = n + ∇ · P , (7) 0 e NL Z where n is the equilibrium free electron density. From Eq. 5, D(r, ω) = ε ε(r, r0, ω)E(r0, ω)dr0, (1) 0 0 we then obtain where the dielectric function of the material, ε(r, r0, ω), is non- 2 local when the displacement field D at one location depends ∂ PNL ∂PNL 2 2 + γ = ε0ωpE + β ∇(∇ · PNL), (8) on the electric field E at other locations. In many applications, ∂t2 ∂t q 2 it is appropriate to use a local response approximation (LRA), e n0 where ωp = . We call Eq. 8 the “nonlocal Drude wherein ε(r, r0, ω) = δ(r − r0)ε(ω). In plasmonic modelling, mε0 model” because it consists of the classical LRA Drude model the interaction of the electric field and the free electron plasma plus one additional term that gives rise to nonlocality (i.e., the is typically described by the Drude model, which employs the term proportional to β2). LRA, where the dielectric function reduces to [9] The model represented by Eq. 8 differs from those used in the two previous nonlocal FDTD works. In Ref. [24], the 2 ωp gradient-divergence term was simplified to a Laplacian, and ε(ω) = 1 − , (2) ω2 + iγω this resulted in spurious, non-physical resonances [18]. In Ref. [23], a current density formulation is used. A benefit to using where ω is the plasma frequency, γ is a collisional damping p a polarization field formulation is that it gives ready access to rate, and i is the imaginary unit. the free electron density via Gauss’s Law, as we demonstrate The hydrodynamic plasma model [17] goes beyond the in Section V. LRA, more accurately describing spatial-temporal free elec- The second approach we consider includes electron dif- tron dynamics via fusion, which was also found to play an important role in the optical response of small metallic particles [20]. This has ∂v e 1 δG[n] + (v · ∇)v = − (E + v × B) − γv − ∇ , (3) been described by the generalized nonlocal optical response ∂t m m δn (GNOR) model, where diffusion is considered by modifying 1 ∂PG where v is the velocity field of the free electron plasma, E the expression for the velocity field via v = − en ( ∂t + and B are the electric and magnetic fields, and n is the free D∇(∇ · PG)), where D is the diffusion coefficient and where electron density. The energy functional G[n] considers the we have denoted the nonlocal GNOR free electron polarization internal kinetic energy of the electron gas and is usually taken field by PG. GNOR correctly predicts both the blueshift in to be the Thomas-Fermi functional, giving the plasmon resonance frequency as well as the broadening of the absorption peak with decreasing particle size. A thorough

2 2 discussion of the nonlocal Drude and GNOR models is given δG[n] h 3 3 = n , (4) in Ref. [17]. δn 2m 8π Using in Eqs. 5 and 6 the GNOR definition for the velocity where h is Plancks constant. As we are only considering the field, we obtain the following time-domain nonlocal-diffusive linear nonlocal response in this paper, we neglect the nonlinear hydrodynamics model for the polarization field, which we terms (magnetic-Lorentz and convection) in Eq. 3 giving hereafter refer to as the “GNOR” model:

∂v e β2 ∂2P ∂P ∂ + γv = − E − ∇n, (5) G +γ G = ε ω2E+η∇(∇·P )+D ∇(∇·P ), (9) ∂t m n ∂t2 ∂t 0 p G ∂t G 3

2 where η = β + Dγ. We start by deriving f4 from the nonlocal Drude model. It is from Eqs. 8 and 9 that we derive in the next section Using central differencing, we discretize Eq. 8 centered at time our FDTD update equations for implementing the two different n∆t, to obtain models of the free electron response of a plasmonic material, one accounting for nonlocality only (Eq. 8), and the other Pn+1 − 2Pn + Pn−1 Pn+1 − Pn−1 nonlocality with diffusion (Eq. 9). NL NL NL + γ NL NL To properly model the optical response of many plasmonic ∆t2 2∆t 2 n 2 n materials, one must also include the contribution of bound = ε0ωpE + β ∇(∇ · PNL). (11) electrons. We employ the LRA-based N-critical points model that assumes a susceptibility of the form Further, using weighted central averaging as discussed in Ref. [30], we set En = (En−1 + 2En + En+1)/4 to obtain χCP (ω) = (ε∞ − 1) N eiφp e−iφp n+1 n n−1 n−1 n n+1 X P = D1PNL + D2P + D3(E + 2E + E ) + ApΩp + , (10) NL NL Ωp − ω − iΓp Ωp + ω + iΓp n p=1 + DNL∇(∇ · PNL), (12) where ε∞ is the infinite frequency permittivity, and where Ap, Ωp, Γp, and φp are fitting parameters. This model can be readily transformed to the time-domain for FDTD implementation, as described in detail in Ref. [31]. 2 D1 = 2 , (13) Dd∆t ! III.UPDATE EQUATIONS FOR NONLOCAL FDTD 1 γ 1 D2 = − 2 , (14) We derive in this section the FDTD update equations for Dd 2∆t ∆t the time-domain nonlocal Drude and GNOR models (Eqs. 8 2 ε0ωp and 9, respectively), using the ADE method. We discretize D3 = , (15) our domain via the Yee cell [27] where the electric fields are 4Dd 2 collocated with the polarization fields in time and space; the β DNL = , (16) Yee cell positions of the electric, magnetic, and polarization Dd ! fields used in this paper are listed in Table I. We denote the γ 1 Dd = + . (17) free electron polarization field by Pf (which signifies either 2∆t ∆t2 PNL or PG), and the bound electron polarization field by PCP . n The FDTD update algorithm at time = n∆t, where ∆t is As the x, y, and z components of ∇(∇ · PNL) must be the time step size, consists of updating (in order) the: collocated with the x, y, and z components of PNL, we have n+1/2 n−1/2 n (1) magnetic field H = f1(H , E ), n+1 n n n n+1/2 (2) electric field E = f2(E , Pf , PCP , H ), i+1/2,j,k i+1/2,j,k n+1 n+1 n ! (3) bound charge polarization PCP = f3(E , PCP ), n ∂ ∂Px ∂Py ∂Pz n+1 n+1 n ∇(∇·PNL) = + + (4) free charge polarization Pf = f4(E , Pf ), ∂x ∂x ∂y ∂z x x via update equations f , f , f , and f , whose form and i+3/2,j,k i+1/2,j,k i−1/2,j,k 1 2 3 4 P − 2P + P required inputs depend on the model from which they are ≈ x x x ∆x2 derived. The equation f1 is the magnetic field update derived P i+1,j+1/2,k − P i+1,j−1/2,k − P i,j+1/2,k + P i,j−1/2,k through discretization of the Maxwell-Faraday equation [32]; + y y y y we do not derive this here because in plasmonic simulations it ∆x∆y is typically unchanged from the vacuum equation. In what P i+1,j,k+1/2 − P i+1,j,k−1/2 − P i,j,k+1/2 + P i,j,k−1/2 + z z z z , follows, we present the electric field update equation f2 ∆x∆z (derived from the Maxwell-Ampere` law), the bound charge (18a) polarization update equation f3 (derived from Eq. 10), and the free charge polarization update f4 (derived from Eq. 8 for i,j+1/2,k ! i,j+1/2,k the nonlocal Drude model, and Eq. 9 for the GNOR model). ∂ ∂P ∂P ∂P ∇(∇·Pn ) = x + y + z Though we present f4 for two nonlocal models, one must NL ∂y ∂x ∂y ∂z chose which to use – they cannot be used simultaneously. y y P i,j+3/2,k − 2P i,j+1/2,k + P i,j−1/2,k ≈ y y y ∆y2 TABLE I i+1/2,j+1,k i−1/2,j+1,k i+1/2,j,k i−1/2,j,k YEE CELLPOSITIONSOFTHEELECTRIC, MAGNETIC, AND POLARIZATION P − P − P + P + x x x x FIELDS ∆x∆y Electric Magnetic Polarization i,j+1,k+1/2 i,j+1,k−1/2 i,j,k+1/2 i,j,k−1/2 Pz − Pz − Pz + Pz Ex,Hx,Px i+1/2, j, k i, j+1/2, k+1/2 i+1/2, j, k + , Ey,Hy,Py i, j+1/2, k i+1/2, j, k+1/2 i, j+1/2, k ∆y∆z Ez,Hz,Pz i, j, k+1/2 i+1/2, j+1/2, k i, j, k+1/2 (18b) 4

where

i,j,k+1/2 ! i,j,k+1/2 ∂ ∂P ∂P ∂P 2 2 ! n x y z 1 2 Ωp + Γp ∇(∇·PNL) = + + C = − (24) ∂z ∂x ∂y ∂z 1p C ∆t2 2 z z p i,j,k+3/2 i,j,k+1/2 i,j,k−1/2 2 2 ! Pz − 2Pz + Pz Ω + Γ ≈ 1 Γp 1 p p 2 C2p = − 2 − (25) ∆z Cp ∆t ∆t 4 i+1/2,j,k+1 i−1/2,j,k+1 i+1/2,j,k i−1/2,j,k Px − Px − Px + Px C ε A Ω sin φ + C = 4p − 0 p p p (26) ∆x∆z 3p 2 2∆tC i,j+1/2,k+1 i,j−1/2,k+1 i,j+1/2,k i,j−1/2,k p Py − Py − Py + Py ε A Ω (Ω cos φ − Γ sin φ ) + , C = 0 p p p p p p (27) ∆y∆z 4p C (18c) p C4p ε0ApΩp sin φp C5p = + (28) 2 2∆tCp 2 2 where, for brevity, the components of PNL are written as Pa Γ 1 Ω + Γ C = p + + p p , (29) for a = x, y, z. p ∆t ∆t2 4 For the GNOR model, the f4 update equation includes Eqs. 12-18 (with β2 replaced by η) along with a suitable discretization for the last term of Eq. 9, the only term that where p denotes the pth critical point. does not appear in the nonlocal Drude model. As this term n+1 cannot be achieved via central differencing, we use an alternate Next we turn to the f2 update equation for E . Ampre’s second order ﬁnite difference scheme given by law is discretized in time to give

∂ 3∇(∇ · Pn ) − 4∇(∇ · Pn−1) + ∇(∇ · Pn−2) ∇(∇·Pn ) ≈ G G G . n+1 n n+1 n N n+1 n G E − E Pf − Pf X (PCP,p − PCP,p) ∂t 2∆t ε ε + + (19) 0 ∞ ∆t ∆t ∆t p=1 = ∇ × Hn+1/2. (30)

The GNOR f4 update then becomes

For the nonlocal Drude model we set Pf = PNL, and plug

n+1 n n−1 n−1 n n+1 Eq. 12 into Eq. 30 to obtain PG = D1PG + D2PG + D3(E + 2E + E ) n + DG1∇(∇ · PG) n n−1 n−2 + DG2(3∇(∇ · PG) − 4∇(∇ · PG ) + ∇(∇ · PG )), (20) N ! n+1 X E ε0ε∞ + D3 + C3p p=1 where N ! n+1/2 X n = ∆t(∇ × H ) + ε0ε∞ − 2D3 + C4p E η p=1 DG1 = , (21) Dd N ! D X n−1 n − D3 + C5p E − (D1 − 1)P DG2 = . (22) NL 2∆tDd p=1 N X n n−1 − (C1p − 1)PCP,p − D2PNL p=1 The f3 update equation for the bound charge density is derived from the critical points model (Eq. 10) in Ref. [31]. N X n−1 n We state it here: − C2pPCP,p − DNL∇(∇ · PNL). p=1 (31)

n+1 n n−1 n+1 n n−1 PCP,p = C1pPCP,p+C2pPCP,p+C3pE +C4pE +C5pE (23) For the GNOR model we set Pf = PG, and plug Eq. 20 into 5

Eq. 30 to obtain TABLE II SUBDOMAIN UPDATE SCHEME FOR DIFFERENT FIELD COMPONENTS.

N ! Component x dimension y dimension z dimension n+1 X Ex and Px 0 → Nx − 1 1 → Ny 1 → Nz E ε0ε∞ + D3 + C3p Ey and Py 1 → Nx 0 → Ny − 1 1 → Nz p=1 Ez and Pz 1 → Nx 1 → Ny 0 → Nz − 1 N ! Hx 1 → Nx 0 → Ny − 1 0 → Nz − 1 n+1/2 X n = ∆t(∇ × H ) + ε0ε∞ − 2D3 + C4p E Hy 0 → Nx − 1 1 → Ny 0 → Nz − 1 p=1 Hz 0 → Nx − 1 0 → Ny − 1 1 → Nz N ! X − D + C En−1 − (D − 1)Pn 3 5p 1 G � and � field updates � field updates p=1 � , � �! ! ! �! �! N �", �" �" X n n−1 − (C1p − 1)PCP,p − D2PG p=1

N � X n−1 n − C2pPCP,p − DG1∇(∇ · PG) � p=1 0 0 n n−1 n−2 0 � 0 � − DG2(3∇(∇ · PG) − 4∇(∇ · PG ) + ∇(∇ · PG )). " " (32)

n−1 Fig. 1. Two dimensional representation of the field update regions within For the GNOR update, to avoid recalculation of ∇ ∇ · PG , and ∇ ∇ · Pn−2, we store them in arrays; therefore GNOR a subdomain for the electric/polarization fields (left) and the magnetic fields G (right). Different colours are chosen to represent the different field compo- requires additional memory. nents. The hatched regions indicate where both field components are updated. The update equations derived above must be implemented in all cells in which the plasmonic material exists. However, care needs to be taken if the nonlocal material extends to the bound- according to a local grid with (Nx + 1) × (Ny + 1) × (Nz + 1) ary of the simulation domain. Terminating a nonlocal material points. The vector (i, j, k) identifies an individual grid cell with a perfectly matched layer (PML) may result in instability within the local grid, where i ranges from 0 to Nx, j from and convergence issues. Within a total field/scattered field 0 to Ny, and k from 0 to Nz, inclusively. The electric and (TF/SF) framework – which is applicable to many plasmonic magnetic field components at (i, j, k) correspond to different nanostructure scattering problems – this can be overcome by locations in physical space within the grid cell, according to using the nonlocal model only in the total field region. If the Yee cell shown in Table I. For example, Ex(i, j, k) refers TF/SF cannot be used, such as for geometries that include to Ex(i∆x + ∆x/2, j∆y, k∆z), whereas Ey(i, j, k) refers to a plasmonic substrate, one may use the LRA-Drude model in Ey(i∆x, j∆y + ∆y/2, k∆z). the PML region, and the nonlocal models everywhere else. The use of domain decomposition requires that data from Finally, we would like to highlight that since the polariza- the boundaries of subdomains be transferred to other subdo- tion fields are nonlocal, an additional boundary condition is mains at each time step. For subsequent updates to be executed required at the interface between the plasmonic and external efficiently, an overlap of information is required between media. In our FDTD implementation, we impose the Pekar adjacent subdomains. For example, the cells (Nx, j, k) in sub- additional boundary condition [33] by setting Pf = 0 outside domain (mx, my, mz) represents the same physical locations of the plasmonic structure (that is, by not updating Pf outside as the cells (0, j, k) in subdomain (mx + 1, my, mz). The the structure). update scheme we describe below guarantees that the update equations for a given field component at a given physical IV. PARALLEL IMPLEMENTATION OF NONLOCAL FDTD location are only applied once. In this section we describe a parallel FDTD scheme using The components of the magnetic and electric fields in a the message passing interface (MPI) framework. Nonlocal subdomain are updated via f1 and f2, respectively, however simulations require a lot of memory since the step-size ∆x each is updated for different ranges of indices (i, j, k) as needs to be smaller than the Fermi wavelength. In turn, summarized in Table II. The notation 0 → Nd means we significant computation time is required because the Courant update from index 0 to index Nd, inclusively. Bound and Friedrichs Lewy condition restricts the time-step ∆t according free charge polarization fields are updated via f3 and f4, to ∆x. Parallel computing thus becomes essential. respectively, for the same ranges of indices as for the electric The simulation domain is decomposed into nx × ny × nz field, as listed in Table II. To understand this visually, the field MPI processes where nd is the number of MPI processes in the update regions for the x and y components of all fields within d direction, with d = x, y, z. Each MPI process is identified by a single subdomain are illustrated in Fig. 1. a vector (mx, my, mz) which gives its relative spatial position While the components of the magnetic, electric, and bound within the simulation domain, where 0 ≤ md < nd. charge polarization fields in a subdomain all require (Nx + Each process performs field updates within its own sub- 1) × (Ny + 1) × (Nz + 1) values to be stored, the free charge domain (i.e., its section of the simulation domain) defined polarization fields requires (Nx +2)×(Ny +2)×(Nz +2). This 6

Setup Simulation Process Process (푚푥, 푚푦 + 1) (푚푥 + 1, 푚푦 + 1)

퐸 , 퐸 for x,y,z in Sub Domain: 푁푦 푁푦 푥 푧 if Plasmonic Area: 퐻 , 퐻 푛 Update 푯푛+1/2 푥 푧 Calculate ∇ ∇ ∙ 푷푓 푛+1 Calculate 푬 퐸푦, 퐸푧 Process Process Update CPMLs 푛+1/2 푛+1 푛+1 Exchange 푯 (푚푥, 푚푦) (푚푥 + 1, 푚푦) Calculate 푷푓 and 푷퐶푃 퐻푦, 퐻푧 푛−1 푛 Update 푷푓 = 푷푓 else: Update 푬푛+1 and 푷푛+1 Calculate 푬푛+1 Calculate CPMLs 0 0 푛+1 푛+1 Exchange 푬 and 푷푓 0 푁푥 0 푁푥 푛 푛+1 Process 푦 Process 푷푓 = 푷푓 (푚푥, 푚푦 − 1) (푚푥 + 1, 푚푦 − 1) Update previous iteration fields if GNOR: 푛−2 푛−1 푥 ∇ ∇ ∙ 푷퐺 = ∇ ∇ ∙ 푷퐺 푛−1 푛 ∇ ∇ ∙ 푷퐺 = ∇ ∇ ∙ 푷퐺 퐸 , 퐸 if 푡 < 푡푚푎푥 푦 푧 Yes No Postprocessing 퐸푦 퐸푧

퐻푦 Fig. 2. FDTD algorithm for the nonlocal Drude and GNOR models in an 퐻푧 MPI framework.

퐻푦, 퐻푧 Process 푚푥 Process 푚푥+1 is because the free charge polarization field updates via Eq. 18 requires information from additional cells. For example, to update Px,NL(0, j, k) via Eq. 18 (a), we need Px,NL(−1, j, k); Fig. 3. Two dimensional representation of the magnetic and electric field in general, we need an extra cell in each dimension to store the data transfers between adjacent subdomains. The different colours represent the different field components as indicated. The inset illustrates all field −1 index. However, we do not calculate updates at this index, component transfers along x, and their relative positions in the Yee cell. as (−1, j, k) in subdomain (mx, my, mz) obtain their values from a transfer of data from cells (Nx − 1, j, k) in subdomain (mx − 1, my, mz). field in the x and y directions are also illustrated in the top We now discuss the flow of the FDTD algorithm for the both part of Fig. 3, with further detail given in the inset. Unlike the the nonlocal Drude and GNOR models, including what data magnetic field data transfers, those for the electric field are needs to be communicated, and when. This is summarized in made in the “forward” direction. For example, Ey(Nx, j, k) Fig. 2, and we go through each step in detail in the following. updated in subdomain (mx, my, mz), is passed to Ey(0, j, k) The first update to execute after the setup of the simulation in subdomain (mx + 1, my, mz). Note that only the E and is that of the magnetic field via f1. Once this has been H components tangential to the direction of the data transfer completed in each subdomain, a subset of the magnetic field need to be exchanged. values at the subdomain boundaries must be communicated In general, FDTD updates for LRA polarization models only to adjacent subdomains. The necessary communications for require the collocated electric and polarization field values so Hx are given in the first row of Table III; those for the that the polarization field values need not be communicated other components of H can be obtained from this table by to other subdomains. Thus no communication is necessary for an ordered permutation of x → y → z. The inter-subdomain the bound charge polarization field PCP . This is not true for communications in the x and y directions are illustrated in the the nonlocal models. The necessary communications for the x top part of Fig. 3. Note that the data transfers of the magnetic component of Pf are given in the third row of Table III and fields are all made in the “backward” direction. The inset visualized in the top part Fig. 4. The communication for the y in Fig. 3 details all data transfers along x and their relative and z field components are obtained from Table III again via Yee cell positions. For example, Hy(0, j, k) updated locally an ordered permutation of x → y → z. The inset of Fig. 4 in subdomain (mx + 1, my, mz) is passed to Hy(Nx, j, k) in shows the data transfers along x for all field components. Note subdomain (mx, my, mz). that there are now four data transfers for every Pf component After the magnetic field data transfer, the electric field is – three “forward” and one “backward” – due to the increased updated in each subdomain via f2, after which the bound and data required for updates via Eq. 18. Thus, the inclusion of free charge polarization fields are updated in each subdomain nonlocality doubles the number of communications required via f3 and f4, respectively. Subsequently, a subset of their at each time-step. This has an effect on performance and values at the subdomain boundaries need to be communicated scalability and is discussed further in Section V. to adjacent subdomains. The necessary communications for It is worth noting that MPI is not the only solution for Ex are given in the second row of Table III; again, those for parallel computing as new higher-level languages are being the other components are obtained by an ordered permutation introduced for this purpose. Chapel, a language produced by of x → y → z. The inter-subdomain communications electric Cray [29], allows for algorithm implemenation on a distributed 7

Process Process (푚푥, 푚푦 + 1) (푚푥 + 1, 푚푦 + 1) 10 nm (LRA Mie) 10 nm (Nonlocal Mie) 푃 푃 10 nm (Nonlocal Drude FDTD) 푥 푥 1.5 6 nm (LRA Mie) 6 nm (Nonlocal Mie) 푃푥 send 푁푦 푁푦 6 nm (Nonlocal Drude FDTD) 4 nm (LRA Mie) 푃푥 receive 4 nm (Nonlocal Mie) 1.0 4 nm (Nonlocal Drude FDTD) 푃푦, 푃푧 Process Process (푚 , 푚 ) 푃푥 (푚 + 1, 푚 ) 푥 푦 푥 푦 0.5 Absorption Efficiency

0 0 0.0 a) −1 0 푃푥 푁푥 푦 −1 0 푃푥 푁푥 10 nm (LRA Mie) Process Process 10 nm (GNOR Mie) (푚푥, 푚푦 − 1) (푚푥, +1, 푚푦 − 1) 10 nm (GNOR FDTD) 1.5 6 nm (LRA Mie) 푥 6 nm (GNOR Mie) 6 nm (GNOR FDTD) 4 nm (LRA Mie) 푃 4 nm (GNOR Mie) 푥 푃푦, 푃푧 1.0 4 nm (GNOR FDTD)

0.5 푃푥 Absorption Efficiency 푃푦

푃푧 0.0 b) 300 320 340 360 380 400 Wavelength (nm)

Process 푃푥 Process 푚 푚 +1 푥 푥 Fig. 5. Absorption cross sections for silver spheres of diameter 4 nm (blue), 6 nm (green) and 10 nm (red) using a) nonlocal Drude FDTD (filled circles), and b) GNOR FDTD (filled circles). The solid lines represent the nonlocal Fig. 4. Two dimensional representation of the data transfers between adjacent Mie theory solutions a) without diffusion and b) with diffusion. The dashed subdomains for the x component of the free charge polarization field. The lines in both plots represent the LRA-Mie theory solutions. j = −1 row in the y direction is not shown or required for Px, but would be required for Py. The blue shade represents regions where Px is locally calculated and sent to adjacent processes, and the purple represents regions where P is received from adjacent processes. The inset illustrates all x V. NONLOCAL FDTD APPLIEDTOSMALLSPHERES free charge polarization field component transfers along x, and their relative positions in the Yee cell. In this section, we test and validate our FDTD implementations of the nonlocal models by using them to simulate the system without the challenges of MPI. For example, Ref. optical response of small silver nanospheres. We compare [34] presents a finite difference implementation of Poissons our results to (1) analytic solutions based on Mie theory, equation in Chapel. One may also use a shared memory (2) LRA FDTD plasmonic simulations that employ the LRA implementation where the memory is shared amongst the Drude model for free-electron response, and (3) experimental processes (or threads) and therefore no data need be com- results from the literature. Not only do our nonlocal FDTD municated. This can be implemented via OpenMP [35] for simulations agree well with analytic and experimental results, multi-threading on CPUs or via CUDA [36] or OpenCL [37] we also find an unexpected benefit over the LRA approach: a on graphics processing units (GPUs). Indeed, GPU-FDTD pronounced reduction of staircasing artifacts. implementations are well reported in literature [38]. Since In Fig. 5, we plot the absorption efficiencies for silver GPUs can launch thousands of parallel threads that all have nanospheres in vacuum for three different diameters – 4 nm access to shared memory, a GPU-based implementation of (blue), 6 nm (green) and 10 nm (red) – and different free nonlocal FDTD requires no special treatment beyond what was charge polarization models. To model silver, we use the fitting presented in Section III. While GPUs do suffer from memory parameters reported in Ref. [39] for ωp and γ in Eqs. 8 and constraints, they can be still be useful for smaller nonlocal 9, and for all critical points model parameters in Eq. 10. We 2 2 6 plasmonic simulations. set β = 1/3vF [17] where vF = 1.39 · 10 m/s [40]. The FDTD domain is a 200 cell × 200 cell × 200 cell box TABLE III truncated by a convolutional perfectly matched layer (CPML) DATA TRANSFER PROTOCOL FOR THE x-COMPONENT OF THE FIELDS. [32] consisting of 20 additional cells at each boundary. We use a uniform step-size of ∆x = ∆y = ∆z = D/100, where D Hx Transfer y = 0 plane backward to adjacent y = Ny plane Transfer z = 0 plane backward to adjacent z = Nz plane is the diameter of the sphere. For the particle sizes of interest Ex Transfer y = Ny plane forward to adjacent y = 0 plane here, this guarantees both that the step size is less than the Transfer z = Nz plane forward to adjacent z = 0 plane Fermi wavelength, and that the spherical shape is sufficiently Pf,x Transfer x = 0 plane backward to adjacent x = Nx plane Transfer x = Nx − 1 plane forward to adjacent x = −1 plane resolved. The total number of iterations vary with step-size Transfer y = Ny plane forward to adjacent y = 0 plane and therefore particle size. For a 10 nm diameter particle, Transfer z = Nz plane forward to adjacent z = 0 plane 3 · 105 iterations are used to reach convergence; this is scaled appropriately for the other particle sizes. 8

In Fig. 5 (a), we compare our nonlocal Drude FDTD 8 7 calculations (ﬁlled circles) with classical (LRA) Mie the- 6

ory solutions (dashed lines), and with nonlocal Mie theory ) 6 ) m

n 5 solutions (solid lines). Nonlocal Mie theory [41]–[43] is a m ( / 4 s V i 4 ( modiﬁed version of Mie theory [44] that allows for longi- x

| a

3 E tudinal modes, which permits a non-zero free charge density | within the nanoparticle. It has been successfully validated with z 2 2 experimental results for small particles larger than several 1 nanometers. For example, a quasistatic version predicted ab- 0 0 a) 0 2 4 6 b) 0 2 4 6 sorption peaks that agree quantitatively with experiment for x axis (nm) x axis (nm) particle diameters down to 10 nm with a further qualitative agreement down to 2 nm [19]. Fig. 6. Electric ﬁeld amplitude distribution of a 4 nm diameter silver sphere We ﬁnd in Fig. 5 (a) excellent agreement between nonlocal in vacuum at λ = 343 nm using a) LRA FDTD and b) nonlocal Drude FDTD. Drude FDTD and nonlocal Mie theory, with less than 2% mean error for all three particle sizes. Doubling the FDTD 2.0 5 ) step-size increases the mean error for the 10 nm particle case m 1.5 n (

s 4 to 6.4%. Halving the step-size decreases it to 1.3 %. Thus i 1.0 x reasonable convergence is reached with reasonable simulation a 0.5 z domain sizes. As expected, we see an increasing blue-shift 3 ) m 0.0 /

a) V in the absorption peak with decreasing nanoparticle size with (

2.0 | respect to the LRA Mie theory solution. E |

) 2

In Fig. 5 (b) we compare our GNOR-FDTD calculations m 1.5 n (

s (filled circles) with LRA Mie theory solutions (dashed lines), i 1.0 x 1 and nonlocal diffusive Mie theory solutions (solid lines), a 0.5 where diffusion is accounted for by substituting β2 with z 2 0.0 0 β +Dγ −iDω in nonlocal Mie theory [17]. The free electron b) 0 1 2 3 4 5 6 diffusion coefficient in silver is taken as D = 3.61 · 10−4 m2/s x axis (nm) [45]. We again see excellent agreement, with less than 1.6% Fig. 7. Close-up of the electric field amplitude distribution of a 4 nm diameter mean error in GNOR-FDTD relative to nonlocal diffusive Mie silver sphere in vacuum at λ = 425 nm. Staircasing artifacts are much more theory for all three particle sizes. As expected the resonance prominent in a) for LRA FDTD than in b) for nonlocal Drude FDTD. positions predicted by the nonlocal Drude FDTD in Fig. 5 (a) and GNOR FDTD in Fig. 5 (b) are the same, with an increased line width for GNOR FDTD. boundary. The staircasing-induced boundary artifacts seen for We now turn to examining the near field and free electron LRA FDTD are clearly reduced with nonlocal Drude FDTD. density distributions produced by nonlocal Drude FDTD, com- Artifacts at the particle boundary are especially problematic paring to those produced by LRA FDTD. In Fig. 6 we show for calculations that involve fields just outside the particle, the electric field amplitude distribution for the simulations of such as, for example, determining the plasmonic near field the 4 nm diameter silver nanoparticle produced by (a) LRA enhancement of fluorescence, or engineering the spontaneous FDTD and (b) nonlocal Drude FDTD, for the wavelength cor- emission lifetime of fluorophosphores [46]. This could also be responding to the peak of the (nonlocal) absorption spectrum important for calculations of plasmonics-enhanced nonlinear (λ = 343 nm). Shown are cuts in the xz plane through the optics, where enhanced near fields close to plasmonic bound- centre of the particle, where the incident plane wave is z- aries can be harnessed to not only enhance nonlinear optical polarized and propagates along the y-axis. The field amplitudes processes by orders of magnitude, but also shape nonlinear are normalized, corresponding to an input field of 1 V/m. For optical fields [47]–[49]. With nonlocal FDTD, the reduction of higher quality images, we used a halved step size of ∆x = D/ staircasing artifacts would result in more reliable calculations. 200. LRA FDTD produces a constant electric field within the In Fig. 8 we plot the free electron density distribution sphere, as expected, while nonlocal Drude FDTD produces a corresponding to the 4 nm silver particle simulations of Fig. field gradient due to the non-zero free charge distribution. 6. Shown are cuts in the xz plane through the center of the Unexpected in Fig. 6 is the significant difference in the sphere for λ = 343 nm, as produced by (a) LRA FDTD appearance of the fields at the particle boundary. While the and (b) nonlocal Drude FDTD. As expected, nonlocal Drude effects of staircasing in LRA FDTD are clearly visible at the FDTD allows for the spread of the charge distribution near edges of the sphere in Fig. 6 (a), they appear smoothed out the particle boundary, which we see is on the order of the for nonlocal Drude FDTD in Fig. 6 (b). This is even more Fermi wavelength for silver, λF = 0.5 nm. In contrast, for evident in Fig. 7 which shows the normalized electric field LRA FDTD the charge is bound to the surface. amplitude distribution produced by a) LRA FDTD and b) We present in the Supplementary Information (SI) time- nonlocal Drude FDTD at λ = 425 nm, a wavelength where the domain movies for the 4 nm silver sphere simulations cor- absorption efficiency and near fields for both approaches are responding to Figs. 6 - 8 where the incident plane wave almost identical, except for dramatic differences at the sphere pulse function is a normalized raised cosine f(t) = [(1 − 9

1.4

6 1.2 ) ) 1.0 3 m n m ( ( 9

4 0.8 1 s 0 i 1

| a 0.6 e e r f N | z 2 0.4

0.2

0 0.0 a) 0 2 4 6 0 b) 2 4 6 x axis (nm) x axis (nm)

Fig. 8. Free electron density within a 4 nm diameter silver sphere in vacuum at λ = 343 nm using a) LRA FDTD and b) nonlocal Drude FDTD.

3 16 cos(ωmaxt))/2] , where ωmax = 1.26 · 10 rad/s is the maximum frequency of interest (corresponding to λ = 150 Fig. 9. Contour plot of the absorption cross section of silver spheres embedded in a n = 1.3 dielectric background as a function of incident plane wave photon nm). Included in the SI are movies of the electric ﬁeld energy (horizontal axis) and sphere diameter (vertical axis). amplitude dynamics for nonlocal Drude FDTD in the xz and yz planes, movie1, and movie2, respectively, where both planes cut through the center of the particle. The comparable movies 1e5 for LRA FDTD are movie3 and movie4. The nonlocal FDTD LRA FDTD 4.0 movies show a radially propagating wave inside the sphere Nonlocal Drude FDTD whereas the LRA FDTD movies do not; in the latter, the 3.5 ﬁeld is almost always constant across the sphere (as expected). 3.0 Further, while staircasing artifacts in the LRA FDTD movies 2.5 are quite pronounced, they are hardly visible in the nonlocal 2.0

Drude FDTD movies. The corresponding free electron density 1.5 Iterations/Hour movie for nonlocal Drude FDTD is movie5. Since the free 1.0 electron density does not need to be tracked within the LRA 0.5 FDTD implementation, and all charge is strictly localized to 0.0 the surface, we do not include a time-domain movie for this 0 100 200 300 400 500 case. Number of Processes We now turn to further validate our FDTD implementations by comparing our simulated results to those of experiment. Fig. 10. Scalability of the nonlocal Drude FDTD (blue line) compared to that of LRA FDTD (red line). Plotted is the number of FDTD time iterations In Ref. [13], electron energy loss spectroscopy measurements computed per hour versus the number of processes used. are presented for silver nanosphere diameters ranging from 2 to 24 nm on a carbon ﬁlm and compared to Mie theory calculations that use size-dependent permittivities derived by Compute Canada [50]. Recall that the workload is heavier quantum mechanically. Using GNOR FDTD, we calculate the for the smallest particle since our step size ∆x (and thus time absorption spectra of silver nanospheres in a n = 1.3 dielectric step size) is proportional to sphere diameter, and thus more background (as used in the calculations of Ref. [13]) with iterations in time are required to reach convergence for the diameters ranging from 2 to 24 nm, and plot these in Fig. 9. smallest particles. We ﬁnd the same trend as presented in Ref. [13], showing In general, when running simulations on a large number of that GNOR FDTD is consistent with experimental measure- cores, as we do for this paper, it is imperative to investigate ments and calculations using quantum-based permittivities, the implementation’s scalability, that is, the performance en- with quantitative agreement down to 10 nm diameter, and hancement obtained by increasing the number of CPUs. In the qualitative agreement to 2 nm. As discussed in Ref. [17], for ideal case, the scaling is linear, meaning when one doubles the diameters less than 10 nm, the nonlocal model predicts a res- number of processes, the computation time is halved; this is onance shift that is not as large as determined by experimental often not achieved due to overhead, including inter-processor measurements, which is consistent with our results. This may communications. be due to more complicated phenomena occurring in silver, In Fig. 10, we plot the number of FDTD time iterations such as inhomogeneous equilibrium electron density, or spill- calculated per hour as a function of the number of processes out effects, that are not included in the GNOR model we have used in the nonlocal Drude FDTD (blue line) and LRA implemented. FDTD (red line) simulations. The simulation set up is the Finally, we turn to a discussion of computational resources same as for Fig. 5, where we vary the number of processes and scalability. To give an example, for the 10 nm, 6 nm and while keeping the total number of cells constant. For ≤ 128 4 nm sphere simulations presented in Fig. 5, we used 64, 128, processes, the scalability of nonlocal Drude FDTD is nearly and 256 cores, respectively, on the Graham cluster operated linear and comparable to LRA FDTD. While the scalability 10

2.00 퐸푖푛푐 LRA FDTD Nonlocal Drude FDTD 1.75

1.50

1.25 40 nm 5 nm 1.00

0.75

푧 Absorption Efficiency 0.50

푥 0.25

0.00 xz Fig. 11. Cross section in the plane of the star-shaped silver nanoparticle, 200 300 400 500 600 700 consisting of a sphere with 40 nm diameter and triangular nanoprism extru- Wavelength (nm) sions extending 5 nm from the sphere surface. The incident plane wave used to illuminate this structure is z-polarized and propagates along the y axis. Fig. 12. Absorption efﬁciency of the silver star-shaped nanoparticle calculated using LRA FDTD (red dots) and nonlocal Drude FDTD (blue dots); the lines of LRA FDTD remains mostly linear for a larger number are interpolations of the FDTD data. of processes, that of nonlocal Drude FDTD does not, likely due to the doubled inter-process communication required, as 1.6 50 discussed in Section IV. As one increases the number of 1.4

) 40 1.2 ) processes while maintaining the same domain size, the subdo- m n m (

1.0 /

s main surface-to-volume ratio increases, and therefore the inter- 30 V i (

x 0.8

| process communication time eventually becomes comparable a E

20 0.6 | to the computation time within a time step. Thus, if we were z 0.4 10 to consider a larger domain size, we expect the near-linear 0.2 scalability to extend to larger process numbers. 0 0.0 a) 0 20 40 b) 0 20 40 x axis (nm) x axis (nm) VI.NONLOCAL FDTD APPLIEDTOLARGER, COMPLEX NANOPARTICLES Fig. 13. Electric field amplitude distribution of a star-shaped nanoparticle in vacuum at λ = 320 nm using a) LRA FDTD and b) nonlocal Drude FDTD. In this section, we demonstrate the benefit of our parallel FDTD implementation even further, by considering a much larger plasmonic structure containing small, sharp features. effects should be negligible for particles larger than 20 nm. Such a structure would still be expected to exhibit nonlocal The peaks near 575 nm, however, correspond to the resonance effects, and thus simulating it via FDTD would still require a of the nanoprisms, specifically the ones aligned parallel to small grid cell size, small enough to resolve electron dynamics the incident field polarization. Here we do see that nonlocal within the sharp features. We consider as an example a effects become important, as the peak wavelength predicted structure inspired by the star-shaped nanoparticles synthesized by nonlocal Drude FDTD is 10 nm blue shifted from that for Ref. [51], which are spherical particles around 50 nm in predicted by LRA FDTD. diameter with small 5 nm extrusions from their surfaces that Additional differences between the two models manifest in come to a sharp tip. the field amplitude distributions for wavelengths below the The structure we simulate is a silver nanosphere with interband transition wavelength λIB ≈ 330 nm. The electric triangular nanoprisms extruding from the equator in the xz field amplitude distributions at λ = 320 nm are shown in plane, perpendicular to the incident plane wave propagation Fig. 13 for (a) LRA FDTD and (b) nonlocal Drude FDTD. axis, as illustrated in Fig. 11. The sphere is 40 nm in diameter Standing waves inside the vertical triangles, with a wavevector and the nanoprisms are embedded into the sphere so that the in the z-direction, are visible in Fig. 13 (b) for nonlocal Drude effective prism length is 5 nm. The tips of the nanoprisms are FDTD. This is a longitudinal mode [18], and is expected since rounded with a radius of 1 nm. The FDTD step-size is uniform its wavelength lies below the epsilon-near-zero wavelength, with ∆x = 0.1 nm and the domain size is 600 × 600 × 600 which in silver is approximately equal to λIB. In the LRA, Yee cells (not including CPMLs). The simulations are run for these modes cannot be excited by an incident transverse wave, 4 · 105 iterations over 1000 processes (10 × 10 × 10). and we see in Fig. 13 (a) that they are not. However, with The absorption spectrum of this particle is shown in Fig. nonlocality, such excitation can occur [18] due to the Pekar 12, calculated using both LRA FDTD (red) and nonlocal additional material boundary conditions, discussed in Section Drude FDTD (blue). There are two resonances attributed to III. the star-shaped particle. The peaks near 355 nm corresponds These standing waves become more pronounced for isolated to the plasmonic resonance of the sphere, and these completely nano-triangles as shown in Fig. 14 (b) where nonlocal Drude overlap for the two models. This is expected, as nonlocal FDTD was used, whereas they are absent in Fig. 14 (a), where 11

20 16 1.6 )

25 m 15 n ( 1.4 14 s i 10 x

a 12 20 1.2 5 z 10 ) ) m /

) 0 m 1.0 a) V n ( m

( /

15 8 20 | s V i ( E

| x ) 0.8 | a 6 E m 15 | n ( z

10 0.6 s i 10 4 x a 0.4 5 2 z 5 0 0 0.2 b) 0 10 20 30 40 50 60 x axis (nm) 0 0.0 0 5 10 0 5 10 a) x axis (nm) b) x axis (nm) Fig. 15. Close up of the electric field distribution of the star-shaped particle at λ = 475 nm. The staircasing artifacts are much more prominent in a) for LRA FDTD than in b) for nonlocal Drude FDTD. Fig. 14. Electric field amplitude distribution of isolated nanotriangles in vacuum at λ = 320 nm using a) LRA FDTD and b) the nonlocal Drude FDTD. the local response approximation (LRA). This suggests that beyond its importance for modelling nonlocality, nonlocal LRA FDTD was used. The standing waves found for nonlocal FDTD might be advantageous for calculations that require FDTD are damped by interband transitions for wavelengths precise values for the fields at plasmonic boundaries. This below 300 nm. includes, for example, determining plasmonic fluorescence We now turn to examining the effect of staircasing for enhancement, plasmonics-mediated fluorescent lifetime engi- the star-shaped nanoparticle. We show in Fig. 15 a zoomed- neering, and plasmonics enhanced nonlinear optics. in view of the electric field amplitude distribution at λ = 475 nm, a wavelength where there is more contrast at the ACKNOWLEDGMENT particle boundary than for the wavelength considered in Fig. 13. Despite a very fine mesh of ∆x = 0.1 nm, one can The authors would like to thank Compute Canada and Scinet clearly see staircasing artifacts in the electric field amplitude for computational resources, SOSCIP for their computational distribution for LRA FDTD in Fig. 13 (a). These are notably resources and financial support, and the Natural Sciences and reduced in Fig. 13 (b), where nonlocal Drude FDTD was used. Engineering Research Council of Canada and the Canada This reduction in staircasing artifacts is further illustrated by Research Chairs program for financial support. the time-domain movies of the star-shaped nanoparticle simulations presented in the SI. These depict the time evolution REFERENCES of the electric field amplitude and free electron density in the [1] V.-C. Su, C. H. Chu, G. Sun, and D. P. Tsai, “Advances in optical xz and yz planes through the center of the particle, where the metasurfaces: fabrication and applications [Invited],” Optics Express, incident plane wave is polarized in z and propagates along y. vol. 26, no. 10, pp. 13 148–13 182, May 2018. [Online]. Available: The electric field evolution movies for nonlocal Drude FDTD https://www.osapublishing.org/oe/abstract.cfm?uri=oe-26-10-13148 [2] J. A. Schuller, E. S. Barnard, W. Cai, Y. C. Jun, J. S. White, and are movie6 and movie7, while those for the corresponding M. L. Brongersma, “Plasmonics for extreme light concentration and free electron density evolution are movie8 and movie9. The manipulation,” Nature Materials, vol. 9, no. 3, pp. 193–204, Mar. 2010. electric field evolution movies for LRA FDTD are movie10 [Online]. Available: https://www.nature.com/articles/nmat2630 [3] J. R. Meja-Salazar and O. N. Oliveira, “Plasmonic Biosensing,” and movie11. Chemical Reviews, vol. 118, no. 20, pp. 10 617–10 625, Oct. 2018. [Online]. Available: https://doi.org/10.1021/acs.chemrev.8b00359 [4] S. I. Bozhevolnyi and N. A. Mortensen, “Plasmonics for emerging VII.CONCLUSION quantum technologies,” Nanophotonics, vol. 6, no. 5, pp. 1185–1188, We have introduced a parallel FDTD implementation for 2017. [Online]. Available: https://www.degruyter.com/view/j/nanoph. 2017.6.issue-5/nanoph-2016-0179/nanoph-2016-0179.xml modelling nonlocality in plasmonics. We used an auxiliary [5] M. Kauranen and A. V. Zayats, “Nonlinear plasmonics,” Nature differential equation approach to model nonlocality with and Photonics, vol. 6, no. 11, pp. 737–748, Nov. 2012. [Online]. Available: without electron diffusion, and described in detail how to use http://www.nature.com/articles/nphoton.2012.244 [6] F. Enrichi, A. Quandt, and G. C. Righini, “Plasmonic enhanced solar the message passing interface framework for parallel compu- cells: Summary of possible strategies and recent results,” Renewable tation via domain decomposition. After validating our imple- and Sustainable Energy Reviews, vol. 82, pp. 2433–2439, Feb. mentation via comparisons with analytical and experimental 2018. [Online]. Available: http://www.sciencedirect.com/science/article/ pii/S136403211731239X results for small nanospheres, we demonstrated the robustness [7] N.-Y. Kim, S.-H. Hong, J.-W. Kang, N. Myoung, S.-Y. Yim, S. Jung, of our parallel implementation for larger particles with sharp K. Lee, C. W. Tu, and S.-J. Park, “Localized surface plasmon-enhanced nanoscale features. We find that the inclusion of nonlocality green quantum dot light-emitting diodes using gold nanoparticles,” RSC Advances, vol. 5, no. 25, pp. 19 624–19 629, Feb. 2015. within FDTD significantly reduces staircasing artifacts that [Online]. Available: https://pubs.rsc.org/en/content/articlelanding/2015/ can plague standard plasmonic FDTD modelling based on ra/c4ra15585h 12

[8] M. Sharifi, F. Attar, A. A. Saboury, K. Akhtari, N. Hooshmand, Scales,” IEEE Transactions on Antennas and Propagation, vol. 66, A. Hasan, M. A. El-Sayed, and M. Falahati, “Plasmonic gold no. 9, pp. 4759–4771, Sep. 2018. [Online]. Available: https: nanoparticles: Optical manipulation, imaging, drug delivery and //ieeexplore.ieee.org/document/8399847/ therapy,” Journal of Controlled Release, vol. 311-312, pp. 170–189, [24] J. M. McMahon, Topics in Theoretical and Computational Nanoscience: Oct. 2019. [Online]. Available: http://www.sciencedirect.com/science/ From Controlling Light at the Nanoscale to Calculating Quantum article/pii/S0168365919305164 Effects with Classical Electrodynamics, ser. Springer Theses. New [9] S. A. Maier, Plasmonics: fundamentals and applications. New York: York: Springer-Verlag, 2011. [Online]. Available: https://www.springer. Springer, 2007. com/gp/book/9781441982483 [10] P. G. Etchegoin, E. C. Le Ru, and M. Meyer, “An analytic [25] L. Zhang, “FDTD Algorithm for Plasmonic Nanoparticles with model for the optical properties of gold,” The Journal of Chemical Spatial Dispersion,” Ph.D. dissertation, The Ohio State University, Physics, vol. 125, no. 16, p. 164705, Oct. 2006. [Online]. Available: 2016. [Online]. Available: https://etd.ohiolink.edu/pg 10?0::NO:10: http://aip.scitation.org/doi/10.1063/1.2360270 P10 ACCESSION NUM:osu1452174003 [11] U. Kreibig and L. Genzel, “Optical absorption of small metallic [26] B. Gallinet, J. Butet, and O. J. F. Martin, “Numerical methods for particles,” Surface Science, vol. 156, pp. 678–700, Jun. 1985. nanophotonics: standard problems and future challenges: Numerical [Online]. Available: http://www.sciencedirect.com/science/article/pii/ methods for nanophotonics,” Laser & Photonics Reviews, vol. 9, no. 6, 0039602885902390 pp. 577–603, Nov. 2015. [Online]. Available: http://doi.wiley.com/10. [12] S. Raza, W. Yan, N. Stenger, M. Wubs, and N. A. Mortensen, “Blueshift 1002/lpor.201500122 of the surface plasmon resonance in silver nanoparticles: substrate [27] K. Yee, “Numerical solution of initial boundary value problems in- effects,” Optics Express, vol. 21, no. 22, pp. 27 344–27 355, Nov. 2013. volving maxwell’s equations in isotropic media,” IEEE Transactions on [Online]. Available: https://www.osapublishing.org/oe/abstract.cfm?uri= Antennas and Propagation, vol. 14, no. 3, pp. 302–307, May 1966. oe-21-22-27344 [28] A. C. Lesina, A. Vaccari, P. Berini, and L. Ramunno, “On the [13] J. A. Scholl, A. L. Koh, and J. A. Dionne, “Quantum plasmon convergence and accuracy of the FDTD method for nanoplasmonics,” resonances of individual metallic nanoparticles,” Nature, vol. 483, Optics Express, vol. 23, no. 8, pp. 10 481–10 497, Apr. 2015. no. 7390, pp. 421–427, Mar. 2012. [Online]. Available: http: [Online]. Available: https://www.osapublishing.org/oe/abstract.cfm?uri= //www.nature.com/articles/nature10904 oe-23-8-10481 [14] H. Baida, P. Billaud, S. Marhaba, D. Christofilos, E. Cottancin, [29] “Chapel: Productive Parallel Programming.” [Online]. Available: A. Crut, J. Lerm, P. Maioli, M. Pellarin, M. Broyer, N. Del Fatti, https://chapel-lang.org/ F. Valle, A. Snchez-Iglesias, I. Pastoriza-Santos, and L. M. Liz-Marzn, [30] W. H. P. Pernice, “Finite-Difference Time-Domain Methods and Material “Quantitative Determination of the Size Dependence of Surface Models for the Simulation of Metallic and Plasmonic Structures,” Plasmon Resonance Damping in Single Ag@SiO 2 Nanoparticles,” Journal of Computational and Theoretical Nanoscience, vol. 7, no. 1, pp. Nano Letters, vol. 9, no. 10, pp. 3463–3469, Oct. 2009. [Online]. 1–14, Jan. 2010. [Online]. Available: http://openurl.ingenta.com/content/ Available: https://pubs.acs.org/doi/10.1021/nl901672b xref?genre=article&issn=1546-1955&volume=7&issue=1&spage=1 [15] W. Zhu, R. Esteban, A. G. Borisov, J. J. Baumberg, P. Nordlander, [31] K. P. Prokopidis and D. C. Zografopoulos, “A Unified FDTD/PML H. J. Lezec, J. Aizpurua, and K. B. Crozier, “Quantum mechanical Scheme Based on Critical Points for Accurate Studies of Plasmonic effects in plasmonic structures with subnanometre gaps,” Nature Structures,” Journal of Lightwave Technology, vol. 31, no. 15, pp. 2467– Communications, vol. 7, no. 1, p. 11495, Sep. 2016. [Online]. 2476, Aug. 2013. Available: http://www.nature.com/articles/ncomms11495 [32] A. Taflove and S. C. Hagness, Computational electrodynamics: the finite- [16] C. Ciraci, R. T. Hill, J. J. Mock, Y. Urzhumov, A. I. Fernandez- difference time-domain method, 3rd ed., ser. Artech House antennas and Dominguez, S. A. Maier, J. B. Pendry, A. Chilkoti, and D. R. Smith, propagation library. Boston: Artech House, 2005. “Probing the Ultimate Limits of Plasmonic Enhancement,” Science, [33] P. Halevi and R. Fuchs, “Generalised additional boundary condition for vol. 337, no. 6098, pp. 1072–1074, Aug. 2012. [Online]. Available: non-local dielectrics. I. Reflectivity,” Journal of Physics C: Solid State http://www.sciencemag.org/cgi/doi/10.1126/science.1224823 Physics, vol. 17, no. 21, pp. 3869–3888, Jul. 1984. [Online]. Available: [17] S. Raza, S. I. Bozhevolnyi, M. Wubs, and N. Asger Mortensen, https://doi.org/10.1088/0022-3719/17/21/017 “Nonlocal optical response in metallic nanostructures,” Journal of [34] R. F. Barrett, P. C. Roth, and S. W. Poole, “Finite Difference Stencils Physics: Condensed Matter, vol. 27, no. 18, p. 183204, May 2015. Implemented Using Chapel.” [Online]. Available: http://stacks.iop.org/0953-8984/27/i=18/a=183204? [35] “OpenMP.” [Online]. Available: https://www.openmp.org/ key=crossref.8c354f8034f59793e8ac363e22b99808 [18] S. Raza, G. Toscano, A.-P. Jauho, M. Wubs, and N. A. Mortensen, [36] “CUDA,” Mar. 2014. [Online]. Available: https://developer.nvidia.com/ “Unusual resonances in nanoplasmonic structures due to nonlocal about-cuda response,” Physical Review B, vol. 84, no. 12, p. 121412, Sep. [37] “OpenCL,” Jul. 2013. [Online]. Available: https://www.khronos.org/ 2011. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB. opencl/ 84.121412 [38] V. Demir and A. Z. Elsherbeni, “Compute Unified Device Architecture [19] S. Raza, N. Stenger, S. Kadkhodazadeh, S. V. Fischer, N. Kostesha, (CUDA) Based Finite- Difference Time-Domain (FDTD) Implementa- A.-P. Jauho, A. Burrows, M. Wubs, and N. A. Mortensen, “Blueshift tion,” vol. 25, no. 4, p. 12, 2010. of the surface plasmon resonance in silver nanoparticles studied [39] A. Vial, T. Laroche, M. Dridi, and L. LeCunff, “A new model with EELS,” Nanophotonics, vol. 2, no. 2, pp. 131–138, 2013. of dispersion for metals leading to a more accurate modeling of [Online]. Available: https://www.degruyter.com/view/j/nanoph.2013.2. plasmonic structures using the FDTD method,” Applied Physics issue-2/nanoph-2012-0032/nanoph-2012-0032.xml A, vol. 103, no. 3, pp. 849–853, Jun. 2011. [Online]. Available: [20] N. A. Mortensen, S. Raza, M. Wubs, T. Sndergaard, and S. I. http://link.springer.com/10.1007/s00339-010-6224-9 Bozhevolnyi, “A generalized non-local optical response theory for [40] C. Kittel, Introduction to solid state physics, 8th ed. Hoboken, NJ: plasmonic nanostructures,” Nature Communications, vol. 5, no. 1, p. Wiley, 2005. 3809, Sep. 2014. [Online]. Available: http://www.nature.com/articles/ [41] R. Ruppin, “Optical Properties of a Plasma Sphere,” Physical Review ncomms4809 Letters, vol. 31, no. 24, pp. 1434–1437, Dec. 1973. [Online]. Available: [21] K. R. Hiremath, L. Zschiedrich, and F. Schmidt, “Numerical https://link.aps.org/doi/10.1103/PhysRevLett.31.1434 solution of nonlocal hydrodynamic Drude model for arbitrary shaped [42] ——, “Optical properties of small metal spheres,” Physical Review nano-plasmonic structures using Ndlec finite elements,” Journal of B, vol. 11, no. 8, pp. 2871–2876, Apr. 1975. [Online]. Available: Computational Physics, vol. 231, no. 17, pp. 5890–5896, Jul. https://link.aps.org/doi/10.1103/PhysRevB.11.2871 2012. [Online]. Available: http://www.sciencedirect.com/science/article/ [43] A. Pack, M. Hietschold, and R. Wannemacher, “Failure of local Mie pii/S0021999112002550 theory: optical spectra of colloidal aggregates,” Optics Communications, [22] N. Schmitt, J. Viquerat, C. Scheid, S. Lanteri, M. Moeferdt, and vol. 194, no. 4, pp. 277–287, Jul. 2001. [Online]. Available: K. Busch, “A 3D discontinuous Galerkin Time-Domain method for nano http://www.sciencedirect.com/science/article/pii/S0030401801013104 plasmonics with a nonlocal dispersion model,” in 2017 Progress In [44] H. C. v. d. Hulst, Light scattering by small particles. New York: Dover Electromagnetics Research Symposium - Spring (PIERS), May 2017, Publications, 1981. pp. 2792–2799. [45] S. Raza, M. Wubs, S. I. Bozhevolnyi, and N. A. Mortensen, [23] X. Zheng, M. Kupresak, R. Mittra, and G. A. E. Vandenbosch, “Nonlocal study of ultimate plasmon hybridization,” Optics Letters, “A Boundary Integral Equation Scheme for Simulating the Nonlocal vol. 40, no. 5, p. 839, Mar. 2015. [Online]. Available: https: Hydrodynamic Response of Metallic Antennas at Deep-Nanometer //www.osapublishing.org/abstract.cfm?URI=ol-40-5-839 13

[46] D. V. Guzatov, S. V. Vaschenko, V. V. Stankevich, A. Y. Lunevich, Y. F. Glukhov, and S. V. Gaponenko, “Plasmonic Enhancement of Molecular Fluorescence near Silver Nanoparticles: Theory, Modeling, and Experiment,” The Journal of Physical Chemistry C, vol. 116, no. 19, pp. 10 723–10 733, May 2012. [Online]. Available: http://pubs.acs.org/doi/10.1021/jp301598w [47] G. Li, S. Zhang, and T. Zentgraf, “Nonlinear photonic metasurfaces,” Nature Reviews Materials, vol. 2, no. 5, pp. 1–14, Mar. 2017, number: 5 Publisher: Nature Publishing Group. [Online]. Available: https://www.nature.com/articles/natrevmats201710 [48] A. Krasnok, M. Tymchenko, and A. Al, “Nonlinear metasurfaces: a paradigm shift in nonlinear optics,” Materials Today, vol. 21, no. 1, pp. 8–21, Jan. 2018. [Online]. Available: http://www.sciencedirect.com/ science/article/pii/S136970211730233X [49] S. Keren-Zur, L. Michaeli, H. Suchowski, and T. Ellenbogen, “Shaping light with nonlinear metasurfaces,” Advances in Optics and Photonics, vol. 10, no. 1, pp. 309–353, Mar. 2018, publisher: Optical Society of America. [Online]. Available: https://www.osapublishing.org/aop/ abstract.cfm?uri=aop-10-1-309 [50] “Graham - CC Doc.” [Online]. Available: https://docs.computecanada. ca/wiki/Graham [51] L. Minati, F. Benetti, A. Chiappini, and G. Speranza, “One-step synthesis of star-shaped gold nanoparticles,” Colloids and Surfaces A: Physicochemical and Engineering Aspects, vol. 441, pp. 623–628, Jan. 2014. [Online]. Available: http://www.sciencedirect.com/science/article/ pii/S0927775713007796

Joshua Baxter earned his BSc in Physics/Mathematics from the University of Ottawa in 2017 and started his MSc in the same year. He transferred to his PhD in 2019. His work has involved plasmonic colouring with ultrafast lasers, simulations of laser heating, and computational nanophotonics. His current work focuses on Deep Learning applications to plasmonic colouring and nonlinear plasmonics through hydrodynamics.

Antonino Cala` Lesina is a Research Associate at the University of Ottawa, Canada, since 2018. He obtained his B.Sc. in Electronics Engineering in 2006 and his M.Sc. in Telecommunications Engineering in 2009 from the University of Catania, Italy, and his Ph.D. in Information and Communication Technologies in 2013 from the University of Trento, Italy, conducting his research in computational electromagnetics at the Fondazione Bruno Kessler. He joined the University of Ottawa in 2013 as a Postdoctoral Fellow in the groups of professors Lora Ramunno and Pierre Berini. His research focuses on computational nanophotonics, nonlinear and tunable metasurfaces, plasmonics and metamaterials.

Lora Ramunno obtained her Ph.D. in physics from the University of Toronto in 2002. After a year in a high-tech start-up, she returned to academia as a postdoctoral fellow at uOttawa. Since 2007, she has been a faculty member in Physics at uOttawa where she is currently Full Professor. She held a Canada Research Chair in Computational Nanophotonics from 2007-2017, and was the recipient of an Early Researcher Award in 2007. Her theoretical/computational research focuses on nonlinear and nano-optics, including intense laser-matter interaction, nanophotonics, nonlinear optical microscopy and photonic devices. She also has several successful industrial collaborative projects. Prof. Ramunno is a fellow of the Max-Planck-UOttawa Centre for Extreme and Quantum Photonics, a fellow of the NRC-UOttawa Joint Centre for Extreme Photonics, and a member of the uOttawa Centre for Research in Photonics.