Supplementary information for “Unraveling Electronic Absorption Spectra using Nuclear Quantum Effects: Photoactive Yellow Protein and Green Fluorescent Protein Chromophores in Water”

Tim J. Zuehlsdorff,1, a) Joseph A. Napoli,2 Joel M. Milanese,1 Thomas E. Markland,2, b) and Christine M. Isborn1, c)

1) Chemistry and Chemical Biology, University of California Merced, Merced, California 95343, USA
2) Department of Chemistry, Stanford University, Stanford, California 94305, USA

(Dated: 15 June 2018)

a)Electronic mail: tzuehlsdorff@ucmerced.edu b)Electronic mail: [email protected] c)Electronic mail: [email protected]

CONTENTS

I. Double-counting in the E-ZTFC approach
   A. Updated model systems
   B. A double-counting free approach: an average vibronic shape function applied to vertical excitation energies obtained from configurations where the chromophore geometry is optimized in a fixed solvent environment
   C. Application to the GFP chromophore anion
   D. Concluding remarks

II. Simulation cell sizes and QM regions

III. Convergence tests: QM region and MM box

IV. Convergence tests: AIMD and AI-PIMD sampling

V. Computing generalized moments of a distribution

VI. Quantifying non-Gaussian features in the ensemble absorption spectra

VII. Structural analysis
   A. Collective motion
   B. Nuclear quantum effects on specific bond lengths

VIII. Analysis of solvent effects

IX. Franck-Condon Shape Functions

X. Absorption spectra for the Ensemble and E-ZTFC approaches

References

I. DOUBLE-COUNTING IN THE E-ZTFC APPROACH

The E-ZTFC approach includes chromophore nuclear degrees of freedom both from sampling the chromophore motion in the ensemble of vertical excitation energies and from the ground state vibrational wave function used to construct the zero temperature Franck-Condon shape function. To characterize how this double counting of the chromophore nuclear motion affects the computed spectra, we introduced a simple model system in Ref. 1, consisting of an ensemble of identical displaced harmonic oscillators coupled to a uniform classical solvent environment described by the solvent reorganization energy. We found that when the ensemble spectrum was sampled from a Boltzmann distribution, the double-counting in the E-ZTFC approach led to a systematic overestimation of the spectral width compared to the exact solution of the model system, but that this overestimation was small in the limit of large solvent reorganization energies.

In the present work, the ensemble of vertical excitation energies is sampled from both AIMD and AI-PIMD trajectories, with the AI-PIMD ensemble corresponding to a quantum Wigner distribution. This quantum distribution includes nuclear quantum effects, including zero-point energy, allowing the system to sample more anharmonic regions of the potential energy surface. To test the validity of combining the E-ZTFC approach with an AI-PIMD ensemble that accounts for nuclear quantum effects on the ground state potential energy surface, we here again explore the effects of double counting the nuclear motion of the chromophore. We first extend our simple model system to include a Wigner distribution of the nuclear degree of freedom. Furthermore, to assess the behavior of the approach for anharmonic degrees of freedom, we extend the model system to that of an ensemble of identical displaced Morse oscillators. We later compare computed spectral results for a real system with and without double counting for a small subset of configurations.

A. Updated model systems

The model system of an ensemble of identical displaced harmonic oscillators coupled to a classical uniform solvent environment has been described in detail in Ref. 1. Here we summarize the main expressions for the E-ZTFC approach based on a classical and a quantum ensemble of configurations, as well as the exact solution to the model system based on the finite temperature Franck-Condon spectrum with Gaussian solvent broadening.

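The exact solution to the model system of the displaced harmonic oscillators can be written as

$$\sigma^{\mathrm{exact}}_{\mathrm{harmonic}}(\omega - \Delta_{eg}; T) \propto \frac{\exp\left[-D(2n+1)\right]}{\sqrt{4\pi\lambda k_B T}} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \frac{D^{i+j}}{i!\,j!}\,(n+1)^i n^j \exp\left[-\frac{\left(\omega - (i-j)\right)^2}{4\lambda k_B T}\right] \tag{1}$$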

where $n = \left(\exp\left[\frac{1}{k_B T}\right] - 1\right)^{-1}$, $\Delta_{eg}$ is the adiabatic energy gap between the two potential energy surfaces, and D is the Huang-Rhys factor, which, in the natural units of the model system, is given by $D = d^2/2$ (where d is the displacement between the two minima of the potential energy surfaces). The parameter λ specifies the solvent reorganization energy measured in units of $\hbar\omega_0$, where $\omega_0$ is the frequency of the harmonic oscillator.

The E-ZTFC absorption spectra for the model system, based on either a classical Boltzmann ensemble distribution or the fully quantum mechanical Wigner distribution, are given by:

$$\sigma^{\mathrm{E(cl)\text{-}ZTFC}}_{\mathrm{harmonic}}(\omega - \Delta_{eg}; T) \propto \frac{e^{-D}}{\sqrt{4\pi k_B T (D+\lambda)}} \sum_{i=0}^{\infty} \frac{D^i}{i!} \exp\left[-\frac{(\omega - i)^2}{4 k_B T (D+\lambda)}\right] \tag{2}$$

$$\sigma^{\mathrm{E(qm)\text{-}ZTFC}}_{\mathrm{harmonic}}(\omega - \Delta_{eg}; T) \propto \frac{e^{-D}}{\sqrt{2\pi\left(2\lambda k_B T + \dfrac{D}{\tanh\left[\frac{1}{2k_BT}\right]}\right)}} \sum_{i=0}^{\infty} \frac{D^i}{i!} \exp\left[-\frac{(\omega - i)^2}{2\left(2\lambda k_B T + \dfrac{D}{\tanh\left[\frac{1}{2k_BT}\right]}\right)}\right] \tag{3}$$
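The three harmonic-model expressions can be evaluated directly. The following is a minimal sketch, assuming natural units ($\hbar\omega_0 = 1$), energies measured relative to $\Delta_{eg}$, and truncation of the infinite sums at `nmax` terms; the function names are illustrative and not taken from the original work. Eqs. 2 and 3 share one functional form and differ only in the Gaussian variance, so a single routine `sigma_eztfc` serves both.

```python
import math

def bose(kT):
    """Thermal occupation n = 1 / (exp(1/kT) - 1), with kT in units of hbar*omega_0."""
    return 1.0 / math.expm1(1.0 / kT)

def gaussian(x, var):
    """Normalized Gaussian with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def sigma_exact(w, D, lam, kT, nmax=30):
    """Eq. 1: finite-temperature progression with solvent variance 2*lam*kT (lam > 0)."""
    n = bose(kT)
    total = 0.0
    for i in range(nmax):
        for j in range(nmax):
            weight = (D * (n + 1)) ** i / math.factorial(i)
            weight *= (D * n) ** j / math.factorial(j)
            total += weight * gaussian(w - (i - j), 2.0 * lam * kT)
    return math.exp(-D * (2 * n + 1)) * total

def var_classical(D, lam, kT):
    """Gaussian variance entering Eq. 2 (classical ensemble)."""
    return 2.0 * kT * (D + lam)

def var_quantum(D, lam, kT):
    """Gaussian variance entering Eq. 3 (quantum ensemble)."""
    return 2.0 * lam * kT + D / math.tanh(1.0 / (2.0 * kT))

def sigma_eztfc(w, D, var, nmax=30):
    """Shared form of Eqs. 2 and 3: Poisson progression broadened with variance var."""
    total = sum(D ** i / math.factorial(i) * gaussian(w - i, var) for i in range(nmax))
    return math.exp(-D) * total
```

Calling `sigma_eztfc(w, D, var_classical(D, lam, kT))` gives the classical-ensemble spectrum and `sigma_eztfc(w, D, var_quantum(D, lam, kT))` the quantum-ensemble one. With the normalized Gaussians used here, each spectrum integrates to approximately one, which is a convenient sanity check on the truncation.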

The main difference between the two expressions is a factor of $D/\tanh\left[\frac{1}{2k_BT}\right]$ for the quantum ensemble, which approaches D as T → 0. This factor guarantees that the quantum E-ZTFC spectrum retains finite broadening at low temperatures as a result of zero-point motion.

The harmonic oscillator model has closed expressions for the E-ZTFC spectra based on both quantum and classical ensembles, as well as the exact solution. However, it does not contain any anharmonicity, meaning that a harmonic finite-temperature Franck-Condon (FTFC) spectrum forms an exact solution. Realistic systems like the ones considered in this work possess a significant degree of anharmonicity. It is therefore interesting to use an anharmonic Morse oscillator model to determine whether the E-ZTFC approach based on a harmonic zero temperature vibronic shape function and a fully anharmonic ensemble spectrum provides a better approximation to the exact solution than a pure FTFC spectrum in the harmonic approximation. For this reason we consider a model system of an ensemble of identical displaced Morse oscillators coupled to the same classical solvent environment as the harmonic system.

The Hamiltonians $H_{\mathrm{GS}}$ and $H_{\mathrm{ES}}$ for the ground and excited state potential energy surfaces of the Morse oscillator are given by

$$H^{\mathrm{Morse}}_{\mathrm{GS}} = -\frac{1}{2}\frac{d^2}{dx^2} + A\left(1 - e^{-\alpha x}\right)^2 \tag{4}$$

$$H^{\mathrm{Morse}}_{\mathrm{ES}} = -\frac{1}{2}\frac{d^2}{dx^2} + A\left(1 - e^{-\alpha (x-d)}\right)^2 + \Delta_{eg} \tag{5}$$

For the purpose of this work, we fix the value of α such that $\alpha = \frac{1}{\sqrt{2A}}$, which guarantees that the harmonic part of the potential is equal to $\frac{1}{2}x^2$, in agreement with the harmonic oscillator model system. The parameter A controls the degree of anharmonicity in the model system, with A → ∞ corresponding to the harmonic oscillator. The vertical excitation energy of the Morse oscillator system for a given position x can then be written as:

$$\omega^{\mathrm{Morse}}_{\mathrm{vert}}(x) = \Delta_{eg} + A\left(1 - e^{-\alpha (x-d)}\right)^2 - A\left(1 - e^{-\alpha x}\right)^2 \tag{6}$$

The classical and quantum probability distribution functions of position x on the ground state potential energy surface can be written as

$$\rho^{\mathrm{Morse}}_{\mathrm{cl}}(x, T) \propto \exp\left[-\frac{A\left(1 - \exp\left[-\alpha x\right]\right)^2}{k_B T}\right] \tag{7}$$

$$\rho^{\mathrm{Morse}}_{\mathrm{qm}}(x, T) = \sum_{i}^{\infty} \exp\left[-\frac{E^{\mathrm{Morse}}_{\{\mathrm{GS}\},i}}{k_B T}\right] \left|\psi^{\mathrm{Morse}}_{\{\mathrm{GS}\},i}(x)\right|^2 \tag{8}$$

where $E^{\mathrm{Morse}}_{\{\mathrm{GS}\},i}$ and $\psi^{\mathrm{Morse}}_{\{\mathrm{GS}\},i}(x)$ are the i-th energy eigenvalue and wavefunction of the Hamiltonian $H^{\mathrm{Morse}}_{\mathrm{GS}}$. We can then construct the classical and quantum ensemble spectra of vertical excitations for the anharmonic model system.

For the anharmonic model system, we again consider the influence of the solvent environment in terms of a Gaussian broadening with $\sigma = \sqrt{2 k_B T \lambda}$, where λ is the solvent reorganization energy. Because the Morse potential chosen has a quadratic term that agrees with the potential of the harmonic oscillator, the ZTFC shape function in the harmonic approximation including solvent broadening can be written as

$$\sigma^{\mathrm{harmonic}}_{\mathrm{ZTFC}}(\omega - \Delta_{eg}; T \to 0) \propto e^{-D} \sum_{i=0}^{\infty} \frac{D^i}{i!} \exp\left[-\frac{(\omega - i)^2}{4 k_B T \lambda}\right] \tag{9}$$

The E-ZTFC approach based on the quantum and the classical Morse ensemble and the harmonic approximation for the vibronic shape function is then computed by convolving Eqn. 9 with the respective ensemble spectra of vertical excitations.
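The classical branch of this construction can be sketched numerically. The code below is a hedged illustration, not the authors' implementation: it samples Eq. 7 on a finite position grid (the range `x_min`..`x_max` is an assumption, needed because the Morse potential is bounded as x → ∞), evaluates Eq. 6 relative to $\Delta_{eg}$, and convolves the resulting distribution with the stick progression of Eq. 9. The ensemble is re-centred on its mean, an assumption chosen so that the A → ∞ limit reproduces Eq. 2. The quantum ensemble of Eq. 8 would additionally require the Morse eigenfunctions and is not covered here.

```python
import math

def morse(x, A):
    """Morse potential A*(1 - exp(-alpha*x))**2 with alpha = 1/sqrt(2A)."""
    alpha = 1.0 / math.sqrt(2.0 * A)
    return A * math.expm1(-alpha * x) ** 2

def omega_vert(x, A, d):
    """Eq. 6, measured relative to Delta_eg."""
    return morse(x - d, A) - morse(x, A)

def classical_ensemble(A, d, kT, x_min=-6.0, x_max=12.0, nx=300):
    """Vertical energies on a midpoint grid with normalized Boltzmann weights (Eq. 7)."""
    dx = (x_max - x_min) / nx
    xs = [x_min + (k + 0.5) * dx for k in range(nx)]
    weights = [math.exp(-morse(x, A) / kT) for x in xs]
    norm = sum(weights)
    return [omega_vert(x, A, d) for x in xs], [wt / norm for wt in weights]

def eztfc_classical(w_grid, A, d, kT, lam, imax=15, nx=300):
    """E(classical)-ZTFC spectrum on w_grid; requires lam > 0."""
    D = 0.5 * d * d                       # Huang-Rhys factor, D = d^2 / 2
    var = 2.0 * kT * lam                  # solvent broadening, sigma^2 = 2 kT lam
    pref = 1.0 / math.sqrt(2.0 * math.pi * var)
    energies, probs = classical_ensemble(A, d, kT, nx=nx)
    mean = sum(p * e for p, e in zip(probs, energies))
    sticks = [math.exp(-D) * D ** i / math.factorial(i) for i in range(imax)]
    spectrum = []
    for w in w_grid:
        total = 0.0
        for i, stick in enumerate(sticks):
            for p, e in zip(probs, energies):
                u = w - i - (e - mean)    # shift stick i by the centred vertical energy
                total += stick * p * math.exp(-u * u / (2.0 * var))
        spectrum.append(pref * total)
    return spectrum
```

For example, `eztfc_classical([0.2 * k - 8.0 for k in range(121)], A=40.0, d=2.0, kT=0.175, lam=1.0)` evaluates the moderately anharmonic model on a grid from -8 to 16 in units of $\hbar\omega_0$.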

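The exact solution for the model anharmonic system is given by

$$\sigma^{\mathrm{Morse}}_{\mathrm{exact}}(\omega - \Delta_{eg}; T) \propto \sum_{i} \exp\left[-\frac{E^{\mathrm{Morse}}_{\{\mathrm{GS}\},i}}{k_B T}\right] \sum_{j} \left|\left\langle \psi^{\mathrm{Morse}}_{\{\mathrm{GS}\},i} \middle| \psi^{\mathrm{Morse}}_{\{\mathrm{ES}\},j} \right\rangle\right|^2 \exp\left[-\frac{\left(\omega - E^{\mathrm{Morse}}_{\{\mathrm{ES}\},j} + E^{\mathrm{Morse}}_{\{\mathrm{GS}\},i}\right)^2}{4 k_B T \lambda}\right] \tag{10}$$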

where the sums run over all bound states of the ground and excited state Morse oscillator. Unlike for the harmonic oscillator, Eqn. 10 cannot be simplified to a closed expression. In this work we evaluate the Franck-Condon integrals between the vibrational wavefunctions of the displaced Morse oscillator numerically, considering a total of 30 eigenstates on each potential energy surface.

Simulated E-ZTFC spectra based on a quantum and a classical ensemble distribution in comparison with the exact solution for a harmonic oscillator and two Morse oscillator systems with varying degrees of anharmonicity can be found in Fig. 1. The first column corresponds to a harmonic system with $k_BT/\hbar\omega_0 = 0.175$, which approximately corresponds to the frequency of a C-C single bond oscillation at room temperature. The second column contains results for a Morse oscillator at the same temperature and a moderate anharmonicity of A = 40. The last column contains results for a Morse oscillator with $k_BT/\hbar\omega_0 = 0.3$ and a stronger anharmonicity of A = 20, thus modelling a more anharmonic vibrational mode with a frequency of approximately 700 cm$^{-1}$ at room temperature. The first row shows results in vacuum for all model systems (i.e. λ = 0), whereas the second and third rows model weak and strong solvent fluctuations, respectively. The second row corresponds to a situation where the effective solvent broadening is a factor of $\sqrt{2}$ smaller than the width of the classical ensemble spectrum of nuclear degrees of freedom for the harmonic model system, and in the third row the solvent broadening is a factor of $\sqrt{2}$ larger than the width of the classical ensemble spectrum. For the Morse oscillator systems, we also plot the finite temperature Franck-Condon spectrum in the harmonic approximation.

For vacuum, the E-ZTFC approach yields smooth spectra without any vibronic fine structure, in contrast to the exact results, which are given by an array of δ-functions. Vacuum presents the most extreme example of the effect of double-counting in the E-ZTFC approach, as we discussed in the context of the harmonic model system in Ref. 1. The E-ZTFC spectrum based on the quantum ensemble is wider than that from the classical ensemble, which can be attributed to the double-counting of zero-point motion. Because of the full double counting of all degrees of freedom of the system, the E-ZTFC approach should not be applied to systems in vacuum, as in this limit it simply adds spurious broadening to the spectrum.

[Figure 1: grid of panels. Columns: Harmonic, $k_BT/\hbar\omega = 0.175$; Morse, $k_BT/\hbar\omega = 0.175$, A = 40; Morse, $k_BT/\hbar\omega = 0.3$, A = 20. Rows: Vacuum; λ = D/2; λ = 2D. Curves: E(Classical)-ZTFC, E(Quantum)-ZTFC and Exact, plus FTFC(Harmonic) in the solvated panels.]

FIG. 1. Absorption spectra in the E-ZTFC approach based on a quantum or a classical probability distribution as computed for the harmonic and the Morse oscillator model systems using a variety of different parameters. Energy is measured in units of $\hbar\omega$, with ω being the harmonic frequency of the oscillator. A Huang-Rhys parameter D = 2 is used for all model systems, corresponding to a displacement between the minima of the two wells of d = 2 in units of $\sqrt{\hbar/m\omega}$.

After adding solvent broadening effects, the exact solution produces smooth spectra free of vibronic fine structure. For all systems the quantum E-ZTFC approach is systematically broader than the classical E-ZTFC approach. For the harmonic system the exact results are somewhat narrower than the classical E-ZTFC approach, and the increased double counting in the quantum E-ZTFC approach yields even stronger overestimation of the spectral width. However, the situation changes when going to the anharmonic model systems. Here the classical E-ZTFC approach underestimates the high energy anharmonic tail of the exact spectrum due to insufficient sampling of the anharmonic region of the potential energy surface. The quantum E-ZTFC approach overestimates the spectral width, but also captures more of the anharmonicity, as seen by the improved slope and tail at higher energies. The finite temperature Franck-Condon (FTFC) approach within the harmonic approximation systematically underestimates the spectral width in comparison with the exact results and misses the high energy tail. In fact, the performance of the double-counting free harmonic FTFC approach for the more anharmonic system is worse than that of the quantum E-ZTFC approach. Thus, in solvated systems with strong solute-solvent interactions that show a significant degree of anharmonicity, it is expected that a pure FTFC approach based on the harmonic approximation and solvent broadening derived from the solvent reorganization energy will significantly underestimate the spectral width. Although the quantum E-ZTFC approach suffers from double counting of nuclear degrees of freedom of the solute and will therefore overestimate the spectral width, it has the advantage of partially capturing anharmonic effects and specific solute-solvent interactions, potentially producing better spectral shapes.

These model systems treat the solvent broadening as a simple Gaussian broadening based on the solvent reorganization energy. This approximation is likely valid in situations where the solvent environment is well described by an implicit solvent model, i.e. in situations where there are no specific solute-solvent interactions such as strong hydrogen bonding and no solvent-induced changes to the motion of the chromophore. However, in the systems studied in this work, the solute-solvent interaction is relatively strong and the approximation that solvent broadening is fully independent from the motion of the chromophore might break down. To study this effect, we attempt to construct an effective model system where the full ensemble spectrum of the chromophore is approximated through three independent components: solvent broadening arising purely from solvent fluctuations with no influence on the solute degrees of freedom, which is described by λ; broadening due to the temperature-dependent vibrations of the chromophore; and additional broadening due to changes to the equilibrium structure of the solute due to fluctuations in the solvent environment resulting from strong specific solute-solvent interactions. We note that only the second of these three contributions is double-counted in the E-ZTFC approach. The net effect of this dependence between solute and solvent nuclear degrees of freedom is that the ‘exact’ solutions to the model systems (Eqn. 1 and Eqn. 10) should include additional broadening that goes beyond the purely electrostatic term due to the solvent reorganization energy λ. We will expand on this point in the next section, showing how this extra broadening term originates from applying a separation of timescales between solute and solvent degrees of freedom, treating the solvent environment as static and the Franck-Condon transition of the chromophore as instantaneous.

To include this additional solvent-induced broadening, we make one further change to the model systems. Rather than considering an ensemble of identical displaced harmonic/Morse oscillators coupled to a uniform classical solvent environment, we consider an ensemble of oscillators with identical shape and displacement, but with a range of different adiabatic gaps $\Delta_{eg}$, still coupled to the uniform solvent environment. The uniform solvent environment described by λ accounts for the purely electrostatic broadening, whereas the value for $\Delta_{eg}$ accounts for direct solute-solvent interactions, such as solvent-induced changes to the chromophore geometry. We will rigorously justify this choice in the next section, where we derive a double-counting free approximation to the lineshape of a chromophore in solution that is closely related to this updated model system.

In our updated model system with a range of adiabatic energy gaps, the exact solutions of Eqn. 1 and Eqn. 10 can then be written as

$$\sigma^{\mathrm{exact},\delta}_{\mathrm{harmonic}}(\omega - \Delta^{\mathrm{av}}_{eg}; \delta; T) \propto \int d\Gamma\, \rho_\Delta(\Gamma, \delta, T)\, \sigma^{\mathrm{exact}}_{\mathrm{harmonic}}(\omega - \Delta_{eg}(\Gamma); T) \tag{11}$$

$$\sigma^{\mathrm{exact},\delta}_{\mathrm{Morse}}(\omega - \Delta^{\mathrm{av}}_{eg}; \delta; T) \propto \int d\Gamma\, \rho_\Delta(\Gamma, \delta, T)\, \sigma^{\mathrm{exact}}_{\mathrm{Morse}}(\omega - \Delta_{eg}(\Gamma); T) \tag{12}$$

where $\Delta_{eg}(\Gamma) = \Delta^{\mathrm{av}}_{eg} + \Gamma$, $\Delta^{\mathrm{av}}_{eg} = \int d\Gamma\, \rho_\Delta(\Gamma, \delta, T)\, \Delta_{eg}(\Gamma)$ and $\rho_\Delta(\Gamma, \delta, T)$ is the probability distribution function of the adiabatic energy gap, which we assume to be given by the following expression for both the harmonic and the Morse oscillator:

$$\rho_\Delta(\Gamma, \delta, T) = \frac{1}{\delta\sqrt{2\pi \dfrac{D}{\tanh\left[\frac{1}{2k_BT}\right]}}} \exp\left(-\frac{\Gamma^2}{\dfrac{2\delta^2 D}{\tanh\left[\frac{1}{2k_BT}\right]}}\right) \tag{13}$$

Here we assume that the spread in the adiabatic energy gap is a Gaussian whose standard deviation is that of the quantum ensemble spectrum in vacuum of the corresponding harmonic oscillator, scaled by a factor of δ, where 0 ≤ δ ≤ 1. Setting δ = 0, the exact solutions of the updated model systems reduce to the solutions of Eqn. 1 and Eqn. 10, where no additional solvent broadening beyond the purely electrostatic term λ is considered. A choice of δ = 1 corresponds to the electrostatic solvent broadening and the solvent-induced broadening due to changes in the chromophore motion leading to the same broadening as the harmonic quantum ensemble spectrum in solution. In this limit, all internal motion of the chromophore is dominated by solvent-induced changes to the chromophore geometry, whereas temperature fluctuations are negligible. Thus, the parameter δ smoothly interpolates between a model system where the solute motion is fully independent from the solvent fluctuations and a model system where all solute motion is induced by the fluctuating solvent environment.

[Figure 2: three panels. Harmonic, $k_BT/\hbar\omega = 0.175$; Morse, $k_BT/\hbar\omega = 0.175$, A = 40; Morse, $k_BT/\hbar\omega = 0.3$, A = 20. Curves: FTFC(harmonic), E(Classical)-ZTFC, E(Quantum)-ZTFC and Exact.]

FIG. 2. Absorption spectra in the E-ZTFC approach based on a quantum or a classical probability distribution as computed for the harmonic and the Morse oscillator model systems using a variety of different parameters. Energy is measured in units of $\hbar\omega$, with ω being the harmonic frequency of the oscillator. A Huang-Rhys parameter D = 2 is used for all model systems, the solvent reorganization energy is λ = 1D, and the additional broadening parameter δ for the exact results is set to 0.8.
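For the harmonic model, the average over the adiabatic gap in Eq. 11 can be checked numerically. The sketch below is an illustration under stated assumptions: natural units, energies measured relative to $\Delta^{\mathrm{av}}_{eg}$, sums truncated at `nmax`, and illustrative function names. It evaluates Eq. 11 by midpoint quadrature over Γ and compares against a closed form that follows from a general property rather than from the original text: every term of Eq. 1 is a Gaussian, so convolving with the Gaussian of Eq. 13 simply adds the two variances.

```python
import math

def gap_variance(delta, D, kT):
    """Variance of Eq. 13: delta^2 * D / tanh(1 / (2 kT))."""
    return delta ** 2 * D / math.tanh(1.0 / (2.0 * kT))

def rho_delta(gamma, delta, D, kT):
    """Eq. 13: Gaussian distribution of the adiabatic gap shift (delta > 0)."""
    var = gap_variance(delta, D, kT)
    return math.exp(-gamma ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def sigma_exact(w, D, kT, var, nmax=20):
    """Eq. 1 written with a general Gaussian variance var (Eq. 1 has var = 2*lam*kT)."""
    n = 1.0 / math.expm1(1.0 / kT)
    total = 0.0
    for i in range(nmax):
        for j in range(nmax):
            weight = (D * (n + 1)) ** i / math.factorial(i)
            weight *= (D * n) ** j / math.factorial(j)
            total += weight * math.exp(-(w - (i - j)) ** 2 / (2.0 * var))
    return math.exp(-D * (2 * n + 1)) * total / math.sqrt(2.0 * math.pi * var)

def sigma_exact_delta_quadrature(w, D, lam, kT, delta, n_gamma=200):
    """Eq. 11 by midpoint quadrature over Gamma in [-6 sigma, +6 sigma]."""
    half = 6.0 * math.sqrt(gap_variance(delta, D, kT))
    dg = 2.0 * half / n_gamma
    total = 0.0
    for k in range(n_gamma):
        g = -half + (k + 0.5) * dg
        total += rho_delta(g, delta, D, kT) * sigma_exact(w - g, D, kT, 2.0 * lam * kT)
    return total * dg

def sigma_exact_delta_closed(w, D, lam, kT, delta):
    """Same quantity with the Gamma integral done analytically (variances add)."""
    return sigma_exact(w, D, kT, 2.0 * lam * kT + gap_variance(delta, D, kT))
```

The closed form also covers δ = 0, where it reduces to Eq. 1; the quadrature routine requires δ > 0. No such shortcut is available for the Morse case of Eq. 12, where the integral would have to be carried out numerically on top of Eq. 10.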

Note that we apply the additional broadening due to fluctuations in $\Delta_{eg}$ only to the exact solutions of the model systems and not the E-ZTFC spectra. The reason for this is that the E-ZTFC approach is based on computing a single average vibronic shape function for the entire system, and the broadening due to both solute and solvent motion is accounted for by a convolution with the full solvent-broadened ensemble spectrum of the system that encompasses the effects of both δ and λ.

We apply our updated model system to the same three oscillators as in Fig. 1, i.e. the harmonic oscillator, a moderately anharmonic oscillator at high frequency, and a more strongly anharmonic oscillator at medium frequency. We set the purely electrostatic solvent broadening to λ = 1D, corresponding to the broadening due to solute motion and due to the solvent environment being of the same magnitude in the classical ensemble spectrum of the harmonic oscillator. This choice of λ = 1D agrees with the estimated solvent broadening obtained for the GFP chromophore anion in water in the main manuscript. We furthermore fix the value of δ as δ = 0.8. The choice of this value will be justified more rigorously in the next sections, by comparing to a closely related, fully double-counting free approach applied to the GFP chromophore anion.

The results for the updated model systems are given in Fig. 2, showing the exact solutions of Eqns. 11 and 12, the E-ZTFC spectra based on a quantum and a classical ensemble distribution, as well as the finite temperature Franck-Condon (FTFC) spectrum in the harmonic approximation using the solvent broadening λ only. The FTFC spectrum thus models a situation where the harmonic approximation is applied to the ground and excited state potential energy surface and where the solvent broadening originates only from the solvent reorganization energy, ignoring any direct solute-solvent interactions and solvent-induced changes to the chromophore geometry.

The results show that for all three model systems, the E-ZTFC approach based on a classical ensemble slightly underestimates the spectral width, whereas the E-ZTFC approach based on a quantum ensemble slightly overestimates the spectral width. For the anharmonic Morse oscillators, the classical ensemble produces E-ZTFC spectra that do not fully capture the long high energy tail of the exact spectrum, which is correctly recovered by the quantum ensemble. Thus, in the anharmonic systems, the quantum ensemble broadening allows the E-ZTFC approach to approximately capture the correct anharmonic high energy behavior.

The FTFC approach in the harmonic approximation with classical solvent broadening systematically underestimates the spectral width in all systems and the disagreement increases with increasing anharmonicity. For the more strongly anharmonic Morse oscillator in the mid-frequency range, the FTFC approach is missing spectral weight at high energy, yielding a spectral lineshape that qualitatively disagrees with the exact solution.

The results presented in Fig. 2 provide an explanation for why the E-ZTFC approach using AI-PIMD ensemble sampling can be expected to generate more accurate results than the E-ZTFC approach based on an AIMD ensemble of configurations. We might expect improved results for E-ZTFC sampling from an AI-PIMD ensemble in cases where strong solute-solvent interactions are expected to occur and where the ensemble spectra show significant signatures of anharmonicity. Even though the AI-PIMD sampling leads to an increased double-counting of the nuclear degrees of freedom of the chromophore in high frequency modes due to inclusion of the zero point motion both in the ensemble sampling and the harmonic vibronic shape function, the ensemble sampling partially captures the effect of anharmonicities not included in the harmonic ZTFC spectrum.

The updated model system accounts for additional solvent broadening beyond the classical solvent environment by considering an ensemble of displaced harmonic oscillators with a spread in adiabatic energy gaps $\Delta_{eg}$. This choice has not been rigorously justified. To do so, the next section will derive an approach for computing the lineshape of a chromophore in solution that is free of any double counting of nuclear degrees of freedom.

B. A double-counting free approach: an average vibronic shape function applied to vertical excitation energies obtained from configurations where the chromophore geometry is optimized in a fixed solvent environment

We next consider a single chromophore in solvent within the Born-Oppenheimer approximation, with an electronic ground and excited state wavefunction given by $|\Psi_{\mathrm{GS}}\rangle$ and $|\Psi_{\mathrm{ES}}\rangle$, respectively. The nuclear wavefunction of a given vibrational mode $v_i$ on the ground state potential energy surface is denoted as $\left|\Phi^{\{\mathrm{GS}\}}_{v_i}\right\rangle$ and the nuclear wavefunction of a vibrational mode $v_f$ on the excited state potential energy surface is given by $\left|\Phi^{\{\mathrm{ES}\}}_{v_f}\right\rangle$. Following Fermi’s golden rule, the absorption cross section α of a transition between the ground and excited state potential energy surface can be written as

$$\alpha(\omega; T) \propto \sum_{v_i} \sum_{v_f} \rho(v_i; T) \left|\left\langle \Phi^{\{\mathrm{GS}\}}_{v_i} \right| \mu_{\mathrm{elec}} \left| \Phi^{\{\mathrm{ES}\}}_{v_f} \right\rangle\right|^2 \delta\left(\omega - E^{\{\mathrm{ES}\}}_{v_f} + E^{\{\mathrm{GS}\}}_{v_i} - \Delta\right) \tag{14}$$

where the electronic transition dipole $\mu_{\mathrm{elec}} = \langle\Psi_{\mathrm{GS}}|\, \hat{\mu}\, |\Psi_{\mathrm{ES}}\rangle$, $E^{\{\mathrm{ES}\}}_{v_f}$ and $E^{\{\mathrm{GS}\}}_{v_i}$ are the vibrational energies associated with the final and initial states $v_f$ and $v_i$ respectively, and Δ is the adiabatic energy gap between the two potential energy surfaces. The symbol $\rho(v_i; T)$ denotes the probability that a given vibrational mode $v_i$ on the ground state potential energy surface is initially occupied at temperature T and is given by the Boltzmann factor.

Making the approximation that the nuclear degrees of freedom of the solvent are much slower than those of the chromophore, the absorption cross section can be written as a conformational integral over solvent configurations and a double sum over chromophore vibrational wavefunctions. Similar separation of timescale arguments have been previously applied in Refs. 2 and 3 to separate the slow, anharmonic degrees of freedom of a chromophore that can be efficiently sampled with an ensemble approach from the fast vibrational motions that can be accounted for fully quantum mechanically in the Franck-Condon picture. Denoting the collective nuclear positions of the system as $R = \{R_{\mathrm{solv}}, R_{\mathrm{dye}}\}$ and applying the Condon approximation for the nuclear degrees of freedom of the dye, we can write

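$$\alpha(\omega; T) \propto \int dR_{\mathrm{solv}}\, \rho(R_{\mathrm{solv}}; T)\, \alpha_{\mathrm{FTFC}}[R_{\mathrm{solv}}](\omega; T) \tag{15}$$

with $\alpha_{\mathrm{FTFC}}[R_{\mathrm{solv}}]$, the finite temperature Franck-Condon spectrum of the chromophore for a given nuclear configuration $R_{\mathrm{solv}}$, given by

$$\alpha_{\mathrm{FTFC}}[R_{\mathrm{solv}}](\omega; T) = \left|\mu[R_{\mathrm{solv}}]\right|^2 \sum_{v_i} \sum_{v_f} \rho(v_i; T) \left|\left\langle \Phi^{\{\mathrm{GS}\}}_{v_i}[R_{\mathrm{solv}}] \middle| \Phi^{\{\mathrm{ES}\}}_{v_f}[R_{\mathrm{solv}}] \right\rangle\right|^2 \delta\left(\omega - E^{\{\mathrm{ES}\}}_{v_f}[R_{\mathrm{solv}}] + E^{\{\mathrm{GS}\}}_{v_i}[R_{\mathrm{solv}}] - \Delta[R_{\mathrm{solv}}]\right) \tag{16}$$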

Here, $\rho(R_{\mathrm{solv}}; T)$ denotes a quantum probability distribution for the solvent degrees of freedom at temperature T and the quantities $E^{\{\mathrm{ES}\}}_{v_f}$, $E^{\{\mathrm{GS}\}}_{v_i}$, $\Phi^{\{\mathrm{ES}\}}_{v_f}$ and $\Phi^{\{\mathrm{GS}\}}_{v_i}$ are functions of the nuclear degrees of freedom of the dye only, but parametrically depend on $R_{\mathrm{solv}}$. Similarly, the electronic dipole moment in the Condon approximation is taken to be independent of the nuclear motion of the dye, but still retains its parametric dependence on the solvent nuclear degrees of freedom. In practice, $\alpha_{\mathrm{FTFC}}$ is computed by making the harmonic approximation to the ground and excited state potential energy surface for the chromophore nuclear degrees of freedom, such that the ground and excited state vibrational modes can be computed from the ground and excited state Hessians around $R^{\{\mathrm{GS},0\}}_{\mathrm{dye}}$ and $R^{\{\mathrm{ES},0\}}_{\mathrm{dye}}$ respectively, where $R^{\{\mathrm{GS},0\}}_{\mathrm{dye}}$ denotes the position of the minimum energy configuration of the ground state potential energy surface.

Under the approximation that the solvent nuclear degrees of freedom are slow compared to the degrees of freedom of the chromophore and the timescale of an optical excitation, Eqn. 15 can then be evaluated by extracting $N_{\mathrm{frames}}$ uncorrelated configurations $\{R^i\}$ from a molecular dynamics simulation of the solvated system. For each configuration $R^i$, a finite-temperature Franck-Condon spectrum is computed by optimizing the positions of the chromophore in the ground and excited state in the frozen solvent environment $R^i_{\mathrm{solv}}$. The final spectrum is then given by

$$\alpha(\omega; T) \propto \frac{1}{N_{\mathrm{frames}}} \sum_{i}^{N_{\mathrm{frames}}} \alpha_{\mathrm{FTFC}}\left[R^i_{\mathrm{solv}}\right](\omega; T) \tag{17}$$

The above expression can be interpreted both in terms of a separation of timescales and a separation into quantum and classical nuclear degrees of freedom during the excitation. The assumption is that the ensemble sampling used to extract uncorrelated configurations $\{R^i\}$ describes the static disorder of the entire system and that the configurations are representative of the different local environments the chromophore experiences in the solution. The full absorption spectrum of the system is then given by assuming that the Franck-Condon excitation occurs instantaneously and that the nuclear wavefunctions of the solvent do not couple to the excitation. The computed Franck-Condon spectrum represents the response of the nuclear wavefunction of the chromophore embedded in a static local environment.

Eqn. 17 is free of any double-counting of nuclear degrees of freedom and thus does not suffer from the same potential limitations as the E-ZTFC approach. However, the formulation comes at a significant increase of computational cost, as it requires a full frequency calculation of the chromophore in the ground and excited state for each frozen solvent configuration $R^i_{\mathrm{solv}}$. To reduce the computational cost, we apply one further approximation to Eqn. 17 to arrive at an expression that is closely related to the E-ZTFC approach but rigorously double-counting free. We assume that the shapes of the ground and excited state potential energy surfaces of the chromophore are approximately independent of the frozen solvent nuclear degrees of freedom. Thus, the only dependence on solvent degrees of freedom retained in the expression for $\alpha_{\mathrm{FTFC}}$ in Eqn. 16 is in the transition dipole moment and the adiabatic energy gap Δ. Eqn. 17 then reduces to

$$\alpha(\omega; T) \propto \frac{1}{N_{\mathrm{frames}}} \sum_{i}^{N_{\mathrm{frames}}} \left|\mu\left[R^i_{\mathrm{solv}}\right]\right|^2 \sum_{v_i} \sum_{v_f} \rho(v_i; T) \left|\left\langle \Phi^{\{\mathrm{GS}\}}_{v_i} \middle| \Phi^{\{\mathrm{ES}\}}_{v_f} \right\rangle\right|^2 \delta\left(\omega - E^{\{\mathrm{ES}\}}_{v_f} + E^{\{\mathrm{GS}\}}_{v_i} - \Delta\left[R^i_{\mathrm{solv}}\right]\right) \tag{18}$$

$$\propto \frac{1}{N_{\mathrm{frames}}} \sum_{i}^{N_{\mathrm{frames}}} f\left[R^i_{\mathrm{solv}}, R^{i,\{\mathrm{GS},0\}}_{\mathrm{dye}}\right] \alpha^{\mathrm{av}}_{\mathrm{FTFC}}\left(\omega - \omega_{\mathrm{vert}}\left[R^i_{\mathrm{solv}}, R^{i,\{\mathrm{GS},0\}}_{\mathrm{dye}}\right] + \omega^{\mathrm{av}}_{\mathrm{vert}}; T\right) \tag{19}$$

where f is the oscillator strength of the transition, $\alpha^{\mathrm{av}}_{\mathrm{FTFC}}$ is some average representative finite temperature Franck-Condon shape function, $\omega_{\mathrm{vert}}\left[R^i_{\mathrm{solv}}, R^{i,\{\mathrm{GS},0\}}_{\mathrm{dye}}\right]$ is the vertical excitation energy for snapshot i as computed at the ground state optimized geometry of the dye within the frozen solvent environment $R^i_{\mathrm{solv}}$, and $\omega^{\mathrm{av}}_{\mathrm{vert}}$ is the same quantity but averaged over all $N_{\mathrm{frames}}$.
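The structure of Eq. 19 is a sum of shifted, strength-weighted copies of one shape function. The following minimal sketch makes this explicit; it is an illustration only, and the Gaussian `shape_fn_placeholder`, its centre `w0` and its width are hypothetical stand-ins for $\alpha^{\mathrm{av}}_{\mathrm{FTFC}}$, which in practice would come from a Franck-Condon calculation. The shape function is assumed to be defined on the absolute energy axis, positioned relative to the average vertical excitation energy.

```python
import math

def shape_fn_placeholder(w, w0=3.5, sigma=0.05):
    """Hypothetical stand-in for alpha_FTFC^av: one normalized Gaussian centred at w0 (eV)."""
    return math.exp(-(w - w0) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def spectrum_eq19(w, w_vert, f_osc, shape_fn):
    """Eq. 19: average over snapshots of f_i * shape_fn(w - w_vert_i + w_vert_av).

    w_vert : vertical excitation energies, one per snapshot (dye optimized in frozen solvent)
    f_osc  : oscillator strengths, one per snapshot
    """
    n_frames = len(w_vert)
    w_av = sum(w_vert) / n_frames
    total = sum(f * shape_fn(w - wv + w_av) for f, wv in zip(f_osc, w_vert))
    return total / n_frames
```

A snapshot whose vertical excitation energy lies above the ensemble average shifts its copy of the shape function to higher energy by exactly that difference, so the spread of the snapshot energies acts as the solvent-induced broadening.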

Eqn. 19 reduces to a convolution of a single average finite-temperature Franck-Condon shape function with an ensemble spectrum of vertical excitation energies computed for the chromophore at its ground state optimized geometry within a frozen solvent pocket. It is thus very similar to the E-ZTFC expression, with the difference that all temperature effects of the chromophore are now accounted for fully quantum mechanically through the FTFC shape function and the computed spectrum is rigorously free from any double counting of nuclear degrees of freedom. Compared with the E-ZTFC approach, Eqn. 19 still comes at a significant increase in computational cost, as it requires a full ground state geometry optimization of the chromophore for each ensemble snapshot. Furthermore, since the FTFC shape function is in practice computed in the harmonic approximation, the double-counting free approach accounts for anharmonicities in the sampling of solvent configurations but not in the nuclear degrees of freedom of the chromophore.

Interestingly, Eqn. 19 is also similar to the exact solution of our model systems based on a range of possible values for the adiabatic energy $\Delta_{eg}$ (Eqn. 12). The main difference is that in our simplified model system we considered two sources of solvent broadening that were Gaussian in nature, one in the Gaussian variation of $\Delta_{eg}$ and one in the pure electrostatic solvent broadening λ. For Eqn. 19 these two effects are combined in the single ensemble spectrum of vertical absorption energies computed for the chromophore optimized in its frozen solvent pocket. Another difference is that for the expression in Eqn. 19, the harmonic approximation is applied to the FTFC shape function, whereas in our model system, the vibronic fine structure of the chromophore fully takes into account the anharmonicity of the potential energy surface. To further illustrate how Eqn. 19 relates to the model systems considered in the previous section, and how it justifies introducing the extra solvent broadening term in the exact solution of the model system, we next evaluate Eqn. 19 for a real system.

[Figure 3: plot of Strength (arb. units) against Energy (eV), 3 to 4 eV. Curves: AIMD (unopt), AIMD (opt), AI-PIMD (unopt), AI-PIMD (opt).]

FIG. 3. Computed ensemble spectra for the 100 AIMD and AI-PIMD snapshots, both with the chromophore geometries optimized in the frozen solvent pocket (opt) and without any geometry optimization (unopt). A Gaussian broadening of σ = 0.0315 eV is applied to all computed vertical transitions.

C. Application to the GFP chromophore anion

We test the influence of the double-counting in the E-ZTFC approach on the computed spectrum by comparing it to the spectrum obtained for the double-counting free approach of Eqn. 19 for the GFP chromophore anion in water. Given the high computational cost associated with geometry optimizations of a chromophore in a large frozen solvent environment treated at the QM level, we limit ourselves to 100 snapshots each for AIMD and AI-PIMD (ten snapshots from each of the ten independent trajectories) of the GFP chromophore anion in water and use the smaller 6-31G basis set for all results in this section. Using the same QM region specified in the main manuscript, we compute the ensemble spectra of vertical excitation energies both for the unoptimized dye coordinates and for the dye coordinates optimized in their frozen solvent pocket. The same QM region is used for the computation of vertical excitation energies and the ground state geometry optimization of the chromophore (corresponding to an average of ≈ 600 atoms in the QM region). The ensemble spectra are shown in Fig. 3. The geometry optimization of the chromophore blueshifts the ensemble spectra, with a larger blueshift observed for the AI-PIMD trajectory. For a harmonic distribution of configurations, the geometry optimization would not lead to any spectral shift, only a narrowing of the spectra. Thus, the blueshift is a signature that the trajectories were sampling anharmonic regions of the chromophore’s potential energy surface; as the nuclear configurations of the chromophore are moved from anharmonic regions of the potential energy surface to harmonic regions through the geometry optimization, the vertical excitation energy increases. For the AI-PIMD spectra, the optimization partially removes a long tail at low energies and reduces the tail at high energies, whereas in the AIMD trajectory the change in shape is less pronounced.

However, what is perhaps surprising is that the full width at half maximum (FWHM) for both AIMD and AI-PIMD trajectories does not reduce significantly upon geometry optimization, particularly for the AI-PIMD configurations (FWHM for AIMD: 0.271 eV (unopt), 0.223 eV (opt); FWHM for AI-PIMD: 0.273 eV (unopt), 0.271 eV (opt)). In the main manuscript we estimate the broadening due to pure solvent fluctuations by comparing the ensemble spectra of the solvated chromophore with the spectra where the solvent environment is stripped away. Assuming a Gaussian shape for both the solvent broadening and the broadening due to internal motion of the chromophore, it is found that the broadening purely due to solvent fluctuations is of the same order of magnitude as the broadening due to the internal motion of the chromophore. Therefore, if the chromophore degrees of freedom and the solvent fluctuations were fully independent in the GFP chromophore anion, as is implicitly assumed in the model system calculations in the previous section where the solvent broadening is purely accounted for by the solvent reorganization energy λ, one would expect a significant reduction of the width of the ensemble spectra upon geometry optimization of the chromophore. The agreement in FWHM for the spectra computed with AI-PIMD snapshots and AI-PIMD geometry optimized configurations suggests that a significant contribution to the spectral broadening of the main peak is due to specific solute-solvent interactions, which would presumably be treated more accurately in the AI-PIMD trajectories.
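As an illustration of how an FWHM value of this kind can be extracted from a Gaussian-broadened ensemble of vertical excitation energies, the following Python sketch may be helpful. It is a minimal example under stated assumptions, not the procedure used to obtain the numbers quoted above: the function names are hypothetical, the width is taken between the outermost half-maximum crossings (which can overestimate the width of the main peak if a separate shoulder exceeds half maximum), and the input is a single made-up transition.

```python
import math


def broadened_spectrum(grid, energies, strengths, sigma):
    """Sum of Gaussians of width sigma centred on each vertical excitation energy."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(f * norm * math.exp(-0.5 * ((x - e) / sigma) ** 2)
            for e, f in zip(energies, strengths))
        for x in grid
    ]


def fwhm(grid, spectrum):
    """Width between the outermost half-maximum crossings (linear interpolation)."""
    half = 0.5 * max(spectrum)
    above = [i for i, s in enumerate(spectrum) if s >= half]
    lo, hi = above[0], above[-1]

    def crossing(i_out, i_in):
        # Interpolate between a point below half maximum (i_out) and one at or above it (i_in).
        x0, x1 = grid[i_out], grid[i_in]
        y0, y1 = spectrum[i_out], spectrum[i_in]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    left = crossing(lo - 1, lo) if lo > 0 else grid[lo]
    right = crossing(hi + 1, hi) if hi < len(grid) - 1 else grid[hi]
    return right - left


# Check against the analytic Gaussian result, FWHM = 2 sqrt(2 ln 2) sigma.
grid = [3.0 + 0.001 * i for i in range(1001)]
spec = broadened_spectrum(grid, [3.5], [1.0], 0.1)
print(fwhm(grid, spec), 2.0 * math.sqrt(2.0 * math.log(2.0)) * 0.1)
```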

We next estimate the amount of spectral broadening incurred from specific chromophore-solvent interactions. We approximate the full width of the ensemble spectrum of the solvated chromophore as being due to three independent contributions: broadening due to solvent fluctuations described by the reorganization energy λ, broadening due to changes in the ground state geometry of the chromophore due to a given solvent configuration, and broadening due to the temperature-dependent fluctuations of the chromophore around its ground state structure for a given slowly-evolving solvent environment. Note that only the last one of these contributions is double-counted in the E-ZTFC approach, whereas the second term is due to specific chromophore-solvent interactions. We have already estimated the magnitude of the broadening term λ purely due to solvent fluctuations. Assuming all broadening contributions are approximately Gaussian in nature, we can estimate the size of the other two contributions by comparing the width of the ensemble spectra obtained from the chromophore simulated in vacuum (the pure temperature-dependent vibrations of the chromophore), to the ensemble spectra obtained from the solvated configurations with the solvent removed from the excitation energy calculations (representing the combined effect of both temperature-dependent vibrations and solvent-induced changes to the ground state structure). In the main manuscript, the standard deviation of the vertical excitation energies for the GFP chromophore using vacuum AI-PIMD configurations is 0.10 eV, whereas it is 0.12 eV for the solvated configurations with the solvent removed from the excitation energy calculations. Using these standard deviations, and assuming that the broadening due to all internal motion of the chromophore can be written as a convolution between temperature fluctuation broadening and solvent-induced changes in the chromophore geometry, the additional broadening parameter due to direct solute-solvent interactions (beyond the value of λ representing just the electrostatic environment of the solvent) would give a value of approximately δ = 0.5. Thus, the standard deviation of the broadening purely due to solvent-induced changes in the chromophore structure from direct solute-solvent interaction is about half the size of the standard deviation of the broadening due to all internal motions of the chromophore (both solvent-induced changes in the chromophore geometry and temperature fluctuations). However, because the FWHM obtained for the ensemble spectra from optimized and unoptimized configurations of the chromophore in the frozen solvent are nearly identical for AI-PIMD (0.273 eV vs 0.271 eV), this agreement suggests that the additional solvent-induced broadening parameter should be closer to δ = 1.0. In the model system with this additional broadening, we use an intermediate value of δ = 0.8.
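The arithmetic behind the estimate of δ ≈ 0.5 can be sketched as follows. The sketch assumes that convolving two independent Gaussian contributions adds their variances, and it reads δ as the ratio of the solvent-induced structural width to the width due to all internal motion; that reading is our interpretation of the text above, and the function name is hypothetical.

```python
import math


def residual_width(sigma_total, sigma_known):
    """Width of the remaining Gaussian contribution when independent widths add in quadrature."""
    if sigma_known > sigma_total:
        raise ValueError("known contribution exceeds the total width")
    return math.sqrt(sigma_total ** 2 - sigma_known ** 2)


sigma_vacuum = 0.10    # eV, vacuum AI-PIMD configurations (thermal fluctuations only)
sigma_stripped = 0.12  # eV, solvated configurations with the solvent stripped away

# Contribution from solvent-induced changes to the chromophore structure.
sigma_struct = residual_width(sigma_stripped, sigma_vacuum)
ratio = sigma_struct / sigma_stripped
print(f"{sigma_struct:.3f} eV, ratio {ratio:.2f}")  # prints: 0.066 eV, ratio 0.55
```

The ratio of roughly 0.55 is consistent with the statement that the solvent-induced structural broadening is about half the size of the broadening due to all internal motions.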

We next use the ensemble spectra for the unoptimized chromophore coordinates to compute spectra within the E-ZTFC approach. The ZTFC vibronic shape function is computed using CAM-B3LYP/6-31G, in agreement with the computation of vertical excitation energies, but it is computed in a polarizable continuum model rather than as an average over individual frozen solvent environments as in the main manuscript. The AI-PIMD ensemble spectrum for the optimized chromophore is then used to evaluate Eqn. 19 to obtain a fully double-counting free estimate of the spectral lineshape. For simplicity, we use the same ZTFC shape function as for the E-ZTFC spectra to evaluate Eqn. 19, rather than a FTFC function. This amounts to ignoring all temperature-induced broadening effects in the chromophore degrees of freedom, which likely corresponds to a small underestimation of the spectral width of the final spectrum in Eqn. 19 as compared to what would have been obtained with the correct FTFC shape function. The results for all three spectra can be found in Fig. 4. The rigorously double-counting free approach based on the ensemble spectrum of optimized chromophore configurations yields a spectrum with similar width to the E-ZTFC approach based on the AIMD trajectory. The E-ZTFC approach based on AI-PIMD configurations is slightly wider than the double-counting free approach, showing a low energy shoulder and slightly more of a high energy tail. It should be noted, however, that in addition to neglecting temperature effects for the chromophore, this double-counting free approach does not account for any anharmonicity in the nuclear degrees of freedom of the chromophore, and there is clear evidence for anharmonic features in the ensemble spectrum of the GFP chromophore anion. It is thus likely that the exact spectrum of the system would be in somewhat closer agreement with the E-ZTFC results based on AI-PIMD configurations, as was shown in the anharmonic model system calculations in the previous section.

[Figure 4: plot of Strength (arb. units) against Energy (eV), 3 to 4.4 eV. Curves: E(classical)-ZTFC, E(quantum)-ZTFC, E(quantum,opt)-ZTFC.]

FIG. 4. Computed E-ZTFC spectra for the unoptimized 100 AIMD and AI-PIMD snapshots, as well as an E-ZTFC spectrum based on the ensemble of vertical excitation energies computed for the optimized AI-PIMD snapshots. A Gaussian broadening of σ = 0.0315 eV is applied to the vibronic shape functions and the E-ZTFC spectra of the unoptimized ensemble spectra are scaled and shifted such that their absorption maximum agrees with the one computed for the optimized AI-PIMD ensemble spectrum. The same ZTFC shape function is used for all spectra.

D. Concluding remarks

In this section we provided a detailed analysis of the influence of the double counting inherent in the E-ZTFC approach by comparison to harmonic and anharmonic model systems, as well as by comparison to a rigorously double-counting free approach applied to the GFP chromophore anion in water. Although the simple harmonic model system considered in Ref. 1 suggested that the E-ZTFC approach systematically overestimates the spectral width, and that this overestimation becomes worse if an underlying quantum ensemble is considered, the considerations presented here suggest that using AI-PIMD ensemble sampling with the E-ZTFC approach likely yields improved results over AIMD in systems with strong solute-solvent coupling and anharmonicity. The AI-PIMD trajectories more accurately take into account specific chromophore-solvent interactions and sampling of anharmonic regions of the potential energy surface, both of which contribute to the spectra in the E-ZTFC approach. Although we show that the double-counting free spectrum using optimized geometries is slightly narrower than that of the E-ZTFC approach, the narrower spectrum may be due to the lack of temperature effects or to decreased anharmonicity upon optimization of the chromophore geometry. The double-counting free approach comes at a significant increase in computational cost, and it is unclear from our analysis whether it is more accurate than the E-ZTFC approach.

The results presented for the extended model systems also suggest why a FTFC spectrum in the harmonic approximation with classical solvent broadening based on the solvent reorganization energy will likely systematically underestimate the spectral width in these kinds of systems, as was found in previous studies on the GFP chromophore anion in water (Ref. 4). This approach neglects anharmonicity as well as the role of the specific solvent environment, which leads to additional broadening beyond what would be expected from the solvent reorganization energy.

II. SIMULATION CELL SIZES AND QM REGIONS

In this work, various sizes of solvation environment are used to perform different computations that contribute to the optical spectra. The number of atoms in the MD simulation cells and QM regions are summarized in Table I.

System           N_MD    N_resolv    N_vert^av    N_vib^av
GFP (AIMD)       528     19020       609          81
GFP (AI-PIMD)    528     19020       615          82
PYP (AIMD)       527     22439       670          84
PYP (AI-PIMD)    527     22439       675          81

TABLE I. Number of atoms in the simulation cell when generating AIMD and AI-PIMD trajectories (N_MD), total number of atoms in the simulation cell for resolvating with MM waters (N_resolv), the average number of atoms in the QM region used for the calculation of the vertical excitation energies (N_vert^av), and average number of atoms in the QM region used for the vibronic shape function (N_vib^av).

For the vertical excitation energies, all solvent molecules with a center of mass within R_cut = 8 Å of any atom of the dye were included in the QM region. The average number of atoms in the QM region is slightly higher for the AI-PIMD trajectories than for the AIMD trajectories, suggesting that the solvent environment around the chromophores is slightly more dense in AI-PIMD.
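The selection criterion for the QM region can be written down compactly. The following Python sketch is a minimal illustration, not the code used in this work: the data layout and function names are hypothetical, and periodic boundary conditions (minimum-image distances) are ignored for simplicity.

```python
import math


def center_of_mass(atoms):
    """atoms: list of (mass, (x, y, z)) tuples; returns the centre of mass as a tuple."""
    total = sum(m for m, _ in atoms)
    return tuple(sum(m * r[k] for m, r in atoms) / total for k in range(3))


def select_qm_solvent(dye_coords, solvent_molecules, r_cut=8.0):
    """Indices of solvent molecules whose centre of mass lies within r_cut (in Å)
    of any atom of the dye. No periodic images are considered (assumption)."""
    selected = []
    for idx, molecule in enumerate(solvent_molecules):
        com = center_of_mass(molecule)
        if any(math.dist(com, atom) <= r_cut for atom in dye_coords):
            selected.append(idx)
    return selected


# Toy example: a two-atom "dye" and two water molecules, coordinates in Å.
dye = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]
near_water = [(15.999, (3.0, 0.0, 0.0)), (1.008, (3.96, 0.0, 0.0)), (1.008, (2.76, 0.93, 0.0))]
far_water = [(m, (x + 17.0, y, z)) for m, (x, y, z) in near_water]
print(select_qm_solvent(dye, [near_water, far_water]))
```

Using the centre of mass rather than individual atoms keeps each solvent molecule either entirely inside or entirely outside the QM region.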

For the vibronic shape functions, the cutoff radius of R_cut = 3 Å for the QM region of frozen solvent atoms leads to an average QM region for the PYP chromophore that contains one more water molecule for the AIMD trajectories than for the AI-PIMD trajectories. However, only five snapshots are used to compute the average vibronic shape function, such that N_vib^av cannot be considered a reliable estimate of the average number of atoms within 3 Å of the chromophore.

An example of the two QM regions for a single snapshot of the PYP chromophore in water taken from an AI-PIMD trajectory can be found in Fig. 5. As can be seen, the QM region used for the calculation of vertical excitation energies covers more than a single solvation shell. The QM region for the calculation of vibronic shape functions, on the other hand, does not include a full solvation shell. However, the water molecules that are expected to have the strongest influence on the computed spectra, such as the water molecules hydrogen-bonded to the phenolate oxygen, are treated fully quantum mechanically for the computation of the ZTFC shape functions.

Table I also shows that N_vert^av is higher than N_MD for all systems, meaning that some water molecules outside of the MD region that were added during the MM resolvation are included in the QM region for the computation of the vertical excitation energies.

[Figure 5: two panels. a) Vertical excitations. b) Vibronic shape function.]

FIG. 5. Figure showing QM region sizes for a single snapshot of an AI-PIMD trajectory for the PYP chromophore anion in water. Both the QM region used for computing the vertical excitation energies and the frozen solvent QM region for the computation of the Franck-Condon shape function are shown.

FIG. 6. Visualization of the system during the MM equilibration step of the simulation protocol. The frozen DFT core is highlighted in green to illustrate the scale of the system.

Figure 6 illustrates the system during the MM equilibration step of the snapshot preparation procedure. The frozen DFT core is highlighted in green in order to show the scale of the system. MM water molecules that are relaxed around the DFT core during this step are not highlighted. The system is equilibrated under NVT conditions with an explicit interface according to the procedure outlined in the main manuscript.

[Figure 7: plot of Strength (arb. units) against Energy (eV), 2.8 to 4.6 eV. Curves: 3827 MM waters, 7470 MM waters, 9091 MM waters.]

FIG. 7. Vertical absorption spectrum as generated from a single AI-PIMD trajectory of the PYP chromophore anion, where three different box sizes of MM water have been used for the re-equilibration process. A Gaussian broadening of σ = 0.021 eV is applied to all transitions.

III. CONVERGENCE TESTS: QM REGION AND MM BOX

In this section we report the results of three convergence tests. The first two tests are carried out to ensure that the way the DFT QM region from the AIMD and AI-PIMD trajectories is solvated in a larger MM box does not significantly influence the computed vertical absorption spectra. Both of these tests are carried out on 100 snapshots from a single AI-PIMD trajectory for the PYP chromophore. The third test examines the influence of finite size effects due to the periodic simulation cell used in generating the AIMD and AI-PIMD trajectories on the computed vertical spectra. Since AIMD and AI-PIMD simulations are computationally very expensive, this test is carried out using classical MM.

We first study the convergence of the spectrum computed within the ensemble approach with respect to the amount of MM solvent placed around the DFT core consisting of the chromophore solvated by approximately 166 water molecules. In addition to the MM water box with 7470 water molecules used for all results reported in the main manuscript, we also repeat the MM equilibration process for the 100 AI-PIMD snapshots using MM water boxes containing 3827 water molecules and 9091 water molecules. For all MM boxes, the equilibration of the MM water around the frozen DFT core is carried out in the same way as described in the main manuscript. We thus obtain three sets of 100 snapshots, each containing identical frozen DFT core regions but different amounts of MM water. Spectra are then computed from the excitation energies computed for the 100 snapshots for each MM box, with the QM region defined in an identical way to the calculations performed in the main manuscript and all additional MM waters included in the calculations as point charges. The results reported in Fig. 7 show that the spectra generated from three different MM box sizes give an almost identical spectral width and similar overall shape. We therefore conclude that the MM equilibration process using 7470 water molecules as performed in the main manuscript yields vertical energies and spectra that are converged with respect to the size of the MM box.

[Figure 8: plot of Strength (arb. units) against Energy (eV), 3 to 4.6 eV. Curves: MM Equilibration 1, MM Equilibration 2.]

FIG. 8. Vertical absorption spectrum as generated from a single AI-PIMD trajectory of the PYP chromophore anion, where two uncorrelated MM configurations are created for each snapshot in the re-equilibration process. A Gaussian broadening of σ = 0.021 eV is applied to all transitions.

In a second test, we check whether the main contributions to the computed absorption spectra originate from the frozen DFT core generated from the MD, or whether the specific configuration of MM water molecules around the DFT core has a strong influence on spectral shape. Using the 7470 MM water molecule box around the DFT QM region, we choose two different points in the MM equilibration process, extracting a snapshot after 500 ps and after 1 ns. We repeat this process for all 100 AI-PIMD snapshots, sampling 100 different QM regions, each with two different MM water molecule distributions. We thus obtain two different sets of snapshots that have identical DFT core regions but fully uncorrelated equilibrated MM water molecules. The vertical absorption spectra generated from the two sets of snapshots in Fig. 8 are almost identical, both in width and shape. These results indicate that the main influence of the solvent molecules on the excitation energies of the chromophore originates from the DFT core region that is obtained from AIMD and AI-PIMD trajectories.

Finally, we assess the influence of the size of the simulation cell box used to generate the AIMD and AI-PIMD trajectories on the computed vertical excitation energies of the PYP chromophore. Because the AIMD and AI-PIMD simulations are very computationally expensive to perform, we instead make use of classical MM MD for this convergence test. The MD trajectories computed with a classical force field are not directly comparable to AIMD and AI-PIMD trajectories. However, we expect any potential influences on the computed vertical spectra of the AIMD and AI-PIMD trajectories due to finite size effects of the simulation cell to be also present in classical MM trajectories. Three box sizes are considered to investigate potential finite-size effects, containing 137, 167, and 234 water molecules, respectively. Classical MM MD is performed on the three simulation cell box sizes, using the same force field for the PYP chromophore as in the initial MM simulations performed in the main manuscript. For each box size, 200 uncorrelated snapshots are extracted from a 10 ns trajectory and are re-solvated in the larger box of MM charges following an identical protocol as described in the main text. We then compute the vertical excitation energies for each set of re-solvated snapshots, again following the procedure described in the main text.

[Figure 9: plot of Strength (arb. units) against Energy (eV), 2.8 to 4.6 eV. Curves: 137 waters, 167 waters, 234 waters.]

FIG. 9. Vertical absorption spectrum as generated from classical MD simulations for three different box sizes. A total of 200 uncorrelated snapshots are used to generate each spectrum. A Gaussian broadening of σ = 0.021 eV is applied to all transitions.

The resulting ensemble spectra for all three simulation cell box sizes can be found in Fig. 9. As can be seen, there is some variation between the three box sizes with the width of the spectra increasing slightly with box size. This increase in width may indicate that a small box size places some artificial constraints on the solvent fluctuations in the first solvation shell. The difference in width between the simulation box containing 167 water molecules and the one containing 234 water molecules is relatively minor.

Taken together, the three convergence tests considered in this section provide strong evidence that the protocol used in this work for generating solvated snapshots yields robust ensemble spectra for the GFP chromophore anion and the PYP chromophore anion in water.

IV. CONVERGENCE TESTS: AIMD AND AI-PIMD SAMPLING

The AIMD and AI-PIMD sampling of solute-solvent configurations is carried out by obtaining ten individual 15 ps trajectories. The initial configurations for each trajectory were obtained from a classical MM run and were spaced by 1 ns in time, such that the individual trajectories can be considered fully uncorrelated. However, the individual snapshots sampled from a single trajectory retain some degree of correlation, as the high computational cost associated with AIMD and AI-PIMD does not allow us to access the timescales necessary to produce a large number of fully uncorrelated snapshots. On the other hand, the snapshots from the small DFT box are each independently resolvated in the larger MM box, meaning that the arrangements of the MM waters around two snapshots are fully uncorrelated, even if the configurations of the frozen DFT core show some degree of correlation. It is therefore interesting to investigate how well the resolvated snapshots used in this work to compute the vertical excitation energies sample the configuration space of the solvated chromophores.

Fig. 10 shows the convergence of the ensemble spectrum of vertical excitations computed for the GFP chromophore anion from AIMD and AI-PIMD trajectories with respect to the number of individual trajectories summed over. We note that in both cases the sum over three independent trajectories shows a very similar spectral width and shape as the sum over all 10 trajectories. The spectrum becomes smoother with the number of added trajectories, but a larger number of trajectories does not produce a systematic change in spectral width. Furthermore, spectral features such as the pronounced high energy tail in the AI-PIMD results are already present in the spectrum computed for three trajectories. These results can be taken as a good indication that the ten uncorrelated trajectories do not sample strongly different regions of the configuration space, and suggest that the resulting ensemble spectrum is well converged with respect to the AIMD and AI-PIMD sampling carried out.

As further evidence that the ten independent 15 ps trajectories collected for each system are sufficient in sampling the relevant configuration space, we compare the AIMD ensemble spectrum with the ensemble spectrum obtained from a classical MM trajectory of 8 ns (for computational details regarding the ensemble spectrum, as well as the generation of the MM trajectory, see Ref. 1). The results are reported in Fig. 11. As can be seen, both the width and shape of the ensemble spectrum for the 8 ns MM trajectory, computed from 2000 evenly spaced uncorrelated snapshots, are in close agreement with the AIMD ensemble spectrum calculated from ten 15 ps trajectories. While the AIMD and MM configurations must necessarily differ due to the underlying errors in the MM force field, the results shown here can be seen as a strong indication that the long MM trajectory does not sample any regions of configuration space missed by the short AIMD trajectories.

[Figure 10: two panels, each plotting Strength (arb. units) against Energy (eV) with curves for 3, 5, 8 and 10 trajectories. a) AIMD, 3 to 3.8 eV. b) AI-PIMD, 2.8 to 4 eV.]

FIG. 10. Convergence of the ensemble spectrum of the GFP chromophore anion in water with respect to the number of uncorrelated trajectories averaged over. All spectra are scaled to have the same maximum intensity, and a Gaussian broadening of σ = 0.021 eV is applied to all transitions.

[Figure 11: plot of Strength (arb. units) against Energy (eV), 2.9 to 3.8 eV. Curves: MM, AIMD.]

FIG. 11. Ensemble spectrum of the GFP chromophore anion as obtained from all 10 independent AIMD trajectories, in comparison with the spectrum obtained from 2000 uncorrelated snapshots extracted from a single 8 ns classical MM trajectory. The underlying data for the classical MM ensemble spectrum is taken from Ref. 1.

V. COMPUTING GENERALIZED MOMENTS OF A DISTRIBUTION

In order to quantify the distribution of vertical excitation energies and certain geometrical parameters, we compute the mean and standard deviation. In some cases, we also quantify the non-Gaussian features of the distribution by computing the skew γ1 and the excess kurtosis γ2. For a set of N random variables {xi} with mean µ and standard deviation σ, these two dimensionless quantities are defined as

\[ \gamma_1 = \frac{\sum_{i}^{N} (x_i - \mu)^3 / N}{\sigma^3} \qquad (20) \]

\[ \gamma_2 = \frac{\sum_{i}^{N} (x_i - \mu)^4 / N}{\sigma^4} - 3 \qquad (21) \]

Both quantities give a measure of the deviation of a distribution from Gaussian: for a Gaussian distribution, both γ1 and γ2 are zero. The skew γ1 measures the asymmetry of a spectrum and the excess kurtosis γ2 measures the heaviness of its tails.

It is possible to combine the two measures of kurtosis and skew into a single test determining the likelihood that a given data set was drawn from an underlying Gaussian distribution. Here, we use the D’Agostino-Pearson test (Ref. 5), which provides a p-value that can be taken as a measure of confidence of concluding that a given data set does not follow a Gaussian distribution.

We note that both the skew and the excess kurtosis are relatively sensitive to statistical outliers. When quantifying the non-Gaussian nature of vertical excitation energies associated with the bright S1 transition in the system, this can lead to problems. The reason for this is that for a number of solvated snapshots, the S1 state can split into two or more states with some fraction of S1 character due to a coupling to solvent electronic degrees of freedom. To minimize the effect from these split S1 states, only states with an oscillator strength of 0.4 or higher are included in the computation of higher order moments of the distribution of vertical excitation energies. This guarantees that only states with a significant amount of S1 character, that contribute strongly to the vertical absorption spectrum, have any influence on the computed higher order moments.
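The quantities defined in Eqns. 20 and 21, together with the oscillator-strength filter, can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code used in this work: the function names and the toy data are hypothetical, and the standard deviation is assumed to be normalized by N (rather than N − 1), which the text does not specify. If SciPy is available, `scipy.stats.normaltest` implements the D'Agostino and Pearson omnibus test; whether that routine was used here is not stated.

```python
import math


def bright_state_energies(states, min_strength=0.4):
    """Keep energies of states with oscillator strength >= min_strength.

    states: list of (energy_eV, oscillator_strength) pairs.
    """
    return [energy for energy, strength in states if strength >= min_strength]


def distribution_moments(values):
    """Return mean, standard deviation, skew (Eqn. 20) and excess kurtosis (Eqn. 21).

    The standard deviation uses 1/N normalization (assumption).
    """
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    skew = sum((x - mu) ** 3 for x in values) / n / sigma ** 3
    excess_kurtosis = sum((x - mu) ** 4 for x in values) / n / sigma ** 4 - 3.0
    return mu, sigma, skew, excess_kurtosis


# Toy data: (vertical excitation energy in eV, oscillator strength).
states = [(3.30, 0.9), (3.42, 0.8), (3.35, 0.2), (3.51, 0.7), (3.38, 0.85), (3.60, 0.45)]
energies = bright_state_energies(states)
mu, sigma, gamma1, gamma2 = distribution_moments(energies)
```

Applying the filter before computing the moments means that weak states produced by the splitting of S1 do not enter the skew and excess kurtosis, which are sensitive to outliers.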

VI. QUANTIFYING NON-GAUSSIAN FEATURES IN THE ENSEMBLE ABSORPTION SPECTRA

In this section, we provide a detailed analysis of the distribution of vertical excitation energies belonging to the bright S1 state for the fully solvated trajectories, the trajectories with the solvent molecules stripped away and the vacuum trajectories for both the GFP and the PYP anion chromophore computed with AIMD and AI-PIMD. We report the mean, standard deviation, higher order moments, as well as the result of the D’Agostino-Pearson test for normality. The results can be found in Table II for the PYP chromophore and Table III for the GFP chromophore, respectively.

The detailed comparison of moments of the distribution of vertical excitation energies for solvated, stripped and vacuum spectra allows us to quantify the influence of indirect solvent effects in terms of changes to the structure of the chromophore and direct solvent effects due to polarization of the solvent environment and hydrogen bonding. Comparing the vacuum and the stripped results, we note that the influence of the indirect solvent effects is a net red shift of the mean excitation energy for both chromophores, and for both the AIMD and AI-PIMD data sets. The direct solvent polarization effects, on the other hand, cause a blue shift in both chromophores, as well as a significant increase in the standard deviation.

Analyzing the higher order moments and the p-value of the D’Agostino-Pearson test, we note that the distributions of excitation energies for the GFP chromophore in vacuum closely follow an underlying Gaussian statistic, while those of the PYP chromophore in vacuum are less Gaussian. This is potentially due to a stronger coupling of low-frequency anharmonic modes, such as dihedral angle twists, to the vertical excitation energy of the S1 state in the case of the PYP chromophore. A more detailed analysis of individual degrees of freedom of the chromophores is provided in the next section.

                      E (eV)    σ (eV)    γ1        γ2       p
Solvated (AIMD)       3.461     0.141     0.463     0.173    3.9×10^-8
Solvated (AI-PIMD)    3.388     0.176     0.672     0.798    1.1×10^-17
Stripped (AIMD)       3.333     0.086     -0.267    2.606    1.2×10^-17
Stripped (AI-PIMD)    3.265     0.116     -0.311    0.556    2.4×10^-6
Vacuum (AIMD)         3.378     0.070     -0.142    0.549    0.014
Vacuum (AI-PIMD)      3.320     0.101     0.156     1.123    4.2×10^-5

TABLE II. Mean (E), standard deviation (σ), skew (γ1), and excess kurtosis (γ2) as calculated for the vertical excitation energies obtained from the AIMD and AI-PIMD snapshots of the PYP chromophore. ‘Solvated’ corresponds to the standard solvated snapshots, ‘Stripped’ corresponds to the solvated snapshots where the solvent is stripped away and ‘Vacuum’ corresponds to trajectories of the isolated chromophore run in vacuum. All vertical excitations with an oscillator strength larger than 0.4 are included in the analysis. The mean and standard deviation are measured in eV, whereas the skew and excess kurtosis are unitless quantities. The last column contains the p-value of the D’Agostino-Pearson test for normality.

In general, accounting for NQEs through AI-PIMD leads to distributions of vertical excitation energies that are less Gaussian in nature, as assessed by the test for normality. This trend is reversed for the distributions of excitation energies with the solvent stripped away, where AIMD produces highly non-Gaussian distributions for both chromophores. The highly non-Gaussian nature of the stripped spectra as compared to the vacuum spectra suggests that the molecule in solution adopts a number of configurations that are very far from equilibrium in vacuum, thus yielding strongly non-Gaussian features once the solvent is stripped away.

                      E (eV)    σ (eV)    γ1        γ2        p
Solvated (AIMD)       3.341     0.104     0.199     0.368     0.004
Solvated (AI-PIMD)    3.291     0.161     0.533     0.931     1.9×10^-14
Stripped (AIMD)       3.277     0.072     -0.461    1.382     8.4×10^-15
Stripped (AI-PIMD)    3.205     0.117     -0.176    0.145     0.041
Vacuum (AIMD)         3.310     0.065     -0.072    0.025     0.71
Vacuum (AI-PIMD)      3.242     0.100     -0.147    -0.062    0.28

TABLE III. Mean (E), standard deviation (σ), skew (γ1), and excess kurtosis (γ2) as calculated for the vertical excitation energies obtained from the AIMD and AI-PIMD snapshots of the GFP chromophore. ‘Solvated’ corresponds to the standard solvated snapshots, ‘Stripped’ corresponds to the solvated snapshots where the solvent is stripped away and ‘Vacuum’ corresponds to trajectories of the isolated chromophore run in vacuum. All vertical excitations with an oscillator strength larger than 0.4 are included in the analysis. The mean and standard deviation are measured in eV, whereas the skew and excess kurtosis are unitless quantities. The last column contains the p-value of the D’Agostino-Pearson test for normality.

VII. STRUCTURAL ANALYSIS

In this section we perform further analysis of the structure of the systems during the AIMD and AI-PIMD trajectories. We attempt to correlate the vertical excitation energies of the PYP and GFP chromophores with the observed structural changes. To study the solute degrees of freedom, the direct influence of the solvent environment, and the solvent-induced changes to the solute geometry, we analyze three sets of configurations: snapshots from the fully solvated systems, snapshots from the fully solvated systems where the solvent has been stripped away, and snapshots from MD trajectories of the chromophores in vacuum.

A. Collective motion

We first study the collective motion of the chromophores and attempt to quantify differences between their structure in vacuum and in solution, both in AIMD and AI-PIMD. Since both the PYP chromophore and the GFP chromophore are semi-flexible, we are interested in the dihedral angles measuring the alignment of the ring systems with the conjugated backbone. For both chromophores we measure the twist angle around the C1-C2 bond connecting the phenolate with the conjugated backbone. For the GFP chromophore we also consider rotations around the C-C bond connecting the imidazole to the backbone, and for the PYP chromophore we measure the dihedral angle around the C-S bond connecting the thiophenol to the backbone. The resulting mean, standard deviation, and higher order moments calculated from the snapshots extracted from the solvated and vacuum trajectories can be found in Table IV for the GFP chromophore and Table V for the PYP chromophore.

                         d (°)     σ (°)     γ1       γ2
AIMD
Phenolate (Vacuum)        0.695     9.973     0.075    0.216
Imidazole (Vacuum)        0.113     9.654    -0.137   -0.285
Phenolate (Solvated)      1.571    12.413     0.112    0.078
Imidazole (Solvated)     -0.654     9.825    -0.077    0.039
AI-PIMD
Phenolate (Vacuum)        0.342     9.598    -0.014   -0.092
Imidazole (Vacuum)       -0.159    10.537     0.020   -0.207
Phenolate (Solvated)     -0.398    13.188    -0.126   -0.150
Imidazole (Solvated)     -1.541    10.497    -0.075    0.052

TABLE IV. Mean (d), standard deviation (σ), skew (γ1), and excess kurtosis (γ2) of the dihedral angles defining the orientation of the phenolate and imidazole groups in the GFP chromophore with respect to the conjugated backbone. The analysis is performed on the trajectories of the GFP chromophore in vacuum and in solution. The mean and standard deviation are measured in degrees, whereas the skew and excess kurtosis are unitless quantities.

                         d (°)     σ (°)     γ1       γ2
AIMD
Phenolate (Vacuum)        0.641    10.596     0.123   -0.215
Thiophenol (Vacuum)       1.646    78.337     0.159   -0.900
Phenolate (Solvated)      0.808    12.996     0.232    0.340
Thiophenol (Solvated)    63.246    72.817    -1.478    1.353
AI-PIMD
Phenolate (Vacuum)        0.057    11.874    -0.024    0.313
Thiophenol (Vacuum)      -6.383    94.976     0.064   -1.147
Phenolate (Solvated)      0.010    14.201     0.014   -0.100
Thiophenol (Solvated)    36.225    96.040    -0.806   -0.907

TABLE V. Mean (d), standard deviation (σ), skew (γ1), and excess kurtosis (γ2) of the dihedral angles defining the orientation of the phenolate and thiophenol groups in the PYP chromophore with respect to the conjugated backbone. The analysis is performed on the trajectories of the PYP chromophore in vacuum and in solution. The mean and standard deviation are measured in degrees, whereas the skew and excess kurtosis are unitless quantities.

For the GFP chromophore, the mean twist angles for both the phenolate and the imidazole rings are close to zero in vacuum and solution, indicating that on average the molecule does not deviate from its planar structure. The skew and excess kurtosis are small, suggesting that the distribution of dihedral angles is close to Gaussian. The distribution of the twists of the imidazole is similar in vacuum and solvated trajectories, both for AIMD and AI-PIMD, as demonstrated by the similar values for the standard deviation. In contrast, the phenolate dihedral angle distribution shows a significant increase of the standard deviation in solution for both AIMD and AI-PIMD. This larger standard deviation in solution suggests that twists around the C-C bond connecting the phenolate to the backbone are significantly stabilized in solution. Furthermore, apart from the phenolate dihedral angle in vacuum, the AI-PIMD angle distributions show a slight increase of the standard deviations over the AIMD results, meaning that AI-PIMD on average yields more twisted conformations.

For the PYP chromophore, the distribution of the dihedral angle connecting the phenolate to the backbone follows very similar trends as for the GFP chromophore. The distribution is relatively Gaussian, with a mean close to zero and small higher order moments. The standard deviation increases when going from vacuum to solvated trajectories, and the AI-PIMD data set again tends to yield slightly more twisted conformations than the AIMD results. The dihedral angle associated with the thiophenol, however, shows significantly different behaviour. The dihedral angles are distributed with a very large standard deviation and show significant non-Gaussian behaviour both in vacuum and solution. In addition, there is a large increase in the standard deviation going from the AIMD trajectories to the AI-PIMD trajectories. Thus the AI-PIMD trajectories of the PYP chromophore show an increased tendency towards strong twists around the C-S bond connecting the thiophenol to the backbone.

In summary, both the solvent environment and the quantum treatment of the nuclei induce significant differences in conformations of the PYP and the GFP anion chromophores. Non-planar arrangements of the phenolate with respect to the backbone are increased in the solvent environment and are also increased by quantum treatment of the nuclei. The twists of the imidazole in GFP and the thiophenol in PYP, however, are not strongly influenced by the solvent environment, but the AI-PIMD trajectories yield a large increase in twisted thiophenol conformations. Thus both the solvent environment and the NQEs contribute to an increase of twisted conformations of the two chromophores.
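The twist angles discussed in this subsection are dihedral angles defined by four atoms. The following is a minimal sketch of how such an angle could be evaluated for one snapshot, assuming Cartesian coordinates for the four atoms are available. The function name `dihedral_degrees`, the sign convention, and the atom ordering are illustrative assumptions and are not taken from this work.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = tuple(x / norm for x in b1)
    # Project the outer bond vectors onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))
```

Collecting this angle over all snapshots of a trajectory gives the distribution whose moments are summarized in Tables IV and V. For distributions as broad as those of the thiophenol twist, the periodicity of the angle would need additional care that this sketch does not attempt.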

B. Nuclear quantum effects on specific bond lengths

We next focus on selected bond lengths of the chromophores as well as hydrogen bonds formed between the chromophores and the water molecules. The bond lengths we consider are the C-O bonds in the phenolate and carbonyl groups and the C-C bond connecting the phenolate to the conjugated backbone (corresponding to atoms labeled C1 and C2 in the main manuscript). For the GFP chromophore we also consider the C-N double bond in the imidazole (corresponding to atoms labeled C3 and N1). To quantify hydrogen bonds between the chromophore and the water we consider both the phenolate and carbonyl oxygen as H-bonding sites. For the GFP chromophore the imidazole nitrogen labeled N1 is also investigated as an H-bonding site. Hydrogen bonds are defined through both a distance and a directional criterion. For us to consider a water molecule hydrogen bonded to an H-bonding site in the chromophore, we require the distance d(X_site, O_water) < 3.5 Å. We furthermore require the angle |π − ∠(X_site, H_water, O_water)| < 20°.

For the total set of bond lengths extracted from the 1000 solvated snapshots, we compute the mean, standard deviation, skew, and excess kurtosis. The results are reported in Table VI for the GFP chromophore and in Table VII for the PYP chromophore. Among the H-bonding sites, the mean hydrogen bond length is significantly shorter for the phenolate oxygen than for the other sites in both the PYP and the GFP chromophore anions. The distribution of hydrogen bond lengths to the phenolate oxygen shows very similar characteristics, both between the AIMD and AI-PIMD data sets and also between the two different chromophores.
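The two-part hydrogen bond criterion described above can be sketched as follows. This is an illustrative implementation using only the standard library (`math.dist` requires Python 3.8 or later). The function names are hypothetical, the angle is expressed in degrees (so π corresponds to 180°), and periodic boundary conditions are ignored for simplicity, which is an assumption of the sketch rather than a statement about the analysis in this work.

```python
import math


def angle_degrees(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    va = [x - y for x, y in zip(a, vertex)]
    vb = [x - y for x, y in zip(b, vertex)]
    dot = sum(x * y for x, y in zip(va, vb))
    cos_angle = dot / (math.hypot(*va) * math.hypot(*vb))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def is_hydrogen_bonded(x_site, h_water, o_water,
                       max_distance=3.5, max_angle_deviation=20.0):
    """Apply the distance (Angstrom) and directional (degree) criteria."""
    if math.dist(x_site, o_water) >= max_distance:
        return False
    deviation = abs(180.0 - angle_degrees(x_site, h_water, o_water))
    return deviation < max_angle_deviation
```

A nearly linear X···H-O arrangement within 3.5 Å passes both tests, whereas a strongly bent arrangement or a distant water molecule is rejected.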

                      r (Å)    σ (Å)    γ1       γ2
AIMD
Phenolate H-bonds     1.982    0.432    1.90     3.969
Carbonyl H-bonds      2.377    0.583    0.683   -0.440
Imidazole H-bonds     2.284    0.539    1.032    0.249
C-O Phenolate         1.315    0.027    0.325    0.095
C-O Carbonyl          1.253    0.024    0.232    0.022
C-C bond              1.432    0.028    0.189   -0.058
C-N bond              1.324    0.024    0.101    0.040
AI-PIMD
Phenolate H-bonds     1.951    0.462    1.681    2.803
Carbonyl H-bonds      2.438    0.601    0.567   -0.593
Imidazole H-bonds     2.254    0.571    1.113    0.395
C-O Phenolate         1.320    0.045    0.067   -0.197
C-O Carbonyl          1.258    0.041    0.064   -0.076
C-C bond              1.437    0.047    0.115    0.154
C-N bond              1.326    0.042    0.039    0.125

TABLE VI. Mean (r), standard deviation (σ), skew (γ1), and excess kurtosis (γ2) as calculated for specific bond lengths obtained from the AIMD and AI-PIMD snapshots of the GFP chromophore. The mean and standard deviation are measured in Å, whereas the skew and excess kurtosis are unitless quantities.

In all cases, the hydrogen bond length distribution shows a large positive skew and excess kurtosis, corresponding to a steep onset at short bond lengths and a tail at longer bond lengths, as well as a general tendency to form heavy tails. Focusing on the differences between AIMD and AI-PIMD, we find that the average hydrogen bond length to the GFP phenolate oxygen decreases slightly when going from AIMD to AI-PIMD, suggesting an increased tendency towards short hydrogen bonds. For the PYP chromophore, the mean hydrogen bond length shows a small increase; however, the standard deviation increases by about 23% (from 0.411 Å to 0.507 Å, see Table VII), suggesting that there is still an increase in the number of short hydrogen bonds when going from AIMD to AI-PIMD.

                      r (Å)    σ (Å)    γ1       γ2
AIMD
Phenolate H-bonds     1.971    0.411    1.819    3.749
Carbonyl H-bonds      2.393    0.517    0.631   -0.134
C-O Phenolate         1.312    0.029    0.208   -0.032
C-O Carbonyl          1.240    0.023    0.173    0.167
C-C bond              1.437    0.030    0.214    0.199
AI-PIMD
Phenolate H-bonds     2.009    0.507    1.565    2.132
Carbonyl H-bonds      2.404    0.535    0.542   -0.375
C-O Phenolate         1.321    0.044    0.083    0.158
C-O Carbonyl          1.246    0.042    0.063    0.002
C-C bond              1.441    0.049    0.291    0.165

TABLE VII. Mean (r), standard deviation (σ), skew (γ1), and excess kurtosis (γ2) as calculated for specific bond lengths obtained from the AIMD and AI-PIMD snapshots of the PYP chromophore. The mean and standard deviation are measured in Å, whereas the skew and excess kurtosis are unitless quantities.

Regarding the C-O and C-C bond lengths for both chromophores, we find that NQEs yield a strong increase in the standard deviations, while leaving the means relatively unchanged. For example, the standard deviation for the C-C and the C-N bonds of the GFP chromophore increases by over 40% when going from AIMD to AI-PIMD trajectories. For the PYP chromophore, the C-C bond length standard deviation increases from 0.030 Å to 0.049 Å (about 63%), and the phenolate and carbonyl C-O bond length standard deviations increase from 0.029 Å to 0.044 Å (about 52%) and from 0.023 Å to 0.042 Å (about 83%), respectively (see Table VII). Thus the inclusion of NQEs in the MD trajectories yields a significant softening of bonds in the two chromophores.

VIII. ANALYSIS OF SOLVENT EFFECTS

We next perform a more detailed analysis of the solvent-induced spectral broadening and the hydrogen bonding between the chromophores and the solvent. In Fig. 12 we show the absorption spectra computed in the ensemble approach for snapshots from vacuum and solvated trajectories, where, for snapshots from the solvated trajectories, the solvent has been removed when calculating the vertical excitation energies. The spectra thus measure how the solvent indirectly affects the spectrum by changing the configurations sampled by the chromophores. As can be seen, the indirect effect of the solvent environment on the vertical excitation energies is a small redshift of the spectra for both chromophores. Furthermore, this indirect solvent effect also yields an increase in the width of the spectra, and a tendency towards more pronounced tails in the low energy part of the spectrum, which is especially visible for the PYP chromophore anion.

[Figure 12: panels a) PYP and b) GFP; Strength (arb. units) versus Energy (eV); AIMD and AI-PIMD curves.]

FIG. 12. Comparison of the vertical absorption spectra computed in the ensemble approach for the PYP and GFP chromophore anions for the vacuum snapshots and for the solvated snapshots with the solvent stripped away. Dotted lines correspond to spectra computed with the vacuum snapshots and solid lines correspond to spectra computed with the stripped solvent snapshots.

We attempt to correlate the increased solvent-induced broadening with structural parameters of the solvent environment, in order to assess the influence of direct solute-solvent interactions on vertical excitation energies. Specifically, we focus on the main hydrogen bonding site of the phenolate oxygen because NQEs were shown in the previous section to yield a broader distribution of hydrogen bond lengths to that site.

Figure 13 shows the distribution of values of the δ-coordinate, also referred to as the proton sharing coordinate, corresponding to the shortest hydrogen bond made to the phenolate oxygen for both chromophores. The δ-coordinate is defined as δ = d_DH − d_AH, where d_DH is the distance from the donor atom of the hydrogen bond to the shared hydrogen atom and d_AH is the distance from the shared hydrogen atom to the hydrogen bond acceptor atom. A perfectly shared hydrogen atom has δ = 0. The main effect of adding NQEs is to broaden the distribution of δ values corresponding to the hydrogen bond made to the phenolate. Specifically, we observe increased density closer to δ = 0, indicating that the hydrogen is more highly shared upon inclusion of NQEs.
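The definition of the proton sharing coordinate given above can be sketched as follows. The sketch follows the sign convention of the definition in the text (donor-hydrogen distance minus hydrogen-acceptor distance), under which an unshared proton gives a negative value; the axis of Fig. 13 shows positive values, so the plotted quantity may use the opposite sign or the magnitude, which is a guess on our part. The function names and the input layout are hypothetical.

```python
import math


def delta_coordinate(donor, hydrogen, acceptor):
    """delta = d_DH - d_AH, following the definition in the text."""
    return math.dist(donor, hydrogen) - math.dist(hydrogen, acceptor)


def shortest_hbond_delta(acceptor, waters):
    """Delta for the water hydrogen closest to the acceptor.

    `waters` is a list of (oxygen_position, hydrogen_position) pairs;
    the water oxygen acts as the hydrogen bond donor (assumed layout).
    """
    donor, hydrogen = min(waters, key=lambda w: math.dist(w[1], acceptor))
    return delta_coordinate(donor, hydrogen, acceptor)
```

Evaluating `shortest_hbond_delta` with the phenolate oxygen as the acceptor for every snapshot would give one δ value per snapshot, which can then be paired with the corresponding excitation energy.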

In Fig. 13, we correlate the δ-coordinate with the excitation energy for both solvated chromophores. These plots illustrate that while the δ-coordinate distribution broadens appreciably, its correlation with the excitation energy is weak relative to that obtained using the C-C and C-N bond distributions of the solute (see main manuscript). Because the correlation is very weak, we conclude that hydrogen bonding to the phenolate oxygen plays a negligible role in determining the solvent-induced shift in excitation energy.

It can be concluded that, unlike the spectral broadening induced by the solute degrees of freedom, where a strong correlation with specific bond oscillations can be identified, the solvent-induced broadening is likely a collective effect of the entire solvation shell. Thus, although NQEs yield increased solvent-induced broadening, it is difficult to link this broadening to a specific change in the solvent environment. From the analysis of the hydrogen bonding at the phenolate oxygen, it is clear that the inclusion of NQEs through AI-PIMD yields on average slightly stronger hydrogen bonds. NQEs may lead to a collective strengthening of solute-solvent interactions through shortened hydrogen bonds in the first solvation shell, yielding the observed increase in solvent-induced broadening.

[Figure 13: panels a) PYP chromophore and b) GFP chromophore; excitation energy (eV) versus δ-coordinate (Å) for the AI-PIMD and AIMD data sets.]

FIG. 13. Correlation of the δ-coordinate corresponding to the shortest hydrogen bond made to the phenolate oxygen with the computed excitation energy.

IX. FRANCK-CONDON SHAPE FUNCTIONS

In this work, an average vibronic shape function is constructed from five individual Franck-Condon spectra generated from uncorrelated solute-solvent configurations. To guarantee fully uncorrelated conformations, each Franck-Condon spectrum is computed from a snapshot taken from a different 15 ps trajectory. All water molecules with a center of mass within 3 Å of any solute atom are included in the QM region of the calculation (corresponding to approximately 15-20 water molecules in the QM region for each snapshot, see Table I) and the long-range solvent polarization is included using a polarizable continuum model (PCM). The ground and excited state geometries of the solutes are optimized while keeping the solvent molecules fully frozen, and a non-equilibrium solvation model is used for the PCM in the excited state calculations. Thus, the calculations include the influence of the explicit solvent environment on the vibrational frequencies and normal modes of the solute, but no contributions from solvent vibrational modes are included in the Franck-Condon spectra. Vibronic spectra are averaged by shifting their 0-0 transitions to the same energy. A plot of the five individual Franck-Condon spectra, as well as the average spectrum, for both AIMD and AI-PIMD snapshots can be found in Fig. 14 for the PYP chromophore and in Fig. 15 for the GFP chromophore.

[Figure 14: panels a) AIMD and b) AI-PIMD; Strength (arb. units) versus Energy (eV); curves for Frames 1-5 and the averaged shape function.]

FIG. 14. Vibronic shape functions for the PYP chromophore anion for five different solute-solvent configurations selected from the AIMD and AI-PIMD trajectories, and comparison with the average vibronic shape function. All spectra are shifted such that the most intense vibronic peak is at 0 eV. A Gaussian broadening of σ = 0.0105 eV is applied to all vibronic excitations.

[Figure 15: panels a) AIMD and b) AI-PIMD; Strength (arb. units) versus Energy (eV); curves for Frames 1-5 and the averaged shape function.]

FIG. 15. Vibronic shape functions for the GFP chromophore anion for five different solute-solvent configurations selected from the AIMD and AI-PIMD trajectories and the average vibronic shape function. All spectra are shifted such that the most intense vibronic peak is at 0 eV. A Gaussian broadening of σ = 0.0105 eV is applied to all vibronic excitations.

For all systems, there is some variability between Franck-Condon spectra generated from different solute-solvent snapshots. For the PYP chromophore anion AIMD spectra this variability is relatively minor and mainly involves the intensity of the first vibronic peak. For the AI-PIMD trajectories, frame 1 shows a strong increase in the intensity of the first vibronic peak compared to the other frames, but the overall shape of the spectrum undergoes relatively minor changes. The similar spectral shapes suggest that the approximation of using a single average shape function to represent the vibronic fine structure is likely valid in the case of the PYP chromophore in water. For the GFP chromophore anion, however, the differences between individual Franck-Condon spectra are significantly larger. For example, for frame 1 of the AI-PIMD simulations in Fig. 15b the Franck-Condon spectrum is dominated by the first vibronic peak, whereas for frame 2, a large amount of the intensity of that peak is redistributed to higher energies. Frame 3 of the AIMD simulations in Fig. 15a shows similar behavior, with a considerable amount of spectral weight shifted to high energy vibronic transitions at ≈0.4 and 0.6 eV.

                 Frame 1         Frame 2         Frame 3         Frame 4         Frame 5         Average
PYP (AIMD)       0.085 (0.042)   0.091 (0.044)   0.113 (0.043)   0.119 (0.043)   0.062 (0.035)   0.094 (0.052)
PYP (AI-PIMD)    0.132 (0.059)   0.076 (0.056)   0.117 (0.063)   0.086 (0.047)   0.112 (0.050)   0.105 (0.059)
GFP (AIMD)       0.050 (0.023)   0.121 (0.066)   0.075 (0.059)   0.061 (0.039)   0.070 (0.057)   0.075 (0.057)
GFP (AI-PIMD)    0.031 (0.016)   0.115 (0.110)   0.105 (0.087)   0.060 (0.027)   0.069 (0.056)   0.076 (0.075)

TABLE VIII. Mean distances between atomic positions of the optimized ground state and excited state chromophore coordinates. The values in brackets correspond to the standard deviations. Both mean and standard deviation are given in Å. The last column is the average of the five frames.
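The averaging of Franck-Condon spectra described in this section (shift each spectrum so that its 0-0 transition sits at a common energy, apply a Gaussian broadening, then average) can be sketched as follows. This is an illustrative sketch only: the representation of each spectrum as a 0-0 energy plus a list of (energy, intensity) sticks, the function names, and the use of normalized Gaussians are assumptions and are not taken from the original calculations.

```python
import math


def broadened(sticks, grid, sigma):
    """Gaussian-broaden (energy, intensity) sticks on an energy grid (eV)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [
        sum(w * norm * math.exp(-0.5 * ((e - e0) / sigma) ** 2)
            for e0, w in sticks)
        for e in grid
    ]


def average_shape_function(spectra, grid, sigma=0.0105):
    """Average several stick spectra after aligning their 0-0 transitions.

    `spectra` is a list of (e00, sticks) pairs, where e00 is the 0-0
    transition energy of that spectrum (assumed to be known).
    """
    total = [0.0] * len(grid)
    for e00, sticks in spectra:
        shifted = [(e - e00, w) for e, w in sticks]  # put 0-0 at 0 eV
        for k, value in enumerate(broadened(shifted, grid, sigma)):
            total[k] += value
    return [t / len(spectra) for t in total]
```

With this alignment, two spectra that differ only by a rigid energy shift give an average that is identical to either one, so differences in the averaged function reflect genuine changes in vibronic structure between frames.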

To understand the origins of the observed changes in intensity of the first vibronic peak in the case of the PYP chromophore and the strong changes in spectral shape for some snapshots in the GFP chromophore, we analyze the differences in the optimized geometries for the ground and the excited state in the frozen solvent pocket. Specifically, we compute the mean distances between the atoms of the ground state optimized structure and the excited state optimized structure, as well as the standard deviation. The results can be found in Table VIII. As can be seen, the frames that produce the highest intensities of the first vibronic peak tend to have a low mean displacement of atoms between the optimized structures. This can be straightforwardly understood by considering that smaller displacements lead to larger overlaps between the vibrational ground state wave functions of the ground and excited state potential energy surface and thus a larger intensity for the 0-0 transition. Conversely, the frames with comparatively large displacements between the ground and excited state optimized geometries tend to correspond to lower intensities of the first vibronic peak. We also note that frame 2 extracted from the GFP chromophore AI-PIMD trajectory shows not only a large mean displacement between the initial and final state but also a large standard deviation, suggesting that some atoms are displaced relatively large distances in transitioning from the optimized ground state geometry to the optimized excited state geometry. This yields a very low overlap for the ground state vibrational wave functions, a weak 0-0 transition, and a significant amount of spectral weight being shifted to higher energy transitions, causing the strong change in spectral shape observed in Fig. 15b. The results presented here show that for some explicit frozen solvent environments it is favorable for the dye to have an optimized excited state geometry that resembles the ground state geometry, whereas for other explicit solvent environments, the chromophore undergoes significant geometry changes upon excitation.

To further examine the origin of the large displacement between optimized ground and excited state geometries in some frozen solvent environments, we plot the electron-hole density of the bright S1 state of interest for selected snapshots used in the ZTFC calculations. For each snapshot, the electron-hole density is calculated both at the optimized geometry of the chromophore in its electronic ground state and the optimized geometry in the S1 state. The results can be found in Fig. 16 for the PYP chromophore anion and Fig. 17 for the GFP chromophore anion. In both cases, we focus on the AI-PIMD trajectories only.

[Figure 16: panels a) Frame 1, RGS; b) Frame 1, REx; c) Frame 4, RGS; d) Frame 4, REx.]

FIG. 16. Electron-hole density plots of the bright S1 state for two selected frames of the AI-PIMD trajectory of the PYP chromophore anion that are used for computing the ZTFC shape functions. For each frame, the electron-hole density is computed at both the optimized geometry for the electronic ground state (RGS) and the optimized geometry for the S1 state (REx).

[Figure 17: panels a) Frame 1, RGS; b) Frame 1, REx; c) Frame 2, RGS; d) Frame 2, REx.]

FIG. 17. Electron-hole density plots of the bright S1 state for two selected frames of the AI-PIMD trajectory of the GFP chromophore anion that are used for computing the ZTFC shape functions. For each frame, the electron-hole density is computed at both the optimized geometry for the electronic ground state (RGS) and the optimized geometry for the S1 state (REx).

As can be seen, in the snapshots considered the electron-hole density does not undergo any major changes between the optimized ground state and excited state geometry, meaning that the excited state retains its character. As expected, the electron-hole density is mainly confined to the conjugated backbone in both chromophores, but also has a significant contribution on the phenolate oxygen. Even though the character of the electronic excitation is relatively unchanged between all snapshots, the same is not true for the observed solute-solvent interactions. Specifically, in Fig. 16, it can be seen that frame 1 of the PYP chromophore anion corresponds to a situation where, in the optimized ground state structure, two hydrogen bonds are formed between the phenolate oxygen and two nearby water molecules. In the optimized excited state geometry, one of these hydrogen bonds is broken. Similarly, for frame 2 of the GFP chromophore anion (see Fig. 17), one hydrogen bond is formed between a water molecule and the phenolate oxygen in the ground state optimized structure and that hydrogen bond is broken in the excited state optimized structure. The other frames for both the GFP chromophore anion and the PYP chromophore anion correspond to situations where there is no explicit hydrogen bonding observed between the water and the chromophore, either in the optimized ground state or in the optimized excited state. Note that the frames that show breaking of a hydrogen bond between the ground and excited state optimized geometry correspond to frames that show a large relative displacement between the two structures and a correspondingly large redistribution of spectral weight from the first vibronic peak to higher energy states. The reason for the breaking of the hydrogen bonds can be found in a shortening of the C-O bond of the phenolate oxygen in the excited state geometry. This shortening is found to be much larger (0.021 Å versus 0.012 Å for the PYP chromophore anion snapshots and 0.018 Å versus 0.004 Å for the GFP chromophore anion snapshots) in the snapshots where a hydrogen bond is broken than in the snapshots where no hydrogen bond breaking occurs. This effect is likely caused by the excited state density having a reduced negative charge on the phenolate oxygen, leading to a decreased favorability of a hydrogen bond between a nearby water and the oxygen for the excited state optimized geometry as compared to the ground state optimized geometry. Thus it can be concluded that the hydrogen bonding to the phenolate oxygen plays an important role in determining the shape of a ZTFC spectrum for a given snapshot in both systems. For snapshots where the water molecules are not in a strong hydrogen bonding arrangement, no significant structural reorganization occurs between the ground and excited state optimized geometry. However, for situations where a strong hydrogen bond is formed in the ground state optimized structure, this bond is less favorable in the excited state, yielding a more significant structural change between the ground and excited state and a breaking of the hydrogen bond.

There are a variety of challenges associated with computing Franck-Condon spectra within an explicit solvent environment that are not present with implicit solvent. One challenge is the inclusion of enough explicit solvent in the QM region to properly account for long-range solvent effects. A second challenge is the choice of whether or not to include the strongly coupled solvent molecules (such as those hydrogen bonding to the solute) in the geometry optimization of the chromophore, which may allow them to reform a hydrogen bond in the excited state of the chromophore and decrease the high-energy tail of the Franck-Condon spectrum. A third challenge is the incorporation of anharmonicity in the shape of the potential, which may play a large role for soft vibrational modes such as those of a hydrogen bond. Exploration of these challenges and the errors in the vibronic shape function that accompany each of them represent interesting chemical questions, but each would require significant computational expense to pursue. In our study, our average vibronic shape function likely possesses errors from each of these challenges. However, the use of an identical vibronic function for each vertical excitation energy will also incur errors within the E-ZTFC approach, and the error due to an average shape function may be larger than the errors from the choice of computational protocol for each of the challenges discussed here.
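The per-frame numbers in Table VIII (mean and standard deviation of the distances between corresponding atoms in the optimized ground state and excited state structures) can be sketched as follows. The sketch assumes that both geometries are expressed in the same frame of reference, which seems reasonable because the solvent is frozen, and that no structural alignment is applied; both points are assumptions. The function name is hypothetical.

```python
import math


def displacement_statistics(ground_coords, excited_coords):
    """Mean and population standard deviation of per-atom displacements.

    Both inputs are lists of (x, y, z) positions in Angstrom with the
    same atom ordering.
    """
    if len(ground_coords) != len(excited_coords) or not ground_coords:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(g, e) for g, e in zip(ground_coords, excited_coords)]
    mean = sum(distances) / len(distances)
    variance = sum((d - mean) ** 2 for d in distances) / len(distances)
    return mean, math.sqrt(variance)
```

A large standard deviation relative to the mean, as for frame 2 of the GFP AI-PIMD data set, indicates that a few atoms move much further than the rest.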

We note that, even though there is a strong variability between ZTFC spectra for different frozen solvent environments, the average shape function constructed for the GFP chromophore is nearly identical for the AIMD and the AI-PIMD trajectories. For the PYP chromophore, on the other hand, the average shape function constructed from the AI-PIMD trajectories shifts spectral weight from the first vibronic peak to higher energies. We note that the average atomic displacements (see last column of Table VIII) are almost identical for the optimized geometries obtained from the AIMD and AI-PIMD snapshots of the GFP chromophore, whereas for the PYP chromophore the average displacement is about 10% higher in the AI-PIMD data set compared to the AIMD data set. This increase for the PYP chromophore indicates that, for the small sample of shape functions considered in this work, the frozen solvent environment obtained from AI-PIMD trajectories encourages larger geometry changes between the ground and the excited state optimized structures than the AIMD frozen solvent environment, leading to a stronger shift of spectral weight from the 0-0 transition to higher energy peaks. However, given the importance of the hydrogen bonding between the water and the phenolate oxygen in determining the size of the structural changes between the ground and excited state optimized geometries, and given that the AI-PIMD trajectories for both chromophores tend to produce stronger hydrogen bonds on average as compared to AIMD trajectories (see Tables VI and VII), one could expect to observe differences between the AIMD and AI-PIMD shape functions for both chromophores. Here, we note that the average ZTFC shape functions used to construct vibronically broadened spectra are obtained from only 5 snapshots in this work, so we are unable to state definitively whether the observed trends will hold for larger sample sizes, or whether for a large enough number of snapshots, similar discrepancies between the AIMD and AI-PIMD ZTFC shape functions would also be observed for the GFP chromophore anion.

X. ABSORPTION SPECTRA FOR THE ENSEMBLE AND E-ZTFC APPROACHES

In this section, we provide a comparison of all absorption spectra of the two solvated dyes studied in this work. For both the PYP chromophore and the GFP chromophore in water, the spectra computed in the ensemble and the E-ZTFC approach, both using the AIMD and the AI-PIMD trajectories, can be found in Fig. 18. In the AIMD spectra computed with the ensemble approach, all NQEs are ignored, both in the generation of representative solute-solvent structures and in the calculation of the optical excitations. The AI-PIMD ensemble spectra account for NQEs in the sampling of ground state configurations, but ignore any coupling of an electronic excitation to vibrational modes. The AIMD E-ZTFC spectrum accounts for the vibronic fine structure of a given optical transition, but does not include NQEs in the sampling of ground state solute-solvent configurations. Only the AI-PIMD E-ZTFC spectrum accounts for the quantum nature of the nuclei both in the generation of representative solute-solvent configurations and in the coupling of the electronic excitation to nuclear motion.

As can be seen, the vibronic fine structure accounted for through the E-ZTFC approach is vital in producing the correct asymmetry found in the experimental absorption spectrum, with a long, smooth tail at high energies. However, accounting for NQEs in the solute-solvent conformations through the AI-PIMD trajectory yields an additional broadening, producing spectra that recover most of the width of the experimental spectra.
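The difference between the two kinds of spectra compared in this section can be illustrated with a short sketch. In the ensemble approach each vertical excitation contributes a single broadened line, whereas in an E-ZTFC-like construction each vertical excitation is dressed with the same vibronic shape function. The sketch below is a simplification under stated assumptions: the origin of the shape function is placed directly at each vertical excitation energy, which may differ from the alignment defined in Ref. 1, and the function names, the Gaussian line shape, and the broadening parameter are illustrative choices.

```python
import math


def _gaussian(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def ensemble_spectrum(vertical, grid, sigma):
    """Average of broadened vertical excitations.

    `vertical` is a list of (energy, oscillator_strength) pairs.
    """
    n = len(vertical)
    return [sum(f * _gaussian(e - ev, sigma) for ev, f in vertical) / n
            for e in grid]


def dressed_spectrum(vertical, shape_sticks, grid, sigma):
    """Dress every vertical excitation with one shape function.

    `shape_sticks` is a list of (energy_offset, weight) pairs; the offset
    origin is placed at each vertical energy (assumption of this sketch).
    """
    n = len(vertical)
    return [
        sum(f * w * _gaussian(e - ev - offset, sigma)
            for ev, f in vertical
            for offset, w in shape_sticks) / n
        for e in grid
    ]
```

With a shape function consisting of a single stick at zero offset the dressed spectrum reduces to the ensemble spectrum, and adding sticks at positive offsets moves intensity into the high energy tail, which is the qualitative effect discussed above.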

[Figure 18: panels a) PYP and b) GFP; Strength (arb. units) versus Energy (eV); AIMD, AI-PIMD, and Experiment curves.]

FIG. 18. Absorption spectra of the PYP and the GFP chromophore anion in water as computed with the E-ZTFC approach (solid lines) and the ensemble approach (dotted lines), using both the AIMD and the AI-PIMD trajectories. Both computed spectra are shifted and scaled to the height of the experimental peak (Refs. 6 and 7).

REFERENCES

1. T. J. Zuehlsdorff and C. M. Isborn, “Combining the ensemble and Franck-Condon approaches for calculating spectral shapes of molecules in solution,” The Journal of Chemical Physics 148, 024110 (2018).

2. J. Cerezo, G. Mazzeo, G. Longhi, S. Abbate, and F. Santoro, “Quantum-Classical Calculation of Vibronic Spectra along a Reaction Path: The Case of the ECD of Easily Interconvertible Conformers with Opposite Chiral Responses,” J. Phys. Chem. Lett. 7, 4891–4897 (2016).

3. J. Cerezo, D. Aranda, F. J. A. Ferrer, G. Prampolini, G. Mazzeo, G. Longhi, S. Abbate, and F. Santoro, “Toward a general mixed quantum/classical method for the calculation of the vibronic ECD of a flexible dye molecule with different stable conformers: Revisiting the case of 2,2,2-trifluoroanthrylethanol,” Chirality 30, 730–743 (2018).

4. F. J. Avila Ferrer, M. D. Davari, D. Morozov, G. Groenhof, and F. Santoro, “The lineshape of the electronic spectrum of the green fluorescent protein chromophore, part II: Solution phase,” ChemPhysChem 15, 3246–3257 (2014).

5. R. B. D’Agostino, A. Belanger, and R. B. D’Agostino Jr., “A Suggestion for Using Powerful and Informative Tests of Normality,” Am. Stat. 44, 316–321 (1990).

6. I. Nielsen, S. Boyé-Péronne, M. E. Ghazaly, M. Kristensen, S. Brøndsted Nielsen, and L. Andersen, “Absorption spectra of photoactive yellow protein chromophores in vacuum,” Biophysical Journal 89, 2597–2604 (2005).

7. S. B. Nielsen, A. Lapierre, J. U. Andersen, U. V. Pedersen, S. Tomita, and L. H. Andersen, “Absorption Spectrum of the Green Fluorescent Protein Chromophore Anion In Vacuo,” Phys. Rev. Lett. 87, 228102 (2001).
