A simple variational quantum Monte Carlo-effective mass approach for excitons and trions in quantum dots

A computational model is presented to calculate the ground state energy of neutral and charged excitons confined in semiconductor quantum dots. The model is based on the variational Quantum Monte Carlo method and effective mass Hamiltonians. Through an iterative Newton-Rhapson process, minimizing the local energy, and (optional) parallelization of random walkers, fast and accurate estimates of both confinement and Coulomb binding energies can be obtained in standard desktop computers. To illustrate the reach of the model, we provide Fortran programs and illustrative calculations for colloidal CdSe nanoplatelets with large lateral dimensions and dielectric confinement, where electronic correlations are strong. The results compare well with exact variational calculations and largely outperform configuration interaction calculations in computational efficiency.

a cuboidal semiconductor nanoplatelet. Hard wall quantum confinement coexists with dielectric confinement (self-energy and Coulomb polarization terms).
Solution method: Variational quantum Monte Carlo with effective mass Hamiltonians, integrated into a Newton-Rhapson solver. OMP parallelization library can be (optionally) linked.

Introduction
Coulomb interactions within excitonic species in semiconductor quantum dots (QDs) determine optoelectronic properties of interest, such as the emission wavelength and thermal stability (both related to the binding energy), or the radiative recombination rate (related to exciton Bohr radius). [1,2,3] Early studies in self-assembled QDs and colloidal nanocrystals showed that perturbative estimates of Coulomb interaction provide a good approximation of emission energies (about 95%), but they miss electronic correlations, which are important in determining binding energies and effective Bohr radius, especially for particles beyond neutral excitons. [4] Recently developed colloidal nanoplatelets (NPLs) [5,6] are even more demanding. These nanostructures are in an intermediate confinement regime between that of QDs and that of quantum wells [7], and present strong dielectric confinement which enhances Coulomb interactions. In these structures, large exciton binding energies (around 200 meV) are observed, with non-perturbative terms representing over half that value. [8] An appropriate description of excitonic interactions becomes essential to study the photo-physics.
In order to account for electron-hole and electron-electron correlation in QDs, configuration interaction (CI) methods are arguably the most widely employed models to date. [1,2,3,4,9,10,11,12] These methods are generally built upon a basis set of independent particle or Hartree-Fock states, which has been found to provide an excellent description of repulsions in few-and many-fermion systems. [1,10,12] The description of strong attractions is however more demanding, because the afore mentioned basis functions are less suited to account for short-range interactions. [13] For the same reason, in charged excitons (trions) and multiexcitons they may not describe attractions and repulsions with comparable accuracy. As a result, CI calculations aiming at precise binding energies often require large basis sets, [4,9] which make the method computationally demanding.
Quantum Monte Carlo (QMC) methods are an alternative to CI techniques to calculate correlations with arbitrary accuracy. [14,15] Unlike CI models, QMC ones provide a description of the lowest state of each symmetry only, but they have the advantage of showing no pathological behavior for short-range interactions. Studies of excitonic complexes in QDs relying on QMC calculations have been reported. [4,16,17,18,19,20] Unfortunately, these methods are computationally expensive, which has prevented widespread use to date. Continuum QMC methods, which encompass variational and diffusion QMC, [15] have been implemented in high-performance software packages such as CASINO [21] or QMCpack [22]. However, these programs were originally developed for molecular and periodic crystal systems, and then take advantage of Jacobi coordinates in the Hamiltonian and wave function. This is well suited for atomistic descriptions of QDs. [23] The description of QDs with simple and intuitive effective mass Hamiltonians, which have proved extremely useful to understand many electronic properties of QDs, [1,2,3] is however handicapped by the usual presence of potentials forbidding the use of relative coordinates. A few examples are non-parabolic confinement, strain and non-local interaction with self-image charges. [2,24] In this work we present a QMC model for the study of excitons and trions in QDs within the effective mass formalism. The goal is to provide a flexible theoretical framework, capable of describing structures all the way from strongly confined to weakly confined regimes, with computational requirements affordable by ordinary desktop computers. To this end, we use the simple yet elegant variational Quantum Monte Carlo (VQMC) method, which offers a ground state energy solely limited by the choice of an appropiate trial wave function. [15] The latter is defined within envelope function approximation in effective mass theory. [25] The model is integrated into a Newton-Rhapson solver to optimize the variational parameters in few steps. A suit of Fortran codes developed for the specific case of colloidal CdSe NPLs, which constitute a particularly demanding system owing to the strong electronic correlations, is provided. Illustrative calculations show that the method largely outperforms CI calculations in terms of accuracy and computational efficiency, for both neutral excitons and trions.
where e and h stand for electron and hole, m i is the effective mass of carrier i,p i the momentum operator, V i the single-particle potential, V c (r e , r h ) the Coulomb interaction between electron and hole, and E gap the bulk energy gap. The trial wave function we use for this system is: where Φ e and Φ h are the envelope functions of non-interacting electron and hole, which vary smoothly within the QD dimensions, L, and σ e(h) is the associated spin function. Φ e and Φ h should preferrably be analytical functions, to allow rapid and exact evaluation of energies, gradients and Hessians, which we shall use next. This is often possible in QDs under the envelope function formalism. [1,3,25] J(r eh ) is a correlating Jastrow factor, which depends explicitly on the separation between electron and hole, r eh . We propose to use a short range Slater function: where a = α/r X B is a variational coefficient, r X B the exciton Bohr radius and α the actual variational parameter. The choice of this factor is driven by the fact that it captures the correct limits of weak and strong Coulomb interaction with a single variational parameter. When interactions are secondary to confinement (L ≪ r X B ), J(r eh ) → 1 and the independent particle scheme is retrieved. When interactions are dominant (L ≫ r X B ), Ψ X → J(r eh ), which ensures a hydrogen atom-like function is retrieved, where carriers are bound to each other through Coulomb attraction. In the latter case, J(r eh ) permits fulfilling the Kato cusp condition, [26] whereby kinetic energy compensates Coulomb potential at r eh → 0, preventing energy divergence. It has been shown in the 2D harmonium problem that a wave function like that in Eq. (2) gives energies close to the exact solution for a broad range of confinement strengths. [27]

Trion
We consider a positively charged trion (one electron, two holes), albeit the procedure is analogous for the negatively charged one. The Hamiltonian is:Ĥ We use a Slater-Jastrow trial wave function for the singlet ground state: and choose the following Jastrow factor: J(r 1 , r 2 , r 12 ) = e −Zr 1 e −Zr 2 e b r 12 /(1+a r 12 ) .
where r 1(2) = |r e − r h1(2) | and r 12 = |r h1 − r h2 |. The first two terms are short range cusp forms describing the correlation of each hole with the electron, as in the exciton case. Z = ζ/r X B is a variational coefficient, with ζ the parameter to be varied. The last term is a Padé Jastrow factor, which has the property of giving the desired limits with r 12 . At short ranges of interaction, r 12 → 0, the term becomes e b r 12 , which provides a cusp to compensate for the divergence in hole-hole Coulomb repulsion (b > 0). At the same time, the probability to find distant holes (r 12 → ∞) is more likely than that of proximal holes (r 12 → 0) by a factor (e b/a ) 2 . We define b = β/r X B and a = α/r X B , and let β and α be the variational parameters. More sophisticated trial functions have been suggested for trions in QDs or wells (see e.g. Ref. [28]), but the present proposal has the advantage of keeping the smallest number of parameters that captures the correct limit behavior, while being physically consistent with that of neutral excitons, Eq. (2).

Variational Monte Carlo
Within the VQMC framework, [14,15] the variational energy corresponding to a HamiltonianĤ (in our caseĤ X orĤ X+ ) is: where Ψ(R) is the trial function, E L is the local energy: 5 and p(R) the probability distribution: The Metropolis algorithm is used for importance sampling. Thus, particles are moved to different trial positions in a random walk. The new positions are accepted if the following condition is satisfied: where w is a random number homogeneously distributed between 0 and 1. For a sufficiently long calculation, this random walk allows to simplify Eq. (7) as: where N is the number of (accepted) points R i taken in the walk. Alternatively, an equivalent form can be used in the estimator, which considers not only accepted steps, but also rejected ones: with N here being the total number of attempted moves (whether accepted or not). Eqs. (11) and (12) have the same average, but the latter reduces the fluctuations caused by the acceptance of unlikely configurations and tends to have smaller variance. [14] As a result, for all the cases under study in this work, Eq. (12) is found to be at least as accurate as Eq. (11) in obtaining the correct energy E with the same number of points N, and often more so. We shall then opt for it. When evaluating the local energy, E L , it is convenient to rewrite the kinetic energy term of a particle i, as 6 where: and: Both forms of the kinetic energy term give the same average values, but latter has the advantage that the derivation of logarithms provides simpler mathematical expressions when the wave function is composed by products of functions, as in Ψ X and Ψ X + . It can be further shown that (14) gives significantly smaller variance. [14]

Random walk in confined structures
A brief discussion on the random walk nature is in order. In the Metropolis algorithm, trial steps in the walk are taken as a change in position R new = R old + r, where r is a random vector uniformly distributed in a cube of predefined side r 0 centered at R old . Random walks are usually applied to particles one by one. [14] Hamiltonians (1) and (4) are written in absolute coordinates, because the single-particle potentials V i generally prevent the use of Jacobi coordinates. On the other hand, excitonic complexes in QDs feel an attractive Coulomb interaction V c (r e , r h ), which shows a sharp peak as electron and hole approach coalescence (r eh → 0). This short-range potential can have a profound impact on the wave function, especially if confinement is weak. In order to obtain reliable importance sampling, we observe that local energy and other magnitudes of interest can be evaluated with absolute, Cartesian coordinates, but the random walk must be applied on the corresponding relative coordinates. Once the new position is set, we revert to absolute coordinates again. For the neutral exciton, relative coordinates can be center-of-mass and relative electron-hole motion. For the positive trion, optimal sampling is obtained using absolute coordinates for the electron, and relative coordinates of the two holes with respect to the electron. By doing so, the relative motion provides the Metropolis algorithm sensibility to the Coulomb singularity near r eh = 0, while center-of-mass (or absolute) motion provides sensibility to the confinement of the QD walls. Because of the different lengthscale of the two potential terms, it is also recommended that r rel 0 < r CM 0 , i.e. the cube where r rel is inscribed has smaller size than that of r CM , Optimizing the maximum step size (r 0 ) in the random walk is also convenient. If it is too large, only a small fraction of trial points is accepted, and the sampling is inefficient. If it is too small, a large number of trial points is accepted, but it takes too long to sample over the whole interval of interest. Ideally, we want about half of the steps to be accepted. [14] Then, at the beginning of the random walk, a thermalization process should be included. An arbitrary r 0 is chosen, and a random walk is taken to estimate the ratio of accepted points (with no need to calculate local energies). Once the walk is over, we count assets and failures and redefine the step size to fit the desired factor 1/2 of successful trial steps.

Newton-Rhapson solver
Having defined the Hamiltonians, trial wave functions and local energy estimator, we can evaluate the average energy for a given set of variational parameters, M = (α) in the case of the exciton, or M = (ζ, β, α) in the case of the trion. In order to find the parameters that minimize E(M) , we resort to the iterative Newton-Rhapson method. We look for the zero of the energy gradient, |g(M) . Given a set of parameters M i , the next set of parameters is given by: where H is the energy Hessian matrix. After labelling Ψ ′ M i the logarithmic derivative of Ψ with respect to the variational parameter M i , the components of |g and H can be written as: [29] and It is worth noting that for i = j the last term, Ψ ′ M j ∂E L ∂M i , is not symmetric when approximated by a finite sample, whereas the true Hessian should be. 8 We make it symmetric by replacing the term as 2 Ψ ′ [30] Additional considerations are in order to reduce the variance in the stochastic calcualtion. For real wave functions, the expectation value of the first derivative of the local value of any Hermitian operator with respect to a real parameter is zero. [29] Thus, ∂E L ∂M i = 0. It is then convenient to rewrite the non-symmetric terms in covariance form, i.e., a b − a b instead of a b , for the fluctuations of a b − a b are smaller than those of a b . [30] Therefore, we use: One can also note that the second derivatives of logarithmic functions vanish in some cases, which simplifies the results of Eq. (19). From Eq. (2), Eq. (17) is recursively applied until convergence. Convergence can be set by a threshold either in the energy ( E(M i+1 ) − E(M i ) ) or in the distance between variational parameters (|M i+1 − M i |).

Computational considerations
The whole calculation method we have described can be implemented computationally following the flowchart of Figure 1. Input data is first provided with all relevant Hamiltonian parameters (effective masses, QD dimensions, band offsets and dielectric constants definint potential terms) along with an initial guess of variational parameters (M 0 ).
Random walkers are then generated to evaluate the magnitudes of interest. In order to avoid a possible bias induced by the random initial position of a walk, it is customary to evaluate not one but several walkers. The mean value of the computed magnitude (e.g. E ) is then taken. From a computational point of view, it is both simple and efficient to distribute different walkers among different CPU processes. Rather than resorting to MPI parallelization, which is best suited for high-performance computing centers with multiple nodes/servers/computers instances, one can use OpenMP parallelization, which is easily applied to current standard computers.
For every trial point R, the walkers compute local magnitudes: density probability, energy, gradient vector and Hessian matrix. If the point Because local magnitudes are calculated on-the-fly and only accumulated values are stored, memory requirements in the VQMC calculations we propose are minimal. Together with the conceptual simplicity and the ease of parallelization, this is an additional asset for the model to be efficient in ordinary computers.

Case study: semiconductor nanoplatelets
Semiconductor NPLs are a particularly challenging system to calculate excitonic interactions. These nanostructures typically show rectangular cuboid shape, with only a few atomic monolayers thickness (1 − 2 nm) in one direction, and tens of nm in the other two. [5,6] They constitute intermediate structures betweeen zero-dimensional QDs and two-dimensional quantum wells. [7] Besides, Coulomb interactions are strongly enhanced by dielectric mismatch with the surrounding organic ligands. [33] As a result, correlations play a significant role and quantitative estimates of exciton properties are beyond perturbational and standard CI calculations. [27] For neutral excitons in CdSe NPLs, an accurate description of the ground state has been obtained through a full variational calculation of Ψ X , with a number of approximations to reduce the six-dimensional Coulomb integrals to two-dimensional ones. [8] For the trion, however, the larger number of variational parameters and particle coordinates renders this approach computationally impractical. In this section, we show how the VQMC model presented above enables such a study with modest computational requirements.

Neutral exciton
Following Ref. [8], excitons in CdSe NPLs can be described by giving H X the specific form: where m ,i is the in-plane mass of carrier i and m z,i that along the strongly confined direction ([001]). The single particle potential is written as is the confining potential set by the band offset between the NPL core and its surrounding material. We take V pot i = 0 inside the NPL and V pot i = ∞ outside. V self i is the self-energy potential resulting from the interaction of carriers with their image charges in the dielectric medium, which we model as in quantum wells.: [24] V self i = n=±1,±2,...
with q e being the electron charge, q n = ((ǫ in − ǫ out )/(ǫ in + ǫ out )) |n| , ǫ in (ǫ out ) the dielectric constants inside (outside) the NPL and L z the thickness of the NPL. For the electron-hole Coulomb interaction, we also take into account the influence of polarization due to the dielectric mismatch: [24] V c =  The trial wave function is of the form: where r ,eh = (r ,e − r ,h ) 2 is the in-plane distance between electron and hole. Φ e and Φ h are particle-in-box electron and hole single-particle states: Here k j = π/L j , with L j being the dimensions of the NPL along the direction j = x, y, z. Notice that, because L z ≪ L x ∼ L y , the Jastrow factor in Eq. (24) contains in-plane Coulomb terms only. Also, normalization constants are omitted in Ψ X because they cancel out in the evaluation of local magnitudes, see Eq. (8).
The program we provide in this work, vqmc-ema, uses the VQMC-Newton Rhapson model to obtain the exciton ground state of Eq. (21), with the trial function in Eq. (24). The code needs no external libraries other than OpenMP, and only in case we want the calculation of random walkers to be paralellized. The program determines the optimal variational parameter α and its associated exciton energy, for any input NPL dimensions and material parameters (effective masses, dielectric constants). [34] A number of calculations in CdSe NPLs are run to validate the model. The same effective masses as in Ref. [8] are taken, relative dielectric constants are set to ǫ in = 6 and ǫ out = 2, and E gap = 1.76 eV. [35] We first calculate the energy as a function of the variational parameter α in a NPL with dimensions (L x , L y , L z )=(30, 10, 1.4) nm. The result is shown in Figure 2(a), for different values of (m, nw), where m is the number of (accepted) points per walker and nw the number of walkers used to evaluate Eq. (12), with m · nw = N. The black solid line shows the result obtained with the full variational integration of Ref. [8], for comparison. A clear minimum of E develops around α = 0.72 in all cases. Since we define the exciton Bohr radius in the 2D limit, r X B = ǫ in /2µ (atomic units, µ is the exciton in-plane reduced mass), this means the effective Bohr radius (1/a) in a NPL is intermediate between that of a 2D exciton (α = 1) and a that of 3D one (α = 0.5). The agreement with the exact variational calculation is rough for (m, nw) = (10 6 , 10). However, it becomes excellent when the total number of points is increased from N = 10 7 to N = 10 9 , through either larger m or nw.
In Fig. 2(b) we compare the VQMC result with the exact variational one for different NPL sizes. Since L x = 30 nm, the platelet changes from rectangular to square with increasign L y . Also, by weakening confinement, the electronic correlations become more important. One can see in the figure that the stochastic calculation matches the exact result in all cases with sub-meV error. This is under 0.15% relative error with respect to the total exciton (confinement plus Coulomb) energy.
Because our model is intended to describe excitonic interactions even in the limit of weak confinement, we test its accuracy in fulfilling Kato cusp conditions. [26] To this end, because the Jastrow factor considers in plane correlations only, we rewrite the Coulomb terms in (23) removing z coordinates. From Hamiltonian (1) and the trial function (2), assuming homogeneous relative dielectric constant ǫ in = ǫ out and negligible confinement, it is easy to check that the Coulomb divergence at r ,eh → 0 is compensated by kinetic energy if a = 2µ/ǫ in = 1/r X B . That is, a 2D hydrogen atom-like limit should be retrieved. Fig. 2(c) shows the VQMC model successfully converges towards this value for large NPLs, since the value of α minimizing the exciton energy tends to 1, and then a → 1/r X B . For smaller NPLs, however, kinetic energy terms coming from Φ e and Φ h make α deviate from the 2D limit. The latter result illustrates that quantum confinement prevents inferring the value of variational parameters from simple bulk cusp conditions, and in general we need to optimize all the parameters in the trial wave function numerically.
(26) By analogy with the exciton case, for the trial wave function we use Ψ X + as in Eq. (5) but with Jastrow factors accounting for in-plane coordinates only. Φ e , Φ h1 and Φ h2 are particle-in-box functions, as in Eq. (25). vqmcema addressed the trion case as well. It optimizes the variational parameters (ζ, β, α) of the ground state and provides the associated energy, for a given input of NPL dimensions, effective masses and dielectric constants.
As an illustrative calculation, Figure 3 compares the energy of neutral and charged excitons in a CdSe NPL with (L x , L y , L z ) = (30, 10, 1.4) nm, as a function of the dielectric mismatch. The outer medium dielectric constant is varied from ǫ out = 2, which is a typical value for organic ligands passivating NPLs, [33] to ǫ out = ǫ in , which suppresses dielectric confinement. Fig. 3(a) and (b) show the total energy of exciton and trion, respectively. The exciton behavior is well known. With decreasing dielectric contrast, the energy decreases due to the weakening of self-energy repulsion, which exceeds the weakening of electron-hole Coulomb attraction. [36] We observe this behavior both with VQMC calculations (dots) and with the full variational integration of Ref. [8] (black line). The same trend is found for the positive trion, which reveals that self-energy terms prevail over Coulomb enhancement in charged excitons too. In fact, the shift in energy is more pronounced than for the exciton, because the larger number of particles translates into a more relevant contribution of self-energy repulsion.
Colored lines in Fig. 3 show exciton and trion energies calculated using a CI method, as described in Ref. [37]. Two basis sets are used, one built from all possible combinations (Hartree products and, for the trion, Slater determinants) obtained from the lowest 12 electron and hole spin-orbitals, and another from the 24 lowest ones. While the qualitative trend is consistent with that of the QMC calculations, a deviation of few tens of meV is observed in both cases. It is clear that CI calculations are far from convergence, and VQMC offer more accurate description. Remarkably, this is in spite of VQMC calculations requiring CPU time of several minutes (see next), as compared to few days in the case of the CI method. [38] The binding energies of exciton and trion, which are known to be sensitive to correlation energy, [4] are plot explicitly in Figs. 3(c) and (d), respectively. For the exciton, we define E b (X) = E(X) − E e − E h , with E(X) the total exciton energy, E e that of an independent electron and E h that of an independent hole. That is, E b (X) is the stabilization energy as compared a to non-interacting electron and hole pair. For the trion, it is defined as E b (X + ) = E(X + ) − E(X) − E h , where E(X + ) stands for the total energy of the trion. Then, E b (X + ) is the stabilization energy compared to one exciton plus one hole. One can see that CI results underestimate the binding energies as compared to VQMC calculations, which is indicative of the latter capturing a larger amount of correlation energy. In the case of the trion, the improved description reveals a qualitative trend opposed to that predicted by CI calculations, whereby E b (X + ) is weakened for small ε out . This is indicative of dielectric mismatch enhancing Coulomb repulsions, V c (r h1 , r h2 ), over attractions, V c (r e , r h1(2) ). Figure 4 provides more details on the computational efficiency of the VQMC model. As mentioned before, the calculation of random walkers can be distributed among different threads using OpenMP directives. We bench-mark the results of the parallelization in the figure. Simulations were run on a workstation with Dual Intel Xeon E5-2660v4 core (2.2 GHz), which admits a number of parallel threads (hereafter nt) of up to 40. The code was compiled with Intel Fortran compiler and OpenMP library l ibomp5. For the neutral exciton, the time required per iteration of the Newton-Rhapson process shows a nearly perfect scaling with 1/nt, see Fig. 4(a). For energy convergence threshold of about 1 meV, a few iterations suffice. Then, the total execution time is of a few minutes only. The scaling versus number of walkers nw, for a fixed number of threads, shows a nearly linear behavior as well, see Fig. 4(b). Linear scalings hold for trions too, as shown in Figs. 4(c) and (d), albeit the execution times increase by a factor ∼ 1.5, owing to the larger number of coordinates, wave function terms and variational parameters compared to excitons. The linearity of the previous plots confirms that the distribution of walkers among threads is well balanced.

Conclusions
We have developed Fortran programs to calculate exciton and trion ground state properteis in colloidal NPLs. The codes are based on a VQMC-effective mass model which can be easily extended to semiconductor QDs with different shapes and potentials. The model is conceptually simple and computationally efficient (fast execution times, further accelerated by OpenMP parallelization, and minimal memory requirements), and hence susceptible of being used in standard desktop computers. By using Jastrow factors which capture short range interactions, it outperforms standard CI calculations in both accuracy and computational efficiency. It also outpeforms full variational integrations used in previous studies for weakly confined NPLs, in that it gives access not only to neutral exciton but also to trion species, with only a moderate increase in execution time.