Analysis of multispecies point patterns by using multivariate log‐Gaussian Cox processes

Multivariate log‐Gaussian Cox processes are flexible models for multivariate point patterns. However, they have so far been applied in bivariate cases only. We move beyond the bivariate case to model multispecies point patterns of tree locations. In particular we address the problems of identifying parsimonious models and of extracting biologically relevant information from the models fitted. The latent multivariate Gaussian field is decomposed into components given in terms of random fields common to all species and components which are species specific. This allows a decomposition of variance that can be used to quantify to what extent the spatial variation of a species is governed by common or species‐specific factors. Cross‐validation is used to select the number of common latent fields to obtain a suitable trade‐off between parsimony and fit of the data. The selected number of common latent fields provides an index of complexity of the multivariate covariance structure. Hierarchical clustering is used to identify groups of species with similar patterns of dependence on the common latent fields.


Introduction
In tropical rainforest ecology, hypotheses regarding biodiversity are studied using large data sets containing locations of thousands of trees for each of hundreds of species. Statistical methodology based on spatial point processes is now well established in such studies where the pattern of tree locations for each species is regarded as a realization of a spatial point process (e.g. Seidler and Plotkin (2006), Wiegand et al. (2007), Liu et al. (2007), Shen et al. (2009), Law et al. (2009 and Lin et al. (2011)). Often a first step is to fit regression models depending on habitat variables for the intensity function of each species. Second, variation that is not accounted for by the intensity function may be assessed by using for example the inhomoge-neous K-function or pair correlation function (Waagepetersen, 2007;Waagepetersen and Guan, 2009). A natural third step after such univariate (single-species) analyses is to study possible interactions between species.
One approach to studying interspecies interactions is to consider cross-K or cross-pair correlation functions between all pairs of species possibly conducting simulation-based tests for the hypothesis of no interaction for each pair of species. As remarked in Perry et al. (2006) one problem is that it may be difficult to grasp the information in the potentially high number of cross-summary-statistics and multiple testing becomes an issue. This essentially bivariate approach further does not provide insight into the multivariate dependence structure of several species. Finally, from the non-parametric estimates of K or pair correlation functions it is not possible to study biologically interesting questions regarding decomposition of variation according to sources that are common for several species (e.g. unobserved environmental covariates) and sources which are species specific (e.g. seed dispersal). To address such questions a suitable modelling framework is needed. The literature on multivariate spatial point process models is mainly restricted to the bivariate case; see for example Diggle and Milne (1983), Harkness and Isham (1983), Högmander and Särkkä (1999), Brix and Møller (2001), Allard et al. (2001), Diggle (2003), Picard et al. (2009), Liang et al. (2009) andFunwi-Gabga andMateu (2012). Two exceptions are Diggle et al. (2005) and Baddeley (2010), who modelled four-and six-variate point patterns by using multivariate Poisson processes. The Poisson process assumption does not, however, seem appropriate for the clustered patterns of rainforest trees. A third exception is Illian et al. (2009) but the hierarchical model developed in this paper is quite specific for a case where so-called reseeders occur conditionally on locations of resprouters.
In this paper we consider a statistical analysis of two multivariate point pattern data sets containing locations of respectively six and nine species of trees. The first data set is the classical Lansing Woods data whereas the other contains species from the tropical rainforest plot at Barro Colorado Island (BCI). The data sets and the objectives of the analyses are described in more detail in Sections 1.1 and 1.2. For the analyses we develop an inferential framework for in principle an arbitrary number of species without a known hierarchy. We model multivariate point patterns by using the well-known multivariate log-Gaussian Cox processes (LGCPs) (Møller et al., 1998;Brix and Møller, 2001;Liang et al., 2009). The latent Gaussian fields are obtained as linear combinations of common Gaussian fields as well as Gaussian fields that are specific to each species as in Brix and Møller (2001), who considered the bivariate case. From a methodological point of view the first specific objective in this paper is to move beyond bivariate point processes and to address the challenges that are linked to the potentially quickly increasing number of parameters that occur for highly multivariate LGCPs. A second objective is to explore how biologically relevant information can be extracted from a fitted multivariate LGCP.
The number of parameters in our model depends strongly on the number q of common latent fields, and we introduce a cross-validation procedure to determine the number q. This leads to parsimonious models when the selected q is considerably smaller than the number of species. The selected q provides an index of the complexity of the multivariate dependence structure and a test for q = 0 yields an overall test for the hypothesis of independence between all species. Using a decomposition of the fitted multivariate covariance structure we further quantify to what extent the spatial distribution of trees is controlled by common and species-specific factors. We finally identify clusters of species with similar patterns of dependence on the latent common factors.

Lansing Woods
The Lansing Woods data (Gerrard, 1969) contain locations of 2251 trees in a 19.6 acre square plot. The trees are grouped according to six species (abundances are given in parentheses): black  Gerrard (1969) was to study a new type of competition index measuring the degree of competition between trees in a given region. Such indices are for example used to predict the growth of the trees. The Lansing Woods data were used in Diggle (2003) to demonstrate a range of statistical methods for spatial point patterns. However, Diggle (2003) did not consider multivariate analyses of the Lansing Woods data. Baddeley (2010) considered a multivariate Poisson point process model for the Lansing Woods data and rejected the null hypothesis of no segregation of species (i.e. the hypothesis of proportional intensity functions). Our focus is on the multivariate dependence structure of the tree species with the aim of obtaining a more incisive and parsimonious characterization of the multivariate dependence than what is provided for example by consideration of the 15 distinct pair and cross-pair correlation functions for the six species. We also wish to study whether the multivariate dependence structure can be related to the three major groups of species: oaks, hickories and maples.

Barro Colorado Island
The BCI data (Hubbell and Foster, 1983;Condit et al., 1996;Condit, 1998) contain locations of hundreds of thousands of trees observed in a 1000 m × 500 m plot. For computational reasons we cannot handle a joint analysis of all the hundreds of species found in the BCI plot. We therefore restrict attention to nine species of intermediate abundance in the range 2500-7500. In addition to the point patterns of trees, a number of covariates are available regarding top- ography and soil properties. For each spatial location, the covariate vector is 11 dimensional and in addition to the constant 1 contains soil potassium content, acidity, elevation, elevation gradient, multiresolution index of valley bottom flatness, incoming mean solar radiation and a topographic wetness index as well as soil contents of copper, mineralized nitrogen and phosphorus (plots of tree locations and selected covariates are provided in section 1 in the on-line supplementary material). Finally information of two types of functional traits are available: life form and mode of seed dispersal (Muller-Landau and Hardesty, 2005;Wright et al., 2007). Table  1 lists the species and their family names, life forms, modes of seed dispersal and abundances.
Regarding the modes of seed dispersal, big birds are birds of biomass larger than 300 g. However, the distinction between the classes bird-mammal and big bird-mammal is not completely clear cut (Dr Joseph Wright, personal communication). As for the Lansing Woods data we wish to analyse in detail the multivariate dependence structure of the nine species. A further aim is to connect this analysis to the information regarding species families, life forms and modes of seed dispersal. For instance, we want to study whether species of the same family or life form tend to be positively correlated or share similar properties regarding the relative influence of common and species-specific factors on their spatial pattern.

Multivariate log-Gaussian Cox processes
We consider a multivariate Cox point process (Møller et al., 1998) X = .X 1 , : : : , X p /, p > 1, where each component X i is a Cox process driven by a random intensity function Λ i , i.e., conditionally on the Λ i , the X i are independent Poisson point processes each with intensity function Λ i . The random intensity functions are of the form Λ i .u/ = exp{Z i .u/} with For each i, μ i is a deterministic function typically depending on covariates and Y i and U i are zero-mean Gaussian random fields. The U i are assumed to be independent of each other and of the Y i whereas the Y i may be correlated across species. The idea is that the Y i represent effects of for example unobserved environmental variables whereas the U i serve to model clustering due to species-specific factors such as seed dispersal. The U i are assumed to be stationary and isotropic with variance σ 2 i and correlation function c i .·/ so that cov{U i .u/, U i .u + h/} = σ 2 i c i . h /, h ∈ R 2 .

Model for correlated latent fields
Regarding the Y i we assume the factor-type model Y.u/ = .Y 1 .u/, : : : , Y p .u// T = α E.u/  where α = .α ij / ij is a p × q coefficient matrix, and is a q-dimensional stationary and isotropic zero-mean Gaussian process with independent components E l . Without loss of generality we assume that var{E l .u/} = 1 and we denote by r l .·/ the correlation function of E l . Thus the multivariate covariance function for E is R.t/ = cov{E.u/, E.u + h/} = diag{r 1 .t/, : : : , r q .t/}, where diag.a 1 , : : : , a n / means a diagonal matrix with diagonal entries a 1 , : : : , a n . It follows that the multivariate covariance function of Y is C.t/ = α R.t/α T = Σ q l=1 α ·l α T ·l r l .t/ where α ·l is the lth column in α. Fig. 2 shows the structure of the model in the case p = 3 and q = 2.
Our model for Y is well known in the signal processing literature where the problem of estimating α and E from observations of Y is known as blind source separation (see for example Belouchrani et al. (1997)). In spatial statistics the model was first proposed in Gelfand et al. (2004) as a generalization of the so-called intrinsic or proportional correlation model (e.g. section 5.6.4 in Chilès and Delfiner (1999)) which is obtained when r l = r k for all l and k. Moreover, the model is a special case of the so-called linear model of coregionalization (e.g. section 5.6.5 in Chilès and Delfiner (1999) and Genton and Kleiber (2014)). In the case of the proportional correlation model, the multivariate covariance function C depends on α only through αα T . Hence in this case we can without loss of generality take α = OD 1=2 where ODO T is the spectral factorization of αα T . The latent processes E l are then known as empirical orthogonal functions (Wackernagel, 2003).

Intensity function and multivariate pair correlation function
where α i· is a row vector containing the ith row of α, and the matrix g.t/ of cross-pair-correlations of X at lag t has entries (Møller et al., 1998) Large values of Σ q l=1 α 2 il r l .t/ + σ 2 i c i .t/ lead to strong intraspecies correlation for X i at lag t. Regarding between-species interaction, Σ q l=1 α il α jl r l .t/ less than 0 or greater than 0 implies respectively repulsion or attraction between points of X i and X j at lag t. The cross-pair correlation function g ij and the intensity functions ρ i and ρ j determine the covariances of counts N i .A/ and N j .B/ of the points from X i and X j falling in subsets A, B ⊆ R 2 : Thus, in the case i = j, g ij = 1 implies that counts from X i and X j are uncorrelated.
We model μ i .·/ by a linear regression depending on the available covariate vectors of environmental variables. For the correlation functions r l and c i we introduce parametric models r.·; φ l / and c i .·/ = r.·; ψ i /. The geostatistical literature offers a wide range of correlation function models; see for example Chilès and Delfiner (1999).

Least squares estimation and cross-validation
In this section we first consider a least squares approach to estimate the model parameters for a fixed q. In the least squares criterion, the dependent variables are given by log-transformed non-parametric estimates of cross-pair and pair correlation functions. On the basis of the least squares criterion we next introduce a cross-validation procedure for selecting q. In the case of no species-specific latent variation σ 2 i = 0, i = 1, : : : , p, section 3 in the on-line supplementary material describes an alternative method of estimation for α based on spectral decomposition. The least squares estimation and cross-validation methods are assessed in a simulation study in Appendix B.

Non-parametric estimation
Fitted regression modelsρ i .·/ are obtained in a standard way by using composite likelihood; see for example Waagepetersen (2007) and Waagepetersen and Guan (2009). In a second step we obtain non-parametric estimates (Baddeley et al., 2000;Møller and Waagepetersen, 2003) g ij of the cross-pair-correlation functions: where W is the observation window, k b is a kernel function depending on a smoothing parameter (bandwidth) b > 0, | · | is area and W h denotes the translate of W by the vector h ∈ R 2 (Møller and Waagepetersen, 2003).

Identifiability
Each cross-pair-correlation function g ij is invariant to (a) simultaneous permutation of the columns in α and the diagonal entries in R.t/ and (b) multiplication with −1 of a column in α.
Hence, if one local minimum is found for Q given by equation (3), there will be q! × 2 q − 1 other local minima with the same value of Q. Rather than imposing constraints to resolve this identifiability issue our strategy is to restrict attention to estimates of functions of α which are invariant to the transformations of α mentioned. However, we need a notion of 'local identifiability' stating that the aforementioned local minima actually exist. In particular one question is how large a q can be used.
To address this question we consider the Hessian matrix (6) of Q with respect to α which is given in Appendix A. Under an appropriate asymptotic setting with the non-parametric estimatesĝ ij tending to their true values, the Hessian matrix converges to 2.dβ T =dα/R T R.dβ=dα/ where R is a block diagonal matrix with p 2 diagonal blocks diag.
√ w ijk , k = 1, : : : , L/ R.φ/ i, j = 1, : : : , p. Hence, R has full rank if and only if R.φ/ has full rank. This will in general be so for any q L if all φ l , l = 1, : : : , q, are distinct. Appendix A also provides the entries in the pq × p 2 q matrix dβ=dα. For this matrix to be of full rank it is sufficient (although not necessary) that all α ij = 0. Hence, if the true φ l are all distinct and the true α ij are all non-zero, the object function will at least asymptotically have a local minimum with respect to α at the true parameter value for any q L. These theoretical considerations thus do not rule out consideration of large q. However, in practice the optimization of the object function becomes increasingly cumbersome for increasing q and we have restricted attention to q p.
We minimize the object function Q by using a combination of a quasi-Newton algorithm and a spectral projected gradient method. Specifically we use the R procedure optimx with 'method' equal to BFGS or spg and supply the analytical expressions for the gradient and Hessian (the latter for evaluating criteria for local minima); see Appendix A in this paper and section 2 in the on-line supplementary material. The choice of the starting point for the minimization is crucial. In particular, α = 0 is a stationary point for Q with respect to α since the derivative of Q with respect to α is always 0 when α is the zero vector 0. We used as starting point a crude estimate of α obtained by using a spectral method (section 3 in the on-line supplementary material) or picked a random starting point centred at 0 if the spectral method failed.

Estimation of number of latent factors
To determine q we apply a variant of K-fold cross-validation (e.g. Hastie et al. (2013)), i.e. we split the indices ijk, i = j, into K sets S 1 , : : : , S K . For each q and c = 1, : : : , K we then obtain an estimateθ c by minimizing equation (3) with w ijk replaced by 0 for ijk ∈ S c . A cross-validation score is then obtained by . 4/ We do not include diagonal indices iik in the sets S c . This is because the within-species log-paircorrelation functions log.g ii / have both species-specific components and components due to the common random factors. They therefore do not provide much information about q. Including such indices in the S c further makes the estimation of the species-specific parameters less stable. For a given ij, y ijk and y ijk are strongly correlated when k and k are close. Hence to obtain a sufficient sensitivity of the cross-validation score we need to leave out blocks of consecutive indices.
To obtain the subsets S c we arrange the ijk with i < j lexicographically in a vector .121, 122, : : :/ and split this vector into consecutive blocks of length b. These blocks are then assigned to the different S c at random. Moreover, if ijk, i < j, is assigned to S c so is jik, i.e. the S c are symmetric in the sense that ijk ∈ S c implies jik ∈ S c . In a simulation study a value of b equal to 50% of the number of lags L worked well. This choice of b is also used in the applications in Section 6. Often K between 5 and 10 are used (Hastie et al., 2013). We chose K = 8 to use efficient parallel computing on a server with eight central processor units. Following the discussion in Section 3.3 we consider in practice q in the range 0, : : : , p.

Inferences regarding multivariate dependence structure
The first pertinent question is whether species are at all correlated. To assess this we use the least squares criterion Q with q = 0 as a test statistic and compare the observed Q with its distribution obtained by using a parametric bootstrap (Davison and Hinkley, 1997) under the model fitted with q = 0.
The cross-pair-correlation functions or equivalently Σ q l=1 α il α jl r.t; φ l / = cov{Y i .u/, Y j .u + h/}, h = t, determine the sizes of cross-covariances of count variables associated with the point processes X i and X j , i = j; see equation (1). Our approach provides parametric estimates of the cross-pair-correlation functions but this is not a key contribution since essentially the same information is obtained from the non-parametric estimatesĝ ij . Our parametric model, however, allows us to decompose the covariances of the latent Gaussian fields into contributions from the common fields and the species-specific fields. For a given spatial lag t and species i we can consider the proportion of covariance due to the common fields of the log-random-intensity function Z i , The proportions of variances PV i .t/, i = 1, : : : , p, can thus be used to group species according to how much of the variation in the log-random-intensity function is due to common factors as opposed to species-specific factors. By analogy with Jalilian et al. (2013) the proportions of variances are further related to a certain R 2 -type statistic measuring how big a proportion of the variance in Λ i is due to the common latent fields.
Considering the between-species correlation structure for a given spatial lag we have, for i = j, Thus the correlation between two different log-random-intensity functions is factored into the correlation due to the common factors and the square roots of the proportions of variances. A small PV i .0/ thus immediately implies that the latent field Z i has a small correlation with any other species. To study the implications at the scale of counts of X i and X j (Section 2.2) note thatN i .A/ = E[N i .A/|Λ i ] = A Λ i .u/ du can be viewed as the spatially structured part of the count N i .A/ (Jalilian et al., 2013). For small A and B containing locations u and v,N i .A/ ≈ |A| Λ i .u/ andN j .B/ ≈ |B| Λ j .v/. Thus the correlation corr{N i .A/,N j .B/} can be approximated by corr{Λ i .u/, Λ j .v/}. Employing further exp.x/ ≈ 1 + x, we obtain corr{N i .A/,N j .B/} ≈ corr{Z i .u/, Z j .v/}. Suppose that we want to group species according to their pattern of dependence on the latent factors E l . A simple distance measure between species i and j would be α i· − α j· which is invariant to the kind of transformations of α that were mentioned in Section 3.3. The covariance . α i· 2 α j· 2 / seem less useful in this context. For two species which are similar in both having small values of α i· 2 and α j· 2 for example, the covariance will nevertheless be small. In contrast, if two species have the same relative patterns of dependence on the E l in the sense that α i· = kα j· , k = 0, then |corr{Y i .u/, Y j .u/}| = 1 regardless of k.
In the following applications we shall focus on estimation of PV i .0/ and the zero lag cross correlations corr{Y i .u/, Y j .u/} among the common fields. We also look at the mean crosscorrelation over a range [0, T ] of lags, i.e. T 0 corr[Y i .u/, Y j {u + .0, t/ T }] dt=T . We further perform clustering of species by using the distances α i· − α j· . To obtain confidence intervals for correlations and proportions of variances we use a parametric bootstrap based on simulations from the fitted model. In the bootstrap we consider q as known and given by the q selected by cross-validation. This will lead to some underestimation of variances of parameter estimates but doing a full bootstrap including selection of q can be very time consuming when the number of species is large.

Multispecies dependence structures in temperate and tropical forests
In the following sections we return to the applications that were presented in Sections 1.1 and 1.2.

Joint analysis of the Lansing Woods data
Covariates are not available for the Lansing Woods data so for the intensity functions we just fit an intercept for each species. For the correlation of the latent fields we use the exponential correlation model r.·; ψ/ = exp.− · =ψ/ where ψ > 0 is the correlation scale parameter. We fit seven stationary multivariate LGCPs with numbers of latent processes q ranging from 0 to 6. Initially we test the hypothesis of independent species (q = 0) by using a parametric bootstrap, i.e. we simulate 400 data sets under the model fitted with q = 0 and fit the model with q = 0 to all the simulated data sets. Only 0:25% of the simulated Q lie above the observed value 65.3 of Q. Hence we reject the hypothesis of independent species. For each q Fig. 3(a) shows the minimized object function (3) whereas Fig. 3(b) shows the cross-validation score (4). The smallest crossvalidation score is obtained with q = 4. The object function drops markedly from q = 0 to q = 3 and then starts to level off. Hence q = 4 seems to be a good choice for the number of latent processes.
Continuing with the model fitted with q = 4, Fig. 4(a) shows the estimated cross-correlations at lag 0 for pairs Y i , Y j and Z i , Z j as well as 95% parametric bootstrap confidence intervals obtained from 400 simulations of the model fitted. The indices i, j = 1, : : : , 6 correspond to black oak, hickory, maple, miscellaneous, red oak and white oak. Because of the species-specific random fields U i , the cross-correlations are smaller for the Z i than for the Y i . The estimated crosscorrelations for species pairs (black oak, maple), (black oak, miscellaneous) and (hickory, maple) are quite small. However, because of large sample variation all bootstrap confidence intervals contain zero. Fig. 4(b) shows estimated mean cross-correlations over the range [0, 0:25]. Overall the patterns of cross-correlations are similar in the two plots in Fig. 4 but some of the confidence intervals in Fig. 4(b) are narrower than in Fig. 4(a). In particular, zero is not contained in the bootstrap intervals for the mean cross-correlations for the species pairs (black oak, maple), (black oak, miscellaneous) and (hickory, maple). Fig. 5(a) shows estimated proportions of variances PV i .0/ at lag 0 with 95% parametric bootstrap intervals. According to the estimates, the main proportion of the variance of Z i is due to the common latent factors for black oak, hickory, maple and miscellaneous whereas the common latent factors and the species-specific factors have roughly equal contributions for red and white oak. Thus black oak seems to be distinct from red and white oak regarding the relative influence of common factors on the spatial pattern. However, as for the cross-correlations, the rather wide confidence intervals show that the estimates are quite uncertain. Fig. 5(b) shows a hierarchical clustering of the species based on the fitted coefficient rows α i· . In agreement with the fitted correlations, black oak and maple belong to separate clusters. The same holds for the species pairs (black oak, miscellaneous) and (hickory, maple). The clustering does not support a grouping into the coarser categories oak, hickory, maple.
The model with q = 4 has 28 parameters (in α and φ) used to fit 15 unique cross-paircorrelation functions g ij , i < j. Thus on average 1.9 parameters are used for each g ij , i < j. As an assessment of model fit, Figs 2 and 3 in the on-line supplementary material show non-parametric estimates of the L-and cross-L-functions (e.g. chapter 4 in Møller and Waagepetersen (2003)) together with 95% pointwise envelopes obtained from simulations of the model fitted. None of these plots disclose any severe deficiencies of the fitted model.

Analysis without miscellaneous category
The miscellaneous category corresponds, not to a single species, but to a residual group of  D 1,..., 6 corresponds to black oak, hickory, maple, miscellaneous, red oak and white oak); (b) hierarchical clustering based on fitted α i s trees belonging to a mixture of less abundant species. For this reason, as pointed out by a referee, omitting this group in the analysis could potentially lead to a simpler model and hence smaller uncertainty in the inference. We therefore repeated the analysis without the miscellaneous category. In this case the cross-validation identified q = 1 as a suitable number of latent processes ( Fig. 6(a)). This yields a much simpler model than with the previously selected q = 4. In particular on average only 0.6 parameters are used for each of the 10 unique cross-pair-correlation functions g ij , i < j. The fitted parameters are .α 11 , α 21 , α 31 , α 51 , α 61 / = .−0:68, −0:49, 0:87, 0:16, 0:08/. Thus both black oak and hickory have a negative dependence on the latent field, whereas maple has a positive dependence on the latent field. Red and white oak both have a relatively weak dependence on the latent field. In the case q = 1, the correlations corr{Y i .u/, Y j .u/} are either precisely 1 or −1 depending on whether the corresponding parameters α i1 and α j1 are of the same or different sign. The fitted correlations corr{Z i .u/, Z j .u/} are quite similar to the fitted correlations for the same pairs of species obtained in the previous analysis. However, the bootstrap confidence intervals are much narrower (the plots have been omitted). Regarding proportions of variance ( Fig. 6(b)) all the fitted proportions of variance are smaller than those obtained with q = 4. This is consistent with the much more sparse representation of the correlated latent fields Y i , which implies that more variation is explained by the species-specific fields. In particular the proportions of variances for red oak and white oak are close to 0 with much narrower confidence intervals than in the previous analysis. The grouping from the hierarchical clustering is a little more consistent with the coarser groups oaks, hickory and maple than before, placing red and white oak in one cluster and with maple forming a single-species cluster. However, there is still a heterogeneous cluster consisting of black oak and hickory.

Multivariate dependence and functional traits for species in the Barro Colorado Island plot
For the BCI data, following Section 3.1, we fit regression models for the μ i -terms by using composite likelihood for each species separately. In the subsequent non-parametric estimation of the cross-pair-correlation functions using equation (2) the variations due to the observed covariates are filtered out. The non-parametric estimates thus capture residual correlation due for example to unobserved covariates, seed dispersal and other sources of correlation. As for the Lansing data we use the exponential correlation model for the latent random fields. In what follows we consider similar analyses to those for the Lansing data. For ease of presentation we refer to the species by the first part of their genus (see Table 1), adding a 't.' for Protium tenuifolium and a 'p.' for Protium panamense.

Statistical analyses
The hypothesis of independent species (q = 0) is rejected with a parametric bootstrap p-value of 0:5%. For each q = 0, : : : , 9 Fig. 7(a) shows the minimized object function Q whereas Fig. 7(b) shows the cross-validation scores. The smallest cross-validation score is obtained with q = 4. This choice of q is also supported by Fig. 7(a) where the decrease in the object function is relatively modest for q > 4. Proceeding with the model fitted with q = 4, Fig. 8 shows estimated cross-correlations as well as 95% parametric bootstrap confidence intervals obtained from 400 simulations of the model fitted. Most of the cross-correlations (whether for Y or Z) appear to be significantly larger than 0. There is some evidence of negative correlation between Psychotria and Protium p. whereas there is no evidence of positive or negative correlation for the pairs (Psychotria, Protium t.), (Psychotria, Swartzia), (Psychotria, Hirtella), (Psychotria, Tetragastris) and (Psychotria, Garcinia). Integrated cross-correlations show a similar pattern (the plot has been omitted) but with wider confidence intervals for some species. However, the proportion of variance due to the common factors for Garcinia is significantly smaller than the benchmark value of 0.5. Fig. 9(b) shows a hierarchical clustering of the species on the basis of the fitted coefficient rows α i· . In agreement with the fitted correlations, Psychotria forms its own cluster.
For the model with q = 4 on average 1.1 parameters are used for each unique g ij , i < j. Figs 4-7 in the on-line supplementary material provide model assessment using L-functions as for the Lansing data. Out of 45 unique L-or cross-L-functions there appears to be issues with only the two intraspecies L-functions for Swartzia and Garcinia.

Relationship to species families, life form and mode of seed dispersal
Protium t., Protium p. and Tetragastris all belong to the family Burseraceae whereas the other species belong to distinct families. It is interesting to see that the family-related species Protium t., Protium p. and Tetragastris have fairly similar fitted proportions of variance and that the hierarchical clustering creates a cluster consisting of precisely these three species.
Regarding life form there is not a clear pattern in relation to the previous results as each of the categories trees (Protium t., Protium p., Tetragastris, Garcinia) and shrubs (Psychotria, Capparis, Mouriri) both display high and low proportions of variances and do not correspond to groups in the hierarchical clustering.
The spatial pattern of a species is influenced by the mode of seed dispersal (e.g. Muller-Landau and Hardesty (2005) and Seidler and Plotkin (2006)). The mode of seed dispersal for Psychotria is by birds whereas it is bird-mammal or big bird-mammal for the remaining species. This could explain why Psychotria seems to be distinct from the other species in terms of both

Discussion
A basic problem with multivariate LGCPs is to model the cross-covariance structure of the latent multivariate Gaussian field. Genton and Kleiber (2014) is a nice review of approaches to cross-covariance modelling. In practice we need a flexible, interpretable and parsimonious model. The linear model of coregionalization has some deficiencies in terms of flexibility. It for example enforces symmetric cross-covariances cov{Y i .u/, Y j .v/} = cov{Y i .v/, Y j .u/} but this seems a minor problem in the considered practical context of modelling point patterns of tree species. The model components have a reasonable interpretation as explained at the beginning of Section 2. Parsimony is sought by selection of a hopefully small q by cross-validation. In both practical examples this results in fairly parsimonious models as measured by the number of parameters per unique cross-pair-correlation function. Another way to obtain parsimony would be to consider a lasso approach (Tibshirani, 1996). In this case one would fix q a priori and then use cross-validation to select a suitable L 1 -regularization which would typically result in some entries α il being set to 0. The problem of identifying a suitable q, however, remains.
Regarding the biological implications of our work, the number of latent processes q that are selected by cross-validation gives an index of the complexity of the multivariate dependence structure. The plots of cross-correlations in Section 6 provide compact presentations of the correlation structure of the species whereas fitted proportions of variances quantify to what extent the spatially structured random variation of the species is governed by common or speciesspecific factors. As shown for the BCI data, there is further scope for linking proportions of variances and results of hierarchical clustering to families of species and functional traits such as life forms, reproductive strategies and growth or mortality patterns.
The hierarchical clustering results did not come with a measure of uncertainty. Inspired by Kerr and Churchill (2001) one may study the stability of the clustering by applying the hierar-chical clustering to parametric bootstrap simulations from the fitted models. This is considered in section 5 of the on-line supplementary material. It appears that the clustering results are very stable for the abundant BCI species but less so for the Lansing data where the species are less abundant. Bioimaging, funded by a grant from the Villum Foundation. Yongtao Guan's research was supported by National Science Foundation grant DMS-0845368, by National Institutes of Health grant 1R01CA169043 and by the VELUX Visiting Professor programme. Jorge Mateu's research was supported by grants P1-1B2012-52 and MTM2013-43917-P.
The BCI forest dynamics research project was made possible by National Science Foundation grants to Stephen P. Hubbell: DEB-0640386, DEB-0425651, DEB-0346488, DEB-0129874, DEB-00753102, DEB-9909347, DEB-9615226, DEB-9405933, DEB-9221033, DEB-9100058, DEB-8906869, DEB-8605042, DEB-8206992 and DEB-7922197, support from the Center for Tropical Forest Science, the Smithsonian Tropical Research Institute, the John D. and Catherine T. MacArthur Foundation, the Mellon Foundation, the Celera Foundation and numerous private individuals, and through the hard work of over 100 people from 10 countries over the past two decades. The plot project is part of the Center for Tropical Forest Science, a global network of large-scale demographic tree plots.
The BCI soils data set was collected and analysed by J. Dalling, R. John, K. Harms, R. Stallard and J. Yavitt with support from National Science Foudation grants DEB021104, DEB021115, DEB0212284, DEB0212818 and Office of International Science and Engineering grant 0314581, Smithsonian Tropical Research Institute and Center for Tropical Forest Science. Paolo Segre and Juan Di Trani provided assistance in the field. The covariates dem, grad, mrvbf, solar and twi were computed in SAGA GIS by Tomislav Hengl (http://spatial-analyst. net/). We thank Dr Joseph Wright for sharing data on dispersal modes and life forms for the BCI tree species.

Appendix A: Gradient and Hessian matrix for least squares object function
Letỹ denote the p 2 L-vector consisting of theỹ ijk = y ijk √ w ijk concatenated in lexicographic order (ỹ 111 , y 112 , : : :) and let R be the p 2 L × p 2 q block matrix with diagonal L × q blocks diag.
By the multivariate chain rule, the derivative of Q with respect to α is and the Hessian matrix is @β T @α k 1 l 1 @α k 2 l 2 dQ dβ k 1 l 1 , k 2 l 2 : . 6/ The pq × p 2 q matrix dβ T =dα has entries otherwise and the vector @β T =.@α i 1 k 1 @α i 2 k 2 / has entries @β jlk @α i 1 k 1 @α i 2 k 2 = The remaining derivatives of Q are given in section 2 of the on-line supplementary material.

Appendix B: Simulation study
To asses the least squares method for parameter estimation and the cross-validation method for choosing q we conducted a simulation study on the unit square with p = 5 and q either 0 or 2. Regarding parameter estimation, we focused on the estimation of the proportions of variance PV i .0/ and the off-diagonal correlations corr{Y i .u/, Y j .u/} at lag 0. For both q = 0 and q = 2 we considered σ 2 = .1, 1, 1, 1, 1/ and ψ = .0:01, 0:02, 0:02, 0:03, 0:04/. In the case q = 2, α T = √ 1 2 1 −1 0 0 0 0 1 −1 − 1 2 and φ = .0:02, 0:1/. This produced off-diagonal correlations corr{Y i .u/, Y j .u/} ranging between −0:41 and 0:41 and proportions of variances between 1 5 and 2 3 . For the trend models we used μ i .u/ = m i where m i was adjusted for each i = 1, : : : , 5 to produce an expected number of 1000 points. For the least squares estimation a uniform kernel with bandwidth 0.005 was used for the non-parametric estimation of the cross-pair correlation functions at 100 equispaced lags between 0.025 and 0.25.
We first considered parameter estimation in the case q = 2 by using the least squares method assuming q known and equal to the true value. Tables 2 and 3 show quantiles of estimates of the off-diagonal correlations corr{Y i .u/, Y j .u/} and the proportions of variances PV i .0/ obtained from 1000 simulations of the multivariate model. In general there is good agreement between the true values and the medians of the estimates.
We next applied the cross-validation method to 200 simulations of the multivariate model with both q = 2 and q = 0 and using block lengths 10, 20 or 50; see Section 4. For each simulation and block size we identified the value q s of q with the smallest cross-validation score. Table 4 shows for q = 2 and q = 0 the empirical distributions of the differences between q s and the true q. For both q = 2 and q = 0, the cross-validation method works best with b = 50. In the case q = 2 and b = 50, q s coincides with the true q for 41% of the simulations and differs at most by 1 from the true q in 67% of the cases. For q = 0 and b = 50 the corresponding percentages are 0.76 and 0.90. We also considered the distribution of the estimates of the off-diagonal corr{Y i .u/, Y j .u/} and the PV i .0/ in the case of unknown q = 2. For each simulation the least squares method was applied with the selected   Table 4 sometimes differs markedly from the true q. Quantiles of the simulated parameter estimates are shown in Tables 5 and 6. For most parameters there is reasonable agreement between the medians of estimates and true values but the estimates are more variable than for the case of known q.