A two-methodology comparison study of a spatial gravity model in the context of interregional trade flows

This article argues that introducing spatial interactions into models of the determinants of origin-destination (OD) flows can result in excessive contiguity. To explain flows between OD regions, what happens in the origin and destination is relevant, but so is what happens in their neighbouring regions. What happens, however, if the neighbouring areas of the origin overlap heavily with those of the destination? The article presents an empirical illustration that re-examines the evidence presented in previous research (Alamá-Sabater et al., 2013) at a finer territorial level, focusing on interregional trade in goods at the NUTS3 level (Spanish provinces). We use two different methodologies within the framework of a spatial gravity equation for interregional trade. The findings confirm the importance of spatial dependence for trade flows, and in particular that logistics decisions within a province affect shipments from contiguous provinces.


Introduction
The link between spatial dependence and trade flows stems from contributions by LeSage and Pace (2004; …). LeSage and Polasek (2008) and LeSage and Thomas-Agnan (2014) introduce spatial effects into the econometric flow model, which can be interpreted as an extension of the OD model used in the international trade literature: the gravity equation. In particular, these authors redefine spatial effects on the premise that OD flows depend not only on the features of the origin and destination, as in a traditional gravity equation, but also on the characteristics of neighbouring regions. These characteristics can be measured by the flows between the neighbouring regions and the origin or destination regions.
This paper contributes to the existing literature in two ways. First, it explores the empirical performance of the gravity model in explaining trade flows between regions using a spatial approach. To do so, it employs two methodologies. The first extends the gravity model by controlling for so-called multilateral resistance (MR) and introducing spatial lags, while the second is based on the spatial econometric flow model introduced by LeSage and Pace (2008). Second, following the latest research in spatial econometrics, we control for the role of connectivity at a highly disaggregated territorial level. Specifically, we focus on the NUTS3¹ level in Spain (i.e. provinces).
Recent research has shown that spatial correlation exists in highly disaggregated geographical data in Spain (LeSage and Llano, 2013), and the role of transport connectivity across regions in interregional trade in goods has already been analysed using a spatial econometric model (Alamá-Sabater et al., 2013). LeSage and Llano (2013) and Alamá-Sabater et al. (2013) focused on Spanish regions at the NUTS2 level, and although their results revealed a spatial pattern, Alamá-Sabater et al. (2013) also show the limitations of the level of territorial breakdown chosen. In particular, they find problems associated with the excessive size of some regions, which leads to distortions, for example in terms of neighbours shared by origin and destination regions. Considering smaller spatial units, such as provinces, is therefore expected to improve the results substantially in terms of the positive externalities from transport connectivity.
We acknowledge that a problem of excessive contiguity might arise when the determinants of OD flows are analysed by taking spatial interactions into account. Such interactions cover a wide variety of movements, such as journeys to work, migration, tourism, use of public facilities, the transmission of information or capital, the market areas of retail activities, international trade and freight distribution (Rodrigue et al., 2013); indeed, economic activities both generate and attract flows. However, if regions are too large, then, depending on their location and the structure of the territory, the neighbourhoods of different regions might overlap excessively. A smaller basic spatial unit should therefore be considered.
The rest of the paper is organized as follows. Section two describes the two methodological approaches used. Section three outlines data and variables used for the present study. The regression analysis is performed in section four. Section five contains the conclusions.

The two methodologies
Weighting matrices measure the degree of potential interaction between neighbouring locations. Spatial interactions have been included in gravity equations to model OD flows in a number of empirical applications, such as tourism (De la Mata and Llano-Verduras, 2012), migration (LeSage and Pace, 2008) and commodity flows (LeSage and Polasek, 2008). LeSage and Pace (2008) question the use of traditional gravity models to explain flows of goods between origin and destination, because omitting a spatial component can bias the model parameters and consequently distort statistical inference. Similarly, Corrado and Fingleton (2012) argue that failing to acknowledge network dependence and spatial externalities leads to biased inference and to an incorrect understanding of the true causal processes. However, the economic foundation of many spatial econometric models is weak. To overcome this shortcoming, we extend the theoretically justified gravity approach using spatial lags.

Spatial lags
The spatial dependence of the model is captured by the parameters $\rho_i$. In the spatial econometrics literature (Anselin, 1988), relations with neighbouring regions are measured using weighting matrices, and the structure of spatial dependence incorporated in these matrices conditions any estimate obtained. Regarding how a neighbouring relation is defined in gravity-type models, the most common definition identifies regions that share a border (Porojan, 2001), but some studies consider additional criteria, such as Behrens et al. (2012)², LeSage and Polasek (2008) or Alamá-Sabater et al. (2011 and 2013)³. In a gravity framework that uses symmetrical spatial interaction data⁴, the weighting matrix has to be expanded into an $n^2 \times n^2$ matrix in order to take into account all trade flows between all regions. For example, the flow weight matrices can be defined as Kronecker products such as $W \otimes I_n$: if $W$ is $n \times n$, then $W \otimes I_n$ is $n^2 \times n^2$. Note that in this type of model the spatial effect is amplified because of the dimension of the flow model, as each region is related to every other region.
² This application does not contain any form of geographic connectivity, as it uses a similarity measure based on the relative size of regions.
³ They construct a measure of transport connectivity to include in the weighting matrices.
⁴ Recently, Márquez-Ramos (2014) uses a spatial approach with asymmetrical spatial interaction data (i.e. the number of origins differs from the number of destinations, and origins cannot also be destinations).
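Three such flow weight matrices are used in this paper. Let the $n \times n$ flow matrix have origins in rows and destinations in columns, and let its columns be stacked into an $n^2 \times 1$ vector, so that the flow from $i$ to $j$ occupies position $(j-1)n + i$. Then 1) $W_o = I_n \otimes W$ captures origin-based dependence (flows to the same destination from the neighbours of the origin), 2) $W_d = W \otimes I_n$ captures destination-based dependence (flows from the same origin to the neighbours of the destination) and 3) $W_w = W_o W_d = W \otimes W$ captures origin-to-destination dependence (flows from the neighbours of the origin to the neighbours of the destination). Here $W$ is an $n \times n$ spatial weight matrix based on first-order geographical contiguity. Non-zero values for element $(i, j)$ denote that zone $i$ neighbours zone $j$, whereas zero values denote that zones $i$ and $j$ are not neighbours. The diagonal elements are zero to prevent an observation from being defined as its own neighbour.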
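Under the stacking convention used in this paper (rows of the flow matrix are origins, columns are destinations, and the columns are stacked), the three flow weight matrices can be built from $W$ with Kronecker products: $W_o = I_n \otimes W$, $W_d = W \otimes I_n$ and $W_w = W \otimes W$. The minimal Python/NumPy sketch below assumes a row-standardised $W$ and a made-up four-region chain; the helper names `row_standardise` and `flow_weight_matrices` are hypothetical.

```python
import numpy as np

def row_standardise(c):
    """Divide each row of a binary contiguity matrix by its row sum."""
    c = np.asarray(c, dtype=float)
    sums = c.sum(axis=1, keepdims=True)
    # Regions without neighbours (islands) keep an all-zero row.
    return np.divide(c, sums, out=np.zeros_like(c), where=sums > 0)

def flow_weight_matrices(w):
    """Return (W_o, W_d, W_w), each n^2 x n^2, for column-stacked flows."""
    n = w.shape[0]
    eye = np.eye(n)
    w_o = np.kron(eye, w)   # neighbours of the origin, same destination
    w_d = np.kron(w, eye)   # same origin, neighbours of the destination
    w_w = np.kron(w, w)     # neighbours of origin to neighbours of destination
    return w_o, w_d, w_w

# Hypothetical 4-region chain: regions 1-2, 2-3 and 3-4 share borders.
contig = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
W = row_standardise(contig)
W_o, W_d, W_w = flow_weight_matrices(W)
print(W_o.shape)  # (16, 16)

# Flow matrix Y: rows = origins, columns = destinations; vec stacks columns,
# so the flow from origin i to destination j sits at position j * n + i.
Y = np.arange(16, dtype=float).reshape(4, 4)
y = Y.flatten(order="F")
# Origin lag for the flow 1 -> 3 (0-based i=0, j=2): region 1's only
# neighbour is region 2 (i=1), so the lag equals Y[1, 2].
print((W_o @ y)[2 * 4 + 0], Y[1, 2])  # 6.0 6.0
```

Because $W_o W_d = (I_n \otimes W)(W \otimes I_n) = W \otimes W$, the third matrix follows from the first two, which is a quick consistency check on any implementation.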
It is important to highlight that in autoregressive specifications, such as those used in this paper, the structure of the model implies that the influence of "neighbours of neighbours" is also taken into account. Consequently, an autoregressive model estimated on a territorially highly disaggregated trade dataset captures "second-order" neighbour relations, which can generate the problem of excessive contiguity mentioned above.
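The "neighbours of neighbours" effect follows from the spatial multiplier of an autoregressive model $y = \rho W y + X\beta + \varepsilon$: for a row-standardised $W$ and $|\rho| < 1$, $(I - \rho W)^{-1} = I + \rho W + \rho^2 W^2 + \dots$, and the $W^2$ term links regions that only share a common neighbour. A minimal NumPy sketch on a hypothetical four-region chain, with an illustrative $\rho = 0.5$:

```python
import numpy as np

# Hypothetical 4-region chain (1-2, 2-3, 3-4), row-standardised.
contig = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
W = contig / contig.sum(axis=1, keepdims=True)
rho = 0.5  # illustrative autoregressive parameter, |rho| < 1

# Spatial multiplier of the autoregressive model y = rho*W*y + X*beta + e.
multiplier = np.linalg.inv(np.eye(4) - rho * W)

# Truncated power series I + rho*W + rho^2*W^2 + ... converges to the inverse.
series = sum(np.linalg.matrix_power(rho * W, k) for k in range(60))

print(W[0, 2])                          # 0.0: regions 1 and 3 are not contiguous
print(multiplier[0, 2] > 0)             # True: they interact through region 2 (W^2 term)
print(np.allclose(multiplier, series))  # True
```

The same mechanism operates in the flow model, where the $n^2 \times n^2$ matrices propagate effects through chains of neighbouring origins and destinations.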

A theoretically justified gravity approach with spatial lags
According to the traditional gravity model of trade (Anderson, 1979), the volume of aggregate exports between pairs of regions or countries depends on their income, the geographical distance between them and a series of dichotomous variables. Trade is expected to be positively related to income and negatively related to distance. Gravity models applied to trade flows among countries normally include dichotomous variables such as whether the trading partners share a language or a common border, as well as variables for free trade agreements that assess the effects of regional integration. Distance is included in most empirical gravity studies as a proxy for transport costs.
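A minimal sketch of the log-linear form implied here, $\ln X_{ij} = \beta_0 + \beta_1 \ln Y_i + \beta_2 \ln Y_j + \beta_3 \ln Dist_{ij} + u_{ij}$, estimated by ordinary least squares with NumPy. The data are synthetic: the region locations and the data-generating values (income elasticities of 1, a distance elasticity of -1) are hypothetical and chosen only to illustrate the expected signs; dichotomous variables are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # hypothetical number of regions

gdp = rng.lognormal(mean=10.0, sigma=1.0, size=n)
coords = rng.uniform(0, 1000, size=(n, 2))  # hypothetical region centroids (km)

rows = []
for i in range(n):
    for j in range(n):
        if i == j:
            continue  # intraregional flows are excluded in this sketch
        dist = np.linalg.norm(coords[i] - coords[j])
        rows.append((np.log(gdp[i]), np.log(gdp[j]), np.log(dist)))
Z = np.array(rows)

# Synthetic log exports: income elasticities of 1, distance elasticity of -1.
true_beta = np.array([2.0, 1.0, 1.0, -1.0])
X = np.column_stack([np.ones(len(Z)), Z])
ln_exports = X @ true_beta + rng.normal(scale=0.1, size=len(Z))

# OLS fit of the log-linear gravity equation.
beta_hat, *_ = np.linalg.lstsq(X, ln_exports, rcond=None)
print(np.round(beta_hat, 2))  # slopes close to 1, 1 and -1
```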
The concept of transport connectivity has already been considered in gravity studies of trade through the analysis of transport-cost-reducing measures (Limao and Venables, 2001; Sanchez et al., 2003; Clark et al., 2004; Micco and Serebrisky, 2004; …). However, this branch of the literature includes the spatial effects of neighbouring regions only as additional traditional regressors in gravity equations, without defining weighting matrices. Yet ignoring a spatially lagged dependent variable can lead to biased parameter estimates and hence to inaccurate estimates of infrastructure impacts (Cohen, 2010).
It is important to highlight that when inherent spatial effects are explicitly taken into account in gravity models, the magnitude of the estimated parameters changes (Porojan, 2001). Porojan (2001) also stressed that, in the presence of spatial autocorrelation, the estimated parameter on the distance variable might capture a spatial pattern that reflects the structure of the territory. In a more recent study, Behrens et al. (2012) …

Our baseline specification is equation (1), where $\ln X_{ij}$ denotes the logarithm of exports from a Spanish region $i$ to an importing Spanish region $j$; $\ln Y_i$ ($\ln Y_j$) is the logarithm of GDP of exporter $i$ (importer $j$); $Yh_i$ ($Yh_j$) is GDP per capita in the exporting (importing) region; $Dist_{ij}$ measures the distance between the capital cities or economic centres of the two regions; $CI_i$ ($CI_j$) is the connectivity index measuring the transportation network of the exporter (importer), calculated from the number of logistics facilities and the square kilometres of logistics zones at the NUTS3 level (Suárez-Burguet, 2012); and, following LeSage and Polasek (2008), $area_i$ ($area_j$) controls for the area of the origin (destination) region. The first spatial lag vector is constructed by averaging flows from the neighbours of the origin region to the destination, and parameter $\rho_1$ captures the magnitude of the impact of this type of neighbouring observation on the dependent variable. The second spatial lag vector averages flows from the origin to the neighbours of the destination region, and parameter $\rho_2$ measures the impact and significance of these flows. The third spatial lag averages flows from all neighbours of the origin to all neighbours of the destination. Estimating $\rho_1$, $\rho_2$ and $\rho_3$ therefore indicates the relative importance of the three types of spatial dependence between origin and destination regions.
Here, $\rho_1$, $\rho_2$ and $\rho_3$ are the spatial autocorrelation coefficients, and the null hypotheses are $\rho_1 = 0$, $\rho_2 = 0$ and $\rho_3 = 0$. Rejecting these null hypotheses implies that trade flows from or to a region are directly affected by trade flows from or to neighbouring regions. Finally, $\varepsilon_{ij}$ is a random disturbance.
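The display of equation (1) did not survive in this version. Read together with the variable definitions above, a plausible form is sketched below. The coefficient labels $\beta_0, \dots, \beta_9$ are ours, and entering GDP per capita, distance and area in logarithms and the connectivity index in levels is an assumption; $\mathbf{x}$ denotes the $n^2 \times 1$ vector stacking the $\ln X_{ij}$, and $W_o$, $W_d$, $W_w$ are the origin-based, destination-based and origin-to-destination flow weight matrices.

```latex
\begin{aligned}
\ln X_{ij} ={}& \beta_0 + \beta_1 \ln Y_i + \beta_2 \ln Y_j + \beta_3 \ln Yh_i + \beta_4 \ln Yh_j \\
&+ \beta_5 \ln Dist_{ij} + \beta_6 CI_i + \beta_7 CI_j + \beta_8 \ln area_i + \beta_9 \ln area_j \\
&+ \rho_1 [W_o \mathbf{x}]_{ij} + \rho_2 [W_d \mathbf{x}]_{ij} + \rho_3 [W_w \mathbf{x}]_{ij} + \varepsilon_{ij},
\qquad H_0^{(k)}:\ \rho_k = 0,\quad k = 1, 2, 3.
\end{aligned}
```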
We estimate two additional specifications derived from equation (1). First, we take into account a remoteness factor in our gravity analysis by incorporating proxy variables, again together with spatial lags, in line with Márquez-Ramos (2014). This gives equation (2), where $rem_i$ ($rem_j$) is the remoteness of the exporter (importer), defined in equation (2a) (equation (2b)). That is, for a given origin-destination pair $i$ and $j$, the degree of remoteness of region $i$ is given by equation (2a), where $Y_w$ is the sum of the income of the importing regions of region $i$ considered in this study (total GDP in the Spanish peninsula); the remoteness of the importer is calculated analogously in equation (2b). Second, in order to control for MR factors, dummies for exporters and importers can be added to the empirical model instead of remoteness variables. However, since income, area and transport connectivity variables are region specific, they would be perfectly collinear with these dummies. We therefore estimate an additional version of equation (1), equation (3), that includes dummies for importing regions ($j$) and exporting regions ($i$) and assumes that the effect of transport connectivity is of equal magnitude for exporters and importers, so that it enters through the interaction $CI_{ij} = CI_i \times CI_j$. We estimate equations (1), (2) and (3) using instrumental variables (IV) (Gibbons and Overman, 2012). One way of validating the results is to check whether they are robust across these different specifications.
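Equations (2a) and (2b) are not legible in this version. The Python sketch below assumes a remoteness index of a form widely used in gravity applications, in which the distance to each partner is weighted by the inverse of that partner's share of total income, $rem_i = \sum_{j \ne i} Dist_{ij} / (Y_j / Y_w)$; whether this matches the authors' equation (2a) exactly is an assumption, and the three-region data and the function name `remoteness` are hypothetical.

```python
import numpy as np

def remoteness(dist, gdp):
    """Hypothetical remoteness index: rem_i = sum_{j != i} dist_ij / (gdp_j / gdp_total).

    dist: (n, n) distance matrix; gdp: (n,) positive regional GDP.
    gdp_total is the GDP of all regions in the sample (Y_w in the text).
    """
    dist = np.asarray(dist, dtype=float)
    gdp = np.asarray(gdp, dtype=float)
    share = gdp / gdp.sum()               # each partner's share of total income
    terms = dist / share[np.newaxis, :]   # dist_ij weighted by 1 / share_j
    np.fill_diagonal(terms, 0.0)          # a region is not its own trading partner
    return terms.sum(axis=1)

# Hypothetical three-region example (distances in km, GDP in arbitrary units).
dist = np.array([[0, 100, 300],
                 [100, 0, 200],
                 [300, 200, 0]])
gdp = np.array([50, 30, 20])
rem = remoteness(dist, gdp)
print(rem)
```

With symmetric distances, the same function gives the importer-side index (2b); with asymmetric distances, the transposed distance matrix would be passed instead.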

The spatial econometric modelling of OD flows
LeSage and Pace (2009) and Behrens et al. (2012) suggest using spatial econometrics to control for MR. Our second methodology is therefore the spatial econometric flow model (LeSage and Pace, 2008). Its purpose is to explain variation in the magnitude of flows between each OD pair, and it is based on spatial autoregressive models of the type in equation (4):

$$y = \rho_1 W_o y + \rho_2 W_d y + \rho_3 W_w y + \alpha \iota_{n^2} + X_o \beta_o + X_d \beta_d + \gamma g + \varepsilon \qquad (4)$$

where $W_o$, $W_d$ and $W_w$ are the origin-based, destination-based and origin-to-destination flow weight matrices. As in gravity models, the X matrices capture the characteristics of origin and destination regions that could influence bilateral trade. Each variable produces an $n^2 \times 1$ vector, with associated parameters $\beta_o$ for the origin $i$ and $\beta_d$ for the destination $j$, and $g$ is the $n^2 \times 1$ vector of OD distances. The dependent variable $y$ is derived from an $n \times n$ square matrix of interregional flows from each of the $n$ origin regions to each of the $n$ destination regions, in which each of the $n$ columns represents a different destination and the $n$ rows represent origins.
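With origins in rows, destinations in columns and the columns stacked, the flow from $i$ to $j$ sits at (0-based) position $jn + i$, so the origin and destination characteristics can be aligned with the stacked flows through Kronecker products with a vector of ones: $X_o = \iota_n \otimes X$ and $X_d = X \otimes \iota_n$. A minimal NumPy sketch with made-up data for three regions:

```python
import numpy as np

n = 3
# Hypothetical regional characteristics: one row per region, one column per
# variable (e.g. logged area and logged population; values are made up).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
# Hypothetical flow matrix: Y[i, j] is the flow from origin i to destination j.
Y = np.array([[0.0, 5.0, 1.0],
              [4.0, 0.0, 2.0],
              [3.0, 6.0, 0.0]])

y = Y.flatten(order="F")        # stack the columns: the n^2 x 1 dependent vector
iota = np.ones((n, 1))
X_o = np.kron(iota, X)          # row j*n + i holds the characteristics of origin i
X_d = np.kron(X, iota)          # row j*n + i holds the characteristics of destination j

# Flow from origin 2 to destination 0 (0-based) sits at position 0 * n + 2.
k = 0 * n + 2
print(y[k], X_o[k], X_d[k])
```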
Our focus is on extending gravity equations, and we therefore consider a number of characteristics of the origin and destination regions. As in LeSage and Polasek (2008), the explanatory variables used to construct the matrices $X_o$ (origin) and $X_d$ (destination) are the (logged) area, the (logged) population, the (logged) GDP per capita and the …

In a preliminary analysis, in order to compare the results obtained when the model is estimated with and without spatial lags, we rely on the maximum likelihood method to test for spatial dependence. As the null hypothesis of no spatial dependence is rejected, the gravity model that includes the spatial lags appears to be the better alternative. In addition, the spatial model yields lower values of the Akaike Information Criterion (AIC) and the Root Mean Squared Error (RMSE), which indicates that it is preferred. The empirical illustration therefore relies on regressions that include the spatial lags.
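The model comparison described above can be sketched with standard formulas: the likelihood-ratio statistic $LR = 2(\ell_{spatial} - \ell_{nonspatial})$, asymptotically $\chi^2$ with 3 degrees of freedom under $H_0: \rho_1 = \rho_2 = \rho_3 = 0$; $AIC = 2k - 2\ell$; and the RMSE of fitted values. This is a minimal Python sketch using only the standard library (the $\chi^2_3$ survival function has a closed form). The text refers only to "the maximum likelihood method", so using an LR test is an assumption, and the log-likelihood values in the usage lines are hypothetical.

```python
import math

def chi2_3_sf(x):
    """Survival function P(X > x) of a chi-squared variable with 3 degrees of freedom."""
    x = max(x, 0.0)
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def lr_test_spatial_lags(loglik_spatial, loglik_nonspatial):
    """LR test of H0: rho_1 = rho_2 = rho_3 = 0 (three restrictions)."""
    lr = 2.0 * (loglik_spatial - loglik_nonspatial)
    return lr, chi2_3_sf(lr)

def aic(loglik, n_params):
    """Akaike Information Criterion: lower values are preferred."""
    return 2.0 * n_params - 2.0 * loglik

def rmse(observed, fitted):
    """Root mean squared error of fitted values."""
    sq = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical log-likelihoods: 10 parameters without lags, 13 with three spatial lags.
ll_nonspatial, ll_spatial = -1052.0, -1040.0
lr, p = lr_test_spatial_lags(ll_spatial, ll_nonspatial)
print(lr, p < 0.01)                                   # 24.0 True
print(aic(ll_spatial, 13) < aic(ll_nonspatial, 10))   # True
```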

Data and variables
We generate a dataset containing total commodity flows transported between 47 of the …

In order to introduce logistics characteristics, the (transport) connectivity index (CI) is calculated as a simple average of the number and size of logistics platforms. The score on each dimension is derived as an index relative to the maximum and minimum achieved across origin and destination regions, based on the assumption that logistics play a comparable role in origins and destinations. The CI takes a value between 0 and 1 and is calculated according to equation (5). Under this index, if regions $i$ and $j$ both have good logistics performance and share a border, the corresponding element of the weighting matrix is close to 1; if they border one another but their logistics infrastructure is poor, the element is close to zero; and if they do not border one another, the element is zero.

To set up the dependent variable of the model, we generate an $n^2 \times 1$ vector by stacking the columns of the flow matrix. For example, a model with four regions has a $4 \times 4$ flow matrix, which is stacked into a $16 \times 1$ vector.

In the empirical analysis we use the two methodologies described above. A first variant of our models includes only first-order contiguity (contiguity-based model, with a modified matrix $W_{m\_contiguity}$), whereas the second variant (transport connectivity model, with a modified matrix $W_{m\_connectivity}$) reflects transportation networks in Spanish provinces by using the number and size of logistics platforms.
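Equation (5) is not legible in this version. The Python sketch below follows the verbal description: each logistics dimension (number of platforms and their surface area) is rescaled relative to its minimum and maximum across regions, and CI is their simple average, so it lies in [0, 1]. For the connectivity weight matrix it assumes, hypothetically, that each element equals the contiguity indicator times $CI_i \times CI_j$; this reproduces the behaviour described in the text (close to 1 for well-connected neighbours, close to 0 for poorly connected neighbours, 0 for non-neighbours), but the authors' exact formula may differ. All data and function names are made up.

```python
import numpy as np

def min_max(v):
    """Rescale a vector to [0, 1] relative to its minimum and maximum."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

def connectivity_index(n_platforms, platform_km2):
    """CI_i: simple average of the two rescaled logistics dimensions."""
    return 0.5 * (min_max(n_platforms) + min_max(platform_km2))

def connectivity_weights(contig, ci):
    """Hypothetical W_m_connectivity: contiguity indicator times CI_i * CI_j."""
    contig = np.asarray(contig, dtype=float)
    return contig * np.outer(ci, ci)

# Hypothetical data for four regions in a chain (1-2, 2-3, 3-4).
contig = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
n_platforms = np.array([12, 3, 8, 1])
platform_km2 = np.array([5.0, 0.5, 2.0, 0.1])

ci = connectivity_index(n_platforms, platform_km2)
W_conn = connectivity_weights(contig, ci)
print(np.round(ci, 3))
print(np.round(W_conn, 3))
```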

[Figure: Spanish provinces (NUTS 3) by import intensity (inflows). Source: own elaboration.]

Figure 5 shows the map of the connectivity index derived from equation (5).¹¹ Examining the maps in Figures 3 and 4 together with the logistics network in Figure 5, flows appear to be larger in origin and destination provinces where logistics networks are more extensive than in provinces with less developed logistics networks. In the case of Spain, therefore, provinces can be clearly differentiated in terms of logistics performance. This descriptive analysis emphasizes the need to explicitly incorporate such information into the spatial and network dependence structure when analysing trade flows, as doing so might lead to substantial differences in the estimates and inferences. It can also be compared with previous research that uses the spatial econometric flow model at the NUTS2 level in Spain (Alamá-Sabater et al., 2011 and …). Although their results support an empirical framework in which the spatial dependence of interregional trade flows is introduced, they also provide evidence of the limited significance of spatial lags.¹² It is therefore expected that the higher the level of disaggregation of the geographical data, the greater the positive effect of the spatial lags. There are three main reasons for this. First, it seems unlikely that a small spatial economic unit could produce many goods without the help of the surrounding areas, or that it would not benefit from the transport networks of the surrounding areas to reach markets that would otherwise be unreachable without crossing them. Second, because of the variability of the data, the larger the geographical units, the more heterogeneous they are when treated as a whole. Finally, a problem of excessive contiguity might arise because of the structure of the territory.

¹¹ The regions with the highest values are in dark red.
¹² The Moran's I statistic is used to analyse the existence of spatial autocorrelation.
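As a reminder of what the Moran's I statistic mentioned above measures, a minimal Python sketch of global Moran's I, $I = \frac{n}{S_0} \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, with $z_i$ the deviation from the mean and $S_0$ the sum of all weights. The four-region chain and the values are hypothetical, and the sketch omits the inference step (for example, a permutation test).

```python
import numpy as np

def morans_i(values, w):
    """Global Moran's I for a variable observed on n regions with weights w (n x n)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()   # deviations from the mean
    s0 = w.sum()       # sum of all weights
    return (len(x) / s0) * (z @ w @ z) / (z @ z)

contig = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)

# Hypothetical inflows: a smooth gradient along the chain -> positive autocorrelation.
print(round(morans_i([1.0, 2.0, 3.0, 4.0], contig), 3))  # 0.333
# Alternating high/low values -> negative autocorrelation.
print(round(morans_i([1.0, 4.0, 1.0, 4.0], contig), 3))  # -1.0
```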

The spatial econometric flow model. Main results.
In order to analyse the spatial dependence of interregional Spanish trade flows, we estimate equation (4) by maximizing the log-likelihood function with respect to the parameters $\rho_1$, $\rho_2$ and $\rho_3$. As before, for simplicity, only the results for the spatial lags are presented, as they are the main focus of this paper. Table 3 shows the results of the spatial lags for total trade in the contiguity and the connectivity models: the spatial lags are positive and significant.
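A minimal sketch of this estimation step in Python/NumPy, under stated simplifications: it assumes a model of the form $y = \rho_1 W_o y + \rho_2 W_d y + \rho_3 W_w y + Z\beta + \varepsilon$ with normal errors, builds $W_o = I_n \otimes W$, $W_d = W \otimes I_n$ and $W_w = W \otimes W$ for column-stacked flows, concentrates $\beta$ and $\sigma^2$ out of the log-likelihood, and replaces a numerical optimiser with a coarse grid search over non-negative values with $\rho_1 + \rho_2 + \rho_3 < 0.95$. The toy data and the name `concentrated_loglik` are hypothetical; large applications would use sparse matrices and efficient log-determinant methods rather than a dense `slogdet`.

```python
import itertools
import numpy as np

def concentrated_loglik(rhos, y, Z, weight_mats):
    """Log-likelihood of y = sum_k rho_k W_k y + Z beta + e, e ~ N(0, sigma^2 I),
    with beta and sigma^2 replaced by their maximum likelihood values."""
    N = y.shape[0]
    A = np.eye(N) - sum(r * Wk for r, Wk in zip(rhos, weight_mats))
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        return -np.inf  # outside the admissible parameter space
    Ay = A @ y
    beta, *_ = np.linalg.lstsq(Z, Ay, rcond=None)  # beta given the rhos
    resid = Ay - Z @ beta
    sigma2 = resid @ resid / N
    return logdet - 0.5 * N * (np.log(2 * np.pi * sigma2) + 1.0)

# Hypothetical toy set-up: 6 regions on a line, row-standardised contiguity.
n = 6
contig = np.eye(n, k=1) + np.eye(n, k=-1)
W = contig / contig.sum(axis=1, keepdims=True)
I_n = np.eye(n)
weight_mats = [np.kron(I_n, W), np.kron(W, I_n), np.kron(W, W)]  # W_o, W_d, W_w

# Simulate column-stacked flows from known (made-up) parameters.
rng = np.random.default_rng(1)
N = n * n
Z = np.column_stack([np.ones(N), rng.normal(size=N)])
true_rhos = (0.3, 0.2, 0.1)
A_true = np.eye(N) - sum(r * Wk for r, Wk in zip(true_rhos, weight_mats))
y = np.linalg.solve(A_true, Z @ np.array([1.0, 0.5]) + rng.normal(scale=0.5, size=N))

# Coarse grid search over non-negative rhos with rho_1 + rho_2 + rho_3 < 0.95.
grid = np.round(np.arange(0.0, 0.5, 0.1), 1)
candidates = [r for r in itertools.product(grid, repeat=3) if sum(r) < 0.95]
best = max(candidates, key=lambda r: concentrated_loglik(r, y, Z, weight_mats))
print("grid ML estimate of (rho_1, rho_2, rho_3):", best)
```

At $\rho_1 = \rho_2 = \rho_3 = 0$ the log-determinant vanishes and the function reduces to the OLS log-likelihood, which is a convenient check on the implementation.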
Results in Table 3 can be compared with those obtained at the NUTS2 level in previous literature (see Table 1 in Alamá-Sabater et al., 2013). Turning our attention to specific sectors,¹⁴ Tables 4 and 5 show the results by sector for the connectivity and the contiguity model, respectively. These results can be compared with those obtained at the NUTS2 level in previous literature (see Table A.1 in the Appendix).¹⁵ Table 4 shows that three different patterns emerge. First, there are sectors for which origin-based dependence is the most important (R3: Food Industry and R7: Paper, printing and Graphic Arts); in these sectors, an origin region with a good transport connection network to surrounding regions benefits the most in terms of interregional exports. Second, there are sectors where destination-based dependence is the most important (R1, R4, R8, R9, R10, R11, R14 and R15); here, a destination region with a good transport connection network to surrounding regions benefits the most in terms of interregional trade. Finally, there are sectors where OD dependence is the most important (R2, R5, R6, R12 and R13). Overall, destination-based dependence, that is, dependence arising from trade between an origin region and the regions neighbouring the destination, is more important than origin-based dependence in a number of sectors.

Notes: ***, **, * indicate significance at 1%, 5% and 10%, respectively. Z-statistics are given in brackets. The (logged) dependent variable is measured in tonnes.

Table 5 shows the results by sector for the contiguity model. The sign and significance of $\rho_1$, $\rho_2$ and $\rho_3$ are similar in the contiguity and the connectivity models, except for R12 (Manufacture of machinery and mechanical equipment), for which origin dependence appears more important than destination dependence in the transport connectivity model, whereas the opposite holds in the first-order contiguity model. It is worth noting that $\rho_2$ is positive and significant more often than $\rho_1$ across sectors, suggesting that, in both the contiguity and the connectivity models, the neighbours of the destination region are a more important determinant of higher levels of industrial commodity flows between OD pairs.

¹⁴ To do so, we follow previous research that introduces spatial lags in the spatial econometric flow model and estimates different regressions by activity branch (Alamá-Sabater et al., 2011 and …).
¹⁵ Although the level of significance and the elasticity of the X variables change across sectoral regressions, overall our results show that area, population and income per capita are positive and significant: the larger the area, population and income per capita of a region, the greater its interregional trade flows. Unemployment is not significant in most of the regressions, whereas the results for distance are ambiguous. Full results are available from the authors upon request.

Conclusions
This paper analyses the role of transport connectivity in interregional trade flows with a spatial approach, using highly disaggregated regional trade data at the provincial level in Spain. To do so, we use a gravity framework and take multilateral resistance into account in a comparison of two methodologies. In order to test whether incorporating transport connectivity information into the spatial structure of the model results in substantial differences in the estimates, we define different types of neighbour relations. In particular, two variants of the model were estimated, using first-order contiguity and transport connectivity criteria, respectively, to construct the weighting matrices.
We find evidence that transport connectivity has a bearing on interregional trade.
Moreover, we show that forces leading to flows from an origin province to a destination province would create similar flows to neighbouring destinations. Regions therefore benefit from their neighbours' transport connectivity. These results provide evidence not only about the role of the location of logistics platforms in satisfying the existing demand for transport infrastructure, but also about the benefit of introducing spatial dependence into gravity models of trade when analysing interregional trade flows, as ignoring spatial lags might lead to biased parameter estimates.

Notes: ***, **, * indicate significance at 1%, 5% and 10%, respectively. Z-statistics are given in brackets. The (logged) dependent variable is measured in tonnes. Source: Alamá-Sabater et al. (2013).