Directional analysis for point patterns on linear networks

Statistical analysis of point processes often assumes that the underlying process is isotropic in the sense that its distribution is invariant under rotation. For point processes on ℝ2 , some tests based on the K‐function and nearest neighbour orientation function have been proposed to check such an assumption. However, anisotropy and directional analysis need proper caution when dealing with point processes on linear networks, as the implicit geometry of the network forces particular directions that the points of the pattern have to necessarily meet. In this paper, we adapt such tests to the case of linear networks and discuss how to use them to detect particular directional preferences, even at some angles that are different from the main angles imposed by the network. Through a simulation study, we check the performance of our proposals under different settings, over a linear network and a dendrite tree, showing that they are able to precisely detect the directional preferences of the points in the pattern, regardless the type of spatial interaction and the geometry of the network. We use our tests to highlight the directional preferences in the spatial distribution of traffic accidents in Barcelona (Spain), during 2019, and in Medellin (Colombia), during 2016.

anisotropy in such a situation. In addition, the existence of more segments in particular angles does not necessarily lead to more points in these directions due to, for instance, the effect of possible latent attributes. Further, it is not clear if the tendency of the underlying distribution of points towards particular angles is only caused by the geometry of the network.
In this line of arguments, the literature currently misses appropriate statistical tools to study the existence of tendencies towards particular directions in the distribution of points when taking the underlying network into account. Here, it is crucial to be able to distinguish a directional tendency caused by the geometry of a network from a tendency coming from the underlying distribution of events. In this paper, having the work of Ohser and Stoyan (1981) and Comas et al. (2015) in mind, and upon the geometrically corrected K-function (Ang et al., 2012), we propose some tools to discover directional tendencies in the distribution of point patterns over a network apart from what is caused by the geometry of the underlying network. In addition, we propose to use the counterpart version of the nearest neighbour orientation function for linear networks and its circular density estimation (Baddeley, Rubak, & Turner, 2015 chapter 8). However, these functions will likely point to the main angles of the underlying network. In order to exclude what the underlying network inherently brings into these functions, we suggest to compare the estimated functions coming from the data with those obtained from point patterns which are angle-wise uniformly distributed over the network.
Thereby, we are able to see any discrepancy between such estimators, and, furthermore, we can estimate the relative risk of the circular density of the nearest neighbour orientation function of our data relative to that of an angle-wise uniformly distributed pattern. Note that such relative risk uncovers the angles for which the pattern in question shows higher/lower densities than the case where points are angle-wise uniformly distributed.
The paper is structured as follows. In Section 2, we provide the required background about point patterns on linear networks. Section 3 is devoted to introducing the K-function-based orientation function and the nearest neighbour orientation function. In Section 4, we study the performance of our methodology under different settings. Section 5 provides our directional analysis for traffic accidents in Barcelona (Spain) during 2019 and in Medellin (Colombia) during 2016. The paper ends with a discussion in Section 6.

Linear networks
A linear network L ⊂ R 2 is defined as a union of finite number of line segments l i = {tu i + (1 − t)v i ∶ 0 ≤ t ≤ 1} such that u i , v i ∈ R 2 are the endpoints of l i . The length of each line segment l i is denoted by |l i |, and it gives rise to |L| standing for the total length of the linear network L, being the sum of the length of all individual segments. The endpoints are usually called vertices, and the number of segments which exit from each vertex is referred to as the degree of that vertex. The distance between any two arbitrary points u, v ∈ L is denoted by d L (u, v), which is commonly calculated by a regular metric distance . Examples of regular metrics include Euclidean and shortest-path (travel) distance. We add that the intersection between any two arbitrary segments l i and l j is either an empty set or a mutual vertex of both. In other words, we assume that there is no underpass/overpass in L. A linear network which has no closed loop is often called a 'tree' that might be often seen in applications related to rivers or dendrite trees of neurons. See Baddeley et al. ((2015), chapter 17) for more details.

Point processes on linear networks
Throughout the paper, a simple point process on the linear network L is denoted by X, being a random countable subset of R 2 , and x = {x 1 , x 2 , … , x n } is used to show a realisation of such process with 0 < n < ∞ points. For any measurable function f ≥ 0, the Campbell formula (Daley & Vere-Jones, 2008) states that where (·) is called the intensity function of X and d 1 stands for integration with respect to arc length. The intensity function (u) gives rise to the expected number of points per unit length of L in the vicinity of u ∈ L. If (·) ≡ , the process is called homogeneous, otherwise it is referred as inhomogeneous. Common statistical methodologies for point processes often assume that X is stationary, that is, the distribution of X is invariant with respect to a family of transformations/shifts, and isotropic, that is, the distribution of X is invariant with respect to a family of rotations. These assumptions, however, are quite challenging when taking a linear network as the state space of an underlying point process X since there is no guarantee that points will still live on the underlying network after applying a transformation/rotation Cronie et al., 2020).
Moreover, we denote the number of points falling in any arbitrary measurable subset A ⊆ L by N(X ∩ A), with the expectation and, further where 2 (· , ·) is called the second-order product intensity function of X and gives the expected number of points per unit length of L jointly happening in the vicinities of u, v ∈ L.

Second-order summary statistics
An important while basic step when dealing with point pattern analysis is checking for the existence of any type of interaction between points, which is essential prior to model fitting. Common types of interaction include clustering or repulsion as opposite to randomness. Popular tools to detect interaction between points have been the K-function and pair correlation function. The pair correlation function is defined as If 2 (u, v) > (u) (v), then points tend to happen jointly, whereas 2 (u, v) < (u) (v) means that points rather tend to maintain an interdistance.
Further, g(v, u) = 1 claims the nonexistence of clustering or repulsion between points which is a well-known property of Poisson processes.
Point processes on L for which g(u, v) = g(d L (u, v)) are called second-order reweighted pseudostationary (SORS) by Ang et al. (2012), and for such processes, the geometrically corrected K-function is defined as and The above functions are called geometrically corrected since they account for the geometry of the network through the correction factor m(u, r), which gives the number of points on L with exact distance r units from u. For Poisson processes K(r) = r, whereas K(r) > r (K(r) < r) indicates clustering (repulsion) between points. See Ang et al. (2012) for more details.

METHODOLOGY
We now present the K-function-based orientation function and the counterpart of the nearest neighbour orientation function for linear networks.

Second-order orientation analysis
The use of K-function and pair correlation function, in Section 2, is usually carried out to study the existence of any pairwise interaction among points without taking possible directional effects into account. We now propose a modified version of (5) to further account for the pairwise directional interaction between points. The new K-function is of the form where Φ(u, x) stands for the angle between u ∈ L and x ∈ X and m(u, r| ) is the number of points laying exactly r units away from u within an angle . Note that Φ(u, x) = Φ(x, u)± and K anis (u, r, ) = K anis (r, ). We highlight that K anis (r, 2 ) = K(r), since 1{Φ(u, x) ≤ 2 } = 1 and Ang et al. (2012), a nonparametric estimator of K anis is given bŷ which reduces to the K-function estimator proposed by Ang et al. (2012), when = 2 . The advantage of K anis over K is that former further analyses the pairwise interaction within angles. However, rather than focusing on this aspect, we are here more interested in making use of K anis for orientation analysis. Therefore, we define the following cumulative distribution function in the line of the original contribution of Ohser and Stoyan (1981), which studies the orientations of the point process X for interdistances between 0 ≤ r 1 and r 1 < r 2 . Note that different combinations of r 1 , r 2 may describe short-range, middle-range, or long-range orientation of X (Ohser & Stoyan, 1981). Since O (r 1 ,r 2 ) (·) is a cumulative function, any tendency of the distribution of points towards a particular angle appears as a jump in O (r 1 ,r 2 ) (·). However, it might be of more interest to actually look into the vicinity of angles rather than the cumulative (scaled) count of points up to a fixed angle. Therewith, the use of the density function is recommended (Comas et al., 2015). In this case, a clear tendency towards particular angles appears as sharp peaks in O * (r 1 ,r 2 ) (·). In practice, calculating the above functions can be quite costly, especially when dealing with big data/network .

Nearest neighbour orientation function
Although both O (r 1 ,r 2 ) (·) and O * (r 1 ,r 2 ) (·) are useful tools to check the assumption of anisotropy, their practicality, however, might be affected by the choices of r 1 and r 2 . Hence, we next present an alternative approach using the nearest neighbour orientation distribution. To do so, let = { 1 , … , n } be the set of angles of the vectors joining each data point in x = {x 1 , … , x n } to its nearest neighbour. A circular probability density function, say (·), of such angles can then highlight particular tendencies towards some specific angles. More explicitly, Note that in the presence of isotropy, (u) ≡ 1/2 for 0 ≤ u ≤ 2 . We note that, in practice, such circular probability density function needs to be estimated (Illian et al. (2008), chapter 4 & Baddeley et al. (2015), chapter 8). In our simulation study and data analysis, we make use of a kernel-smoothed estimate (Baddeley et al., 2015 chapter 8, p. 278). Computationally, the nearest neighbour orientation function is more efficient than the second-order tools in Section 3.1. We add that this technique can be extended to the circular probability density function of kth (k > 1) nearest neighbour orientation function.

Relative influence of the underlying network
Generally, one possible source to make a point process in R d anisotropic might be a latent geometry which forces events to occur in some specific directions (Rajala et al., 2018). Under these circumstances, although such geometry may not be visible, however, the main detected directions by either second-order orientation analysis or the nearest neighbour orientation function might be (mis)understood as evidence of anisotropy.
Clearly, when dealing with point processes on linear networks or dendrite trees, there already exists a geometry which forces events to occur in some fixed directions. We then need to explore the differences between the distribution of the angles of the observed point pattern and that of the network itself. It is to be emphasised that even if the points are randomly and angle-wise uniformly distributed over a linear network, the tools for the orientation analysis will still show the main angles of the underlying network; this, however, may not be taken into account as an evidence for anisotropy since points cannot happen on other angles. We next propose some solutions for such a concern in order to exclude the directional effects of the underlying network.
With the aim of excluding the effect of the network, one first needs to get an insight in what the geometry of the network brings into such orientation analysis. Hereafter, we call the inherent orientations of the network by the fingerprint. Such fingerprint can be estimated through both the orientation function O * (·,·) (·) and the circular probability density function (·). We denote the corresponding functions for the network by O * ,L (·,·) (·) and L (·), so that they can be approximately estimated from point patterns over the network with no preferential direction. However, and to avoid possible random effects from simulating point patterns, we suggest to simulate some m > 1 times from the model in consideration (with no preferential direction), estimate the O * ,L (·,·) (·) and L (·) for each, and then use such estimates to get an insight into the average behaviour ofÔ * ,L (·,·) (·) and̂L(·) for angle-wise uniformly distributed patterns. We here note that the type of interaction between points might not be of great importance. Therefore, we estimate the fingerprint of the network based on different models with no directional preference including Poisson, Matérn cluster, and simple sequential inhibition models.
Any deviation from the orientations of the network can be detected graphically. One such solution is to compare the distributions ofÔ * (r 1 ,r 2 ) (·) withÔ * ,L (r 1 ,r 2 ) (·) for different ranges of r 1 , r 2 (Comas et al., 2018). Another solution would be to investigate the relative risk of the circular probability density functions (·) relative to L (·), that is, (·)/ L (·). Note the importance of considering a common bandwidth when estimating such relative risk (Hazelton, 2008;. We highlight that an estimation of this relative risk underlines the angles for which the point pattern in consideration has more/less points than what is expected when points are distributed over the network with no preference towards any fixed angle.

SIMULATION STUDY
This section is devoted to evaluate the performance of the methods presented in Section 3. We consider different point process models with and without directional effects, including inhomogeneous Poisson, Matérn cluster, and an immigration-death model. The simulation study is carried FIGURE 1 Point pattern realisations under different models. Left column: inhomogeneous Poisson process with intensity function (u) = (u 1 , u 2 ) = 12 + sin(12(u 1 + u 2 )); Middle column: Matérn cluster process with radius 2 units for the clusters, with a tendency on 90 • for the linear network and a tendency on 60 • for the dendrite; Right column: Immigration-death with a tendency on 60 • for the linear network and a tendency on 90 • for the dendrite out over a linear network as well as a dendrite tree (see Figure 1). The linear network has 64 vertices and 112 segments, maximum vertex degree 4, and total length of 16 units. In addition, this network inherently has its segments on angles 0 • , 90 • , 180 • and 270 • . The dendrite has 1,000 vertices and 999 segments, maximum vertex degree 4, and an approximated total length of 21 units. The dendrite does not show any preference towards particular angles. For creation of such artificial networks, we have used the R packages spatstat (Baddeley et al., 2015) and emstreeR (Quadros, 2019). The models we consider are as follows: 1. Inhomogeneous Poisson process with intensity function (u) = (u 1 , u 2 ) = 12 + sin(12(u 1 + u 2 )), u = (u 1 , u 2 ) ∈ L. This model does not have any preference towards fixed angles. The simulated pattern over the linear network has 172 points and the one on the dendrite 228 points.
See the left column of Figure 1. The simulation is done using the R package spatstat (Baddeley et al., 2015).
2. Matérn cluster process with radius 2 units for the clusters. We consider uniformly distributed 40 parents, so that each is then replaced by 3. Immigration-death. Assume that immigrants arrive randomly according to a Poisson process with constant intensity . Moreover, a simple death process with intensity is incorporated to ensure that the total number of points n remains bounded. Now, new arrived points are accepted or rejected in terms of the following spatial interaction mechanism. Let d i, new , 1 ≤ i ≤ n s , where n s < n is the number of previously established points, be the shortest-path distance between a newly arrived point and all previously established points and i, new be the angle between the newly arrived point and all other points. Now immigrants are accepted with probability where where r 1 < r 2 is the spatial scale of anisotropy, is the prescribed anisotropic directional effect (0 ≤ ≤ ), c is a threshold, and 0 ≤ a ≤ 1 is a constant that defines the strength of the anisotropic effects. The resulting point pattern, defined by (12), after a long enough 'burn-in' period is an anisotropic point configuration on a linear network, with angular direction . In the particular case, where a = 1, the resulting point pattern is a realisation of a Poisson process with intensity / . In our simulation study, for both networks, we consider a = 0, = 1, c = 18 • that somehow relaxes the condition of angles, r 1 = 0, and r 2 = 0.15 units. Moreover, we consider the death rates = 10 −5 and = 10 −3 for the realisations over the linear network and dendrite, respectively. The generated pattern over the linear network has 77 points, whereas the pattern simulated over the dendrite consists of 145 points. See the right column of Figure 1. Figure 1 shows some point pattern realisations from the three mentioned models over both a linear network and a dendrite tree. It can be seen that the inhomogeneous Poisson patterns apparently are not in favour of any particular angle nor the main angles of the network neither any other angle. However, for the simulated patterns of the Matérn cluster model, we can see that for the linear network, most points live on the vertical segments, and for the dendrite, the points indicate a diagonal orientation. In addition, the realisation of the immigration-death model over the linear network shows a slight diagonal preference, while its realisation over the dendrite maintains a vertical directional effect.
These indications of anisotropy, however, need to be checked more carefully. Thus, we next make use of the nearest neighbour orientation function to check any angle preference in the simulated patterns. In particular, we are interested in graphically testing the null hypothesis of there is no angle effect in the distribution of points, versus the alternative hypothesis which claims the existence of directional effects. Note that having no directional effect here means that points are angle-wise uniformly distributed. In order to check directional effects, we suggest the following: (1) estimate the circular probability density function (11) for the point pattern realisation; (2) generate some m > 1 number of realisations from the null hypothesis and estimate their circular probability density functions and (3) estimate the relative risk of the circular probability density function of the empirical point pattern relative to the average of m estimated probability functions in step 2. We emphasise the importance of using a common bandwidth in the estimation of relative risk . We make use of the R package spatstat for the kernel estimation of the density of the nearest neighbour orientation function.
In order to simulate from the null hypothesis, we consider homogeneous angle-wise uniformly distributed patterns with three different types of interaction: Poisson, Matérn cluster, and simple sequential inhibition. In each of these models, we simulate the same number of points as the empirical pattern. When estimating the relative risk, bandwidth is selected using the Silverman's rule-of-thumb (Silverman, 1986) for each individual pattern, and then a common bandwidth is estimated as the geometrical mean of all individuals. Figure 2 represents the estimated relative risks for the patterns in Figure 1 relative to their corresponding null hypothesis. These relative risks highlight the differences in the behaviour of the corresponding point pattern with respect to different angles, underlining angle preferences in the spatial distribution of the point pattern. The very first thing we can note from Figure 2 is that the relative risks obtained for the different null hypothesis almost fully coincide. This means that the type of spatial interaction for the patterns defining the null hypothesis has no or little effect, since interest here focuses on angles. In addition, looking closer to the relative risks in Figure 2, we can see that for the inhomogeneous Poisson patterns, the estimated relative risks are basically the same with respect to different angles. In other words, the estimated circular FIGURE 2 Relative risks of the estimated circular density function of the patterns in Figure 1, with the same order, relative to the estimated averaged circular density function under the null hypothesis, which is angle-wise uniformly distributed Poisson (solid), Matérn cluster (dotted), and simple sequential inhibition (dashed) in Figure 2, we have generally seen that such suggested relative risk can detect any angle preference in the spatial distribution of points over the network, even if it is not the main angle of the network.
We next turn to comment on the practical use of the K-function-based orientation function. Figure 3 shows the estimated O * for different ranges of r 1 and r 2 and for the simulated patterns of the Matérn model over the dendrite tree (middle plot in the bottom row of Figure 1) and the immigration-death model over the squared linear network (right plot in top row of Figure 1). In each case, for a fixed r 1 > 0, we estimate O * for several choices of r 2 . Looking closer at, for example, the two left-side plots in Figure 3, we can see that although the two peaks ofÔ * , in the very left panel, point to the angle preference of the simulated pattern, the behaviour of O * strongly depends on the choices of r 1 and r 2 , which makes it difficult to come to a concrete conclusion. For the two right-side plots in Figure 3, we note that for the selected choices of r 1 and r 2 , we cannot see the angle preferences of the simulated pattern, that is, 60 • (240 • ). Note that there might be other choices of r 1 and r 2 which can highlight the angle preferences. We add that it even becomes more complicated when thinking to build, for example, an envelope based on the simulated patterns from the null hypothesis for more solid conclusions. We further note that, in this case, the type of interaction among points may make a difference.
Although both presented tools, the nearest neighbour orientation function and the K-function-based orientation function, are prepared to respond to detection of particular main directions making the point pattern anisotropic, we have shown that the former tool is more reliable, computationally faster, and much easier to interpret.

REAL DATA ANALYSIS
This section is devoted to detecting main directions and angle preferences in two real cases, traffic accidents in Barcelona (Spain)  previously studied for the purpose of intensity estimation and spatially varying relative risk by Moradi (2018) and . however, some were duplicated due to having more than a single collision type. Moreover, there have been several traffic accidents with the exactly same location but, for example, different time occurrences. Since we are interested in detecting angle preferences in the spatial distribution of points, we do not consider these records, and it leads to a final pattern with 5,700 points. The corresponding linear network is constructed by 25,888 vertices and 31,235 line segments. The total length of the network is 1,377.444 km, and the maximum degree of the vertices is 7. The right panel in Figure 4 shows its estimated intensity, based on the quick kernel-based intensity estimator using 2D convolution , after considering the Scott's rule-of-thumb to select the bandwidth which yielded 0.54 km (Scott, 2015). It can be seen that the highest intensity is estimated at the 'Eixample' district which is characterised by long straight streets, a strict pattern of square blocks crossed by wide avenues. Moreover, several touristic places are located in this district. In addition, apart from some areas in the north-west,  (Scott, 2015). Intensity values are accidents per kilometre north-east, and south-west which show proportionally lower intensities than other areas, the rest of the network apparently shows a similar intensity. Figure 5 shows the traffic accidents in Medellin during 2016. In this case, traffic accidents are further labelled based on accidents' severity by 'fatal' (133 points), 'personal injury' (6,004 points), and 'property damage' (4,627 points). The road network has 54,164 segments, 48,194 vertices, maximum vertex degree 6, towards particular angles in the two datasets. We and total length 1,244.202 km. The corresponding estimated intensities and relative risks can be found in Moradi (2018) and .
We now analyse if there is any tendency towards particular angles in the two datasets. We use the kernel estimate of the circular density function of the nearest neighbour orientation function. For each dataset, we generate 99 angle-wise and spatially uniformly distributed patterns with the same number of points as the traffic data. Then, we estimate the corresponding circular density function with a common bandwidth and obtain the average of these 99 estimations. Note that in Section 4, we showed that the type of spatial interaction has a neglected effect on the null hypothesis. Left panel of Figure 6 shows the relative risk for traffic accidents in Barcelona using the common bandwidth 16. Since traffic accidents in Medellin are labelled by their type of severity, we further show the relative risk of those causing fatality and personal injury relative to those causing property damage. From Figure 6, we can see that the relative risk of accidents involving fatality relative to those involving property damage (using a common bandwidth 24.13 • ) is higher for angles between 90 • and 180 • than other angles, with some other peaks towards 20 • , 220 • and 280 • . On the contrary, the relative risk of traffic accidents causing personal injuries relative to those involving property damage (using a common bandwidth 16.80 • ) does not show any tendency towards any fixed angle, being uniformly distributed with respect to angles from 0 • to 360 • . In other words, angle-wise speaking, they show a similar distribution. From Left to Right: estimated relative risk of traffic accidents in Barcelona relative to the case if they were angle-wise uniformly distributed; Estimated relative risk of traffic accidents in Medellin relative to the case if they were angle-wise uniformly distributed; Estimated relative risk of traffic accidents in Medellin causing fatality relative to those causing property damage; Estimated relative risk of traffic accidents in Medellin causing personal injury relative to those causing property damage

DISCUSSION
We have presented different tools to check directional preferences in the spatial distribution of points over linear networks. The main challenge is that the structure of the underlying network drives events to occur at particular fixed angles, that is, the angles of the network. This, however, cannot be taken into account as directional preference since events are forced to maintain such directions. Nevertheless, it is of great interest to check if there is any preference among the possible directions caused by the network, or any preference towards angles different than those derived by the geometry of the network.
We have presented the counterparts of the K-function-based orientation function and the nearest neighbour orientation function for linear networks. Since these functions are unavoidably affected by the geometry of the network, we have suggested to compare their estimators with those obtained based on angle-wise uniformly distributed patterns on the same network and with the same number of points. Thereby, we can have an insight into the angle preferences caused by the inherent geometry of the network. We have then recommended to estimate the relative risk of the circular density function of the nearest neighbour orientation function of the data in question relative to that, on average, of the angle-wise uniformly distributed patterns. Therefore, we obtain a measure according to which we can uncover angle preferences among the main angles of the network or any other angle. We have seen that such relative risk is somehow not affected by the type of spatial interaction between points of the angle-wise uniformly distributed patterns. In contrast, the practicality of the K-function-based orientation function is, however, quite limited due to being influenced by the distance ranges (Baddeley et al., 2015 chapter 7, pp. 239) and computationally expensive . Moreover, the type of interaction among points may also have secondary effect.
The problem of anisotropy on networks is still in its infancy. There are many points that still need to be solved. One is formally testing using circular statistics over the estimated circular densities. We have approached the problem from a more graphical point of view. Clearly, statistical formality should reinforce our arguments. Another aspect is considering the evaluation of the strength of anisotropy. This could be a subsequent stage in the analysis. This comes together with considering anisotropic point process models for networks, an area completely open for research.