Ensembling Multiple Radio Maps with Dynamic Noise in Fingerprint-based Indoor Positioning

Fingerprint-based indoor positioning is widely used in many contexts, including pedestrian and autonomous vehicle navigation. Many approaches have used traditional Machine Learning models to deal with fingerprinting, with k-NN being the most commonly used. However, the reference data (or radio map) is generally limited, as data collection is a very demanding task, and this limitation degrades overall accuracy. In this work, we propose a novel approach that adds random noise to the radio map and uses it in combination with an ensemble model. Instead of augmenting the radio map, we create n noisy versions of the same size, i.e., our proposed Indoor Positioning model combines n estimations obtained by independent estimators built with the n noisy radio maps. The empirical results show that our proposed approach improves on the baseline method by around 10% on average.


I. INTRODUCTION
Despite the large variety of technologies for Indoor Positioning (based on visible light communications (VLC), ultrasound and ultra-wide band (UWB), among others), Wi-Fi and Bluetooth Low Energy (BLE) fingerprinting are widespread.
Bahl and Padmanabhan [1] proposed RADAR, the first RF-based indoor positioning system for locating and tracking users inside buildings, 20 years ago. The idea was simple: the digital signature of the Radio-Frequency (RF) signals could be used to estimate the position of the users. In particular, they exploited the signal strength information provided by the discovery protocol of the nearby available Wi-Fi routers and Access Points (APs). They combined that information with previous empirical samples and signal propagation modelling to propose the final indoor positioning system. Their proposed method, based on the k-NN algorithm, is still widely used for smartphone-based applications and vehicle tracking [2], [3].
In machine learning, the ensemble model is widely used to improve the accuracy of estimators. When heterogeneous estimators are combined, the overall error decreases, as demonstrated by Tumer and Ghosh [4] and Dietterich [5]. The key to ensembles is the diversity of the base estimators, which mainly comes from the reference or training data used to build the model. Ideally, if the number of base estimators is large enough and they are built over fully independent data, the errors should be minimised. Ensembles have already been used in Indoor Positioning [6], [7], [8], [9].
However, one of the main limitations of fingerprint-based methods is the limited amount of information included in the radio map. Data collection is time-consuming and expensive, so developers try to find the optimal trade-off between collection costs and accuracy. Although multiple fingerprints are collected at every reference position in some cases, this might not be enough to build a sufficient number of different estimators.
In this paper we exploit the generation of noisy radio maps. A noisy radio map is a radio map where random noise is added to every reference fingerprint. Our hypothesis is that, if we can generate a large enough number of different artificial radio maps, we might reduce the positioning error without the need for additional site surveys. We propose that the magnitude of the added random vector depend on the distance from the reference fingerprint to its closest match in the radio map. As some datasets include multiple independent fingerprints collected in the same location, we restrict the closest-match search to those reference fingerprints which were collected in a different location.
We contribute with:
• A new method to generate a noisy radio map considering the features of fingerprinting problems
• An ensemble model combining multiple noisy radio maps
• An empirical evaluation in three different real use cases
The remainder of this paper is organised as follows. Section II describes the method to generate the noisy radio maps. Section III describes the ensemble model applied to indoor positioning. Section IV describes three use cases and shows the results. Section V introduces a general discussion. Section VI concludes the paper.
II. GENERATING A NOISY RADIO MAP
Different strategies can be implemented when collecting the reference fingerprints, with the professional and crowdsourced strategies being the most commonly used. While the professional strategy defines a regular, systematic data collection procedure in locations generally distributed over a regular grid, the crowdsourced strategy relies on the system users, who voluntarily collect fingerprints in arbitrary locations which were not predefined.
Even in a professionally generated radio map, the resulting distribution of samples in the feature (or Received Signal Strength, RSS) space might not follow any regular pattern. Fig. 1 shows the location of the reference points in a real professional deployment and the related fingerprint vectors (restricted to one sample per point and two APs for visualization purposes). Despite the regular distribution of the reference positions in the radio map (Fig. 1(a)), the density of the fingerprints is not regular in the feature space (Fig. 1(b)). The distance of a reference fingerprint to its closest match is not constant and is large in some cases, e.g. the reddish point in the center or the two blue points on the top side of Fig. 1(b). The proposed method injects uniform random noise into each reference fingerprint to fill the free space surrounding it.
One way to generate a new noisy radio map consists of adding random noise to all the reference fingerprint vectors. However, adding noise to samples might increase the positioning error if the noise does not consider the nature of fingerprinting. Thus, the proposed method considers the Euclidean distance, in the RSS space, of each reference fingerprint to its closest match in the radio map (see Algorithm 1). As some datasets and collection strategies include multiple consecutive fingerprints per reference point, the search for the closest match is restricted to those fingerprints which were collected in a different reference position.

Algorithm 1 Noisy Radio Map Generation
input: T = {s_1, ..., s_N; p_1, ..., p_N} (radio map)
output: T̂ (noisy radio map)
Randomly generate a vector with noise:

Let T and T̂ be the original and new noisy radio maps, respectively. The radio maps have N samples, each of them represented by a fingerprint vector, s_i, and its location in the operational area, p_i. The functions distance_feat and distance_geom stand for the distance metric in the feature (RSS) space and geometric space, respectively. In both cases, we have used the Euclidean distance. The random values for the noise vector and its length follow a uniform distribution.
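The generation procedure described in this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the `scale` parameter (the fraction of the closest-match distance used as the upper bound of the noise length) and the handling of samples without a match at another location are hypothetical choices, and every RSS value is assumed to be numeric.

```python
import numpy as np


def nearest_other_location_distance(S, P):
    """Feature-space distance from each fingerprint to its closest match
    among the fingerprints collected at a different location."""
    S = np.asarray(S, dtype=float)
    P = np.asarray(P, dtype=float)
    # Pairwise Euclidean distances in the RSS (feature) and geometric spaces.
    d_feat = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    d_geom = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    # Ignore candidates at the same location (this also excludes the sample itself).
    d_feat[d_geom == 0.0] = np.inf
    nearest = d_feat.min(axis=1)
    # Assumption: a sample with no match at another location receives no noise.
    nearest[~np.isfinite(nearest)] = 0.0
    return nearest


def noisy_radio_map(S, P, scale=0.5, seed=None):
    """Return a noisy copy of the fingerprints S with the same shape."""
    rng = np.random.default_rng(seed)
    S = np.asarray(S, dtype=float)
    nearest = nearest_other_location_distance(S, P)
    # Uniform random components give the direction of each noise vector.
    direction = rng.uniform(-1.0, 1.0, size=S.shape)
    norms = np.linalg.norm(direction, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # guard against an all-zero random vector
    direction = direction / norms
    # Uniform random length, bounded by a fraction (assumed) of the closest-match distance.
    length = rng.uniform(0.0, 1.0, size=(S.shape[0], 1)) * scale * nearest[:, None]
    return S + direction * length
```

The pairwise distance arrays grow quadratically with the number of samples, so this form only suits small radio maps; a larger map would need chunked computation or a nearest-neighbour index.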

III. ENSEMBLING IPS WITH NOISY RADIO MAPS
The ensemble model has been successfully used in machine learning, e.g. neural networks. According to different theoretical frameworks [4], [5], the ensemble estimator provides better accuracy than any of the individual base estimators if they are diverse enough. Although Bagging and Boosting are traditional ways to successfully generate diverse base estimators when training data (i.e. the radio map in fingerprinting) is limited, having multiple fully independent versions of the radio map is preferred. However, the latter alternative is not feasible as it involves repeating the data collection multiple times, whose cost might be prohibitive.
Due to the noise injected into the radio map, a simple estimator built on top of a single noisy radio map is not expected to improve the accuracy provided by a traditional system based on the original radio map. Nevertheless, Algorithm 1 introduces a method able to create multiple noisy radio maps with a certain degree of independence. Although any two noisy radio maps differ, the areas of the feature space with the highest density of fingerprints will remain very similar. Therefore, the degree of diversity will not reach the level of independent radio maps empirically collected on-site.
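The role of diversity can be made concrete with a standard result on averaged estimators. As a simplifying assumption that is not taken from the paper, suppose the n base estimators have zero-mean errors e_i with a common variance \sigma^2 and a common pairwise correlation \rho. The variance of the averaged error is then:

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right)
  = \frac{1}{n^{2}}\left(n\,\sigma^{2} + n(n-1)\,\rho\,\sigma^{2}\right)
  = \rho\,\sigma^{2} + \frac{(1-\rho)\,\sigma^{2}}{n}
```

With fully independent estimators (\rho = 0) the variance shrinks as \sigma^2 / n, whereas correlated estimators leave a floor of \rho \sigma^2 that no number of estimators removes. Under this simplified model, noisy radio maps that remain similar in dense areas of the feature space correspond to \rho > 0, which is consistent with their diversity staying below that of independently surveyed radio maps.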
The proposed solution integrates noisy radio maps and the ensemble model. In particular, we propose an ensemble of 100 base indoor positioning systems, each of them based on the k-NN model and built on top of a different noisy radio map. The position estimate for any operational fingerprint (in terms of x, y and z) corresponds to the centroid of the 100 estimations (see Fig. 2).
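The combination step can be sketched as below: a minimal example assuming a plain k-NN base estimator that returns the centroid of the k closest reference positions. The function names and the toy data are hypothetical, and the placeholder uniform noise only stands in for the noise generator of Algorithm 1.

```python
import numpy as np


def knn_estimate(S_ref, P_ref, s_query, k=1):
    """Plain k-NN: centroid of the positions of the k closest reference fingerprints."""
    dist = np.linalg.norm(S_ref - s_query, axis=1)
    closest = np.argsort(dist)[:k]
    return P_ref[closest].mean(axis=0)


def ensemble_estimate(noisy_maps, P_ref, s_query, k=1):
    """Centroid of the estimates given by one k-NN estimator per noisy radio map."""
    estimates = np.array([knn_estimate(S, P_ref, s_query, k) for S in noisy_maps])
    return estimates.mean(axis=0)


# Toy example: two reference points, two APs, positions given as (x, y, z).
rng = np.random.default_rng(0)
S_ref = np.array([[-40.0, -80.0], [-80.0, -40.0]])
P_ref = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
# Placeholder noise for illustration only; the paper generates it with Algorithm 1.
noisy_maps = [S_ref + rng.uniform(-1.0, 1.0, size=S_ref.shape) for _ in range(100)]
estimate = ensemble_estimate(noisy_maps, P_ref, np.array([-42.0, -79.0]))
```

Only the fingerprints change between ensemble members; the reference positions `P_ref` are shared, since noise is added in the RSS space and not to the locations.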

IV. EXPERIMENTS
In order to evaluate the feasibility of the proposed noisy ensembles, we evaluated them over three different use cases given by publicly available datasets: 1) an extended version of UJIIndoorLoc, a guided crowdsourced dataset whose private test set was used in the 2015 EvAAL-ETRI Competition to evaluate the participants; 2) a Wi-Fi fingerprinting dataset with multiple simultaneous interfaces; and 3) a multi-slot BLE raw database collected in three indoor/outdoor environments.
We implemented two positioning algorithms based on k-NN: a plain version and a fingerprint-optimized version. For the plain k-NN, the search for the closest neighbors is computed over the whole radio map. For the optimized implementation, the search is restricted to those reference fingerprints which share the same dominating APs with the operational fingerprint, as done in [10]. We used the plain k-NN in the BLE datasets and the optimized version in the Wi-Fi datasets. The k-NN hyper-parameters have been set according to the original references. Full parameters are reported in Table I. The positioning error is computed as the 3D Euclidean distance between the true and estimated positions. As suggested in the ISO/IEC 18305 standard, we provide the mean, median and three quartile values for numerical comparison. Moreover, we provide the CDF plots for visual comparison of the baseline and the proposed ensemble model.
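The error metrics can be computed as in the following sketch. The function names are hypothetical, and the default percentile levels (75, 90, 95) are an assumption made for illustration; the levels actually reported are those of the result tables.

```python
import numpy as np


def positioning_errors(P_true, P_est):
    """3D Euclidean distance between true and estimated positions, one value per sample."""
    P_true = np.asarray(P_true, dtype=float)
    P_est = np.asarray(P_est, dtype=float)
    return np.linalg.norm(P_true - P_est, axis=1)


def error_summary(errors, percentiles=(75, 90, 95)):
    """Mean, median and selected percentiles of the positioning error."""
    summary = {"mean": float(np.mean(errors)), "median": float(np.median(errors))}
    for p in percentiles:
        summary[f"p{p}"] = float(np.percentile(errors, p))
    return summary


def empirical_cdf(errors):
    """Sorted errors and cumulative probabilities, ready to be drawn as a CDF plot."""
    x = np.sort(errors)
    y = np.arange(1, x.size + 1) / x.size
    return x, y
```

Calling `error_summary(positioning_errors(P_true, P_est))` on the baseline and on the ensemble estimates gives directly comparable rows, and `empirical_cdf` provides the points for the visual comparison.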

A. Crowdsourced Wi-Fi fingerprinting
The extended UJIIndoorLoc database was used in the 2015 EvAAL-ETRI competition [11]. The radio map dataset is composed of the training and evaluation samples collected in the original UJIIndoorLoc, thus consisting of 20,972 reference samples. The evaluation set has 5,179 fingerprints whose location has not been published. A total of 520 APs were detected in this dataset. We have selected this dataset as it combines guided and arbitrary crowdsourced data collection performed by more than 20 users and device models. Some reference points contain multiple fingerprints whereas others have just one fingerprint.
Table II shows the numerical results, whereas the CDFs are reported in Fig. 3. In both cases, we report the results of the baseline method and the proposed ensemble, as well as of the 100 individual IPS based on noisy radio maps.
According to the table and figure, the baseline provides significantly worse results than the ensemble. The individual noisy estimators (Noisy Single) tend to perform worse than the baseline method. The averaged mean positioning error of the noisy individual estimators is 8.21 m, with a standard deviation of 0.25 m over the 100 noisy versions. However, for the large percentile values, the individual noisy estimators are slightly better than the baseline. Nevertheless, the ensemble model provides the best results in the five reported metrics. The ensemble successfully benefits from the diversity generated by the 100 noisy radio maps, especially by reducing the presence of large errors.

B. Multiple Interfaces for Wi-Fi fingerprinting
This dataset was generated by means of a Raspberry Pi and multiple simultaneous Wi-Fi interfaces [12]. In contrast to traditional Wi-Fi fingerprinting datasets, the device was able to collect synchronized fingerprints from 5 interfaces. Averaging the five fingerprints makes the radio map more robust. We have selected it to evaluate our proposed model on a dataset with high accuracy and low presence of large positioning errors.
The radio map consists of 4973 fingerprints per interface collected in the reference locations shown in Fig. 1, whereas the evaluation set has 810 fingerprints per interface. The number of detected APs is 11. Results are in Table III and Fig. 4.
As in the crowdsourced dataset, the proposed model performs better than the baseline for this dataset. Although the differences are smaller, the 95th percentile is reduced to 4.55 m. The results are especially promising in this dataset, as the use of multiple interfaces had already significantly improved the results provided by single-interface systems. Finally, some individual noisy radio maps have performed better than the baseline.

C. BLE fingerprinting with multiple slots
This database uses BLE as the main positioning technology instead of Wi-Fi [13]. The BLE beacons were configured to broadcast messages on 6 slots with different transmission powers, which were fused (averaged) in the fingerprints to obtain a more robust RSS measurement. A total of three independent environments were surveyed with 2 or 3 smartphones each. We have selected this dataset as BLE presents relevant differences with respect to Wi-Fi and it is becoming popular.
In each of the three datasets, the radio map consists of one sample per reference point and device. The number of reference samples is 417, 552 and 250, whereas the number of evaluation samples is 102, 138 and 60. The number of BLE beacons is 30 in the three datasets. The results are reported in Table IV.
As in the two previous datasets, the proposed model performs better than the baseline for each of the three BLE environments. However, the differences are smaller than for the Wi-Fi datasets, especially in the second environment. The second environment is special, as its fingerprints apply a large window to average RSS values over time and over the multiple slots. The RSS readings are less affected by multi-path and, thus, the accuracy is higher, which decreases the probability of a significant improvement. In the most challenging case, the dataset collected in the third environment, our model clearly improved on the baseline.

V. DISCUSSION
After analysing the three use cases, we have found that a single estimator based on a noisy radio map might make no sense, as its accuracy is usually lower than that of the baseline. Although the individual position estimations provided by noisy estimators are worse than the baseline estimation, their combination significantly improves on the baseline. This demonstrates that the proposed noise generator successfully creates different versions of the radio map with a significant degree of diversity and, above all, without an additional site survey.
As a general trend, the worse the baseline performs, the more significant the improvement is. Due to the nature of the crowdsourced dataset (with irregular distribution of reference positions, data collection involving multiple users and devices, and presence of outliers), the mean error is 8.02 m, which is significantly higher than in other Wi-Fi fingerprint-based systems. With the proposed model, we reduced the mean error to 6.5 m (almost 20%), and the large errors were reduced by more than 3 m. Similar results are observed in the third BLE scenario, where the data were collected in two buildings and outdoor areas, with large distances between the reference positions that degraded the positioning accuracy.
Although some approaches fuse signal strength measurements at the receiver (multiple interfaces) or emitter (multiple slots) to obtain a more robust Indoor Positioning System, there is still room for improvement. For the Wi-Fi dataset with multiple interfaces, the average positioning error is slightly above 2 m and the 95th percentile is 4.55 m with our proposed approach, both of which are much lower than in the other datasets. Similar low errors also apply to the first BLE environment.

VI. CONCLUSIONS
This paper proposes a method to add noise to a radio map, which is exploited in combination with the ensemble model. Although using an individual noisy radio map is not recommended for positioning purposes, integrating multiple noisy radio maps in an ensemble model significantly increases the diversity within the ensemble estimators. As a consequence, the noisy ensemble works better than any of the noisy individuals and even better than the baseline method. This demonstrates that the proposed ensemble model is an alternative to consider when designing an IPS, as it is able to successfully augment the radio map without the need for an additional site survey.
To validate our model, we considered three different use cases involving a total of 5 datasets. In all the cases, our model provides better results than the baseline. The improvement is especially significant on challenging datasets where the baseline positioning error is relatively high. In those cases where the positioning error is already low, our ensemble model also improves the positioning accuracy, especially by reducing the presence of large errors in the 90th and 95th percentiles.
This work opens a promising research line based on noisy samples. Further work will be devoted to the definition of a random noise generator in line with signal propagation, as well as a full validation over an extensive set of datasets.