A radiosity-based method to avoid calibration for Indoor Positioning Systems

Due to the widespread use of mobile devices, services based on the user's current indoor location are growing in significance. Such services are developed in the Machine Learning and Expert Systems realm, and range from guidance for blind people to mobile tourism and indoor shopping. One of the most used techniques for indoor positioning is WiFi fingerprinting; its reliance on widely available WiFi signals is one of the main reasons for its popularity, mostly in highly populated urban areas. Most issues of this approach lie in the data acquisition phase: manually sampling WiFi RSSI signals in order to create a WiFi radio map is a highly time-consuming task, and is also subject to re-calibration, because any change in the environment might affect the signal propagation and therefore degrade the performance of the positioning system. The work presented in this paper aims at substituting the manual data acquisition phase by directly calculating the WiFi radio map by means of a radiosity signal propagation model. The time needed to acquire the WiFi radio map by means of the radiosity model is dramatically reduced, from hours to minutes, when compared with manual acquisition. The proposed method is able to produce competitive results, in terms of accuracy, when compared with manual sampling, which can


Introduction
Indoor positioning is a core technique for ubiquitous and pervasive computing applications. Such applications can exploit the user's position in the services they provide (Schilit et al., 1994; Abowd & Mynatt, 2000; Hightower & Borriello, 2001; Kwon et al., 2005): a remote health-care monitoring system could use positioning information to recognise a person's activities and make decisions about their health state (Yan et al., 2010); a position-aware meeting service can provide mobile laptops with information regarding the meeting room they are located in (Castro et al., 2001); evacuation systems could provide the path to the nearest exit in an emergency (Ingram et al., 2004).
The position of the user can be estimated, for instance, in terms of latitude and longitude or at room level. The former is commonly used by guiding services, where the position of moving users is used to guide them to their destination; the latter is used in applications where knowing the position of the user at room level is accurate enough to provide services, for example healthcare, monitoring and security applications (Gu et al., 2009).

Opportunistic WiFi signals have been extensively used as the base technology for indoor positioning systems due to: a) their ubiquitous presence in urban areas; b) their relatively low cost when compared with other technologies; c) their presence in most consumer mobile devices such as smart-phones, smart-watches or laptops. Different techniques have been used to exploit WiFi signals: Angle of Arrival (AoA) (Sen et al., 2013), triangulation (Lim et al., 2007) and trilateration (Mok & Retscher, 2007), with WiFi fingerprinting being the most widespread technique (He & Chan, 2016). The popularity of the fingerprinting method is due to: a) its simplicity; b) it does not need any special hardware; and c) it is ubiquitously used.

WiFi fingerprinting methods are based on the signal strength generated by a set of surrounding Wireless Access Points (WAPs) measured at different positions. This set of measures forms a WiFi map, also called a radio map. Two stages are commonly used to create a WiFi map positioning system: the calibration and operational stages. In the calibration stage, the set of WiFi intensity measures at different positions is taken to later create the WiFi map. In the operational stage, a user's device measures the WiFi signal strength of all surrounding WAPs, and this information is then used by the positioning system to provide an estimation of the user's position.

WiFi intensity measuring is a time-consuming and expensive task in which different issues can arise (Casas et al., 2007; Deasy & Scanlon, 2007; Han et al., 2014). For example, any change in the environment, such as moving furniture, changing partition walls, changing the position of existing WAPs, deploying new WAPs, or removing existing WAPs, may degrade the positioning service, which implies recalibrating the positioning system. Some works have tried to reduce the effort needed in the calibration stage. In Gu et al. (2016) the authors halve the number of samples needed to create the WiFi map by taking advantage of the hidden structure and redundancy characteristics of WiFi samples. Other works try to cope with re-calibration when changes in the environment are detected (Fet et al., 2016), although an accuracy degradation is always present when furniture is moved, walls are built or removed, and so on.
The problem of signal propagation has been successfully solved in other realms of science. In Physics, heat transfer between bodies at different temperatures has been described using the radiosity model (Howell et al., 2010). Later, the same radiosity model was adapted and applied in Computer Graphics to model light propagation indoors for global illumination (Cohen et al., 1986; Cohen & Wallace, 2012). Radiosity is defined as "the radiant flux leaving a surface by unit area". In the case of a WiFi signal, the radiant flux is the intensity of the WiFi signal. This way, existing techniques to solve the radiosity equation can be used to model WiFi signal propagation in the presence of obstacles such as walls and doors.

Motivations and Hypothesis
The main motivation of this work is to use the radiosity model to describe WiFi signal propagation indoors. This way, the WiFi map used by WiFi fingerprinting indoor location systems can be generated analytically, reducing acquisition costs in terms of both the time devoted to obtaining the WiFi radio map and the people involved in that task. This motivation is based on the following hypotheses:
1. Given that WiFi radio waves are an electromagnetic signal, their propagation can be simulated using the radiosity model (Cohen et al., 1986; Heckbert, 1992).
2. A WiFi radio map can be analytically obtained from the radiosity model.
3. Walls are the most important structural elements to take into account when calculating the radiosity map of the WiFi signal.
Given that the WiFi radio map can be calculated analytically, the main benefits of the proposed solution are:
1. No domain expert intervention is needed in the acquisition phase, thus reducing costs.
2. The time to create the radiosity map using current CPUs is two orders of magnitude shorter than manual acquisition.
3. WiFi maps for newly deployed WAPs, or for WAPs whose position changes, can be easily included at any time.
4. Any structural changes in the scenario can be easily taken into account to re-calculate the WiFi maps.

Contributions
The main goal of this work is to replace the manual acquisition of a WiFi radio map, which is costly in terms of time and of the people carrying out the task, by analytically calculating the WiFi radio map by means of the radiosity model.
Our results show that analytically generating the WiFi radio map is one hundred times faster than manual acquisition. In addition, removing manual acquisition reduces the costs of creating a WiFi radio map.
This main goal can be subdivided into the following contributions:
1. To model WiFi signal propagation using the radiosity technique. This way, the RSSI value is directly evaluated from the radiosity model.
2. To modify the Gaussian distribution for RSSI values to mimic its real temporal variation.
3. To compare, using well-known Machine Learning algorithms, the performance of classifiers built using real measured RSSI data with that of classifiers built using the RSSI data calculated by the proposed radiosity model.
4. To check whether any improvement arises from mixing real measured RSSI and simulated RSSI when building a classifier.
5. To compare, between classifiers built using real and simulated data, the impact on performance of the size of the data sets used during the training phase.
To the best of our knowledge, this is the first work where the radiosity model is applied to generate a WiFi map for indoor positioning purposes. This technique would facilitate the development of new Expert Systems applications based on the user's position information for ubiquitous and pervasive computing. It could relieve domain experts from the sampling task, substituting it with an assistive procedure based on a radiosity WiFi map. This will allow them to focus on adding value to their applications by including the spatial context to provide better services to final users.
The rest of the paper is organized as follows: Section 2 presents the different propagation models that have appeared in the literature, how WiFi maps can be used for indoor positioning, and the basics of the radiosity model. Section 3 presents the scenario used to perform the experiments, how the radiosity model has been applied to obtain the WiFi signals, the Machine Learning algorithms used, and how analytical data is perturbed to mimic the time series behaviour of real data. Section 4 presents the experimental results and discussion. A comparison with previous work, and the strengths and weaknesses of our work, are presented in Section 5. Finally, the conclusions are presented in Section 6.

Background and Related work
First, this section presents the WiFi signal propagation models used in previous works. Then, the WiFi fingerprinting location technique is presented. Finally, a detailed description of the radiosity propagation model is given.

WiFi signal propagation and modelling
WiFi is an electromagnetic signal, which may be reflected, transmitted, absorbed and diffracted by physical objects in the scene. In a 3D open space, the intensity of an electromagnetic signal decreases with the inverse of the squared distance to the source, $I_{3D} \propto 1/r^2$. If the dimensions of the space are reduced to two, the signal decreases following an inverse-distance rule, $I_{2D} \propto 1/r$. Decibel-milliwatts (dBm) are the common unit for WiFi signal intensity. This unit takes the logarithm of the signal power, so the intensity of a WiFi signal decreases linearly with the logarithm of the distance when measured in dBm, $I(\mathrm{dBm}) \propto -\log r$.
When there are objects in the scene that interact with the electromagnetic signal, the previous rules do not work well because they do not take into account the multipath effect due to reflections on object surfaces, absorption by objects, and diffraction on objects with a size similar to the wavelength of the electromagnetic signal. The most common frequency used by WiFi access points (WAPs) is 2.4 GHz, which corresponds to a wavelength of 12.5 cm.
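As a brief illustration of the open-space rules and the wavelength quoted above, the following Python sketch evaluates them numerically. The function names and the reference distance `r0` are hypothetical and introduced only for this example.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def wavelength_m(frequency_hz):
    """Wavelength in metres of an electromagnetic wave of the given frequency."""
    return SPEED_OF_LIGHT / frequency_hz


def relative_level_db(r, r0=1.0, dims=3):
    """Open-space signal level at distance r relative to distance r0, in dB.

    dims=3 applies the inverse-square law, dims=2 the inverse-distance law.
    r0 is a hypothetical reference distance chosen for illustration.
    """
    exponent = dims - 1
    return -10.0 * exponent * math.log10(r / r0)


print(round(wavelength_m(2.4e9) * 100, 1))        # wavelength in cm
print(round(relative_level_db(2.0, dims=3), 2))   # dB change per doubling, 3D
print(round(relative_level_db(2.0, dims=2), 2))   # dB change per doubling, 2D
```

In decibels both laws become straight lines in log r; only the slope differs between the 2D and 3D cases.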
Different models have been presented to describe electromagnetic propagation in complex scenes. One of the simplest is the log-normal shadowing model, also known as the path loss model (Seybold, 2005; Ficco et al., 2014). In this model, the different interactions with the objects in the scene are modelled as an exponent in the intensity formula, in such a way that the intensity of the electromagnetic signal decreases faster than in open space. As a result, the intensity remains linearly dependent on the logarithm of the distance. The main advantage of the path loss method is its simplicity. Its main drawback is its lack of accuracy. Path loss has been used in Deasy & Scanlon (2007).

Ray tracing is a technique used in computer graphics to generate an image by tracing back the light rays arriving at a camera from all objects in a scene (Glassner, 1989). This model is more accurate than the path loss model because it takes into account reflections, transmissions and refractions of the electromagnetic signal on objects in the scene (Kimpe et al., 1993; Yang et al., 1998). The main advantage of the ray tracing model is its accuracy when calculating the intensity of the electromagnetic signal for each point in the scene. Its main drawback is the computational time needed to obtain the result. Ray tracing has been used to model WiFi signals for indoor locations in El-Kafrawy et al. (2010), Raspopoulos et al. (2012) and Ayadi et al. (2015).
The radiosity method directly solves the Rendering Equation (see Equation 1), which describes the interaction between an electromagnetic signal and all objects in the scene, by means of the finite elements technique (Zienkiewicz & Taylor, 2005). A detailed description of the radiosity method is given in Section 2.3.
The ray tracing and radiosity methods are global methods because both take into account inter-reflections of the electromagnetic signal between the objects present in the scene.
A comparison of the accuracy of the three methods when calculating the WiFi intensity for an indoor 2D scenario is presented in Ayadi et al. (2015). The authors show that, in the scenario they analysed, the path loss method provides the worst estimations. The most accurate method is radiosity, followed by ray tracing. Given the accuracy provided by the radiosity method when modelling WiFi signal propagation, it was chosen to test the feasibility of simulating data for developing indoor positioning services.
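The path loss (log-normal shadowing) model discussed in this section can be sketched in a few lines of Python. This is a minimal illustration of the generic log-distance formula, not the implementation used in any of the cited works; the reference level `p0_dbm`, the exponent `n` and the shadowing deviation `sigma` are hypothetical values.

```python
import math
import random


def path_loss_rssi(d, p0_dbm=-40.0, n=3.0, d0=1.0, sigma=0.0, rng=None):
    """RSSI in dBm at distance d under the log-distance path loss model.

    p0_dbm: RSSI at the reference distance d0 (hypothetical value).
    n:      path loss exponent; 2 in free 3D space, larger indoors.
    sigma:  standard deviation in dB of the log-normal shadowing term.
    """
    rssi = p0_dbm - 10.0 * n * math.log10(d / d0)
    if sigma > 0.0:
        rng = rng or random.Random()
        rssi += rng.gauss(0.0, sigma)  # shadowing is Gaussian in the dB domain
    return rssi


# Without shadowing the RSSI is linear in log10(d).
for distance in (1.0, 10.0, 100.0):
    print(distance, path_loss_rssi(distance))
```

The single exponent `n` absorbs every interaction with the scene, which is why the model is simple but cannot represent the effect of individual walls.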

Indoor positioning using WiFi fingerprinting
As previously mentioned, two stages are commonly used to create a WiFi fingerprinting positioning system: the calibration and operational stages. In the calibration stage, the set of measures at different positions is taken to later create the WiFi map. Let us denote this set as $W = \{\mathbf{w}_i(\mathbf{x}_j)\}$, where each vector $\mathbf{w}_i = \{s_1, s_2, \ldots, s_k\}_i$ denotes the WiFi signal strengths of the $k$ visible WAPs for the $i$-th measure at position $\mathbf{x}_j$. Note that more than one measure can be taken at the same position at different times.
In the operational stage, a user measures the WiFi signal strength of all surrounding WAPs, and this is compared with all measures in the WiFi fingerprinting database. The estimated position for the user, $\mathbf{x}_u$, is the position $\mathbf{x}_j$ whose vector of WiFi signal intensities minimizes some distance metric $d$ to the user's vector $\mathbf{w}_u$: $\mathbf{x}_u = \arg\min_{\mathbf{x}_j} d(\mathbf{w}_i(\mathbf{x}_j), \mathbf{w}_u)$ (Torres-Sospedra et al., 2015).
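The operational stage described above amounts to a nearest-neighbour search over the radio map. The following Python sketch assumes a Euclidean distance and a tiny, made-up radio map; both are illustrative choices rather than values from the experiments.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two RSSI vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_position(radio_map, w_u, distance=euclidean):
    """Return the position whose stored fingerprint is closest to w_u.

    radio_map: list of (position, rssi_vector) pairs from the calibration stage.
    w_u:       RSSI vector measured by the user in the operational stage.
    """
    position, _ = min(radio_map, key=lambda entry: distance(entry[1], w_u))
    return position


# Hypothetical radio map with four WAPs and three calibration positions.
radio_map = [
    ((0.0, 0.0), [-40, -70, -80, -65]),
    ((5.0, 0.0), [-55, -60, -75, -70]),
    ((5.0, 5.0), [-70, -45, -60, -80]),
]
print(estimate_position(radio_map, [-54, -62, -74, -69]))
```

Any other metric can be passed through the `distance` argument without changing the search itself.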
Machine Learning (ML) techniques have been commonly applied to estimate the position of a user based on the WiFi map information. WiFi fingerprinting positioning is a candidate problem to be solved by means of ML techniques due to its particular characteristics: a) it is difficult to obtain an analytical result due to the complexity of modelling WiFi signal propagation; b) building a computational model based on WiFi RSSI measures is challenging due to their high variability over time; c) the response time of ML algorithms is short enough for real-time applications.
Extensive reviews about indoor positioning using WiFi fingerprinting can be found in Liu et al. (2007), Song et al. (2011) and He & Chan (2016).

The radiosity method
The radiosity method was first developed to solve heat transfer between systems at different temperatures. Later, the radiosity method was successfully applied to solve the rendering equation, which describes the illumination for each element in a 3D scene (Cohen et al., 1986). A simplified version of the method can be used for 2D scenes (Heckbert, 1992).
For each point s in a scene, the radiosity b(s) is defined as the sum of the emitted radiation e(s) plus the reflected and transmitted radiation at that point coming from all other points in the scene. When ideal diffuse reflection and transmission are assumed, the radiosity equation (Equation 1) involves the following terms: ρ(s) is the semicircular reflection coefficient of the diffuse material, τ(s) is the semicircular transmission coefficient of the diffuse material at point s, v_r is the visibility term for reflection, v_t is the visibility term for transmission, and the geometric values are those shown in Figure 1. Note that, for convenience, the integral extends over all segments in the scene, $L = \bigcup_i L_i$.

The integral Equation 1 can be solved using the finite elements method, in which case the radiosity b_i of any element i in the scene is expressed by Equation 2 in terms of F_ij, the forward diffuse form factor (Equation 3), and T_ij, the backward diffuse form factor (Equation 4). Equation 2 can be written in matrix form as Equation 5. Once Equation 5 is solved, the RSSI signal at any point r in the scene, RSSI(r), is calculated as the contribution of all radiating elements directly visible from r (Equation 6).
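To make the finite elements step more concrete, the following Python sketch solves a small discretised system by fixed-point iteration. It assumes that the discretised relation takes the common form b_i = e_i + ρ Σ_j F_ij b_j + τ Σ_j T_ij b_j with constant ρ and τ; this form, the function name and the toy form factor matrices are assumptions made for illustration and may differ from the exact equations used in this work.

```python
def solve_radiosity(e, F, T, rho, tau, iterations=200, tol=1e-12):
    """Iteratively solve b = e + rho * F b + tau * T b (assumed discretised form).

    e:    emitted radiation of each element.
    F, T: forward and backward diffuse form factor matrices (lists of rows).
    """
    n = len(e)
    b = list(e)  # start from the emitted radiation only
    for _ in range(iterations):
        new_b = [
            e[i] + sum((rho * F[i][j] + tau * T[i][j]) * b[j] for j in range(n))
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new_b, b)) < tol
        b = new_b
        if converged:
            break
    return b


# Toy scene with three elements; only the first one emits (hypothetical values).
e = [1.0, 0.0, 0.0]
F = [[0.0, 0.3, 0.2], [0.3, 0.0, 0.2], [0.2, 0.2, 0.0]]
T = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.0]]
b = solve_radiosity(e, F, T, rho=0.1, tau=0.9)
print(b)
```

Each iteration adds one more bounce of reflected and transmitted signal, so the loop converges when the combined coefficients keep the system contractive, as they do for this toy scene.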

Methodology
This section describes the methods and materials used for data acquisition, the creation of the simulated WiFi fingerprinting data using radiosity, and the machine learning algorithms used to estimate the location of a user at room level.

Scenario description
The floor plan of the scenario used to test the validity of the radiosity method applied to WiFi signals shows a corridor on the second floor of the Languages and Systems Department at Jaume I University, whose dimensions are 33.0 x 30.5 meters. Rooms labelled TI1202 to TI1212 are teachers' offices, room TI1213 is a seminar room, and rooms TI1214 and TI1215 are research laboratories.
In Ayadi et al. (2015) the value ρ = 0.1 is reported for the reflection coefficient. Through experimentation, this value was found to provide the results most similar to the real data, so it is the value used in all experiments. Assuming that the absorption coefficient is negligible in comparison with the transmission coefficient, the transmission coefficient was set to τ = 0.9 (that is, τ = 1 − ρ).

The number of elements in the scene is determined by the element size. The smaller the element size, the more elements in the scene and the better the accuracy of the calculated radiosity map. On the other hand, the more elements in the scene, the longer the algorithm takes to calculate the radiosity map. An element size of 25 cm was used in the experiments.
Four wireless access points (WAPs) were ad-hoc deployed for experimentation.They are represented as black circles in Figure 2. The nominal transmission intensity was set to -20dB for all WAPs.
The following assumptions were made when modelling the scenario:
1. All walls are made of the same material.
2. Doors are made of the same material as walls.
3. There is neither specular transmission nor reflection.
4. The absorption coefficient is negligible.
With these assumptions, the radiosity equation in Equation 1 was applied to simulate WiFi signal propagation.

The radiosity method
The rendering Equation 1 was solved using the finite elements technique and matrix Equation 5, with the assumptions presented in Section 3.1. Four WiFi maps were calculated, one for each of the WAPs shown in Figure 2.
Once the radiosity was calculated for each structural element, the RSSI signal for each point in the space was calculated using Equation 6. Figure 2 shows the calculated WiFi map for the WAP located at office TI1202.Similar results were obtained for the other three WAPs, which are not shown for brevity.

Machine Learning Techniques
The problem of estimating, at room level, a user's position $p \in C \equiv \{c_1, \ldots, c_n\}$, with $c_1 = \mathrm{TI1202}, \ldots, c_n = \mathrm{TI1215}$, given the vector $\mathbf{w} = \{s_1, s_2, \ldots, s_k\}$ of measured RSSI values of the $k$ surrounding WAPs, can be seen as a supervised classification problem. Machine Learning techniques dealing with classification problems can be found in (Alpaydin, 2004; Marsland, 2015).
A data set is needed to train a classifier. For indoor positioning purposes, each element in the data set is made of the vector $\mathbf{w}_j$ of RSSI measures and the room $c_j$ where those were measured, $\{\mathbf{w}_j, c_j\}$, also called the radio map. In this work, six classifiers were used. The first three are the Bayes Network, K-Nearest Neighbours (KNN) and Multi Layer Perceptron (MLP) classifiers evaluated in Section 4.3; the remaining three are:
4. Random Forest (RF): an ensemble of decision tree classifiers. One hundred trees were used (Breiman, 2001).
5. Support Vector Machine (SVM): separates two classes with a maximal-margin hyperplane. Radial Basis Functions were used as the kernel (Cortes & Vapnik, 1995).
6. Sequential Minimal Optimization (SMO): originally developed for training SVMs, it can also be used as a classifier (Platt, 1998).
An ensemble classifier was also used for performance comparison (Alpaydin, 2004). This ensemble is made up of the six classifiers above, and estimates the user's position based on the sum of the probability estimates produced by all six classifiers.
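The combination rule of the ensemble, the sum of probability estimates, can be sketched as follows. The function name, the dictionary-based representation and the probability values are hypothetical; only three base classifiers are shown to keep the example short.

```python
def ensemble_predict(probabilities):
    """Pick the room with the largest summed probability estimate.

    probabilities: one dict per base classifier, mapping room label to the
    probability that classifier assigns to the room.
    """
    totals = {}
    for estimate in probabilities:
        for room, p in estimate.items():
            totals[room] = totals.get(room, 0.0) + p
    return max(totals, key=totals.get)


# Hypothetical outputs of three base classifiers for a single test sample.
votes = [
    {"TI1202": 0.6, "TI1203": 0.4},
    {"TI1202": 0.3, "TI1203": 0.7},
    {"TI1202": 0.2, "TI1203": 0.8},
]
print(ensemble_predict(votes))
```

Summing probabilities, rather than counting votes, lets a confident classifier outweigh several uncertain ones.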
Experimental results using the seven classification algorithms are presented in Section 4.3.

Time series of simulated data
A Gaussian distribution is commonly used to characterize the WiFi signal intensity measured at a single position (Kaemarungsi & Krishnamurthy, 2004). Although a more accurate characterization involves a mixture of Gaussian distributions (Kaemarungsi & Krishnamurthy, 2012), a single Gaussian distribution can be used as a first approximation. [...] and then the inertial factor was applied. The p-value of a Chi-Squared test was greater than 0.99 in all ten experiments, so it can be concluded, with a high confidence level, that the resulting data series followed a Gaussian distribution.
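The sketch below shows one possible way to produce a time-correlated series whose values remain Gaussian around a simulated RSSI value. The first-order autoregressive update and the `inertia` parameter are assumptions made for illustration; they are not taken from the scheme used in this work, and the mean RSSI of -60 dBm is a made-up value.

```python
import math
import random


def simulate_rssi_series(mean_rssi, sigma, n, inertia=0.5, seed=None):
    """Generate n correlated RSSI samples with a Gaussian marginal distribution.

    inertia in [0, 1) is a hypothetical factor: each sample keeps that fraction
    of the previous deviation from the mean. The noise is rescaled so that the
    standard deviation of the series stays equal to sigma.
    """
    rng = random.Random(seed)
    series = []
    value = rng.gauss(mean_rssi, sigma)
    for _ in range(n):
        series.append(value)
        noise = rng.gauss(0.0, sigma)
        value = (mean_rssi
                 + inertia * (value - mean_rssi)
                 + math.sqrt(1.0 - inertia ** 2) * noise)
    return series


series = simulate_rssi_series(mean_rssi=-60.0, sigma=2.02, n=1000, seed=1)
print(sum(series) / len(series))
```

With `inertia=0` the samples are independent draws from the Gaussian; larger values make consecutive samples more similar, as real RSSI readings tend to be.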

Experimental Results
This section first presents how the data sets were acquired, and their main statistics. Then, the data simulated using radiosity is presented and compared with the real data. The classification performance when using real versus simulated data is also compared. Next, the same classifiers are used to study the performance when mixing real and simulated data. Finally, the dependency of the performance on the number of samples used in the training stage is compared for real and simulated classifiers.

Real data
Six data sets were acquired on six different days. The day of the week and the time of the day were different for all six data sets. One hundred measures were acquired at each office, while standing at the centre of the room. All experiments were carried out by the same person with the same smart-phone: Aquaris BQ M5, Android version 6.0.1. Table 1 summarizes the statistics for the RSSI of the WAP with MAC d0:ae:ec:dd:ec:30. Tables for the other MACs are omitted for brevity. The average time to acquire one data set was 120 minutes.
The mean value of standard deviation for all data acquired including all WAPs was σ = 1.64, its maximum value was σ max = 3.14 and its minimum value was σ min = 0.90.

Simulated data
Following the assumptions made in Section 3.1, and with ρ = 0.1 (reflection coefficient) and τ = 0.9 (transmission coefficient), four radiosity maps were generated, one for each deployed WAP. Figure 2 shows the radiosity map for the WAP located at office TI1202. The location of all WAPs is shown in Figure 2.
An Intel i7-4790 CPU at 3.60 GHz with 16 GB RAM and Linux Mint 18.2 was used to generate the radiosity maps. The average time needed to generate a single radiosity map was 70 seconds. Generating a radiosity map is more than 100 times faster than acquiring real data. Moreover, the radiosity map provides data for each point in the floor plan, while manual acquisition provides data only for a set of selected points.
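The speed-up quoted above follows directly from the two reported times, as this short Python calculation shows. The variable names are introduced here for illustration only.

```python
manual_minutes = 120   # average time to acquire one real data set (Section 4.1)
map_seconds = 70       # average time to generate a single radiosity map

# Ratio between manual acquisition time and the time for one radiosity map.
speedup_single_map = manual_minutes * 60 / map_seconds
print(round(speedup_single_map, 1))
```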
The RSSI value read from the radiosity map at the centre of each office was altered following the scheme presented in Section 3.4.
The value used to alter the simulated data was σ = 2.02.
Figure 5 shows the RSSI for the simulated values compared with the RSSI for real data. Each of the four sub-figures shows the data for a particular WAP, identified by its MAC address. In each sub-figure, each point in the dashed upper strip-line is the maximum of the RSSI plus the standard deviation for the corresponding office. For example, the point for office TI1202 in Figure 5-a is $\max_{s \in S}\{\mathrm{RSSI}_s + \sigma_s\}$ over the elements in the first row of Table 1, where $S \equiv \{1, 2, 3, 4, 5, 6\}$. In the same way, each point in the dashed lower strip-line is the minimum of the RSSI minus the standard deviation for the corresponding office, $\min_{s \in S}\{\mathrm{RSSI}_s - \sigma_s\}$. Remarkably, more than 80% of the simulated RSSI values, for the four studied WAPs, lie between these two limits.
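The two strip-lines and the fraction of simulated values lying between them can be computed as in the following Python sketch. The RSSI means and standard deviations are made-up numbers standing in for one row of Table 1, and the function names are hypothetical.

```python
def envelope(means, sigmas):
    """Lower and upper limits for one office over the data sets in S.

    upper = max over data sets of (RSSI + sigma)
    lower = min over data sets of (RSSI - sigma)
    """
    upper = max(m + s for m, s in zip(means, sigmas))
    lower = min(m - s for m, s in zip(means, sigmas))
    return lower, upper


def fraction_inside(simulated, bounds):
    """Fraction of simulated values that fall inside their office's limits."""
    inside = sum(1 for value, (lo, hi) in zip(simulated, bounds) if lo <= value <= hi)
    return inside / len(simulated)


# Hypothetical RSSI statistics of one office for the six real data sets.
means = [-62.1, -60.5, -63.0, -61.2, -59.8, -62.7]
sigmas = [1.2, 1.6, 0.9, 2.0, 1.4, 1.1]
limits = envelope(means, sigmas)
print(limits)
print(fraction_inside([-61.0, -70.0], [limits, limits]))
```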

Performance comparison
The classification task used to compare the performance was a challenging one: only four features were used to estimate the label (office ID) in a 13-class classification problem. It was expected that, even when using real data to build the classifier, the performance would be moderate. However, the objective of the comparison was to assess how good the results provided by the simulated classifiers are compared with the real ones, not to assess the performance of the simulated classifiers themselves.
The following procedure was used to compare the classification performance of real and simulated classifiers: first, seven classifiers were built using one data set as training data (data sets 1 to 6 for real data, and Sim. for simulated data); second, data sets 1 to 6 were used for testing. Tables 2-7 show the results for each classifier.

Table 2 shows the results when using a Bayes Network classifier. The classifier built using simulated data never gave the worst result. The percentage of correctly classified rooms for the simulated classifier is always above 50%, far from a random guess. The performance differences between classifiers using real and simulated data, for the same test set, range between 0.25 and 4.26 percentage points. The mean performance over all tested data sets was 55.18 for the real classifier (see Table 9) and 55.47 for the simulated classifier, which is remarkably close to the real performance.
Table 3 shows the results when using a KNN classifier. The simulated classifier gave the worst results for 4 of 6 test sets, but in only one case was the performance below 50% of correctly classified rooms. The performance differences between classifiers using real and simulated data, for the same test set, range between 1.17 and 11.92 percentage points. The mean performance for the real classifier was 60.35, and the mean performance for the simulated classifier was 52.82.
Table 4 shows the results when using a Multi Layer Perceptron. Again, the classifier built using simulated data gave the worst results in 3 of 6 test sets, but in only one case was the performance below 50% of correctly classified rooms. The performance differences between classifiers using real and simulated data, for the same test set, range between 0.20 and 23.25 percentage points. The mean performance for the real classifier was 62.40, and the mean performance for the simulated classifier was 53.00.
Table 5 shows the results when using a Random Forest classifier. Remarkably, the classifier built using simulated data gave the best results in 2 of 6 test sets, and never gave the worst result. The performance differences between classifiers using real and simulated data, for the same test set, range between 0.72 and 13.31 percentage points. The mean performance for the real classifier was 53.84, and the mean performance for the simulated classifier was 57.86. In this case, the mean performance of the simulated classifier was higher than that of the real classifier.
Table 6 shows the results when using a Sequential Minimal Optimization classifier. The simulated classifier gave the worst results in 2 of 6 test sets, but in these two cases the percentage of correctly classified rooms was above 60%. The performance differences between classifiers using real and simulated data, for the same test set, range between 6.57 and 11.91 percentage points. The mean performance for the real classifier was 73.22, and the mean performance for the simulated classifier was 66.31.
Table 7 shows the results when using a Support Vector Machine classifier. The simulated classifier never gave the worst result with this classifier. The performance differences between classifiers using real and simulated data, for the same test set, range between 0.53 and 12.37 percentage points. The mean performance for the real classifier was 68.98, and the mean performance for the simulated classifier was 64.95.
Table 8 shows the results when using an ensemble classifier that combines the results of all six previous classifiers. The ensemble built using simulated data gave the worst result in one case only. The percentage of correctly classified data was 68.92%, which means roughly 2 of 3 rooms correctly classified on average. The performance differences between classifiers using real and simulated data, for the same test set, range between 2.55 and 11.13 percentage points. The mean performance for the real classifier was 70.36, and the mean performance for the simulated classifier was 65.96.
Table 9 shows a summary comparing the average performance of real and simulated classifiers, and their differences. The last row in Table 9 shows the difference between the averaged values. The biggest difference was 7.35 for the KNN classifier and the smallest was -4.02 for the RF classifier, for which, remarkably, the simulated classifier performs better when averaging all results. In the case of the Ensemble classifier, the mean performance value is 65.96 ± 0.08; that is, the Ensemble correctly classifies 2 of 3 test samples. The difference in the mean between real and simulated classifiers for the Ensemble classifier is 4.40, which shows that simulated data generated using the radiosity algorithm provides accurate results when used to build indoor positioning classifiers.

Combining real and simulated data
In this section, the performance of classifiers built by combining real measures with simulated data is analysed. The objective of these tests was to assess whether it is possible to improve the performance of already built classifiers by adding new simulated data to the original training data set, without taking new real samples.
For each experiment, one hundred simulated samples were added to one hundred real measures, so the total number of elements in the training set was two hundred. After that, the performance was measured following the same procedure as in Section 4.3. For the sake of brevity, only average results are presented.
Table 10 shows a summary comparing the performance of real and simulated classifiers, and their differences. All results improved with regard to those presented in Table 9, and the difference became narrower in all cases except for the Bayes Network classifier. In the case of the Ensemble classifier, the mean performance value was 78.18 ± 0.07, so more than 3 of 4 test samples were correctly classified on average. The difference with regard to the real Ensemble classifier was 1.89 ± 0.08, which is less than two percentage points.

Leave-one-out performance comparison
This set of experiments compares the results when building each classifier leaving one of the six data sets out of training, and using the left-out data set for testing. Five hundred measures were used to build each classifier. In the case of the simulated classifier, five hundred simulated measures were generated following the scheme presented in Section 3.4. Table 11 shows the leave-one-out experimental results. The percentage of correctly classified rooms for the simulated classifier is always below that of the corresponding real classifier. The smallest difference between real (78.08%) and simulated (76.77%) classifiers was for the Random Forest classifier when the testing set was number 6, which is less than two percentage points.
Although there is an improvement when more data is used to build real classifiers, there is no clear improvement in the case of the simulated classifiers.
The next section studies how generalization improves with regard to the size of the training data.
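The leave-one-out splits used in this section can be enumerated with a few lines of Python. This is a minimal sketch of the splitting step only; the function name is hypothetical and no classifier is trained.

```python
def leave_one_out(dataset_ids):
    """Yield (training_ids, held_out_id) pairs, holding out one data set at a time."""
    for held_out in dataset_ids:
        training = [d for d in dataset_ids if d != held_out]
        yield training, held_out


splits = list(leave_one_out([1, 2, 3, 4, 5, 6]))
for training, held_out in splits:
    print("train on", training, "test on", held_out)
```

Each split trains on five of the six real data sets and tests on the remaining one.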

Generalization with regard to the number of samples in the training set
The performance of a classifier depends on how well it is classifying new data, in other words, how well it generalises when classifying new data.It is desirable that a learning algorithm will improve its performance with experience, namely, when the number of samples in the training set increases (Flach, 2012).
The results presented in this section study how performance behaves when increasing the size of the training data set for real and simulated classifiers.
Performance was compared when the size of the training set was increased by one hundred new samples at each step. In the case of real measurements, this was done by simply adding a new real data set to the previous training data. In the case of the simulated data, this was done by generating a new simulated data set of one hundred samples, and adding it to the previous simulated data set.
Results are shown in Table 12. The first column in this table refers to the size of the training data set: 1 means that only the samples in data set 1 were used to train the classifier, 12 means that samples in data sets 1 and 2 were used to train the classifier, and so on. Sim100 refers to a training data set composed of 100 simulated samples, Sim200 refers to a training data set composed of 200 simulated samples, and so on.
According to the results in Table 12, there is no clear increase in performance when increasing the number of samples used in the training phase. However, if only the results for 100 samples and 500 samples are compared, there is a clear improvement for all classifiers.
Figure 6 shows the particular cases for the Bayes Network and MLP classifiers. The Bayes Network real classifier clearly follows a linear trend with positive slope with respect to the size of the training data set. On the contrary, when using simulated samples, the trend exhibits a negative slope. In the case of the Random Forest classifier, real and simulated training data sets show a linear trend with positive slope when increasing the size of the data sets used in the training stage, and the simulated case even exhibits a larger slope. Although for some classifiers the performance improves when more samples are used in the training data set, this is not true in general.
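The linear trends discussed above can be quantified with an ordinary least-squares fit of accuracy against training set size. The sketch below is only illustrative: the function name is hypothetical, and the accuracy values are placeholders chosen to show a positive and a negative slope; they are not the values of Table 12.

```python
def linear_trend(sizes, accuracies):
    """Ordinary least-squares fit: accuracy = slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, accuracies))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training set sizes used in this section (100 to 500 samples).
sizes = [100, 200, 300, 400, 500]

# Placeholder accuracies (%), NOT taken from Table 12.
placeholder_real = [70.0, 71.0, 72.0, 73.0, 74.0]
placeholder_sim = [70.0, 69.5, 69.0, 68.5, 68.0]

for name, acc in (("real", placeholder_real), ("simulated", placeholder_sim)):
    slope, intercept = linear_trend(sizes, acc)
    print(f"{name}: slope = {slope:+.4f} points per sample, intercept = {intercept:.2f}")
```

A positive slope indicates that the classifier benefits from more training samples, while a negative slope corresponds to the behaviour described for the simulated Bayes Network classifier.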

Discussion
The radio propagation model for indoor positioning presented in Deasy & Scanlon (2007) uses an empirical model based on the radio signal absorption by walls. The final absorption is obtained by counting the number of walls between the radio signal emitter and the observer; no inter-reflection between walls is taken into account, as the radiosity model presented in this paper does. The work in Han et al. (2014) presents a ubiquitous application for indoor navigation based on the interpolation of the WiFi RSSI signal between sample points; although this approach reduces the number of samples, and so the acquisition time, it still needs some manual data acquisition. The same interpolation approach, but using a different technique, is presented in Gu et al. (2016), where a sparsity rank singular value decomposition is used to interpolate the WiFi RSSI signal at sampled points; again, although the number of sampled points is reduced, the solution still needs some manual data acquisition. In Ayadi et al. (2015) the authors compare three different empirical models to study their appropriateness to the ITU accuracy statistics recommendation for a 2.4 GHz indoor test environment, but they do not apply their conclusions to develop any application in the expert systems realm. In this paper, the data provided by the radio map obtained with the radiosity model is used to develop a positioning system based on machine learning algorithms commonly used in the expert systems realm.
Compared with previous works, the main strength of this work is that it completely removes the manual acquisition step in the offline development of a positioning system, which in turn dramatically reduces the time needed to develop it. Also, once the WiFi radio map is generated, any number of sample points can be taken to build machine learning algorithms. Any change in the environment, such as the addition, removal or relocation of a WiFi access point, can be easily taken into account by building a new WiFi radio map for the access point involved. Having the complete WiFi radio map might be a valuable tool for domain experts developing expert systems applications based on indoor location information.
The weaknesses of the presented work lie in the information needed to create the radiosity map. A precise floor map of the area is required; otherwise, the generated radiosity map could be degraded. An accurate measure of the absorption of the wall materials is also important to obtain an accurate radiosity map.

Conclusions
This work has explored how to reduce, or even completely remove, the calibration stage when building radio maps for indoor positioning. As short-term future work, we plan to adapt our radiosity implementation to take advantage of the power of modern GPUs, which could reduce the time consumed to generate the radiosity map by two orders of magnitude. This improvement would provide near real-time tools for Expert Systems applications based on positioning information; as an example, this could allow domain experts to accurately fix the position of WiFi access points to maximize the accuracy of the positioning algorithms. In the medium term, we plan to extend the radiosity algorithm to three dimensions, which could provide better radiosity maps at the expense of more computation; again, we could use a GPU implementation of the 3D radiosity to reduce processing time. As one result of this work shows, mixing real and radiosity information improves the accuracy of positioning algorithms, so in the medium term we also plan to use a robot to take real samples without any manual intervention. Finally, in the long term, and for those cases where the floor map is not available, we plan to use artificial intelligence algorithms, based on the information provided by the radiosity map, to estimate the position of the walls in the area of interest.
to calculate WiFi maps for localisation purposes, and the results presented underestimate the signal strength by up to 15 dBm. A combination of path loss WiFi modelling, Kalman filtering and RFID beacons is used in Chiou et al. (2010) to estimate the position of a user. Authors in Ali et al. (2017) use the floor plan/wall map and a path-loss model for WiFi signal propagation to estimate the WiFi signal intensity at any point on the floor plan.
work, two different training data sets were used to build a classifier: a) the data set with measured RSSI, hereinafter named the real classifier; and b) simulated data provided by the generated radiosity maps, hereinafter named the simulated classifier. This way, the performance of the two classifiers can be compared to test the validity of our initial hypothesis. Six well known and widely used classifiers (Kotsiantis (2007); Wu et al. (2008)) were used to compare the performance of real and simulated classifiers:
1. Bayes Network (BN): probabilistic graphical classification algorithm based on Bayes' rule (Pearl (2014)).
2. K Nearest Neighbours (KNN): finds the k elements in the training set nearest to the test sample, and estimates the class of the test sample based on the minimum distance. Euclidean distance was used, with k = 1 (Sillverman et al. (1951)).
3. Multi Layer Perceptron (MLP): a Neural Network with one or more hidden layers. Eight neurons were used in the only hidden layer, which corresponds to the expression #attributes+#classes

Figure 3 (left) shows the distribution of a WiFi signal for 100 samples. Moreover, the WiFi signal varies over time for any fixed point in space. This is mainly due to interference with other electromagnetic signals, and to fluctuations in the emitting WAP and in the receiver's antenna, just to cite a few. Figure 4 (a) shows a time series of a WiFi signal for a single position. When simulating data it is important to mimic both behaviours: simulated data must follow a Gaussian distribution, and its time series must mimic real data.

To mimic the first behaviour, the mean and standard deviation were estimated from real data. Maximum Likelihood was used to fit the real measured RSSI to a Gaussian distribution. For the particular set of real data shown in Figure 3 (left) the fit provides µ = −68.54 ± 0.09 and σ = 0.94 ± 0.07. Figure 3 (right) shows the histogram obtained for a set of 100 samples randomly generated using a Gaussian distribution with the former values for the parameters. Although both histograms may seem similar, they do not describe the RSSI variation over time. Figures 4 (a) and (b) show the time series of real data and of simulated data following a Gaussian distribution with the mean and standard deviation values obtained when fitting the real data. It can be noted that although both data sets have the same Gaussian distribution, the time series are quite different. For real data, the same WiFi signal intensity might remain unchanged for some consecutive measures. On the contrary, the simulated data changes with almost every new simulated measure.

To mimic the second behaviour, an inertial factor was introduced. This factor keeps the last simulated intensity for a random number of following measures. From real data, it was estimated that each measure remains unchanged for 80% of the samples, on average. Figures 4 (b) and (c) show a time series of simulated data without inertia (b) and with inertia (c). Through experimentation, it has been checked that the resulting distribution followed a Gaussian distribution. Ten different experiments were performed. For each experiment, 1000 samples following a Gaussian distribution with µ = −68.54 and σ = 0.94 were generated,

show the percentage of correct estimates for each classification algorithm used. For each row, the data set in the leftmost column was used as training data when building the classifier, and all data sets in the other columns were used to test the classifiers. The elements in the diagonal, which correspond to the result when the training and test data sets are the same, are omitted. The row with label Avg. corresponds to the averages of all six elements in the same column (same testing data set). The row with label Sim. corresponds to the classifier built using simulated data. The last column shows the performance average for all five data sets used for testing with the same training set.
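The simulation scheme described above (a Gaussian fit by maximum likelihood, followed by an inertia factor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, and the inertia is modelled by repeating the previous reading with a fixed probability of 0.8 at each step, which is only one possible way of keeping a value for a random number of measures.

```python
import math
import random


def fit_gaussian_ml(samples):
    """Maximum-likelihood estimates of the mean and standard deviation."""
    n = len(samples)
    mu = sum(samples) / n
    # The ML estimator of the variance divides by n (not n - 1).
    sigma = math.sqrt(sum((s - mu) ** 2 for s in samples) / n)
    return mu, sigma


def simulate_rssi(mu, sigma, n, keep_prob=0.8, rng=None):
    """Generate n RSSI readings (dBm) from a Gaussian with inertia.

    Assumption: at each step the previous reading is repeated with
    probability `keep_prob`; otherwise a new Gaussian value is drawn.
    """
    rng = rng or random.Random()
    series = [rng.gauss(mu, sigma)]
    while len(series) < n:
        if rng.random() < keep_prob:
            series.append(series[-1])           # inertia: keep the last intensity
        else:
            series.append(rng.gauss(mu, sigma))  # draw a new intensity
    return series


# Parameters reported for the real data in Figure 3 (left).
rng = random.Random(42)
series = simulate_rssi(-68.54, 0.94, 1000, rng=rng)
mu_hat, sigma_hat = fit_gaussian_ml(series)
print(f"fitted mu = {mu_hat:.2f} dBm, sigma = {sigma_hat:.2f} dBm")
```

Fitting the simulated series again with `fit_gaussian_ml` gives a simple check that the inertia factor does not noticeably change the estimated mean and standard deviation.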
The proposed alternative to sampling the WiFi RSSI signal at different positions to create the radio map is to calculate this radio map based on the radiosity model, which describes radio signal propagation for indoor scenarios in the presence of obstacles like walls and doors. Regarding the time consumed to generate simulated data sets, it is one hundred times faster to generate data using the radiosity model than with manual acquisition. Moreover, the radiosity map provides the RSSI level for each point in the floor plan, while manual acquisition provides data for the sampled points only. Additionally, removing manual data acquisition reduces the cost of creating WiFi maps. This might ease the development of positioning-based Expert Systems in big scenarios where WiFi sampling is a highly time-consuming task (Casas et al. (2007); Han et al. (2014)). Experimental results, based on well known machine learning algorithms commonly used in expert systems development, showed that the accuracy of the presented method is close to that of manual acquisition of data. Even in those cases where positioning systems are already working, the results presented in this paper show that adding new samples from the radiosity map to real samples improves the final accuracy by almost 10% for the case of an ensemble of classifiers. The implication for already developed ubiquitous and pervasive applications based on positioning information is that it might be possible to improve their performance by adding new radiosity simulated data.

Figure 1: Geometry for the surface elements S and S'. The length of element S' is L_i.

Figure 2: RSSI map generated using the radiosity method. WAP was located at office TI1202. Colours represent intensities in dBm. Black points show the position of the four WAPs used in the experiments.

Figure 3: Histograms of the WiFi signal intensity for real data (left), and simulated data (right), for 100 samples. The values for simulated data were µ = −68.54, σ = 0.94.

Figure 4: Temporal series of real data (a), simulated data (b), and simulated data with hysteresis (c). For simulated data the normal distribution was generated taking µ = −68.54 and σ = 0.94. The inertia factor keeps the current data 80% of the time on average.

Figure 6: Overfitting comparison between classifiers built with real and simulated data.

Table 2: Performance comparison results for the real and simulated Bayes Network.