FUZZY APPROACH TO DEVICE LOCALIZATION BASED ON WIRELESS NETWORK SIGNAL STRENGTH

The paper presents an original approach to device location detection in a building. The new method is based on a map of individual interiors, drawn up based on the measurements of the strength of wireless network signals for each building venue. The device is initially assigned to all venues whose descriptions sufficiently correspond with the current measurements taken by the device. A fuzzy assignment level for each of the potentially considered venues depends on the difference between the averaged network strengths for the venue and the signal strengths currently measured with the device for localization purposes. Ultimately, the device is assigned to the venue with the highest level of assignment.


Introduction
The use of location tracking is becoming increasingly popular in many services used on a daily basis. Traffic navigation or location context services that can be used outdoors are mainly based on GPS [5]. There are also many possible uses of indoor location tracking. A good example is guidance apps at museums [8,9]. A museum app can contextually present descriptions of nearby exhibits. It can also provide an indoor map supported with information about the user's current position. Location tracking can also be useful for disabled people and in health care buildings [2,14]: monitoring people entering or leaving restricted areas and venues, locating equipment, searching for people, and guiding services. Unfortunately, GPS localization does not work properly indoors, so there is a strong need to provide indoor location tracking in a different manner. Many solutions for indoor localization have been developed so far, including systems based on infrared sensors [15,18], ultrasound [16], or magnetic fields [4,6]. In recent years we have seen the growing popularity of Bluetooth-based location systems named iBeacons [17]. However, these systems require the installation of additional devices.
On the other hand, WiFi-based location tracking is also becoming popular: as there are many WiFi access points in offices, such location tracking should come at no additional cost. Since RADAR [1], many WiFi-based indoor localization solutions have been proposed and developed [2,7,9,11,12,13,14]. In general, WiFi location tracking uses a map as a reference. There are many types of maps, which can be divided into discrete and continuous maps [10]. Continuous maps allow determining the position as a point on the map. Such precision in WiFi location tracking can be obtained only under certain conditions: homogeneous devices, defined obstacles, no interference from people. It is useful when the precise position has to be known, e.g. to control a robot [11]. A discrete map usually defines certain zones which are treated as positions on the map. Such location tracking is less precise (because we cannot determine the position, only the zone), but higher location precision cannot be reliably achieved using the most popular WiFi devices (different radio parameters, different measurement techniques).
The continuous approach usually does not require the venue map preparation stage. The system should have information about the location of access points, the structure of obstacles, and the strength of the signal in the current position; based on this information, it determines the location in the venue [11]. However, the calculation is vulnerable to errors due to many factors, such as weather conditions, interference from human bodies, and different equipment [7].
Discrete positioning usually requires building a pre-learned set of fingerprints to infer the position of a device. There are many solutions to this issue: hidden Markov models [12], Bayesian filtering [21], clustering techniques [19], and genetic algorithms [3].
WiFi-based localization is currently easy to implement in many buildings using existing infrastructure. Discrete positioning seems to be a good choice considering the variety of WiFi devices, the precision of signal strength measurement, and the most common needs. The main contribution of this paper is a new discrete, WiFi signal-based positioning algorithm supported by an SQL database and dedicated queries. The new method is based on a pre-learned set of fingerprints of defined locations. It uses fuzzy logic to determine the location, and the SQL database and queries operate directly on the collected data. Additionally, it is possible to regularly update the data (location fingerprints), and the operation does not require restarting the system. In contrast to the mentioned solutions, the location tracking is based on the rooms specified by the system operator, not on the zones calculated by the algorithm. As a result, it is possible to correct a location tracking procedure or a venue map definition manually. From the functional side, we receive specific information about whether the user is in a given room, whether the user is missing, or whether their location cannot be determined.
The paper presents only the logical aspects of the SQL-based fuzzy location procedure: the process of building the map and the algorithm of fuzzy location. The other aspects of the device location, such as software and hardware requirements, database structure, and SQL queries, are not provided.
The paper is organized as follows: it starts with a short description of existing approaches to device location tracking, with a discussion of the necessity of a new solution; afterward, the definition of the building map is presented, which is based on the wireless network signal strengths measured in the venues in question; the next part presents the fuzzy location tracking procedure in detail, with a simple example incorporating artificial data; finally, the experiments and their results are described. The paper ends with conclusions and possibilities of future work.

Related works
As was already mentioned in the previous section, there are many known implementations of WiFi-based indoor location tracking procedures. One of the first and most frequently mentioned is a system called RADAR [1]. In that system, a k-NN classifier was used to decide whether a person is in a discrete location (close to a previously measured point). The authors also considered the direction of the localized person in relation to the transmitter. Eventually, the direction was considered unnecessary.
Another work on WiFi localization is related to SLAM (Simultaneous Localization and Mapping) systems [11]. The authors aimed to build a solution that does not need to use fingerprint maps (building a fingerprint map may be an expensive task). Their solution has to obtain a precise position, which is used to navigate the robot. The position is evaluated based on the measured signal strength of several WiFi sensors. Such algorithms also need to evaluate the signal loss due to propagation and obstacles.
A fingerprint-based solution is the hierarchical, topology-based WiFi indoor localization [10]. The test environment is built and then split into smaller sub-zones with a reduced number of WiFi access points (AP) and reference positions to be identified. The hierarchical partition of the map is created using a k-means clustering algorithm and the Calinski-Harabasz index. The authors tried different classifiers: k-NN, SVM, FURIA. According to the authors, the results are superior to those of the RADAR system. The hierarchical approach makes the solution scalable from small to rather large environments (several floors, many venues).
The authors of the Fuzzy Logic Based System [7] focus on measurement issues. The signal level can be influenced by weather conditions, nearby devices, obstacles, people in rooms, and different radio signal frequencies. These all lead to drops and peaks in the signal strength of access points. The authors propose several pre-processing and post-processing techniques to improve the quality of measurements. They also propose a fuzzy calculation approach to position detection.

Building map
The natural solution to the issue of determining the current position is to measure distances from reference points of known coordinates. Based on these data and after some necessary calculations, the location can be specified. Generalizing the problem, it can be stated that the reference points impose (constitute) a local coordinate system in relation to which further analyses will be carried out. In the defined area, it is possible to measure the distance to reference points and in this way determine the coordinates of a point in the local coordinate system.
A generalized localization case consists of two stages. The first stage is to define the local coordinate system and determine the reference points appropriate to the local system. The second stage is to compute the relation between the measuring point and the reference points. Finally, after making the calculations, the measuring point receives its coordinates in the local coordinate system.
The first stage involves installing the infrastructure, e.g. setting reference points. This stage is referred to as defining a map of a building covered by the location system.
The process of making the map involves selecting rooms in the building. Not all rooms in the building were included in the building map. The selection of rooms was conducted according to the following criteria:
• proximity of the rooms on the same floor (horizontal distance);
• proximity to the rooms on different floors (vertical distance);
• different types of walls separating the rooms;
• distant rooms with multiple separating walls;
• rooms between which there is visibility through the windows;
• large rooms.
A large room was divided into 4 subzones. This selection of rooms aimed at using the results collected during the experiment for later analysis.
The data collection procedure was as follows:
1) A person enters the room and stops near its centre, i.e. at the reference point. (The room divided into 4 subzones is an exception: the procedure was performed separately for each subzone.)
2) All WiFi signals detected by the phone are measured (this step required a dedicated application for Android OS).
3) Measurements (WiFi networks, strengths, room, device id, etc.) are stored in the database.
Additionally, the order of measurements in the location and the order of locations are saved.
The sample data are presented in Table 1, where:
• id: database technical row identifier,
• device_id: identifier of the device used to provide measurements,
• BSSID: network identifier,
• point_id: reference point identifier which represents a room,
• strength: signal strength in dB measured at the reference point,
• m_in_point: order number, grouping rows in one measure probe and numbering according to the reference point,
• m_in_total: order number, the same as m_in_point but numbering according to all gathered samples.
For better generalization of measurements, five different devices operating under Android OS were used. Data acquisition was performed at different times and at various intervals to avoid contaminating the measurement results with particular features of the environment in which the measurements were made. Additionally, the measurements were taken manually, so the position of the measuring device relative to the designated reference point in the room changed slightly with each measurement. Over 30 measurements were made at each point. This was a consequence of the variability of the measured WiFi signal strength, which results from many factors independent of the measurement process and from the WiFi technology itself.
After 30 individual measurements had been taken at all reference points, each of them was described by the following dataset:
• BSSID,
• average signal strength,
• standard deviation of the signal strength.
This dataset defines the coordinates of the reference point in the map space. It should be emphasized that due to the attenuation of the WiFi signal, the sets describing reference points contain a different number of BSSIDs. Of course, the average signal strength and standard deviation are also different. Table 2 presents sample descriptions of two reference points.
Finally, after collecting all the measurements for each reference point, the coordinates identifying the point in the building space were determined using the WiFi signal strength. For each reference point, the average and standard deviation of the WiFi signal strength of each WiFi network available at that point were determined. The networks were identified by BSSID. As a result, each reference point is identified by the coordinates resulting from all the WiFi networks available at the point. Additionally, for each average signal strength, the standard deviation is calculated; it is used in the localization procedure.
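The aggregation of raw measurements into reference point descriptions can be illustrated with a short Python sketch. It is only an illustration, not the SQL implementation used in the system: the function name `build_map`, the tuple layout, and the sample readings are assumptions, and the use of the sample standard deviation is also an assumption, since the estimator is not specified in the text.

```python
from collections import defaultdict
from statistics import mean, stdev

def build_map(measurements):
    """Aggregate raw readings into per-point fingerprints.

    measurements: iterable of (point_id, bssid, strength_db) tuples.
    Returns {point_id: {bssid: (mean_strength, std_strength)}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for point_id, bssid, strength in measurements:
        grouped[point_id][bssid].append(strength)

    fingerprints = {}
    for point_id, networks in grouped.items():
        fingerprints[point_id] = {
            # Sample standard deviation is an assumption; a single reading gets 0.0.
            bssid: (mean(values), stdev(values) if len(values) > 1 else 0.0)
            for bssid, values in networks.items()
        }
    return fingerprints

# Hypothetical readings for two reference points (not the data of Table 1).
sample = [
    ("R1", "AP1", -40), ("R1", "AP1", -44), ("R1", "AP2", -70),
    ("R2", "AP1", -60), ("R2", "AP1", -62), ("R2", "AP1", -64),
]
building_map = build_map(sample)
```

Each reference point ends up with its own set of BSSIDs, which reflects the remark that the descriptions contain a different number of networks.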

Fuzzy location procedure with an example
In the research, the fuzzy representation of venue assignment was assumed. The notion of fuzziness, proposed over 50 years ago in [20], is one of the most common approaches to uncertainty representation. Such an ability to process uncertainty is a significant advantage when variable signals, like the WiFi network strength, are being analyzed.
The positioning of the device is similar to the map development procedure described earlier. Unlike map creation, positioning is an unattended process and is performed once for each location determination attempt. At the time of the measurement, the position of the device (the person holding the telephone) is unknown, and the measurement is performed once or several times at short intervals. Repeating the signal strength measurement at intervals of several seconds is intended to reduce the impact of WiFi signal variability. As a result, the networks whose signals are very weak (or cannot be detected by the devices) are eliminated. Data concerning networks and signal strengths collected by the device that wants to know its location are compared with the data describing the reference points. The similarity of the unknown-location point to the reference points (for which the location is known) is determined. As a result of the algorithm, we get a list of similarities. After sorting, the reference point nearest to the measuring device (smartphone) is at the top of the list. This reference point is considered the device position. The presented fuzzy location procedure will now be described in more detail.
All visible WiFi networks are read at the unknown location point. The networks are identified by their BSSIDs. For each network, the signal power that reaches the measuring device from the access point is read. An example dataset is provided below in Table 3. The process of measuring and determining the position of the device can be presented in several steps described below. Prior development of the map is necessary for determining the position, as described above. In order to illustrate the position determination process, we assume that there is a map prepared for the building fragment as shown in Fig. 1.
Points R1 to R6 are the map reference points; example descriptions of these points are presented in Table 4. Point X is the measuring point for which the location will be determined; an example measurement made here is shown in Table 5. Points AP1 to AP7 represent the locations of the wireless network access points. AP5 to AP7 are located outside the analyzed area, but the signals of these access points are visible in the area covered by the map. In the example tables, short identifiers are used instead of full network BSSIDs to increase the readability of the example. Similarly, most numbers are represented as integers.
The fuzzy location procedure consists of the following steps:
Step 1: A person who wants to determine their position in the building goes to any place covered by the map. Using the mobile device, the person measures the WiFi networks visible there. Together with the network BSSIDs, the strengths of individual signals are read. An example measurement taken in room 4 at point X can provide the results presented in Table 5.
Step 2: The collected results are compared with the building map. The quantity condition of the network is checked first. For each reference point, we compare the set of network identifiers assigned to that point with the set of network identifiers visible at point X. Next, we determine the common part of these two sets. Using the sample data, it can be seen that the measurement point X and the reference point R3 have three common network identifiers: AP1, AP4. In the case of X and R4, common identifiers are: AP1, AP3, AP4, AP6.
Step 3: For each reference point, a network factor is determined that reflects the ratio of found networks (in reference to the number of networks assigned to the reference point). If the factor takes on a value of more than 0.5, the reference point is taken into account in the next steps. Otherwise, the reference point is rejected and is not included in further calculations. This approach reduces the computational complexity of position determination, but at the possible cost of decreased location accuracy (or increased localization time).
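The network factor for a pair consisting of the measurement point X and the reference point $R_j$ is determined by the formula:

$$nf_{X,R_j} = \frac{|N_X \cap N_{R_j}|}{|N_{R_j}|} \qquad (1)$$

where $N_X$ is the set of network identifiers visible at point X and $N_{R_j}$ is the set of network identifiers assigned to the reference point $R_j$. For the sample data, the factor summary is presented in Table 6. Based on the network factors, reference points R1, R2 and R3 will be rejected from further processing.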
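The network factor and the rejection rule of Step 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the reference points `RA` and `RB` and their network sets are invented for the example and do not reproduce Tables 4 to 6.

```python
def network_factor(measured_ids, reference_ids):
    """Share of the reference point's networks that are also visible at X."""
    reference_ids = set(reference_ids)
    if not reference_ids:
        return 0.0
    common = set(measured_ids) & reference_ids
    return len(common) / len(reference_ids)

def candidate_points(measured_ids, reference_map, threshold=0.5):
    """Keep reference points whose network factor exceeds the threshold."""
    factors = {
        point: network_factor(measured_ids, ids)
        for point, ids in reference_map.items()
    }
    return {p: f for p, f in factors.items() if f > threshold}

# Hypothetical network sets (not the values of the paper's tables).
measured = {"AP1", "AP3", "AP4", "AP6"}
reference_map = {
    "RA": {"AP1", "AP2", "AP4", "AP5", "AP7"},  # 2 of 5 networks found
    "RB": {"AP1", "AP3", "AP4", "AP6", "AP7"},  # 4 of 5 networks found
}
kept = candidate_points(measured, reference_map)
```

With these invented sets only `RB` passes the 0.5 threshold, so only that point would enter the distance calculation.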
Then, for the remaining reference points, the strengths of the network signals from the measurement point set are analyzed, along with the mean signal strengths and the standard deviations of the networks in the set for each reference point. For each pair of sets, the distance factor is determined according to the following formula:

$$d_{X,R_j} = \sum_{i=1}^{n} a_i(X, R_j) \cdot c_i(X, R_j) \qquad (2)$$

For each considered network $i$, the product of two functions is computed. The first, $a_i(X, R_j)$, can be called the reference point assignment and is defined as the fuzzy assignment of the measurement point X to the reference point $R_j$. It depends on $\bar{s}(R_j, i)$, the average signal strength for the network $i$ from the description of the reference point $R_j$; on $s_i$, the signal strength measured for the network $i$ at the measuring point X; and on $\sigma(R_j, i)$, the standard deviation of the signal strength for the network $i$ from the description of the reference point $R_j$.
In the early stages of location procedure development, many different fuzzy assignment functions were taken into consideration: triangular, piece-wise linear, trapezoidal, trigonometric (cosine and arctangent-based), and many more. However, from the point of view of location accuracy and implementation complexity (SQL queries), the presented solution gave the best results.
The second function, c, represents the signal strength correction factor. It was introduced because the results obtained with a distance based only on the reference point assignment function were unsatisfactory. This was mostly due to the fact that the radio signal strength results from many other factors, such as transmitter power, the output of the transmitting and receiving antennas, attenuation at the receiver and transmitter, and signal reflections (amplifying or weakening the signal at the receiver). The final form of the correction factor was obtained through trial and error.
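The structure of formula (2), a sum over networks of an assignment value multiplied by a correction value, can be sketched in Python. The concrete functions below are assumptions made only for illustration: a Gaussian-shaped assignment and a constant correction factor stand in for the paper's functions, whose exact forms are not reproduced here. Summing over the networks common to the measurement and the reference point is likewise an assumption.

```python
import math

def gaussian_assignment(measured, mean, std):
    """Illustrative fuzzy assignment in (0, 1]; NOT the paper's function."""
    std = max(std, 1.0)  # assumed floor to avoid division by zero
    return math.exp(-((measured - mean) ** 2) / (2 * std ** 2))

def no_correction(measured, mean, std):
    """Placeholder correction factor (assumption): always 1."""
    return 1.0

def distance_factor(measurement, fingerprint,
                    assign=gaussian_assignment, correct=no_correction):
    """Sum of assignment * correction over networks present in both sets.

    measurement: {bssid: strength}; fingerprint: {bssid: (mean, std)}.
    """
    total = 0.0
    for bssid, strength in measurement.items():
        if bssid in fingerprint:
            mean, std = fingerprint[bssid]
            total += assign(strength, mean, std) * correct(strength, mean, std)
    return total

# Hypothetical measurement and reference point description.
x = {"AP1": -50, "AP3": -60, "AP9": -80}
ref = {"AP1": (-50, 4.0), "AP3": (-66, 3.0)}
d = distance_factor(x, ref)
```

Passing the two functions as parameters makes it easy to try other shapes (triangular, trapezoidal, and so on) without changing the summation.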
Step 4: After performing the above calculations, the determined distance factor is corrected with the previously calculated network factor $nf_{X,R_j}$. The correction is applied separately for each reference point. As a result, we obtain a measure of matching the measurement point X to the individual reference points. As all the obtained room assignment levels have positive, unbounded values, they must be normalized. The range from 0 to the maximum assignment level over all reference points is scaled linearly to [0;1].
The results showing the measure of matching the measuring point X to the map reference points are presented in Table 7. After sorting the table according to the value of the matching measure from the highest to the lowest, we obtain the location of the measurement point in the first place. In the example, this is the reference point R4, i.e. based on the performed calculations it can be concluded that the measurement was conducted in room 4. The obtained results can be filtered by rejecting the values of the matching measure which did not exceed a limit value. Such filtering, however, requires additional research that will make it possible to set the results rejection threshold deliberately. Without such filtering, the order in the list shows the distance of the measuring point from the reference points. The measuring point is located closest to the reference point in the first position and farthest from the reference point in the last position.
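The normalization and ranking of assignment levels can be sketched in Python as follows. The function name and the raw values are hypothetical and do not reproduce Table 7; the sketch only shows the linear scaling to [0;1] and the selection of the best-matching reference point.

```python
def normalize_levels(levels):
    """Linearly scale non-negative levels from [0, max] to [0, 1]."""
    top = max(levels.values(), default=0.0)
    if top <= 0:
        return {point: 0.0 for point in levels}
    return {point: value / top for point, value in levels.items()}

# Hypothetical raw assignment levels for the reference points that passed Step 3.
raw = {"R5": 1.6, "R4": 3.2, "R6": 0.8}
normalized = normalize_levels(raw)

# Sort from the highest to the lowest level; the first entry is the location.
ranking = sorted(normalized.items(), key=lambda item: item[1], reverse=True)
best_point = ranking[0][0]
```

An optional rejection threshold could be applied to `ranking` before taking the first entry, but its value would have to be chosen experimentally.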

Data acquisition
The building in which the procedure was developed and applied is a three-storey one with several dozen rooms on each floor. In the experiments, 20 of them were taken into consideration. One of them was the conference hall, which was divided into four separate areas, so the final number of locations was 23. The criterion for selecting these rooms was the frequency of changes of the monitored people's locations.
As it turned out, over 100 different wireless network IDs were detected inside these rooms. In each room, 30 measurements of wireless network signal levels were conducted. These measurements were taken using several different types of mobile phones. The highest number of detected signals in a room was almost 35. This implies that the raw data were rather sparse. It also confirms that a method that is not resistant to missing values could not be applied to the presented problem.
Finally, the collected dataset consisted of 690 objects and over 100 features. The measured signal strength was expressed in dB and, as is customary in wireless networking, the signal strength is a negative number: as the level of the signal decreases, the value of the strength decreases too.
A sample part of the collected data is presented in Table 8. The mes_id column represents the specific measurement identifier, the dev_id column identifies the specific measurement device, columns from n001 up to n100 correspond to wireless networks, and finally, ven_id is the venue identifier in the building. The collected data, due to the nature of the environment (dozens of networks and hundreds of measurements detecting only several networks), are characterized by a very high number of missing values. Only approximately 6% of the values were non-missing. For that reason, the application of the most popular multiclass classifiers was impossible.

Simple experiments
All experiments were carried out using a leave-one-out/stratified cross-validation scheme. In each of the 30 iterations, exactly one measurement from each venue was moved to the test set. This ensured that each of the fuzzy maps of venues was built on 29 measurements. It also ensured that the proportion of venue representation in the training and test sets was equal. The proposed scheme also ensured that the prediction accuracy, measured as the fraction of correctly classified positions among all classified positions, is equal to the class-weighted accuracy. Finally, it is possible to present the prediction results in one confusion matrix, as each measurement is used as a testing object only once.
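The validation scheme can be illustrated with a small Python generator. It is a sketch under the assumption that every venue has the same number of measurements and that fold k simply holds out the k-th measurement of each venue; the function name and the toy data are hypothetical.

```python
def stratified_leave_one_out(measurements_by_venue):
    """Yield (train, test) pairs; fold k holds out the k-th measurement of every venue.

    measurements_by_venue: {venue: [m_0, ..., m_(n-1)]}, same n for each venue.
    """
    sizes = {len(items) for items in measurements_by_venue.values()}
    if len(sizes) != 1:
        raise ValueError("every venue needs the same number of measurements")
    n = sizes.pop()
    for k in range(n):
        train = {
            venue: items[:k] + items[k + 1:]
            for venue, items in measurements_by_venue.items()
        }
        test = {venue: items[k] for venue, items in measurements_by_venue.items()}
        yield train, test

# Toy data: two venues with three measurements each.
data = {"room_a": ["a0", "a1", "a2"], "room_b": ["b0", "b1", "b2"]}
folds = list(stratified_leave_one_out(data))
```

With 30 measurements per venue this produces 30 folds, each leaving 29 measurements per venue for building the map.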
The results of the localization procedure are presented in Table 9. The total accuracy reaches a level of almost 75% (74.64%). The lowest per-class prediction accuracy was 23.33%, but it should rather be considered an outlier because the median is 83.33%, which is considerably higher than the average. It is also worth noticing that none of the objects was left unclassified. In other words, each wireless network signal level measurement was classified to one of the possible venues.
Let us recall that the first four places can be interpreted as one location (the conference hall divided into four regions). That simplification increases the prediction accuracy up to 78.63% and the median up to 86.66%. However, from the perspective of utility, we do not consider the class aggregation in further experiments.
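The relation between the total accuracy, the per-class accuracies, and their median can be checked on a confusion matrix with a few lines of Python. The matrix below is invented for illustration and is not Table 9; the function names are assumptions.

```python
from statistics import median

def class_accuracies(confusion):
    """confusion[i][j]: number of objects of true class i predicted as class j."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(confusion)
    ]

def total_accuracy(confusion):
    """Fraction of objects on the diagonal of the confusion matrix."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0

# Hypothetical 3-class matrix with 10 test objects per class.
cm = [
    [9, 1, 0],
    [2, 7, 1],
    [0, 6, 4],
]
per_class = class_accuracies(cm)
overall = total_accuracy(cm)
middle = median(per_class)
```

Because every class has the same number of test objects, `overall` equals the plain mean of `per_class`, which is the property of the validation scheme noted above; one weak class pulls the mean below the median.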

Haste makes waste
In classification or regression tasks, a situation where an object is not classified to any known class (or it is not possible to predict the real value of the dependent variable) is usually considered a misclassification (an error in predicting a known value). This results from the fact that the environment of independent variables remains unchanged. This is easy to explain: in the case of character recognition, the image of acquired pixels does not change in time, while in the case of real value prediction, the current value of the predicted variable should remain constant as long as the values of the independent variables are constant. However, in the case in question, when a person enters the room, we can agree that as long as that person stays in the same room, the prediction may be made at a certain cost in terms of prediction time. In other words, in such a case haste makes waste. We accept a longer prediction time provided that the prediction accuracy also increases. This issue will be explained later in this section.
The maximum number of networks in the venue maps is 33. This leads to the remark that we may define a minimal percentage of venue map networks that must be recognized by the measuring device (the network factor) as the necessary condition for taking the map into consideration. For example, if the map is built on the strengths of 10 network signals and the assumed minimal percentage of measured signals is 25%, at least three signal levels must fulfill the map criteria for the location pointed to by that map to be considered.
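The arithmetic of this example can be written as a one-line helper in Python. The function name is hypothetical, and the sketch assumes the "at least" reading of the condition used in the example above.

```python
import math

def min_required_networks(map_size, threshold):
    """Smallest number of matched networks whose share reaches the threshold."""
    return math.ceil(map_size * threshold)

# A map built on 10 networks with a 25% threshold: 2.5 is rounded up.
required = min_required_networks(10, 0.25)
```

For the largest map of 33 networks, a threshold of 0.03 already corresponds to a single matched network, which is why lower thresholds bring nothing new.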
Let us consider all possible signal threshold levels, varying from 0 to 1. The smallest sensible value of the threshold is 0.03, as there is no map built on more than 33 network signals. Similar experiments were then carried out for each considered threshold. The results are presented in Figure 2.

Fig. 2. Data classification accuracy
The solid line represents how the classification accuracy on the part of the sample covered by the map changes as the threshold increases. Starting from the value of 0.7464 (the same as in the previous experiments), it increases to 1.0 (for the threshold of 0.8700). The dotted line also represents the classification accuracy, but in this case unclassified objects are considered wrongly classified. We can observe the intuitive effect: as the threshold increases, this accuracy decreases. The decrease is caused by the decrease in the data coverage. In the next figure (Figure 3) we can compare both tendencies. The intuitive conclusion from the increasing share of unclassified objects is that the model degrades as the threshold level increases.
A well-known way of describing the quality of a parametrized classifier is to present the chart called ROC (Receiver Operating Characteristic). For a classification problem, it shows how the true positive and false positive rates change as the classifier parameters are modified. In our case, it is worth taking into consideration two other criteria. We are interested in high accuracy and low non-coverage (the difference between the set of all objects and the set of objects classified by the model/map, i.e. the complement of the coverage). Let us check what this dependence looks like in the case of our predictor. The resulting curve is presented in Figure 4.
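The three quantities discussed here (coverage, accuracy on covered objects, and accuracy with unclassified objects counted as errors) can be computed as in the following Python sketch. The function name and the toy predictions are assumptions; `None` is used to mark an unclassified object.

```python
def coverage_and_accuracy(predictions, truths):
    """Return (coverage, accuracy on covered objects, accuracy over all objects).

    predictions may contain None for unclassified objects.
    """
    total = len(truths)
    covered = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    correct = sum(1 for p, t in covered if p == t)
    coverage = len(covered) / total if total else 0.0
    covered_accuracy = correct / len(covered) if covered else 0.0
    overall_accuracy = correct / total if total else 0.0
    return coverage, covered_accuracy, overall_accuracy

# Toy example: five objects, two of them left unclassified.
preds = ["r1", None, "r2", "r3", None]
truth = ["r1", "r1", "r2", "r2", "r3"]
cov, acc_covered, acc_all = coverage_and_accuracy(preds, truth)
```

Raising the threshold typically moves `acc_covered` up and `cov` down, so `acc_all` can fall even while the covered accuracy improves.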

Fig. 4. Comparison of accuracy and coverage decrease
The presented figures confirm that in this specific case, with changing environmental conditions, the "lazy" decision is preferred due to its correctness, and it is worth considering a threshold increase in a real application. The following section is devoted to proper threshold level selection.

Network factor level selection
As was mentioned in the previous section, the main goal is to predict the proper location even at the cost of the time spent to obtain this prediction. It is therefore important to define an appropriate compromise between the coverage of the model and the accuracy of the model predictions (limited only to the final statements: unclassified objects do not degrade the prediction accuracy).
One of the most intuitive approaches is to compare the coverage decrease and the accuracy increase as the threshold increases. It is easy to find the threshold level at which the two measures intersect. The comparison of data coverage and covered data prediction accuracy is presented in Fig. 5.

Fig. 5. Coverage and covered data accuracy
The analysis of the figure suggests a proper threshold level of 0.23. A higher level will cause the coverage to be lower than the accuracy. With the assumed threshold, the accuracy is equal to 0.807.
Another approach is to compute the average of the mentioned measures. The average result is presented in Fig. 6.
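Finding the intersection of the two curves on a grid of thresholds can be sketched in Python as follows. The function name and all numeric values are hypothetical and are not read from Fig. 5; the sketch returns the first grid point at which the coverage is no longer above the accuracy.

```python
def crossing_threshold(thresholds, coverage, accuracy):
    """Return the first threshold at which coverage drops to or below accuracy."""
    for t, cov, acc in zip(thresholds, coverage, accuracy):
        if cov <= acc:
            return t
    return None

# Hypothetical curves sampled on a coarse threshold grid.
ts = [0.0, 0.1, 0.2, 0.3, 0.4]
cov_curve = [1.00, 0.95, 0.85, 0.75, 0.60]
acc_curve = [0.75, 0.77, 0.80, 0.84, 0.90]
t_cross = crossing_threshold(ts, cov_curve, acc_curve)
```

On a finer grid the returned value approaches the exact intersection point of the two curves.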

Fig. 6. Coverage and covered data accuracy together with their average
It turns out that such an averaged measure starts to decrease at a threshold level as low as 0.1. That means that the optimal threshold value is the preceding one (0.066), and this situation corresponds with the initial prediction accuracy: 0.7518.
The two approaches mentioned above are based on the raw quality measurement values. However, the most important quality criterion should take into consideration the real application results. The increase of the threshold value simultaneously increases the accuracy and decreases the data coverage, which was presented in the previous section. The crucial question is: "How far can the threshold be increased while ensuring that all venues are still covered by the prediction results?" In other words, what is the maximum threshold level that assures at least one correct classification into each of the possible venues?
This leads us to the results presented in Fig. 7. It turns out that the maximum threshold level that assures at least one correct object classification (for each possible object class) is 0.50. Up to this level, it is only a matter of time to obtain the correct location for the considered device. Above this level, there is a possibility that we are in a venue that we will never be classified into. The classification accuracy at this threshold level is 0.8953.
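The selection rule described here can be sketched in Python. The function name and the counts are hypothetical and do not come from Fig. 7; the sketch picks the largest threshold at which every venue still receives at least one correct classification.

```python
def max_safe_threshold(correct_counts_by_threshold):
    """Largest threshold at which every venue has at least one correct classification.

    correct_counts_by_threshold: {threshold: {venue: number_of_correct}}.
    """
    safe = [
        t for t, counts in correct_counts_by_threshold.items()
        if counts and min(counts.values()) >= 1
    ]
    return max(safe) if safe else None

# Hypothetical numbers of correctly classified objects per venue.
counts = {
    0.25: {"r1": 20, "r2": 12, "r3": 5},
    0.50: {"r1": 15, "r2": 6, "r3": 1},
    0.75: {"r1": 9, "r2": 2, "r3": 0},
}
t_max = max_safe_threshold(counts)
```

In this invented example venue `r3` is never recognized correctly at 0.75, so 0.50 is the highest usable threshold.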

Conclusions and further works
The paper presents a novel approach to the location issue based on wireless network signal measurement. A single measurement taken with the mobile phone is compared with the previously built description of all venues in the building. The final venue assignment is made based on the fuzzy assignment to each of the considered rooms.
The developed strategy of building the description (map) and the fuzzy location procedure is now applied in the people location tracking system developed at the Institute of Innovative Technologies EMAG.
The presented solution is adaptable but it requires an advanced implementation phase: the process of creating the map of the building based on averaged measurements.Additionally, the analysis of the proper network factor should be performed for each building separately.

Fig. 3. Comparison of accuracy and coverage decrease

Fig. 7. The minimum number of objects classified correctly to the class

Table 1. Sample data describing WiFi signal strength in different locations measured with different devices

Table 2. Sample descriptions of two reference points

Table 4. Detected WiFi networks and their signal strength

Table 5. Results of measurement made at location X

Table 7. Raw levels of assignment and normalized levels of assignment

Table 8. Sample data acquisition results

Table 9. Confusion matrix for the classified objects