Detecting Quarantine Violation from Mobile Phone Location Anomaly on Spark

 3 years ago
source link: https://pkghosh.wordpress.com/2020/04/20/detecting-quarantine-violation-from-mobile-phone-location-anomaly-on-spark/

With the world under siege from the coronavirus, you might find this topic timely. Any epidemic outbreak has two main aspects: spread and containment. There are various strategies for containing the spread. One of them is to quarantine people who have tested positive. Quarantined people are not allowed to have contact with anybody.

How do you know whether quarantine is being violated? In this post, we will go through techniques for detecting quarantine violations based on anomalies in mobile phone location data. The Spark implementation is available in my open source project beymani on GitHub.

The implementation is generic and applicable to many other problems. It detects outliers depending on whether data is outside a defined range. It can also detect outliers based on whether data falls inside a range. Some other possible applications are IoT sensor data and geofencing.

Quarantine Violation

Quarantine can be violated in 2 ways, and our solution covers both: 1) a quarantined person could go out of a quarantined location, or 2) a non-quarantined person could come into a quarantined location.

In most anomaly detection algorithms, normal patterns are learnt using machine learning. Here, an anomaly is defined as a data point lying outside a range or a data point lying inside a range, corresponding to the 2 scenarios described above. We have assumed that the ranges of locations, i.e. the perimeters of locations, are predefined and do not require a Machine Learning solution.

A quarantined location is approximated as a rectangular grid and defined using a pair of GPS locations. The ranges of latitude and longitude of the location are used for anomaly detection.
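As a minimal sketch of this representation, the rectangle can be built from two opposite corners and reduced to a latitude range and a longitude range. The class name `QuarantineZone`, its methods, and the corner coordinates are made up for illustration and are not taken from beymani. The sketch also assumes the rectangle does not cross the 180 degree meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuarantineZone:
    """Axis-aligned rectangle stored as latitude and longitude ranges (hypothetical type)."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    @classmethod
    def from_corners(cls, corner_a, corner_b):
        # Each corner is a (latitude, longitude) pair; the order of the corners does not matter.
        (lat_a, lon_a), (lat_b, lon_b) = corner_a, corner_b
        return cls(min(lat_a, lat_b), max(lat_a, lat_b),
                   min(lon_a, lon_b), max(lon_a, lon_b))

    def contains(self, lat, lon):
        # A point is inside only if it falls within both ranges.
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


# Made-up corners, chosen only so that the example points below fall in or out.
zone = QuarantineZone.from_corners((37.1175, -122.3172), (37.1165, -122.3164))
print(zone.contains(37.117102, -122.316821))
print(zone.contains(37.118469, -122.318528))
```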

The anomaly score could be defined as a step function centered at the location boundary. For the first violation scenario, it has a value of 1 for any mobile location outside the quarantined location and 0 inside. The step function is reversed for the second violation scenario.
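For a single coordinate, such a step score could look like the sketch below. The function name `step_score` and the `violation` argument ("out" for the first scenario, "in" for the second) are assumptions made for this example.

```python
def step_score(value, low, high, violation="out"):
    """Hard 0/1 score for one coordinate against the range [low, high]."""
    inside = low <= value <= high
    if violation == "out":
        # First scenario: being outside the range is the anomaly.
        return 0.0 if inside else 1.0
    if violation == "in":
        # Second scenario: the step is reversed, being inside is the anomaly.
        return 1.0 if inside else 0.0
    raise ValueError("violation must be 'out' or 'in'")


print(step_score(37.1170, 37.1165, 37.1175, "out"))
print(step_score(37.1185, 37.1165, 37.1175, "out"))
```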

To account for the approximation of locations as a rectangular grid and inaccuracies in GPS location data, we are using a logistic function for anomaly score, which can be thought of as a smoothed version of a step function.

For the first violation scenario, the value of the logistic function increases to 0.5 as one moves from the center of the location to the boundary. As one moves away from the location, the value increases beyond 0.5 and asymptotically approaches 1.0. The growth of the logistic function is controlled by a growth factor, which is a configuration parameter.

For the second scenario, the logistic function is flipped. As one approaches the location, its value increases to 0.5 at the location boundary. Inside the location, the value keeps increasing from 0.5 as one approaches the center of the location.
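One way to obtain this behaviour for a single coordinate is sketched below. It assumes the logistic function is applied to the signed distance from the range boundary, measured in units of half the range width. That normalization, the function name `logistic_score`, and the default growth value are assumptions of this sketch, so the exact formula in beymani may differ.

```python
import math


def logistic_score(value, low, high, growth=10.0, violation="out"):
    """Smoothed step score for one coordinate against the range [low, high]."""
    center = (low + high) / 2.0
    half_width = (high - low) / 2.0
    # Signed distance from the boundary: -1 at the center, 0 at the boundary, positive outside.
    d = (abs(value - center) - half_width) / half_width
    if violation == "in":
        # Second scenario: flip the curve so the score grows toward the center.
        d = -d
    # Clamp the exponent so math.exp cannot overflow for points very far away.
    z = max(-60.0, min(60.0, growth * d))
    return 1.0 / (1.0 + math.exp(-z))


for x in (1.0, 1.5, 2.0, 2.5):
    print(x, round(logistic_score(x, 0.0, 2.0), 3), round(logistic_score(x, 0.0, 2.0, violation="in"), 3))
```

A larger growth factor makes the curve steeper, so the score moves closer to the hard step function.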

Discovering Quarantined Location with Clustering

As mentioned earlier, we assumed that quarantined locations are predefined. This approach works when dealing with a small, manageable number of quarantine locations, e.g. officially designated locations. It may not be practical when there is a large number of quarantined locations, e.g. with in-home quarantine.

For a large number of quarantine locations, Machine Learning can help. After someone is quarantined, mobile location data could be used to discover the location cluster. For any quarantined person, there will be one cluster, which is the quarantined location. The clustering approach is more flexible. For example, if a person is moved from one quarantine location to another, with clustering we can quickly discover the new location.
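The post does not say which clustering algorithm would be used. As a rough stand-in, the sketch below bins one person's location samples into a grid, takes the densest cell together with its neighbouring cells as the single cluster, and returns a bounding rectangle around it. The function name `discover_zone`, the cell size, and the margin are all hypothetical. A real deployment would more likely use a proper density-based clustering algorithm.

```python
import math
from collections import Counter


def discover_zone(points, cell=0.0005, margin=0.0001):
    """Return (lat_min, lat_max, lon_min, lon_max) around the densest group of points."""
    if not points:
        raise ValueError("need at least one location sample")

    def cell_of(lat, lon):
        # Index of the grid cell containing the point.
        return (math.floor(lat / cell), math.floor(lon / cell))

    counts = Counter(cell_of(lat, lon) for lat, lon in points)
    (ci, cj), _ = counts.most_common(1)[0]

    # Keep only samples in the densest cell or one of its 8 neighbours.
    members = [(lat, lon) for lat, lon in points
               if abs(cell_of(lat, lon)[0] - ci) <= 1
               and abs(cell_of(lat, lon)[1] - cj) <= 1]
    lats = [lat for lat, _ in members]
    lons = [lon for _, lon in members]
    return (min(lats) - margin, max(lats) + margin,
            min(lons) - margin, max(lons) + margin)


# Latitude and longitude pairs in the style of the sample output in this post.
samples = [
    (37.117102, -122.316821), (37.117020, -122.316748), (37.117053, -122.316859),
    (37.117018, -122.316795), (37.117034, -122.316843),
    (37.118096, -122.317769), (37.118469, -122.318528),  # excursions away from home
]
print(discover_zone(samples))
```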

Moving Out of Location Range Outlier

As mentioned earlier, this kind of outlier corresponds to the first violation scenario, i.e. a quarantined person goes outside the quarantined location. Quarantined location data and movement location data are generated artificially with a Python script. The movement data consists of phone number, time stamp, latitude and longitude. The Spark job is implemented with the Scala object OutRangeBasedPredictor.

Here are some records from the output showing outliers. These outliers correspond to the first violation scenario, i.e. a quarantined person leaving the quarantined location.

4088290394,1585744200,37.117102,-122.316821,0.483,N
4088290394,1585743900,37.117020,-122.316748,0.462,N
4088290394,1585744200,37.117102,-122.316821,0.492,N
4088290394,1585744500,37.117053,-122.316859,0.494,N
4088290394,1585744800,37.118096,-122.317769,0.876,O
4088290394,1585745100,37.118469,-122.318528,0.965,O
4088290394,1585745400,37.117018,-122.316795,0.462,N
4088290394,1585745700,37.117034,-122.316843,0.486,N

Records with O in the last field are outliers. As we can see, there are 2 outlying records in this snippet. Since the location data is sampled every 5 minutes, the person ventured out of the quarantined location for at least 5 minutes and up to 15 minutes.
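The duration bounds follow from the time stamps alone. The sketch below takes five consecutive records from the output above and, for each run of O records, reports the minimum duration (last O minus first O) and the maximum duration (first N after the run minus last N before it), in seconds. The function name `violation_bounds` is made up, and the code assumes the records belong to one phone and are sorted by time stamp.

```python
def violation_bounds(lines):
    """Return a list of (min_seconds, max_seconds) for each run of outlier records."""
    rows = []
    for line in lines:
        fields = line.strip().split(",")
        rows.append((int(fields[1]), fields[5]))  # (time stamp, N or O label)

    episodes = []
    i = 0
    while i < len(rows):
        if rows[i][1] != "O":
            i += 1
            continue
        j = i
        while j + 1 < len(rows) and rows[j + 1][1] == "O":
            j += 1
        lower = rows[j][0] - rows[i][0]
        # The upper bound needs a normal record on both sides of the run.
        if i > 0 and j + 1 < len(rows):
            upper = rows[j + 1][0] - rows[i - 1][0]
        else:
            upper = None
        episodes.append((lower, upper))
        i = j + 1
    return episodes


records = [
    "4088290394,1585744500,37.117053,-122.316859,0.494,N",
    "4088290394,1585744800,37.118096,-122.317769,0.876,O",
    "4088290394,1585745100,37.118469,-122.318528,0.965,O",
    "4088290394,1585745400,37.117018,-122.316795,0.462,N",
    "4088290394,1585745700,37.117034,-122.316843,0.486,N",
]
print(violation_bounds(records))  # [(300, 900)], i.e. between 5 and 15 minutes
```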

Moving into Location Range Outlier

This kind of outlier corresponds to the second scenario of quarantine violation, i.e. someone from the general population visits a quarantined location. The data is again synthetically generated with a Python script. It's challenging to generate mobility data for the general public as they go about their daily business. Certain distributions are assumed for various mobility patterns, and data is generated by sampling from those distributions. The Spark job is implemented with the Scala object InRangeBasedPredictor.

Here are some outlier records. There are 2 consecutive outlier records. Since the location sampling interval is 5 min, the person was inside a quarantined location for anywhere between 5 and 15 minutes.

4080287456,1586697000,37.416589,-122.132281,0.172,N
4080287456,1586697300,37.416568,-122.132150,0.188,N
4080287456,1586697600,37.416225,-122.132341,0.562,O
4080287456,1586697900,37.416170,-122.132029,0.617,O
4080287456,1586698200,37.416409,-122.131984,0.338,N
4080287456,1586698500,37.416592,-122.132323,0.170,N
4080287456,1586698800,37.416496,-122.132003,0.249,N

There are two cases for this violation scenario. In the first case, there is a list of quarantined locations, and no one from the general public is supposed to visit any of them. The relationship between the public and quarantined locations is many to many. Our use case corresponds to this.

In the second case, an individual is barred from visiting a specific location, as in a court-ordered restraining order. Here the relationship is one to one. The appropriate case can be chosen with a configuration parameter.
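The difference between the two cases comes down to which locations a given phone is checked against. The sketch below is purely illustrative: the mode names `many_to_many` and `one_to_one`, the function `zones_to_check`, and the data shapes are hypothetical and are not the actual beymani configuration.

```python
def zones_to_check(phone, mode, all_zones, zone_by_phone):
    """Pick the restricted locations that apply to one phone number."""
    if mode == "many_to_many":
        # Any member of the public is checked against every quarantined location.
        return list(all_zones)
    if mode == "one_to_one":
        # Each individual is checked only against the location assigned to them.
        zone = zone_by_phone.get(phone)
        return [zone] if zone is not None else []
    raise ValueError("unknown mode: " + mode)


all_zones = ["zone_a", "zone_b"]              # placeholder location identifiers
zone_by_phone = {"4080287456": "zone_b"}      # placeholder per-person restriction
print(zones_to_check("4080287456", "many_to_many", all_zones, zone_by_phone))
print(zones_to_check("4080287456", "one_to_one", all_zones, zone_by_phone))
```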

Multivariate Anomaly Detection

Our data, consisting of latitude and longitude, is multivariate. Instead of using truly multivariate anomaly detection, I have taken a simpler approach. An anomaly score is found for each dimension, and the scores are aggregated to find the net anomaly score. The scores can be aggregated in the following ways:

  1. average
  2. weighted average
  3. median
  4. max
  5. min

Depending on the problem at hand, an appropriate aggregation strategy should be used. For our problem, we have used max aggregation.
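The five strategies can be sketched as a single function over the per-dimension scores. The function name `aggregate` and the strategy strings are assumptions made for this example.

```python
import statistics


def aggregate(scores, strategy="max", weights=None):
    """Combine per-dimension anomaly scores into one net score."""
    if not scores:
        raise ValueError("need at least one score")
    if strategy == "average":
        return sum(scores) / len(scores)
    if strategy == "weighted_average":
        if weights is None or len(weights) != len(scores):
            raise ValueError("need one weight per score")
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    if strategy == "median":
        return statistics.median(scores)
    if strategy == "max":
        return max(scores)
    if strategy == "min":
        return min(scores)
    raise ValueError("unknown strategy: " + strategy)


# Illustrative scores for latitude and longitude.
lat_score, lon_score = 0.30, 0.88
print(aggregate([lat_score, lon_score], "max"))
print(aggregate([lat_score, lon_score], "weighted_average", weights=[1.0, 3.0]))
```

With max aggregation, a high score in either dimension is enough to raise the net score, which matches the first scenario, where leaving the range in latitude or in longitude means leaving the location.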

Wrapping Up

We have gone through techniques for detecting quarantine violations based on mobile phone location anomalies. The solution presented here could be part of a testing, quarantine management and tracking ecosystem. To execute the use case, please refer to the step-by-step instructions in the tutorial.

