26

Where should I walk?

 4 years ago
source link: https://www.tuicool.com/articles/7Vjyiyi
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Using new tools from NVIDIA RAPIDS to determine the shortest walking distance to a parking spot

Sep 25 ·6min read

Z3qIfiI.png!web

Introduction

In theprevious story, we’ve explored the Paid Parking Occupancy dataset provided by the City of Seattle Department of Transportation. You could see (and, hopefully, test) how quickly all the calculations on this data can be done using NVIDIA RAPIDS .

Just to recap: the data we used can be downloaded here . It is a subset of the full dataset that has been published since the beginning of 2019 and includes all the transactions for May and June. The dataset is around 7GB in size and fits nicely on the NVIDIA Titan RTX with 24GB of VRAM; to use this dataset on a GPU with less RAM (like the NVIDIA RTX 2080 Ti with 11GB) you might need to extract only one month — I have already prepared a smaller dataset that includes only the transactions for May 2019 and you can download it here .

Doing geospatial calculations fast

Before we go on with this story let’s go on a small tangent. Literally one day after I published my story NVIDIA’s RAPIDS team announced another great piece in the data science puzzle: cuSpatial !

3EJBran.png!web

Figure 1. The cuSpatial stack, used with permission from NVIDIA

cuSpatial is a library that, as the name suggests, focuses on the geospatial calculations. It is a bunch of C++/CUDA kernels exposed via Python interface just as every other tool in the RAPIDS toolkit. And it is fast, just like the rest of them!

Just so we get a feeling for how fast it does the calculations we will use the full dataset from the previous story (as a reminder, the data consists of 48M geocoded transactions) and test how much quicker we could have done the calculations if we had used cuSpatial.

rUNji2J.png!web

In this test, we can see a 4.8x speedup over the vanilla Python without breaking a sweat (and much less code!!). To compare the performance, we reused (somewhat syntactically compressed) version of the calculateDistance(…) method from the previous story.

Note, that if you run the .apply_rows(…) method for the first time you will incur the compilation penalty as RAPIDS needs to JIT compile the calculateDistance(…) method first: the first run of the cell normally reports ~650ms on my machine but if you rerun the cell again the execution time drops to ~220ms thereafter; assuming you didn’t make any changes, the calculateDistance(…) method would have been cached at that point. I caught myself on this when I first reported the impressive speedups I saw . However, the discussion that ensued actually led to a discovery of an additional performance squeeze that will come to RAPIDS soon!

Do we fly or do we walk?!

So, now back to the continuation of our story. Last time we calculated the distance as crow flies not as we walk . This led to the inclusion of some parking spots that would actually be farther than 1000 ft if reached on foot.

Thankfully, the one and only John Murray of Fusion Data Science took the shapefiles of the King County Tiger/Line road network and kindly created and donated a graph of King County roads in a form of a list of intersections (with geo coordinates) and list of edges linking the intersections with calculated length (in yards). The data can be downloaded here but if you use the code we post on Github, the notebook will do this for you anyway.

Connecting the parking spots to the graph

The parking locations, obviously, are not present in the graph so we need to add them. In the first attempt to achieve this we will loop through all the 1,500 parking locations, calculate the distance to each and every road intersection, and select 3 nearest intersections. This is a bit of a hack and leads to some artifacts (showing later on the map) but will do for now. In the next story, we will discuss how to add new nodes that place the parking spots perpendicular to the road/edge (thanks again to John for suggesting this and helping with the code!)

qyAFZb2.png!web

The .nsmallest(…) method (there is also nlargest(…) available) returns the top 3 closest intersections; we append these edges to the road_graph_data DataFrame using the .concat(…) method. Finally, we also add a new node to the parking_locations_nodes DataFrame so we can then later append them to the full list of the graph nodes.

Let’s pause here for a second: we have just calculated the haversine distance using cuSpatial for all 1,500 parking locations to each of the 127k road intersections (so, that is around 200M calculations), selected the 3 closest intersections, and updated the datasets, all in a matter of 1 minute. If this is not fast I don’t know what is…

Moving on, we also need to do add a node for the Space Needle. However, instead of linking to the nearest 3 intersections, a glimpse on the map suggests we should link it to 5.

MnyQbeN.png!web

Just like before, we use the Nomatim geo encoder from the geopy package to get the coordinates of Space Needle.

Let’s build a graph!

Now that we have the full list of nodes and edges we can build the graph. This is super easy in RAPIDS.

mIrMNfN.png!web

Check the documentation for the full list of available methods and algorithms!

With the full graph constructed we can now use the Single-Source Shortest Path (SSSP) to calculate distances from the Space Needle to each parking location! The algorithm traverses the graph and finds the shortest paths to all the 128k nodes; for an overview of SSSP algorithms I found these notes to be useful. In return, we get a DataFrame with a list of vertices and the corresponding shortest distance to such vertex. The DataFrame also shows the predecessor , a node (or vertex, if you will) that one needs to visit right before a particular node in question that forms the shortest path. All of that in 174 ms.

zeMZvuQ.png!web

We can now use this information to create the full path from the Space Needle to each of the parking spots that are within 1,000ft walking distance.

NFzAFnm.png!web

In the snippet above I simply hop from node to node and add the edges to the paths DataFrame. After about 1 second we can start charting. From the road_nodes DataFrame we extract the coordinates of each node so we can later use them to plot the points on the map.

f6ZNbam.png!web

And here’s the final result!

yIjUVfn.png!web

So, given the required distance, the closest parking spots from Space Needle are to the South and the East. Now, if you look close, you can see the artifacts I mentioned earlier and the inadequacy of the method of assigning parking spots to the nearest 3 intersections: to get to some of the parking spots the algorithm would require you to walk past the parking spot and then walk back to it. In the next story, we’ll tackle this problem.

Summary

Compared to ourprevious workflow (that run in about 20 seconds on the Titan RTX), this end-to-end process takes about 2 minutes to finish. Still, most of this work can be saved and then reused later, reducing the inference time to a mere second or two.

Stay tuned for more examples of the power, speed and agility of NVIDIA RAPIDS!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK