In this article, we will highlight the main steps involved in predicting a location for a coffee shop in Vancouver. We also want the coffee shop to be near a transit station, with no Starbucks nearby. As an added feature, we will make sure that the crime concentration in the area is low. The entire program is implemented in Python. So let’s walk through the steps.
Steps Required
- Get crime history for the last two years
- Get locations of all transit stations and Starbucks in Vancouver
- Check all the transit stations that do not have any Starbucks near them
- Get all the data regarding crimes near the filtered transit stations
- Create a grid of all possible coordinates around the transit station
- Check crime around each created coordinate and display the top 5 locations.
Gathering Data
This covers the first two steps: getting the data from the internet, both manually and automatically.
Getting all Crime History
We can get crime history for the past 14 years in Vancouver from here. This data comes as a raw crime.csv file, so we have to process it and filter out the data we do not need. We then write the processed information to the crime_processed.csv file.
Note: There are 530,653 crime records in this file.
In this program, we will use only the type and coordinates of each crime. There are many crime types, but we have classified them into three major categories:
Theft (red), Break and Enter (orange), and Mischief (green)
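The processing step could look something like the sketch below. It is only an illustration: the column names (`type`, `latitude`, `longitude`), the keyword mapping, and the inline sample rows are assumptions made for the example, not the layout of the real crime.csv.

```python
import csv
import io

# Hypothetical mapping from a keyword in the raw crime type
# to one of the three major categories used in this article.
CATEGORY_KEYWORDS = {
    "theft": "Theft",
    "break and enter": "Break and Enter",
    "mischief": "Mischief",
}


def categorize(raw_type):
    """Return the major category for a raw crime type, or None if none matches."""
    lowered = raw_type.lower()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in lowered:
            return category
    return None


def process_crimes(reader):
    """Keep only the category and coordinates; skip rows that cannot be used."""
    processed = []
    for row in reader:
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed coordinates
        category = categorize(row.get("type") or "")
        if category is None:
            continue  # crime type outside the three categories
        processed.append({"type": category, "latitude": lat, "longitude": lon})
    return processed


# Tiny made-up sample standing in for crime.csv (assumed column names).
sample = io.StringIO(
    "type,latitude,longitude,year\n"
    "Theft from Vehicle,49.2500,-123.1000,2015\n"
    "Break and Enter Commercial,49.2600,-123.0500,2016\n"
    "Mischief,49.2400,-123.0400,2016\n"
    "Homicide,49.2000,-123.0000,2014\n"
    "Other Theft,,,2016\n"
)
records = process_crimes(csv.DictReader(sample))
for record in records:
    print(record)
```

With a real file, the same function could be fed `csv.DictReader(open("crime.csv", newline=""))`, and the result written out with `csv.DictWriter`.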
All of these crimes can be plotted on a graph, as displayed below.
This may seem very congested, so let’s look at a close-up image for future reference.
Getting Locations of all Rapid Transit Stations
We can get the coordinates of all Transit Stations in Vancouver from here. This dataset has the coordinates of the rapid transit stations on the three transit lines in Vancouver. There are 23 of them in total, and we can use them for further processing.
Getting Locations of all Starbucks
The Starbucks data is available here; we can scrape it easily and get the locations of all the Starbucks in Vancouver. We only need the Starbucks that are near transit stations, so we’ll filter out the rest. There are a total of 24 Starbucks in Vancouver, and 10 of them are near Transit Stations.
Note: Other than the coordinates of Transit Stations and Starbucks, we also need the coordinates and type of each crime.
Transit Stations with no Starbucks
Now that we have all the data required, we can move to the next step. We need to find the Transit Station locations that have no Starbucks near them. For that, we can create an area of a particular radius around each Transit Station, then check whether each Starbucks location falls within that area.
If none of the Starbucks are within a particular Transit Station’s area, we can append that station to a list. At the end, we have a list of all Transit locations with no Starbucks near them. There are a total of 6 Transit Stations with no Starbucks near them.
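The radius check described above can be sketched as follows. This is a minimal illustration, assuming great-circle (haversine) distance and a 500 m radius; the article does not state the radius actually used, and the station names and coordinates here are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_without_starbucks(stations, starbucks, radius_m):
    """Return the stations that have no Starbucks within radius_m metres."""
    result = []
    for name, lat, lon in stations:
        has_nearby = any(
            haversine_m(lat, lon, s_lat, s_lon) <= radius_m
            for s_lat, s_lon in starbucks
        )
        if not has_nearby:
            result.append((name, lat, lon))
    return result


# Made-up coordinates, for illustration only.
stations = [("Station A", 49.2500, -123.1000), ("Station B", 49.2400, -123.0400)]
starbucks = [(49.2502, -123.1003)]  # roughly 30 m from Station A

free = stations_without_starbucks(stations, starbucks, radius_m=500)
print([name for name, _, _ in free])
```

Only "Station B" survives here, because the single Starbucks sits well inside Station A's 500 m area.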
Crime near Transit Stations
Now let’s filter the crime records and keep just what we are interested in: the crime near Transit Stations. For that, we will mark out an area of a specific radius around each station and look at the crimes inside it. This leaves more than 110,000 crime records.
Crime near located Transit Stations
We now have all the Transit Stations that don’t have any Starbucks near them, and also the crime near all Transit Stations. Let’s combine this information to get the crime near the located Transit Stations. This comes to about 44,000 crime records.
This may seem correct at first glance, but the points overlap because there are so many of them, so we can create separate lists of crimes based on their types.
Theft
Break and Enter
Mischief
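Keeping only the crimes near the located stations and splitting them into one list per type could be sketched like this. The flat-earth distance approximation, the 1 km radius, and the sample points are assumptions made for the example.

```python
import math
from collections import defaultdict

M_PER_DEG_LAT = 111_320  # approximate metres per degree of latitude


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Flat-earth distance in metres; adequate over a few kilometres."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dy = (lat2 - lat1) * M_PER_DEG_LAT
    dx = (lon2 - lon1) * M_PER_DEG_LAT * math.cos(mean_lat)
    return math.hypot(dx, dy)


def crimes_by_type_near(stations, crimes, radius_m):
    """Group crime coordinates by type, keeping only those near some station."""
    grouped = defaultdict(list)
    for crime_type, lat, lon in crimes:
        near_any = any(
            approx_distance_m(lat, lon, s_lat, s_lon) <= radius_m
            for s_lat, s_lon in stations
        )
        if near_any:
            grouped[crime_type].append((lat, lon))
    return dict(grouped)


# Made-up data, for illustration only.
stations = [(49.2400, -123.0400)]
crimes = [
    ("Theft", 49.2405, -123.0405),     # close to the station
    ("Mischief", 49.2410, -123.0390),  # close to the station
    ("Theft", 49.3000, -123.2000),     # far away, dropped
]
grouped = crimes_by_type_near(stations, crimes, radius_m=1000)
print(grouped)
```

Each list can then be plotted in its own colour, or passed separately to the later filtering steps.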
Generating all possible coordinates
Now, finally, we have all the prerequisites, so let’s get to the main task at hand: predicting the best coordinate for the coffee shop.
There may be many approaches to this problem, but the one I used in this program is to create a grid of all possible locations (coordinates) in the area of 1 km radius around each located transit station.
Initially I generated 1 coordinate for every metre, which resulted in 1,000,000 coordinates in every square km. This is a huge number, and for the 6 located Transit Stations it becomes 6 million. It may not seem like much at first glance, because computers can handle that much data in a few seconds.
But for location prediction we need to compare each coordinate with the crime coordinates. The algorithm has to check ~7,000 Thefts, ~19,000 Break Ins, and ~17,000 Mischiefs around each generated coordinate. Computing this would require an estimated 432.4 billion comparisons. This sort of execution takes many hours on a normal computer (sometimes days).
The solution is to create one coordinate for every 10 m, which results in about 10,000 coordinates per square km. For the above-mentioned number of crimes, the estimated number of comparisons will be several billion. That significantly reduces the time, but it is still a lot.
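A grid like this could be generated along the following lines. The sketch assumes the grid covers a 1 km × 1 km square centred on the station (which matches the counts quoted above) and converts metres to degrees with a simple flat-earth approximation; the function name, parameters, and station coordinates are hypothetical.

```python
import math

M_PER_DEG_LAT = 111_320  # approximate metres per degree of latitude


def generate_grid(center_lat, center_lon, side_m, step_m):
    """Return (lat, lon) points spaced step_m apart over a side_m square."""
    n = side_m // step_m  # points per row and per column
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(center_lat))
    half = side_m / 2
    points = []
    for i in range(n):
        lat = center_lat + (i * step_m - half) / M_PER_DEG_LAT
        for j in range(n):
            lon = center_lon + (j * step_m - half) / m_per_deg_lon
            points.append((lat, lon))
    return points


# Made-up station coordinates, for illustration only.
grid = generate_grid(49.2400, -123.0400, side_m=1000, step_m=10)
print(len(grid))  # 100 x 100 = 10000 points
```

Setting `step_m=1` instead gives 1,000 × 1,000 = 1,000,000 points for the same square, which is why the coarser spacing makes such a difference.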
To control this, we can remove the duplicate values in the crime coordinates, along with those that are too close to each other (~1 m). Doing so, we are left with just 816 Thefts, 2,654 Break Ins, and 8,234 Mischiefs to check around each generated coordinate.
The precision will not be affected much, but the time and computational resources required will be reduced a lot.
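One possible way to thin out the crime coordinates is sketched below. It keeps a point only if no previously kept point lies within 1 m of it, and uses a small spatial hash so that each point is compared only against its neighbouring cells. This is an assumed technique for illustration, not necessarily the one used in the program; the reference latitude and sample points are also assumptions.

```python
import math

M_PER_DEG_LAT = 111_320  # approximate metres per degree of latitude


def thin_points(points, min_dist_m=1.0, ref_lat=49.25):
    """Drop exact duplicates and points closer than min_dist_m to a kept point."""
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    cells = {}  # (row, col) -> kept points in that cell, as (y, x) in metres
    kept = []
    for lat, lon in points:
        y = lat * M_PER_DEG_LAT
        x = lon * m_per_deg_lon
        row = math.floor(y / min_dist_m)
        col = math.floor(x / min_dist_m)
        # Any kept point within min_dist_m must be in one of the 9 nearby cells.
        too_close = any(
            math.hypot(y - ky, x - kx) < min_dist_m
            for r in (row - 1, row, row + 1)
            for c in (col - 1, col, col + 1)
            for ky, kx in cells.get((r, c), ())
        )
        if not too_close:
            cells.setdefault((row, col), []).append((y, x))
            kept.append((lat, lon))
    return kept


# Made-up points, for illustration only.
points = [
    (49.240000, -123.040000),
    (49.240000, -123.040000),  # exact duplicate
    (49.240002, -123.040002),  # about 0.3 m from the first point
    (49.241000, -123.040000),  # about 111 m away, kept
]
thinned = thin_points(points)
print(thinned)
```

The first point of each cluster is the one that survives, so the result depends slightly on input order, which should not matter at a 1 m scale.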
Checking Crime near Generated coordinates
Now that we have all the locations, we will check each coordinate against some constraints, in the following order:
- Filter out Coordinates having Theft within 1 km
We get 122,000 coordinates with no Thefts (shown below merged 1,000 to 1)
- Filter out Coordinates having Break Ins within 200 m
We get 8,000 coordinates with no Break Ins (shown below merged 1,000 to 1)
- Filter out Coordinates having Mischief within 200 m
We get 6,000 coordinates with no Mischief (shown below merged 1,000 to 1)
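The three constraints above can be sketched as successive filters over the candidate coordinates. The radii come from the list above; the flat-earth distance function, the names, and the sample data are assumptions made for the example.

```python
import math

M_PER_DEG_LAT = 111_320  # approximate metres per degree of latitude

# Exclusion radius in metres for each crime type, applied in this order.
EXCLUSION_RADIUS_M = {"Theft": 1000, "Break and Enter": 200, "Mischief": 200}


def approx_distance_m(p, q):
    """Flat-earth distance in metres between two (lat, lon) points."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dy = (q[0] - p[0]) * M_PER_DEG_LAT
    dx = (q[1] - p[1]) * M_PER_DEG_LAT * math.cos(mean_lat)
    return math.hypot(dx, dy)


def filter_candidates(grid, crimes_by_type, radii=EXCLUSION_RADIUS_M):
    """Keep grid points with no crime of each type inside that type's radius."""
    remaining = list(grid)
    for crime_type, radius in radii.items():
        crimes = crimes_by_type.get(crime_type, [])
        remaining = [
            point for point in remaining
            if all(approx_distance_m(point, crime) > radius for crime in crimes)
        ]
    return remaining


# Made-up data, for illustration only.
grid = [(49.2400, -123.0400), (49.2500, -123.0400), (49.2600, -123.0400)]
crimes_by_type = {
    "Theft": [(49.2405, -123.0400)],            # ~56 m from the first point
    "Break and Enter": [(49.2510, -123.0400)],  # ~111 m from the second point
    "Mischief": [],
}
survivors = filter_candidates(grid, crimes_by_type)
print(survivors)
```

Because every filter runs only over the points that survived the previous one, the later and more numerous crime types are checked against far fewer coordinates.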
To check for Thefts, the program has to perform 48.9 million comparisons; it is 159.2 million for Break Ins and 494 million for Mischiefs. At the end, all coordinates lying close together are merged automatically.
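These comparison counts are consistent with multiplying the number of generated coordinates by the number of remaining crimes of each type. The quick check below assumes 10,000 grid coordinates for each of the 6 stations, as described earlier; that pairing is an inference from the numbers, not something the article states outright.

```python
# Assumed grid size: 10,000 coordinates per station, 6 stations.
COORDINATES_PER_STATION = 10_000
STATION_COUNT = 6
total_coordinates = COORDINATES_PER_STATION * STATION_COUNT

# Crime counts left after removing duplicates and near-duplicates.
crime_counts = {"Theft": 816, "Break and Enter": 2_654, "Mischief": 8_234}

# One comparison per (coordinate, crime) pair.
comparisons = {
    crime_type: total_coordinates * count
    for crime_type, count in crime_counts.items()
}

for crime_type, total in comparisons.items():
    print(f"{crime_type}: {total / 1e6:.2f} million comparisons")
```

This gives roughly 48.96, 159.24, and 494.04 million comparisons, in line with the figures quoted above.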
Predicting Top 5
Now that we have 6 coordinates of the best locations that have passed all the constraints, we will order them.
To order them, we will check their distance from the nearest transit station. The closest will be at the top of the list as the best possible location, then the second closest, and so on. The generated list is:
- -123.0419406741792, 49.24824259252004
- -123.05887151659479, 49.24327221040713
- -123.05287151659476, 49.24327221040713
- -123.04994067417924, 49.239242592520064
- -123.0419406741792, 49.239242592520064
- -123.0409406741792, 49.239242592520064
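The ordering step can be sketched as a sort keyed on the distance to the nearest station. The distance approximation, the function names, and the sample coordinates are assumptions made for the example; note that the sketch uses (latitude, longitude) order, whereas the list above prints longitude first.

```python
import math

M_PER_DEG_LAT = 111_320  # approximate metres per degree of latitude


def approx_distance_m(p, q):
    """Flat-earth distance in metres between two (lat, lon) points."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dy = (q[0] - p[0]) * M_PER_DEG_LAT
    dx = (q[1] - p[1]) * M_PER_DEG_LAT * math.cos(mean_lat)
    return math.hypot(dx, dy)


def rank_by_nearest_station(candidates, stations, top_n=5):
    """Sort candidates by distance to their nearest station, closest first."""
    def nearest_station_distance(candidate):
        return min(approx_distance_m(candidate, station) for station in stations)

    return sorted(candidates, key=nearest_station_distance)[:top_n]


# Made-up data, for illustration only.
stations = [(49.2400, -123.0400)]
candidates = [
    (49.2450, -123.0400),  # ~557 m from the station
    (49.2410, -123.0400),  # ~111 m
    (49.2430, -123.0400),  # ~334 m
]
ranked = rank_by_nearest_station(candidates, stations)
print(ranked)
```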
Code
https://github.com/Mindtrades-Consulting/Coffee-Shop-Location-Predictor