1. Introduction

1.1 Background

Recently, Oslo won the title of the European Green Capital: https://www.visitoslo.com/en/articles/oslo-european-green-capital-2019/

There is an increased emphasis from the Norwegian government on a greener, more sustainable future.

One of the ways in which this is being planned is by curbing the use of fossil fuels. To this end, the government has introduced a slew of measures that make it costlier for an individual to own or drive a car, whilst trying to incentivise the increased use of public transport.

However, not all areas of the country have an equitable distribution of public transport options.

To explore this, I decided to investigate the status in Oslo.

After all, people are more likely to use public transport if it is readily available.

1.2 Business Problem

As a hypothetical case study, the transport department approaches me to help them decide where to build more transport options in Oslo.

My job as a data science consultant is therefore to help them decide which areas of Oslo are lacking adequate transport infrastructure and where they should invest. 

One way to solve this problem is to cluster neighbourhoods/areas based on the number of transport options available and visualise them on a map, which can then help pinpoint areas needing improved connectivity.

The idea is to find neighbourhoods that have transport options within 400m, which is a reasonable walking distance that does not take too long to cover. From here on, we will search for transport options within a 400m radius of a given geo-location.
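To make the 400m criterion concrete, here is a minimal sketch of how one might test whether a transport stop falls within the radius of a location. It uses the haversine great-circle formula; the function names and the coordinates are hypothetical and chosen only for illustration, and this is not necessarily how the Foursquare API applies its own radius filter.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within(origin, stops, radius_m=400):
    """Count the stops lying within radius_m of origin; all points are (lat, lon) tuples."""
    return sum(haversine_m(*origin, *stop) <= radius_m for stop in stops)


# Hypothetical coordinates, for illustration only.
origin = (59.9100, 10.7500)
stops = [(59.9110, 10.7500), (59.9100, 10.7560), (59.9200, 10.7500)]
print(count_within(origin, stops))  # the third stop is more than 1 km away
```

Changing `radius_m` is all it takes to explore a different definition of walking distance.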

To accomplish this task, one would need data on a) the different types of transport options available, b) where they are available and c) how many are available.

Such information may be obtained through the Foursquare API. Additionally, one would need geographic location data of postcode/streets to pin-point the areas that need improvement.

2. Data

2.1 Points to consider

Before we can commence our project, it is important to consider the following points:

  • In Oslo kommune/municipality, a given street can belong to different postcodes.
  • Furthermore, a postcode is not very intuitive to place in one’s mind. If someone says «Oh, I live in postcode 0125», it is hard to picture where that is. If the person instead says «Oh, I live just by Blåsteingata», one can instantly place where they live (assuming, of course, that one is familiar with the street names).
  • So, in this project, we need to decide how deep to drill down when defining the area around which to find transport options.
  • Additionally, postcode information can lead to redundant street-level data: one street can span several postcodes and, conversely, one postcode can contain several streets, with the same street sometimes occurring in different postcodes.

Based on the above points, and since postcodes are hard to place mentally, I will try to find transport options at the street level rather than the postcode level.

This raises the question of how to define the coordinates of a street. One possible way is to use the midpoint between the start and the end of the street. For this to work, we need to assume that over a small section of the Earth such as a street, the path between two points is essentially a straight line (unlike the curved line needed for large distances).
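Under that straight-line assumption, the midpoint reduces to the plain average of the two endpoints. The following is a minimal sketch; the function name and the coordinates are hypothetical.

```python
def street_midpoint(start, end):
    """Midpoint of a street, treating the segment between its ends as a straight line.

    start and end are (lat, lon) tuples in degrees. Averaging the coordinates
    is only a reasonable approximation over short distances such as a street.
    """
    return ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2)


# Hypothetical street endpoints, for illustration only.
mid_lat, mid_lon = street_midpoint((59.9100, 10.7400), (59.9120, 10.7440))
print(round(mid_lat, 4), round(mid_lon, 4))
```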

2.2 Data availability

For our project concerning transport options at street level, the Foursquare API unfortunately offers no direct way to find all transport options. Instead, one needs to manually extract information about:

  1. Trikk/Tram
  2. Bus Stop/Bus Station
  3. T-bane/Metro
  4. Train Station
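One way to query each mode separately is to issue one search per category around a street's coordinates. The sketch below only builds the request URL. It assumes Foursquare's legacy v2 `venues/search` endpoint and its `ll`, `radius`, `categoryId`, `limit` and `v` parameters; the category IDs are placeholders that must be looked up in Foursquare's category list, and the function name and credentials are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the legacy v2 venue search endpoint is the one being queried.
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"

# Placeholder category IDs (NOT real values): look these up in Foursquare's category list.
CATEGORY_IDS = {
    "tram": "<TRAM_CATEGORY_ID>",
    "bus": "<BUS_CATEGORY_ID>",
    "metro": "<METRO_CATEGORY_ID>",
    "train": "<TRAIN_CATEGORY_ID>",
}


def build_search_url(lat, lon, mode, client_id, client_secret,
                     radius_m=400, version="20190101", limit=50):
    """Build a search URL for one transport mode around a (lat, lon) point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                       # API version date (assumed value)
        "ll": f"{lat},{lon}",               # search centre
        "radius": radius_m,                 # search radius in metres
        "categoryId": CATEGORY_IDS[mode],   # restrict results to one mode
        "limit": limit,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


url = build_search_url(59.91, 10.75, "tram", "MY_ID", "MY_SECRET")
print(url)
```

Looping this over the four modes for every street yields one set of results per mode, which can then be counted.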

Furthermore, based on my overall experience, Foursquare data is not very good for Oslo, compared to USA/Canada, possibly due to insufficient awareness/use. This made it a considerable challenge to extract relatively clean information regarding various transport modes.

Additionally, obtaining the geo-coordinates of the different streets in Oslo is not an easy task, as this information is not readily available. One possible source is the website created by Erik Bolstad.

  • It provides geo-coordinates of all the postcodes and the respective bydel/districts of Oslo.
  • Additionally, within each postcode, it maps out the street addresses which we will need to obtain.

3. Methods

The methods employed primarily fall under the following categories:

  • Web-scraping – HTML, JSON
  • Data wrangling in Python with Pandas, NumPy
  • Machine Learning, using K-Means Clustering to cluster the streets

The full details of all the steps are readily available on GitHub under Part 1 (Web Scraping), Part 2 (Foursquare API), Part 3 (EDA), Part 4 (Clustering).

4. Results

4.1 Obtaining street coordinates

After parsing through 442 postcodes in Oslo, I obtained street coordinates for 2460 streets. In the process, I omitted extracting information from postcodes for postboxes, service postcodes and other postcodes not in use. Here is a snapshot of the first 5 streets from the table:

Looks sweet….but sweet takes effort!

4.2 Basic Visualisation of streets

Using the Folium package, I subsequently generated a basic map of Oslo, plotting all the street names on it. One of the great things about Folium is the ability to generate point-and-click markers, as shown below:

The satisfaction of a wrestling contest that finally turns in your favour….and the vision starts bearing fruit

4.3 Obtain transport options for each street

After much data wrangling and munging with the extremely messy Foursquare data, including missing labels and categories (and Foursquare completely missing the Nationaltheatret train station!), the following table showing the various transport options (Trikk, Buses, T-bane and Train) for each street in Oslo was generated.

A simple table….that requires the skill of a carpenter to carve out from the wood of messy data

4.4 Streets with highest and lowest transport options

Following the generation of the above table, I computed the total transport options available for each street and then plotted the 10 streets with the most and least number of transport options available.

Now the fun really starts… some streets have more options than your hands and feet can grab!

As we can see, the streets with the most transport options have as many as 22 options available. On the other hand, there are also streets with no transport available within a radius of 400m.
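The totals and the ranking can be sketched in a few lines. The street names and counts below are made up for illustration, and the example uses plain Python to stay self-contained; with the per-street table in a Pandas DataFrame, a row-wise sum followed by `nlargest`/`nsmallest` would do the same job.

```python
# Hypothetical per-street counts of each transport mode within 400m.
counts = {
    "Street A": {"tram": 2, "bus": 5, "metro": 1, "train": 0},
    "Street B": {"tram": 0, "bus": 1, "metro": 0, "train": 0},
    "Street C": {"tram": 0, "bus": 0, "metro": 0, "train": 0},
    "Street D": {"tram": 1, "bus": 3, "metro": 2, "train": 1},
}

# Total number of transport options per street.
totals = {street: sum(modes.values()) for street, modes in counts.items()}

# Rank streets from most to fewest options.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

n = 2  # the post uses 10; 2 keeps this toy example readable
most = ranked[:n]
least = ranked[-n:]
print(most)
print(least)
```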

4.5 Distribution of transport options

Let us find out how many streets actually have that many transport options and, conversely, how many have only a limited number.

We find that an astonishingly high number of streets, in fact more than half of all streets in Oslo, have only 1 or no transport option available within 400m.

This is confirmed by the violin plot, which shows a highly skewed distribution, with the majority of streets having fewer than 5 transport options.
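A quick way to check such a claim is to tabulate how many streets fall on each total and compute the share with at most one option. The totals below are hypothetical and do not reproduce the Oslo data.

```python
from collections import Counter

# Hypothetical totals (transport options within 400m), one value per street.
street_totals = [0, 0, 1, 1, 1, 3, 7, 22]

# Number of streets for each total, i.e. the distribution behind the plot.
distribution = Counter(street_totals)

# Share of streets with at most one transport option.
share_low = sum(1 for t in street_totals if t <= 1) / len(street_totals)

print(sorted(distribution.items()))
print(f"{share_low:.1%} of streets have 0 or 1 options")
```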

4.6 Clustering and Visualising

Using K-Means clustering and generating 7 clusters, we create the following map that helps group all the streets in Oslo based on the number of transport options available:

At last…the real ‘Machine Learning’ part of the project!

From the map, all the purple spots are streets where there are 2-3 transport options available while the blue spots represent areas with the least amount of transport options.
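For readers unfamiliar with K-Means, the clustering step can be illustrated with a small from-scratch version of Lloyd's algorithm. This is a sketch under stated assumptions, not the code used for the map: it clusters on a single feature (the total number of options per street), uses made-up values and picks its starting centroids at random from the distinct points. In practice a library implementation would normally be used, with 7 clusters as in this post.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (labels, centroids). Initial centroids are k distinct points
    chosen at random, so k must not exceed the number of distinct points.
    """
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(points)), k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return labels, centroids


# Hypothetical totals per street, wrapped as 1-D points.
points = [(0,), (1,), (0,), (10,), (11,), (10,)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Each label can then be mapped to a marker colour when plotting the streets.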

The various clusters, denoting the number of travel options and the total number of streets belonging to each cluster, are depicted below:

5. Discussion

Vast parts of Oslo, measured down to the street level, have very few transport options within 400m.

With the help of the map and the clusters, the Transport Department can now easily plan which areas of Oslo to emphasise for building more infrastructure. For example, if they wanted to prioritise areas that have 0 transport options within 400m, then the following map would be very useful:

>>This is where ML finally translates to the business case at hand and brings value!<<

A major caveat in this work is that the midpoints of streets were used. Since we define the nearest transport options as those within a 400m radius, the midpoint may not be a good location measurement if the street itself is longer than 800m. Furthermore, one would obtain different results if one’s definition of a short walking distance were more than 400m.

Additionally, one can further expand and elaborate this project by:

  • Obtaining population info (folketal) for each bydel (city district) or street and making choropleth maps showing population density. However, choropleth maps need a GeoJSON file containing boundary info for each bydel/city district, so such a source must be found or generated.
  • Mapping foot traffic info onto heatmaps, which can help truly emphasise areas with maximum passenger traffic
  • Mapping the frequency of transport options in addition to the number of transport options
  • Working on ferry data as well
  • Using the Google Maps API instead of the Foursquare API

6. Conclusion

To conclude, we set out to map out areas of Oslo, down to the street level, that can be said to have transport options available. We defined transport options as having any public transport facility within 400m of the mid-point of the streets in Oslo.

We find that several streets are in need of an infrastructure upgrade. If the transport department chooses to focus on the streets with fewer than 1 transport option within 400m, that is, none at all, it will have its hands full and will hopefully be a satisfied client 🙂

7. References

  1. Erik Bolstad
  2. BeautifulSoup4
  3. Foursquare API
  4. Folium

About the author

Simple at heart and driven just through curiosity, Niladri has travelled miles, both figuratively and literally!

Born in Delhi, India to Bengali parents, he found himself strangely at odds with the rest of the North Indian culture. With diverse interests in arts, music, athletics and academics, the hardest thing to decide was what to focus on and what to drop, a struggle that continues to this day.

To pursue his PhD, he moved to Bergen, Norway, in mid-2014, to a culture and place that could not be farther away from the hustle and bustle of Delhi. Through the 4+ odd years there, he discovered evidence supporting the evolutionary hypothesis of schizophrenia, a serious mental illness affecting 1 in 100 individuals (for dramatised depictions of the illness, think ‘A Beautiful Mind’ starring Russell Crowe or ‘Fight Club’ starring Brad Pitt).

While we are still some way off before conclusively stating being human makes us schizophrenic, human evolution does have a role.

Presently he is ‘travelling’ through the various ‘realms’ of data, exploring new territories and landscapes, working as a data science consultant in the AI team @Capgemini Norway. To get in touch with him feel free to head to https://www.linkedin.com/in/niladri-banerjee/
