How close is public transport in Vienna?

Evaluating and visualizing accessibility of Vienna’s public transport in various districts.

Introduction

Vienna is considered to have an excellent public transport system, and one of its talking points is that most areas of the city are within easy reach of some form of public transport. In this project, I probe this claim by creating an exhaustive dataset of (all!) addresses in Vienna, along with the closest public transport stop of each type (U-Bahn, tram, bus, night bus) and the walking distance to that stop.
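Finding the closest stop of each type for every address does not require routing to every stop. Below is a minimal sketch of one way to narrow the search down. It assumes stops are already labelled with a type and given as WGS84 latitude/longitude. The labels `ubahn`, `tram`, `bus`, `nightbus` and the helper names are hypothetical and are not taken from the project's code. The idea is to rank the stops of each type by straight-line (haversine) distance and keep only the k closest as candidates for the walking-distance query. A walking route can't be shorter than the straight line, so the stop with the shortest walk is usually among these candidates, though this is not guaranteed.

```python
import math
from typing import Dict, List, Tuple

# A stop is (stop_id, transport_type, lat, lon). The type labels
# ("ubahn", "tram", "bus", "nightbus") are hypothetical for this sketch.
Stop = Tuple[str, str, float, float]

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (straight-line) distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_stops_by_type(
    lat: float, lon: float, stops: List[Stop], k: int = 3
) -> Dict[str, List[Tuple[str, float]]]:
    """For one address, return the k straight-line-nearest stops of each type.

    Each entry is (stop_id, distance_m), sorted from nearest to farthest.
    """
    by_type: Dict[str, List[Tuple[str, float]]] = {}
    for stop_id, ttype, s_lat, s_lon in stops:
        by_type.setdefault(ttype, []).append((stop_id, haversine_m(lat, lon, s_lat, s_lon)))
    return {t: sorted(cands, key=lambda c: c[1])[:k] for t, cands in by_type.items()}
```

For example, `candidate_stops_by_type(48.2082, 16.3738, stops, k=5)` returns up to five candidate stops per type for an address near Stephansplatz. Only those candidates then need a (much slower) walking-distance query.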

To implement this, I used the following datasets and tools:

  1. GTFS feeds of Wiener Linien (and Wiener Lokalbahnen) from transit.land, which contain details about the public transport stops, their coordinates, routes, and schedules.
  2. Adressen Standorte Wien from Cooperation OGD Österreich, which gives details of addresses in Vienna and their coordinates.
  3. A local Docker installation of openrouteservice to query walking distances between addresses and public transport stops.
  4. R Shiny to create an interactive app that lets users change the district, public transport type, and distance threshold to see which areas of the district have public transport access within the selected threshold.
  5. An AWS (Amazon Web Services) EC2 instance to deploy the app and embed it here.
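In a GTFS feed, `stops.txt` lists the stops but does not say which modes serve them. That information has to be recovered by joining `stop_times.txt` to `trips.txt` to `routes.txt` and reading each route's `route_type`. Below is a minimal sketch using only Python's standard library. The basic GTFS codes are 0 (tram), 1 (subway/metro) and 3 (bus). Treating bus routes whose short name starts with "N" as night buses is an assumption based on Vienna's N-line naming. Whether the Wiener Linien feed uses exactly these codes should be checked against the actual files. The function names are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, Set, TextIO

# Basic GTFS route_type codes; extended codes (e.g. 700 for bus) are not handled here.
ROUTE_TYPE_LABELS = {0: "tram", 1: "ubahn", 3: "bus"}


def classify_route(route_type: int, short_name: str) -> Optional[str]:
    """Map a GTFS route to one of the app's categories (None = not used)."""
    # Assumption: night buses are bus routes whose short name starts with "N".
    if route_type == 3 and short_name.startswith("N"):
        return "nightbus"
    return ROUTE_TYPE_LABELS.get(route_type)


def stop_types(routes: TextIO, trips: TextIO, stop_times: TextIO) -> Dict[str, Set[str]]:
    """Return {stop_id: categories serving it} via routes -> trips -> stop_times."""
    route_label: Dict[str, str] = {}
    for row in csv.DictReader(routes):
        label = classify_route(int(row["route_type"]), row.get("route_short_name") or "")
        if label is not None:
            route_label[row["route_id"]] = label

    # Each trip inherits the category of its route.
    trip_label = {
        row["trip_id"]: route_label[row["route_id"]]
        for row in csv.DictReader(trips)
        if row["route_id"] in route_label
    }

    # A stop gets every category of the trips that call at it.
    result: Dict[str, Set[str]] = {}
    for row in csv.DictReader(stop_times):
        label = trip_label.get(row["trip_id"])
        if label is not None:
            result.setdefault(row["stop_id"], set()).add(label)
    return result


if __name__ == "__main__":
    # Tiny in-memory feed with made-up IDs, for illustration only.
    demo_routes = io.StringIO("route_id,route_short_name,route_type\nR1,U1,1\nR2,N25,3\n")
    demo_trips = io.StringIO("route_id,service_id,trip_id\nR1,S,T1\nR2,S,T2\n")
    demo_stop_times = io.StringIO(
        "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
        "T1,08:00:00,08:00:00,A,1\nT2,01:00:00,01:00:00,A,1\n"
    )
    print(stop_types(demo_routes, demo_trips, demo_stop_times))  # stop A: U-Bahn and night bus
```

With the real feed, the three `TextIO` arguments would simply be the opened `routes.txt`, `trips.txt` and `stop_times.txt` files.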
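Querying the local openrouteservice instance once per address-stop pair is slow. Its matrix endpoint can instead return the distances from one address to many stops in a single request. The sketch below assumes the container listens on `localhost:8080` under the `/ors/v2` path with the `foot-walking` profile enabled. `ORS_MATRIX_URL` and the helper names are assumptions, so adjust them to your own setup. Note that openrouteservice expects coordinates in [longitude, latitude] order.

```python
import json
import urllib.request
from typing import List, Optional, Sequence, Tuple

# Assumption: local openrouteservice container with the foot-walking profile enabled.
ORS_MATRIX_URL = "http://localhost:8080/ors/v2/matrix/foot-walking"

LonLat = Tuple[float, float]  # openrouteservice expects [longitude, latitude]


def build_matrix_payload(address: LonLat, stops: Sequence[LonLat]) -> dict:
    """One-to-many request: location 0 is the address, the rest are stops."""
    locations = [list(address)] + [list(s) for s in stops]
    return {
        "locations": locations,
        "sources": [0],
        "destinations": list(range(1, len(locations))),
        "metrics": ["distance"],
        "units": "m",
    }


def walking_distances(
    address: LonLat, stops: Sequence[LonLat], url: str = ORS_MATRIX_URL
) -> List[Optional[float]]:
    """POST the matrix request and return walking distances (metres) to each stop."""
    body = json.dumps(build_matrix_payload(address, stops)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # A single source yields a single row; unreachable stops may be None (null).
    return data["distances"][0]
```

Combined with a straight-line pre-filter, each address then needs only one small matrix request rather than one routing call per candidate stop.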

Interactive app with R Shiny

Important

If you don’t see anything below even after waiting 15 seconds, force your browser to use HTTP instead of HTTPS. Switch back to HTTPS after you are done interacting with the Shiny app.

Speeding up computations

Each district contains a large number of address coordinates, which makes computing the covered areas computationally heavy. To keep this manageable, I currently sample addresses from each district so that not too many of them lie close to each other, since nearby addresses have a high probability of sharing the same closest stop. This sampling is one reason why some areas that ought to be covered sometimes aren't. I also use st_simplify (from the sf package) with a large dTolerance and st_buffer with a large distance value, which speeds up turning a large set of coordinates into clouds of covered area.

Even so, the computation is still slow for large districts such as 1020, 1200, and 1230 (by postal code), and for large distance thresholds. Next, I'd like to try other ways of speeding it up. Some options are pre-computing the areas for different distance thresholds, and using DBSCAN (a density-based clustering algorithm) together with Shapely's unary_union.
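The post doesn't spell out exactly how addresses are sampled. One simple way to enforce "not too many addresses close together" is grid thinning: overlay a square grid and keep at most one address per cell. This is a swapped-in illustration, not necessarily the author's method, and `thin_by_grid` is a hypothetical helper. It converts degrees to metres with an equirectangular approximation, which is adequate at city scale.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


def thin_by_grid(points: List[LatLon], cell_m: float) -> List[LatLon]:
    """Keep the first point that falls into each cell_m x cell_m metre grid cell.

    This bounds the density (at most one point per cell), but two kept points in
    adjacent cells can still be closer than cell_m.
    """
    if not points:
        return []
    # Equirectangular approximation around the mean latitude: fine for a city.
    lat0 = math.radians(sum(p[0] for p in points) / len(points))
    m_per_deg_lat = 111_195.0
    m_per_deg_lon = m_per_deg_lat * math.cos(lat0)
    kept: Dict[Tuple[int, int], LatLon] = {}
    for lat, lon in points:
        cell = (
            math.floor(lat * m_per_deg_lat / cell_m),
            math.floor(lon * m_per_deg_lon / cell_m),
        )
        kept.setdefault(cell, (lat, lon))  # first point wins; dicts keep insertion order
    return list(kept.values())
```

Larger cells give fewer points and faster geometry operations, at the cost of the coverage gaps described above.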
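DBSCAN groups points that have at least `min_pts` neighbours within radius `eps` into clusters, and it marks isolated points as noise. How it would be combined with pre-computed thresholds and `unary_union` isn't specified here, so the sketch below only shows the clustering step. It is written in plain Python on projected (metre) coordinates. In practice, a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would be the faster choice.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # planar coordinates in metres (e.g. a projected CRS)
NOISE = -1


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    Uses an O(n^2) neighbour search: fine for a sketch, too slow for all of Vienna.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not visited yet

    def neighbours(i: int) -> List[int]:
        xi, yi = points[i]
        return [j for j in range(n) if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)  # includes i itself
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:  # j is also a core point: expand through it
                queue.extend(j_neigh)
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]
```

Each resulting cluster could then be buffered and merged (for example with sf or Shapely) instead of buffering every address individually. That step is an assumption about how the pieces would fit together.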

Code

The code is available here on GitHub.