Scatterplots in pydeck: A case study using Beijing subway stops

Below we'll plot the location of Beijing subway stops over time, half of which have been built since 2015. Locations for subway stops come from Wikipedia and OpenStreetMap. This is not a rigorous study, so some subway stops may be missing.

Getting the data

First, we can use the Pandas library to download our data. You're likely already familiar with it: Pandas is a very popular Python library for filtering, aggregating, and joining data.

In [1]:
import pandas as pd
import pydeck as pdk

# First, let's use Pandas to download our data
URL = 'https://raw.githubusercontent.com/ajduberstein/data_sets/master/beijing_subway_station.csv'
df = pd.read_csv(URL)
df.head()
Out[1]:
   lat        lng         osm_id      station_name                    chinese_name  opening_date  color               line_name
0  39.940249  116.456359  1351272524  Agricultural Exhibition Center  农业展览馆         2008-07-19    [0, 146, 188, 255]  Line 10
1  39.955570  116.388507  5057476994  Andelibeijie                    安德里北街         2015-12-26    [0, 155, 119, 255]  Line 8 (North section)
2  39.947729  116.402067  339088654   Andingmen                       安定门           1984-09-20    [0, 75, 135, 255]   Line 2
3  40.011026  116.263981  1362259113  Anheqiao North                  安河桥北          2009-09-28    [0, 140, 149, 255]  Line 4
4  39.967112  116.388398  5305505996  Anhuaqiao                       安华桥           2012-12-30    [0, 155, 119, 255]  Line 8 (North section)

Data cleaning

Next, we'll do some necessary data housekeeping. The CSV encodes the [R, G, B, A] color values as a str, and literal_eval lets us convert that string to a list.

In [2]:
from ast import literal_eval
# The CSV stores each [R, G, B, A] color as a string,
# so parse every value into a list of integers
df['color'] = df['color'].apply(literal_eval)
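
To make the conversion concrete, here is a minimal sketch of what literal_eval does to a single value. The sample string is copied from the first row of the table above; the variable names raw_color and parsed_color are illustrative only.

```python
from ast import literal_eval

# A color exactly as it appears in the CSV: a string, not a list
raw_color = '[0, 146, 188, 255]'

# literal_eval parses Python literals (lists, numbers, strings)
# without executing arbitrary code the way eval would
parsed_color = literal_eval(raw_color)

print(type(raw_color).__name__, type(parsed_color).__name__)
print(parsed_color)
```

Because literal_eval only accepts literals, a malformed or unexpected value in the column raises an error instead of being executed.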

Automatically generate a viewport

pydeck includes some utilities for visualizing data, such as data_utils.compute_view, which fits a view automatically to a 2D data set.

We'll render the viewport, as well, just to verify that the visualization looks sensible.

In [3]:
# Use pydeck's data_utils module to fit a viewport to the central 90% of the data
viewport = pdk.data_utils.compute_view(points=df[['lng', 'lat']], view_proportion=0.9)
auto_zoom_map = pdk.Deck(layers=[], initial_view_state=viewport)
auto_zoom_map.show()

Sure enough, we're centered on Beijing.
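
The idea of fitting a view to the central portion of the data can be sketched in plain Python. This illustrates the concept only and is not pydeck's implementation: central_bounding_box is a hypothetical helper, the far-away outlier is made up for the example, and the sketch stops at a bounding box and center rather than deriving a zoom level.

```python
import math

def central_bounding_box(points, proportion=0.9):
    """Return (center, bbox) for the `proportion` of points nearest the centroid.

    points: list of (lng, lat) pairs. Hypothetical helper for illustration;
    it is not part of pydeck.
    """
    if not points:
        raise ValueError("points must not be empty")
    # Centroid of all points
    center_lng = sum(p[0] for p in points) / len(points)
    center_lat = sum(p[1] for p in points) / len(points)
    # Sort by distance to the centroid and drop the most distant points
    by_distance = sorted(
        points,
        key=lambda p: math.hypot(p[0] - center_lng, p[1] - center_lat))
    keep = max(1, math.ceil(len(by_distance) * proportion))
    kept = by_distance[:keep]
    lngs = [p[0] for p in kept]
    lats = [p[1] for p in kept]
    # Bounding box as (min_lng, min_lat, max_lng, max_lat)
    bbox = (min(lngs), min(lats), max(lngs), max(lats))
    center = ((bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2)
    return center, bbox

# The five stations from the table above, plus one made-up outlier
sample_points = [
    (116.456359, 39.940249),
    (116.388507, 39.955570),
    (116.402067, 39.947729),
    (116.263981, 40.011026),
    (116.388398, 39.967112),
    (121.47, 31.23),  # hypothetical outlier, not in the data set
]
center, bbox = central_bounding_box(sample_points, proportion=0.8)
print(center, bbox)
```

Dropping the most distant points keeps a single stray coordinate from stretching the view far beyond the city.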

Plotting the data

We'll render the data and use some Jupyter notebook functionality to provide a header with a year.

It's worth spending some time on each line, if you haven't seen the Layer object yet:

scatterplot = pdk.Layer(
    'ScatterplotLayer',
    df,
    get_position=['lng', 'lat'],
    get_radius=500,
    get_fill_color='color')

The layer type is the first argument, the data is the second, and the layer's properties follow as keyword arguments. ScatterplotLayer is one of the layers available in the deck.gl core library.

For a list of other layers, see the deck.gl documentation. Remember that deck.gl is a JavaScript library and not a Python one, so its documentation may differ from pydeck in terminology and functionality (e.g., passing functions as arguments is common in deck.gl, but pydeck doesn't support it).

In [4]:
from IPython.display import display
import ipywidgets

year = 2019

scatterplot = pdk.Layer(
    'ScatterplotLayer',
    df,
    get_position=['lng', 'lat'],
    get_radius=500,
    get_fill_color='color')
r = pdk.Deck(layers=[scatterplot], initial_view_state=viewport)

# Create an HTML header to display the year
display_el = ipywidgets.HTML('<h1>{}</h1>'.format(year))
display(display_el)
# Show the current visualization
r.show()

Playing the data forward in time

Finally, we can loop through the data and see the dramatic development in Beijing since 1971, as demonstrated by subway stop opening dates.

In [5]:
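import time

for year in range(1971, 2020):
    # opening_date is an ISO-formatted string (YYYY-MM-DD), so comparing it with
    # str(year) keeps the stations that opened before the start of that year
    scatterplot.data = df[df['opening_date'] <= str(year)]
    # Reset the header to display the year
    display_el.value = '<h1>{}</h1>'.format(year)
    r.update()
    time.sleep(0.2)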
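
The filter in the loop compares strings rather than dates. That works because dates written as YYYY-MM-DD sort lexicographically in chronological order. The short sketch below, using opening dates copied from the table above, shows which stations pass the filter for a given year; the helper name opened_by is an assumption for illustration.

```python
# Opening dates copied from the first rows of the data set
opening_dates = {
    'Andingmen': '1984-09-20',
    'Agricultural Exhibition Center': '2008-07-19',
    'Andelibeijie': '2015-12-26',
}

def opened_by(year):
    """Names of stations whose date string sorts at or before str(year)."""
    return sorted(
        name for name, date in opening_dates.items() if date <= str(year))

# '2008-07-19' sorts after '2008', so a station opened during 2008
# first passes the filter when the year reaches 2009
print(opened_by(2008))
print(opened_by(2009))
```

In other words, each frame shows the network as it stood at the start of the displayed year.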