If you want to plot geographical data on a map, you need to pick a plotting library that provides the features you want. If you haven’t plotted geo data before, it helps to see examples of the different ways to do it. In this post I show how to generate a scatter plot on a map for the same geographical dataset using three Python libraries (Matplotlib, Plotly, and Bokeh) in Jupyter notebooks.
What is Jupyter?
Jupyter is a web application that allows you to create notebooks that contain live code, visualizations, and explanatory text. It’s often used by data scientists for statistical modeling and data visualization. I frequently use Jupyter as a development environment to explore data sets and develop visualizations prior to implementing them in a standalone web application.
Matplotlib vs Plotly vs Bokeh
The three plotting libraries I’m going to cover are Matplotlib, Plotly, and Bokeh. Bokeh is a great library for creating reactive data visualizations, like D3 but much easier to learn (in my opinion). Any plotting library can be used in Bokeh (including Plotly and Matplotlib), but Bokeh also provides a module for Google Maps, which will feel very familiar to most people. Google Maps does one thing and does it well. Matplotlib and Plotly, on the other hand, can do much more than plot data on maps, and their maps look different from (sometimes better than) the canonical Google Maps visual. I’ve given all three libraries a fair shake, and I prefer Bokeh with Google Maps because it’s so familiar and makes it so simple to plot anything with latitude and longitude data.
In these examples, I'm plotting data from the California Housing Prices dataset, which I discovered while reading Hands-On Machine Learning with Scikit-Learn & TensorFlow, by Aurélien Géron. If you're interested in learning about how real world machine learning applications get developed and operationalized, I highly recommend Aurélien's book! For the matplotlib example below, I borrowed heavily from the code Aurélien posted here.
Please provide your feedback on this article by adding a comment to https://github.com/iandow/iandow.github.io/issues/3.
Geo Mapping with Bokeh and Google Maps
To learn more about working with scatter plots on maps with Google Maps, check out http://bokeh.pydata.org/en/latest/docs/user_guide/geo.html.
DATASETS_URL = "https://github.com/ageron/handson-ml/raw/master/datasets"
import os
import tarfile
from six.moves import urllib
HOUSING_PATH = "datasets/housing"
HOUSING_URL = DATASETS_URL + "/housing/housing.tgz"
def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.exists(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()
fetch_housing_data()
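The helper above downloads the archive every time the cell runs. Here is a minimal sketch of how you might skip the download when the CSV is already on disk, and close the archive with a context manager. The names `needs_download` and `extract_archive` are hypothetical helpers of mine, not part of the original notebook.

```python
import os
import tarfile


def needs_download(housing_path, csv_name="housing.csv"):
    """Return True when the extracted CSV is not on disk yet (hypothetical helper)."""
    return not os.path.isfile(os.path.join(housing_path, csv_name))


def extract_archive(tgz_path, dest_dir):
    """Extract a tarball into dest_dir and return the sorted directory listing."""
    os.makedirs(dest_dir, exist_ok=True)
    # The context manager closes the archive even if extraction fails.
    with tarfile.open(tgz_path) as archive:
        archive.extractall(path=dest_dir)
    return sorted(os.listdir(dest_dir))
```

With these in place, you could call `fetch_housing_data()` only when `needs_download(HOUSING_PATH)` returns True.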
import pandas as pd
def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)
housing = load_housing_data()
housing.head()
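# Keep only the rows with a median_income of at most 10 (housingv2 is not used in the plot below).
housingv2 = housing[(housing['median_income'] <= 10)]
#housing['median_income'] = housing['median_income'].apply(lambda x: x * x)
# Note: DataFrame.size counts cells (rows x columns), not rows.
print(housing.size)
print(housingv2.size)
type(housing.latitude.tolist())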
from bokeh.io import output_file, output_notebook, show
from bokeh.models import (
    GMapPlot, GMapOptions, ColumnDataSource, Circle, LogColorMapper, BasicTicker, ColorBar,
    DataRange1d, PanTool, WheelZoomTool, BoxSelectTool
)
from bokeh.models.mappers import ColorMapper, LinearColorMapper
from bokeh.palettes import Viridis5
map_options = GMapOptions(lat=37.88, lng=-122.23, map_type="roadmap", zoom=6)
plot = GMapPlot(
    x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options
)
plot.title.text = "Hey look! It's a scatter plot on a map!"
# For GMaps to function, Google requires you obtain and enable an API key:
#
# https://developers.google.com/maps/documentation/javascript/get-api-key
#
# Replace the value below with your personal API key:
plot.api_key = "AIzaSyBYrbp34OohAHsX1cub8ZeHlMEFajv15fY"
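If you'd rather not paste a key into the notebook, one option is to read it from an environment variable. This is only a sketch: `load_api_key` and the variable name `GOOGLE_API_KEY` are my own assumptions, not something Bokeh or Google requires.

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Read the Maps API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError("Set the %s environment variable first" % env_var)
    return key
```

You would then write `plot.api_key = load_api_key()` instead of assigning a string literal.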
source = ColumnDataSource(
    data=dict(
        lat=housing.latitude.tolist(),
        lon=housing.longitude.tolist(),
        size=housing.median_income.tolist(),
        color=housing.median_house_value.tolist()
    )
)
max_median_house_value = housing['median_house_value'].max()
min_median_house_value = housing['median_house_value'].min()
#color_mapper = CategoricalColorMapper(factors=['hi', 'lo'], palette=[RdBu3[2], RdBu3[0]])
#color_mapper = LogColorMapper(palette="Viridis5", low=min_median_house_value, high=max_median_house_value)
color_mapper = LinearColorMapper(palette=Viridis5)
circle = Circle(x="lon", y="lat", size="size", fill_color={'field': 'color', 'transform': color_mapper}, fill_alpha=0.5, line_color=None)
plot.add_glyph(source, circle)
color_bar = ColorBar(color_mapper=color_mapper, ticker=BasicTicker(),
                     label_standoff=12, border_line_color=None, location=(0,0))
plot.add_layout(color_bar, 'right')
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())
#output_file("gmap_plot.html")
output_notebook()
show(plot)
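To get a feel for what a linear color mapper does with the `color` column, here is a rough pure-Python sketch of the idea: split the value range into equal-width bins, one per palette entry. It illustrates the concept only and is not Bokeh's actual implementation; `linear_color` and the placeholder palette are hypothetical.

```python
def linear_color(value, low, high, palette):
    """Map value onto palette by splitting [low, high] into equal-width bins."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp so out-of-range values take the first or last color.
    clamped = min(max(value, low), high)
    fraction = (clamped - low) / (high - low)
    # The top of the range would index one past the end, so cap it.
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]


# Five placeholder names standing in for a five-color palette such as Viridis5.
PALETTE = ["color0", "color1", "color2", "color3", "color4"]
```

With five colors, every house value lands in one of five buckets, which is why the map shows distinct color bands and not a smooth gradient.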
Geo Mapping with Matplotlib
To learn more about working with scatter plots on maps with Matplotlib, read Chapter 2 of Hands-On Machine Learning with Scikit-Learn & TensorFlow, by Aurélien Géron.
DATASETS_URL = "https://github.com/ageron/handson-ml/raw/master/datasets"
import os
import tarfile
from six.moves import urllib
HOUSING_PATH = "datasets/housing"
HOUSING_URL = DATASETS_URL + "/housing/housing.tgz"
def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.exists(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()
fetch_housing_data()
import pandas as pd
def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)
housing = load_housing_data()
housing.head()
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4)
plt.show()
housing.plot(kind="scatter", x="longitude", y="latitude",
             s=housing['population']/100, label="population",
             c="median_house_value", cmap=plt.get_cmap("jet"),
             colorbar=True, alpha=0.4, figsize=(10,7),
             )
plt.legend()
plt.show()
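In Matplotlib the `s` argument is the marker area in points squared, so dividing population by 100 lets a few very large values dominate the plot. Here is a minimal sketch of one alternative, which rescales values linearly into a fixed area range. The function name and the default bounds are my own assumptions.

```python
def scale_marker_areas(values, min_area=5.0, max_area=300.0):
    """Linearly rescale values into marker areas (points squared)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: use the midpoint area for every marker.
        return [(min_area + max_area) / 2.0] * len(values)
    span = hi - lo
    return [min_area + (v - lo) / span * (max_area - min_area) for v in values]
```

You could pass the result as `s=scale_marker_areas(housing['population'].tolist())` in place of `housing['population']/100`.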
import numpy as np
import matplotlib.image as mpimg
# Assumes california.png, a map image of California, is in the working directory.
california_img = mpimg.imread('california.png')
ax = housing.plot(kind="scatter", x="longitude", y="latitude", figsize=(10,7),
                  s=housing['population']/100, label="Population",
                  c="median_house_value", cmap=plt.get_cmap("jet"),
                  colorbar=False, alpha=0.4,
                  )
plt.imshow(california_img, extent=[-124.55, -113.80, 32.45, 42.05], alpha=0.5)
plt.ylabel("Latitude", fontsize=14)
plt.xlabel("Longitude", fontsize=14)
prices = housing["median_house_value"]
tick_values = np.linspace(prices.min(), prices.max(), 11)
# Build the colorbar from the scatter points (not the background image) and pin
# its ticks so the 11 labels below line up with 11 tick positions.
cbar = plt.colorbar(ax.collections[0], ax=ax, ticks=tick_values)
cbar.ax.set_yticklabels(["$%dk"%(round(v/1000)) for v in tick_values], fontsize=14)
cbar.set_label('Median House Value', fontsize=16)
plt.legend(fontsize=16)
plt.show()
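The colorbar labels in the cell above come from two steps: pick evenly spaced values across the price range, then format each one as thousands of dollars. Here is a dependency-free sketch of those two steps. `evenly_spaced` is a stand-in I wrote for what `np.linspace` does here, and both function names are hypothetical.

```python
def evenly_spaced(start, stop, count):
    """Return count evenly spaced values from start to stop, inclusive."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


def price_labels(values):
    """Format dollar amounts as thousands, e.g. 250000 becomes '$250k'."""
    return ["$%dk" % round(v / 1000) for v in values]
```

For example, `price_labels(evenly_spaced(0, 500000, 11))` gives labels from `$0k` to `$500k` in steps of `$50k` (the range here is made up for illustration).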
import matplotlib.image as mpimg
# Assumes california.png, a map image of California, is in the working directory.
california_img = mpimg.imread('california.png')
ax = housing.plot(kind="scatter", x="longitude", y="latitude", figsize=(10,7),
                  s=housing['population']/100, label="Branch Customers",
                  c="total_bedrooms", cmap=plt.get_cmap("jet"),
                  colorbar=False, alpha=0.4,
                  )
plt.imshow(california_img, extent=[-124.55, -113.80, 32.45, 42.05], alpha=0.5)
# Blank out the axis labels and draw the ticks in white so only the map shows.
plt.ylabel("", fontsize=14)
plt.xlabel("", fontsize=14)
plt.tick_params(colors='w')
# Build the colorbar from the scatter points so it inherits their "jet" colormap.
cbar = plt.colorbar(ax.collections[0], ax=ax)
cbar.solids.set_edgecolor("face")
cbar.set_label('Churn Probability', fontsize=16, alpha=1,
               rotation=270, labelpad=20)
plt.legend(fontsize=16)
plt.show()
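The `extent` argument to `imshow` is what pins the image to map coordinates: it gives the left, right, bottom, and top edges of the picture in data units. Here is a small sketch of the linear mapping this implies, assuming the image is a simple unprojected map and the point lies inside the extent. `lonlat_to_pixel` is a hypothetical helper, not a Matplotlib function.

```python
EXTENT = (-124.55, -113.80, 32.45, 42.05)  # left, right, bottom, top


def lonlat_to_pixel(lon, lat, width, height, extent=EXTENT):
    """Return the (column, row) of the image pixel under a lon/lat point."""
    left, right, bottom, top = extent
    x_frac = (lon - left) / (right - left)
    # Row 0 is the top of the image, so latitude is measured down from `top`.
    y_frac = (top - lat) / (top - bottom)
    col = min(int(x_frac * width), width - 1)
    row = min(int(y_frac * height), height - 1)
    return col, row
```

If the points look shifted relative to the coastline, the extent values are the first thing I would check.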
Geo Mapping with Plotly
To learn more about working with scatter plots on maps with Plotly, check out https://plot.ly/python/scatter-plots-on-maps/.
DATASETS_URL = "https://github.com/ageron/handson-ml/raw/master/datasets"
import os
import tarfile
from six.moves import urllib
HOUSING_PATH = "datasets/housing"
HOUSING_URL = DATASETS_URL + "/housing/housing.tgz"
def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.exists(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()
fetch_housing_data()
import pandas as pd
def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)
housing = load_housing_data()
housing.head()
import plotly.offline
import plotly.graph_objs as go
# Offline mode renders figures inline in the notebook.
plotly.offline.init_notebook_mode()
coords = pd.concat([housing['latitude'], housing['longitude'], housing['population']], axis=1)
coords = coords.sample(frac=0.1, replace=True)
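Note that sampling with `replace=True` can draw the same row more than once, so some points may be plotted on top of themselves. Here is a standard-library sketch of the difference between the two modes. It uses `random`, not pandas, and `sample_rows` is a hypothetical helper.

```python
import random


def sample_rows(rows, frac, replace, seed=0):
    """Draw round(frac * len(rows)) rows, with or without replacement."""
    rng = random.Random(seed)
    k = round(frac * len(rows))
    if replace:
        # With replacement: the same row may be drawn repeatedly.
        return rng.choices(rows, k=k)
    # Without replacement: every drawn row is distinct.
    return rng.sample(rows, k)
```

If duplicates are not what you want on the map, `replace=False` is the safer choice.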
cases = []
colors = ['rgb(239,243,255)','rgb(189,215,231)','rgb(107,174,214)','rgb(33,113,181)']
months = {6:'June',7:'July',8:'Aug',9:'Sept'}
for i in range(6,10)[::-1]:
    cases.append(go.Scattergeo(
        lon = coords['longitude'],
        lat = coords['latitude'],
        marker = dict(
            size = coords['population']/1000,
            # Index into the palette; the quoted string 'colors[i-6]' is not a valid color.
            color = colors[i-6],
            opacity = .4,
            line = dict(width = 0)
        ),
    ) )
cases[0]['mode'] = 'markers'
layout = go.Layout(
    title = 'Hey look! It\'s a scatter plot on a map!',
    geo = dict(
        resolution = 50,  # Plotly accepts 110 or 50 here
        scope = 'usa',
        showframe = False,
        showcoastlines = True,
        showland = True,
        landcolor = "rgb(229, 229, 229)",
        countrycolor = "rgb(255, 255, 255)" ,
        coastlinecolor = "rgb(255, 255, 255)",
        projection = dict(
            type = 'mercator'  # projection names are lowercase
        ),
        lonaxis = dict( range= [ -124.0, -113.0 ] ),
        lataxis = dict( range= [ 32.0, 43.0 ] ),
    ),
    legend = dict(
        traceorder = 'reversed'
    )
)
fig = go.Figure(layout=layout, data=cases)
plotly.offline.iplot(fig, validate=False, filename='iantest')
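The `lonaxis` and `lataxis` ranges in the layout define the visible window, and it is slightly narrower on the west side than the extent used in the Matplotlib section (-124.0 versus -124.55). Here is a quick sketch for checking how many points fall outside that window. `in_view` and `count_clipped` are hypothetical helpers.

```python
LON_RANGE = (-124.0, -113.0)
LAT_RANGE = (32.0, 43.0)


def in_view(lon, lat, lon_range=LON_RANGE, lat_range=LAT_RANGE):
    """Return True when the point lies inside the map's axis ranges."""
    return (lon_range[0] <= lon <= lon_range[1]
            and lat_range[0] <= lat <= lat_range[1])


def count_clipped(points):
    """Count (lon, lat) points that fall outside the visible ranges."""
    return sum(1 for lon, lat in points if not in_view(lon, lat))
```

You could feed it `zip(coords['longitude'], coords['latitude'])` to see whether the ranges need widening.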