Commit 09b3b86

Datashader tutorial (#2154)
* datashader tutorial
* added requirements
* CI fixup
* icon + links

Co-authored-by: Nicolas Kruchten <[email protected]>
1 parent 1a9d77d commit 09b3b86

7 files changed

+153
-0
lines changed

Diff for: binder/requirements.txt

+2
@@ -12,3 +12,5 @@ psutil
 requests
 networkx
 scikit-image
+datashader
+pyarrow

Diff for: doc/python/datashader.md

+129
@@ -0,0 +1,129 @@
+---
+jupyter:
+  jupytext:
+    notebook_metadata_filter: all
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: "1.2"
+      jupytext_version: 1.3.1
+  kernelspec:
+    display_name: Python 3
+    language: python
+    name: python3
+  language_info:
+    codemirror_mode:
+      name: ipython
+      version: 3
+    file_extension: .py
+    mimetype: text/x-python
+    name: python
+    nbconvert_exporter: python
+    pygments_lexer: ipython3
+    version: 3.6.8
+  plotly:
+    description:
+      How to use datashader to rasterize large datasets, and visualize
+      the generated raster data with plotly.
+    display_as: scientific
+    language: python
+    layout: base
+    name: Plotly and Datashader
+    order: 21
+    page_type: u-guide
+    permalink: python/datashader/
+    thumbnail: thumbnail/datashader.jpg
+---
+
+[datashader](https://datashader.org/) creates rasterized representations of large datasets for easier visualization, with a pipeline approach consisting of several steps: projecting the data on a regular grid, creating a color representation of the grid, etc.
+
+### Passing datashader rasters as a mapbox image layer
+
+We visualize here the spatial distribution of taxi rides in New York City. A higher density
+is observed on major avenues. For more details about mapbox charts, see [the mapbox layers tutorial](/python/mapbox-layers). No mapbox token is needed here.
+
+```python
+import pandas as pd
+df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/uber-rides-data1.csv')
+dff = df.query('Lat < 40.82').query('Lat > 40.70').query('Lon > -74.02').query('Lon < -73.91')
+
+import datashader as ds
+cvs = ds.Canvas(plot_width=1000, plot_height=1000)
+agg = cvs.points(dff, x='Lon', y='Lat')
+# agg is an xarray object, see http://xarray.pydata.org/en/stable/ for more details
+coords_lat, coords_lon = agg.coords['Lat'].values, agg.coords['Lon'].values
+# Corners of the image, which need to be passed to mapbox
+coordinates = [[coords_lon[0], coords_lat[0]],
+               [coords_lon[-1], coords_lat[0]],
+               [coords_lon[-1], coords_lat[-1]],
+               [coords_lon[0], coords_lat[-1]]]
+
+from colorcet import fire
+import datashader.transfer_functions as tf
+img = tf.shade(agg, cmap=fire)[::-1].to_pil()
+
+import plotly.express as px
+# Trick to quickly create a figure with mapbox axes
+fig = px.scatter_mapbox(dff[:1], lat='Lat', lon='Lon', zoom=12)
+# Add the datashader image as a mapbox layer image
+fig.update_layout(mapbox_style="carto-darkmatter",
+                  mapbox_layers=[
+                      {
+                          "sourcetype": "image",
+                          "source": img,
+                          "coordinates": coordinates
+                      }]
+)
+fig.show()
+```
+
+### Exploring correlations of a large dataset
+
+Here we explore the flight delay dataset from https://www.kaggle.com/usdot/flight-delays. In order to get a visual impression of the correlation between features, we generate a datashader rasterized array which we plot using a `Heatmap` trace. It creates a much clearer visualization than a scatter plot of (even a fraction of) the data points, as shown below.
+
+Note that instead of datashader it would theoretically be possible to create a [2d histogram](/python/2d-histogram-contour/) with plotly, but this is not recommended here because you would need to load the whole dataset (5M rows!) in the browser for plotly.js to compute the heatmap, which is not tractable in practice. Datashader makes it possible to reduce the size of the dataset before passing it to the browser.
+
+```python
+import plotly.graph_objects as go
+import pandas as pd
+import numpy as np
+import datashader as ds
+df = pd.read_parquet('https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet')
+fig = go.Figure(go.Scattergl(x=df['SCHEDULED_DEPARTURE'][::200],
+                             y=df['DEPARTURE_DELAY'][::200],
+                             mode='markers')
+)
+fig.update_layout(title_text='A busy plot')
+fig.show()
+```
+
+```python
+import plotly.graph_objects as go
+import pandas as pd
+import numpy as np
+import datashader as ds
+df = pd.read_parquet('https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet')
+
+cvs = ds.Canvas(plot_width=100, plot_height=100)
+agg = cvs.points(df, 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY')
+x = np.array(agg.coords['SCHEDULED_DEPARTURE'])
+y = np.array(agg.coords['DEPARTURE_DELAY'])
+
+# Assign nan to zero values so that the corresponding pixels are transparent
+agg = np.array(agg.values, dtype=float)
+agg[agg<1] = np.nan
+
+fig = go.Figure(go.Heatmap(
+    z=np.log10(agg), x=x, y=y,
+    hoverongaps=False,
+    hovertemplate='Scheduled departure: %{x:.1f}h <br>Departure delay: %{y} <br>Log10(Count): %{z}',
+    colorbar=dict(title='Count (Log)', tickprefix='1.e')))
+fig.update_xaxes(title_text='Scheduled departure')
+fig.update_yaxes(title_text='Departure delay')
+fig.show()
+
+```
+
+```python
+
+```
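
The aggregation step of the heatmap example in this file can be tried without downloading the flights dataset. The sketch below is a NumPy stand-in, not datashader's implementation: it uses `np.histogram2d` on synthetic data to count points per grid cell (the role `Canvas.points` plays in the tutorial), then masks empty cells with NaN and takes `log10`, as the tutorial does. The helper names `rasterize_counts` and `log_counts` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def rasterize_counts(x, y, width, height):
    """Count points per cell of a regular grid with `width` x `height` cells.

    Hypothetical helper: a NumPy stand-in for a count aggregation.
    Returns the grid with shape (height, width) and the cell-center
    coordinates along x and y.
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=(width, height))
    x_centers = (x_edges[:-1] + x_edges[1:]) / 2
    y_centers = (y_edges[:-1] + y_edges[1:]) / 2
    # histogram2d indexes its result as [x, y]; transpose so rows follow y
    return counts.T, x_centers, y_centers


def log_counts(counts):
    """Return log10 of the counts, with empty cells replaced by NaN."""
    out = np.asarray(counts, dtype=float).copy()
    out[out < 1] = np.nan  # NaN cells can be rendered as transparent gaps
    return np.log10(out)


# Synthetic, correlated data standing in for a large dataset
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(scale=0.5, size=100_000)

grid, xc, yc = rasterize_counts(x, y, width=100, height=100)
z = log_counts(grid)
```

Whatever the number of input points, only the 100 x 100 grid needs to reach the browser. The arrays `z`, `xc` and `yc` could then be passed to `go.Heatmap(z=z, x=xc, y=yc)` in the same way as the datashader output.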

Diff for: doc/python/heatmaps.md

+5
@@ -162,5 +162,10 @@ fig.update_layout(
 fig.show()
 ```
 
+### Heatmap and datashader
+
+Arrays of rasterized values built by datashader can be visualized using
+plotly's heatmaps, as shown in the [plotly and datashader tutorial](/python/datashader/).
+
 #### Reference
 See https://plot.ly/python/reference/#heatmap for more information and chart attribute options!

Diff for: doc/python/imshow.md

+7
@@ -198,6 +198,13 @@ fig.update_layout(height=400)
 fig.show()
 ```
 
+### imshow and datashader
+
+Arrays of rasterized values built by datashader can be visualized using
+imshow. See the [plotly and datashader tutorial](/python/datashader/) for
+examples of how to use plotly and datashader.
+
+
 #### Reference
 See https://plot.ly/python/reference/#image for more information and chart attribute options!

Diff for: doc/python/mapbox-layers.md

+4
@@ -186,6 +186,10 @@ fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
 fig.show()
 ```
 
+#### Using a mapbox image layer to display a datashader raster image
+
+See the example in the [plotly and datashader tutorial](/python/datashader).
+
 #### Reference
 
 See https://plot.ly/python/reference/#layout-mapbox for more information and options!

Diff for: doc/python/webgl-vs-svg.md

+4
@@ -33,6 +33,10 @@ jupyter:
     thumbnail: thumbnail/webgl.jpg
 ---
 
+Here we show that it is possible to represent millions of points with WebGL.
+For larger datasets, or for a clearer visualization of the density of points,
+it is also possible to use [datashader](/python/datashader/).
+
 #### Compare WebGL and SVG
 Checkout [this notebook](https://plot.ly/python/compare-webgl-svg) to compare WebGL and SVG scatter plots with 75,000 random data points

Diff for: doc/requirements.txt

+2
@@ -22,3 +22,5 @@ sphinx_bootstrap_theme
 recommonmark
 pathlib
 python-frontmatter
+datashader
+pyarrow
