Commit 8aba24c

add datashader
1 parent 61c380a commit 8aba24c

1 file changed: +83 -1 lines changed

doc/python/performance.md

@@ -30,7 +30,9 @@ jupyter:
 name: High Performance Visualization
 order: 14
 permalink: python/performance/
-redirect_from: python/webgl-vs-svg/
+redirect_from:
+- python/webgl-vs-svg/
+- python/datashader/
 thumbnail: thumbnail/webgl.jpg
 ---

@@ -152,6 +154,8 @@ See https://plotly.com/python/reference/scattergl/ for more information and char

 ## NumPy and NumPy Convertible Arrays for Improved Performance

+*New in Plotly.py version 6*
+
 Improve the performance of generating Plotly figures that use a large number of data points by using NumPy arrays and other objects that Plotly can convert to NumPy arrays, such as Pandas and Polars Series.

 Plotly.py uses Plotly.js for rendering, which supports typed arrays. In Plotly.py, NumPy arrays and NumPy-convertible arrays are base64 encoded before being passed to Plotly.js for rendering.
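
To make the encoding step concrete, here is a minimal, dependency-free sketch of the general idea: packing numbers into a binary typed-array buffer and base64-encoding it, instead of serializing each value as JSON text. It uses only the Python standard library (`array`, `base64`, `json`). The `encode_typed_array` and `decode_typed_array` helpers and the `{"dtype", "bdata"}` dictionary layout are illustrative assumptions, not a description of Plotly.py's internal code path.

```python
import array
import base64
import json


def encode_typed_array(values):
    """Pack floats into a float64 buffer and base64-encode it (hypothetical helper)."""
    buf = array.array("d", values)  # "d" = C double (8 bytes per value)
    return {"dtype": "f8", "bdata": base64.b64encode(buf.tobytes()).decode("ascii")}


def decode_typed_array(spec):
    """Reverse of encode_typed_array: base64 text back to a list of floats."""
    buf = array.array("d")
    buf.frombytes(base64.b64decode(spec["bdata"]))
    return buf.tolist()


values = [i / 7 for i in range(10_000)]

# Compare the size of a plain JSON list with the base64-encoded buffer.
as_json_list = json.dumps(values)
as_typed = json.dumps(encode_typed_array(values))
print(len(as_json_list), len(as_typed))

# The binary round trip is lossless.
assert decode_typed_array(encode_typed_array(values)) == values
```

In this sketch the encoded payload is smaller than the JSON list, and the values never have to be formatted or parsed as decimal text, which is plausibly where much of the saving comes from for large arrays.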
@@ -214,3 +218,81 @@ fig = go.Figure(data=[go.Scatter3d(

 fig.show()
 ```
+
+## Datashader
+
+Use [Datashader](https://datashader.org/) to reduce the amount of data passed to the browser for rendering by creating a rasterized representation of the dataset. This makes it well suited to datasets of tens to hundreds of millions of points.
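
The idea of a rasterized representation can be sketched without Datashader: bin each point into a fixed grid and keep only per-cell counts, so the output size depends on the grid resolution rather than on the number of points. The `rasterize_counts` function below is a hypothetical, pure-Python illustration of that idea, not Datashader's implementation, which is far faster and handles many more cases.

```python
import random


def rasterize_counts(xs, ys, width, height):
    """Count points per cell of a width x height grid spanning the data range."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero-width range when all values are equal.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map the coordinate to a cell index; clamp the maximum onto the last cell.
        col = min(int((x - x_min) / x_span * width), width - 1)
        row = min(int((y - y_min) / y_span * height), height - 1)
        grid[row][col] += 1
    return grid


random.seed(0)
n_points = 200_000
xs = [random.gauss(0, 1) for _ in range(n_points)]
ys = [random.gauss(0, 1) for _ in range(n_points)]

grid = rasterize_counts(xs, ys, width=100, height=100)
n_cells = sum(len(row) for row in grid)
print(f"{n_points} points reduced to {n_cells} grid cells")
```

However many points go in, only `width * height` counts come out, which is what keeps the payload sent to the browser small.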
+
+### Passing Datashader Rasters as a Tile Map Image Layer
+
+Here we visualize the spatial distribution of taxi rides in New York City. Ride density
+is higher along major avenues. For more details about tile-based maps, see [the tile map layers tutorial](/python/tile-map-layers).
+
+```python
+import pandas as pd
+df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/uber-rides-data1.csv')
+dff = df.query('Lat < 40.82').query('Lat > 40.70').query('Lon > -74.02').query('Lon < -73.91')
+
+import datashader as ds
+cvs = ds.Canvas(plot_width=1000, plot_height=1000)
+agg = cvs.points(dff, x='Lon', y='Lat')
+# agg is an xarray object, see http://xarray.pydata.org/en/stable/ for more details
+coords_lat, coords_lon = agg.coords['Lat'].values, agg.coords['Lon'].values
+# Corners of the image
+coordinates = [[coords_lon[0], coords_lat[0]],
+               [coords_lon[-1], coords_lat[0]],
+               [coords_lon[-1], coords_lat[-1]],
+               [coords_lon[0], coords_lat[-1]]]
+
+from colorcet import fire
+import datashader.transfer_functions as tf
+img = tf.shade(agg, cmap=fire)[::-1].to_pil()
+
+import plotly.express as px
+# Trick to quickly create a figure with map axes
+fig = px.scatter_map(dff[:1], lat='Lat', lon='Lon', zoom=12)
+# Add the Datashader image as a tile map image layer
+fig.update_layout(
+    map_style="carto-darkmatter",
+    map_layers=[{"sourcetype": "image", "source": img, "coordinates": coordinates}],
+)
+fig.show()
+```
+
+### Exploring Correlations of a Large Dataset
+
+Here we explore the flight delay dataset from https://www.kaggle.com/usdot/flight-delays. To get a visual impression of the correlation between features, we generate a Datashader rasterized array and plot it using a `Heatmap` trace. This creates a much clearer visualization than a scatter plot of (even a fraction of) the data points, as shown below.
+
+```python
+import plotly.graph_objects as go
+import pandas as pd
+import numpy as np
+import datashader as ds
+df = pd.read_parquet('https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet')
+fig = go.Figure(go.Scattergl(x=df['SCHEDULED_DEPARTURE'][::200],
+                             y=df['DEPARTURE_DELAY'][::200],
+                             mode='markers')
+)
+fig.update_layout(title_text='A busy plot')
+fig.show()
+```
+
+```python
+import plotly.express as px
+import pandas as pd
+import numpy as np
+import datashader as ds
+df = pd.read_parquet('https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet')
+
+cvs = ds.Canvas(plot_width=100, plot_height=100)
+agg = cvs.points(df, 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY')
+zero_mask = agg.values == 0
+agg.values = np.log10(agg.values, where=np.logical_not(zero_mask))
+agg.values[zero_mask] = np.nan
+fig = px.imshow(agg, origin='lower', labels={'color':'Log10(count)'})
+fig.update_traces(hoverongaps=False)
+fig.update_layout(coloraxis_colorbar=dict(title='Count', tickprefix='1.e'))
+fig.show()
+```
+
+Instead of using Datashader, it would be possible in principle to create a [2d histogram](/python/2d-histogram-contour/) with Plotly, but this is not recommended because the whole dataset of around 5M rows would need to be loaded in the browser for plotly.js to compute the heatmap.
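
A rough back-of-the-envelope comparison shows why aggregating before sending data to the browser matters. The sketch below assumes 8-byte values and two columns per row, and ignores serialization overhead. The 5M row count and the 100 x 100 canvas come from the example above; everything else is an illustrative assumption.

```python
BYTES_PER_VALUE = 8  # assumption: float64 values

n_rows = 5_000_000  # approximate row count quoted above
raw_bytes = n_rows * 2 * BYTES_PER_VALUE  # x and y for every point

plot_width, plot_height = 100, 100  # canvas size used in the example
raster_bytes = plot_width * plot_height * BYTES_PER_VALUE

print(f"raw points: {raw_bytes / 1e6:.0f} MB")
print(f"raster:     {raster_bytes / 1e3:.0f} kB")
print(f"reduction:  {raw_bytes // raster_bytes}x")
```

Under these assumptions the rasterized grid is about three orders of magnitude smaller than the raw point data, and its size stays fixed as the dataset grows.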
