|
| 1 | +--- |
| 2 | +jupyter: |
| 3 | + jupytext: |
| 4 | + notebook_metadata_filter: all |
| 5 | + text_representation: |
| 6 | + extension: .md |
| 7 | + format_name: markdown |
| 8 | + format_version: "1.2" |
| 9 | + jupytext_version: 1.3.1 |
| 10 | + kernelspec: |
| 11 | + display_name: Python 3 |
| 12 | + language: python |
| 13 | + name: python3 |
| 14 | + language_info: |
| 15 | + codemirror_mode: |
| 16 | + name: ipython |
| 17 | + version: 3 |
| 18 | + file_extension: .py |
| 19 | + mimetype: text/x-python |
| 20 | + name: python |
| 21 | + nbconvert_exporter: python |
| 22 | + pygments_lexer: ipython3 |
| 23 | + version: 3.6.8 |
| 24 | + plotly: |
| 25 | + description: |
| 26 | + How to use datashader to rasterize large datasets, and visualize |
| 27 | + the generated raster data with plotly. |
| 28 | + display_as: scientific |
| 29 | + language: python |
| 30 | + layout: base |
| 31 | + name: Plotly and Datashader |
| 32 | + order: 21 |
| 33 | + page_type: u-guide |
| 34 | + permalink: python/datashader/ |
| 35 | + thumbnail: thumbnail/datashader.jpg |
| 36 | +--- |
| 37 | + |
| 38 | +[datashader](https://datashader.org/) creates rasterized representations of large datasets for easier visualization. It uses a pipeline of several steps, such as projecting the data onto a regular grid and creating a color representation of that grid. |
| 39 | + |
| 40 | +### Passing datashader rasters as a mapbox image layer |
| 41 | + |
| 42 | +We visualize here the spatial distribution of taxi rides in New York City. A higher density |
| 43 | +is observed on major avenues. For more details about mapbox charts, see [the mapbox layers tutorial](/python/mapbox-layers). No mapbox token is needed here. |
| 44 | + |
| 45 | +```python |
| 46 | +import pandas as pd |
| 47 | +df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/uber-rides-data1.csv') |
| 48 | +dff = df.query('Lat < 40.82').query('Lat > 40.70').query('Lon > -74.02').query('Lon < -73.91') |
| 49 | + |
| 50 | +import datashader as ds |
| 51 | +cvs = ds.Canvas(plot_width=1000, plot_height=1000) |
| 52 | +agg = cvs.points(dff, x='Lon', y='Lat') |
| 53 | +# agg is an xarray object, see http://xarray.pydata.org/en/stable/ for more details |
| 54 | +coords_lat, coords_lon = agg.coords['Lat'].values, agg.coords['Lon'].values |
| 55 | +# Corners of the image, which need to be passed to mapbox |
| 56 | +coordinates = [[coords_lon[0], coords_lat[0]], |
| 57 | + [coords_lon[-1], coords_lat[0]], |
| 58 | + [coords_lon[-1], coords_lat[-1]], |
| 59 | + [coords_lon[0], coords_lat[-1]]] |
| 60 | + |
| 61 | +from colorcet import fire |
| 62 | +import datashader.transfer_functions as tf |
| 63 | +img = tf.shade(agg, cmap=fire)[::-1].to_pil() |
| 64 | + |
| 65 | +import plotly.express as px |
| 66 | +# Trick to quickly create a figure with mapbox axes |
| 67 | +fig = px.scatter_mapbox(dff[:1], lat='Lat', lon='Lon', zoom=12) |
| 68 | +# Add the datashader image as a mapbox image layer |
| 69 | +fig.update_layout(mapbox_style="carto-darkmatter", |
| 70 | + mapbox_layers = [ |
| 71 | + { |
| 72 | + "sourcetype": "image", |
| 73 | + "source": img, |
| 74 | + "coordinates": coordinates |
| 75 | + }] |
| 76 | +) |
| 77 | +fig.show() |
| 78 | +``` |
| 79 | + |
| 80 | +### Exploring correlations of a large dataset |
| 81 | + |
| 82 | +Here we explore the flight delay dataset from https://www.kaggle.com/usdot/flight-delays. In order to get a visual impression of the correlation between features, we generate a datashader rasterized array which we plot using a `Heatmap` trace. It creates a much clearer visualization than a scatter plot of (even a fraction of) the data points, as shown below. |
| 83 | + |
| 84 | +Note that instead of datashader it would theoretically be possible to create a [2d histogram](/python/2d-histogram-contour/) with plotly, but this is not recommended here: the whole dataset (5M rows) would have to be loaded in the browser for plotly.js to compute the heatmap, which is not practical. Datashader makes it possible to reduce the size of the dataset before passing it to the browser. |
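
The size reduction can be illustrated without datashader. The sketch below is a stand-in, not datashader's implementation: it assumes synthetic data (the values and their ranges are invented) and uses `numpy.histogram2d` to show that the aggregated grid has a fixed size, whatever the number of input rows.

```python
import numpy as np

# Synthetic stand-in for the flight data (values are invented for illustration)
rng = np.random.default_rng(0)
n_rows = 1_000_000
departure = rng.uniform(0, 24, size=n_rows)        # hypothetical departure hour
delay = rng.normal(loc=10, scale=30, size=n_rows)  # hypothetical delay in minutes

# Bin the points on a 100 x 100 grid, similar in spirit to Canvas.points
counts, x_edges, y_edges = np.histogram2d(departure, delay, bins=100)

# histogram2d puts x along the first axis; a heatmap expects rows to follow y
z = counts.T

print(z.shape)                 # grid size is independent of n_rows
print(int(z.sum()) == n_rows)  # every point falls in exactly one bin
```

Only the 100 x 100 array of counts would need to reach the browser, rather than the million input rows.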
| 85 | + |
| 86 | +```python |
| 87 | +import plotly.graph_objects as go |
| 88 | +import pandas as pd |
| 89 | +import numpy as np |
| 90 | +import datashader as ds |
| 91 | +df = pd.read_parquet('https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet') |
| 92 | +fig = go.Figure(go.Scattergl(x=df['SCHEDULED_DEPARTURE'][::200], |
| 93 | + y=df['DEPARTURE_DELAY'][::200], |
| 94 | + mode='markers') |
| 95 | +) |
| 96 | +fig.update_layout(title_text='A busy plot') |
| 97 | +fig.show() |
| 98 | +``` |
| 99 | + |
| 100 | +```python |
| 101 | +import plotly.graph_objects as go |
| 102 | +import pandas as pd |
| 103 | +import numpy as np |
| 104 | +import datashader as ds |
| 105 | +df = pd.read_parquet('https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet') |
| 106 | + |
| 107 | +cvs = ds.Canvas(plot_width=100, plot_height=100) |
| 108 | +agg = cvs.points(df, 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY') |
| 109 | +x = np.array(agg.coords['SCHEDULED_DEPARTURE']) |
| 110 | +y = np.array(agg.coords['DEPARTURE_DELAY']) |
| 111 | + |
| 112 | +# Assign nan to zero values so that the corresponding pixels are transparent |
| 113 | +agg = np.array(agg.values, dtype=float) |
| 114 | +agg[agg<1] = np.nan |
| 115 | + |
| 116 | +fig = go.Figure(go.Heatmap( |
| 117 | + z=np.log10(agg), x=x, y=y, |
| 118 | + hoverongaps=False, |
| 119 | + hovertemplate='Scheduled departure: %{x:.1f}h <br>Departure delay: %{y} <br>Log10(Count): %{z}', |
| 120 | + colorbar=dict(title='Count (Log)', tickprefix='1.e'))) |
| 121 | +fig.update_xaxes(title_text='Scheduled departure') |
| 122 | +fig.update_yaxes(title_text='Departure delay') |
| 123 | +fig.show() |
| 124 | + |
| 125 | +``` |
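
The masking step in the cell above matters because `log10(0)` evaluates to `-inf`. As a minimal sketch, assuming a made-up 2 x 2 array of counts, replacing empty bins with NaN before taking the logarithm leaves them blank instead:

```python
import numpy as np

# Made-up bin counts; the zero represents an empty bin
counts = np.array([[0, 1], [10, 1000]], dtype=float)

# Mark empty bins as NaN so that they are left blank instead of becoming -inf
counts[counts < 1] = np.nan

# The log scale compresses the large dynamic range of the counts
log_counts = np.log10(counts)
print(log_counts)
```

The NaN entry stays NaN after the logarithm, and the heatmap in the cell above renders such entries as gaps.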
| 126 | + |