Plotly for pandas #1735

Closed
Sutyke opened this issue Aug 23, 2019 · 17 comments · Fixed by #2336

Comments

@Sutyke

Sutyke commented Aug 23, 2019

Hi Team,

Big thanks for bringing us such a good tool as plotly!

I have tried the new plotly 4.1. It is excellent that you have split plotly and chart-studio.

Unfortunately I had to go back to version 3, as I could not run pandas.DataFrame.iplot() offline. This was one of the best features plotly offered: with one line of code you can plot a full dataframe, and you can also switch each column on or off in a Jupyter notebook.

I had to downgrade back to version 3 with the conda command below:
conda install -c conda-forge python-cufflinks

Could you please advise whether there is any way I can run iplot as in this notebook:
https://nbviewer.jupyter.org/gist/santosjorge/b278ce0ae2448f47c31d

@HudsonMC16

HudsonMC16 commented Aug 23, 2019

I transitioned from iplot to plotly.express after 4.0 came out: https://plot.ly/python/plotly-express/

@Sutyke
Author

Sutyke commented Aug 25, 2019

I used plotly express as well, but I downgraded to plotly v3 because the iplot command is much easier and faster to use: it lets me plot multiple variables directly from pandas with one line of code (see the example below). I cannot find out how to do this without too much code in plotly express or plotly 4.0. What is the simplest way to plot the df below in plotly express? As you can see, in plotly v3 it can be done in one line.

df= pd.DataFrame({'a':[1,2,3],'b':[3,4,5], 'c':[3,1,2], 'd':[1,2,3],'e':[5,6,7], 'f':[7,8,9]})
df.iplot()

@HudsonMC16

That bothered me as well, but I wanted to continue to use the plotly express syntax for single plots, so I wrote a wrapper which accepts lists or nested lists of df columns and plots them appropriately. A flat list produces multiple traces on a single axis, and a nested list (of length 2) plots the first list on the primary y-axis and the second list on the secondary y-axis.

It's not perfect, as I probably need to add some error handling, but I'm the only one using it at the moment.

import plotly.express as px
import plotly.graph_objs as go
import pandas as pd
from plotly.subplots import make_subplots


def plot(plot_type, df, x, y):
    if isinstance(y, str):
        fig = go.Figure()
        fig.add_trace(go.Scatter(x=df[x], y=df[y], name=y, mode=plot_type))
        return fig
    elif isinstance(y, list):
        if all([isinstance(axis, list) for axis in y]) and len(y) == 2:
            fig = make_subplots(specs=[[{"secondary_y": True}]])
            for trace in y[0]:
                fig.add_trace(
                    go.Scatter(x=df[x], y=df[trace], name=trace, mode=plot_type),
                    secondary_y=False,
                )
            for trace in y[1]:
                fig.add_trace(
                    go.Scatter(x=df[x], y=df[trace], name=trace, mode=plot_type),
                    secondary_y=True,
                )
            return fig
        elif all([isinstance(trace, str) for trace in y]):
            fig = go.Figure()
            for trace in y:
                fig.add_trace(
                    go.Scatter(x=df[x], y=df[trace], name=trace, mode=plot_type)
                )
            return fig


def line(df, x, y):
    fig = plot('lines', df, x, y)
    return fig


def scatter(df, x, y):
    fig = plot('markers', df, x, y)
    return fig


def scatter_line(df, x, y):
    fig = plot('lines+markers', df, x, y)
    return fig

@Sutyke
Author

Sutyke commented Aug 25, 2019

That's a very good approach; I'm wondering why the plotly team didn't implement it.

What parameters would you pass to your function to plot the dataframe below?

df= pd.DataFrame({'a':[1,2,3],'b':[3,4,5], 'c':[3,1,2], 'd':[1,2,3],'e':[5,6,7], 'f':[7,8,9]})

image

@Sutyke
Author

Sutyke commented Aug 25, 2019

@HudsonMC16 thanks for your help!

Dear plotly team,

In order to plot a pandas dataframe with "express" speed, would you consider adding to plotly.express a method that plots a whole pandas dataframe with one line of code? This would save users a lot of time.

import plotly.graph_objs as go

def plot(plot_type, df):
    # Draw one trace per DataFrame column, plotted against the index.
    fig = go.Figure()
    for trace in df.columns:
        fig.add_trace(
            go.Scatter(x=df.index, y=df[trace], name=trace, mode=plot_type)
        )
    return fig

plot('lines', df)

@HudsonMC16

I work mainly with time series data, and generally have a column which will serve as my x-axis, so I didn't consider that use case in my little wrapper. If you have a dataframe with a usable x-axis, then you could do this:

import PXWrapper as pxw

df= pd.DataFrame({'x':[0, 1, 2], 'a':[1,2,3],'b':[3,4,5], 'c':[3,1,2], 'd':[1,2,3],'e':[5,6,7], 'f':[7,8,9]})
fig = pxw.line(df, 'x', ['a', 'b', 'c', 'd'])
fig.show()

If you just want to plot them against the index of the dataframe, then the wrapper would need to be modified to something similar to this (pass x=None to use the index):

import plotly.express as px
import plotly.graph_objs as go
import pandas as pd
from plotly.subplots import make_subplots

def plot(plot_type, df, x, y):
    if x is None:
        x = df.index
    else:
        x = df[x]
    if isinstance(y, str):
        fig = go.Figure()
        fig.add_trace(go.Scatter(x=x, y=df[y], name=y, mode=plot_type))
        return fig
    elif isinstance(y, list):
        if all([isinstance(axis, list) for axis in y]) and len(y) == 2:
            fig = make_subplots(specs=[[{"secondary_y": True}]])
            for trace in y[0]:
                fig.add_trace(
                    go.Scatter(x=x, y=df[trace], name=trace, mode=plot_type),
                    secondary_y=False,
                )
            for trace in y[1]:
                fig.add_trace(
                    go.Scatter(x=x, y=df[trace], name=trace, mode=plot_type),
                    secondary_y=True,
                )
            return fig
        elif all([isinstance(trace, str) for trace in y]):
            fig = go.Figure()
            for trace in y:
                fig.add_trace(
                    go.Scatter(x=x, y=df[trace], name=trace, mode=plot_type)
                )
            return fig


def line(df, x, y):
    fig = plot('lines', df, x, y)
    return fig


def scatter(df, x, y):
    fig = plot('markers', df, x, y)
    return fig


def scatter_line(df, x, y):
    fig = plot('lines+markers', df, x, y)
    return fig

@nicolaskruchten
Contributor

The above is basically what cufflinks does, and we will upgrade it to work with Plotly 4.x shortly.

Note that you can just do the following in PX:

import plotly.express as px
import pandas as pd

df = pd.DataFrame({'a':[1,2,3],'b':[3,4,5], 'c':[3,1,2], 'd':[1,2,3],'e':[5,6,7], 'f':[7,8,9]})
tidy_df = df.reset_index().melt(id_vars=["index"])
px.line(tidy_df, x="index", y="value", color="variable")

which gives

image

@HudsonMC16

but, ultimately, yes, I agree. Plotting multiple traces using plotly.express is cumbersome right now. I'm hoping a feature is on its way.

@nicolaskruchten
Contributor

Plotly Express does by design expect "tidy" input (long rather than wide) so if you want to use it, you'll have to do a bit of data transformation as I did above with .melt()
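
To make the "long rather than wide" distinction concrete, here is a minimal pandas-only sketch of what the .melt() reshaping does to the example dataframe from this thread. The variable names wide and tidy are illustrative only.

```python
import pandas as pd

# Wide form: one column per variable, one row per index value.
wide = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [3, 1, 2],
                     "d": [1, 2, 3], "e": [5, 6, 7], "f": [7, 8, 9]})

# Long ("tidy") form: one row per (index, variable) pair.
tidy = wide.reset_index().melt(id_vars=["index"])

print(wide.shape)          # (3, 6)
print(tidy.shape)          # (18, 3)
print(list(tidy.columns))  # ['index', 'variable', 'value']
```

Each of the 18 original values ends up in its own row, accompanied by its index value and its column name.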

Cufflinks can operate on wide-rather-than-long data, and we don't today have plans to move that functionality into Plotly.py, rather we will resolve santosjorge/cufflinks#196 to enable Cufflinks and Plotly v4 to co-exist :)

@HudsonMC16

@nicolaskruchten that's great to hear. Cufflinks is a great library. I suppose I misunderstood and thought plotly.express was meant to bring similar functionality into plotly. Glad cufflinks is sticking around. Thanks for clarifying!

@Sutyke
Author

Sutyke commented Aug 25, 2019

@nicolaskruchten Thanks a lot for your quick reply and for showing the melt example.

Just to clarify, it would be great if Cufflinks worked again with plotly v4. That would solve the problem for small to large data.

For big data, I think the number of stored values almost triples with tidy_df. Is there any way to include @HudsonMC16's approach in plotly.express for pandas? Unlike plotly.express, Cufflinks doesn't currently support scattergl, so plotting big data with Cufflinks is slow because the GPU is not utilised.

Tidy_df vs df example
image

@nicolaskruchten
Contributor

Sure, there are more rows, but does this really cause much of a problem? Under the hood, Plotly Express splits them up and creates trace objects exactly like the code that @HudsonMC16 provided above.

@nicolaskruchten
Contributor

> I misunderstood and thought plotly.express was meant to bring similar functionality into plotly

One way to think about it is:

  • Cufflinks' iplot() is to plotly what Pandas' .plot() is to matplotlib
  • Plotly Express is to plotly what Seaborn is to matplotlib

They both enable you to visualize data frames but have quite different APIs.

@Sutyke
Author

Sutyke commented Aug 26, 2019

@nicolaskruchten is it possible to get webgl working with Cufflinks in Plotly v4? I found an open issue, but it seems nobody has replied since April 2018: santosjorge/cufflinks#101

Below is a summary of lessons learned for others; please add other solutions if there is a better way:

Required outcome: find the most efficient way (in speed and length of code) to plot a large number of data points with multiple variables from a pandas dataframe.

Why?: The current solution is plotly v3 with Cufflinks. With one line of code, df.iplot(), it is possible to visualise the full dataframe. When this solution is used with a large dataframe, the interactive mode freezes because it doesn't use webgl.

Evaluation Summary

1) Create own wrapper and use plotly v4 with go.Scattergl: (Currently best solution)

import plotly.graph_objs as go

def plot(plot_type, df):
    # One WebGL-rendered trace per column, plotted against the index.
    fig = go.Figure()
    for trace in df.columns:
        fig.add_trace(
            go.Scattergl(x=df.index, y=df[trace], name=trace, mode=plot_type)
        )
    fig.show()

plot('markers', df)

2) Plotly v3 with Cufflinks (second-best solution): with Cufflinks it is possible to plot a whole pandas dataframe with one line, but when the dataframe is too big the whole page freezes.
df.iplot(kind='scatter', mode='markers')

3) Plotly.express (last place): it accepts only tidy data, which stores about 3 times as many values, and converting data to tidy format is not always easy. This is a problem when working with large datasets. Once the tidy data is created, it works fast.

tidy_df = df.reset_index().melt(id_vars=["index"])
px.scatter(tidy_df, x="index", y="value", color="variable", render_mode='webgl')

If Plotly v4 works with Cufflinks and webgl, this will be the most efficient solution considering both speed and the amount of code that needs to be written.

@mazzma12

mazzma12 commented Dec 2, 2019

> but, ultimately, yes, I agree. Plotting multiple traces using plotly.express is cumbersome right now. I'm hoping a feature is on its way.

IMO it's more the multi-axes plot that is cumbersome (or even impossible?) atm in plotly express.
Something inspired by seaborn / matplotlib, passing an already existing axis / layout to px.line(),
could help? https://stackoverflow.com/a/47593751/7657658

@nicolaskruchten
Contributor

This issue will be resolved with #2336 which provides a cufflinks-like Pandas backend and accepts multiple columns for x or y (not both!) and does the melt()ing internally.

@nicolaskruchten
Contributor

The above-mentioned features are now available! https://medium.com/plotly/beyond-tidy-plotly-express-now-accepts-wide-form-and-mixed-form-data-bdc3e054f891
