
VIS: DataFrame.plot drops datetime data when kind is scatter #8113


Closed
TomAugspurger opened this issue Aug 25, 2014 · 27 comments · Fixed by #30602
Labels: Dtype Conversions (Unexpected or buggy dtype conversions), Visualization (plotting)

Comments

@TomAugspurger
Contributor

This works fine:

In [13]: df = pd.DataFrame(np.random.randn(300), columns=['a'])
In [14]: df['dtime'] = pd.DatetimeIndex(start='2014-01-01', freq='h', periods=300).time
In [15]: df.plot(x='dtime', y='a')
Out[15]: <matplotlib.axes._subplots.AxesSubplot at 0x118fd17f0>

This raises a KeyError

In [17]: df.plot(x='dtime', y='a', kind='scatter')

We call df._get_numeric_data() which excludes datetimes.
May happen for other kinds too.

I'll fix this; just have to decide how much refactoring.

Matplotlib is ok with datetime.time values, it chokes on datetime values.
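A minimal sketch of the filtering described above. It uses the public `select_dtypes(include="number")` as a stand-in for the private `_get_numeric_data()` (an assumption made for illustration; the two are not claimed to be identical), and `pd.date_range` to build the timestamps. The column names `stamp` and `dtime` are made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(5)})

stamps = pd.date_range("2014-01-01", periods=5, freq=pd.Timedelta(hours=1))
df["stamp"] = stamps        # datetime64 column
df["dtime"] = stamps.time   # object column holding datetime.time values

# Numeric-only selection keeps "a" and drops both time-like columns,
# so a later lookup of x="dtime" on the filtered frame would fail.
numeric_only = df.select_dtypes(include="number")
print(list(numeric_only.columns))
```

Looking up `numeric_only["dtime"]` after this step raises a `KeyError`, which is consistent with the error reported for `kind='scatter'`.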

@TomAugspurger
Contributor Author

Quick update: this will take a bit of work. @jreback what's the current thinking on adding new dtypes for date and time columns (separate from datetime)? I'd be OK with adding it if you think it would be useful. I think it might be more useful now that we have the .dt accessors.

Once I have that I can use df.select_dtypes(include=['number', 'bool', 'datetime', 'date', 'time']) to get them. (`select_dtypes` is awesome btw. Great work @cpcloud)
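For reference, a sketch of the part of that call that `select_dtypes` already supports. The `'date'` and `'time'` selectors are the hypothetical new dtypes proposed above and are left out; the column names are invented for the example.

```python
from datetime import time

import pandas as pd

df = pd.DataFrame(
    {
        "a": [0.1, 0.2, 0.3],
        "flag": [True, False, True],
        "stamp": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
        "tod": [time(0, 0), time(1, 0), time(2, 0)],  # object dtype
    }
)

# Only the selectors that exist are used; 'date' and 'time' are omitted.
selected = df.select_dtypes(include=["number", "bool", "datetime"])
print(list(selected.columns))
```

The datetime64 column is kept, while the object column of `datetime.time` values is still excluded, which is the gap the proposed dtypes were meant to close.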

@jreback
Contributor

jreback commented Aug 26, 2014

can u give an example of what that would return?

u can certainly interpret date/time just not sure what that would mean here

@jreback
Contributor

jreback commented Aug 26, 2014

adding a date and/or time dtype would be pretty tricky and not sure a lot of benefit for it

@jreback
Contributor

jreback commented Aug 26, 2014

FYI you can infer an object dtype
with lib.infer_dtype (it has to scan the data so can be somewhat time consuming)
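A small sketch of that inference, assuming the public entry point `pandas.api.types.infer_dtype` behaves like the `lib.infer_dtype` mentioned above. The series names are made up for the example.

```python
from datetime import date, time

import pandas as pd
from pandas.api.types import infer_dtype

times = pd.Series([time(9, 30), time(17, 45)], dtype=object)
dates = pd.Series([date(2014, 1, 1), date(2014, 1, 2)], dtype=object)
stamps = pd.Series(pd.to_datetime(["2014-01-01", "2014-01-02"]))

# The object columns have to be scanned value by value; the datetime64
# column can be answered from its dtype alone.
for name, col in [("times", times), ("dates", dates), ("stamps", stamps)]:
    print(name, col.dtype, infer_dtype(col))
```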

@TomAugspurger
Contributor Author

For a line plot it won't make much sense, but I think a scatter plot would work. That way you can see how y varies through the day, across different days. Something like this works fine:

In [25]: df = tm.makeTimeDataFrame().reset_index().rename(columns={'index': 'datetime'})

In [26]: df['day'] = df.datetime.dt.day

In [27]: df.plot(x='day', y='A', kind='scatter')
Out[27]: <matplotlib.axes._subplots.AxesSubplot at 0x1181ca550>

This works since .dt.day returns an int. I'd like to do the same thing, but for a date or time. I'll look into lib.infer_dtype.
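Along the same lines, a sketch of how a time of day could be turned into a numeric column with the `.dt` accessor so that it survives numeric-only filtering. The column name `hour_of_day` and the 90-minute spacing are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

stamps = pd.date_range("2014-01-01", periods=48, freq=pd.Timedelta(minutes=90))
df = pd.DataFrame({"datetime": stamps, "A": np.arange(48, dtype=float)})

# Fractional hour of day (0 <= value < 24), stored as a plain float column.
df["hour_of_day"] = df["datetime"].dt.hour + df["datetime"].dt.minute / 60
print(df[["datetime", "hour_of_day"]].head())
```

With matplotlib installed, `df.plot(x='hour_of_day', y='A', kind='scatter')` should then behave like the `day` example, at the cost of losing time-formatted axis labels.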

@jreback
Contributor

jreback commented Aug 26, 2014

you could easily make a DatetimeIndex method to return what u need (like date/time)
and just add it to .dt

@TomAugspurger
Contributor Author

The problem now is that the plotting code calls df._get_numeric_data() which drops all datetime/objects. I've already got the date/time data stored, I just need to be able to select it cleanly.

@jreback
Contributor

jreback commented Aug 26, 2014

ahah maybe add a way to include it then? or some method of coercing ?

@TomAugspurger
Contributor Author

I'm going to close this for now. matplotlib is able to handle it if everything on the axis is the same type (dates or times), but it gets confused when there's a mix.
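One way to check for that kind of mix before plotting, as a rough sketch. The helper `value_types` is hypothetical and not part of pandas or matplotlib.

```python
from datetime import date, time


def value_types(values):
    """Return the set of exact Python types found in `values`."""
    return {type(v) for v in values}


only_times = [time(9, 0), time(10, 30)]
mixed = [date(2014, 1, 1), time(10, 30)]

# A single type means the axis is homogeneous; more than one means a mix.
print(value_types(only_times))
print(value_types(mixed))
```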

@JeffAbrahamson

It might be helpful documentation to add a comment to this (closed) issue indicating how to do this in matplotlib (plot_date ?).

There is still a bug in both pandas and matplotlib, I think, that a datetime.time against datetime.time scatter plot does not work. If that's a different issue, perhaps linking them would be good. (This issue is the closest I've found.)

@stared

stared commented Mar 2, 2015

@JeffAbrahamson By any chance did you find a way to plot time vs time scatterplots or histograms? (I tried, but failed, and Google brought me here and to this SO question.) Of course, other than doing it manually or sticking to seconds since epoch.

@JeffAbrahamson

I did not figure it out. My use case had smallish times (I was visualizing members of split times in a race) and so I eventually plotted integer seconds against integer seconds (numbers were all in the range from 450 to 1350).
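A sketch of that integer-seconds conversion using only the standard library. The helper name `seconds_since_midnight` and the sample split times are made up for illustration; microseconds are ignored.

```python
from datetime import time


def seconds_since_midnight(t):
    """Convert a datetime.time to whole seconds since 00:00:00."""
    return t.hour * 3600 + t.minute * 60 + t.second


# Invented split times, chosen to fall in the range mentioned above.
splits = [time(0, 7, 30), time(0, 15, 0), time(0, 22, 30)]
xs = [seconds_since_midnight(t) for t in splits]
print(xs)
```

Two such lists can be passed to any scatter function as plain integers, which avoids the time-versus-time plotting problem but leaves the axis labelled in seconds.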

@ajschumacher
Contributor

Hi @TomAugspurger and @jreback! I think it might be worth re-opening this issue; @jaclynweiser (with a few others) and I have been surprised recently by things like this:

from datetime import datetime
import pandas as pd
df = pd.DataFrame({'x': [datetime.now() for _ in range(10)], 'y': range(10)})
df.plot(x='x', y='y', kind='scatter')

This gives KeyError: 'x'.

Interestingly, you do get a plot with just df.plot(x='x', y='y'); it seems like if you can make a line graph, you should be able to make a scatterplot too.

What do you think? Is there a good workaround for this? If so, what? It's surprising to me that a datetime scatterplot isn't possible with pandas.

@TomAugspurger
Contributor Author

Agreed that it's surprising. Right now time series plots (datetime x axis) are completely separate from everything else. I've had refactoring all that to integrate with all our other plotting code on my todo list for a while.

Best workaround right now is probably df.plot(x=x, y=y, style=".")

@ajschumacher
Contributor

Thanks @TomAugspurger!

@cboettig

Yup, found this rather surprising as well, not to be able to scatterplot a datetime object against a numeric object. (If I've followed the thread correctly, I should be okay with a datetime.time object, but it seems pd.to_datetime naturally gives a datetime, and pandas doesn't give me an equivalent method to get a datetime.time?)

Isn't df.plot(x, y, style=".") very misleading? I mean, scatter plots are meant to show the underlying data, which, for instance, need not be sampled at a regular interval. style="." is just doing a line plot with dots instead of dashes, right? This would give a misleading impression about both the regularity and frequency of sampling (which is usually my main motivation for plotting time series as a scatterplot instead of a line plot in the first place). It seems that explicitly converting the dates to decimals would be a better workaround, though it results in much less pretty axis labels (at least without a bit more plot-magic than I can quickly drum up).

(Also, apologies if I'm off the mark here, am relatively new to pandas. Thanks for considering).
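A sketch of the dates-to-decimals conversion mentioned above, expressed as fractional days since the Unix epoch. The choice of epoch and the column name `x_days` are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.to_datetime(
            ["2015-01-01 00:00", "2015-01-01 12:00", "2015-01-04 00:00"]
        ),
        "y": [1, 2, 3],
    }
)

# Subtracting a Timestamp gives timedeltas; dividing by one day gives floats.
epoch = pd.Timestamp("1970-01-01")
df["x_days"] = (df["x"] - epoch) / pd.Timedelta(days=1)
print(df)
```

The irregular spacing of the samples is preserved in `x_days`, so a scatter plot of `x_days` against `y` shows the true sampling, though the axis ticks are raw day counts rather than dates.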

@colin-svds

Any update on this? As of 0.18, pandas still gives a key error when plot is called with kind='scatter' and a datetime column.

@jreback
Contributor

jreback commented Mar 29, 2016

@colin-svds this is a closed issue (from quite a while ago). you can open a new one if you would like. but pls read this one for work-arounds. I don't know if @TomAugspurger has anything more.

@TomAugspurger
Contributor Author

Nope I haven't ever made progress on it.

I've reopened it for now if anyone wants to take a shot. It might be as simple as reworking the plotting methods to use something other than ._get_numeric_data, or it could be harder.

@jreback removed this from the 0.18.1 milestone on Apr 26, 2016
@jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 on Aug 13, 2016

@nehaljwani

FWIW, till this issue is fixed people can still use this directly:

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
df = pd.DataFrame({'x': [datetime.now() for _ in range(10)], 'y': range(10)})
plt.scatter(df.x.dt.to_pydatetime(), df.y)
plt.show()

[Figure: the resulting scatter plot]

@datapythonista modified the milestones: Contributions Welcome, Someday on Jul 8, 2018

@RahulRAhhy123

Any solution for this as of now, @TomAugspurger @jreback?

@TomAugspurger
Contributor Author

It's still open. Let us know if you're interested in working on it.

@charlesdong1991
Member

charlesdong1991 commented Jan 1, 2020

I think this issue can be closed now. @TomAugspurger @jreback

Regarding:

Matplotlib is ok with datetime.time values, it chokes on datetime values.

Since Tom tested this long ago (the issue was opened 5 years ago) and matplotlib has changed its behaviour since then, the situation is now the opposite: matplotlib is ok with datetime values but chokes on datetime.time values.

With datetime values, I tried on master and it looks okay to me:
[Screenshot of the resulting plot]

The x axis looks different, but that is because pandas has its own datetime formatter, which is slightly different from matplotlib's formatter. To me that is a separate issue.

@jreback
Contributor

jreback commented Jan 1, 2020

@charlesdong1991 can u add a test for the above

@charlesdong1991
Member

charlesdong1991 commented Jan 1, 2020

@jreback I think the test I used in #30434 is almost identical to this one except for the freq. See below, which is from the test case in #30434:

from datetime import date
import numpy as np
import pandas as pd
dates = pd.date_range(start=date(2019, 1, 1), periods=12, freq="W")
vals = np.random.normal(0, 1, len(dates))
df = pd.DataFrame({"dates": dates, "vals": vals})

The reason I initially cross-referenced this issue (rather than marking it as closed by that PR) was that the values in the example above were datetime.time. I just tested and found that datetime.time is no longer supported by matplotlib, but datetime is. Therefore, I think this could be closed directly.

[Screenshot]

Do you still want the same test for this? I could add one if you prefer, though it would largely duplicate the existing one.

@jreback
Contributor

jreback commented Jan 1, 2020

can u add the test and assert that error; that would be enough to close this i think

@charlesdong1991
Member

ok @jreback see #30602

@jreback modified the milestones: Someday, 1.0 on Jan 1, 2020