
Plotting of DatetimeIndex directly with matplotlib no longer gives datetime formatted axis (0.15) #8614


Closed
jorisvandenbossche opened this issue Oct 23, 2014 · 20 comments · Fixed by #8693

@jorisvandenbossche
Member

From http://stackoverflow.com/questions/26526230/plotting-datetimeindex-on-x-axis-with-matplotlib-creates-wrong-ticks-in-pandas-0

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'RandomValues': np.random.randint(1, 50, 60)},
                  index=pd.date_range("2012-01-01", periods=60))

Plotting of course works:

In [79]: df.plot()
Out[79]: <matplotlib.axes._subplots.AxesSubplot at 0xf571550>

But plotting with matplotlib's own plot functions no longer gives a datetime-formatted x-axis, just ints:

In [80]: plt.plot(df.index, df.RandomValues)
Out[80]: [<matplotlib.lines.Line2D at 0xfb5f748>]

Reason: matplotlib's plot calls np.atleast_1d on the input (https://github.com/matplotlib/matplotlib/blob/v1.4.0/lib/matplotlib/axes/_base.py#L220). Previously this returned an Index; now it returns an array of datetime64 values, which matplotlib can't handle.
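The change can be reproduced without matplotlib by calling np.atleast_1d on a DatetimeIndex directly. A minimal check, assuming a current pandas/NumPy install (where, as since 0.15, Index no longer subclasses ndarray):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2012-01-01", periods=60)

# matplotlib's plot() passes its inputs through np.atleast_1d.
converted = np.atleast_1d(idx)

# Because Index no longer subclasses ndarray, the result is a plain ndarray
# holding datetime64 values, not the original DatetimeIndex.
print(type(converted).__name__, converted.dtype)
```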

Issues:

  • This is really a matplotlib problem (it should support datetime64), but of course a lot of users now rely on the behaviour that it also worked with a pandas Index.
  • Is there a way to restore this behaviour? I don't directly see a solution.
  • We should update our docs, as this behaviour is also briefly mentioned there: http://pandas.pydata.org/pandas-docs/stable/visualization.html#plotting-directly-with-matplotlib. We should also figure out why this didn't fail in the doc build (the figure does not seem to have been updated in the 0.15 doc build).

It seems there was some discussion about this in the Index-no-subclass PR: #7891 (comment) and #7891 (comment). I commented there about exactly this issue, but I don't know why we didn't look into it further then.

@jorisvandenbossche jorisvandenbossche changed the title Plotting of DatetimeIndex directly with matplotlib no longer works (0.15) Plotting of DatetimeIndex directly with matplotlib no longer gives datetime formatted axis (0.15) Oct 23, 2014
@jorisvandenbossche jorisvandenbossche added this to the 0.15.1 milestone Oct 23, 2014
@jorisvandenbossche
Member Author

@shoyer
Member

shoyer commented Oct 23, 2014

Maybe someone should bite the bullet and try to get plots of np.datetime64 working upstream in matplotlib? Here's the relevant issue: matplotlib/matplotlib#1097

@jorisvandenbossche
Member Author

Yes, good point :-)
I have been thinking about that for a long time, but never found the time for it...

@jreback
Contributor

jreback commented Oct 24, 2014

I would just stick a note in the docs (we already have a small section about using matplotlib directly for certain higher-performance plots). This is a matplotlib issue and the pandas behavior is correct (it was noted in the whatsnew for 0.15.0; PeriodIndex plotting already exhibited this). I think your workaround is fine.

@jorisvandenbossche
Member Author

@jreback It was not mentioned in the whatsnew docs AFAIK (or did I miss it somewhere? There is something about plotting a PeriodIndex, but this is about DatetimeIndex and about plotting directly with matplotlib), but I will add it when finishing #8586.

I know it is a matplotlib issue, but I think a lot of people relied on this behaviour (as I did myself in my code). But as I said, I don't think there is much we can do about it.

@jreback
Contributor

jreback commented Oct 25, 2014

@jorisvandenbossche Oh, I meant that comment :). OK, then I'm not sure there is much to do except document it.

@jorisvandenbossche
Member Author

It seems that simply doing units.registry[np.datetime64] = DatetimeConverter() already does the trick (see #8655). That means this can easily be solved from the pandas side (just add a units registry entry), and it should also not be that difficult for someone with some time to push this upstream to matplotlib.
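The registry mechanism can be sketched with matplotlib alone. As an assumption, this uses matplotlib's own dates.DateConverter in place of pandas' DatetimeConverter (whose import path is internal and has moved between pandas versions). Recent matplotlib already converts datetime64 natively, so on current versions the explicit registration mainly illustrates how the lookup works:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.units as munits
import numpy as np
import pandas as pd

# matplotlib picks a converter from the type of the data's elements, so
# mapping the numpy scalar type is enough for datetime64 arrays.
munits.registry[np.datetime64] = mdates.DateConverter()

idx = pd.date_range("2012-01-01", periods=60)
values = np.random.randint(1, 50, 60)

fig, ax = plt.subplots()
ax.plot(idx.to_numpy(), values)  # plain datetime64 array, no pandas Index

# The converter's axisinfo installs a date locator/formatter on the x-axis.
formatter = ax.xaxis.get_major_formatter()
print(type(formatter).__name__)
```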

@jorisvandenbossche
Member Author

So we can easily register a converter for datetime64 data, as we already do for datetime.date/datetime (see https://github.com/pydata/pandas/blob/v0.15.0/pandas/tseries/converter.py#L30).
The question is: are there people who rely on datetime64 arrays being treated as ints by matplotlib? Simply importing pandas would change such plots.

Another possibility is to note in the docs that users who want to plot directly with matplotlib can always add units.registry[np.datetime64] = DatetimeConverter() to their scripts themselves.
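For the opt-in route, current pandas exposes public functions for exactly this, which also answers the "just importing pandas changes my plots" concern because the registration can be undone. A minimal sketch, assuming a recent pandas where pandas.plotting.register_matplotlib_converters and deregister_matplotlib_converters are available:

```python
import matplotlib.dates  # noqa: F401  (registers matplotlib's own datetime64 converter)
import matplotlib.units as munits
import numpy as np
from pandas.plotting import (
    deregister_matplotlib_converters,
    register_matplotlib_converters,
)

before = munits.registry.get(np.datetime64)

# Install pandas' converters (np.datetime64 included) into matplotlib's registry.
register_matplotlib_converters()
during = munits.registry.get(np.datetime64)
print("while registered:", type(during).__name__)

# Remove pandas' converters and restore the ones that were there before.
deregister_matplotlib_converters()
after = munits.registry.get(np.datetime64)
print("after deregistering:", type(after).__name__)
```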

@jreback
Contributor

jreback commented Oct 28, 2014

+1 on this.
We already register converters for all kinds of other types, and this is probably the most common datetime-like type, so it should just work. If it's this simple, let's just do it.

You can always disable it by resetting the index and converting to ints.
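Opting out for a single plot amounts to passing integers instead of datetimes. A sketch, assuming the goal is the old int-valued x-axis; the datetime64 storage is reinterpreted as int64 ticks since the Unix epoch (the tick size follows the index's resolution):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"RandomValues": np.random.randint(1, 50, 60)},
    index=pd.date_range("2012-01-01", periods=60),
)

# Option 1: reinterpret the datetime64 values as raw int64 ticks since
# 1970-01-01; plt.plot(as_ints, ...) then gets an ordinary numeric axis.
as_ints = df.index.to_numpy().view("i8")

# Option 2: reset the index so the timestamps become a column and the new
# index is a 0..n-1 RangeIndex; plot against that position instead.
positional = df.reset_index().index.to_numpy()
```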

@agijsberts
Contributor

@jorisvandenbossche The chances of this being accepted upstream in matplotlib seem slim: matplotlib/matplotlib#1097

@shoyer
Member

shoyer commented Oct 28, 2014

My guess is that matplotlib would like to have a fuller solution for automatic datetime64 conversion that handles other precisions than ns. The pandas solution relies on converting dates into a DatetimeIndex, so I don't see how that could get easily pulled separately into matplotlib.

@tacaswell
Contributor

iirc there was also an issue with supporting older versions of numpy that don't have datetime64 (we go back to 1.6, but only recently dropped 1.5).

The numpy docs (http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html) say the dtype is experimental; is that no longer true? I guess if both pandas and mpl start using it as-is, we are a decent lobby against breaking the API.

This is now near the top of my mid-term to-do list....

@shoyer
Member

shoyer commented Oct 28, 2014

@tacaswell Numpy's datetime64 remains broken in a number of ways (e.g., NaT, timezone support, conversions, etc.). But we don't have any better alternative, and pandas has patched over most of it.

From a design perspective, the custom precision available in datetime64 is regrettable because it adds a lot of complexity (pandas only uses ns precision). But, on the plus side, datetime64 added very little surface to the numpy API, so I don't see much in the way of negative consequences to supporting it in matplotlib. Even supporting only ns precision (as used in pandas) would be a big step up; I haven't seen other precisions used much in the wild.
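Because the unit lives in the dtype, a converter has to read it before scaling the integer payload. A small sketch using NumPy's np.datetime_data, which returns a (unit, count) pair; the helper seconds_per_tick and its lookup table are hypothetical, and calendar units ('Y', 'M') are deliberately left out since their length varies:

```python
import numpy as np

# Seconds per step for the fixed-length datetime64 units (hypothetical table).
_SECONDS = {
    "W": 7 * 86400, "D": 86400, "h": 3600, "m": 60, "s": 1,
    "ms": 1e-3, "us": 1e-6, "ns": 1e-9,
}


def seconds_per_tick(dtype):
    """Hypothetical helper: seconds represented by one int64 step of a datetime64 dtype."""
    unit, count = np.datetime_data(dtype)
    return _SECONDS[unit] * count


day = np.datetime64("2014-01-02", "D")
for unit in ("D", "s", "ns"):
    value = day.astype(f"datetime64[{unit}]")
    ticks = value.astype("i8")
    # Same instant, different integers; scaling by the unit recovers the seconds.
    print(unit, ticks, ticks * seconds_per_tick(value.dtype))
```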

@agijsberts
Contributor

@shoyer Actually, pandas' solution relies on converting DatetimeIndex to datetime64, so it would be trivial to push a similar patch to matplotlib.

Although this quick fix would work fine for most users, a nasty problem is that the resolution of matplotlib's current datetime format (float days since 0001) is on the order of ~10 microseconds (iirc) for the typical range of years and thus not sufficient to represent the full resolution of datetime64[ns].
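The ~10 microsecond figure can be checked with float arithmetic: the gap between adjacent float64 values near a date's day number is the finest step that representation can express. A sketch, assuming the "days since 0001" convention described here; for comparison it also evaluates a 1970-01-01 epoch, which matplotlib has used by default since 3.3:

```python
import datetime

import numpy as np

SECONDS_PER_DAY = 86400.0


def resolution_seconds(day_number):
    """Smallest time step (in seconds) a float64 day count can represent near day_number."""
    return np.spacing(float(day_number)) * SECONDS_PER_DAY


d = datetime.date(2014, 1, 1)

# Days since 0001-01-01 (proleptic Gregorian ordinal), roughly the old convention.
old_epoch_days = d.toordinal()
# Days since 1970-01-01.
unix_epoch_days = (d - datetime.date(1970, 1, 1)).days

print(f"0001 epoch: {resolution_seconds(old_epoch_days) * 1e6:.2f} us")
print(f"1970 epoch: {resolution_seconds(unix_epoch_days) * 1e6:.3f} us")
```

Near 2014 the first value is about 10 µs, matching the estimate above, and both are far coarser than the 1 ns step of datetime64[ns].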

@shoyer
Member

shoyer commented Oct 28, 2014

I could be wrong here, but I'm pretty sure I noticed a to_datetime hiding in DatetimeConverter. That gives you back a DatetimeIndex (with ns precision). The simple check would be to try registering this and see if something chokes on datetime64 that doesn't fit in ns precision.


@agijsberts
Contributor

@shoyer I'm not sure if I understood you correctly, but I meant that rudimentary support for datetime64 is straightforward. See for example the following proof of concept and example:

from matplotlib import units, dates
from matplotlib import pyplot as plt
from numpy import datetime64, timedelta64, arange, ndarray, dtype
from numpy.random import rand
import datetime

# Seconds represented by one integer step of each datetime64 resolution
resolution_scale = {
    dtype('datetime64[ns]'): 1e-9,
    dtype('datetime64[us]'): 1e-6,
    dtype('datetime64[ms]'): 1e-3,
    dtype('datetime64[s]'): 1,
    dtype('datetime64[m]'): 60,
    dtype('datetime64[h]'): 60 * 60,
    dtype('datetime64[D]'): 24 * 60 * 60,
}

# The Unix epoch expressed in matplotlib's float day numbers
# (stands in for dates.epoch2num, which newer matplotlib no longer provides)
EPOCH_DAYS = dates.date2num(datetime.datetime(1970, 1, 1))

class Datetime64Converter(dates.DateConverter):
    @staticmethod
    def convert(values, unit, axis):
        if isinstance(values, ndarray) and issubclass(values.dtype.type, datetime64):
            # int64 ticks since 1970 -> seconds -> matplotlib float days
            seconds = values.view('i8') * resolution_scale[values.dtype]
            return EPOCH_DAYS + seconds / (24 * 60 * 60)
        elif isinstance(values, datetime.date):
            return dates.date2num(values)
        else:
            return values

# Register an instance: current matplotlib calls the converter's methods on an instance
units.registry[datetime64] = Datetime64Converter()

a = arange('2014-01-01', '2014-01-07', timedelta64(1, 'D'), dtype='datetime64[D]')
b = rand(len(a))
for i, r in enumerate(('ns', 'us', 'ms', 's', 'm', 'h', 'D')):
    plt.plot(a.astype('datetime64[{0}]'.format(r)), b + i, label=r)
plt.legend()
plt.show()

This approach does not rely in any way on pandas, though it's much like how it's done now in pandas (with added support for multiple resolutions). Of course the problem of matplotlib not being able to fully represent the possible resolutions remains a show-stopper.

@shoyer
Member

shoyer commented Oct 28, 2014

@agijsberts Ah, OK. Yes, I like your solution. Much simpler, doesn't need pandas.

IMO this is still worth doing even if higher-resolution dates are not possible with matplotlib. I would be happy with just a warning that triggers when the limits of a datetime axis are too close together (e.g., less than 1 ms apart) to plot the times accurately. For most uses, microsecond precision is not necessary.
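The proposed warning could look roughly like this. It is a hypothetical sketch: the helper name warn_if_span_too_small and its 1 ms default come from the suggestion above rather than from any existing API, and the limits are assumed to be matplotlib's float day numbers (as returned by ax.get_xlim() on a date axis):

```python
import warnings

SECONDS_PER_DAY = 86400.0


def warn_if_span_too_small(lo_days, hi_days, min_span_seconds=1e-3):
    """Hypothetical helper: warn when datetime axis limits are too close to plot accurately.

    lo_days and hi_days are float day numbers, as used by matplotlib's date axis.
    Returns True if a warning was issued, False otherwise.
    """
    span_seconds = abs(hi_days - lo_days) * SECONDS_PER_DAY
    if span_seconds < min_span_seconds:
        warnings.warn(
            f"datetime axis spans only {span_seconds:.3g} s; float day numbers "
            "may not resolve times this close together",
            stacklevel=2,
        )
        return True
    return False
```

Typical use would be `warn_if_span_too_small(*ax.get_xlim())` after setting the limits.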

@tacaswell
Contributor

Things are slowly starting to make sense to me about this.

Is the best (long term) solution here to rip out the float representation used in the dates module and replace it with datetime64 objects?

cc @cimarronm

@jorisvandenbossche
Member Author

@shoyer It does indeed use to_datetime, but only to convert the array to datetime64 values (the DatetimeIndex itself is not used, only its datetime64 values). If you already have datetime64 values, it does not rely on any further pandas-specific features, as @agijsberts showed.

For matplotlib:

  • I think the easiest short-term solution is to add a converter for datetime64 -> float, as @agijsberts showed above (which is really easy), and otherwise rely on the existing infrastructure to determine the ticks, formatting, etc. from the float date values.
  • In the long term, I personally think matplotlib could rewrite its date support to rely on datetime64 instead of float numbers, but that is obviously a lot more work.

@jreback
Contributor

jreback commented Oct 28, 2014

The only thing that is a tiny bit non-trivial: if you have a tz attached to a DatetimeIndex, it is currently converted properly.

So if a tz is attached you should use datetime.datetime objects (but with a non-pandas array that is not possible anyhow, so probably not a big deal).
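The distinction is easy to see on a tz-aware index: its datetime64 values are stored in UTC with no zone attached, while to_pydatetime() keeps the local wall time and the tzinfo. A small sketch, using the America/New_York zone purely as an illustrative assumption:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2014-01-01", periods=3, tz="America/New_York")

# datetime64 values: the timezone is dropped and instants are in UTC
# (midnight in New York in January is 05:00 UTC).
as_datetime64 = idx.values

# datetime.datetime objects: local wall-clock times with tzinfo attached.
as_pydatetime = idx.to_pydatetime()

print(as_datetime64[0], as_pydatetime[0])
```

A converter that only sees the datetime64 array would therefore label the axis in UTC.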
