
ENH/BUG: support TimedeltaIndex plotting #8711


Closed
jreback opened this issue Nov 2, 2014 · 32 comments

@jreback
Contributor

jreback commented Nov 2, 2014

This raises

import pandas as pd
s = pd.Series(range(5), index=pd.timedelta_range('1day', periods=5))
s.plot()

This shows the timedeltas with a formatted (albeit string) index:

s.index = s.index.format()
s.plot()

Wonder if we can just register a converter somehow, like #8614?

@jreback jreback added this to the 0.16.0 milestone Nov 2, 2014
@jorisvandenbossche
Member

I don't think matplotlib already has a converter for datetime.timedelta, so just registering our Timedelta type will not be enough. E.g., plt.plot(s.index.to_pytimedelta(), s) also fails.

But writing a basic converter should not be that difficult, I think (and if it also works for datetime.timedelta, it could perhaps be pushed upstream to matplotlib).
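A minimal sketch of the numeric step such a converter would perform, assuming the axis is drawn in float seconds (the unit is a choice made for this sketch, not something decided in the thread). In matplotlib this logic would live inside a `matplotlib.units.ConversionInterface` subclass registered in `matplotlib.units.registry`; that wiring is omitted so the snippet runs with the standard library alone. The helper name `timedeltas_to_seconds` is hypothetical.

```python
import datetime


def timedeltas_to_seconds(values):
    """Convert an iterable of datetime.timedelta objects to float seconds.

    Hypothetical helper: the numeric conversion a timedelta converter
    would do before matplotlib places the values on an axis.
    """
    out = []
    for value in values:
        if not isinstance(value, datetime.timedelta):
            raise TypeError(f"expected timedelta, got {type(value).__name__}")
        # total_seconds() folds days, seconds and microseconds into one float
        out.append(value.total_seconds())
    return out


deltas = [datetime.timedelta(days=1), datetime.timedelta(hours=36)]
print(timedeltas_to_seconds(deltas))  # [86400.0, 129600.0]
```

Because the check uses `isinstance`, any subclass of `datetime.timedelta` would be accepted as well.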

@jreback
Contributor Author

jreback commented Nov 3, 2014

Timedelta is a subclass of datetime.timedelta.
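Because of that subclass relationship, a registry keyed on `datetime.timedelta` could also serve `Timedelta`, provided the lookup walks the class hierarchy. The sketch below shows such a lookup with a plain dict. `FakeTimedelta` is a hypothetical stand-in for `pandas.Timedelta` so the snippet needs only the standard library, and `lookup_converter` illustrates the idea rather than reproducing matplotlib's own code.

```python
import datetime


class FakeTimedelta(datetime.timedelta):
    """Hypothetical stand-in for pandas.Timedelta (also a timedelta subclass)."""


# Registry mapping a type to the name of the converter that handles it.
REGISTRY = {datetime.timedelta: "timedelta-converter"}


def lookup_converter(value):
    # Walk the method resolution order so subclasses inherit the base converter.
    for cls in type(value).__mro__:
        if cls in REGISTRY:
            return REGISTRY[cls]
    return None


print(lookup_converter(FakeTimedelta(days=1)))  # timedelta-converter
print(lookup_converter(3.0))                    # None
```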

@shoyer
Member

shoyer commented Feb 12, 2015

I just encountered a MemoryError when attempting to plot a TimedeltaIndex!

pd.Series(range(15), pd.timedelta_range(0, freq='D', periods=15)).plot()
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-113-e9a2d53dcace> in <module>()
----> 1 pd.Series(range(15), pd.timedelta_range(0, freq='H', periods=15)).plot()

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
   2516                  yerr=yerr, xerr=xerr,
   2517                  label=label, secondary_y=secondary_y,
-> 2518                  **kwds)
   2519 
   2520 

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in _plot(data, x, y, subplots, ax, kind, **kwds)
   2322         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   2323 
-> 2324     plot_obj.generate()
   2325     plot_obj.draw()
   2326     return plot_obj.result

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in generate(self)
    925         self._make_legend()
    926         self._post_plot_logic()
--> 927         self._adorn_subplots()
    928 
    929     def _args_adjust(self):

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in _adorn_subplots(self)
   1058                     ax.set_xticklabels(xticklabels)
   1059                 self._apply_axis_properties(ax.xaxis, rot=self.rot,
-> 1060                                             fontsize=self.fontsize)
   1061                 self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize)
   1062             elif self.orientation == 'horizontal':

/Users/shoyer/dev/pandas/pandas/tools/plotting.pyc in _apply_axis_properties(self, axis, rot, fontsize)
   1069 
   1070     def _apply_axis_properties(self, axis, rot=None, fontsize=None):
-> 1071         labels = axis.get_majorticklabels() + axis.get_minorticklabels()
   1072         for label in labels:
   1073             if rot is not None:

/Users/shoyer/miniconda/envs/rapid/lib/python2.7/site-packages/matplotlib/axis.pyc in get_majorticklabels(self)
   1166     def get_majorticklabels(self):
   1167         'Return a list of Text instances for the major ticklabels'
-> 1168         ticks = self.get_major_ticks()
   1169         labels1 = [tick.label1 for tick in ticks if tick.label1On]
   1170         labels2 = [tick.label2 for tick in ticks if tick.label2On]

/Users/shoyer/miniconda/envs/rapid/lib/python2.7/site-packages/matplotlib/axis.pyc in get_major_ticks(self, numticks)
   1295         'get the tick instances; grow as necessary'
   1296         if numticks is None:
-> 1297             numticks = len(self.get_major_locator()())
   1298         if len(self.majorTicks) < numticks:
   1299             # update the new tick label properties from the old

/Users/shoyer/dev/pandas/pandas/tseries/converter.pyc in __call__(self)
    901             vmin, vmax = vmax, vmin
    902         if self.isdynamic:
--> 903             locs = self._get_default_locs(vmin, vmax)
    904         else:  # pragma: no cover
    905             base = self.base

/Users/shoyer/dev/pandas/pandas/tseries/converter.pyc in _get_default_locs(self, vmin, vmax)
    882 
    883         if self.plot_obj.date_axis_info is None:
--> 884             self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq)
    885 
    886         locator = self.plot_obj.date_axis_info

/Users/shoyer/dev/pandas/pandas/tseries/converter.pyc in _daily_finder(vmin, vmax, freq)
    505                     Period(ordinal=int(vmax), freq=freq))
    506     span = vmax.ordinal - vmin.ordinal + 1
--> 507     dates_ = PeriodIndex(start=vmin, end=vmax, freq=freq)
    508     # Initialize the output
    509     info = np.zeros(span,

/Users/shoyer/dev/pandas/pandas/tseries/period.pyc in __new__(cls, data, ordinal, freq, start, end, periods, copy, name, tz, **kwargs)
    637             else:
    638                 data, freq = cls._generate_range(start, end, periods,
--> 639                                                  freq, kwargs)
    640         else:
    641             ordinal, freq = cls._from_arraylike(data, freq, tz)

/Users/shoyer/dev/pandas/pandas/tseries/period.pyc in _generate_range(cls, start, end, periods, freq, fields)
    651                 raise ValueError('Can either instantiate from fields '
    652                                  'or endpoints, but not both')
--> 653             subarr, freq = _get_ordinal_range(start, end, periods, freq)
    654         elif field_count > 0:
    655             subarr, freq = _range_from_fields(freq=freq, **fields)

/Users/shoyer/dev/pandas/pandas/tseries/period.pyc in _get_ordinal_range(start, end, periods, freq)
   1317                              dtype=np.int64)
   1318     else:
-> 1319         data = np.arange(start.ordinal, end.ordinal + 1, dtype=np.int64)
   1320 
   1321     return data, freq

MemoryError: 

> /Users/shoyer/dev/pandas/pandas/tseries/period.py(1319)_get_ordinal_range()
   1318     else:
-> 1319         data = np.arange(start.ordinal, end.ordinal + 1, dtype=np.int64)
   1320 

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@TomAugspurger
Contributor

Working on this. Doesn't look too bad.

@TomAugspurger
Contributor

As an update, it's a bit worse than I thought. I think it was @changhiskhan who put in a ton of heuristics for figuring out what resolution to draw when plotting datetimes. I wasn't sure if we'd need that for timedeltas, and then I got busy with other things. My branch is here
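To give a sense of what a resolution heuristic involves, here is a deliberately simple one, assuming the axis is in seconds: pick the smallest step from a list of "nice" timedelta intervals that keeps the number of tick intervals at or below a limit. The candidate list, the `max_ticks` default and the name `choose_tick_step` are all hypothetical; this is not the logic in pandas' datetime converter.

```python
import math

# Candidate tick spacings in seconds, from 1 second up to 1 week (illustrative).
NICE_STEPS = [
    1, 2, 5, 10, 15, 30,              # seconds
    60, 120, 300, 600, 900, 1800,     # minutes
    3600, 7200, 10800, 21600, 43200,  # hours
    86400, 172800, 604800,            # days / week
]


def choose_tick_step(span_seconds, max_ticks=8):
    """Return the smallest nice step giving at most max_ticks intervals."""
    for step in NICE_STEPS:
        if math.ceil(span_seconds / step) <= max_ticks:
            return step
    # Fall back to a whole number of weeks for very long spans.
    return 604800 * math.ceil(span_seconds / (604800 * max_ticks))


print(choose_tick_step(3 * 60))      # 30 (3 minutes -> 30 s ticks)
print(choose_tick_step(14 * 86400))  # 172800 (2-day ticks)
```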

@JonasAbernot
Contributor

As a workaround, the following works with master:

import matplotlib.pyplot as plt
plt.plot(s.index, s.values)

@sinhrks
Member

sinhrks commented Jul 22, 2015

I don't think frequency adjustment for different timedeltas is mandatory for an initial version. If that's OK, I'll give it a try.

@lucasb-eyer

Coming here from #10650, and adding a little more info just in case it can help. In my case, the bug manifests in _get_ordinal_range's end parameter having a huge ordinal. This means the following line:

data = np.arange(start.ordinal, end.ordinal + 1, mult, dtype=np.int64)

allocates a gigantic array. To be specific, when doing:

pd.Series(np.random.randn(4), index=pd.timedelta_range('0:00:00', periods=4, freq='min')).plot()

the values of start.ordinal and end.ordinal are 0 and 180000000000, respectively.
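These numbers fit a unit mismatch. The last point of that index is 3 minutes, which is exactly 180,000,000,000 nanoseconds. If that nanosecond value is then read as a count of minute periods, the `np.arange` call has to materialize about 1.8e11 int64 values. The arithmetic, assuming 8 bytes per int64 element:

```python
NS_PER_SECOND = 1_000_000_000

# Last point of timedelta_range('0:00:00', periods=4, freq='min') is 3 minutes.
last_delta_ns = 3 * 60 * NS_PER_SECOND
print(last_delta_ns)  # 180000000000

# np.arange(start, end + 1) with start=0 holds end + 1 elements.
n_elements = last_delta_ns - 0 + 1
bytes_needed = n_elements * 8  # int64 is 8 bytes
print(f"{bytes_needed / 1e12:.2f} TB")  # 1.44 TB
```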

@shoyer
Copy link
Member

shoyer commented Nov 12, 2015

@lucasb-eyer is the mult parameter on that line appropriate, or is it some very small number? That might be the source of the issue...

@lucasb-eyer
Copy link

I don't know what appropriate would be, but it's 1 (one).

Edit: pip freeze | grep pandas gives pandas==0.17.0.

@Liam3851
Contributor

I also just ran into this issue on 0.17.1. I'm not very familiar with the code, but it appears the issue is in pandas.tseries.converter.

The issue is that vmin and vmax as specified in the call to _get_default_locs in the get_major_locator function are in nanoseconds as returned from XAxis.get_view_interval:

def __call__(self):
    'Return the locations of the ticks.'
    # axis calls Locator.set_axis inside set_m<xxxx>_formatter
    vi = tuple(self.axis.get_view_interval())             # THIS IS IN NANOS
    if vi != self.plot_obj.view_interval:
        self.plot_obj.date_axis_info = None
    self.plot_obj.view_interval = vi
    vmin, vmax = vi
    if vmax < vmin:
        vmin, vmax = vmax, vmin
    if self.isdynamic:
        locs = self._get_default_locs(vmin, vmax)     # VMIN AND VMAX ARE IN NANOS
    else:  # pragma: no cover
        base = self.base
        (d, m) = divmod(vmin, base)
        vmin = (d + 1) * base
        locs = lrange(vmin, vmax + 1, base)
    return locs

But downstream in _daily_finder the freq parameter is used, which means that the system is interpreting the deltas in terms of minutes/hours/etc. rather than nanos:

def _daily_finder(vmin, vmax, freq):
    periodsperday = -1

    if freq >= FreqGroup.FR_HR:
        if freq == FreqGroup.FR_NS:
            periodsperday = 24 * 60 * 60 * 1000000000
        # ETC MAPPING periodsperday
        # .....
    # save this for later usage
    vmin_orig = vmin

    (vmin, vmax) = (Period(ordinal=int(vmin), freq=freq),    # NOW THESE ARE INTERPRETED AS MINUTES (or whatever freq)
                    Period(ordinal=int(vmax), freq=freq))

Replacing the final line above with

 (vmin, vmax) = (Period(ordinal=int(vmin), freq='N'), Period(ordinal=int(vmax), freq='N'))

appears to fix the issue.
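The change above reads vmin and vmax as nanosecond ordinals. Another option one could consider (not what is proposed in this thread) is to rescale the nanosecond view limits into ordinals of the axis frequency before building periods, which keeps the range small. A sketch under that assumption; the `NS_PER_PERIOD` table and the `ns_to_ordinal` helper are hypothetical, and the frequency strings are only used as dictionary keys.

```python
# Nanoseconds per period for a few frequencies (keys are illustrative labels).
NS_PER_PERIOD = {
    "N": 1,
    "S": 1_000_000_000,
    "min": 60 * 1_000_000_000,
    "H": 3_600 * 1_000_000_000,
    "D": 86_400 * 1_000_000_000,
}


def ns_to_ordinal(value_ns, freq):
    """Hypothetical helper: map a nanosecond axis value to a period ordinal."""
    return int(value_ns // NS_PER_PERIOD[freq])


vmin_ns, vmax_ns = 0, 180_000_000_000  # view interval reported in nanoseconds
span = ns_to_ordinal(vmax_ns, "min") - ns_to_ordinal(vmin_ns, "min") + 1
print(span)  # 4 periods instead of 180000000001
```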

@shoyer
Member

shoyer commented Jan 13, 2016

@Liam3851 glad you have tracked this down! Any chance you're interested in making a pull request with the fix? :)

@Liam3851
Copy link
Contributor

Sure, I just have to figure out how to do it lol. Longtime pandas user but kinda new on this github thingy. I'll head over to the FAQ.

@shoyer
Member

shoyer commented Jan 13, 2016

Great! Give it a try and let us know if you have any questions :).


@andreabedini
Contributor

Lots of love from me too @Liam3851!

@Liam3851
Copy link
Contributor

Hmm, OK, it's still slightly more complicated. I was testing the fix, and the bounds are now right and the graphs themselves look correct, but the axis labels don't always work properly (sometimes they disappear), probably something related to how the labels are interpreted. I'm busy these next few days, but I'll try to get around to making the fix sound.

@TomAugspurger
Copy link
Contributor

Just guessing, but you could be hitting what I ran into. I can't remember how much progress, if any, I made on that.

@Liam3851
Copy link
Contributor

@TomAugspurger Hmm.. I'll try your version to see what it does. From the diff it looks like we're taking slightly different paths. It looks like you were building a TimedeltaConverter that worked parallel to DatetimeConverter and TimeConverter; I've been trying to fix the codepath the timedeltas are currently taking (through DatetimeConverter). But it's entirely possible that getting it to look just right will require going down your path.

@TomAugspurger
Copy link
Contributor

I’d say getting it somewhat functional is good enough for now. Hopefully you don’t have to go down that rabbit hole.


@sam-cohan
Contributor

Hello. I am using pandas version 0.19.0 and matplotlib version 1.5.3 with Python 3, and this issue is still there: if I try to plot a DataFrame whose index is a timedelta, I get a MemoryError. I am working around this by calling plt.plot(df.index, df.values), but it would be nice if there were a proper fix for this...

@jorisvandenbossche
Copy link
Member

@sam-cohan As you can see, the issue is still open, so it's indeed not yet solved. But any help is certainly welcome!

@sam-cohan
Copy link
Contributor

Sorry I was looking at the wrong "Closed" :)

@jgoppert
Copy link
Contributor

jgoppert commented Jan 4, 2017

Really wish this was fixed. I'm using datetime as a workaround, but stringing along 1970-01-01 to do time deltas is not fun.
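That workaround amounts to adding each duration to a fixed reference datetime so that datetime-aware plotting can be used. A standard-library sketch; the 1970-01-01 anchor follows the comment, and `shift_to_epoch` is a hypothetical name. The drawback is the one described: tick labels then show dates rather than durations.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def shift_to_epoch(deltas):
    """Turn durations into datetimes by anchoring them at 1970-01-01."""
    return [EPOCH + d for d in deltas]


points = shift_to_epoch([datetime.timedelta(hours=h) for h in (0, 6, 30)])
print([p.isoformat() for p in points])
# ['1970-01-01T00:00:00', '1970-01-01T06:00:00', '1970-01-02T06:00:00']
```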

@jorisvandenbossche
Copy link
Member

@TomAugspurger does your branch with a first attempt still exist? (the link above is not working anymore)

@jgoppert
Copy link
Contributor

jgoppert commented Jan 5, 2017

So the issue here is that TimedeltaIndex uses Int64Index as a base class, but we are trying to use the plotting routines for PeriodIndex, which rely on DatetimeIndex (matplotlib.dates) underneath. matplotlib.dates scales the view interval to the selected frequency; Int64Index does not, which explains the issues above.

Options:

  1. Rebase TimedeltaIndex on DatetimeIndex.
  2. Write another routine to plot time deltas, like this: http://stackoverflow.com/questions/15240003/matplotlib-intelligent-axis-labels-for-timedelta. I think this is the easiest path forward, but I need help figuring out how to hook it in. With the time series mix-ins for plotting, I'd have to override the plotting routines based on the type of index somewhere.
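Option 2 essentially comes down to a custom tick formatter. A minimal sketch, assuming tick positions arrive as nanosecond counts: split the value into days, hours, minutes, seconds and a fractional part. The name `format_timedelta_ns` and its `n_decimals` parameter are hypothetical; in matplotlib such a function could be wrapped in `matplotlib.ticker.FuncFormatter` (whose callback also receives a tick position argument), which is left out to keep the snippet dependency-free.

```python
def format_timedelta_ns(x, n_decimals=0):
    """Format a nanosecond count as '[D days ]HH:MM:SS[.fff]'.

    Hypothetical helper; negative values are not handled in this sketch.
    """
    ns_total = int(round(x))
    seconds, ns = divmod(ns_total, 1_000_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    label = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    if n_decimals > 0:
        # Keep only the leading n_decimals digits of the 9-digit nanosecond part.
        frac = ns // 10 ** (9 - n_decimals)
        label += f".{frac:0{n_decimals}d}"
    if days:
        label = f"{days} days {label}"
    return label


print(format_timedelta_ns(90 * 60 * 1_000_000_000))                # 01:30:00
print(format_timedelta_ns(86_400_000_000_000 + 1_500_000_000, 3))  # 1 days 00:00:01.500
```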

@jorisvandenbossche
Copy link
Member

@jgoppert you should take a look at pandas/tseries/converter.py and the TimeConverter and DatetimeConverter classes. A possible way forward is to make a new TimedeltaConverter similar to those.

@jgoppert
Copy link
Contributor

jgoppert commented Jan 5, 2017

@jorisvandenbossche I did consider that approach, but I think having a separate matplotlib plotting function is cleaner and will require less maintenance. We also won't have to worry about ever seeing Jan 1970 on the time delta plot, as we do on the PeriodIndex-based plots now. It seems pretty robust, and I have added nanosecond-level precision labels.
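Nanosecond-precision labels raise the question of how many fractional digits to show. One simple rule, assumed here for illustration rather than taken from the PR: use the fewest decimals (0 to 9) for which the tick spacing, in nanoseconds, is a whole multiple of the last displayed digit. `decimals_for_step` is a hypothetical name.

```python
def decimals_for_step(step_ns):
    """Fewest fractional-second digits (0-9) that represent step_ns exactly."""
    step_ns = int(step_ns)
    for n in range(10):
        # With n decimals the smallest visible unit is 10**(9 - n) nanoseconds;
        # n = 9 (unit of 1 ns) always divides an integer step.
        if step_ns % 10 ** (9 - n) == 0:
            return n


print(decimals_for_step(30 * 1_000_000_000))  # 0 (30 s ticks)
print(decimals_for_step(250_000_000))         # 2 (0.25 s ticks)
print(decimals_for_step(1_500))               # 7 (1.5 microsecond ticks)
```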

@TomAugspurger
Copy link
Contributor

> @TomAugspurger does your branch with a first attempt still exist? (the link above is not working anymore)

Seems like I deleted that branch when I was cleaning up my fork. I didn't get far beyond the TimedeltaConverter, which is pretty straightforward. IIRC the difficult part was getting the dynamic relabeling to work like it does for datetimes (which can be a separate fix from the memory error).

@jgoppert
Copy link
Contributor

jgoppert commented Jan 5, 2017

@TomAugspurger can you take a look at my PR? It's a totally different approach, but it seems to work for me.

@jreback jreback modified the milestones: 0.20.0, Next Major Release Feb 22, 2017
@sam-cohan
Contributor

Does this mean the fix for this will be in next release? If so, what is the timeline for that? Thanks in advance.

@jreback
Copy link
Contributor Author

jreback commented Feb 22, 2017

@sam-cohan yes it will be in 0.20.0

I think we are still about 1 month away from an rc.

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
Author: James Goppert <[email protected]>

Closes pandas-dev#8711
Closes pandas-dev#15067 from jgoppert/tdi_plot_fix and squashes the following commits:

945ec14 [James Goppert] Merge branch 'master' into tdi_plot_fix
7db61ec [James Goppert] Create TimeSeries_TimedeltaFormatter.
232efe6 [James Goppert] Fix comment format and exception type for tdi plotting.
4eff697 [James Goppert] Add more time delta series plotting tests.
f5f32bc [James Goppert] Link time delta index docs to better matplotlib docs.
d588c2c [James Goppert] Fixes test for tdi w/o autofmt_xdate.
b6e6a81 [James Goppert] Disables autofmt_xdate testing.
c7851e3 [James Goppert] Adjusts tdi test draw calls to try to fix CI issue.
7d28842 [James Goppert] Switch to draw_idle to try to fix bug on xticks update.
3abc310 [James Goppert] Try plt.draw() instead of canvas.draw() to fix issue on osx 3.5.
91954bd [James Goppert] Finished unit test for timedelta plotting.
41ebc85 [James Goppert] Fixes for review comments from pandas-dev#15067.
f021cbd [James Goppert] Support nano-second level precision x-axis labels.
5ec65fa [James Goppert] Plot fix for tdi and added more comments.
b967d24 [James Goppert] flake8 fixes for tdi plotting.
efe5636 [James Goppert] Adds custom plot formatting for TimedeltaIndex.
@billtubbs

I am having a similar issue today using version 2.1.2.

See this Stack Overflow post, where I include a reproducible example:

This bug is quite hard to track down and dangerous because it may or may not occur depending on particular values in the data to be plotted.

Is this the same issue or should I raise a new one?
