Skip to content

API: restore timedelta compability for Timedelta components? #9185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
shoyer opened this issue Jan 2, 2015 · 6 comments · Fixed by #9257
Closed

API: restore timedelta compability for Timedelta components? #9185

shoyer opened this issue Jan 2, 2015 · 6 comments · Fixed by #9257
Labels
API Design Timedelta Timedelta data type
Milestone

Comments

@shoyer
Copy link
Member

shoyer commented Jan 2, 2015

With the new Timedelta type (see #8184), we introduced behavior that breaks compatibility with the superclass timedelta: td.seconds refers to seconds since the last minute, rather than seconds since the last day (as in the superclass).

I think breaking compatibility with the superclass here was a mistake. User code clearly does rely on the old behavior, e.g., as manifested in bugs for exporting datetime columns to Excel: #9139. Moreover, it's likely to remain an issue causing silent bugs, because external libraries are going to continue to use isinstance checks to verify that something is a datetime/timedelta object. We chose to subclass timedelta here, so we really are stuck with full API compatibility.

So I think we should make this change in 0.16. It's a breaking change, but 0.15 was already a breaking change in this respect, and in any case I think we should give ourselves some allowance for API changes for new types.

CC @jorisvandenbossche @jreback

@shoyer shoyer added API Design Timedelta Timedelta data type labels Jan 2, 2015
@jreback
Copy link
Contributor

jreback commented Jan 2, 2015

I disagree entirely here. The actual impl of the datetime.timedelta is really odd and its interface is pretty useless.

The new Timedelta is more much useful. Practicality wins here.

The 'bug' you mentioned is really that the implementers of those libraries are using an implementation detail IMHO. And it wasn't tested.

-1 on this

@shoyer
Copy link
Member Author

shoyer commented Jan 2, 2015

I agree that the new Timedelta is more useful, but pandas exists in the broader Python ecosystem. For better or for worse, we are stuck with the standard library timedelta and its API. I don't think think fair to call this an implementation detail when it is [https://docs.python.org/2/library/datetime.html#timedelta-objects](called out very clearly) in the documentation.

The alternative approach is to insist that libraries be aware of our API breakage by not subclassing from timedelta.

@jreback
Copy link
Contributor

jreback commented Jan 2, 2015

@shoyer we discussed this at length. If you have a useful alternative then I am all ears.

We discussed making these attributes singular e.g. day, second and such. I would be ok with that route, thought IIRC that was MORE confusing.

I normally don't like breaking from standard implementations. But the timedelta in the standard library is bad and np.timedelta64 is almost worse. I think pd.Timedelta improves greatly on both. If it was possible to use a meta-class to NOT sublcass I think it would be done a long time ago. But alas its not.

@jmcnamara
Copy link
Contributor

The 'bug' you mentioned is really that the implementers of those libraries are using an implementation detail IMHO.

I don't think that is strictly true. XlsxWriter accepts python datetime objects and converts them to Excel dates. To do this it uses datetime.timedelta objects. It doesn't rely on an "implementation detail". It follows the datetime API and documentation. xlwt does something similar.

The issue in #9139, from my point of view, is that pandas.Timestamp identifies as a datetime object (via isinstance()) but has different behaviour for timedelta.

I'm not going to argue either way on that. I just wanted to clarify this in relation to XlsxWriter.

@jorisvandenbossche
Copy link
Member

I have to say that I rather agree with @jmcnamara and @shoyer.

By implementing pandas.Timedelta as a datetime.timedelta subclass (and so by causing eg isinstance checks for datetime.timedelta to work with pandas.Timedelta), we implicitly say that it is compatible with datetime.timedelta.
And datetime.timedelta has a well defined API, which we are breaking in the subclass.

In my opinion, we should:

  • or not claim it to be a subclass (at least ensure that isinstance checks don't work), only that it is another type but has a 'similar' interface (with some clearly noted differences, as this is now also noted in the docs)
  • or ensure it has full compatibility with its parent class

For that last one, @jreback it is true that is was discussed when this new feature was implemented. But it is always possible we later think that there was a possibly better option. And there were some options discussed as a way around this, and I think there is no blame in changing thought about that.
For example, there is the components attribute:

In [3]: td = pd.Timedelta('1 days 2 hours 4 seconds')

In [4]: td
Out[4]: Timedelta('1 days 02:00:04')

In [5]: td.seconds
Out[5]: 4L

In [6]: td.to_pytimedelta().seconds
Out[6]: 7204

In [10]: td.components.seconds 
Out[10]: 4L

So one option would be to only include the 'extra' attributes in the components attribute, and keep the only the days/seconds/microseconds on the Timedelta itself and have these compatible with datetime.timedelta.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2015

ok, seems everyone wants exact compatiblity, ok.fine :)

  • need to remove hours,minutes,nanoseconds from Timedelta,TimedeltaIndex and the .dt accessor (for TimedeltaIndex). to avoid confusion.
  • change meaning of seconds and microseconds to mean seconds_remainder and microseconds_remainder respectively.
  • .components accessor then becomes the idomatic way to retrieve these components (and is already existing), so no loss in info.
  • steal tests from BUG: Fix for Timestamp handling in xlwt and xlsxwriter engines. #9175 but should work w/o after this fix

Not a big deal to change, but would need a small doc section.

If anyone wants to give a whirl would be appreciated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Timedelta Timedelta data type
Projects
None yet
4 participants