Skip to content

ENH: Timedelta isoformat #15136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

TomAugspurger
Copy link
Contributor

Split from #14904

A few questions before I finish all the docs / tests:

The spec seems very permissive about what's allowed, so we need to choose

  • Do we include components pandas can't represent like Years and Months with 0 (P0Y0M...) (no)
  • Do we include components whose value is 0, e.g. P0DT... vs. PT... (yes)
  • Do we 0 pad units, e.g. ...T5H vs T05H (no)
  • Do we strip trailing 0s in the seconds decimal: ...6.000000000S vs. 6S (yes)

We probably want a reader for this format as well, maybe in the Timedelta constructor. Not sure if I'll have time to get to that before 0.20 . I'd rather prioritize the JSON table schema PR.

@jreback
Copy link
Contributor

jreback commented Jan 15, 2017

fyi the parsing issue is here: #11375

"""
Format Timedelta as ISO 8601 Duration like
`P[n]Y[n]M[n]DT[n]H[n]M[n]S`, where the `[n]`s are replaced by the
values for that duration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add the link to the 8601 format ?

`ISO8601 duration`_

.. _ISO601 duration: https://en.wikipedia.org/wiki/ISO_8601

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this link in the doc string as well

@jreback jreback added Enhancement Output-Formatting __repr__ of pandas objects, to_string Timedelta Timedelta data type labels Jan 15, 2017
@jreback jreback added this to the 0.20.0 milestone Jan 15, 2017
@jreback
Copy link
Contributor

jreback commented Jan 15, 2017

lgtm

minor did comment

result = td.isoformat()
expected = 'P0DT0H0M0.000000123S'
self.assertEqual(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test for NaT (which i think should just be nan), similar to what we do for Timestamp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. How about NaT to match Timestamp?

In [24]: pd.Timestamp('NaT').isoformat()
Out[24]: 'NaT'

In [25]: pd.Timedelta('NaT').isoformat()
Out[25]: 'NaT'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that's fine

@TomAugspurger
Copy link
Contributor Author

Thanks for the review. Added the NaT test and fixed up some doc issues. I also added a small example in timedeltas.rst, under the Attributes section. There wasn't really an appropriate place for it in api.rst that I saw.

@codecov-io
Copy link

codecov-io commented Jan 15, 2017

Current coverage is 86.23% (diff: 100%)

Merging #15136 into master will increase coverage by 0.68%

@@             master     #15136   diff @@
==========================================
  Files           145        145           
  Lines         51352      54034   +2682   
  Methods           0          0           
  Messages          0          0           
  Branches          0          0           
==========================================
+ Hits          43932      46598   +2666   
- Misses         7420       7436     +16   
  Partials          0          0           

Powered by Codecov. Last update 362e78d...9fda713

@jreback
Copy link
Contributor

jreback commented Jan 15, 2017

lgtm. merge on pass.

@jorisvandenbossche
Copy link
Member

Do we need to note that a Timedelta is not actually a 'duration' as defined by that standard? (as a timedelta specified absolute time difference and not a relative)

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jan 15, 2017

@jorisvandenbossche does a Timedelta actual violate the definition? The way I'm reading it, a duration can mean either absolute or relative difference.
Something like P1M, for 1 month, is going to be relative and we can't represent that as a Timedelta. But the duration from '2016-01-01' to '2016-02-01' can be represented as a timedelta.

@pwalsh does that sound right to you?

@jorisvandenbossche
Copy link
Member

Yeah, it is not really clear from the wiki page. It depends on how the 'days' are interpreted, but from "But keep in mind that "PT36H" is not the same as "P1DT12H" when switching from or to Daylight saving time." I would conclude that the days are also relative and not absolute (not 1D == 24H), and I think for a timedelta this is absolute?

@TomAugspurger
Copy link
Contributor Author

@jorisvandenbossche rereading the wiki page again today. I missed this section earlier

They should only be used as part of a time interval as prescribed by the standard. Time intervals are discussed in the next section
[ ...]
A time interval is the intervening time between two time points.

How do you read that? I'm going back and forth on it. I think that means we're OK, as python uses time deltas to represent the time between two points, and the spec allows

Duration only, such as "P1Y2M10DT2H30M", with additional context information

As one valid representation. I am concerned that other libraries don't implement an isoformat on time deltas.

@jreback
Copy link
Contributor

jreback commented Jan 17, 2017

Timedelta is de-facto a duration, so I think the above definition is fine.

@jorisvandenbossche
Copy link
Member

Timedelta is de-facto a duration,

Not in the sense that it is defined on the wiki (or at least the possible notations for a duration). For example, "1 month" is not representable by a Timedelta in general.
But as Tom points out, they "should only be used as part of a time interval" according to the wiki, so which means that the duration is always interpreted based on a certain context. So in such a case, then also "1 month" is exactly defined in absolute time. Although the json table schema does not seem to follow that restriction of only using it as part of a time interval.

But anyway, this is probably too detailed discussion for this use case as long as we only specify days and below.

@jreback
Copy link
Contributor

jreback commented Jan 18, 2017

@TomAugspurger my only remaining question is this in the JSON standard anywhere / common practice? (or is there anything for describing how to store durations / intervals)?

For datetimes we have ISO and as epochs, is there such a thing for durations? (if so maybe we should have a kw arg timedelta_unit?, though could be added later).

@TomAugspurger
Copy link
Contributor Author

  • This JS library converts absolute durations, like timedeltas, to iso8601 duration format.

  • moment.js is quite clear that its durations are relative, not absolute

Durations do not have a defined beginning and end date. They are contextless.

A duration is conceptually more similar to '2 hours' than to 'between 2 and 4 pm today'. As such, they are not a good solution to converting between units that depend on context.

They do convert to ISO8601

  • this python package seems to read and write between iso8601 durations and time deltas.

@jreback
Copy link
Contributor

jreback commented Jan 19, 2017

ok @TomAugspurger then I think your .isoformat() is just fine, its a duration, and represents exactly what Timedelta is. (whether releative or absolute is context dependent, which is fine).

@jreback
Copy link
Contributor

jreback commented Jan 19, 2017

merge on pass.

@jreback jreback closed this in 6c11b91 Jan 20, 2017
@jreback
Copy link
Contributor

jreback commented Jan 20, 2017

thanks @TomAugspurger

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
Author: Tom Augspurger <[email protected]>

Closes pandas-dev#15136 from TomAugspurger/timedelta-isoformat and squashes the following commits:

9fda713 [Tom Augspurger] ENH: Timedelta isoformat
@TomAugspurger TomAugspurger deleted the timedelta-isoformat branch April 5, 2017 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Output-Formatting __repr__ of pandas objects, to_string Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants