Datetime functionality #260

Closed
MarcoGorelli opened this issue Sep 19, 2023 · 0 comments · Fixed by #275
Labels: API design, timeseries (related to dates / datetimes / times / durations)

MarcoGorelli commented Sep 19, 2023

I like how pandas and polars have a .dt namespace for datetime functionality, and I'd suggest having that too. What do we put in it?

Looking at skrub, here's some datetime functionality we should add.

  • year
  • month
  • day
  • hour
  • minute
  • second
  • millisecond
  • microsecond
  • nanosecond
  • iso_weekday (Monday = 1, Sunday = 7)
  • timestamp (number of seconds since 1970-01-01 UTC)
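To make the list concrete, here is a minimal sketch of these components for a single value. It assumes Python's standard-library datetime purely for illustration (the real namespace would operate on whole columns), and the helper name datetime_components is hypothetical. millisecond and nanosecond are left out because the standard-library type only has a microsecond field.

```python
from datetime import datetime, timezone


def datetime_components(dt):
    """Return the listed components for one timezone-aware datetime.

    Hypothetical helper for illustration only; not part of any library.
    """
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
        "second": dt.second,
        "microsecond": dt.microsecond,   # microseconds since the last second
        "iso_weekday": dt.isoweekday(),  # Monday = 1, Sunday = 7
        "timestamp": dt.timestamp(),     # seconds since 1970-01-01 UTC
    }


example = datetime(2023, 9, 19, 12, 34, 56, 789012, tzinfo=timezone.utc)
components = datetime_components(example)
print(components["iso_weekday"])  # 2, since 2023-09-19 is a Tuesday
```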

All of the above should be fairly trivial. skrub also uses floor, but I think we should keep that one out initially, as it's really non-trivial in the timezone-aware case when there's DST. I'm pretty sure there's a way around it for what skrub is doing anyway.
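One way to see why floor gets awkward around DST is the sketch below. It uses fixed-offset timezones as hand-coded stand-ins for CEST and CET around the 2023-10-29 transition in Central Europe (an assumption, made to avoid depending on a timezone database), and floor_wall_time_to_hour is a hypothetical helper, not something skrub or the Standard defines. Two instants an hour apart floor to the same wall-clock time, so the result does not identify a single instant without an extra disambiguation rule.

```python
from datetime import datetime, timedelta, timezone

# Hand-coded stand-ins for the two offsets in effect around the DST change.
CEST = timezone(timedelta(hours=2), "CEST")  # summer time, UTC+02:00
CET = timezone(timedelta(hours=1), "CET")    # standard time, UTC+01:00


def floor_wall_time_to_hour(dt):
    """Floor the local wall-clock time to the hour, dropping the offset."""
    return dt.replace(minute=0, second=0, microsecond=0, tzinfo=None)


# 02:30 local time happens twice on this date: once before the clocks go
# back (00:30 UTC) and once after (01:30 UTC).
before = datetime(2023, 10, 29, 2, 30, tzinfo=CEST)
after = datetime(2023, 10, 29, 2, 30, tzinfo=CET)

floored_before = floor_wall_time_to_hour(before)
floored_after = floor_wall_time_to_hour(after)

# The floored wall time 02:00 could mean either of two instants.
candidates = [floored_before.replace(tzinfo=CEST),
              floored_before.replace(tzinfo=CET)]
```

Here floored_before and floored_after compare equal even though the inputs are one hour apart, and the two candidates are themselves one hour apart.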

There's some inconsistency in the definitions here:

  • pandas: microsecond returns the number of microseconds since the last second, but nanosecond returns the number of nanoseconds since the last microsecond. There is no millisecond.
  • polars (and chrono): nanosecond returns the number of nanoseconds since the last second, and likewise for millisecond and microsecond.

Of the sub-second components, I'd suggest we only include microsecond in the Standard, since that's the one whose definition they all agree on.
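The two conventions can be spelled out with plain integer arithmetic. This is a sketch of the definitions as described above, not code from either library; the function names pandas_style and polars_style are hypothetical, and the input is the number of nanoseconds elapsed since the last whole second.

```python
def pandas_style(ns_in_second):
    """Sub-second fields following the pandas convention described above."""
    return {
        "microsecond": ns_in_second // 1_000,  # since the last second
        "nanosecond": ns_in_second % 1_000,    # since the last microsecond
    }


def polars_style(ns_in_second):
    """Sub-second fields following the polars/chrono convention described above."""
    return {
        "millisecond": ns_in_second // 1_000_000,  # since the last second
        "microsecond": ns_in_second // 1_000,      # since the last second
        "nanosecond": ns_in_second,                # since the last second
    }


ns = 123_456_789
pandas_fields = pandas_style(ns)
polars_fields = polars_style(ns)

print(pandas_fields)  # {'microsecond': 123456, 'nanosecond': 789}
print(polars_fields)  # {'millisecond': 123, 'microsecond': 123456, 'nanosecond': 123456789}
```

Only microsecond comes out the same under both conventions; nanosecond differs (789 versus 123456789).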

Also, I'd suggest making all of these functions rather than properties, since property access suggests the value is free to read. I think pandas is really misleading here:

In [71]: ts = pd.date_range('1900-01-01', '2100-01-01', freq='1min')

In [72]: %time ts.nanosecond  # definitely not "free"
CPU times: user 2.1 s, sys: 26 ms, total: 2.13 s
Wall time: 2.13 s
Out[72]:
Index([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       ...
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
      dtype='int32', length=105190561)
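A rough sketch of what the function-based design could look like is below. Column and DatetimeNamespace are hypothetical stand-ins written for illustration, not the Standard's actual classes: the .dt accessor stays a cheap property that only wraps the data, while each component is a method, so the call syntax signals that a pass over the data happens.

```python
from datetime import datetime


class DatetimeNamespace:
    """Hypothetical stand-in for a column's .dt namespace."""

    def __init__(self, values):
        self._values = values  # keeps a reference; no per-element work here

    def year(self):
        # A method call makes it visible that this touches every element.
        return [v.year for v in self._values]

    def iso_weekday(self):
        # Monday = 1, Sunday = 7
        return [v.isoweekday() for v in self._values]


class Column:
    """Hypothetical minimal column holding datetime values."""

    def __init__(self, values):
        self._values = list(values)

    @property
    def dt(self):
        # Cheap accessor: just wraps the existing values.
        return DatetimeNamespace(self._values)


col = Column([datetime(2023, 9, 19), datetime(2024, 1, 1)])
years = col.dt.year()
weekdays = col.dt.iso_weekday()
```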