-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: Speed up Period construction #50149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/_libs/tslibs/period.pyx
Outdated
if dt is NaT: | ||
ordinal = NPY_NAT | ||
# Doesn't matter what this is, we just need to have it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of assigning a random value here would it make more sense to just not have it fail later on?
pandas/_libs/tslibs/parsing.pyx
Outdated
@@ -421,6 +421,10 @@ cdef parse_datetime_string_with_reso( | |||
parsed = datetime( | |||
dts.year, dts.month, dts.day, dts.hour, dts.min, dts.sec, dts.us | |||
) | |||
if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this impact the period construction? Feels a bit strange to re-assign this variable
pandas/_libs/tslibs/parsing.pyx
Outdated
if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns: | ||
# No picoseconds, so no nanoseconds. | ||
# Have to have seen microseconds, in order to have "seen" nanoseconds | ||
out_bestunit = NPY_DATETIMEUNIT.NPY_FR_us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think doing this will mess up Timestamp inference in a follow-up to #49737
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I only added this block because somehow CI was failing tests only on Linux(maybe a locale problem).
(Issue was that datetime object was returned, when reso was nanosecond, and datetimes don't have nanosecond support).
pandas/pandas/_libs/tslibs/parsing.pyx
Line 413 in 7ef6a71
if dts.ps != 0 or out_local: |
IMO, we should be checking out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns
. don't know if that will break anything, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've updated this block now to checking the bestunit.
@jbrockmendel @WillAyd Would this work fine?
pandas/_libs/tslibs/parsing.pyx
Outdated
@@ -386,7 +386,7 @@ cdef parse_datetime_string_with_reso( | |||
&out_tzoffset, False | |||
) | |||
if not string_to_dts_failed: | |||
if dts.ps != 0 or out_local: | |||
if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns or out_local: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns is handled here, should it be removed from the dict on L399-L409?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, both blocks go through the dict to get the string representation of the reso.
This is just to force the returned datetime object to be Timestamp, which has a nanosecond attribute.
are the improvements in parsing.pyx independent of the improvements in period.pyx? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm ex @jbrockmendel comments. Nice work!
needs rebase, otherwise LGTM |
Thanks for the reviews. This PR depends on #50417, though, so need someone to review/re-review there. |
ping on green |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.ASVs.
parse_time_string already calls string_to_dts(parse_iso_8601_datetime), which I think is the only function that can currently parse nanoseconds(dateutil can't do it IIRC).