-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Parsing week-freq Periods #50803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Not too familiar with datetime stuff in general, but from reading the code it seems like week-freq Periods have a fixed length? Then maybe we could check if the middle is a "/", and then try to send it down string_to_dts twice(by inserting a null character to terminate the c-str where the slash is, and then calling again passing in the char pointer after)? |
I'm not too concerned about perf of this particular case (though it can certainly be optimized), more about the fact that pre-checking for it with re.search will mean a perf hit for all other cases. Current thought is to wait for #50790, then re-arrange this to do this check as a fallback rather than upfront. |
Does this conflict with #50149? If not, we might also want to get that one in first, since it touches the same lines of code. |
This can definitely wait until after #50149 and ill rebase then. |
f788b9e
to
8b6cb59
Compare
Updated to sit on top of #50790 and do the week-checking case as a fallback instead of up-front |
8b6cb59
to
a34c9bc
Compare
@MarcoGorelli gentle ping on this and #50914, not time-sensitive but blockery for further work in this area |
sure, taking a look today, thanks for the ping (in general, please do feel free to ping / request reviews from me when necessary, with all the activity in this repo it's easy to miss things otherwise) |
""" | ||
# GH#50803 | ||
start, end = value.split("/") | ||
start = Timestamp(start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to use Timestamp here?
IIUC from the last time I was in Period code, Timestamp sends strings down parse_datetime_string_with_reso
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use datetime.strptime. either way is fine. Timestamp doesn't yet use parse_datetime_string_with_reso, but getting there is one of the goals this is aiming at. even when it does, that won't affect behavior here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Looks good to me
Thanks @jbrockmendel |
Could this have caused regressions in read_json? |
It doesn't look like that benchmark does any period parsing, so it seems unlikely. |
@lithomas1 IIRC youve spent some time recently in optimizing this portion of the constructor. I'm open to ideas on optimize this check or do it as the fallback rather than up-front.
The reason it is currently up-front is bc ATM
parse_datetime_string_with_reso
gets this wrong, bc dateutil gets it wrong. Following #50790, we could move theif dt.tzinfo is not None:
check fromparse_datetime_string
intoparse_dateutil
, at which pointparse_datetime_string_with_reso
would raise instead of return something incorrect (at least for some cases, I think others might get through).