-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix to_datetime to properly deal with tz offsets #3944 #5958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm going to stop pretending I haven't seen this PR for just long enough to give you a glimpse Convince me that this can't possibly break anything for anyone and that the change is what Some of those variations seem like they stretch iso8601 to the breaking point, I may be wrong. |
Perhaps I should have written the test in a more readable way to highlight the bug. I was just writing it for the future to get all the test cases in a very concise manner and try to flex various ways the datetime strings are parsed. The current situation is just wrong:
These timestamp strings
This is certainly what the code should be doing. This could break somebodies code who depends on this bug, but on the plus side: a) All of the existing tests pass |
The timestring is not a valid iso8061 string, so the behavior might be said to be undefined. Can you show an example where another code path in pandas parses this differently from I'm half convinced this can go in to 0.14.0, @jreback? |
Can you do a test_perf.sh run to check for possible perf impact? |
I agree that this is a bug (the very fact that we have different results for this one case). For the most part this path is not hit very often AFAICT (e.g. you need to have no specified format and its not parsed by the ISO8601 c-parser), so other that our tests I wouldn't see how this is even hit. @danbirken can you you contrive a case where its actually hit either in (and not by directly calling |
I guess this proves the point (but this is way esoteric, who specifies dtypes when constructing Series!)
|
So the original reason I saw this bug was the code path in
Here is a case with
Old (timezone offset is ignored):
Fixed (timezone offset used):
I'd actually imagine this gets triggered quite a bit, since people use pandas to import log data a lot (which will have timestamps in weird formats, probably including a tz offset). Running the ./test_perf.sh stuff now... |
I'm sold on the first example. The rest are just cases where a strange timestamp doesn't start of 0.14, if no objections @jreback ? |
technically this is a bug fix but I agree that people may actually be relying in this |
@danbirken this does not close #3844 correct? |
Currently for certain formats of datetime strings, the tz offset will just be ignored.
Well I can't get ./test_perf.sh to cooperate, but I ran some ad-hoc tests and it appears to me to be within 1% of whatever the previous performance was. This changes from one fully cython-ed function to another, so it makes sense that it still would be quite fast. I admit I don't fully understand what is going on in that particular issue (#3844), so I really have no idea if this is related. FYI: I just made a small update because I just saw that |
Oh I see, you probably mean #3944. I think that issue should be closed as "will not fix", and this is really a different bug (though it is related). I can make another GH issue for this change if you would like for tracking purposes. |
no that's fine |
@danbirken, please open an issue on test_perf if there is one and cc me. |
banzai |
BUG: Fix to_datetime to properly deal with tz offsets #3944
Currently for certain formats of datetime strings, the tz offset will
just be ignored.
#3944