-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: to_datetime with box=False returning DatetimeIndex with mixed naive and aware datetime.datetimes #23722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'm pretty sure this ends up going through the datetime_to_datetime64 branch of to_datetime. The fix is going to involve merging that into array_to_datetime and handling the tzinfo checks more carefully (like you already did for the string cases). Did the issue with hashability of dateutil timezones ever get resolved? I think we're going to need to use actual BTW note that passing the list in the reversed order raises correctly. |
It appears there's some caution making dateutil timezones hashable: dateutil/dateutil#792 (comment) Agreed this probably involves just merging datetime_to_datetime64 into array_to_datetime. |
If hashing is still a problem then sets are out; could we map tzinfo objects to unique strings and use a dict? Actually we might even use the same dict that tslibs.timezones uses |
That could work. We could have a function that does the reverse of |
Agreed that the boxing into an index is wrong, but the fact that it is actually parsed into datetime64, is that necessarily wrong? Or if we don't want to parse such a mixture of tz-aware/tz-naive, shouldn't the default be to raise an error? |
I guess in essence this appears as if My impression was to always try preserve mixed types of arguments into a "lowest common datetime format"; therefore allowing cases like above to come in as an object array of |
The raise/not behavior should be determined by the |
The expected behaviour is indeed not very well defined (at least not in the docstring). But there is the Eg with out of range timestamps, by default it raises, but with 'ignore' it gives you back datetime.datetime objects:
And as @jbrockmendel noted above, if you switch the order to put a tz-aware first, it also raises:
Although here the So there are clearly some inconsistencies that would be nice to solve. |
And some more inconsistencies: The equivalent with strings (one tz aware, one not), "works" as well (so this is similar as the case with datetime.datetime):
But now switching them in order returns datetime.datetime objects:
|
Well my impression is that
So I think in the case of the original example input But I agree your last 2 examples should be made consistent regardless of order of arguments. |
There's certainly some logic in there, but then from a user's standpoint: what's the point of returning those object dtype values (same as input)? I cannot do anything with it, I exactly passed them to Or differently put: if not with |
To answer myself: by passing |
So if this is the minimum assumption (which we should document) for I have a slight suspicion some internals may depend on retaining mixed timezone or mixed tz-aware/tz-naive values (e.g. I just changed this for parsing utc offsets: https://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.24.0.html#parsing-datetime-strings-with-timezone-offsets) |
There are a number of overlapping issues involved here. The OutOfBoundsDatetime behavior is a particularly difficult one. To the extent possible, I think we should consider that separately for the time being. boxing is currently done at several places inside |
+1 The boxing can indeed be discussed separately, but I think the original issue was not only about boxing, but also the question about should it return datetime64 data or object data (or raise). In any case, if you have a mixture of tz-aware and tz-naive values, the behaviour should not depend on the order and which of the two happened to be the first value .. |
I am not sure it will be best in all cases or what we would break when trying to do this, but I think it is at least a "clear" explanation: the function returns datetime64[ns] data (boxed or not), or raises that it cannot do that (by default, the keywords can then still influence it) |
I just made a GH Project for to_datetime to collect all of these issues/discussions |
Since |
xref #23702 (comment)
The current output also is incorrect if
box=True
I believe since the first argument is getting coerced to UTC.Expected Result
The text was updated successfully, but these errors were encountered: