BUG: Unexpected pd.to_datetime result #12583
Comments
I also suggest at least warning users about this behavior in the online documentation.
I agree with the warning suggestion. A common-sense assumption is that the date format within a single column is consistent. Currently, however, pandas infers the format of each datetime independently and silently, without checking whether the parsed dates share the same format. This is a pitfall for any user who does not carefully check the parsing results. A consistency check with a warning would be very helpful.
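The consistency check suggested above could look roughly like the following. This is only a sketch using the Python standard library, not pandas' own parser; the candidate format list and the function names `matching_formats` and `common_formats` are hypothetical and chosen for illustration.

```python
from datetime import datetime
import warnings

# Hypothetical candidate formats; a real column may need a different list.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d")


def matching_formats(value, formats=CANDIDATE_FORMATS):
    """Return the set of candidate formats that parse `value`."""
    matches = set()
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        matches.add(fmt)
    return matches


def common_formats(values, formats=CANDIDATE_FORMATS):
    """Return the formats that parse every value; warn if there are none."""
    common = set(formats)
    for value in values:
        common &= matching_formats(value, formats)
    if not common:
        warnings.warn(
            "no single candidate format parses every value", UserWarning
        )
    return common
```

For a column such as `["01/02/2016", "13/02/2016"]` only the day-first format survives, because the second value cannot be month-first. A column mixing `"13/02/2016"` and `"02/13/2016"` leaves no common format and triggers the warning.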
This is a known issue with See also #12501, #7348 and #3341 And we are indeed considering changing the default of Currently raising a warning is not possible as far as I know, as pandas just passes the strings to
Thanks for linking to the previous issues. I am thinking about some heuristics used by
There could maybe be improvements to I made an overview issue for this: #12585 BTW, if you want to make some clarifications to the docs for now, PRs also very welcome!
Proper documentation sounds like a good temporary solution. I will try to help if possible.
Closing in favor of the master issue #12585
Code Sample, a copy-pastable example if possible
Expected Output
_Reason: Expect a default behavior (without an extra `format` or `dayfirst` parameter) that implements consistent datetime parsing within the same column._
Comments
Within the internal helper function `pd.tseries.tools._to_datetime`, the datetime format is first inferred from the first non-NaN element, followed by parsing via `tslib.array_strptime` (Line 357), which gives the expected result.
But the fallback function `tslib.array_to_datetime` doesn't use the inferred format (Line 373).
_This is caused by the default value (`False`) of the parameter `infer_datetime_format` of `_to_datetime` (Line 285)._
_Suggestion: make the parameter `infer_datetime_format` of `_to_datetime` default to `True`. It won't solve all problems, but at least half of them, depending on the format of the first element._
output of `pd.show_versions()`
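To make the "infer from the first element, then parse everything with that format" idea from the Comments section concrete, here is a minimal standard-library sketch. It is not the pandas implementation: the candidate list, its ordering, and the names `infer_format` and `parse_consistently` are assumptions made for illustration only.

```python
from datetime import datetime

# Hypothetical candidates, tried in order; the first one that parses wins.
CANDIDATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d")


def infer_format(value, formats=CANDIDATE_FORMATS):
    """Return the first candidate format that parses `value`, else None."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_consistently(values, formats=CANDIDATE_FORMATS):
    """Infer a format from the first non-missing value and apply it to all.

    Missing values (None) stay None. A value that does not fit the inferred
    format raises ValueError instead of being silently reinterpreted.
    """
    first = next((v for v in values if v is not None), None)
    if first is None:
        return [None for _ in values]
    fmt = infer_format(first, formats)
    if fmt is None:
        raise ValueError(f"no candidate format matches {first!r}")
    return [None if v is None else datetime.strptime(v, fmt) for v in values]
```

As the suggestion notes, the result depends on the first element. With `["01/02/2016", "13/02/2016"]` the ambiguous first value is read as month-first under this candidate order, so the second value raises a `ValueError`. The outcome is still wrong for day-first data, but the failure is loud rather than silent.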