BUG: UserWarning about lacking infer_datetime_format with pd.to_datetime #46210
Comments
Can you provide a small DataFrame that reproduces this warning? I attempted with
on main and did not get a warning. |
closing for now, will reopen if you provide a reproducible example @moojen |
@MarcoGorelli Here is a small example to reproduce this warning:
pd.to_datetime(['01/01/2000','31/05/2000','31/05/2001', '01/02/2000'], infer_datetime_format=True)
It leads to this output
The problem is the following: the Later in the conversion function, there is an internal error because '31/05/2000' cannot be converted using the guessed format, and the format is locally changed (per object, not for the entire array) into the correct one As you can see in my example, the last date in the array
The user warning is annoying because it is emitted per object in the array. With a large array, it leads to thousands of warning lines: one per value whenever the only option is to change the initial guess to parse the date.
This issue seems to be related to #12585 |
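A minimal sketch of the guess-then-fallback behavior described in the comment above, using only the standard library. It is not pandas' actual implementation: the two format strings and the helper name `parse_with_fallback` are assumptions made for illustration.

```python
from datetime import datetime

# Assumption: the initial guess is month-first, as it would be if it were
# derived from the ambiguous first value '01/01/2000' with dayfirst=False.
MONTH_FIRST = "%m/%d/%Y"
# Assumption: values that do not fit the guess are retried day-first.
DAY_FIRST = "%d/%m/%Y"


def parse_with_fallback(values):
    """Parse each value with the guessed format, falling back per value."""
    parsed = []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, MONTH_FIRST))
        except ValueError:
            # '31/05/2000' has no valid month-first reading, so only this
            # value is retried day-first; the guess itself is not revised.
            parsed.append(datetime.strptime(value, DAY_FIRST))
    return parsed


dates = ['01/01/2000', '31/05/2000', '31/05/2001', '01/02/2000']
result = [d.strftime("%Y-%m-%d") for d in parse_with_fallback(dates)]
print(result)
```

Under these assumptions the last value, '01/02/2000', comes out as 2000-01-02 (month-first) while the two '31/05/...' values are read day-first, which is the kind of inconsistency the thread is about.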
Agreed that the warning should only be emitted once, and the message saying to pass infer_datetime_format=True is confusing. The trickier issue of how to infer better seems to be well captured by #12585, so I'd recommend scoping this issue to just the warning message and count. |
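One way to get a single warning per call instead of one per value is to record that a fallback happened and warn after the loop. The sketch below uses only the standard library; the function name `parse_warn_once`, the format strings, and the message text are hypothetical and do not come from pandas.

```python
import warnings
from datetime import datetime


def parse_warn_once(values, guessed="%m/%d/%Y", fallback="%d/%m/%Y"):
    """Parse values, warning at most once if any value needed the fallback."""
    parsed = []
    fell_back = False
    for value in values:
        try:
            parsed.append(datetime.strptime(value, guessed))
        except ValueError:
            parsed.append(datetime.strptime(value, fallback))
            fell_back = True  # remember it, but do not warn per value
    if fell_back:
        # Hypothetical message, emitted once for the whole array.
        warnings.warn(
            "Some values did not match the guessed format and were parsed "
            "day-first. Specify a format to ensure consistent parsing.",
            UserWarning,
            stacklevel=2,
        )
    return parsed


dates = ['01/01/2000', '31/05/2000', '31/05/2001', '01/02/2000']
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is suppressed
    parse_warn_once(dates)

print(len(caught))
```

Two of the four values need the fallback here, yet only one warning is recorded.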
infer_datetime_format seems flawed. Is it assuming the silly American MM-DD-YYYY format? (why is it not taking my locality into account?). But I'm not sure what it's doing as when I tried with the above sample and just switched the order of the first two dates this happens:
but.....
What's going on? It's got the right date format for the first three but has oddly switched the 4th date around? |
I'm tempted to suggest changing the warning to
, as
By default, |
In your example the wording I’d also wager that Sorry for my hate of MM-DD. ISO 8601 is a standard for a reason and solves all ambiguity with dates! |
Reckon this would be clearer?
>>> import pandas as pd
>>> pd.to_datetime(['01/01/2000','31/05/2000','31/05/2001', '01/02/2000'], infer_datetime_format=True)
<stdin>:1: UserWarning: Parsing dates in DD/MM/YYYY format when dayfirst=False (the default) was specified. This may lead to inconsistently-parsed dates! Specify a format to ensure consistent parsing.
DatetimeIndex(['2000-01-01', '2000-05-31', '2001-05-31', '2000-01-02'], dtype='datetime64[ns]', freq=None)
Like this, the warning would only be emitted once, and (I think) would be a bit clearer
This'd have to go through a deprecation cycle, not sure it'd be worth it |
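For readers who hit this warning, the proposed message's advice to "specify a format" can be followed with the `format` argument of `pd.to_datetime`. A small sketch, assuming the strings in the thread's example are all meant as day/month/year:

```python
import pandas as pd

dates = ['01/01/2000', '31/05/2000', '31/05/2001', '01/02/2000']

# With an explicit format nothing is guessed: every value is read as
# day/month/year, so '01/02/2000' becomes 1 February 2000.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

as_text = list(parsed.strftime("%Y-%m-%d"))
print(as_text)
```

With the format fixed, the last value is parsed as 2000-02-01 rather than 2000-01-02, consistent with the other three.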
@TheSwallowCoder can you check on upstream/main (or on the pandas 1.5.0 release candidate)? |
Can't reproduce, here's the output I get from 1.5.0rc:
(.venv) marco@marco-Predator-PH315-52:~/tmp$ cat t.py
import pandas as pd
pd.to_datetime(['01/01/2000','31/05/2000','31/05/2001', '01/02/2000'], infer_datetime_format=True)
(.venv) marco@marco-Predator-PH315-52:~/tmp$ python -c 'import pandas; print(pandas.__version__)'
1.5.0rc0
(.venv) marco@marco-Predator-PH315-52:~/tmp$ python t.py
t.py:2: UserWarning: Parsing dates in DD/MM/YYYY format when dayfirst=False (the default) was specified. This may lead to inconsistently parsed dates! Specify a format to ensure consistent parsing.
pd.to_datetime(['01/01/2000','31/05/2000','31/05/2001', '01/02/2000'], infer_datetime_format=True) |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
When I run this code, I get the following warning:
"UserWarning: Parsing '15/09/1979' in DD/MM/YYYY format. Provide format or specify infer_datetime_format=True for consistent parsing."
Which is strange, because I already have set infer_datetime_format=True.
Expected Behavior
You would expect no warning at all, since "infer_datetime_format=True" is already provided as an argument.
Installed Versions