BUG: inconsistent parsing between Timestamp and to_datetime #52167
Comments
|
thanks for the report this is expected: having said that, the warning could probably be extended to say
|
I still don't get it. I understand that |
arguably we could only warn if there are at least two non-null elements maybe that's worth doing - interested in submitting a PR? |
I had a look at the code. It looks a bit harder than I thought. Suppressing this warning in this method when there is a single element seems like the wrong idea. In my opinion, this warning should be raised later, if this method returns None and the array is of length > 1. In
However, I am not completely sure of the possible side effects of this change. Can anybody tell me whether what I propose makes sense, or whether I am missing something? |
By the way, is the message of the warning correct? I have no idea what "falling back to 'dateutil'" means and implies. I would personally just drop this sentence from the warning message, as it seems uninformative for a user. |
I'd suggest making the change within
- warnings.warn(
-     "Could not infer format, so each element will be parsed "
-     "individually, falling back to `dateutil`. To ensure parsing is "
-     "consistent and as-expected, please specify a format.",
-     UserWarning,
-     stacklevel=find_stack_level(),
- )
+ if tslib.first_non_null(arr[first_non_null+1:]) != -1:
+     warnings.warn(
+         "Could not infer format, so each element will be parsed "
+         "individually, falling back to `dateutil`. To ensure parsing is "
+         "consistent and as-expected, please specify a format.",
+         UserWarning,
+         stacklevel=find_stack_level(),
+     ) |
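The idea behind this diff (warn only when a second non-null element exists) can be sketched in plain Python. This is only an illustration: `first_non_null` below is a simplified stand-in for the internal helper, not pandas' actual implementation, and `warn_if_format_not_inferred` is a hypothetical name.

```python
import math
import warnings


def first_non_null(values):
    """Return the index of the first entry that is not None or NaN, or -1."""
    # Simplified stand-in (assumption): only None and float NaN count as null.
    for index, value in enumerate(values):
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        return index
    return -1


def warn_if_format_not_inferred(values):
    """Warn only if at least two non-null values would be parsed individually."""
    first = first_non_null(values)
    if first == -1:
        return False  # nothing to parse at all
    if first_non_null(values[first + 1:]) == -1:
        return False  # a single non-null value cannot be parsed inconsistently
    warnings.warn(
        "Could not infer format, so each element will be parsed individually.",
        UserWarning,
        stacklevel=2,
    )
    return True


# One non-null value: no warning is emitted.
print(warn_if_format_not_inferred([None, "8 July 2010"]))  # False
```

With a single non-null value there is nothing for the parsing to be inconsistent with, which is why the check looks for a second non-null element before warning.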
Do you have any specific reasons for this suggestion?
In my view, it is more relevant to raise the warning in |
because in there we're already doing the check for the first non-null item. If you wanted to do it one level up, you'd need to do that check again, or return the index of the first non-null item |
I'll open a PR so we can get this in by the 2.0 release |
@MarcoGorelli you were faster than me! I got your point on checking for a non-null item. I still have one comment on the warning message.
When I read this message, I understand that the date cannot be parsed. Indeed, if you cannot find the format of a date, how can it be parsed later anyway? "Could not infer format, so each element will be parsed individually" seems contradictory. I then believed this warning was raised when the date format is not uniform among the dates in the array (which could explain the message), but that is not the case. My naive understanding is that to parse a date, you must infer its format, and that passing a format simply speeds up parsing because the format does not need to be inferred. It seems the internals are more complex. Then I have no idea what " falling back to
Currently, this warning is confusing and I believe it ought to be revised. I cannot really make a proposal, as my understanding of the date parsing internals is not good enough, but I can give feedback on a new version if needed.
NOTE: I still do not fully understand how dates are parsed. Most of the code seems to be Cython, and I do not yet completely understand how to navigate it in PyCharm. By the way, is there any documentation on how to seamlessly navigate Python and Cython/C code in an IDE? |
Apologies - I'd normally prefer to mentor people through contributions, but with the final 2.0.0 release on the horizon (possibly next week!), I figured I'd just go ahead and try to get this in.
Regarding the warning - I'll try to explain what's going on:
So, this is what the warning is trying to tell users about. If you have suggestions for how to better phrase it, that would be extremely welcome
I wish 😄 But the legendary @WillAyd has some blog posts on debugging C extensions at https://willayd.com/ |
Thanks for the clarification, I would suggest :
great, thanks! |
Hey! I have the same problem, if I use The data in
The warning message I get then:
But I get a correctly parsed datetime format
If I try to set a format, for example,
The pandas version is 2.0.3 |
your format's not correct, you're missing the meridiem directive |
You are right.
parses correctly |
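To illustrate the meridiem point: the original data and format from this exchange are not shown here, so the string and formats below are hypothetical. The sketch uses the standard library's `datetime.strptime`, whose directives are the same strftime-style ones that the `format` argument of `to_datetime` is documented to use. A 12-hour value with an AM/PM suffix needs `%I` together with `%p`.

```python
from datetime import datetime

raw = "07/08/2010 02:30:00 PM"  # hypothetical value, not the reporter's data

# Without %p the trailing " PM" is left unconverted, so parsing fails.
try:
    datetime.strptime(raw, "%m/%d/%Y %I:%M:%S")
    missing_directive_failed = False
except ValueError:
    missing_directive_failed = True

# With the meridiem directive the whole string is consumed.
parsed = datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")

print(missing_directive_failed)  # True
print(parsed)                    # 2010-07-08 14:30:00
```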
hi everyone, I'm new to pandas. I am trying to analyze some data and I get the following UserWarning:
UserWarning: Could not infer format, so each element will be parsed individually, falling back to dateutil. To ensure parsing is consistent and as-expected, please specify a format.
df['Fecha'] = pd.to_datetime(df.Date)
This is my dataframe code:
df['Fecha'] = pd.to_datetime(df.Date)
df.drop(columns='Date', inplace=True)
Does anyone have a solution? |
You need to do this |
@MarcoGorelli I also have an issue with this date parsing message. I am trying to parse a date series which has 2 formats merged together: Someone on Stack Overflow suggested reading the data in both formats and merging:
but I also don't understand this one and my only option remains as making a custom function to infer my date and apply it manually so that |
I have the same problem as @ReetiMauryaCrest. I don't have control over how the data is entered, so I get a mixture of formats for dates. I want to be sure I'm using best practices, what would y'all suggest? |
hey
if you pass but personally i'd suggest doing some validation before your data reaches pandas |
Thanks @MarcoGorelli I ended up making my custom parsing function based on 2 date formats I was receiving and applying it on the row. My main concern was that it would be too slow but I haven't found any large delays till now, so I think it would work for me for now. |
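A custom parser of the kind described above might look roughly like the following. This is a minimal sketch, not the commenter's actual function: the two formats in `CANDIDATE_FORMATS` and the name `parse_date` are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical pair of formats; replace with the ones actually present in the data.
CANDIDATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")


def parse_date(text, formats=CANDIDATE_FORMATS):
    """Try each known format in order and return the first successful parse."""
    cleaned = text.strip()
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # this format does not match, try the next one
    # Fail loudly instead of guessing, so unexpected formats are noticed.
    raise ValueError(f"no known format matches {text!r}")


print(parse_date("08/07/2010"))  # 2010-07-08 00:00:00
print(parse_date("2010-07-08"))  # 2010-07-08 00:00:00
```

The order of the formats matters when a string could match more than one of them, and raising on unknown input keeps silent misparsing out of the data.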
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
to_datetime used to be able to parse dates with the month written in English. There is no ambiguity in this case (no doubt about the position of the day), but it now raises a UserWarning.
Surprisingly, a Timestamp can be constructed without any warning from the same string that raised a warning with to_datetime.
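A hedged sketch of the comparison described above, using the '8 July 2010' string from this report. The second list element and the explicit format are assumptions added for illustration. Whether the call without a format warns depends on the pandas version, so the snippet only counts the warnings instead of assuming a result. Note that `%B` matches month names in the current locale.

```python
import warnings

import pandas as pd

text = "8 July 2010"              # string taken from the report
values = [text, "9 July 2010"]    # second element is hypothetical

# Timestamp parses a single string.
ts = pd.Timestamp(text)

# to_datetime without a format: pandas first tries to infer one.
with warnings.catch_warnings(record=True) as inferred_warnings:
    warnings.simplefilter("always")
    inferred = pd.to_datetime(values)

# to_datetime with an explicit format: no inference is needed.
with warnings.catch_warnings(record=True) as explicit_warnings:
    warnings.simplefilter("always")
    explicit = pd.to_datetime(values, format="%d %B %Y")

print(ts, inferred[0], explicit[0])
print("warnings without format:", len(inferred_warnings))
print("warnings with format:", len(explicit_warnings))
```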
Expected Behavior
Ideally, I would like to have no user warning with a string such as '8 July 2010' or 'July 8 2012'.
At the minimum, to_datetime and Timestamp should raise on the same strings.
Installed Versions
INSTALLED VERSIONS
commit : c2a7f1a
python : 3.11.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : fr_FR.cp1252
pandas : 2.0.0rc1
numpy : 1.23.5
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.6.3
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.10.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None