BUG: to_datetime() returns object instead of datetime type or raising exception #42229
Comments
Thanks for the report. We have been testing this behavior for a while so it's unlikely to change. Documentation was recently added for this behavior (which will be in the 1.3 docs) https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.to_datetime.html
Would it be possible to move the notice about the return type sometimes being
Good point. The return type section should specifically call out and
If I may jump in, I do not think that the current behaviour is intuitive for "the masses". The fundamental problem is that most users do not understand the difference between the concepts of timezone and time offset. Therefore most APIs available (in many programming languages, not only python) are of poor quality, because they do not convey a clear explanation. In real-world data, it is extremely frequent to
Even if stdlib's
To come back to the issue at hand, when I write this:
pd.to_datetime(["2020-10-25T01:30:00+02:00", "2020-10-25T02:00:00+01:00"])
The datetimes are not of mixed timezone. They are both in
If you agree, I would suggest at least first modifying #42244 so that "mixed time offset" appears instead of "mixed timezone" (I'll make the suggestion directly in the PR). For future versions, we could maybe discuss how to improve or replace this "utc=True" parameter with a more readable and intuitive alternative. What do you think?
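To make the offset-versus-timezone distinction concrete, here is a minimal sketch using the two timestamps from the comment above. It relies only on `pd.to_datetime(..., utc=True)` and `DatetimeIndex.tz_convert`. The zone name `Europe/Paris` is an assumption chosen for illustration (the thread does not name the zone); it is simply one zone in which both offsets are valid on that date.

```python
import pandas as pd

# The two timestamps from the comment: different UTC offsets (+02:00, +01:00).
stamps = ["2020-10-25T01:30:00+02:00", "2020-10-25T02:00:00+01:00"]

# With utc=True every value is converted to UTC, so the result is a
# tz-aware DatetimeIndex rather than an object-dtype Index.
idx_utc = pd.to_datetime(stamps, utc=True)

# ASSUMPTION: "Europe/Paris" is a hypothetical zone picked for this sketch.
# Converting back shows both values sit in a single time zone whose offset
# changes when daylight saving time ends.
idx_local = idx_utc.tz_convert("Europe/Paris")

print(type(idx_utc).__name__, idx_utc.tz)
print([ts.isoformat() for ts in idx_local])
```

Under that assumption, the converted values carry the original +02:00 and +01:00 offsets again, which is the sense in which the input has mixed offsets but not mixed time zones.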
@smarie That's a really good point, I hadn't even noticed at first but I think the data that prompted my confusion was, just like you describe, not in two different time zones but with different offsets while the time zone was experiencing daylight savings. In fact, the entries at the start and end of my (long) timeseries had the same offset, so it took me forever to figure out what was happening. Perhaps the default behavior could be printing a warning or raising an exception when mixed offset data is sent to the function, with an option to turn that off if the
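Until something like that exists in the library, a user-side guard is one way to get a loud failure. The sketch below is not a pandas API: `to_datetime_strict` is a hypothetical helper name. It assumes only that `pd.to_datetime` on a list returns a `DatetimeIndex` when parsing yields a single datetime dtype. Depending on the pandas version, mixed offsets may instead produce an object-dtype `Index`, a `FutureWarning`, or an error raised by pandas itself, so the helper treats all of these as failures.

```python
import warnings

import pandas as pd


def to_datetime_strict(values):
    """Hypothetical wrapper: parse values, or fail if no DatetimeIndex results."""
    with warnings.catch_warnings():
        # Some pandas versions warn about mixed offsets; silence the warning
        # here because the isinstance check below handles that case explicitly.
        warnings.simplefilter("ignore", FutureWarning)
        result = pd.to_datetime(values)
    if not isinstance(result, pd.DatetimeIndex):
        raise TypeError(
            "to_datetime returned %s (dtype=%s) instead of DatetimeIndex; "
            "the input may contain mixed UTC offsets, consider utc=True"
            % (type(result).__name__, result.dtype)
        )
    return result


# Same offset on every entry: parses to a DatetimeIndex.
same_offset = to_datetime_strict(
    ["2020-10-25T01:30:00+02:00", "2020-10-25T01:45:00+02:00"]
)
print(type(same_offset).__name__)
```

Calling the helper with the mixed-offset input from this thread should raise, either the `TypeError` above or whatever exception the installed pandas version raises on its own.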
Following pandas-dev#42244, improved documentation about datetime parsing. See also pandas-dev#42229 (comment)
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
The output is
Problem description
The first two lines work as expected, but the following three lines do not return a DatetimeIndex but also don't raise any exceptions in the process. This behavior is undocumented, and quite hard to debug/track down, since the failure is completely silent. It seems that the reason is the input contains times from two different time zones.
Expected Output
Either a DatetimeIndex object or an exception to be raised.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 7c48ff4
python : 3.9.1.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.5
numpy : 1.19.5
pytz : 2020.5
dateutil : 2.8.1
pip : 21.1.2
setuptools : 49.2.1
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.22.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : 1.4.2
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None