BUG: Out of bound dates can be saved to feather but not loaded #47832
Comments
I think the issue is due to the
This is a bit tricky. The column in your DataFrame is not a pandas datetime, but a Python object. Those are saved differently internally. If you try to convert the column to a pandas datetime it'll fail:

>>> pd.to_datetime(df['date'])
Traceback (most recent call last):
File "/home/mgarcia/mambaforge/envs/pandas-dev/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2211, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))
File "pandas/_libs/tslibs/conversion.pyx", line 358, in pandas._libs.tslibs.conversion.datetime_to_datetime64
File "pandas/_libs/tslibs/np_datetime.pyx", line 120, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1654-01-01 00:00:00
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mgarcia/mambaforge/envs/pandas-dev/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 1051, in to_datetime
values = convert_listlike(arg._values, format)
File "/home/mgarcia/mambaforge/envs/pandas-dev/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 402, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64ns(
File "/home/mgarcia/mambaforge/envs/pandas-dev/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2217, in objects_to_datetime64ns
raise err
File "/home/mgarcia/mambaforge/envs/pandas-dev/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2199, in objects_to_datetime64ns
result, tz_parsed = tslib.array_to_datetime(
File "pandas/_libs/tslib.pyx", line 381, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 608, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 604, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslib.pyx", line 476, in pandas._libs.tslib.array_to_datetime
File "pandas/_libs/tslibs/np_datetime.pyx", line 120, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1654-01-01 00:00:00

This is because the 1654 date is outside the range supported by pandas dates; see https://pandas.pydata.org/docs/user_guide/timeseries.html#timestamp-limitations. While your expected behavior makes total sense, I think each individual current behavior is reasonable.
We could consider raising an exception if a Python datetime column is being saved, and force the user to cast it to a pandas datetime, but I'm not sure about the implications. In any case, in pandas 1.5, to be released soon, we should start having support for a much wider range of dates, so your specific case will work with the new release.
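To make the distinction concrete, here is a small illustration (not part of the original comment; run against pandas 1.x as in the report, reusing the date column from the reproducer):

import pandas as pd
from datetime import datetime

df = pd.DataFrame({"date": [datetime.fromisoformat("1654-01-01")]})
print(df["date"].dtype)  # object: the out-of-bounds date makes pandas fall back to Python objects
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

Since 1654-01-01 falls before pd.Timestamp.min, the nanosecond conversion above has no valid representation and raises OutOfBoundsDatetime.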
EDIT: I got some things wrong in my first message.
How would one save the above DataFrame then? Convert the Python datetime objects to strings?
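One possible sketch of that string-based workaround (illustrative only; the file name is made up, and the column comes back as strings rather than datetimes):

import pandas as pd
from datetime import datetime

df = pd.DataFrame({"date": [
    datetime.fromisoformat("1654-01-01"),
    datetime.fromisoformat("1920-01-01"),
]})
# Store the dates as ISO 8601 strings; string columns round-trip through feather without bounds issues.
df["date"] = df["date"].apply(lambda d: d.isoformat())
df.to_feather("dates_as_strings.feather")
print(pd.read_feather("dates_as_strings.feather"))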
Yes, quite excited by non-ns datetimes! I understand that this is an issue that might be solved in the next pandas release. Do you want me to close the issue?

If anyone faces this issue, you can access your feather file through this little hack:

import pandas as pd
import pyarrow.feather as feather

df = feather.read_table("to_trash.feather")
df = pd.DataFrame(df.to_pylist())

You can answer this SO question if you have a better solution.
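A slightly shorter variant of the same idea, assuming your pyarrow version supports the timestamp_as_object flag on Table.to_pandas (it returns the out-of-bounds dates as Python datetime objects instead of casting them to nanoseconds):

import pyarrow.feather as feather

table = feather.read_table("to_trash.feather")
# Ask pyarrow for Python datetime objects rather than a datetime64[ns] column.
df = table.to_pandas(timestamp_as_object=True)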
I just checked with pandas 1.5.0, and the issue is still there. Will this be solved in pandas 2.0, or in a future 1.5.x?
Pandas 2.0 is out, and the problem still exists. However, if the feather file is read with the pyarrow dtype backend, it works:

import pandas as pd
from datetime import datetime
df = pd.DataFrame({"date": [
datetime.fromisoformat("1654-01-01"),
datetime.fromisoformat("1920-01-01"),
],})
df.to_feather("test.feather")
pd.read_feather("test.feather", dtype_backend="pyarrow")

@datapythonista should we close the issue?
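For context, a quick check of what the pyarrow backend returns (not from the original comment; the exact dtype string assumes the pandas 2.x ArrowDtype repr):

out = pd.read_feather("test.feather", dtype_backend="pyarrow")
# The column stays Arrow-backed at microsecond resolution, so no cast to datetime64[ns] is attempted.
print(out["date"].dtype)  # e.g. timestamp[us][pyarrow]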
@jbrockmendel do you want to have a look here? Looks like when a big date is loaded from feather, it'll use ns precision and fail, unless the pyarrow backend is used. I guess this is a bug, but I may be missing something.
Looks like the read_feather call is raising from within pyarrow. Not clear to me what we can do at that level. On our end we might make it easier to solve by having df['date'] have dtype
This is fixed by #55901, but will need a test.
take
take
Pandas version checks
Reproducible Example
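A minimal reproduction, consistent with the error message below and with the snippet in the comments above (the file name is illustrative):

import pandas as pd
from datetime import datetime

df = pd.DataFrame({"date": [
    datetime.fromisoformat("1654-01-01"),
    datetime.fromisoformat("1920-01-01"),
]})
# Saving succeeds: the object-dtype column is stored by pyarrow as timestamp[us].
df.to_feather("test.feather")
# Reading it back raises ArrowInvalid: the stored timestamp[us] values cannot be cast to timestamp[ns].
pd.read_feather("test.feather")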
Issue Description
Reading the file back does not return the original DataFrame but raises an exception instead:
ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: -9971942400000000
Expected Behavior
Return the original DataFrame df.
Installed Versions
pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.4.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None