dt-accessor fails to extract 'days' from timedelta-series #26285
Comments
This may have something to do with the Timestamp limitations (http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations), though that may not be the sole driver.
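For reference, the bounds behind the linked Timestamp limitations can be printed directly. This minimal sketch only reads the `pd.Timestamp.min` and `pd.Timestamp.max` attributes. The comments assume the nanosecond-resolution Timestamp used by pandas 0.24.

```python
import pandas as pd

# Timestamp (like datetime64[ns]) stores a signed 64-bit count of nanoseconds
# since 1970-01-01, which limits it to a span of roughly 584 years.
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```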
The issue is that you are using
I can see an argument for pandas coercing in this case, but I believe we don't do this for other object data types (e.g. subtracting two ints with object dtype). We recommend sticking with
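For comparison, here is a minimal sketch in which every date falls inside the Timestamp range. In that case pandas infers a datetime64 dtype for both columns, the subtraction yields a timedelta64 column, and the `.dt` accessor works. The dates are made up for illustration.

```python
import datetime as dt
import pandas as pd

# All dates are within the Timestamp range, so pandas infers datetime64 columns
df = pd.DataFrame({
    'start_date': [dt.datetime(2019, 3, 31), dt.datetime(2019, 3, 31)],
    'end_date': [dt.datetime(2019, 3, 31), dt.datetime(2019, 4, 30)],
})
df['duration'] = df['end_date'] - df['start_date']  # timedelta64 column
df['duration_days'] = df['duration'].dt.days         # .dt works on timedelta64
print(df['duration_days'].tolist())  # [0, 30]
```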
Thank you @mroeschke, I have reworked the example from date -> datetime.
import pandas as pd
import datetime as dt
# Create data
df = pd.DataFrame({'start_date': [dt.datetime(2019, 3, 31), dt.datetime(2019, 3, 31)],
                   'end_date': [dt.datetime(2019, 3, 31), dt.datetime(3014, 1, 1)]})
# Calculate duration (timedelta objects)
df['duration'] = df['end_date'] - df['start_date']
# EXAMPLE 1: PROBLEM HERE - THIS FAILS !!!
df['duration_days'] = df['duration'].dt.days
The code still fails with the error below (per the traceback, it is raised by the subtraction that computes 'duration'):
Full error in details below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
1504 try:
-> 1505 result = expressions.evaluate(op, str_rep, x, y, **eval_kwargs)
1506 except TypeError:
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, op_str, a, b, use_numexpr, **eval_kwargs)
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b, truediv, reversed, **eval_kwargs)
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b, **eval_kwargs)
TypeError: ufunc subtract cannot use operands with types dtype('O') and dtype('<M8[ns]')

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
~\Anaconda3\lib\site-packages\pandas\core\ops.py in masked_arith_op(x, y, op)
TypeError: ufunc subtract cannot use operands with types dtype('O') and dtype('<M8[ns]')

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(left, right)
~\Anaconda3\lib\site-packages\pandas\core\ops.py in safe_na_op(lvalues, rvalues)
pandas/_libs/algos.pyx in pandas._libs.algos.arrmap()
~\Anaconda3\lib\site-packages\pandas\core\ops.py in (x)
TypeError: ufunc subtract cannot use operands with types dtype('O') and dtype('<M8[ns]')

If I replace the year 3014 with 2014, everything works fine. Because of this, I suspect that using
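One possible workaround is sketched below. It is not an official pandas recommendation: it keeps the raw `datetime.datetime` objects in object-dtype columns and computes the day counts in plain Python, which has no 2262 limit. It assumes whole days are all you need. The result is an ordinary integer column, so no timedelta64 dtype is involved.

```python
import datetime as dt
import pandas as pd

# Force object dtype so the original datetime.datetime objects are kept as-is
df = pd.DataFrame({
    'start_date': pd.Series([dt.datetime(2019, 3, 31), dt.datetime(2019, 3, 31)], dtype=object),
    'end_date': pd.Series([dt.datetime(2019, 3, 31), dt.datetime(3014, 1, 1)], dtype=object),
})

# Element-wise Python arithmetic: datetime - datetime gives a datetime.timedelta,
# whose .days attribute is a plain int
df['duration_days'] = [
    (end - start).days for start, end in zip(df['start_date'], df['end_date'])
]
print(df['duration_days'].tolist())
```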
The object vs datetime64 difference you are seeing with the year has to do with the timestamp limitation @WillAyd mentioned.
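The 2014/3014 (and 2262/2263) cutoff follows from datetime64[ns] storing a signed 64-bit nanosecond count relative to 1970-01-01. The sketch below derives those bounds with the standard library and checks values against them. `fits_in_datetime64_ns` is a hypothetical helper, not a pandas API. Newer pandas releases (2.0+) also support coarser resolutions, so their dtype inference may differ from pandas 0.24.

```python
import datetime as dt

# datetime64[ns]: signed int64 nanoseconds since the epoch. pandas reserves the
# most negative int64 for NaT, so both bounds are +/-(2**63 - 1) ns.
_EPOCH = dt.datetime(1970, 1, 1)
_LIMIT_US = (2**63 - 1) // 1000  # Python datetimes only go down to microseconds

# Rounded toward the epoch, so both bounds stay inside the representable range
NS_MAX = _EPOCH + dt.timedelta(microseconds=_LIMIT_US)
NS_MIN = _EPOCH - dt.timedelta(microseconds=_LIMIT_US)


def fits_in_datetime64_ns(value: dt.datetime) -> bool:
    """Hypothetical helper: True if value lies inside the datetime64[ns] range."""
    return NS_MIN <= value <= NS_MAX


print(fits_in_datetime64_ns(dt.datetime(2262, 1, 1)))  # True
print(fits_in_datetime64_ns(dt.datetime(2263, 1, 1)))  # False
```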
Thank you @mroeschke. Although I can imagine that the Timestamp limitation has some hidden impact under the hood, let's test the simplest code possible:
import pandas as pd
import datetime as dt
series = pd.Series([dt.datetime(2019, 1, 1), dt.datetime(2262, 1, 1)])
series
OK - This results in a datetime series, as expected (we used max year 2262).
import pandas as pd
import datetime as dt
series = pd.Series([dt.datetime(2019, 1, 1), dt.datetime(2263, 1, 1)])
series
NOT OK - This results in an object series, even though all elements are valid datetimes (we used year 2263, over the Timestamp limit).
Result:
Suggestion: This bug practically excludes pandas from real-world projects where fully valid datetime objects (beyond year 2262) are used, and forces us into hacky workarounds (like the initial Example 2). Pandas should work normally and predictably with ANY valid datetime. What do you think about it?
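Durations have a similar limit: timedelta64[ns] uses the same 64-bit nanosecond count, so it spans only about 292 years. A minimal sketch using the dates from the reworked example shows that the 2019-to-3014 duration would not fit in a timedelta64[ns] column even if both endpoints could be stored as datetime64.

```python
import datetime as dt
import pandas as pd

# Plain Python timedelta: no nanosecond range restriction
duration = dt.datetime(3014, 1, 1) - dt.datetime(2019, 3, 31)

print(pd.Timedelta.max)                      # 106751 days 23:47:16.854775807
print(duration.days > pd.Timedelta.max.days)  # True
```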
Note that the dtype is not
Similar to @mroeschke's comment about supporting If you don't need nanosecond precision, you can use
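The comment above is cut off in this copy, so it is unclear what it goes on to recommend. One option that the pandas user guide documents for out-of-bounds spans is `Period` with a coarse frequency. The sketch below assumes daily resolution is enough and reuses the dates from the reworked example.

```python
import pandas as pd

# A daily Period stores an integer ordinal (days since 1970-01-01), so it is not
# bound by the 1677-2262 nanosecond range
start = pd.Period(year=2019, month=3, day=31, freq='D')
end = pd.Period(year=3014, month=1, day=1, freq='D')

# Subtracting two Periods returns a multiple of the frequency; .n is that count
print((end - start).n)
```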
@stefansimik you can also have a look here: #7307
Closing for all the reasons @mroeschke and @jbrockmendel pointed out.
Thank you all (@jbrockmendel, @jreback, @mroeschke). Let's hope for a better future in pandas v2.0 👍
Copy-pastable example
Problem description
The dt-accessor cannot extract the days value from a timedelta series and throws an exception:
Expected Output
The dt-accessor should extract the days value from a timedelta series without any exceptions.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.1
pytest: None
pip: 19.0.1
setuptools: 40.8.0
Cython: None
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None