
Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
I'm trying to replicate the behaviour of dateutil.relativedelta.relativedelta using pandas DateOffset.
With regular Python + dateutil, I would do this:
In [1]: from datetime import date
In [2]: from dateutil.relativedelta import relativedelta
In [3]: date_of_birth = date(1992, 1, 1)
In [4]: today = date(2021, 6, 7)
In [5]: relativedelta(today, date_of_birth)
Out[5]: relativedelta(years=+29, months=+5, days=+6)
However, when I try to do the same with DateOffset, I get an error:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'date_of_birth': pd.to_datetime(['1992-01-01', '1994-03-18']), 'today': date.today()})
In [3]: df
Out[3]:
date_of_birth today
0 1992-01-01 2021-06-07
1 1994-03-18 2021-06-07
In [4]: pd.tseries.offsets.DateOffset(df['today'], df['date_of_birth'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset._validate_n()
c:\users\dwales\.local\pipx\venvs\jupyterlab\lib\site-packages\pandas\core\series.py in wrapper(self)
140 return converter(self.iloc[0])
--> 141 raise TypeError(f"cannot convert the series to {converter}")
142
TypeError: cannot convert the series to <class 'int'>
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-36-429217dd8eab> in <module>
----> 1 pd.tseries.offsets.DateOffset(df['today'], df['date_of_birth'])
pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.RelativeDeltaOffset.__init__()
pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset.__init__()
pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset._validate_n()
TypeError: `n` argument must be an integer, got <class 'pandas.core.series.Series'>
Problem description
This is confusing to me, because the docs for DateOffset say:
Works exactly like relativedelta in terms of the keyword args you pass in
I'm guessing the issue is that relativedelta has two distinct modes of operation.
The first mode expects two datetime or date instances as positional arguments, and returns the relativedelta between them.
The second mode expects relative years, months, days, etc. as keyword arguments and lets you construct the required relativedelta manually.
As far as I can tell, DateOffset operates like the second (keyword argument) mode of relativedelta. However, I can't find anything in pandas that operates like the first (positional argument) mode.
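To make the distinction concrete, here is a minimal sketch of the two relativedelta modes next to the keyword-only DateOffset. The dates reuse the example above; the variable names are only for illustration. It also shows why the call above fails: the first positional argument of DateOffset is the integer multiplier n, not a date.

```python
from datetime import date

import pandas as pd
from dateutil.relativedelta import relativedelta

# Mode 1: two positional dates, returns the difference between them.
diff = relativedelta(date(2021, 6, 7), date(1992, 1, 1))

# Mode 2: keyword arguments, builds a relativedelta by hand.
manual = relativedelta(years=29, months=5, days=6)

# Both modes describe the same interval for these dates.
same_interval = diff == manual

# DateOffset only has the keyword mode.
offset = pd.DateOffset(years=29, months=5, days=6)
shifted = pd.Timestamp("1992-01-01") + offset

# The first positional argument of DateOffset is `n`, an integer
# multiplier, so passing a Series (or a date) there raises TypeError.
two_months = pd.Timestamp("2021-01-01") + pd.DateOffset(2, months=1)
```

So DateOffset can apply a known interval to a date, but it does not compute the interval between two dates.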
Expected Output
In [3]: df
Out[3]:
date_of_birth today
0 1992-01-01 2021-06-07
1 1994-03-18 2021-06-07
In [4]: pd.tseries.offsets.DateOffset(df['today'], df['date_of_birth'])
Out[4]:
0 <DateOffset: days=6, months=5, years=29>
1 <DateOffset: days=20, months=2, years=27>
dtype: object
Workaround
In [5]: df.apply(lambda df: relativedelta(df['today'], df['date_of_birth']), axis=1)
Out[5]:
0 relativedelta(years=+29, months=+5, days=+6)
1 relativedelta(years=+27, months=+2, days=+20)
dtype: object
If you have NaT values in your data, you will need to handle them:
df.apply(lambda row: relativedelta(row['today'], row['date_of_birth']) if pd.notna(row['today']) and pd.notna(row['date_of_birth']) else pd.NaT, axis=1)
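The same workaround can be written as a small named function, which may be easier to read and reuse than a long lambda. This is only a sketch: row_relativedelta and its column-name parameters are hypothetical helpers, not part of pandas or dateutil, and the second row has a missing date of birth purely to exercise the NaT branch.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta


def row_relativedelta(row, end_col="today", start_col="date_of_birth"):
    """Return relativedelta(end, start) for one row, or NaT if either is missing."""
    end, start = row[end_col], row[start_col]
    if pd.isna(end) or pd.isna(start):
        return pd.NaT
    return relativedelta(end, start)


df = pd.DataFrame(
    {
        # None becomes NaT after conversion.
        "date_of_birth": pd.to_datetime(["1992-01-01", None]),
        "today": pd.Timestamp("2021-06-07"),
    }
)

# Row-wise apply: one Python call per row, so this is not vectorised.
result = df.apply(row_relativedelta, axis=1)
```

This is still a row-by-row Python loop under the hood, so it will be slow on large frames; it only tidies up the workaround rather than replacing it.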
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.9.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Australia.1252
pandas : 1.2.4
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 21.1.2
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.4.17
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None