
BUG: Unable to create DateOffset from two dates #41847

Closed
@ghost

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

I'm trying to replicate the behaviour of dateutil.relativedelta.relativedelta using pandas DateOffset.
With regular Python + dateutil, I would do this:

In [1]: from datetime import date
In [2]: from dateutil.relativedelta import relativedelta

In [3]: date_of_birth = date(1992, 1, 1)
In [4]: today = date(2021, 6, 7)
In [5]: relativedelta(today, date_of_birth)
Out[5]: relativedelta(years=+29, months=+5, days=+6)

However, when I try to do the same with DateOffset, I get an error:

In [1]: import pandas as pd
   ...: from datetime import date
In [2]: df = pd.DataFrame({'date_of_birth': pd.to_datetime(['1992-01-01', '1994-03-18']), 'today': date(2021, 6, 7)})  # date.today() on the day of reporting
In [3]: df
Out[3]:
  date_of_birth       today
0    1992-01-01  2021-06-07
1    1994-03-18  2021-06-07

In [4]: pd.tseries.offsets.DateOffset(df['today'], df['date_of_birth'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset._validate_n()

c:\users\dwales\.local\pipx\venvs\jupyterlab\lib\site-packages\pandas\core\series.py in wrapper(self)
    140             return converter(self.iloc[0])
--> 141         raise TypeError(f"cannot convert the series to {converter}")
    142

TypeError: cannot convert the series to <class 'int'>

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-36-429217dd8eab> in <module>
----> 1 pd.tseries.offsets.DateOffset(df['today'], df['date_of_birth'])

pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.RelativeDeltaOffset.__init__()

pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset.__init__()

pandas\_libs\tslibs\offsets.pyx in pandas._libs.tslibs.offsets.BaseOffset._validate_n()

TypeError: `n` argument must be an integer, got <class 'pandas.core.series.Series'>
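For context, the error comes from DateOffset's signature, `DateOffset(n=1, normalize=False, **kwds)`: the first positional parameter is `n`, an integer repeat count, not a date. So `df['today']` lands in `n` and fails validation. A minimal sketch of what a positional `n` actually does:

```python
import pandas as pd

# The first positional argument of DateOffset is `n`, a repeat count:
# the keyword offset (here one month) is applied n times.
two_months = pd.DateOffset(2, months=1)
print(pd.Timestamp("2021-01-01") + two_months)  # 2021-03-01 00:00:00

# Passing a Series as `n` fails validation, as in the traceback above.
try:
    pd.DateOffset(pd.Series([1, 2]))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```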

Problem description

This is confusing to me, because the docs for DateOffset say:

Works exactly like relativedelta in terms of the keyword args you pass in

I'm guessing the issue is that relativedelta has two distinct modes of operation.
The first mode takes two datetime or date instances as positional arguments and returns the relativedelta between them.
The second mode takes relative years, months, days, etc. as keyword arguments and lets you construct an arbitrary relativedelta by hand.

So far as I can tell, DateOffset behaves like the second (keyword argument) mode of relativedelta. However, I can't find anything in pandas that behaves like the first (positional argument) mode.
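To make the two modes concrete, here is a small sketch using the dates from above. It also shows that passing the mode-2 keywords to DateOffset reproduces the same calendar arithmetic, which seems to be what the docs mean by "works exactly like relativedelta":

```python
from datetime import date

import pandas as pd
from dateutil.relativedelta import relativedelta

# Mode 1: two positional dates -> the difference between them.
diff = relativedelta(date(2021, 6, 7), date(1992, 1, 1))

# Mode 2: keyword arguments -> an explicitly constructed delta.
manual = relativedelta(years=29, months=5, days=6)
print(diff == manual)  # True

# DateOffset only mirrors mode 2: the same keywords give an offset
# that moves 1992-01-01 forward to 2021-06-07.
offset = pd.DateOffset(years=29, months=5, days=6)
print(pd.Timestamp("1992-01-01") + offset)  # 2021-06-07 00:00:00
```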

Expected Output

In [3]: df
Out[3]:
  date_of_birth       today
0    1992-01-01  2021-06-07
1    1994-03-18  2021-06-07

In [4]: pd.tseries.offsets.DateOffset(df['today'], df['date_of_birth'])
Out[4]:
0     <DateOffset: days=6, months=5, years=29>
1    <DateOffset: days=20, months=2, years=27>
dtype: object

Workaround

In [5]: df.apply(lambda row: relativedelta(row['today'], row['date_of_birth']), axis=1)
Out[5]:
0     relativedelta(years=+29, months=+5, days=+6)
1    relativedelta(years=+27, months=+2, days=+20)
dtype: object

If you have NaT values in your data, you will need to handle them:

df.apply(lambda row: relativedelta(row['today'], row['date_of_birth']) if pd.notna(row['today']) and pd.notna(row['date_of_birth']) else pd.NaT, axis=1)
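If you need actual DateOffset objects rather than relativedelta objects, one option is to build each DateOffset from the relativedelta fields. This is a minimal sketch: `relative_offset` is a hypothetical helper, not a pandas API, and a fixed "today" is used so the result is reproducible:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta


def relative_offset(later, earlier):
    """Hypothetical helper: DateOffset equivalent of relativedelta(later, earlier).

    Emulates relativedelta's two-argument mode; missing values give pd.NaT.
    """
    if pd.isna(later) or pd.isna(earlier):
        return pd.NaT
    rd = relativedelta(later, earlier)
    # Always pass the date fields: a DateOffset() with no keywords
    # would default to one day instead of zero.
    kwargs = {"years": rd.years, "months": rd.months, "days": rd.days}
    # Add time-of-day fields only when non-zero, to keep the repr short.
    for field in ("hours", "minutes", "seconds", "microseconds"):
        value = getattr(rd, field)
        if value:
            kwargs[field] = value
    return pd.DateOffset(**kwargs)


df = pd.DataFrame({
    "date_of_birth": pd.to_datetime(["1992-01-01", "1994-03-18", None]),
    "today": pd.Timestamp("2021-06-07"),  # fixed date for reproducibility
})

# Build an object Series explicitly so pandas does not try to infer a dtype.
offsets = pd.Series(
    [relative_offset(t, b) for t, b in zip(df["today"], df["date_of_birth"])],
    index=df.index,
    dtype=object,
)
print(offsets)
```

Iterating with `zip` avoids `apply(axis=1)` and its row-wise dtype coercion, but it is still a Python-level loop, so it is not a vectorised replacement for the feature requested above.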

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.9.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Australia.1252

pandas : 1.2.4
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 21.1.2
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.4.17
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None
