BUG: inconsistent behavior of DateOffset #47953
Comments
While this is a bug, I would say that the second test actually gives the correct result: because you're adding one month at a time, adding 1 month to 30-Jan-2020 gives 29-Feb-2020, and a year later it would be 28-Feb-2021.
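A minimal sketch of the month-end clamping this comment describes, using only the standard library. `add_one_month` is a hypothetical helper written for illustration; it is not pandas code and only assumes that a one-month step keeps the day of month where possible and otherwise falls back to the last day of the target month.

```python
import calendar
from datetime import date


def add_one_month(d: date) -> date:
    """Hypothetical helper: step one calendar month forward, clamping the day."""
    year = d.year + (d.month // 12)   # roll the year over after December
    month = d.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


leap = add_one_month(date(2020, 1, 30))      # February 2020 has 29 days
non_leap = add_one_month(date(2021, 1, 30))  # February 2021 has 28 days
print(leap, non_leap)
```

Once the day has been clamped to 29 or 28, later one-month steps start from that clamped day, which is why stepwise addition can drift away from the original day of month.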
I had another look into this; it looks like the problem is inside the Cython code. Applying offset to
Summary: When applying multiple offsets to a DatetimeArray, Cython tries to accumulate the months/weeks/days first and then applies one single offset. But when an offset is applied to a Timestamp object, the offsets are applied iteratively, which is more accurate when providing multiple offsets.
Solution: changes to offsets.pyx along with test cases.
Would like to know your views @jreback, @jbrockmendel
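The two strategies in the summary above can be sketched side by side. This is a standard-library model, not the code in offsets.pyx: `shift_months`, `apply_iteratively`, and `apply_accumulated` are hypothetical names, and the only assumption is that each shift clamps the day to the end of the target month.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Hypothetical helper: shift by whole months, clamping to month end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def apply_iteratively(d: date, months: int, n: int) -> date:
    # Scalar-style path: apply the offset n times, clamping after every step.
    for _ in range(n):
        d = shift_months(d, months)
    return d


def apply_accumulated(d: date, months: int, n: int) -> date:
    # Array-style path: sum the months first, then shift (and clamp) once.
    return shift_months(d, months * n)


base = date(2020, 1, 30)
iterative = apply_iteratively(base, 1, 2)    # via 2020-02-29
accumulated = apply_accumulated(base, 1, 2)  # one two-month shift
print(iterative, accumulated)
```

Under these assumptions the two paths disagree whenever an intermediate month is shorter than the starting day of month, which matches the kind of discrepancy reported in this issue.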
Hi, I would say that the correct result should be 2021-06-30, since that is consistent with dateutil's relativedelta.
Hi, I would add that the DataFrame gives 2021-06-30 if it has 1 row and 2021-06-28 if it has 2 identical rows. There are several bugs here; however, IMHO the right result should be 2021-06-30 (consistent with relativedelta). "2 * 1 month" intuitively should be equal to "2 months", no?
It all comes down to how one interprets multiplying an offset/relativedelta by a scalar. I'll try to show by examples:
import pandas as pd
from dateutil.relativedelta import relativedelta
base_date = pd.Timestamp('2020-01-30')
offset_1month = pd.DateOffset(months=1)
offset_2month = pd.DateOffset(months=2)
delta_1month = relativedelta(months=1)
delta_2month = relativedelta(months=2)
base_date + 2 * offset_1month
>>> Timestamp('2020-03-29 00:00:00')
base_date + offset_1month + offset_1month
>>> Timestamp('2020-03-29 00:00:00')
base_date + offset_2month
>>> Timestamp('2020-03-30 00:00:00')
base_date + 2 * delta_1month
>>> Timestamp('2020-03-30 00:00:00')
base_date + delta_1month + delta_1month
>>> Timestamp('2020-03-29 00:00:00')
base_date + delta_2month
>>> Timestamp('2020-03-30 00:00:00')
As you can see, multiplying I think we need to first decide what we want to do when a
The difference arises due to the fact that in the
take
Other similar instances of this operation in other related classes to
Great. Please also consider this strange behavior: the result for a 1-row DataFrame differs from the result for a 2-row DataFrame, even when the rows are identical.
I expect this is because of a fastpath in _addsub_object_array. I'd be OK with removing that fastpath.
For RelativeDeltaOffset._apply, let's just multiply _offset by n instead of using the loop. Will that be consistent with apply_array? If we can share anything with apply_array, that'd be ideal.
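To illustrate the "multiply by n instead of looping" idea in isolation, here is a toy model using only the standard library. `MonthOffset` is a hypothetical stand-in, not `RelativeDeltaOffset` or any pandas class, and it assumes that scaling the offset simply scales its month count before a single clamped shift.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MonthOffset:
    """Toy stand-in for a month-based offset; not a pandas class."""
    months: int

    def __rmul__(self, n: int) -> "MonthOffset":
        # n * offset scales the field rather than recording a repeat count.
        return MonthOffset(self.months * n)

    def __radd__(self, d: date) -> date:
        # date + offset: one shift, clamped to the end of the target month.
        total = d.year * 12 + (d.month - 1) + self.months
        year, month0 = divmod(total, 12)
        last_day = calendar.monthrange(year, month0 + 1)[1]
        return date(year, month0 + 1, min(d.day, last_day))


base = date(2020, 1, 30)
one_month = MonthOffset(1)

scaled = base + 2 * one_month            # same as base + MonthOffset(2)
chained = base + one_month + one_month   # two separate clamped steps
print(scaled, chained)
```

With this design `2 * one_month` and `MonthOffset(2)` are interchangeable, while explicitly chaining two additions still clamps at each step, so the two spellings can give different dates.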
This PR looked pretty close, if anyone wants to take it forward: #50542
take
@MarcoGorelli this is my first open source contribution. Is it okay to follow the same steps as in #50542 and resolve the comments in my branch?
Yeah, probably.
…#53681)
* BUG: Fixed inconsistent multiplication pandas-dev#47953
* Fixed release note
* Fixed attribute name
* Changed offset apply logic
* Addressed pandas-dev#46877 re-occurence
* Removed dulicate code
* Addressed comments
* Added unit test cases
* Added mistakenly removed comment
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Multiplying a DateOffset by a constant seems to lead to inconsistent behavior: the first test gives 30/6 while the second gives 28/6.
Expected Behavior
In both cases the result should be 2021-06-30.
Installed Versions
INSTALLED VERSIONS
commit : 66e3805
python : 3.10.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 126 Stepping 5, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Italian_Italy.1252
pandas : 1.3.5
numpy : 1.21.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.1.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.5.1
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None