
BUG: In master, math with periodindex + index of offset produces wrong result #50162


Open
3 tasks done
Dr-Irv opened this issue Dec 9, 2022 · 4 comments
Labels
Bug Index Related to the Index class or subclasses Numeric Operations Arithmetic, Comparison, and Logical operations Period Period data type

Comments

@Dr-Irv
Contributor

Dr-Irv commented Dec 9, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
p = pd.Period("2012-1-1", freq="D")
as_period_index = pd.period_range("2012-1-1", periods=10, freq="D")
offset_index = p - as_period_index
print(offset_index)
result = p + offset_index
print(result)

Issue Description

In pandas 1.5.2, here is the result:

Index([ <0 * Days>,  <-1 * Day>, <-2 * Days>, <-3 * Days>, <-4 * Days>,
       <-5 * Days>, <-6 * Days>, <-7 * Days>, <-8 * Days>, <-9 * Days>],
      dtype='object')
PeriodIndex(['2012-01-01', '2011-12-31', '2011-12-30', '2011-12-29',
             '2011-12-28', '2011-12-27', '2011-12-26', '2011-12-25',
             '2011-12-24', '2011-12-23'],
            dtype='period[D]')

In master (version 2.0.0.dev0+881.gcab77e3d6c), here is the result:

Index([ <0 * Days>,  <-1 * Day>, <-2 * Days>, <-3 * Days>, <-4 * Days>,
       <-5 * Days>, <-6 * Days>, <-7 * Days>, <-8 * Days>, <-9 * Days>],
      dtype='object')
Index([2012-01-01, 2011-12-31, 2011-12-30, 2011-12-29, 2011-12-28, 2011-12-27,
       2011-12-26, 2011-12-25, 2011-12-24, 2011-12-23],
      dtype='object')

So the result used to be a PeriodIndex; now it is an object-dtype Index. This was picked up in our tests for pandas-stubs.

Expected Behavior

Adding a Period to an Index of offsets should return a PeriodIndex, as it did in 1.5.2.

Installed Versions

INSTALLED VERSIONS

commit : cab77e3
python : 3.8.15.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.0.0.dev0+881.gcab77e3d6c
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.4.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.60.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.7.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.12.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.9
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.44
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.12.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None

@Dr-Irv Dr-Irv added Bug Period Period data type Index Related to the Index class or subclasses Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 9, 2022
@ramvikrams
Contributor

If we change pandas/_libs/_tslibs/period.pyi and add def __add__(self, other: Index) -> PeriodIndex: ..., the result does not change. If not there, where should the change be done?

@Dr-Irv
Contributor Author

Dr-Irv commented Dec 13, 2022

> If we change pandas/_libs/_tslibs/period.pyi and add def __add__(self, other: Index) -> PeriodIndex: ..., the result does not change. If not there, where should the change be done?

The error is in the .pyx or .py files, not the .pyi files. The .pyi stubs only carry type hints and have no effect on runtime behavior.

@jbrockmendel
Member

This is probably the result of #49999, specifically moving away from doing type inference on object-dtype results. Use result.infer_objects(copy=False) to get the old behavior.

(The reversed op, offset_index + p, raises a TypeError that looks very wrong to me.)
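For readers who need the old PeriodIndex back, here is a minimal sketch of an alternative to the infer_objects call. It assumes that passing a list of Period objects to pd.Index re-runs constructor inference and yields a PeriodIndex. restore_period_index is a hypothetical helper written for this illustration, not a pandas API, and it leaves any Index that is not all-Period object dtype untouched.

```python
import pandas as pd


def restore_period_index(idx: pd.Index) -> pd.Index:
    """Hypothetical helper (not a pandas API).

    If ``idx`` has object dtype and every element is a ``pd.Period``,
    rebuild it with ``pd.Index`` so the constructor infers a PeriodIndex.
    Any other Index (including one that is already a PeriodIndex) is
    returned unchanged.
    """
    if idx.dtype == object and all(isinstance(x, pd.Period) for x in idx):
        return pd.Index(list(idx))
    return idx


p = pd.Period("2012-1-1", freq="D")
as_period_index = pd.period_range("2012-1-1", periods=10, freq="D")
offset_index = p - as_period_index  # object-dtype Index of offsets

# Whether pandas returns an object Index or a PeriodIndex here,
# the helper normalizes the result to a PeriodIndex.
result = restore_period_index(p + offset_index)
print(type(result).__name__, result.dtype)
```

Because the helper only rebuilds when every element is a Period, it is a no-op on versions where the addition already returns a PeriodIndex.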

@Dr-Irv
Contributor Author

Dr-Irv commented Dec 14, 2022

I find this change inconsistent. Aside from the TypeError, which should be investigated, consider this example:

import pandas as pd
p = pd.Period("2012-1-1", freq="D")
as_period_index = pd.period_range("2012-1-1", periods=10, freq="D")
offset_index = p - as_period_index
print(offset_index)
result = p + offset_index
print(result)
iresult = pd.Index([p] * 10) + offset_index
print(iresult)

This produces output:

Index([ <0 * Days>,  <-1 * Day>, <-2 * Days>, <-3 * Days>, <-4 * Days>,
       <-5 * Days>, <-6 * Days>, <-7 * Days>, <-8 * Days>, <-9 * Days>],
      dtype='object')
Index([2012-01-01, 2011-12-31, 2011-12-30, 2011-12-29, 2011-12-28, 2011-12-27,
       2011-12-26, 2011-12-25, 2011-12-24, 2011-12-23],
      dtype='object')
PeriodIndex(['2012-01-01', '2011-12-31', '2011-12-30', '2011-12-29',
             '2011-12-28', '2011-12-27', '2011-12-26', '2011-12-25',
             '2011-12-24', '2011-12-23'],
            dtype='period[D]')

In the first addition, you add a Period scalar to an Index of offsets (which has object dtype), and you get an object-dtype Index.
In the second addition, you add a PeriodIndex to the same object-dtype Index of offsets, and you get a PeriodIndex.
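A minimal sketch that runs the two additions side by side and prints the type and dtype of each result. It makes no assumption about which pandas version is installed: the printed types are whatever that version returns, but the element values of the two results should be the same Periods, which is what makes the differing container types surprising.

```python
import pandas as pd

p = pd.Period("2012-1-1", freq="D")
offset_index = p - pd.period_range("2012-1-1", periods=10, freq="D")

# Left operand is a Period scalar.
scalar_result = p + offset_index

# Left operand is a PeriodIndex built from the same scalar
# (pd.Index infers a PeriodIndex from a list of Periods).
period_index = pd.Index([p] * 10)
index_result = period_index + offset_index

for label, res in [("Period + Index", scalar_result),
                   ("PeriodIndex + Index", index_result)]:
    print(f"{label}: {type(res).__name__}, dtype={res.dtype}")
```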

That PR (#49999) documents the change as:

Changed behavior of :class:`Index`, :class:`Series`, and :class:`DataFrame` arithmetic methods when working with object-dtypes, the results no longer do type inference on the result of the array operation

That's not really correct, because in the second addition you add one Index with a Period dtype to a second Index with object dtype, and the result keeps the Period dtype (as I would expect). So it seems that when the Period operand is a scalar rather than a typed Index, no inference is done. This could get really confusing.

Why did you make that change?

@mroeschke mroeschke added Numeric Operations Arithmetic, Comparison, and Logical operations and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 16, 2024