
BUG: pandas.Index cannot be subclassed (despite saying it can be in the docs) #37882


Closed
2 of 3 tasks
acturner opened this issue Nov 16, 2020 · 6 comments · Fixed by #52186
Labels
Docs, Index, Subclassing

Comments

@acturner

acturner commented Nov 16, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas


class CustomIndex(pandas.Index):

    def __new__(cls, data, **kwargs):
        return super().__new__(cls, data, **kwargs).view(cls)

    def __init__(self, data, **kwargs):
        super().__init__(data, **kwargs)  # super() already binds self; don't pass it again


if __name__ == '__main__':
    index = CustomIndex([1,2,3])
    print(type(index), index)

The result is: <class 'pandas.core.indexes.numeric.Int64Index'> Int64Index([1, 2, 3], dtype='int64')
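Why the subclass disappears can be reproduced without pandas. The toy model below is a sketch, not pandas' code: the class names Base, IntBase, ObjectBase and Custom are hypothetical, and it assumes a factory-style base __new__ that picks a concrete subclass from the data, the way the base Index constructor picks Int64Index here. Because Python only calls __init__ when __new__ returns an instance of the class being constructed, the subclass's __init__ and its extra methods are never reached.

```python
class Base:
    """Hypothetical stand-in for a factory-style base class such as pandas.Index."""

    def __new__(cls, data):
        # Infer a concrete subclass from the data and ignore the `cls`
        # that was actually requested (e.g. Custom below).
        if all(isinstance(x, int) for x in data):
            target = IntBase
        else:
            target = ObjectBase
        obj = object.__new__(target)
        obj.data = list(data)
        return obj


class IntBase(Base):
    pass


class ObjectBase(Base):
    pass


class Custom(Base):
    def __init__(self, data):
        # Never runs: the object returned by Base.__new__ is not a Custom,
        # so Python skips Custom.__init__ entirely.
        self.initialized = True

    def describe(self):
        return "custom"


obj = Custom([1, 2, 3])
print(type(obj).__name__)           # IntBase
print(hasattr(obj, "initialized"))  # False
print(hasattr(obj, "describe"))     # False
```

The same thing explains why overriding __repr__ or take on the subclass has no visible effect: the returned object is simply not an instance of that subclass.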

Problem description

In the pandas docs: "The motivation for having an Index class in the first place was to enable different implementations of indexing. This means that it’s possible for you, the user, to implement a custom Index subclass that may be better suited to a particular application than the ones provided in pandas."

Expected Output

<class 'CustomIndex'> CustomIndex([1, 2, 3], dtype='int64')

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python           : 3.8.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.1.0
Version          : Darwin Kernel Version 20.1.0: Sat Oct 31 00:07:11 PDT 2020; root:xnu-7195.50.7~2/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.4
numpy            : 1.19.4
pytz             : 2020.4
dateutil         : 2.8.1
pip              : 20.2.4
setuptools       : 50.3.2
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.19.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 2.0.0
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.4
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None
acturner added the Bug and Needs Triage labels on Nov 16, 2020
@ivanovmg
Member

I confirm the issue on master.

Even overriding __repr__ and other methods does not make a difference.

from pandas.core.indexes.base import Index


class CustomIndex(Index):
    def __repr__(self):
        return "CustomIndex"

    def take(self, indices):
        return 777


if __name__ == '__main__':
    index = CustomIndex([1,2,3])
    print(index)
    print(type(index))
    print(index.take([2]))

Output:

Int64Index([1, 2, 3], dtype='int64')
<class 'pandas.core.indexes.numeric.Int64Index'>
Int64Index([3], dtype='int64')

@jreback
Contributor

jreback commented Nov 16, 2020

Index is not trivially subclassable and needs a fair number of hooks.

You're welcome to put together a fully working example, but you would have to dive into the current implementation.

Not sure subclassing makes a whole lot of sense in general.

@acturner
Author

@jreback Would love an example of how to do this if possible!

@jreback
Contributor

jreback commented Nov 16, 2020

Well, someone would need to write the docs after understanding how it could work, and of course put testing in place for doing this arbitrarily.

So pull requests are welcome.

@jorisvandenbossche
Member

The xarray project actually has an Index subclass (see https://github.com/pydata/xarray/blob/23dc2fc9f2785c348ff821bf2da61dfa2206d283/xarray/coding/cftimeindex.py), which I think is actively being used

Now, the main issue shown by the small examples above is that you can't subclass Index without overriding the __new__ method. That's because the base Index constructor infers which Index subclass to use based on the passed data, so if you want your subclass type to be returned, you need to implement __new__ yourself.
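A minimal sketch of what overriding __new__ could look like, assuming pandas 2.x internals (this is not an official pandas recipe). The regular Index constructor still does the data inference, and the resulting values are then re-wrapped in the subclass with Index._simple_new instead of returning whatever class Index.__new__ picked. Both _simple_new and the _values attribute are private pandas internals whose behavior can change between versions; treat them as assumptions.

```python
import pandas as pd


class CustomIndex(pd.Index):
    """Minimal Index subclass that keeps its own type (sketch only)."""

    def __new__(cls, data=None, dtype=None, copy=False, name=None):
        # Let the regular Index constructor infer and validate the data...
        base = pd.Index(data, dtype=dtype, copy=copy, name=name)
        # ...then wrap its underlying values in *our* class. _simple_new is a
        # private classmethod; assumption: it accepts (values, name=...) in
        # your pandas version.
        return cls._simple_new(base._values, name=base.name)


idx = CustomIndex([1, 2, 3], name="x")
print(type(idx).__name__)  # CustomIndex
print(idx)
```

Whether other operations (take, slicing, set operations) return a CustomIndex or a plain Index depends on how each method rebuilds its result in your pandas version, so check the ones you rely on. The xarray CFTimeIndex linked above is a fuller real-world subclass.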

@acturner
Author

@jorisvandenbossche Thanks this makes sense!

jorisvandenbossche added the Subclassing label on Dec 7, 2020
mroeschke added the Docs and Index labels and removed the Bug and Needs Triage labels on Aug 14, 2021