Subclassing pandas.Index #15258

Closed
shoyer opened this issue Jan 30, 2017 · 3 comments · Fixed by #52186
Labels
Enhancement; Subclassing (Subclassing pandas objects); Testing (pandas testing functions or related to the test suite)

Comments

@shoyer
Member

shoyer commented Jan 30, 2017

@spencerkclark is working on a custom pandas.Index subclass for xarray (see pydata/xarray#1084) like pandas.DatetimeIndex to handle arrays of netcdftime.datetime objects. This index is primarily intended for use with xarray, but ideally we'd like it to work in pandas Series and DataFrame objects, too.

The subclass will include implementations of at least get_loc, get_slice_bound and get_value (this one should probably be unnecessary, but it's needed for pandas.Series). To minimize fragility, it will not subclass DatetimeIndex but will instead copy some of the relevant code (thank you open source!).
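As a rough illustration of what get_loc and get_slice_bound are expected to do, here is a minimal pure-Python sketch. It is not pandas code and not the xarray implementation: SortedLabelIndex and slice_labels are hypothetical names, labels are plain (year, month, day) tuples, and the index is assumed to be sorted and unique. Only the method names and the "left"/"right" side argument mirror the pandas Index API.

```python
from bisect import bisect_left, bisect_right


class SortedLabelIndex:
    """Hypothetical stand-in for a sorted, unique index (not a pandas class)."""

    def __init__(self, labels):
        self._labels = sorted(labels)

    def get_loc(self, label):
        # Integer position of an exact label; raise KeyError when it is absent.
        pos = bisect_left(self._labels, label)
        if pos == len(self._labels) or self._labels[pos] != label:
            raise KeyError(label)
        return pos

    def get_slice_bound(self, label, side):
        # Position where a label-based slice starts ("left") or stops ("right").
        if side == "left":
            return bisect_left(self._labels, label)
        if side == "right":
            return bisect_right(self._labels, label)
        raise ValueError("side must be 'left' or 'right'")

    def slice_labels(self, start, stop):
        # Label-based slicing that includes both endpoints.
        lo = self.get_slice_bound(start, "left")
        hi = self.get_slice_bound(stop, "right")
        return self._labels[lo:hi]


# (2000, 2, 30) is a valid date in a 360-day calendar.
idx = SortedLabelIndex([(2000, 1, 30), (2000, 2, 29), (2000, 2, 30), (2000, 3, 1)])
loc = idx.get_loc((2000, 2, 30))
february = idx.slice_labels((2000, 2, 1), (2000, 2, 30))
```

A real Index subclass would also need the construction, inference and equality machinery that this sketch leaves out.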

Two questions for other pandas devs:

  • Is there any fundamental reason why a custom pandas.Index subclass won't work on a Series or DataFrame?
  • Does this seem like a reasonable thing to do, or are we setting ourselves up for suffering in the future? I'll update this issue when we have a concrete PR to look at.

At a bare minimum, we should probably add some tests to pandas to ensure that a basic subclass works.

@jreback
Contributor

jreback commented Jan 30, 2017

> At a bare minimum, we should probably add some tests to pandas to ensure that a basic subclass works.

There are already quite a few Index subclasses (internally), but the API is not publicly exposed. I think it would take a bit of work to make it 'simpler'.

You have to define a fair bit of machinery (lots of methods) to make it work properly, including construction, inference, equality, testing, and various indexing routines.

It is thus straightforward, but not trivial to sub-class. (remember IntervalIndex!)

> Is there any fundamental reason why a custom pandas.Index subclass won't work on a Series or DataFrame?

It will work, though there may be some API leakage (in other words, some methods are 'internal', others are 'public' from the main pandas API).

> Does this seem like a reasonable thing to do, or are we setting ourselves up for suffering in the future? I'll update this issue when we have a concrete PR to look at.

Why do you think you need a custom Index?

@shoyer
Member Author

shoyer commented Jan 30, 2017

> It is thus straightforward, but not trivial to sub-class. (remember IntervalIndex!)

Indeed, I do :)

> why do you think you need a custom Index?

The climate science community wants the convenient indexing of DatetimeIndex, but datetime64[ns] will not suffice for them. Not only do they need to handle dates outside the range representable with ns resolution (before September 1677 or after April 2262), but they also use all sorts of funny calendar conventions, e.g., pretending leap years never exist, or that every month has exactly 30 days.
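Both limitations can be checked with the standard library alone. The sketch below assumes that datetime64[ns] stores a signed 64-bit count of nanoseconds since 1970-01-01 and derives the approximate representable range from that; it then shows that a proleptic Gregorian date type rejects February 30, which a 360-day calendar treats as a valid date. The helper is_valid_gregorian is a hypothetical name used only for illustration.

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1  # largest nanosecond offset a signed 64-bit integer can hold

# timedelta has microsecond resolution, so drop the sub-microsecond part.
span = timedelta(microseconds=INT64_MAX_NS // 1000)
earliest = EPOCH - span
latest = EPOCH + span
print(earliest.year, latest.year)


def is_valid_gregorian(year, month, day):
    """Return True if the date exists in the proleptic Gregorian calendar."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


# February 30 exists in a 360-day calendar but not in the Gregorian one.
feb30_ok = is_valid_gregorian(2000, 2, 30)
# February 29 exists here, but not in a calendar that never has leap years.
feb29_ok = is_valid_gregorian(2000, 2, 29)
```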

@jreback
Contributor

jreback commented Jan 30, 2017

@shoyer OK, in that case I would directly subclass DatetimeIndex or PeriodIndex. If you do this you get all kinds of things for free (e.g. resample, NaT, accessors, partial string indexing, etc.), so it would be much simpler.

As I said above, you might have some API leakage (IOW, we have a notion of pandas functions calling Index methods which are not 'public' per se). But nothing insurmountable.

So it comes down to what you need: points in time, or spans.

4 participants