REF: move sharable methods to ExtensionIndex #30717

Merged
jreback merged 17 commits into pandas-dev:master on Jan 9, 2020

Conversation

jbrockmendel
Member

No description provided.

@WillAyd WillAyd added the Clean label Jan 6, 2020
Member

@WillAyd WillAyd left a comment

lgtm

@TomAugspurger
Contributor

Is the goal to provide an index class that can work with arbitrary EAs? If so, I think that ExtensionIndex should only be using methods and attributes defined on ExtensionArray. In particular, I don't think that _ndarray_values is part of the interface.

@jreback jreback added this to the 1.0 milestone Jan 6, 2020
@jbrockmendel
Member Author

> I don't think that _ndarray_values is part of the interface.

I guess this is ambiguous. The base EA class does have _ndarray_values, but its docstring says "This method is not part of the pandas interface".

@TomAugspurger
Contributor

Mmm, that's a bit tricky then. Perhaps this will help us pin down some of the required semantics on _ndarray_values.

Longer-term, what are your plans for ExtensionIndex? Will it be a base class that 3rd parties can inherit from and customize?

@jbrockmendel
Member Author

> Longer-term, what are your plans for ExtensionIndex? Will it be a base class that 3rd parties can inherit from and customize?

That's an option. Ideally I'd like to keep the customization inside the EAs. ATM I'm working on smoothing out the small differences between e.g. DatetimeIndex.searchsorted vs DatetimeArray.searchsorted so we can both delegate more and improve internal consistency.
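
For readers following along, the delegation idea described above can be sketched roughly as follows. This is a toy illustration, not the code in this PR: `ToyArray`, `ToyIndex`, and the `delegate_to_data` decorator are all hypothetical names, and the standard-library `bisect` module stands in for the real array's search logic.

```python
import bisect


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray holding sorted values."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def searchsorted(self, value, side="left"):
        # Binary search for the insertion point of `value`.
        if side == "left":
            return bisect.bisect_left(self._values, value)
        if side == "right":
            return bisect.bisect_right(self._values, value)
        raise ValueError("side must be 'left' or 'right'")


def _make_delegate(name):
    # Build a method that forwards the call to the same-named method on self._data.
    def method(self, *args, **kwargs):
        return getattr(self._data, name)(*args, **kwargs)

    method.__name__ = name
    return method


def delegate_to_data(*names):
    """Hypothetical class decorator: forward the named methods to self._data."""

    def decorator(cls):
        for name in names:
            setattr(cls, name, _make_delegate(name))
        return cls

    return decorator


@delegate_to_data("searchsorted", "__len__")
class ToyIndex:
    """Hypothetical index that keeps all customization inside the array."""

    def __init__(self, data, name=None):
        self._data = data
        self.name = name


idx = ToyIndex(ToyArray([1, 3, 3, 7]), name="toy")
left = idx.searchsorted(3)
right = idx.searchsorted(3, side="right")
```

Because the index adds no logic of its own, any behavioral difference between the index method and the array method has to be resolved in the array, which is the consistency benefit mentioned above.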

@TomAugspurger
Contributor

Sounds good. Happy to continue finding common methods and develop an interface out of that.

Can you check the ASVs for our EA-backed indexes on this branch?

@jbrockmendel
Member Author

It looks like IntervalIndexing.time_getitem_list and time_loc_list are significantly slower; I will look into this.

@jbrockmendel
Member Author

Looks like the slowdown was caused by using IntervalIndex.__new__ instead of IntervalIndex._shallow_copy. ATM there is a problem with DTA/TDA _shallow_copy that prevents us from using that in the base class, but I'll be pushing a fix for that before long.

@jreback
Contributor

jreback commented Jan 9, 2020

Can you rebase?

@jreback jreback merged commit 09bd172 into pandas-dev:master Jan 9, 2020
@jreback
Contributor

jreback commented Jan 9, 2020

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the ref-ei branch January 9, 2020 16:17

if self.hasnans:
    return self._shallow_copy(self._data[~self._isnan])
return self._shallow_copy()
Member

Why are you overwriting the base Index one?

Also, this dropped the docstring.

Member Author

Because the base class uses ._values, where we want ._data here

Member

But _values and _data are the same?

Member Author

I guess. Past-me must have thought it not-obvious that this would always hold. If it can be removed, go for it.

Member

Did you also see my docstring comment?

Member Author

I did. In this case I think removing the method makes sense. More generally I wonder if we can use a metaclass or something to automatically inherit docstrings and remove a lot of boilerplate (cc @bashtage IIRC you do something like this in arch)

Member

Maybe open a new issue to see if we can do this smarter?

But for 1.0.0, I would just add back the docstring
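
As a rough illustration of the metaclass idea floated in this thread, the sketch below copies a parent method's docstring onto an override that has none. It is an assumption about how such a helper could look, not something pandas or arch is confirmed to do; `InheritDocstrings`, `Base`, and `Child` are hypothetical names.

```python
import types


class InheritDocstrings(type):
    """Hypothetical metaclass: fill in missing method docstrings from base classes."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        for attr_name, attr in namespace.items():
            # Only plain functions are handled here; properties, classmethods
            # and staticmethods are skipped to keep the sketch short.
            if not isinstance(attr, types.FunctionType) or attr.__doc__:
                continue
            # Walk the MRO (skipping cls itself) for the first documented version.
            for base in cls.__mro__[1:]:
                parent = getattr(base, attr_name, None)
                doc = getattr(parent, "__doc__", None)
                if doc:
                    attr.__doc__ = doc
                    break
        return cls


class Base(metaclass=InheritDocstrings):
    def dropna(self):
        """Return a copy without missing values."""
        return self


class Child(Base):
    def dropna(self):  # override deliberately left without a docstring
        return self


inherited_doc = Child.dropna.__doc__
```

Note that `inspect.getdoc` already falls back to the parent's docstring for undocumented overrides, while the `__doc__` attribute itself does not, so a helper like this only matters for tooling that reads `__doc__` directly.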

def __getitem__(self, key):
    result = self._data[key]
    if isinstance(result, type(self._data)):
        return type(self)(result, name=self.name)
Member

Should we use a faster constructor (simple_new ?) when we just want to wrap the correct type of ExtensionArray in the index?

Member Author

I think that'd work. IIRC there were some corner cases involving CategoricalIndex.dtype, not sure if those are relevant here
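
To make the suggestion in this thread concrete, here is a minimal sketch of a `__getitem__` that wraps an already-correct array through a fast constructor instead of the validating one. Everything here is a toy under stated assumptions: `ToyArray` and `ToyIndex` are hypothetical, the `_simple_new` below only borrows the name from the comment above and is not pandas' implementation, and returning scalars unchanged is an assumption because the reviewed snippet is cut off before that branch.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, key):
        result = self._values[key]
        # Slices give back an array of the same type; integers give a scalar.
        if isinstance(key, slice):
            return type(self)(result)
        return result


class ToyIndex:
    validations = 0  # counts how often the validating constructor ran

    def __init__(self, data, name=None):
        # Slow path: validate (and possibly coerce) the input.
        type(self).validations += 1
        if not isinstance(data, ToyArray):
            data = ToyArray(data)
        self._data = data
        self.name = name

    @classmethod
    def _simple_new(cls, data, name=None):
        # Fast path: the caller guarantees `data` is already a ToyArray,
        # so no validation or coercion is performed.
        obj = object.__new__(cls)
        obj._data = data
        obj.name = name
        return obj

    def __getitem__(self, key):
        result = self._data[key]
        if isinstance(result, type(self._data)):
            return type(self)._simple_new(result, name=self.name)
        return result  # assumed: scalars are returned unchanged


idx = ToyIndex([10, 20, 30], name="x")
sub = idx[1:]
first = idx[0]
```

The counter shows the trade-off: slicing goes through the fast path, so the validating constructor runs only once, when `idx` is first built. Whether this is safe for every subclass (for example, the CategoricalIndex.dtype corner cases mentioned above) is not something this toy can answer.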
