Skip to content

REG: quantile with IntegerArray/FloatingArray #41428

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 31, 2021

Conversation

jbrockmendel
Copy link
Member

Not present in any released version, so no release note needed as long as before 1.3

return _quantile_ea_compat(values, qs, interpolation)
try:
out = _quantile_ea_compat(values, qs, interpolation)
except AbstractMethodError:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche catching AbstractMethodError to determine if we need to use the fallback implementation isn't my favorite, but all the other options i can think of involve adding methods/attributes to EA. thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly is throwing this? can you test something about the EA itself?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i c, you need to test if _from_factorized is implemented. then can you do that explicty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i c, you need to test if _from_factorized is implemented. then can you do that explicty?

i think to make that work we'd need to put quantile as an EA method (which id be OK with)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mentioned this before, but I would personally simply special case our own internal EAs. And then we have two cases: the ones backed by an ndarray (NDArrayBackedExtensionArray where you can access _ndarray), and the masked array ones (BaseMaskedArray where you can access _data and _mask).

(and thus not use _from_factorize at all here, as it was not meant for this use case)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then we have two cases

what about Interval (currently xfailed) and 3rd party?

but I would personally simply special case our own internal EAs

any objection to making a EA.quantile method with a reasonable default implementation? That lets us implement optimized NDArrayBackedExtensionArray and BaseMaskedArray implementations without special-casing internal EAs, which we usually try to avoid

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can certainly discuss EA.quantile, but some of my concerns: it's not a general method (only relevant for some types, other reductions are also not as methods but through EA_reduce, and there is also no ndarray method equivalent as reason to add it.

(so therefore I think the easier short-term action is to special case it here. And it's not that this code will be lost, it's just a different location, because if we add the method, the code here can be split and moved to the different method overrides)

So personally I am not that fond on adding EA.quantile in itself, but it's certainly true that we still need to devise an interface that can be used for 3rd party EAs as well.

what about Interval (currently xfailed)

What would be the result for Interval? That's not generally a numeric dtype (we don't support other reductions like mean or median for interval)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but some of my concerns: it's not a general method (only relevant for some types, other reductions are also not as methods but through EA_reduce, and there is also no ndarray method equivalent as reason to add it.

Would you be happier with this if it were EA._quantile?

What would be the result for Interval? That's not generally a numeric dtype (we don't support other reductions like mean or median for interval)

quantile (and median) only depend on ordering, which Interval does implement. So in principle nothing stops us from supporting it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so i guess let's do what joris is suggesting and then create an issue about adding EA.quantile (which i am +1)

return _quantile_ea_compat(values, qs, interpolation)
try:
out = _quantile_ea_compat(values, qs, interpolation)
except AbstractMethodError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly is throwing this? can you test something about the EA itself?

return _quantile_ea_compat(values, qs, interpolation)
try:
out = _quantile_ea_compat(values, qs, interpolation)
except AbstractMethodError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i c, you need to test if _from_factorized is implemented. then can you do that explicty?

@jbrockmendel jbrockmendel added quantile quantile method NA - MaskedArrays Related to pd.NA and nullable extension arrays labels May 14, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone May 25, 2021
@jreback jreback removed this from the 1.3 milestone May 31, 2021
@jreback
Copy link
Contributor

jreback commented May 31, 2021

let's move this off the milestone for now

@jbrockmendel
Copy link
Member Author

updated as suggested; since its a regression we should try to get this in for the RC

@jreback jreback added this to the 1.3 milestone May 31, 2021
@jreback
Copy link
Contributor

jreback commented May 31, 2021

lgtm. needs a whatsnew note?

@jbrockmendel
Copy link
Member Author

needs a whatsnew note?

no, the regression didnt make it into any released version

@jreback jreback merged commit 9004414 into pandas-dev:master May 31, 2021
@jreback
Copy link
Contributor

jreback commented May 31, 2021

cool, do we have an issue for discussion about this (eg. the api enhancement / consolidation of the ea_compat quantile methods) ? if not can you create one

@jbrockmendel jbrockmendel deleted the bug-masked-quantile branch May 31, 2021 18:57
TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NA - MaskedArrays Related to pd.NA and nullable extension arrays quantile quantile method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: fix quantile for nullable integer/float
4 participants