-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: add NDArrayBackedExtensionArray to public API #56755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… into python-db-dtypes-pandas-issue28
… into python-db-dtypes-pandas-issue28
…andas into python-db-dtypes-pandas-issue28
Can we make this "public" with some kind of documentation that we may adjust it in non-user-facing ways more aggressively than we do with regular APIs? |
added a note to the class docstring |
I'm unsure what to do about this error as
Also not sure what to do about
as I haven't changed that file. |
Given that the Pandas team was finalizing the 2.2 release, you may have hit the docs system at a bad time. I've had that happen to me. Might make sense to try merging with latest and resubmitting. |
I tried jumping in to see if I could fix the missing toctree problem, but two things defeated me:
I'm trying to see if I can isolate a reproducer for (1) above. |
|
||
Examples | ||
-------- | ||
>>> arr = pd.array([4, 5]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't give a NDArrayBackedEA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstrings for the other methods are the EA docstrings, so I've used an example similar to those here. Is there a better way of going about this? I can't see how to write an example using a NDArrayBackedEA without several lines initialising an ExtensionDtype and NDArrayBackedEA
def min(self, *, axis: Optional[int] = None, skipna: bool = True, **kwargs): | ||
pandas.compat.numpy.function.validate_minnumpy_validate_min((), kwargs) | ||
result = pandas.core.nanops.nanmin( | ||
values=self._ndarray, axis=axis, mask=self.isna(), skipna=skipna |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is implicitly assuming that the ordering of self matches the ordering of self._ndarray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that an issue? If someone wants control over that they could use ExtensionArray instead of NDBackedExtensionArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a conversation with @jorisvandenbossche two ideas were clarified for me:
- In the case of ordering, examples include IP addresses (https://cyberpandas.readthedocs.io/en/latest/usage.html#pandas-integration), which can be ordered in a number of ways.
- On the topic of composing multiple Extension Arrays (e.g. PintArray and UncertaintyArray), we might look at how the composition of PintArray and Arrow Arrays (https://arrow.apache.org/docs/python/pandas.html) might work, and then argue either how we can follow those patterns or why new patterns/assumptions are needed.
If I've misunderstood either the problem or proposed next steps, happy to edit the above to correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the ordering question, both NumPy and Pandas select min and max (and also sort) based only on the real component of complex numbers:
import pandas as pd
xx = pd.Series([4+3j, 3+10j, 2+4j])
xx.min()
# (2+4j)
xx.max()
# (4+3j)
xx.map(lambda x: abs(x))
# 0 5.000000
# 1 10.440307
# 2 4.472136
xx.sort_values()
# 2 2.0+ 4.0j
# 1 3.0+10.0j
# 0 4.0+ 3.0j
# dtype: complex128
Thus it is up to us to decide what sorting behavior applies to uncertainties (I propose ordering based on magnitude only, not error terms, except when error is NaN, in which case the value is treated as NaN).
When we look at the composition question (2, above), we will look at having the subclasses deal with this entirely (meaning we can implement an EA of units which might also have uncertain values). And of course update the documentation...
cc @jreback @jorisvandenbossche from original PR |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Reviving #45544 as I'd like to use it for an UncertaintyArray.