ENH: add NDArrayBackedExtensionArray to public API #56755

andrewgsavage · 2024-01-06T21:11:34Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Reviving #45544 as I'd like to use it for an UncertaintyArray.

jbrockmendel · 2024-01-08T21:47:43Z

Can we make this "public" with some kind of documentation that we may adjust it in non-user-facing ways more aggressively than we do with regular APIs?

andrewgsavage · 2024-01-13T13:48:03Z

> Can we make this "public" with some kind of documentation that we may adjust it in non-user-facing ways more aggressively than we do with regular APIs?

Added a note to the class docstring.

andrewgsavage · 2024-01-13T17:27:28Z

I'm unsure what to do about this error as to_numpy is an inherited method:

Error: /home/runner/work/pandas/pandas/pandas/core/arrays/base.py:542:EX01:pandas.api.extensions.NDArrayBackedExtensionArray.to_numpy:No examples section found

Also not sure what to do about

Error: /home/runner/work/pandas/pandas/pandas/core/arrays/datetimes.py:179:EX03:pandas.arrays.DatetimeArray:flake8 error: line 2, col 4: E121 continuation line under-indented for hanging indent

as I haven't changed that file.

MichaelTiemannOSC · 2024-01-22T02:54:33Z

Given that the Pandas team was finalizing the 2.2 release, you may have hit the docs system at a bad time. I've had that happen to me. Might make sense to try merging with latest and resubmitting.

MichaelTiemannOSC · 2024-01-28T21:21:18Z

I tried jumping in to see if I could fix the missing toctree problem, but two things defeated me:

1. In the normal pandas build process (running doc/make.py --warnings-are-errors), a child process segfaulted on macOS 14.1.2 in a Python 3.11.4 environment.
2. When I tried running doc/make.py --warnings-are-errors --single reference/api/pandas.api.extensions.NDArrayBackedExtensionArray.argmax.rst (a command that failed with the toctree warning), that single file built successfully.

I'm trying to see if I can isolate a reproducer for (1) above.

jbrockmendel · 2024-01-30T22:59:20Z

pandas/core/arrays/_mixins.py

+
+        Examples
+        --------
+        >>> arr = pd.array([4, 5])


this won't give a NDArrayBackedEA

andrewgsavage · 2024-02-02

The docstrings for the other methods are the EA docstrings, so I've used an example similar to those here. Is there a better way of going about this? I can't see how to write an example using an NDArrayBackedEA without several lines initialising an ExtensionDtype and an NDArrayBackedEA.

jbrockmendel · 2024-01-30T23:02:57Z

doc/source/development/extending.rst

+        def min(self, *, axis: Optional[int] = None, skipna: bool = True, **kwargs):
+            pandas.compat.numpy.function.validate_min((), kwargs)
+            result = pandas.core.nanops.nanmin(
+                values=self._ndarray, axis=axis, mask=self.isna(), skipna=skipna


this is implicitly assuming that the ordering of self matches the ordering of self._ndarray

andrewgsavage · 2024-02-02

Is that an issue? If someone wants control over that they could use ExtensionArray instead of NDArrayBackedExtensionArray.
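A toy sketch of the concern raised above, in plain Python rather than pandas code. `CodedArray`, its `order` argument, and both `min_*` methods are hypothetical names made up for illustration; the list `_backing` merely stands in for `self._ndarray`. It assumes a backing store of integer codes whose numeric order differs from the logical order of the values, in which case reducing over the backing data picks the wrong element.

```python
# Toy illustration only: CodedArray is hypothetical and not a pandas class.
class CodedArray:
    """Stores labels as integer codes assigned in first-seen order."""

    def __init__(self, labels, order):
        self.order = list(order)   # logical ordering, smallest first
        self.categories = []       # code -> label, in first-seen order
        self._backing = []         # stands in for self._ndarray
        for label in labels:
            if label not in self.categories:
                self.categories.append(label)
            self._backing.append(self.categories.index(label))

    def min_via_backing(self):
        # Reduce over the backing codes, ignoring the logical ordering.
        return self.categories[min(self._backing)]

    def min_logical(self):
        # Reduce using the array's own notion of ordering.
        labels = (self.categories[code] for code in self._backing)
        return min(labels, key=self.order.index)


arr = CodedArray(["high", "low", "medium"], order=["low", "medium", "high"])
backing_min = arr.min_via_backing()  # "high": it happens to hold the smallest code
logical_min = arr.min_logical()      # "low": smallest in the logical order
```

When the two orderings agree (as they would for a plain numeric backing array), both methods return the same element, which is the assumption the reviewed `min` relies on.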

MichaelTiemannOSC · 2024-02-07

In a conversation with @jorisvandenbossche two ideas were clarified for me:

1. In the case of ordering, examples include IP addresses (https://cyberpandas.readthedocs.io/en/latest/usage.html#pandas-integration), which can be ordered in a number of ways.
2. On the topic of composing multiple Extension Arrays (e.g. PintArray and UncertaintyArray), we might look at how the composition of PintArray and Arrow Arrays (https://arrow.apache.org/docs/python/pandas.html) might work, and then argue either how we can follow those patterns or why new patterns/assumptions are needed.

If I've misunderstood either the problem or proposed next steps, happy to edit the above to correct.

MichaelTiemannOSC · 2024-03-27 (edited)

Regarding the ordering question, both NumPy and pandas select min and max (and also sort) complex numbers by their real component (NumPy breaks ties on the imaginary component), not by magnitude:

    import pandas as pd

    xx = pd.Series([4+3j, 3+10j, 2+4j])
    xx.min()
    # (2+4j)
    xx.max()
    # (4+3j)
    xx.map(lambda x: abs(x))
    # 0     5.000000
    # 1    10.440307
    # 2     4.472136
    xx.sort_values()
    # 2    2.0+ 4.0j
    # 1    3.0+10.0j
    # 0    4.0+ 3.0j
    # dtype: complex128

Thus it is up to us to decide what sorting behavior applies to uncertainties (I propose ordering based on magnitude only, not error terms, except when error is NaN, in which case the value is treated as NaN).
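A minimal sketch of the proposed rule, in plain Python and not tied to pandas or to any uncertainties library. `nominal_min` is a hypothetical helper, values are assumed to be `(value, error)` tuples of floats, and "magnitude" is read here as the nominal value, which is an assumption about what the proposal means.

```python
import math


# Hypothetical helper for illustration; not part of pandas.
def nominal_min(pairs, skipna=True):
    """Return the (value, error) pair with the smallest nominal value.

    A pair whose value or error is NaN is treated as missing.
    """
    def is_missing(pair):
        value, error = pair
        return math.isnan(value) or math.isnan(error)

    present = [pair for pair in pairs if not is_missing(pair)]
    has_missing = len(present) < len(pairs)
    if not present or (has_missing and not skipna):
        return (math.nan, math.nan)
    # Compare on the nominal value only; the error term is ignored.
    return min(present, key=lambda pair: pair[0])


data = [(3.0, 0.1), (2.0, 5.0), (1.0, math.nan)]
smallest = nominal_min(data)                # (2.0, 5.0): (1.0, nan) counts as missing
with_na = nominal_min(data, skipna=False)   # (nan, nan): a missing entry propagates
```

Under this reading, a large error term never changes which element is selected; only a NaN error does, by removing the element from consideration.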

When we look at the composition question (2, above), we will look at having the subclasses deal with this entirely (meaning we can implement an EA of units which might also have uncertain values). And of course update the documentation...

simonjayhawkins · 2024-02-04T15:28:35Z

cc @jreback @jorisvandenbossche from original PR

github-actions · 2024-03-09T00:05:19Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2024-10-29T20:25:05Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

tswast and others added 30 commits January 21, 2022 17:16

ENH: add NDArrayBackedExtensionArray to public API

1f93779

add whatsnew

522b548

Merge branch 'main' into python-db-dtypes-pandas-issue28

ee4e23d

add NDArrayBackedExtensionArray to pandas.core.arrays.__init__

945f840

add tests for extensions api

721ae11

add docs

ae68f9d

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pandas-issue28

05d0e08

Merge remote-tracking branch 'origin/python-db-dtypes-pandas-issue28' into python-db-dtypes-pandas-issue28

1ad0338

add autosummary for methods and attributes

38113c8

remove unreferenced methods from docs

18ec784

fix docstrings

2919f60

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pandas-issue28

0c52366

use doc decorator

319ac2b

add code samples and reference to test suite

8513863

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pandas-issue28

5309895

Merge branch 'main' into python-db-dtypes-pandas-issue28

827f483

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pandas-issue28

2cd9b31

add missing methods to extension docs

cc75eda

Merge remote-tracking branch 'origin/python-db-dtypes-pandas-issue28' into python-db-dtypes-pandas-issue28

ca323bb

Merge branch 'main' into python-db-dtypes-pandas-issue28

bfd31f0

Merge branch 'main' into python-db-dtypes-pandas-issue28

396da54

Merge branch 'main' into python-db-dtypes-pandas-issue28

27cf80e

Merge branch 'main' into python-db-dtypes-pandas-issue28

c716826

Merge branch 'main' into python-db-dtypes-pandas-issue28

f4df0e9

clarify _validate_searchsorted_value and 2d backing array

8876b9a

Merge branch 'main' into python-db-dtypes-pandas-issue28

1bdd1cd

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pandas-issue28

4b0a948

Merge branch 'python-db-dtypes-pandas-issue28' of github.com:tswast/pandas into python-db-dtypes-pandas-issue28

5920778

DOC: make insert docstring have single line summary

38018e6

Merge remote-tracking branch 'upstream/main' into python-db-dtypes-pandas-issue28

9277cf5

andrewgsavage added 2 commits January 7, 2024 10:19

docstrings, extending docs

01191d1

whatsnew

ce4eeef

mroeschke requested a review from jbrockmendel January 8, 2024 21:18

andrewgsavage added 4 commits January 13, 2024 13:17

docstring

7019bc7

aggressive docstring

5f99f57

lint

47f8917

lint

4f8c055

andrewgsavage added 5 commits January 13, 2024 14:45

docstrings

1aaaa9a

to_numpy example

8a66d3a

to_numpy example

a8fe040

remove ndarray example

f2cbd4b

value_counts example

552f7a3

andrewgsavage mentioned this pull request Jan 21, 2024

ENH: add NDArrayBackedExtensionArray to public API #45544

Closed

4 tasks

jbrockmendel reviewed Jan 24, 2024

doc/source/development/extending.rst (outdated)

use base.extensiontests in exmaple

6ae423d

jbrockmendel reviewed Jan 30, 2024

simonjayhawkins added the API Design and ExtensionArray (Extending pandas with custom dtypes or arrays.) labels Feb 4, 2024

github-actions bot added the Stale label Mar 9, 2024

mroeschke closed this Oct 29, 2024

tswast mentioned this pull request Apr 18, 2025

chore: address issues identified in code review on pandas docs PR googleapis/python-db-dtypes-pandas#175

Open

jbrockmendel Jan 30, 2024

andrewgsavage Feb 2, 2024

jbrockmendel Jan 30, 2024

andrewgsavage Feb 2, 2024

MichaelTiemannOSC Feb 7, 2024

MichaelTiemannOSC Mar 27, 2024 (edited)
