API: should to_numpy() by default return the corresponding type, and raise otherwise? #48891

MarcoGorelli · 2022-09-30T16:37:30Z

Currently, to_numpy() from a nullable type always returns dtype object, regardless of whether the Series contains missing values.
E.g.

In [3]: pd.Series([1,2,3], dtype='Int64').to_numpy()
Out[3]: array([1, 2, 3], dtype=object)

This isn't great for users/downstream libraries. E.g. matplotlib/matplotlib#23991 (comment)

Users can specify a dtype in to_numpy(), but if they want to mirror the current behaviour of pandas whereby one has:

In [4]: pd.Series([1,2,3], dtype='int64').to_numpy()
Out[4]: array([1, 2, 3])

In [5]: pd.Series([1,2,3], dtype='float64').to_numpy()
Out[5]: array([1., 2., 3.])

then they would need to add extra logic to their code.
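As a rough illustration of the kind of extra logic meant here, the sketch below wraps to_numpy() in a helper. The name to_numpy_matching is hypothetical and not part of pandas. It also assumes that nullable dtypes expose a numpy_dtype attribute, which holds for the masked Int64/Float64/boolean dtypes but is not guaranteed for every extension dtype.

```python
import numpy as np
import pandas as pd


def to_numpy_matching(ser: pd.Series) -> np.ndarray:
    """Hypothetical helper: return the plain NumPy dtype when that is safe."""
    # Assumption: masked (nullable) dtypes such as Int64/Float64 expose
    # `numpy_dtype`; plain NumPy dtypes do not, so they pass through unchanged.
    target = getattr(ser.dtype, "numpy_dtype", None)
    if target is None:
        return ser.to_numpy()
    if ser.isna().any():
        if target.kind == "f":
            # Floats can represent missing values as NaN.
            return ser.to_numpy(dtype=target, na_value=np.nan)
        raise ValueError(
            f"cannot convert to {target} without an explicit na_value"
        )
    return ser.to_numpy(dtype=target)


ints = to_numpy_matching(pd.Series([1, 2, 3], dtype="Int64"))
floats = to_numpy_matching(pd.Series([1, 2, 3], dtype="Float64"))
plain = to_numpy_matching(pd.Series([1, 2, 3], dtype="int64"))
with_nan = to_numpy_matching(pd.Series([1.5, pd.NA], dtype="Float64"))
```

Passing dtype explicitly means the result does not depend on what to_numpy() returns by default for nullable dtypes.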

I'd suggest that instead, to_numpy() just always convert to the corresponding numpy type. Then:

if a Series has no NA, then to_numpy() will just convert to the corresponding numpy type:

In [4]: pd.Series([1,2,3], dtype='Int64').to_numpy()
Out[4]: array([1, 2, 3])

In [5]: pd.Series([1,2,3], dtype='Float64').to_numpy()
Out[5]: array([1., 2., 3.])

if a Series does have NA, then unless it's being converted to float (which can handle nan), it'll raise:

In [8]: pd.Series([1,2,pd.NA], dtype='Int64').to_numpy(int)
---------------------------------------------------------------------------
ValueError: cannot convert to '<class 'int'>'-dtype NumPy array with missing values.
Please either:
- convert to 'float'
- convert to 'object'
- specify an appropriate 'na_value' for this dtype
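For reference, the three options listed in the error message can be spelled out explicitly. This is only a sketch using the existing dtype and na_value parameters of to_numpy(); the sentinel -1 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, pd.NA], dtype="Int64")

# Option 1: convert to float, with the missing value represented as NaN.
as_float = ser.to_numpy(dtype="float64", na_value=np.nan)

# Option 2: convert to object, keeping pd.NA as the missing-value marker.
as_object = ser.to_numpy(dtype=object)

# Option 3: stay integer by choosing a sentinel (-1 is an arbitrary example).
as_int = ser.to_numpy(dtype="int64", na_value=-1)
```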

This would achieve:

- a better experience for users
- continuing to avoid value-dependent behavior, where the metadata (shape, dtype, etc.) depends on the values of the array


phofl · 2022-09-30T18:33:13Z

Could you include how you would handle df.values/ser.values?

MarcoGorelli · 2022-09-30T18:34:49Z

I don't think they'd need to change

mroeschke · 2022-09-30T21:24:28Z

specify an appropriate 'na_value' for this dtype

Would an "appropriate" na_value include one that can be cast to the dtype losslessly, e.g. dtype="Int64" & na_value=1.0 -> np.int64?

jreback · 2022-10-01T00:26:20Z

this looks reasonable to do
it's basically best efforts conversion

tacaswell · 2022-10-02T22:42:12Z

What would

pd.Series([1,2,pd.NA], dtype='Int64').to_numpy()

do? I can see a case both for "promote to float" and "raise".

MarcoGorelli · 2022-10-03T07:08:29Z

Thanks for weighing in - I'm suggesting it'd raise. Numpy doesn't have a missing value for int64, so it'd be up to the user to be explicit about what they want to happen.

MarcoGorelli · 2022-12-09T19:14:45Z

Numpy is going to error on equality comparisons between object ndarrays which contain pd.NA, so this is kinda important to solve before too long

In [1]: np.array([1, 3]) == np.array([4, pd.NA])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 np.array([1, 3]) == np.array([4, pd.NA])

File ~/pandas-dev/pandas/_libs/missing.pyx:413, in pandas._libs.missing.NAType.__bool__()
    411 
    412     def __bool__(self):
--> 413         raise TypeError("boolean value of NA is ambiguous")
    414 
    415     def __hash__(self):

TypeError: boolean value of NA is ambiguous
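One way to sidestep this, sketched here under the assumption that treating missing values as NaN is acceptable for the comparison, is to convert the nullable data to a float array before comparing, so that no pd.NA ends up inside an object ndarray:

```python
import numpy as np
import pandas as pd

left = np.array([1, 3])

# Convert explicitly to float64 so missing values become NaN rather than
# pd.NA objects stored in an object-dtype array.
right = pd.array([4, pd.NA], dtype="Int64").to_numpy(
    dtype="float64", na_value=np.nan
)

# NaN compares unequal to everything, so the comparison returns a boolean
# array instead of raising.
result = left == right
```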

lukemanley · 2023-09-02T02:18:42Z

+1 for avoiding object dtype where possible, e.g. at least in these cases:

In [2]: pd.array([1, 2], dtype="Int64").to_numpy().dtype
Out[2]: dtype('O')

In [3]: pd.array([1, pd.NA], dtype="Int64").to_numpy(na_value=2).dtype
Out[3]: dtype('O')

ArrowExtensionArray.to_numpy now has this behavior:

In [4]: pd.array([1, 2], dtype="int64[pyarrow]").to_numpy().dtype
Out[4]: dtype('int64')

In [5]: pd.array([1, pd.NA], dtype="int64[pyarrow]").to_numpy(na_value=2).dtype
Out[5]: dtype('int64')

MarcoGorelli added the "API Design" and "NA - MaskedArrays" (Related to pd.NA and nullable extension arrays) labels Sep 30, 2022

MarcoGorelli mentioned this issue Sep 30, 2022

[Bug]: plt.pcolormesh crashes when called with Int64 nullable dtype matplotlib/matplotlib#23991

Open

MarcoGorelli mentioned this issue Oct 4, 2022

API: to_numpy() defaults to corresponding type #48937

Closed

5 tasks

phofl mentioned this issue Oct 5, 2022

BUG: 'Series.to_numpy(dtype=, na_value=)' behaves differently with 'pd.NA' and 'np.nan' #48951

Closed

3 tasks

MarcoGorelli mentioned this issue Oct 5, 2022

ExtensionDtype API for preferred type when converting to NumPy array #22791

Open

MarcoGorelli mentioned this issue Dec 9, 2022

CI: PY311 failures #50124

Closed

dennisbrookner mentioned this issue Feb 17, 2023

seaborn 0.12.x breaks plotting functionality rs-station/rs-booster#36

Closed

This was referenced Feb 17, 2023

confusion_matrix: Remove nullable type handling when possible alteryx/evalml#4020

Closed

roc_curve: remove nullable type handling when possible alteryx/evalml#4021

Closed

MichaelTiemannOSC mentioned this issue Jun 30, 2023

PintArray.astype does not use the same conventions as pandas for float and int dtypes hgrecco/pint-pandas#183

Closed

lukemanley mentioned this issue Sep 30, 2023

Convert to compatible NumPy dtype for MaskedArray to_numpy #55058

Merged

7 tasks

mroeschke closed this as completed in #55058 Dec 1, 2023
