BUG: avoid conversion to object dtype in to_numpy for Arrow dtype and with dtype-compatible fill value #54808

Closed
jorisvandenbossche opened this issue Aug 28, 2023 · 1 comment · Fixed by #54843
Labels
Arrow pyarrow functionality

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Aug 28, 2023

Using the current main branch (2.1.0rc0+115.gca429943c7), converting an Arrow-backed boolean array to NumPy gives object dtype when there are missing values:

>>> import pandas as pd
>>> import pyarrow as pa

# no missing values -> bool dtype
>>> ser = pd.Series([True, False, True], dtype=pd.ArrowDtype(pa.bool_()))
>>> ser.to_numpy()
array([ True, False,  True])

# missing values -> object dtype
>>> ser = pd.Series([True, False, None], dtype=pd.ArrowDtype(pa.bool_()))
>>> ser.to_numpy()
array([True, False, <NA>], dtype=object)

This follows pyarrow's own behaviour when converting to a NumPy array, and I think it makes sense (unless we want to raise an error instead, but that's for another discussion).

But when a user specifies an na_value to avoid this issue of missing values with numpy, we still return an object dtype:

>>> ser.to_numpy(na_value=False)
array([True, False, False], dtype=object)

However, if the specified na_value is valid for the default target bool dtype (as in the example above), we could perfectly well return a proper bool dtype array automatically. Currently you have to be explicit: to_numpy(np.bool_, na_value=False).

@jorisvandenbossche jorisvandenbossche added the Arrow pyarrow functionality label Aug 28, 2023
@mroeschke
Member

cc @lukemanley IIRC you spent some time optimizing ArrowExtensionArray.to_numpy
