
Convert ArrowExtensionArray to proper NumPy dtype #56290

Merged
merged 12 commits into from
Dec 21, 2023

Conversation

phofl
Member

@phofl phofl commented Dec 1, 2023

As discussed on the other PR.

@phofl phofl requested a review from mroeschke December 1, 2023 23:32
@phofl phofl added the Arrow pyarrow functionality label Dec 1, 2023
@phofl phofl added this to the 2.2 milestone Dec 1, 2023
from pandas.core.dtypes.common import is_numeric_dtype


def _to_numpy_dtype_inference(arr, dtype, na_value, hasna):
Member

Could you type this method?

Member Author

done
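
For readers following the typing request, here is a rough sketch of what a typed dtype-inference helper could look like. It is not the code from this PR: the name `infer_numpy_dtype_sketch`, its parameters, and its fallback rules are hypothetical and simplified. The only rule taken from this conversation is that integer data with missing values ends up as float with NaN.

```python
from __future__ import annotations

import numpy as np


def infer_numpy_dtype_sketch(
    np_dtype: np.dtype,
    hasna: bool,
) -> tuple[np.dtype, object]:
    """Hypothetical, simplified stand-in for the helper under review.

    Returns the NumPy dtype to convert to and the value that would
    represent missing entries (None when no replacement is needed).
    """
    if not hasna:
        # Nothing is missing, so the original dtype can hold every value.
        return np_dtype, None
    if np_dtype.kind in "iu":
        # Integer dtypes cannot store NaN, so fall back to float64.
        return np.dtype(np.float64), np.nan
    if np_dtype.kind == "f":
        # Floats already have NaN as a missing-value marker.
        return np_dtype, np.nan
    # Assumption for this sketch: everything else falls back to object.
    return np.dtype(object), None


print(infer_numpy_dtype_sketch(np.dtype("int64"), hasna=True))
print(infer_numpy_dtype_sketch(np.dtype("int64"), hasna=False))
```

Annotating the return value as a tuple makes it explicit that the helper decides two things at once: the target dtype and the missing-value marker.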

@@ -58,6 +58,7 @@
     "_iLocIndexer",
     # TODO(3.0): GH#55043 - remove upon removal of ArrayManager
     "_get_option",
+    "_to_numpy_dtype_inference",
Member

Since the original method is in core, we can probably remove the leading underscore.

Member Author

done

@@ -1501,8 +1511,8 @@ def test_to_numpy_int_with_na():
     data = [1, None]
     arr = pd.array(data, dtype="int64[pyarrow]")
     result = arr.to_numpy()
-    expected = np.array([1, pd.NA], dtype=object)
-    assert isinstance(result[0], int)
+    expected = np.array([1, np.nan])
Member

Does this mean that ArrowExtensionArray.to_numpy is now lossy for integer types if there are NAs, since the data will now be converted to float?

Member Author

Yes

Member

I'm a little surprised it doesn't break more tests, as I think there are still a number of places that try to round-trip through NumPy.

Member Author

yeah same
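
To make the trade-off in this thread concrete, the following NumPy-only sketch shows what "lossy" means here. It does not call pandas or pyarrow; the array `as_float` is a hand-built stand-in for what an integer column with a missing value looks like after conversion to float64, and the value `big` is chosen purely for illustration.

```python
import numpy as np

# 2**53 + 1 is the smallest positive integer float64 cannot represent exactly.
big = 2**53 + 1

# Stand-in for an int64 column with one missing value after conversion:
# mixing an integer with NaN gives a float64 array.
as_float = np.array([big, np.nan])

# Round-tripping the large integer through float64 changes its value.
round_tripped = int(as_float[0])
print(as_float.dtype)
print(round_tripped == big)

# The upside: NumPy ufuncs work on the float array and propagate NaN.
logged = np.log(np.array([1, np.nan]))
print(logged)
```

Small integers such as the `1` in the test above survive the conversion unchanged; only magnitudes beyond 2**53 lose precision. When exact integers matter, requesting an object array explicitly (for example `arr.to_numpy(dtype=object)`) is one option that should keep the missing-value marker, though that path is not exercised in this thread.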

@phofl
Member Author

phofl commented Dec 21, 2023

cc @mroeschke

this should be ready as well

# Conflicts:
#	pandas/core/arrays/arrow/array.py
Labels
Arrow pyarrow functionality
Development

Successfully merging this pull request may close these issues.

BUG: np.log fails with missing pyarrow values
4 participants