ENH: Add IntegerArray.__arrow_array__ for custom conversion to Arrow #28368


Merged
8 changes: 8 additions & 0 deletions pandas/core/arrays/integer.py
@@ -367,6 +367,14 @@ def __array__(self, dtype=None):
        """
        return self._coerce_to_ndarray()

    def __arrow_array__(self, type=None):
        """
        Convert myself into a pyarrow Array.
        """
        import pyarrow as pa

        return pa.array(self._data, mask=self._mask, type=type)

    _HANDLED_TYPES = (np.ndarray, numbers.Number)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
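The `__arrow_array__` method added above is a conversion hook: when pyarrow is asked to convert an object that defines it, it can delegate to the object's own logic instead of guessing from a NumPy view. The sketch below illustrates the dispatch mechanics in pure Python; `to_arrow`, `FakeArrowArray`, and `MaskedIntArray` are hypothetical stand-ins for this example, not pyarrow's or pandas' actual implementation.

```python
class FakeArrowArray:
    """Stand-in for a pyarrow.Array: values plus a validity mask."""

    def __init__(self, values, mask):
        self.values = list(values)
        self.mask = list(mask)  # True marks a missing value


class MaskedIntArray:
    """Minimal stand-in for pandas' IntegerArray (_data + _mask pair)."""

    def __init__(self, data, mask):
        self._data = data
        self._mask = mask

    def __arrow_array__(self, type=None):
        # Hand the converter the raw values plus the mask, exactly the
        # shape of information pa.array(self._data, mask=self._mask) needs.
        return FakeArrowArray(self._data, self._mask)


def to_arrow(obj, type=None):
    # Protocol dispatch: prefer the object's own conversion hook.
    if hasattr(obj, "__arrow_array__"):
        return obj.__arrow_array__(type=type)
    # Fallback for plain sequences: treat None as missing.
    return FakeArrowArray(obj, [v is None for v in obj])


arr = to_arrow(MaskedIntArray([1, 2, 0], [False, False, True]))
print(arr.values, arr.mask)  # [1, 2, 0] [False, False, True]
```

The point of the protocol is that the masked third element survives the conversion as an explicit null rather than being coerced through a float NaN.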
23 changes: 23 additions & 0 deletions pandas/tests/arrays/test_integer.py
@@ -1,3 +1,5 @@
from distutils.version import LooseVersion

import numpy as np
import pytest

@@ -19,6 +21,13 @@
from pandas.tests.extension.base import BaseOpsUtil
import pandas.util.testing as tm

try:
    import pyarrow

    _PYARROW_INSTALLED = True
except ImportError:
    _PYARROW_INSTALLED = False

Member: You should be able to replace this with `compat._optional.import_optional_dependency`

Member Author: I've never used `compat._optional.import_optional_dependency` before, so correct me if I am wrong: that method seems to be used in the code itself (not in tests) and is meant to raise an error or return the module (so e.g. in functions that use an optional dependency). So I would still need to catch the error, I think? In which case I am not sure it is necessarily clearer, or deduplicates code.

Member Author: Ah, I see it has a `raise_on_missing=False` option. But in theory I then also need to specify `on_version='ignore'` to avoid an error or warning on old pyarrow versions.

But I can maybe actually replace it with `pytest.importorskip` inside the test, which seems better suited for test cases. That should also simplify it.

Member Author: Or, we have our own wrapper around that as the `td.skip_if_no` decorator.

Member Author: OK, simplified it with `td.skip_if_no`. I was actually already using it in the other test as well (so I was being a bit blind :-)).
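The optional-import pattern discussed here can be sketched as a small helper that returns the module or `None` instead of raising. This is a hedged illustration of the idea, not pandas' actual helper: the name `import_optional_or_none` is hypothetical (pandas' real function is `pandas.compat._optional.import_optional_dependency`, whose raising/warning behavior is configurable), and the version comparison below is a crude numeric one where the diff uses `LooseVersion`.

```python
import importlib


def import_optional_or_none(name, min_version=None):
    """Return the module if importable (and new enough), else None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    if min_version is not None:
        version = getattr(module, "__version__", "0")

        def parts(v):
            # Keep only leading numeric components, e.g. "0.15.0.dev" -> [0, 15, 0]
            out = []
            for piece in v.split("."):
                if piece.isdigit():
                    out.append(int(piece))
                else:
                    break
            return out

        if parts(version) < parts(min_version):
            return None
    return module


math_mod = import_optional_or_none("math")
missing = import_optional_or_none("definitely_not_a_real_module_xyz")
print(math_mod is not None, missing)  # True None
```

With such a helper, a test can guard itself with a single truthiness check rather than a module-level try/except and flag.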


def make_data():
    return list(range(8)) + [np.nan] + list(range(10, 98)) + [np.nan] + [99, 100]
@@ -817,6 +826,20 @@ def test_ufunc_reduce_raises(values):
np.add.reduce(a)


@pytest.mark.skipif(
    not _PYARROW_INSTALLED
    or _PYARROW_INSTALLED
    and LooseVersion(pyarrow.__version__) < LooseVersion("0.14.1.dev"),
    reason="pyarrow >= 0.15.0 required",
)
def test_arrow_array(data):
    import pyarrow as pa

    arr = pa.array(data)
    expected = pa.array(list(data), type=data.dtype.name.lower(), from_pandas=True)
    assert arr.equals(expected)

Member: Is there a particular reason for this requiring 0.15.0 but the next test requiring 0.14.1.dev?

Member Author: Yes, this is to be able to test it against arrow master locally. We can also wait until the final 0.15.0 is released, and then I can change this check. But in practice it gives the same result.
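A small detail in the expected array above: the Arrow type string is derived from the pandas dtype name, which differs only in capitalization (pandas' nullable dtype is "Int64", the Arrow type string is "int64"). A minimal illustration, where `dtype_name` is a stand-in for `data.dtype.name`:

```python
# pandas nullable integer dtype name vs. the Arrow type string the test
# passes to pa.array(); only the casing differs.
dtype_name = "Int64"
arrow_type = dtype_name.lower()
print(arrow_type)  # int64
```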


# TODO(jreback) - these need testing / are broken

# shift
12 changes: 12 additions & 0 deletions pandas/tests/io/test_parquet.py
@@ -478,6 +478,18 @@ def test_empty_dataframe(self, pa):
        df = pd.DataFrame()
        check_round_trip(df, pa)

    @td.skip_if_no("pyarrow", min_version="0.14.1.dev")
    def test_nullable_integer(self, pa):
        df = pd.DataFrame({"a": pd.Series([1, 2, 3], dtype="Int64")})
        # currently de-serialized as plain int
        expected = df.assign(a=df.a.astype("int64"))
        check_round_trip(df, pa, expected=expected)

        df = pd.DataFrame({"a": pd.Series([1, 2, 3, None], dtype="Int64")})
        # if missing values, currently de-serialized as float
        expected = df.assign(a=df.a.astype("float64"))
        check_round_trip(df, pa, expected=expected)
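The two `expected` frames in the test above encode why the round trip is lossy at this stage: without the nullable extension dtype on the reading side, the column comes back as a plain NumPy column, and NumPy's `int64` has no missing-value representation, so any null forces promotion to `float64` with NaN. A sketch of that promotion:

```python
import numpy as np

# A nullable Int64 column with no missing values can come back as int64...
no_na = np.array([1, 2, 3])
# ...but one containing a null must be promoted to float64, since np.nan
# cannot be stored in an int64 array.
with_na = np.array([np.nan if v is None else v for v in [1, 2, 3, None]])
print(no_na.dtype, with_na.dtype)  # int64 float64
```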


class TestParquetFastParquet(Base):
    @td.skip_if_no("fastparquet", min_version="0.2.1")