TST: check compatibility with pyarrow types_mapper parameter #44369

Merged
merged 12 commits into from
Dec 1, 2021
22 changes: 22 additions & 0 deletions pandas/tests/arrays/masked/test_arrow_compat.py
@@ -3,6 +3,7 @@

import pandas as pd
import pandas._testing as tm
from pandas.api.types import pandas_dtype

pa = pytest.importorskip("pyarrow", minversion="1.0.1")

@@ -36,6 +37,27 @@ def test_arrow_roundtrip(data):
    tm.assert_frame_equal(result, df)


def test_dataframe_from_arrow_types_mapper(any_int_ea_dtype):
    int_dtype = pandas_dtype(any_int_ea_dtype)

    def types_mapper(arrow_type):
        if pa.types.is_boolean(arrow_type):
            return pd.BooleanDtype()
        elif pa.types.is_integer(arrow_type):
            return int_dtype
Member

Is it actually necessary to parametrize this over all int EA dtypes? You could also just use pd.Int64Dtype(), the same way you use BooleanDtype.

Contributor Author

Don't you want to make sure types_mapper works for any integer dtype?

Contributor Author

Alternatively, I could have a separate column in the arrow record batch for each integer data type, but that seemed too much.

Member

> Don't you want to make sure types_mapper works for any integer dtype?

This is already implicitly tested in test_arrow_roundtrip above, since that test is parametrized over every dtype and, under the hood, uses the same dtype.__from_arrow__.

But I am also fine with keeping the test as you have it now.

Contributor Author

Sounds good. I'll simplify, but add one column just so we test a code path that requires a copy because of differing data sizes.


    bools_array = pa.array([True, None, False], type=pa.bool_())
    ints_array = pa.array([1, None, 2], type=pa.int64())
    record_batch = pa.RecordBatch.from_arrays(
        [bools_array, ints_array], ["bools", "ints"]
    )
    result = record_batch.to_pandas(types_mapper=types_mapper)
    bools = pd.Series([True, None, False], dtype="boolean")
    ints = pd.Series([1, None, 2], dtype=int_dtype.name)
    expected = pd.DataFrame({"bools": bools, "ints": ints})
    tm.assert_frame_equal(result, expected)


def test_arrow_load_from_zero_chunks(data):
    # GH-41040
