-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: check compatibility with pyarrow types_mapper parameter #44369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
017803f
325c0bf
88d6972
49ad892
09fe48e
ece974c
2c80f74
12a092e
da2873f
0762fd7
1ede256
31b0065
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,22 @@ def data(request): | |
return request.param | ||
|
||
|
||
@pytest.fixture( | ||
params=[ | ||
pd.Int8Dtype, | ||
pd.Int16Dtype, | ||
pd.Int32Dtype, | ||
pd.Int64Dtype, | ||
pd.UInt8Dtype, | ||
pd.UInt16Dtype, | ||
pd.UInt32Dtype, | ||
pd.UInt64Dtype, | ||
] | ||
) | ||
def int_dtype(request): | ||
return request.param() | ||
|
||
|
||
def test_arrow_array(data): | ||
arr = pa.array(data) | ||
expected = pa.array( | ||
|
@@ -39,6 +55,30 @@ def test_arrow_roundtrip(data): | |
tm.assert_frame_equal(result, df) | ||
|
||
|
||
@td.skip_if_no("pyarrow") | ||
def test_dataframe_from_arrow_type_mapper(int_dtype): | ||
pyarrow = pytest.importorskip("pyarrow", minversion="3.0.0") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. put the version IN the skip_if_no There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add a reference to this issue in a comment (e.g. the gh number) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Turns out the module-level limit of 1.0.1 seems to be OK. Just tested with that version. |
||
|
||
def types_mapper(arrow_type): | ||
if pyarrow.types.is_boolean(arrow_type): | ||
return pd.BooleanDtype() | ||
elif pyarrow.types.is_integer(arrow_type): | ||
return int_dtype | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it actually needed to parametrize this over all int EA dtypes? Could also just use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't you want to make sure There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alternatively, I could have a separate column in the arrow record batch for each integer data type, but that seemed too much. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is already implicitly tested in the But I am also fine with keeping the test as you have it now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good. I'll simplify, but add one column just so we test a code path that requires a copy because of differing data sizes. |
||
|
||
bools_array = pyarrow.array([True, None, False], type=pyarrow.bool_()) | ||
ints_array = pyarrow.array([1, None, 2], type=pyarrow.int64()) | ||
record_batch = pyarrow.RecordBatch.from_arrays( | ||
[bools_array, ints_array], ["bools", "ints"] | ||
) | ||
result = record_batch.to_pandas(date_as_object=False, types_mapper=types_mapper) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. Removed. That was leftover from a similar test I was writing for the db-dtypes package |
||
assert result["bools"].dtype == "boolean" | ||
assert result["ints"].dtype == int_dtype | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you can leave this out, the assert_frame_equal will also take care of it There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good call. Done. |
||
bools = pd.Series([True, None, False], dtype="boolean") | ||
ints = pd.Series([1, None, 2], dtype=int_dtype.name) | ||
expected = pd.DataFrame({"bools": bools, "ints": ints}) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
|
||
@td.skip_if_no("pyarrow") | ||
def test_arrow_load_from_zero_chunks(data): | ||
# GH-41040 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use this fixture: any_int_ea_dtype and remove the int_dtype from above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are strings, but
types_mapper
expects actual dtype objects. Maybe there's a way to convert, though?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas.api.types.pandas_dtype
is the function I needed.