-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: check compatibility with pyarrow types_mapper parameter #44369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Possible this was actually just a misunderstanding on my part of the I still think it'd be worth having these tests, so I can update my PR to do that. |
@@ -397,3 +397,21 @@ class TestParsing(base.BaseParsingTests): | |||
|
|||
class Test2DCompat(base.Dim2CompatTests): | |||
pass | |||
|
|||
|
|||
def test_from_arrow(dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might need to use a min version here in the skip and use the decorator version like
td.skip_if_no("pyarrow", min_version="1.0.0")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also move the test to /tests/arrays/boolean/
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or more specifically, we have tests for arrow compat in tests/arrays/masked/test_arrow_compat.py
. I would add it there (or first check if it's not actually already covered)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/arrays/masked/test_arrow_compat.py
looks more low-level than what I'm aiming to test here, but a combo test there could make sense.
…to issue44368-BooleanDtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the combined PR!
record_batch = pyarrow.RecordBatch.from_arrays( | ||
[bools_array, ints_array], ["bools", "ints"] | ||
) | ||
result = record_batch.to_pandas(date_as_object=False, types_mapper=types_mapper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the date_as_object
relevant here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Removed.
That was leftover from a similar test I was writing for the db-dtypes package
assert result["bools"].dtype == "boolean" | ||
assert result["ints"].dtype == int_dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can leave this out, the assert_frame_equal will also take care of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Done.
@@ -39,6 +55,28 @@ def test_arrow_roundtrip(data): | |||
tm.assert_frame_equal(result, df) | |||
|
|||
|
|||
@td.skip_if_no("pyarrow") | |||
def test_dataframe_from_arrow_type_mapper(int_dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use this fixture: any_int_ea_dtype and remove the int_dtype from above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are strings, but types_mapper
expects actual dtype objects. Maybe there's a way to convert, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas.api.types.pandas_dtype
is the function I needed.
@@ -39,6 +55,28 @@ def test_arrow_roundtrip(data): | |||
tm.assert_frame_equal(result, df) | |||
|
|||
|
|||
@td.skip_if_no("pyarrow") | |||
def test_dataframe_from_arrow_type_mapper(int_dtype): | |||
pyarrow = pytest.importorskip("pyarrow", minversion="3.0.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the version IN the skip_if_no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a reference to this issue in a comment (e.g. the gh number)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out the module-level limit of 1.0.1 seems to be OK. Just tested with that version.
if pa.types.is_boolean(arrow_type): | ||
return pd.BooleanDtype() | ||
elif pa.types.is_integer(arrow_type): | ||
return int_dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it actually needed to parametrize this over all int EA dtypes? Could also just use pd.Int64Dtype()
like you use BooleanDtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to make sure types_mapper
works for any integer dtype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, I could have a separate column in the arrow record batch for each integer data type, but that seemed too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to make sure types_mapper works for any integer dtype?
This is already implicitly tested in the test_arrow_roundtrip
above (since that is parametrized for every dtype, and will under the hood make use of the same dtype.__from_arrow__
)
But I am also fine with keeping the test as you have it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'll simplify, but add one column just so we test a code path that requires a copy because of differing data sizes.
can you merge master & @jorisvandenbossche comment (agree) |
thanks @tswast |
Closes #44368