Skip to content

TST: check compatibility with pyarrow types_mapper parameter #44369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Dec 1, 2021

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Nov 9, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Closes #44368

@tswast tswast marked this pull request as ready for review November 9, 2021 17:59
@tswast
Copy link
Contributor Author

tswast commented Nov 9, 2021

Possible this was actually just a misunderstanding on my part of the __from_arrow__ method. Creating an instance of the dtype resolves my issue.

https://pandas.pydata.org/pandas-docs/stable/development/extending.html#compatibility-with-apache-arrow

I still think it'd be worth having these tests, so I can update my PR to do that.

@tswast tswast changed the title BUG: BooleanDtype compatibility with pyarrow types_mapper parameter TST: check BooleanDtype compatibility with pyarrow types_mapper parameter Nov 9, 2021
@@ -397,3 +397,21 @@ class TestParsing(base.BaseParsingTests):

class Test2DCompat(base.Dim2CompatTests):
pass


def test_from_arrow(dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might need to use a min version here in the skip and use the decorator version like
td.skip_if_no("pyarrow", min_version="1.0.0")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also move the test to /tests/arrays/boolean/ ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or more specifically, we have tests for arrow compat in tests/arrays/masked/test_arrow_compat.py. I would add it there (or first check if it's not actually already covered)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/arrays/masked/test_arrow_compat.py looks more low-level than what I'm aiming to test here, but a combo test there could make sense.

@tswast tswast changed the title TST: check BooleanDtype compatibility with pyarrow types_mapper parameter TST: check compatibility with pyarrow types_mapper parameter Nov 10, 2021
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the combined PR!

record_batch = pyarrow.RecordBatch.from_arrays(
[bools_array, ints_array], ["bools", "ints"]
)
result = record_batch.to_pandas(date_as_object=False, types_mapper=types_mapper)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the date_as_object relevant here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Removed.

That was leftover from a similar test I was writing for the db-dtypes package

Comment on lines 74 to 75
assert result["bools"].dtype == "boolean"
assert result["ints"].dtype == int_dtype
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can leave this out, the assert_frame_equal will also take care of it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Done.

@jreback jreback added the Arrow pyarrow functionality label Nov 14, 2021
@jreback jreback added this to the 1.4 milestone Nov 14, 2021
@jreback jreback added the Testing pandas testing functions or related to the test suite label Nov 14, 2021
@@ -39,6 +55,28 @@ def test_arrow_roundtrip(data):
tm.assert_frame_equal(result, df)


@td.skip_if_no("pyarrow")
def test_dataframe_from_arrow_type_mapper(int_dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use this fixture: any_int_ea_dtype and remove the int_dtype from above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are strings, but types_mapper expects actual dtype objects. Maybe there's a way to convert, though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.api.types.pandas_dtype is the function I needed.

@@ -39,6 +55,28 @@ def test_arrow_roundtrip(data):
tm.assert_frame_equal(result, df)


@td.skip_if_no("pyarrow")
def test_dataframe_from_arrow_type_mapper(int_dtype):
pyarrow = pytest.importorskip("pyarrow", minversion="3.0.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the version IN the skip_if_no

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a reference to this issue in a comment (e.g. the gh number)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out the module-level limit of 1.0.1 seems to be OK. Just tested with that version.

@tswast tswast requested a review from jreback November 22, 2021 18:04
if pa.types.is_boolean(arrow_type):
return pd.BooleanDtype()
elif pa.types.is_integer(arrow_type):
return int_dtype
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually needed to parametrize this over all int EA dtypes? Could also just use pd.Int64Dtype() like you use BooleanDtype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to make sure types_mapper works for any integer dtype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I could have a separate column in the arrow record batch for each integer data type, but that seemed too much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to make sure types_mapper works for any integer dtype?

This is already implicitly tested in the test_arrow_roundtrip above (since that is parametrized for every dtype, and will under the hood make use of the same dtype.__from_arrow__)

But I am also fine with keeping the test as you have it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'll simplify, but add one column just so we test a code path that requires a copy because of differing data sizes.

@jreback
Copy link
Contributor

jreback commented Nov 25, 2021

can you merge master & @jorisvandenbossche comment (agree)

@jreback jreback merged commit 7a26910 into pandas-dev:master Dec 1, 2021
@jreback
Copy link
Contributor

jreback commented Dec 1, 2021

thanks @tswast

@tswast tswast deleted the issue44368-BooleanDtype branch December 1, 2021 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: BooleanDtype needs an instance to use pyarrow types_mapper parameter
3 participants