Skip to content

ENH/TST: Add TestBaseArithmeticOps tests for ArrowExtensionArray #47601 #47645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Jul 16, 2022

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Jul 8, 2022

Generally the xfails correspond to:

  • pyarrow < 2 not having the *_checked ops
  • pyarrow < 8 not supporting arithmetic with some temporal types
  • pyarrow not having mod/rmod compute functions
  • 1**pandas.NA == 1 while 1**pyarrow.NA == NULL

@mroeschke mroeschke added Testing pandas testing functions or related to the test suite Arrow pyarrow functionality labels Jul 8, 2022
@mroeschke mroeschke added this to the 1.5 milestone Jul 8, 2022
@@ -101,7 +101,7 @@ def test_add(dtype, request):
"unsupported operand type(s) for +: 'ArrowStringArray' and "
"'ArrowStringArray'"
)
mark = pytest.mark.xfail(raises=TypeError, reason=reason)
mark = pytest.mark.xfail(raises=NotImplementedError, reason=reason)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW (not necessarily for this PR), but + for strings is actually implemented in Arrow, but under the "join" name (like str.join), so this could be emulated using pc.binary_join_element_wise(left, right, "") (the last empty string is to not use a separator)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, but lgtm

def assert_series_equal(self, left, right, *args, **kwargs):
# Series.combine for "expected" retains bool[pyarrow] dtype
# While "result" return "boolean" dtype
right = pd.Series(right._values.to_numpy(), dtype="boolean")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not great

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah using Series.combine to generate expected for these ExtensionArray tests doesn't always compare nicely with pyarrow.

I wouldn't say there's a bug in Series.combine or this implementation, just a consequence of Series.combine using Python comparisons semantics and not necessarily Arrow comparison semantics

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would maybe also not do too much effort in trying to fully implement all those tests (too late probably) following the base extension tests exactly. As those need to be written in a generic way, that also makes them not very robust (eg the combine() method) / often awkward to write.
It might make sense to write a set of custom, more targeted set of tests for a group of dtypes like the arrow dtypes (for certain operations)

@jreback jreback merged commit 0b8d8bb into pandas-dev:main Jul 16, 2022
"rmod": NotImplemented,
"divmod": NotImplemented,
"rdivmod": NotImplemented,
"pow": NotImplemented if pa_version_under2p0 else pc.power_checked,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failing with pyarrow 2.0.0

ImportError while loading conftest '/home/simon/pandas/pandas/conftest.py'.
pandas/__init__.py:48: in <module>
    from pandas.core.api import (
pandas/core/api.py:27: in <module>
    from pandas.core.arrays import Categorical
pandas/core/arrays/__init__.py:20: in <module>
    from pandas.core.arrays.string_arrow import ArrowStringArray
pandas/core/arrays/string_arrow.py:37: in <module>
    from pandas.core.arrays.arrow import ArrowExtensionArray
pandas/core/arrays/arrow/__init__.py:1: in <module>
    from pandas.core.arrays.arrow.array import ArrowExtensionArray
pandas/core/arrays/arrow/array.py:124: in <module>
    "pow": NotImplemented if pa_version_under2p0 else pc.power_checked,
E   AttributeError: module 'pyarrow.compute' has no attribute 'power_checked'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged @mroeschke's follow up PR (#47752), but I am now wondering: we don't have CI builds with pyarrow 2.0 to catch this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, only testing 1.01 (min), 5, 6, 7. IIRC pyarrow version 2 or 3 were causing the CI to time out around the read_csv tests so that's why 2 or 3 weren't included #43650

@mroeschke mroeschke deleted the arrow/comparisons branch July 16, 2022 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants