ENH/TST: Add TestBaseArithmeticOps tests for ArrowExtensionArray #47601 #47645

mroeschke · 2022-07-08T19:59:06Z

Tests added and passed if fixing a bug or adding a new feature
All code checks passed.

Generally the xfails correspond to:

pyarrow < 2 not having the *_checked ops
pyarrow < 8 not supporting arithmetic with some temporal types
pyarrow not having mod/rmod compute functions
1**pandas.NA == 1 while 1**pyarrow.NA == NULL

pandas/core/arrays/arrow/array.py

jorisvandenbossche · 2022-07-09T09:31:17Z

pandas/tests/arrays/string_/test_string.py

@@ -101,7 +101,7 @@ def test_add(dtype, request):
            "unsupported operand type(s) for +: 'ArrowStringArray' and "
            "'ArrowStringArray'"
        )
-        mark = pytest.mark.xfail(raises=TypeError, reason=reason)
+        mark = pytest.mark.xfail(raises=NotImplementedError, reason=reason)


BTW (not necessarily for this PR): + for strings is actually implemented in Arrow, but under the "join" name (like str.join), so it could be emulated using pc.binary_join_element_wise(left, right, ""). The final argument is the separator, and passing an empty string means no separator is inserted.

jreback

small comment, but lgtm

jreback · 2022-07-16T02:17:44Z

pandas/tests/extension/test_arrow.py

+    def assert_series_equal(self, left, right, *args, **kwargs):
+        # Series.combine for "expected" retains bool[pyarrow] dtype
+        # While "result" return "boolean" dtype
+        right = pd.Series(right._values.to_numpy(), dtype="boolean")


this is not great

Yeah, using Series.combine to generate expected for these ExtensionArray tests doesn't always compare nicely with pyarrow.

I wouldn't say there's a bug in Series.combine or in this implementation; it's just a consequence of Series.combine using Python comparison semantics and not necessarily Arrow comparison semantics.

In general I would maybe also not put too much effort into fully implementing all those tests (probably too late) by following the base extension tests exactly. Because those need to be written in a generic way, they are not very robust (e.g. the combine() method) and are often awkward to write.
It might make sense to write a custom, more targeted set of tests for a group of dtypes like the arrow dtypes (for certain operations).

simonjayhawkins · 2022-07-16T14:25:46Z

pandas/core/arrays/arrow/array.py

+        "rmod": NotImplemented,
+        "divmod": NotImplemented,
+        "rdivmod": NotImplemented,
+        "pow": NotImplemented if pa_version_under2p0 else pc.power_checked,


failing with pyarrow 2.0.0

ImportError while loading conftest '/home/simon/pandas/pandas/conftest.py'.
pandas/__init__.py:48: in <module>
    from pandas.core.api import (
pandas/core/api.py:27: in <module>
    from pandas.core.arrays import Categorical
pandas/core/arrays/__init__.py:20: in <module>
    from pandas.core.arrays.string_arrow import ArrowStringArray
pandas/core/arrays/string_arrow.py:37: in <module>
    from pandas.core.arrays.arrow import ArrowExtensionArray
pandas/core/arrays/arrow/__init__.py:1: in <module>
    from pandas.core.arrays.arrow.array import ArrowExtensionArray
pandas/core/arrays/arrow/array.py:124: in <module>
    "pow": NotImplemented if pa_version_under2p0 else pc.power_checked,
E   AttributeError: module 'pyarrow.compute' has no attribute 'power_checked'

I merged @mroeschke's follow up PR (#47752), but I am now wondering: we don't have CI builds with pyarrow 2.0 to catch this?

We don't; we only test 1.0.1 (min), 5, 6, and 7. IIRC pyarrow version 2 or 3 was causing the CI to time out around the read_csv tests, which is why neither was included (#43650).
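
The import-time failure in this thread happens because `pc.power_checked` is evaluated while the dispatch dict is being built. One defensive pattern, shown here only as a sketch and not necessarily what #47752 does, is to look kernels up with `getattr` and fall back to `NotImplemented`. `build_arithmetic_funcs` and `old_compute` are hypothetical names, and a `SimpleNamespace` stands in for an older `pyarrow.compute` so the snippet runs without pyarrow.

```python
from types import SimpleNamespace


def build_arithmetic_funcs(compute):
    """Map op names to kernels, using NotImplemented when a kernel is absent."""

    def lookup(kernel_name):
        # getattr with a default never raises AttributeError at import time.
        return getattr(compute, kernel_name, NotImplemented)

    return {
        "add": lookup("add_checked"),
        "pow": lookup("power_checked"),
        "rmod": NotImplemented,  # no Arrow kernel, per the PR description
    }


# Hypothetical stand-in for an old pyarrow.compute without power_checked.
old_compute = SimpleNamespace(add_checked=lambda a, b: a + b)
funcs = build_arithmetic_funcs(old_compute)

print(funcs["pow"] is NotImplemented)
print(funcs["add"](1, 2))
```

A feature check like this does not depend on knowing the exact pyarrow version that introduced a kernel, at the cost of hiding a misspelled kernel name behind a silent fallback.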

mroeschke added 5 commits July 7, 2022 16:18

start adding arith tests

3deef7c

Add more arithmetic tests

82d7734

Override _combine in the future

4957475

Merge remote-tracking branch 'upstream/main' into arrow/comparisons

d976797

Finalize tests

5c7d4bf

mroeschke added the Testing (pandas testing functions or related to the test suite) and Arrow (pyarrow functionality) labels Jul 8, 2022

mroeschke added this to the 1.5 milestone Jul 8, 2022

Add checked

f38bf94

mroeschke commented Jul 8, 2022

mroeschke added 3 commits July 8, 2022 16:55

Merge remote-tracking branch 'upstream/main' into arrow/comparisons

5b97245

Can raise NotImplimented instead of TypeError now

6f5b57b

Merge remote-tracking branch 'upstream/main' into arrow/comparisons

eb99dd5

jorisvandenbossche reviewed Jul 9, 2022

mroeschke added 16 commits July 9, 2022 17:57

Merge remote-tracking branch 'upstream/main' into arrow/comparisons

d1cb7f3

Fix typing, compute kernel compat

3d5d96d

pyarrow 8 supports some duration ops

f03f774

Add to pandas compat

726c6b7

xor not implememnted in min pyarrow

2de348c

xor not implememnted in min pyarrow

1dd2f79

Merge remote-tracking branch 'upstream/main' into arrow/comparisons

83011d5

Fix pyarrow=8 temporal condition

0aed029

min version compat

d30877f

Merge remote-tracking branch 'upstream/main' into arrow/comparisons

afe9468

more compat

f374451

Add support for truediv

88449db

Add floordiv

4034a1c

Merge remote-tracking branch 'upstream/main' into arrow/comparisons

9bfd503

min version compat

81c609f

Merge remote-tracking branch 'upstream/main' into arrow/comparisons

6bacdba

Add comparison tests

72e8923

jreback approved these changes Jul 16, 2022

jreback merged commit 0b8d8bb into pandas-dev:main Jul 16, 2022

simonjayhawkins reviewed Jul 16, 2022

mroeschke deleted the arrow/comparisons branch July 16, 2022 17:52

mroeschke mentioned this pull request Jul 16, 2022

BUG: Fix pc.power_checked min version #47752

Merged


jorisvandenbossche Jul 9, 2022

jreback left a comment

jreback Jul 16, 2022

mroeschke Jul 16, 2022

jorisvandenbossche Jul 16, 2022

simonjayhawkins Jul 16, 2022

jorisvandenbossche Jul 17, 2022

mroeschke Jul 17, 2022
