-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH/TST: Add TestBaseArithmeticOps tests for ArrowExtensionArray #47601 #47645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -101,7 +101,7 @@ def test_add(dtype, request): | |||
"unsupported operand type(s) for +: 'ArrowStringArray' and " | |||
"'ArrowStringArray'" | |||
) | |||
mark = pytest.mark.xfail(raises=TypeError, reason=reason) | |||
mark = pytest.mark.xfail(raises=NotImplementedError, reason=reason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW (not necessarily for this PR), but +
for strings is actually implemented in Arrow, but under the "join" name (like str.join), so this could be emulated using pc.binary_join_element_wise(left, right, "")
(the last empty string is to not use a separator)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment, but lgtm
def assert_series_equal(self, left, right, *args, **kwargs): | ||
# Series.combine for "expected" retains bool[pyarrow] dtype | ||
# While "result" return "boolean" dtype | ||
right = pd.Series(right._values.to_numpy(), dtype="boolean") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah using Series.combine
to generate expected
for these ExtensionArray tests doesn't always compare nicely with pyarrow.
I wouldn't say there's a bug in Series.combine
or this implementation, just a consequence of Series.combine
using Python comparisons semantics and not necessarily Arrow comparison semantics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I would maybe also not do too much effort in trying to fully implement all those tests (too late probably) following the base extension tests exactly. As those need to be written in a generic way, that also makes them not very robust (eg the combine()
method) / often awkward to write.
It might make sense to write a set of custom, more targeted set of tests for a group of dtypes like the arrow dtypes (for certain operations)
"rmod": NotImplemented, | ||
"divmod": NotImplemented, | ||
"rdivmod": NotImplemented, | ||
"pow": NotImplemented if pa_version_under2p0 else pc.power_checked, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failing with pyarrow 2.0.0
ImportError while loading conftest '/home/simon/pandas/pandas/conftest.py'.
pandas/__init__.py:48: in <module>
from pandas.core.api import (
pandas/core/api.py:27: in <module>
from pandas.core.arrays import Categorical
pandas/core/arrays/__init__.py:20: in <module>
from pandas.core.arrays.string_arrow import ArrowStringArray
pandas/core/arrays/string_arrow.py:37: in <module>
from pandas.core.arrays.arrow import ArrowExtensionArray
pandas/core/arrays/arrow/__init__.py:1: in <module>
from pandas.core.arrays.arrow.array import ArrowExtensionArray
pandas/core/arrays/arrow/array.py:124: in <module>
"pow": NotImplemented if pa_version_under2p0 else pc.power_checked,
E AttributeError: module 'pyarrow.compute' has no attribute 'power_checked'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged @mroeschke's follow up PR (#47752), but I am now wondering: we don't have CI builds with pyarrow 2.0 to catch this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't, only testing 1.01 (min), 5, 6, 7. IIRC pyarrow version 2 or 3 were causing the CI to time out around the read_csv tests so that's why 2 or 3 weren't included #43650
Generally the xfails correspond to:
1**pandas.NA == 1
while1**pyarrow.NA == NULL