-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Use fixtures in pandas/tests/base #32046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use fixtures in pandas/tests/base #32046
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks generally good (I think the naming of the fixture is a little strange, though admittedly don't have anything better to offer atm)
A few questions
I think I addressed all your comments here @WillAyd |
first = indices[2:] | ||
second = indices[:4] | ||
answer = indices[4:] | ||
if isinstance(indices, CategoricalIndex) or indices.is_boolean(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is. To make the indices fixture more useful I enhanced its length from 2 to 10 entries, which impacts the expected value here. In the same way, I could provide a proper "expected" for CategoricalIndex
instead of just silently skipping it.
pandas/tests/base/test_ops.py
Outdated
def test_none_comparison(self, series_with_differing_index): | ||
o = series_with_differing_index | ||
if isinstance(o.index, IntervalIndex): | ||
pytest.skip("IntervalIndex is immutable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does immutability have to do with this? all indexes should be immutable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to check. I think the IntervalIndex doesn't implement .loc
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a hunch, id imagine its like the datetime64 case where it raises for inequalities (in which case it can be added to the special case for datetime64 below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's failing on line 149 o[0] = np.nan
already. The assignment fails with
AttributeError: 'pandas._libs.interval.IntervalTree' object has no attribute 'get_loc'
I added a comment and adjusted the skip message. Please let me know if you have a better idea
@SaturnFromTitan if its easy for you, can you point out sections of the diff that are effectively just-de-indentation, should make it easier to focus review |
pandas/tests/base/test_ops.py
Outdated
@@ -137,227 +137,238 @@ def setup_method(self, method): | |||
self.is_valid_objs = self.objs | |||
self.not_valid_objs = [] | |||
|
|||
def test_none_comparison(self): | |||
def test_none_comparison(self, f8series_any_simple_index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed indentation and added skips
pandas/tests/base/test_ops.py
Outdated
|
||
assert o.nunique() == len(np.unique(o.values)) | ||
# dropna=True would break for MultiIndex | ||
assert o.nunique(dropna=False) == len(np.unique(o.values)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the dropna=False and a comment explaining why
|
||
def test_ndarray_compat_properties(self): | ||
def test_ndarray_compat_properties(self, index_or_series_obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just fixed indentation
|
||
expected_s = Series( | ||
range(10, 0, -1), index=expected_index, dtype="int64", name="a" | ||
def test_value_counts_unique_nunique(self, index_or_series_obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed indentation and added xfail. All other changes are marked with comments
assert o.dtype == orig.dtype | ||
|
||
expected_s = Series( | ||
range(len(orig), 0, -1), index=expected_index, dtype="int64" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the length of range dynamic as not all values of index_or_series_obj
share the same length
elif needs_i8_conversion(o): | ||
values[0:2] = iNaT | ||
values = o._shallow_copy(values) | ||
def test_value_counts_unique_nunique_null(self, null_obj, index_or_series_obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed indentation and added skips. All other changes are marked with comments
expected_data = list(range(num_values, 2, -1)) | ||
expected_data_na = expected_data.copy() | ||
if expected_data_na: | ||
expected_data_na.append(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- length of expected_data is now dynamic
- using append instead of inline if/else statement
pandas/tests/base/test_ops.py
Outdated
assert result.dtype == orig.dtype | ||
|
||
assert o.nunique() == max(0, num_values - 2) | ||
assert o.nunique(dropna=False) == max(0, num_values - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- comparing against dynamic value depending on the length of the fixture value
- using
max(0, ...)
so tests don't break with the empty index
I addressed all your comments @jbrockmendel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, some comments. ping on green.
pandas/tests/base/test_ops.py
Outdated
values = o._ndarray_values | ||
num_values = len(orig) | ||
|
||
if not allow_na_ops(o): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd don't we have a way to avoid using skips inside the test itself and rather do this externally? I don't recall exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can only do this via the pytest.mark.skip
decorator. But this approach doesn't work here, because we need a condition on the parametrized fixture value. A quick Google search also didn't bring up anything better
expected_index.name = None | ||
o = o.repeat(range(1, len(o) + 1)) | ||
o.name = "a" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future PR, let's break these huge tests up
@jreback CI is green and I addressed all your comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going to merge, but there are a number of weird constructs / duplications here, pls address these as a followup (or open an issue)
|
||
# special assign to the numpy array | ||
if is_datetime64tz_dtype(obj): | ||
if isinstance(obj, DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be simplified; the else path should work for all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref: #32205
if isinstance(o, (DatetimeIndex, PeriodIndex)): | ||
expected_index = o.copy() | ||
expected_index.name = None | ||
elif needs_i8_conversion(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also duplicative here
obj = klass(values.repeat(range(1, len(obj) + 1))) | ||
obj.name = "a" | ||
else: | ||
if isinstance(obj, DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can't possibly be true (as the first if already catches this)
thanks. ideally very targeted PRs are best as we can merge quickly. |
That's the plan @jreback I wanted to focus on fixturizing the tests before refactoring. Seeing how this PR went, it apparently makes more sense to refactor right away and go test by test instead ✌️ |
part of #23877
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Note: The diff is a bit inflated. Most changes are just indentation because for loops are replaced by a parametrized fixture.