Skip to content

Use fixtures in pandas/tests/base #32046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 23, 2020

Conversation

SaturnFromTitan
Copy link
Contributor

part of #23877

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Note: The diff is a bit inflated. Most changes are just indentation because for loops are replaced by a parametrized fixture.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks generally good (I think the naming of the fixture is a little strange, though admittedly don't have anything better to offer atm)

A few questions

@WillAyd WillAyd added the Testing pandas testing functions or related to the test suite label Feb 19, 2020
@SaturnFromTitan
Copy link
Contributor Author

I think I addressed all your comments here @WillAyd

first = indices[2:]
second = indices[:4]
answer = indices[4:]
if isinstance(indices, CategoricalIndex) or indices.is_boolean():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related?

Copy link
Contributor Author

@SaturnFromTitan SaturnFromTitan Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. To make the indices fixture more useful I enhanced its length from 2 to 10 entries, which impacts the expected value here. In the same way, I could provide a proper "expected" for CategoricalIndex instead of just silently skipping it.

def test_none_comparison(self, series_with_differing_index):
o = series_with_differing_index
if isinstance(o.index, IntervalIndex):
pytest.skip("IntervalIndex is immutable")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does immutability have to do with this? all indexes should be immutable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to check. I think the IntervalIndex doesn't implement .loc or something like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a hunch, id imagine its like the datetime64 case where it raises for inequalities (in which case it can be added to the special case for datetime64 below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's failing on line 149 o[0] = np.nan already. The assignment fails with

AttributeError: 'pandas._libs.interval.IntervalTree' object has no attribute 'get_loc'

I added a comment and adjusted the skip message. Please let me know if you have a better idea

@jbrockmendel
Copy link
Member

@SaturnFromTitan if its easy for you, can you point out sections of the diff that are effectively just-de-indentation, should make it easier to focus review

@@ -137,227 +137,238 @@ def setup_method(self, method):
self.is_valid_objs = self.objs
self.not_valid_objs = []

def test_none_comparison(self):
def test_none_comparison(self, f8series_any_simple_index):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed indentation and added skips


assert o.nunique() == len(np.unique(o.values))
# dropna=True would break for MultiIndex
assert o.nunique(dropna=False) == len(np.unique(o.values))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the dropna=False and a comment explaining why


def test_ndarray_compat_properties(self):
def test_ndarray_compat_properties(self, index_or_series_obj):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fixed indentation


expected_s = Series(
range(10, 0, -1), index=expected_index, dtype="int64", name="a"
def test_value_counts_unique_nunique(self, index_or_series_obj):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed indentation and added xfail. All other changes are marked with comments

assert o.dtype == orig.dtype

expected_s = Series(
range(len(orig), 0, -1), index=expected_index, dtype="int64"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the length of range dynamic as not all values of index_or_series_obj share the same length

elif needs_i8_conversion(o):
values[0:2] = iNaT
values = o._shallow_copy(values)
def test_value_counts_unique_nunique_null(self, null_obj, index_or_series_obj):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed indentation and added skips. All other changes are marked with comments

expected_data = list(range(num_values, 2, -1))
expected_data_na = expected_data.copy()
if expected_data_na:
expected_data_na.append(3)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • length of expected_data is now dynamic
  • using append instead of inline if/else statement

assert result.dtype == orig.dtype

assert o.nunique() == max(0, num_values - 2)
assert o.nunique(dropna=False) == max(0, num_values - 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • comparing against dynamic value depending on the length of the fixture value
  • using max(0, ...) so tests don't break with the empty index

@SaturnFromTitan
Copy link
Contributor Author

I addressed all your comments @jbrockmendel
Also, I added comments for all real changes in the tests to aid the review

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, some comments. ping on green.

values = o._ndarray_values
num_values = len(orig)

if not allow_na_ops(o):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd don't we have a way to avoid using skips inside the test itself and rather do this externally? I don't recall exactly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can only do this via the pytest.mark.skip decorator. But this approach doesn't work here, because we need a condition on the parametrized fixture value. A quick Google search also didn't bring up anything better

expected_index.name = None
o = o.repeat(range(1, len(o) + 1))
o.name = "a"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future PR, let's break these huge tests up

@jreback jreback added this to the 1.1 milestone Feb 23, 2020
@SaturnFromTitan
Copy link
Contributor Author

@jreback CI is green and I addressed all your comments

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to merge, but there are a number of weird constructs / duplications here, pls address these as a followup (or open an issue)


# special assign to the numpy array
if is_datetime64tz_dtype(obj):
if isinstance(obj, DatetimeIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be simplified; the else path should work for all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref: #32205

if isinstance(o, (DatetimeIndex, PeriodIndex)):
expected_index = o.copy()
expected_index.name = None
elif needs_i8_conversion(obj):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also duplicative here

obj = klass(values.repeat(range(1, len(obj) + 1)))
obj.name = "a"
else:
if isinstance(obj, DatetimeIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can't possibly be true (as the first if already catches this)

@jreback jreback merged commit aa6f241 into pandas-dev:master Feb 23, 2020
@jreback
Copy link
Contributor

jreback commented Feb 23, 2020

thanks. ideally very targeted PRs are best as we can merge quickly.

@SaturnFromTitan
Copy link
Contributor Author

That's the plan @jreback I wanted to focus on fixturizing the tests before refactoring.

Seeing how this PR went, it apparently makes more sense to refactor right away and go test by test instead ✌️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants