Skip to content

CLN: Remove redundant tests for .duplicated and .drop_duplicates in tests/base #32487

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

SaturnFromTitan
Copy link
Contributor

@SaturnFromTitan SaturnFromTitan commented Mar 6, 2020

part of #23877

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

duplicated and drop_duplicates are already thoroughly tested in tests/indexes and tests/series. I added comments to highlight the redundancy and extended the existing test cases where needed.

@SaturnFromTitan SaturnFromTitan changed the title remove redundant duplicated test from tests/base/test_ops.py CLN: Remove redundant test from tests/base/test_ops.py Mar 6, 2020
@SaturnFromTitan SaturnFromTitan changed the title CLN: Remove redundant test from tests/base/test_ops.py CLN: Remove another redundant test from tests/base/test_ops.py Mar 6, 2020
@WillAyd
Copy link
Member

WillAyd commented Mar 6, 2020

Same comment as #32484 are you sure this doesn't lose any coverage?

@WillAyd WillAyd added the Testing pandas testing functions or related to the test suite label Mar 6, 2020
@SaturnFromTitan SaturnFromTitan changed the title CLN: Remove another redundant test from tests/base/test_ops.py CLN: Remove redundant tests for .duplicated and .drop_duplicates Mar 12, 2020
@SaturnFromTitan SaturnFromTitan changed the title CLN: Remove redundant tests for .duplicated and .drop_duplicates CLN: Remove redundant tests for .duplicated and .drop_duplicates in tests/base Mar 12, 2020
Comment on lines -666 to -697
idx = original.index[list(range(len(original))) + [5, 3]]
values = original._values[list(range(len(original))) + [5, 3]]
s = Series(values, index=idx, name="a")

expected = Series(
[False] * len(original) + [True, True], index=idx, name="a"
)
tm.assert_series_equal(s.duplicated(), expected)
tm.assert_series_equal(s.drop_duplicates(), original)

base = [False] * len(idx)
base[3] = True
base[5] = True
expected = Series(base, index=idx, name="a")

tm.assert_series_equal(s.duplicated(keep="last"), expected)
tm.assert_series_equal(
s.drop_duplicates(keep="last"), s[~np.array(base)]
)

base = [False] * len(original) + [True, True]
base[3] = True
base[5] = True
expected = Series(base, index=idx, name="a")

tm.assert_series_equal(s.duplicated(keep=False), expected)
tm.assert_series_equal(
s.drop_duplicates(keep=False), s[~np.array(base)]
)

s.drop_duplicates(inplace=True)
tm.assert_series_equal(s, original)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines -658 to -664
expected = Series(
[False] * len(original), index=original.index, name="a"
)
tm.assert_series_equal(original.duplicated(), expected)
result = original.drop_duplicates()
tm.assert_series_equal(result, original)
assert result is not original
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now tested in test_drop_duplicates_no_duplicates in tests/series/methods/test_drop_duplicates.py, see below

Comment on lines -651 to -655
with pytest.raises(
TypeError,
match=r"drop_duplicates\(\) got an unexpected keyword argument",
):
idx.drop_duplicates(inplace=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now tested in test_drop_duplicates_inplace in tests/indexes/test_common.py, see below

Comment on lines -618 to -649
# has_duplicates
assert not original.has_duplicates

# create repeated values, 3rd and 5th values are duplicated
idx = original[list(range(len(original))) + [5, 3]]
expected = np.array([False] * len(original) + [True, True], dtype=bool)
duplicated = idx.duplicated()
tm.assert_numpy_array_equal(duplicated, expected)
assert duplicated.dtype == bool
tm.assert_index_equal(idx.drop_duplicates(), original)

base = [False] * len(idx)
base[3] = True
base[5] = True
expected = np.array(base)

duplicated = idx.duplicated(keep="last")
tm.assert_numpy_array_equal(duplicated, expected)
assert duplicated.dtype == bool
result = idx.drop_duplicates(keep="last")
tm.assert_index_equal(result, idx[~expected])

base = [False] * len(original) + [True, True]
base[3] = True
base[5] = True
expected = np.array(base)

duplicated = idx.duplicated(keep=False)
tm.assert_numpy_array_equal(duplicated, expected)
assert duplicated.dtype == bool
result = idx.drop_duplicates(keep=False)
tm.assert_index_equal(result, idx[~expected])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in test_drop_duplicates in tests/indexes/test_common.py which was extended and refactored, see below

Comment on lines -609 to -616
# original doesn't have duplicates
expected = np.array([False] * len(original), dtype=bool)
duplicated = original.duplicated()
tm.assert_numpy_array_equal(duplicated, expected)
assert duplicated.dtype == bool
result = original.drop_duplicates()
tm.assert_index_equal(result, original)
assert result is not original
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now tested in test_drop_duplicates_no_duplicates in tests/indexes/test_common.py, see below

@SaturnFromTitan
Copy link
Contributor Author

SaturnFromTitan commented Mar 12, 2020

@WillAyd I added comments highlighting and linking the redundant tests.

Where needed I extended the existing tests in tests/indexes/test_common.py or tests/series/methods/test_drop_duplicates.py

@jreback jreback added this to the 1.1 milestone Mar 14, 2020
@jreback
Copy link
Contributor

jreback commented Mar 14, 2020

can you rebase, ping on green.

@jreback
Copy link
Contributor

jreback commented Mar 14, 2020

/azp run

@azure-pipelines
Copy link
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@SaturnFromTitan
Copy link
Contributor Author

@jreback CI is green

@jreback jreback merged commit 2b34275 into pandas-dev:master Mar 15, 2020
@jreback
Copy link
Contributor

jreback commented Mar 15, 2020

thanks @SaturnFromTitan

there might be other opportunities to use the keep fixture in the codebase as well.

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants