-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Index.drop raising Error when Index has duplicates #38070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexes/base.py
Outdated
if self.is_unique: | ||
indexer = self.get_indexer(labels) | ||
else: | ||
indexer, _ = self.get_indexer_non_unique(labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_indexer_for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not know this function, thanks very much.
pandas/tests/indexes/test_base.py
Outdated
@@ -1496,6 +1496,13 @@ def test_drop_tuple(self, values, to_drop): | |||
with pytest.raises(KeyError, match=msg): | |||
removed.drop(drop_me) | |||
|
|||
def test_drop_with_duplicates_in_index(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use the index
fixture and do something like
def test_drop_with_duplicates(self, index):
if len(index) == 0: return
index = index.repeat(2) # <-- ensure duplicates
res = index.drop(index[0])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better? Is there a way to construc expected without drop?
� Conflicts: � doc/source/whatsnew/v1.2.0.rst
pandas/tests/indexes/test_base.py
Outdated
# GH38051 | ||
if len(index) == 0: | ||
return | ||
expected = index.drop(index[0]).repeat(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this works, but ideally we'd form expected
without using drop
. could do
index = index.unique()
index = index.repeat(2)
expected = index[2:]
result = index.drop(index[0])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a similar idea, but the fixture contains indexes with duplicates. If drop is bad, we could use unique and index[1:] atfterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you write your idea out explicitly? im not clear on how it is different from what i wrote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, forget what I have said. I missed your first line
Exited for the MultiIndex cases, because would have to catch a PerformanceWarning there if it is not sorted and we are testing this at another place too. |
looks good, @phofl can you merge master and ping on green. |
def test_drop_with_non_monotonic_duplicates(): | ||
# GH#33494 | ||
mi = MultiIndex.from_tuples([(1, 2), (2, 3), (1, 2)]) | ||
with warnings.catch_warnings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could maybe use @pytest.mark.filterwarnings
as a follow-up
Thanks for merging. Found that warnings call in another Test. Will look through as a follow up and clean this up |
@phofl test_drop_with_duplicates_in_index failing on 32bit (not all tests completed on other envs yet) |
Dtype casting issue, will look into this later. |
I think this is a configuration which can not work on 32 bit for this one test, so I skipped it |
@jreback green |
thanks @phofl |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
@jbrockmendel This should deal with duplicates. For MultiIndex sometimes a slice with stepzize greater than zero was given, which dropped to many elements