BUG: Index.drop raising Error when Index has duplicates #38070

Merged on Dec 2, 2020 (13 commits)

Member

phofl commented Nov 25, 2020

@jbrockmendel This should deal with duplicates. For MultiIndex, a slice with a step size greater than zero was sometimes given, which dropped too many elements.
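
To illustrate the slice problem described above: if a lookup returns a slice with a step and the positions are expanded without honoring that step, more positions are selected than actually match. The following is a minimal sketch in plain Python, not the pandas implementation; `positions_from_slice` is a hypothetical helper introduced only for this illustration.

```python
from typing import List


def positions_from_slice(loc: slice, length: int) -> List[int]:
    """Expand a slice into explicit integer positions, honoring its step."""
    # slice.indices normalizes None/negative values for a sequence of `length`
    start, stop, step = loc.indices(length)
    return list(range(start, stop, step))


# Hypothetical lookup result: every second element of a 6-element index.
loc = slice(0, 6, 2)

# Ignoring the step selects every position in the range.
naive_positions = list(range(loc.start, loc.stop))

# Honoring the step selects only the matching positions.
step_aware_positions = positions_from_slice(loc, 6)

print(naive_positions)       # [0, 1, 2, 3, 4, 5]
print(step_aware_positions)  # [0, 2, 4]
```

Dropping `naive_positions` would remove all six elements, while only three of them were matched by the slice.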

    if self.is_unique:
        indexer = self.get_indexer(labels)
    else:
        indexer, _ = self.get_indexer_non_unique(labels)

Member

get_indexer_for?

Member Author

Did not know this function, thanks very much.
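
For readers who also do not know it: `Index.get_indexer_for` returns an indexer whether or not the index is unique, so the `is_unique` branch above is not needed. A small sketch with made-up example data (the indexes and labels below are illustrative and not taken from the PR):

```python
import pandas as pd

# Unique index: one position per label, -1 for a missing label.
unique_idx = pd.Index(["a", "b", "c"])
unique_pos = unique_idx.get_indexer_for(["b", "z"])

# Index with duplicates: every matching position is returned.
dup_idx = pd.Index(["a", "b", "a", "c"])
dup_pos = dup_idx.get_indexer_for(["a"])

# Removing those positions drops all occurrences of the label.
remaining = dup_idx.delete(dup_pos)

print(unique_pos.tolist())  # [1, -1]
print(dup_pos.tolist())     # [0, 2]
print(remaining.tolist())   # ['b', 'c']
```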

@phofl added the Index (Related to the Index class or subclasses) and MultiIndex labels Nov 25, 2020
@@ -1496,6 +1496,13 @@ def test_drop_tuple(self, values, to_drop):
            with pytest.raises(KeyError, match=msg):
                removed.drop(drop_me)

    def test_drop_with_duplicates_in_index(self):

Member

you could use the index fixture and do something like

def test_drop_with_duplicates(self, index):
    if len(index) == 0: return
    index = index.repeat(2)  #  <-- ensure duplicates
    res = index.drop(index[0])

Member Author

Better? Is there a way to construct expected without drop?

    # GH38051
    if len(index) == 0:
        return
    expected = index.drop(index[0]).repeat(2)

Member

this works, but ideally we'd form expected without using drop. could do

index = index.unique()
index = index.repeat(2)
expected = index[2:]
result = index.drop(index[0])

Member Author

Had a similar idea, but the fixture contains indexes with duplicates. If drop is bad, we could use unique and index[1:] afterwards?

Member

can you write your idea out explicitly? I'm not clear on how it is different from what I wrote

Member Author

@phofl Nov 26, 2020

Sorry, forget what I have said. I missed your first line
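
The approach agreed on in this thread can be tried outside the test suite as follows. This is a sketch only: the concrete `index` is a hypothetical stand-in for one value of the `index` fixture, and it assumes a pandas version that includes this fix.

```python
import pandas as pd
from pandas.testing import assert_index_equal

# Hypothetical input standing in for one value of the `index` fixture;
# it already contains a duplicate, as some fixture entries do.
index = pd.Index([3, 1, 2, 1])

index = index.unique()   # remove pre-existing duplicates: [3, 1, 2]
index = index.repeat(2)  # ensure duplicates: [3, 3, 1, 1, 2, 2]

# expected is built by slicing, without calling drop
expected = index[2:]
result = index.drop(index[0])

assert_index_equal(result, expected)
```

Calling `unique()` first is what makes `index[2:]` a valid expectation: after `repeat(2)` the first label occupies exactly the first two positions.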

Member Author

phofl commented Nov 26, 2020

The test now exits early for the MultiIndex cases, because we would have to catch a PerformanceWarning there if the index is not sorted, and this is already tested in another place.

@jreback added this to the 1.2 milestone Dec 2, 2020
Contributor

jreback commented Dec 2, 2020

looks good, @phofl can you merge master and ping on green.

def test_drop_with_non_monotonic_duplicates():
    # GH#33494
    mi = MultiIndex.from_tuples([(1, 2), (2, 3), (1, 2)])
    with warnings.catch_warnings():

Member

could maybe use @pytest.mark.filterwarnings as a follow-up
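
A sketch of what that follow-up might look like. The test body after the truncated `with` line is not shown in this conversation, so the `drop` call and the `expected` value below are assumptions made for illustration, as is the exact filter string.

```python
import pytest
from pandas import MultiIndex
from pandas.testing import assert_index_equal


# The marker takes the place of an explicit warnings.catch_warnings() block
# when the test is collected and run by pytest.
@pytest.mark.filterwarnings("ignore::pandas.errors.PerformanceWarning")
def test_drop_with_non_monotonic_duplicates():
    mi = MultiIndex.from_tuples([(1, 2), (2, 3), (1, 2)])
    # Assumed continuation of the truncated test: drop the duplicated key
    result = mi.drop((1, 2))
    expected = MultiIndex.from_tuples([(2, 3)])
    assert_index_equal(result, expected)
```

The marker only takes effect under pytest; calling the function directly does not install the filter.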

Member Author

phofl commented Dec 2, 2020

Thanks for merging. Found that warnings call in another test. I will look through this as a follow-up and clean it up.

@simonjayhawkins
Member

@phofl test_drop_with_duplicates_in_index failing on 32bit (not all tests completed on other envs yet)

Member Author

phofl commented Dec 2, 2020

Dtype casting issue, will look into this later.

Member Author

phofl commented Dec 2, 2020

I think this is a configuration that cannot work on 32-bit for this one test, so I skipped it.

Member Author

phofl commented Dec 2, 2020

@jreback green

@jreback merged commit 73d0d34 into pandas-dev:master Dec 2, 2020
Contributor

jreback commented Dec 2, 2020

thanks @phofl

@phofl deleted the 38051 branch December 3, 2020 22:14
Labels
Index Related to the Index class or subclasses MultiIndex
4 participants