reindex from a duplicate axis: inconsistent behaviour #8849

urraca · 2014-11-18T14:14:21Z

The behaviour below occurs in version '0.15.1'.

When a series has a duplicate index, the method reindex will raise an exception, unless the index passed to reindex is identical to the series' index.

I propose that when a series has a duplicate index, the method reindex should always raise an exception, because when a series with a duplicate index is to be conformed to a new index, the intended behaviour is always ambiguous.

This issue applies to the methods reindex_like and reindex_axis too.

Examples of current behaviour:

(a)

>>> pd.Series([1, 2, 3], index=['a', 'b', 'b']).reindex(['a', 'b'])
ValueError: cannot reindex from a duplicate axis

(b)

>>> pd.Series([1, 2, 3], index=['a', 'b', 'b']).reindex(['a', 'b', 'b'])
a    1
b    2
b    3
dtype: int64

The exception message in (a) implies that (b) should raise; but it doesn't.
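For illustration, the proposed "always raise" rule could be sketched as a small wrapper. `strict_reindex` is a hypothetical helper, not a pandas API; it only assumes that `Index.is_unique`, `Index.duplicated` and `Series.reindex` behave as documented.

```python
import pandas as pd


def strict_reindex(series, new_index):
    """Reindex, but refuse any Series whose index has duplicate labels.

    Hypothetical helper sketching the proposal; not part of pandas.
    """
    if not series.index.is_unique:
        # Collect the offending labels so the error message is actionable.
        dupes = series.index[series.index.duplicated()].unique().tolist()
        raise ValueError(f"cannot reindex from a duplicate axis: {dupes}")
    return series.reindex(new_index)


s = pd.Series([1, 2, 3], index=["a", "b", "b"])

# Under this rule both (a) and (b) raise, whatever the target index is.
for target in (["a", "b"], ["a", "b", "b"]):
    try:
        strict_reindex(s, target)
    except ValueError as exc:
        print(target, "->", exc)
```

With this check, the outcome depends only on whether the source index is unique, not on whether the target happens to equal it.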


jreback · 2014-11-18T21:43:20Z

Can you try changing it and see what breaks in the test suite?

The comparison of index objects on a reindex is really for efficiency: since they are equal (or identical), no reindexing is necessary.

Why are you suggesting that this should raise? (meaning what is the use case)

urraca · 2014-11-18T23:19:51Z

In (b), why is what is returned the right answer? Why should it not be:

a    1
b    2
b    2
dtype: int64

It seems to me that this sort of ambiguity is the justification for (a) raising.
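To make the ambiguity concrete, here is a minimal plain-Python sketch (no pandas involved, and not how pandas resolves labels) of two label-based lookups that are equally defensible for the series in (b). The names `first` and `last` are only illustrative.

```python
values = [1, 2, 3]
labels = ["a", "b", "b"]
target = ["a", "b", "b"]

first = {}
last = {}
for label, value in zip(labels, values):
    first.setdefault(label, value)  # keep the first value seen per label
    last[label] = value             # overwrite, so the last value wins

first_match = [first[label] for label in target]
last_match = [last[label] for label in target]

print(first_match)  # [1, 2, 2]
print(last_match)   # [1, 3, 3]
```

Neither result matches what (b) currently returns, which is the original values in their original order.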

I'll have a think about use cases and I'll look at the test suite.

jreback · 2014-11-18T23:35:52Z

@urraca ok, have a look, but I don't think what you just showed, e.g.

In [2]: Series([1,2,2],['a','b','b'])
Out[2]: 
a    1
b    2
b    2
dtype: int64

would be correct (e.g. why would it arbitrarily take the 2 and not the 3?)

It returns it unchanged, and doesn't do anything. You are suggesting it should raise.
So have a look at the test suite and see if this would impact anything.

This is what it would 'reindex' to if it was not exactly the same

In [7]: s.take(s.index.get_indexer_non_unique(s.index)[0])
Out[7]: 
a    1
b    2
b    3
b    2
b    3
dtype: int64
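The five-row result above can be reproduced with a rough plain-Python model of an "all matches" indexer. This is only a sketch of the idea, not pandas' implementation: `indexer_non_unique` is a hypothetical function, and it simply skips target labels that are absent from the source.

```python
def indexer_non_unique(source_labels, target_labels):
    """For each target label, return every position where it occurs in the source."""
    positions = []
    for wanted in target_labels:
        positions.extend(
            i for i, label in enumerate(source_labels) if label == wanted
        )
    return positions


labels = ["a", "b", "b"]
values = [1, 2, 3]

# Look the index up against itself, as in the take() call above.
positions = indexer_non_unique(labels, labels)
taken = [(labels[i], values[i]) for i in positions]

print(positions)  # [0, 1, 2, 1, 2]
print(taken)      # [('a', 1), ('b', 2), ('b', 3), ('b', 2), ('b', 3)]
```

Each of the two 'b' entries in the target matches both 'b' rows in the source, which is why the result grows from three rows to five.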

mrocklin · 2015-05-22T19:25:57Z

I ran into something like this in 0.16.1

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 1, 0])

In [3]: df.groupby(level=0).apply(lambda x: x)
ValueError: cannot reindex from a duplicate axis

I expected something like the same dataframe out again

jreback added the API Design label Nov 18, 2014

ivirshup mentioned this issue Dec 30, 2020

API/ ENH: Unambiguous indexing should be allowed, even if duplicates are present #38797

Open

mroeschke added Bug and removed API Design labels Apr 11, 2021

simonjayhawkins mentioned this issue Sep 8, 2021

BUG: a duplicated index would cause groupby.fillna(method='ffill') a wrong result #43412

Closed

