API: selection from a duplicated index of a Series return a Series? #5678

floux · 2013-12-11T13:10:33Z

See below for discussion

In [19]: import pandas as pd

In [20]: import numpy as np

In [21]: import random

In [22]: df = pd.DataFrame(np.random.random_sample((20,5)), index=[random.choice('ABCDE') for x in range(20)])

In [23]: df.loc[:,0].ix['A'].median()
Out[23]: 0.57704085832236685

In [24]: pd.version.version
Out[24]: '0.12.0'


In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: import random

In [4]: df = pd.DataFrame(np.random.random_sample((20,5)), index=[random.choice('ABCDE') for x in range(20)])

In [5]: df.loc[:,0].ix['A'].median()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-57f9fc9d1583> in <module>()
----> 1 df.loc[:,0].ix['A'].median()

AttributeError: 'numpy.ndarray' object has no attribute 'median'

In [6]: pd.version.version
Out[6]: '0.13.0rc1-43-g4f9fefc'


jreback · 2013-12-11T14:14:53Z

median works correctly.

In [5]: df.apply(lambda x: x.median())
Out[5]: 
0    0.478124
1    0.540118
2    0.325508
3    0.531052
4    0.537722
dtype: float64

In [6]: df.median()
Out[6]: 
0    0.478124
1    0.540118
2    0.325508
3    0.531052
4    0.537722
dtype: float64

In [8]: df.loc[:,0]
Out[8]: 
B    0.910850
C    0.495885
C    0.617074
A    0.382164
D    0.335341
B    0.705503
C    0.543724
D    0.742013
C    0.824611
C    0.272631
E    0.773018
C    0.460364
C    0.511589
D    0.662656
E    0.291188
E    0.031795
A    0.025877
D    0.127890
B    0.457107
C    0.446388
Name: 0, dtype: float64

You are selecting from a Series with a duplicated index. You could argue that you should get back another Series; however, if it were a uniquely indexed Series, you would get back a scalar (a float). I will mark this as a bug, but for the selection of a duplicate from a Series.

In [9]: df.loc[:,0].ix['A']
Out[9]: array([ 0.3821636 ,  0.02587671])
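To make the return-type question concrete, here is a minimal sketch with made-up values (the Series `s` and its numbers are illustrative, not taken from the session above). It assumes a recent pandas, where `.ix` has been removed, so it uses `.loc` instead. A label that occurs more than once should come back as a Series, while a label that occurs once comes back as a scalar.

```python
import pandas as pd

# Illustrative Series with a non-unique index: 'A' appears twice, 'B' once.
s = pd.Series([0.38, 0.03, 0.91], index=["A", "A", "B"])

dup = s.loc["A"]     # duplicated label: a Series holding both rows
single = s.loc["B"]  # label that occurs once: a scalar

print(type(dup).__name__)
print(type(single).__name__)

# Because the duplicated selection is a Series, Series methods are available.
print(dup.median())
```

The point is only that `.median()` is available on the duplicated selection because it is a Series rather than a bare `numpy.ndarray`.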

Select like this.

In [10]: df.loc['A',0]
Out[10]: 
A    0.382164
A    0.025877
Name: 0, dtype: float64
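As a rough sketch of the single-step selection, again with invented data and assuming a recent pandas: selecting the row label and the column in one `.loc` call avoids the chained lookup. Passing the row label as a list is one way to get a Series back even when the label occurs only once. The frame and the variable names below are hypothetical.

```python
import pandas as pd

# Illustrative frame: 'A' is duplicated in the index, 'C' is unique.
df = pd.DataFrame(
    {0: [0.91, 0.38, 0.03, 0.50], 1: [0.10, 0.20, 0.30, 0.40]},
    index=["B", "A", "A", "C"],
)

# Row label and column selected in a single .loc call.
one_step = df.loc["A", 0]

# A list of row labels yields a Series even for a label that occurs once.
as_list = df.loc[["C"], 0]

print(one_step.median())
print(type(as_list).__name__)
```

The list form trades a little verbosity for a return type that does not depend on whether the label happens to be duplicated.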

jreback · 2013-12-11T15:27:27Z

All fixed up.
That was a regression from 0.12.

floux · 2013-12-11T16:52:29Z

Cool! Thanks a lot!

jreback · 2013-12-11T18:05:38Z

gr8... it's merged into master, so you should be good to go.

jreback mentioned this issue Dec 11, 2013

BUG: Bug in repeated indexing of object with resultant non-unique index (GH5678) #5680

Merged

jreback closed this as completed in #5680 Dec 11, 2013

toobaz mentioned this issue Feb 19, 2015

API: Please make ".loc" return type depend on index, not on specific labels #9519

Open
