Groupby filter doesn't work with repeated index #4620

hayd · 2013-08-21T15:33:38Z

Example:

In [1]: data = pandas.DataFrame(
    {'pid' : [1,1,1,2,2,3,3,3],
     'tag' : [23,45,62,24,45,34,25,62],
     }, index=[0] * 8)

In [2]: g = data.groupby('tag')

In [3]: g.filter(lambda x: len(x) > 1)
Exception: Reindexing only valid with uniquely valued Index objects

cc #3680

(I had thought there was also a similar behaviour with transform, but that seems ok.)
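Until filter copes with a repeated index, one possible workaround is to make the keep/drop decision by position rather than by label. The sketch below is only an assumption about what might help, not the fix discussed later in this thread: it looks up each row's group size and applies the condition as a plain boolean array, so nothing is reindexed against the non-unique index. The names group_sizes and kept are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {'pid': [1, 1, 1, 2, 2, 3, 3, 3],
     'tag': [23, 45, 62, 24, 45, 34, 25, 62]},
    index=[0] * 8)

# Size of each tag group, looked up per row. value_counts() is indexed by
# the distinct tags (a unique index), so the lookup never touches data.index.
group_sizes = data['tag'].map(data['tag'].value_counts())

# Apply the condition as a positional boolean array, not a labelled Series.
kept = data[(group_sizes > 1).to_numpy()]
print(kept)
```

Only the rows whose tag occurs more than once (45 and 62) remain, and they keep their original repeated index labels.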

hayd · 2013-10-14T22:15:07Z

cc @danielballan while you're looking at the related issue :)

danielballan · 2013-10-29T15:43:55Z

There is a problem with transform also. There are four cases to consider: filter and transform, each applied to a DataFrameGroupBy and to a SeriesGroupBy. Transforming a DataFrameGroupBy works.

In [28]: df_groupby = data.groupby('tag') # a DataFrameGroupBy object

In [29]: df_groupby.transform(len)
Out[29]: 
   pid
0    1
0    1
0    1
0    1
0    2
0    2
0    2
0    2

But the other three cases do not.

In [30]: df_groupby.filter(lambda x: len(x) > 1)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects

In [31]: ser_groupby = data['pid'].groupby(data['tag'])

In [32]: ser_groupby.transform(len)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects

In [33]: ser_groupby.filter(lambda x: len(x) > 1)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
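The four cases can be checked in one pass with a small script. This is a rough sketch rather than a test from the pandas suite; the helper outcome and the dictionary report are hypothetical names introduced here. It records either 'ok' or the name of the exception raised, so it can be rerun on any pandas version to see which cases still fail.

```python
import pandas as pd

data = pd.DataFrame(
    {'pid': [1, 1, 1, 2, 2, 3, 3, 3],
     'tag': [23, 45, 62, 24, 45, 34, 25, 62]},
    index=[0] * 8)


def outcome(call):
    """Run call() and return 'ok' or the name of the exception it raised."""
    try:
        call()
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        return type(exc).__name__
    return 'ok'


def more_than_one(group):
    return len(group) > 1


df_groupby = data.groupby('tag')                 # DataFrameGroupBy
ser_groupby = data['pid'].groupby(data['tag'])   # SeriesGroupBy

report = {
    'DataFrameGroupBy.transform': outcome(lambda: df_groupby.transform(len)),
    'DataFrameGroupBy.filter': outcome(lambda: df_groupby.filter(more_than_one)),
    'SeriesGroupBy.transform': outcome(lambda: ser_groupby.transform(len)),
    'SeriesGroupBy.filter': outcome(lambda: ser_groupby.filter(more_than_one)),
}

for name, status in report.items():
    print(name, status)
```

On the build used in the session above, the last three entries would be expected to show InvalidIndexError.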

danielballan · 2013-10-29T18:04:02Z

Can anyone explain why this is not a valid use of get_indexer?

DataFrame([1,2,3], [0, 0, 1]).index.get_indexer([0])
---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-42-ad97732098af> in <module>()
----> 1 DataFrame([1,2,3], [0, 0, 1]).index.get_indexer([0])

/home/dallan/pandas-danielballan/pandas/core/index.py in get_indexer(self, target, method, limit)
   1112 
   1113         if not self.is_unique:
-> 1114             raise InvalidIndexError('Reindexing only valid with uniquely'
   1115                                     ' valued Index objects')
   1116 

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

I would expect the output array([0, 1]).

jreback · 2013-10-29T18:08:57Z

You need to know whether the index is unique (index.is_unique). If it is, you can use get_indexer; otherwise use get_indexer_non_unique. They are separate methods because you don't generally want or need the non-unique handling, and finding non-unique indexers is somewhat more expensive performance-wise (though not much).

In [6]: df.index.get_indexer_non_unique([0])
Out[6]: (Int64Index([0, 1], dtype='int64'), array([], dtype=int64))
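The dispatch described above might be wrapped up roughly as follows. This is a minimal sketch, and positions_of is a hypothetical helper name, not part of pandas. It assumes only that Index.is_unique, Index.get_indexer and Index.get_indexer_non_unique behave as described in this thread, with the non-unique variant returning a pair of (positions, missing).

```python
import pandas as pd


def positions_of(index, labels):
    """Return the integer positions in `index` that match `labels`.

    Uses get_indexer when the index is unique, and falls back to the
    somewhat more expensive get_indexer_non_unique otherwise.
    """
    if index.is_unique:
        return index.get_indexer(labels)
    # The second element of the pair describes labels that were not found;
    # it is ignored in this sketch.
    indexer, _missing = index.get_indexer_non_unique(labels)
    return indexer


repeated = pd.DataFrame([1, 2, 3], [0, 0, 1]).index
print(positions_of(repeated, [0]))
```

For the index [0, 0, 1] and the label 0 this gives the positions 0 and 1, which is the result expected in the question above.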

danielballan · 2013-10-29T18:10:25Z

Ah, thanks.

danielballan · 2013-10-29T19:10:43Z

For reference, get_indexer_non_unique turned out not to be the key here. See the commit notes on #5375.

danielballan mentioned this issue Oct 29, 2013

BUG/TST: transform and filter on non-unique index, closes #4620 #5375

Merged

danielballan added a commit to danielballan/pandas that referenced this issue Oct 31, 2013

BUG/TST: transform and filter on non-unique index, closes pandas-dev#…

1a47ee4

…4620

jreback closed this as completed in #5375 Nov 1, 2013

jreback added a commit that referenced this issue Nov 1, 2013

Merge pull request #5375 from danielballan/filter-nonunique

7df68f6

BUG/TST: transform and filter on non-unique index, closes #4620
