get_group sometimes throws an exception when using an index of tuples with different lengths #8121
Comments
I'll take a look. While you're probably right that this shouldn't throw an exception, storing containers in DataFrames is usually frowned upon. Something like
In [14]: gr = df.groupby(pd.factorize(df.ids)[0])
In [15]: for i in gr.size().index:
   ....:     print(i)
   ....:     gr.get_group(i)
   ....:
0
1
is usually better (faster and I think clearer). pd.factorize also returns the labels if you need those. |
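A self-contained version of the factorize suggestion, as a minimal sketch. The frame `df` and its `ids` column are assumed here to mirror the tuple data discussed in this thread. `pd.factorize` returns integer codes plus the unique values, so grouping by the codes avoids using tuples as group keys.

```python
import pandas as pd

# Assumed sample data: tuples of different lengths stored in one column.
df = pd.DataFrame({"ids": [(1,), (1, 2), (1,), (1, 2)]})

# codes holds one integer label per row; uniques holds the distinct tuples.
codes, uniques = pd.factorize(df["ids"])

# Group by the integer codes instead of the tuples themselves.
gr = df.groupby(codes)
for code in gr.size().index:
    # Map the integer code back to the original tuple for display.
    print(code, uniques[code])
    print(gr.get_group(code))
```

The group keys are plain integers here, so the lookup in `get_group` never has to decide whether a tuple means "one key that happens to be a tuple" or "one value per grouping column".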
@dwiel This is what you're expecting, right?
In [1]: good = pd.DataFrame([[1, 1, 1, 1], ['a', 'b', 'a', 'b']]).T
In [2]: bad = pd.DataFrame(pd.Series([(1,), (1,2), (1,), (1, 2)]), columns = ['ids'])
In [3]: gg = good.groupby([0, 1])
In [4]: gb = bad.groupby('ids')
In [5]: good
Out[5]:
   0  1
0  1  a
1  1  b
2  1  a
3  1  b
In [6]: bad
Out[6]:
      ids
0    (1,)
1  (1, 2)
2    (1,)
3  (1, 2)
In [9]: def run(gr):
   ...:     for i in gr.size().index:
   ...:         print(i)
   ...:         print(gr.get_group(i))
   ...:
In [10]: run(gg)
(1, 'a')
   0  1
0  1  a
2  1  a
(1, 'b')
   0  1
1  1  b
3  1  b
In [11]: run(gb)
(1,)
    ids
0  (1,)
2  (1,)
(1, 2)
      ids
1  (1, 2)
3  (1, 2) |
The factorize code does appear to do what I want. As for your second comment: yes, that looks like how I would expect it to work. |
Should be fixed now. Like I said, you're probably better off with the pd.factorize approach. Thanks for the report! |
Thanks! On Wed, Aug 27, 2014 at 9:59 PM, Tom Augspurger wrote:
|
Here is a simple test case that exposes the problem:
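The snippet itself is not preserved in this copy of the thread. A minimal sketch of such a test case, assuming it resembles the `bad` frame from the comments (a single `ids` column holding tuples of different lengths):

```python
import pandas as pd

# Assumed reconstruction: one column whose values are tuples of unequal length.
df = pd.DataFrame({"ids": [(1,), (1, 2), (1,), (1, 2)]})
gr = df.groupby("ids")

# Look up the group whose key is the one-element tuple (1,).
# On versions with the fix this returns rows 0 and 2; the report describes
# this kind of lookup as sometimes raising on affected versions.
result = gr.get_group((1,))
print(result)
```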
The issue is that in _get_index of GroupBy, these lines assume that if there is a tuple in the index, then the index is a multi-index, which isn't true in the above test case. Maybe there is some other way to detect that values are from a multi-index, or should pandas explicitly not support tuples in this situation (in an index of a groupby)?
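The ambiguity can be illustrated without pandas. The sketch below is a hypothetical stand-in, not the actual `_get_index` code: `lookup_naive` and `lookup_direct` are invented names, and the plain dict `indices` is an assumed simplification of the mapping from group key to row positions.

```python
def lookup_naive(indices, name):
    """Hypothetical lookup that treats any tuple key as a multi-key group."""
    sample = next(iter(indices))
    # Assumption under test: a tuple sample means one element per grouping
    # column, so every requested key must have the same length as the sample.
    if isinstance(sample, tuple) and len(name) != len(sample):
        raise ValueError("key length does not match the number of grouping columns")
    return indices[name]


def lookup_direct(indices, name):
    """Hypothetical alternative: use the key exactly as it was given."""
    return indices[name]


# Single grouping column whose values happen to be tuples of unequal length.
indices = {(1,): [0, 2], (1, 2): [1, 3]}

print(lookup_direct(indices, (1, 2)))
try:
    lookup_naive(indices, (1, 2))
except ValueError as err:
    print("naive lookup failed:", err)
```

The naive version fails only for keys whose length differs from whichever key is sampled first, which would be consistent with the lookup failing "sometimes" rather than always.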