BUG: partial nan levels missing when grouping #10484
Maybe this example makes my point from #10468 a little clearer:

```
In [18]: df1 = pd.DataFrame({'A' : [1,2,3]},index=pd.MultiIndex.from_tuples([(1,np.nan),(1,1),(2,1)],names=['first','second']))

In [19]: df1
Out[19]:
              A
first second
1     NaN     1
      1       2
2     1       3

[3 rows x 1 columns]

In [20]: df1.groupby(level=['first','second']).groups
Out[20]: {(1, 1): [(1, nan), (1, 1.0)], (2, 1): [(2, 1.0)]}

In [21]: df2 = df1.reset_index()

In [22]: df2
Out[22]:
   first  second  A
0      1     NaN  1
1      1       1  2
2      2       1  3

[3 rows x 3 columns]

In [23]: df2.groupby(['first','second']).groups
Out[23]: {(1, nan): [0], (1, 1.0): [1], (2, 1.0): [2]}
```

Out[20] has no (1, nan) key (the NaN row is folded into the (1, 1) group instead), but Out[23] does. Since the documentation says that NaN values in the grouping key are skipped, I would rather argue that Out[23] should skip (1, nan). Out[43] from jreback above would then actually be correct.
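For readers on a recent pandas, the two grouping paths from the session can be compared again. This is a sketch assuming pandas 1.1 or later, where `DataFrame.groupby` accepts a `dropna` keyword that defaults to `True`; it only counts groups with `size()` and makes no claim about what the pandas version used in the session above returns.

```python
import numpy as np
import pandas as pd

# Same frame as In [18]: one row has NaN in the 'second' level.
df1 = pd.DataFrame(
    {"A": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [(1, np.nan), (1, 1), (2, 1)], names=["first", "second"]
    ),
)
df2 = df1.reset_index()

# Default dropna=True: rows whose key contains NaN are excluded,
# whether grouping by index level or by column.
by_level = df1.groupby(level=["first", "second"]).size()
by_column = df2.groupby(["first", "second"]).size()

# dropna=False keeps (1, NaN) as a group of its own (column grouping).
by_column_keep = df2.groupby(["first", "second"], dropna=False).size()

print(by_level)
print(by_column)
print(by_column_keep)
```

Under that assumption, both default calls should report two groups, (1, 1.0) and (2, 1.0), so level and column grouping agree, and only the `dropna=False` call keeps a (1, NaN) group.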
The np.nan group is included in the result when you specify
From #10468: I suppose that [43] is missing the (1, nan) group here.
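The question comes down to which of two rules a grouping should follow when a key contains NaN: drop the row entirely (what the documentation describes) or give NaN its own group (what Out[23] does). A minimal pure-Python sketch of both rules follows; `group_labels` and its `skip_nan` flag are hypothetical names for illustration, not pandas API.

```python
import math
from collections import defaultdict


def _is_nan(value):
    """True for float NaN; other values (ints, strings, None) are never NaN here."""
    return isinstance(value, float) and math.isnan(value)


def group_labels(keys, skip_nan=True):
    """Map each key tuple to the row positions that carry it (hypothetical helper).

    skip_nan=True: any row whose key contains NaN is left out entirely.
    skip_nan=False: NaN components are normalised so they form their own group.
    """
    groups = defaultdict(list)
    for pos, key in enumerate(keys):
        if any(_is_nan(part) for part in key):
            if skip_nan:
                continue
            # NaN != NaN, so distinct NaN objects would not match as dict keys;
            # replace them with None so all NaN components share one group.
            key = tuple(None if _is_nan(part) else part for part in key)
        groups[key].append(pos)
    return dict(groups)


keys = [(1, float("nan")), (1, 1.0), (2, 1.0)]
print(group_labels(keys))                  # {(1, 1.0): [1], (2, 1.0): [2]}
print(group_labels(keys, skip_nan=False))  # {(1, None): [0], (1, 1.0): [1], (2, 1.0): [2]}
```

Note that neither rule folds the NaN row into the (1, 1) group the way Out[20] does; normalising to a sentinel is needed because `float('nan') != float('nan')`, so two rows holding separate NaN objects would otherwise end up in different groups.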