NA is not included in MultiIndex.levels if we construct MI with nan #30750
Comments
This caused me a headache in the past - when merging together two dataframes by a multiindex, all the wrong rows seemed to match up, even when the multiindexes in each dataframe had the same values and the same inferred type. The root cause seemed to be because the |
cc @topper-123 if you have thoughts here. |
Related discussion: #29111 |
Seems like this is a duplicate of #29111? Or are there differences? |
@TomAugspurger slightly different, I think, since in the example of #29111 if defining the MI through Feel free to close if you think it is a duplicate. |
>>>: mi.to_frame()
a b
a b
A B A B
NaN A NaN
B A B A
So the Categorical probably could have been implemented differently (e.g. with nan being index 0 of the categories/levels). That would also have the benefit that Categoricals could be based on UInt64Index instead of Int64Index, which would in turn let us double the number of unique values in a Categorical (256 for 8-bit instead of 128, etc.). OTOH, changing that now might break backwards compatibility in a big way, which won't fly. I'd welcome thoughts on whether this can be implemented, but it would need to cause no (or maybe minimal) breakage to be accepted. |
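For reference, a minimal sketch of the code-width point above, assuming a small Categorical whose codes are stored as signed 8-bit integers with -1 reserved for missing values. The variable names are only illustrative, and the arithmetic just counts the non-negative slots available in each integer type.

```python
import numpy as np
import pandas as pd

# A small Categorical: missing values get the sentinel code -1,
# so the codes have to be a signed integer type.
cat = pd.Categorical(["x", np.nan, "y"])
codes_dtype = cat.codes.dtype
nan_code = int(cat.codes[1])

# Non-negative slots usable for categories in a signed 8-bit code...
signed_slots = np.iinfo(np.int8).max + 1
# ...versus what an unsigned 8-bit code could hold if nan were itself a category.
unsigned_slots = np.iinfo(np.uint8).max + 1

print(codes_dtype, nan_code, signed_slots, unsigned_slots)
```

This only illustrates the capacity argument; it says nothing about how hard the change would be inside pandas.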
thanks for your reply, @topper-123 ! |
Not sure I understand, but I doubt it's possible; e.g. just going from Int64Index to UInt64Index would likely be a very large change in itself, and adding nan to categories is also a huge change. |
Yeah, sorry for the bad interpretation; I meant providing an option to include nan in categorical and MI levels. And indeed, we have @topper-123 but you are right, it seems a lot of changes would be needed for such a change. |
If we construct an MI with nan and check the levels, the output does not contain nan.
Tracking it down, this is because pd.Categorical does not include NA in categories.
Meanwhile, inferred_type does indicate it is a mixed type, so np.nan should be accepted.
However, if the nan is produced by operations, the nan is included in the levels.
This is quite inconsistent, though. Is it intended behaviour?
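The original snippets did not survive on this page, so here is a minimal sketch of the first two observations. The sample values and names (`mi`, `cat`, `"x"`, `"y"`) are assumptions, not the reporter's actual example. It builds a MultiIndex from arrays containing `np.nan` and shows that the level omits nan and the position is encoded as -1, the same way `pd.Categorical` handles it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one nan in the first array.
mi = pd.MultiIndex.from_arrays(
    [["a", np.nan, "b"], [1, 2, 3]], names=["x", "y"]
)

# The level holds only the non-missing unique values...
first_level = mi.levels[0]
# ...and the nan position is marked with the sentinel code -1.
first_codes = mi.codes[0]
print(list(first_level), list(first_codes))

# pd.Categorical behaves the same way: nan is not a category.
cat = pd.Categorical(["a", np.nan, "b"])
print(list(cat.categories), list(cat.codes))

# The nan is still recoverable from the level values.
level_values_na = mi.get_level_values(0).isna().tolist()
print(level_values_na)
```

The case where nan ends up inside the levels after an operation is not reproduced here, since the original example is missing.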