API: should reindex on a level introduce NaNs for missing entries per label of other levels? #12319
Comments
If this is a leaf level then it's easy to do this, but what if it's not? Then do you do a cartesian product of everything below that level? We could, but would want a
xref #7895
Curious why the expected behavior should be different?
I spent a few hours looking into the issue, but I am not familiar enough with index joins to know what the problem is. I think it might be happening in the function Within the function Finally near the end of the function Hope that helps someone willing to fix it.
I ran into this yesterday and it caught me by surprise. If you do a So, yes, I would have very much expected it to be a cartesian product between the new index (passed into reindex) and all unique entries of the old index after the selected level got removed. For example, if I have a dataframe with a 3-level multi index:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(10).reshape(5, 2),
    index=pd.MultiIndex.from_tuples(
        [
            ("A", "a", 2),
            ("A", "b", 0),
            ("B", "a", 1),
            ("B", "a", 3),
            ("B", "d", 1),
        ]
    ),
)
and I reindex the "middle" index using
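Which is equivalent to this (naive, but hopefully easy to read) Python implementation:
# Position of the level being reindexed (the "middle" one).
level = 1
# new_index holds the new labels for that level (the values passed into reindex).
# Unique combinations of the remaining levels once the selected level is dropped.
remaining_levels = df.reset_index(level=level, drop=True).index.unique()
new_index_tuples = []
for old_levels in remaining_levels:
    for new_level in new_index:
        # Splice the new label back in at the position of the dropped level.
        new_index_tuples.append((*old_levels[:level], new_level, *old_levels[level:]))
target_index = pd.MultiIndex.from_tuples(new_index_tuples)
df.reindex(target_index)
(see also: https://stackoverflow.com/questions/75106282/how-to-reindex-a-datetime-based-multiindex-in-pandas/75106323)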
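The naive loop above can also be wrapped into a self-contained helper. This is only a sketch of the cartesian-product behaviour the comment asks for: `reindex_level_product` is a hypothetical name, not a pandas API, the helper assumes `level` is an integer position and that the original index has no duplicate entries, and the labels `["a", "b", "d"]` are an assumed choice for illustration.

```python
import numpy as np
import pandas as pd


def reindex_level_product(df, new_labels, level):
    """Reindex one MultiIndex level against every combination of the other levels.

    Hypothetical helper (not part of pandas). `level` is an integer position.
    """
    # Unique combinations of the remaining levels, in order of first appearance.
    remaining = df.index.droplevel(level).unique()
    # Dropping a level from a 2-level index yields a flat Index of scalars,
    # so wrap those in 1-tuples to keep the splicing below uniform.
    if isinstance(remaining, pd.MultiIndex):
        rest_tuples = list(remaining)
    else:
        rest_tuples = [(value,) for value in remaining]
    # Splice each new label back in at position `level`.
    tuples = [
        (*rest[:level], label, *rest[level:])
        for rest in rest_tuples
        for label in new_labels
    ]
    target = pd.MultiIndex.from_tuples(tuples, names=df.index.names)
    # Combinations that do not exist in the original frame come back as NaN rows.
    return df.reindex(target)


df = pd.DataFrame(
    np.arange(10).reshape(5, 2),
    index=pd.MultiIndex.from_tuples(
        [("A", "a", 2), ("A", "b", 0), ("B", "a", 1), ("B", "a", 3), ("B", "d", 1)]
    ),
)
# Assumed new labels for the middle level, chosen only for illustration.
result = reindex_level_product(df, ["a", "b", "d"], level=1)
print(result)
```

With this frame there are four unique combinations of the outer and inner levels, so three new labels give twelve rows, of which only the five original rows carry data.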
Suppose the following dataframe and reindex operation:
Should this give the following?
I am not sure what the exact behaviour of the level keyword should be, but e.g. in the following example it selects the columns for each label of the other level:
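As a rough illustration of the question, here is a minimal sketch on a made-up two-level frame (this is not the frame from the issue; the names `outer`, `inner` and `value` are assumptions). As far as I know, reindexing with `level=` only selects and orders the rows that already exist, so no NaN rows appear, whereas reindexing against an explicit product index does introduce them.

```python
import pandas as pd

# Hypothetical frame: label "c" is missing under outer=1, "b" under outer=2.
idx = pd.MultiIndex.from_tuples(
    [(1, "a"), (1, "b"), (2, "a"), (2, "c")], names=["outer", "inner"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=idx)

# Reindexing on a level: existing rows are matched per label of the other level.
by_level = df.reindex(["a", "b", "c"], level="inner")

# Reindexing against the full product: missing combinations become NaN rows.
full_index = pd.MultiIndex.from_product(
    [[1, 2], ["a", "b", "c"]], names=["outer", "inner"]
)
full = df.reindex(full_index)

print(by_level)
print(full)
```

The second result is what the issue title asks about: whether `level=` should produce those NaN rows itself.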