Skip to content

MultiIndex reindex should behave like Index. #7895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rockg opened this issue Aug 1, 2014 · 7 comments
Open

MultiIndex reindex should behave like Index. #7895

rockg opened this issue Aug 1, 2014 · 7 comments
Labels
Enhancement Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@rockg
Copy link
Contributor

rockg commented Aug 1, 2014

xref #10347

I'm having trouble understand how exactly reindex is supposed to work in the case of MultiIndex. What I would like is the exact behavior in the Index case where values can be filled or padded according to some level of the MultiIndex. I can't understand what reindex is doing in the MultiIndex case. The below is an example where I would like to interpolate using the prior value a month frame to an hourly frame.

import pandas as pd
d1 = pd.date_range('1/1/2012', '3/1/2012', freq='MS')
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=d1)
dfu = df.unstack()

dfu
Out[12]: 
A  2012-01-01    1
   2012-02-01    2
   2012-03-01    3
B  2012-01-01    4
   2012-02-01    5
   2012-03-01    6
dtype: int64

d2 = pd.date_range('1/1/2012', '3/31/2012', freq='D')

dfu.reindex(d2, level=1) # What is this supposed to do?
Out[17]: 
A  2012-01-01    1
   2012-02-01    2
   2012-03-01    3
B  2012-01-01    4
   2012-02-01    5
   2012-03-01    6
dtype: int64

df.reindex(d2, method='pad').unstack() #Ideally something like this, but without the trickery
Out[19]: 
A  2012-01-01    1
   2012-01-02    1
   2012-01-03    1
   2012-01-04    1
   2012-01-05    1
   2012-01-06    1
   2012-01-07    1
   2012-01-08    1
   2012-01-09    1
   2012-01-10    1
   2012-01-11    1
   2012-01-12    1
   2012-01-13    1
   2012-01-14    1
   2012-01-15    1
...
B  2012-03-17    6
   2012-03-18    6
   2012-03-19    6
   2012-03-20    6
   2012-03-21    6
   2012-03-22    6
   2012-03-23    6
   2012-03-24    6
   2012-03-25    6
   2012-03-26    6
   2012-03-27    6
   2012-03-28    6
   2012-03-29    6
   2012-03-30    6
   2012-03-31    6
Length: 182, dtype: int64
@jreback
Copy link
Contributor

jreback commented Aug 1, 2014

we were just discussing this today
see #7886

@rockg
Copy link
Contributor Author

rockg commented Aug 1, 2014

That seemed kind of different to me, but maybe I'm not looking closely enough.

@jreback
Copy link
Contributor

jreback commented Aug 1, 2014

see also comments in #7867

@jreback
Copy link
Contributor

jreback commented Aug 1, 2014

you can do this easily with a groupby

something like

df.groupby(level=0).apply(lambda x: x.reindex(d2,method='pad')

prob could directly make reindex a groupby method (so make this syntactically nicer)
maybe have reindex to under the hood

@jreback
Copy link
Contributor

jreback commented Aug 1, 2014

Ya, I guess this is a bit awkward

In [29]: dfu.reset_index().groupby('level_0').apply(lambda x: x.set_index('level_1').reindex(d2,method='pad'))
Out[29]: 
                   level_0  0
level_0                      
A       2012-01-01       A  1
        2012-01-02       A  1
        2012-01-03       A  1
        2012-01-04       A  1
        2012-01-05       A  1
        2012-01-06       A  1
        2012-01-07       A  1
        2012-01-08       A  1
        2012-01-09       A  1
        2012-01-10       A  1
        2012-01-11       A  1
        2012-01-12       A  1
        2012-01-13       A  1
        2012-01-14       A  1
        2012-01-15       A  1
        2012-01-16       A  1
        2012-01-17       A  1
        2012-01-18       A  1
        2012-01-19       A  1
        2012-01-20       A  1
        2012-01-21       A  1
        2012-01-22       A  1
        2012-01-23       A  1
        2012-01-24       A  1
        2012-01-25       A  1
        2012-01-26       A  1
        2012-01-27       A  1
        2012-01-28       A  1
        2012-01-29       A  1
        2012-01-30       A  1
...                    ... ..
B       2012-03-02       B  6
        2012-03-03       B  6
        2012-03-04       B  6
        2012-03-05       B  6
        2012-03-06       B  6
        2012-03-07       B  6
        2012-03-08       B  6
        2012-03-09       B  6
        2012-03-10       B  6
        2012-03-11       B  6
        2012-03-12       B  6
        2012-03-13       B  6
        2012-03-14       B  6
        2012-03-15       B  6
        2012-03-16       B  6
        2012-03-17       B  6
        2012-03-18       B  6
        2012-03-19       B  6
        2012-03-20       B  6
        2012-03-21       B  6
        2012-03-22       B  6
        2012-03-23       B  6
        2012-03-24       B  6
        2012-03-25       B  6
        2012-03-26       B  6
        2012-03-27       B  6
        2012-03-28       B  6
        2012-03-29       B  6
        2012-03-30       B  6
        2012-03-31       B  6

[182 rows x 2 columns]

@markb-trustifi
Copy link

markb-trustifi commented Jun 22, 2018

In my case the solution has been to create a full multiindex by:
midx = pd.MultiIndex.from_product([pd.date_range(df['date'].min(), df['date'].max()), range(24)])
and after that to apply it to the original data frame:
df = df.reindex(midx)

@ms7463
Copy link
Contributor

ms7463 commented Dec 15, 2021

Could create a utility like this.

def mi_reindex(df, idx, axis=0):
    if axis == 1:
        df = df.T
    if not isinstance(idx, pd.Index):
        raise ValueError('idx must be an index object')
    if (not all(idx.names)) or (not all(df.index.names)):
        raise ValueError('All indexes must have non-null names')
    if set(idx.names) - set(df.index.names):
        raise ValueError('idx names must be a subset of df index names')

    meta = df.index.to_frame(index=False)
    idx_df = idx.to_frame(index=False)
    new_meta = meta.drop(idx_df.columns, axis=1).drop_duplicates().merge(idx_df, how='cross')[meta.columns]
    df = df.reindex(pd.MultiIndex.from_frame(new_meta))
    if axis == 1:
        df = df.T
    return df
dfu = dfu.rename_axis(['Letters', 'Dates'])
d2 = d2.rename('Dates')

mi_reindex(dfu, d2).ffill()

@jreback - does there exist a "blessed" library for pandas utils that don't quite belong in core, but show up often enough to warrant being cataloged somewhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

No branches or pull requests

6 participants