limiting reindex with MultiIndex ffill/bfill within levels. #10347

bwillers · 2015-06-13T13:35:44Z

When reindexing on a multiindex with method='ffill' or method='bfill', it would be very useful to be able to restrict the fill to certain groups/levels of the index.

For example, consider the following:

In [1]: dates = pd.date_range(start=pd.Timestamp('20080102'), 
                              periods=3, freq='7D')
In [2]: names = ['jane', 'john']
In [3]: index = pd.MultiIndex.from_product([names, dates], 
                                           names=['name', 'date'])
In [4]: df = pd.DataFrame(index=index, 
                          data={'best_score': [1, 2, 3, None, 5, 6]})
In [5]: df
Out[5]:
                 best_score
name date
jane 2008-01-02           1
     2008-01-09           2
     2008-01-16           3
john 2008-01-02         NaN
     2008-01-09           5
     2008-01-16           6

In [7]: new_dates = [pd.Timestamp('20080101'), pd.Timestamp('20080117')]
In [8]: new_index = pd.MultiIndex.from_tuples([('jane', new_dates[1]),
                                               ('john', new_dates[0]), 
                                               ('john', new_dates[1])], 
                                              names=['name', 'date'])

In [9]: df.reindex(new_index, method='ffill')
Out[9]:
                 best_score
name date
jane 2008-01-17           3
john 2008-01-01           3
     2008-01-17           6

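Clearly john's score on 2008-01-01 is not 3; it should be NaN, since john has no observation on or before that date. The 3 is carried forward from jane's last row. What would be great (ignoring the awful argument name) is something like: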

df.reindex(new_index, method='ffill', fill_group_level=['name'])

                 best_score
name date
jane 2008-01-17           3
john 2008-01-01           NaN
     2008-01-17           6

This generalizes to indexes with more than two levels. In effect, it amounts to being able to specify a set of boundaries for the ffill/bfill based on changes in level values. I don't think this can be done in a straightforward way with a groupby(level='name') because the values of the index in the second level are not the same for every group.


jreback · 2015-06-13T14:33:18Z

this is a dupe of #7895
You want this:

In [22]: df.groupby(level='name').apply(lambda x: x.reset_index(level=0,drop=True).reindex(new_dates,method='ffill'))
Out[22]: 
                 best_score
name date                  
jane 2008-01-01         NaN
     2008-01-17           3
john 2008-01-01         NaN
     2008-01-17           6

But you actually want it to work with this syntax:

df.reindex(new_dates,method='ffill',level='date')

which is not supported ATM (but a good place for it to work).

I don't think there is an easy way to support a multi-level reindex as you have indicated (and IMHO it is too complicated). Better to allow a single-level fill, which is what you ultimately want.

bwillers · 2015-06-13T17:35:29Z

Thanks for the comments. The groupby approach you suggested doesn't do quite the same thing: in the example, the index being used intentionally does not have the same date values for every name, whereas the groupby approach assumes the same dates for every group. I guess you could then subset the dates based on the name, but it gets hairy pretty quickly.
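For illustration, the "subset the dates per group" variant could look roughly like the sketch below. It is a workaround under assumptions, not a proposed implementation: the names `pieces`, `wanted`, and `result` are made up for this example, and it assumes each group's date index is sorted, which `method='ffill'` requires.

```python
import pandas as pd

# Rebuild the frame and target index from the example above.
dates = pd.date_range(start=pd.Timestamp("20080102"), periods=3, freq="7D")
index = pd.MultiIndex.from_product([["jane", "john"], dates], names=["name", "date"])
df = pd.DataFrame(index=index, data={"best_score": [1, 2, 3, None, 5, 6]})

new_dates = [pd.Timestamp("20080101"), pd.Timestamp("20080117")]
new_index = pd.MultiIndex.from_tuples(
    [("jane", new_dates[1]), ("john", new_dates[0]), ("john", new_dates[1])],
    names=["name", "date"],
)

pieces = {}
for name, group in df.groupby(level="name"):
    # Keep only the dates that new_index requests for this particular name.
    mask = new_index.get_level_values("name") == name
    wanted = new_index[mask].get_level_values("date")
    # Forward-fill inside the group only, so values never cross a name boundary.
    pieces[name] = group.droplevel("name").reindex(wanted, method="ffill")

# Stitch the per-name results back into a (name, date) MultiIndex.
result = pd.concat(pieces, names=["name"])
print(result)
```

Note that names present in new_index but absent from df are silently dropped here, and the row order follows the groups rather than new_index, which is part of why this gets hairy.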

I did have a look at the issue you referenced but I'm not sure these are the same.

#7895 involves taking a multiindex A, and reindexing it with a plain index (or a multiindex with fewer levels) B (i.e. len(A.names) > len(B.names)), to broadcast across the levels that are absent in B. Coming up with sane and consistent broadcasting semantics for arbitrary multi indexes seems a very complex task.

In contrast, this issue is about taking a multiindex A and reindexing it with a multiindex C with the same number/name/type of levels (i.e. A.names == C.names), so there's no broadcasting involved. The only thing that's different from a vanilla df.reindex(C, method='ffill') is changing how far back/forward the ffill and bfill methods look to find a value, based on the levels passed. So the end result ends up looking a lot like what you would get if you line up the frames with an ordered left merge by name (related: #1870).
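As a rough sketch of that "ordered left merge by name" idea, current pandas versions provide pd.merge_asof, which matches each left row to the most recent right row within a `by` group. This is a workaround that approximates the requested behaviour, not the proposed reindex API; the variable names `left`, `right`, and `filled` are assumptions for the example, and the explicit cast to datetime64[ns] is only there so both merge keys share one dtype.

```python
import pandas as pd

# Rebuild the frame and target index from the original example.
dates = pd.date_range(start=pd.Timestamp("20080102"), periods=3, freq="7D")
index = pd.MultiIndex.from_product([["jane", "john"], dates], names=["name", "date"])
df = pd.DataFrame(index=index, data={"best_score": [1, 2, 3, None, 5, 6]})

new_dates = [pd.Timestamp("20080101"), pd.Timestamp("20080117")]
new_index = pd.MultiIndex.from_tuples(
    [("jane", new_dates[1]), ("john", new_dates[0]), ("john", new_dates[1])],
    names=["name", "date"],
)

# merge_asof works on flat columns and needs both sides sorted on the `on` key.
left = new_index.to_frame(index=False)
right = df.reset_index()
left["date"] = left["date"].astype("datetime64[ns]")
right["date"] = right["date"].astype("datetime64[ns]")
left = left.sort_values("date")
right = right.sort_values("date")

# direction="backward" plays the role of ffill; by="name" keeps the
# lookup from crossing from one name into another.
filled = (
    pd.merge_asof(left, right, on="date", by="name", direction="backward")
    .set_index(["name", "date"])
    .sort_index()
)
print(filled)
```

With direction="forward" the same call behaves like bfill within each name. Like reindex with method='ffill', it picks the nearest earlier row even if that row's value is NaN.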

jreback · 2015-06-13T17:39:43Z

Well, the interface is simply as I have stated above. So this needs to be addressed in the Index.reindex method (for MultiIndex). It's not implemented ATM, so feel free to have a crack at it. Forcing the user to specify a multi-level reindex with a filler is pretty complicated. This should work with a single level specified. It's possible that the multi-reindex with fill should simply not be allowed (if it gives the 'wrong' answer).

bwillers · 2015-06-13T17:48:57Z

Will take a crack at it, seems like a good reason to figure out how all this stuff works under the covers.

jreback · 2015-06-13T17:50:53Z

awesome!

mroeschke · 2024-09-12T21:30:18Z

Thanks for the request, but it appears there hasn't been much interest or activity in this feature for years so closing

jreback added the Indexing, Missing-data, Reshaping, and MultiIndex labels Jun 13, 2015

jreback added this to the Next Major Release milestone Jun 13, 2015

rockg mentioned this issue Jun 13, 2015

MultiIndex reindex should behave like Index. #7895

Open

jreback added Difficulty Advanced labels Jun 13, 2015

chris-b1 mentioned this issue Feb 13, 2017

reindex() doesn't work with MultiIndex #15384

Closed

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke added the Enhancement label and removed the Reshaping label Apr 18, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

mroeschke closed this as completed Sep 12, 2024
