BUG: reindex on index in a frame using a not-None method is buggy #5669

jreback · 2013-12-09T16:20:44Z

reported here:

http://stackoverflow.com/questions/20459782/what-is-the-functionality-of-the-filling-method-when-reindexing

There do not appear to be any tests for it, nor is the fix trivial.

import pandas as pd
import numpy as np

hf_index = pd.date_range(start='2013-05-09 9:00', end='2013-05-13 23:59', freq='1min')
hf_prices = np.random.rand(len(hf_index))
hf = pd.DataFrame(hf_prices, index=hf_index)
hf.loc['2013-05-10 18:00':'2013-05-13 18:00', :] = np.nan
ind_daily = pd.date_range(start='2013-05-09 16:00', end='2013-05-13 16:00', freq='B')
daily1 = hf.reindex(index=ind_daily, method='ffill')


behzadnouri · 2013-12-09T20:33:58Z

I do not think this is a bug; in my opinion the current behavior makes more sense. NaN values can be valid "actual" values in some scenarios. An actual NaN value should be treated differently from a NaN that appears only because the index changed. If I have a dataframe like this:

       A      B      C
1  1.242    NaN  0.110
3    NaN -0.185 -0.209
5 -0.581  1.483    NaN

and I want to keep all NaN as NaN, it makes much more sense to have:

 df.reindex( [2, 4, 6], method='ffill' )
        A      B      C
2  1.242    NaN  0.110
4    NaN -0.185 -0.209
6 -0.581  1.483    NaN

That is, just take whatever value is there (NaN or not) and fill forward until the next available index. This is completely different from

df.reindex( [2, 4, 6], method=None )

which produces

    A   B   C
2 NaN NaN NaN
4 NaN NaN NaN
6 NaN NaN NaN

TL;DR: when reindexing a dataframe, forward fill should mean: take whatever value is there (NaN or not) and fill it forward until the next available index. Otherwise you have no choice but to fill NaN values simply because you want to reindex. Reindexing should not enforce a mandatory fillna on the data.
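
For reference, a minimal runnable version of the two calls above. The frame is rebuilt from the printed values, and the variable names (`index_filled`, `plain`) are only illustrative. It assumes a recent pandas release, where `method='ffill'` is documented as comparing the old and new indexes rather than looking at the values; older versions may behave differently.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the values printed in the comment above.
df = pd.DataFrame(
    {
        "A": [1.242, np.nan, -0.581],
        "B": [np.nan, -0.185, 1.483],
        "C": [0.110, -0.209, np.nan],
    },
    index=[1, 3, 5],
)
new_index = [2, 4, 6]

# Index-based fill: each new label takes the row of the closest earlier label,
# whether that row holds NaN or not.
index_filled = df.reindex(new_index, method="ffill")

# No fill: none of the new labels exist in the original index, so every cell is NaN.
plain = df.reindex(new_index)

print(index_filled)
print(plain)
```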

jreback · 2013-12-09T20:38:14Z

df.reindex(new_index, method='ffill') should be equivalent to df.reindex(new_index).ffill()

Your middle example can be done by taking a union of the existing indices and the new ones, then forward-filling.

You can also specify a fill_value if you want something other than NaN.
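
A rough sketch of the union-then-forward-fill idea and of `fill_value`, assuming the three-row example frame from earlier in the thread (rebuilt here from its printed values; `value_filled` and `zero_filled` are illustrative names). One caveat worth seeing in the output: `ffill()` works on values, so NaNs that were already present in the original rows get filled as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1.242, np.nan, -0.581],
        "B": [np.nan, -0.185, 1.483],
        "C": [0.110, -0.209, np.nan],
    },
    index=[1, 3, 5],
)
new_index = [2, 4, 6]

# Reindex onto the union of old and new labels, forward-fill the values,
# then keep only the new labels.
union = df.index.union(pd.Index(new_index))
value_filled = df.reindex(union).ffill().loc[new_index]

# fill_value replaces the NaN that reindex would insert for labels that are
# missing from the original index (here: all of them).
zero_filled = df.reindex(new_index, fill_value=0.0)

print(value_filled)
print(zero_filled)
```

In this sketch `value_filled` has 1.242 in column A at label 4, because the original NaN at label 3 was itself forward-filled.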

behzadnouri · 2013-12-09T20:43:21Z

"your middle example can be done by taking a union of the existing indices and the new, then forward-filling"

why?! why this way?

just keep reindexing as reindexing, and fillna as fillna. a "nan" value can be an actual valid value, when you reindex with ffill you want to take all the actual values ( nan or not nan) and fill forward until the next available index. Reindexing should not enforce a mandatory fillna on the data.

jreback · 2013-12-09T20:51:25Z

I am not sure what you mean by "Reindexing should not enforce a mandatory fillna on the data."

By definition np.nan is the marker for missing data. You have the option to provide a fill-forward if you want (or not); you can also fill with a specific value (fill_value=). But reindexing will, by definition, possibly create missing values. I am not sure that you can have both a missing value and an np.nan (unless you swap the NaN for something else first).

FYI, this behavior has been there for as long as I can remember. It IS tested for in the context of a non-monotonic index (this is an error), but not in the general case.

Series works this way, the bug is on DataFrame which does not.

behzadnouri · 2013-12-09T21:04:09Z

If I want to reindex with ffill just to forward fill whatever value is in the original dataframe ( again nan or not nan ) until the next available index, but I do not want to fillna what should I do?

what you are saying is that, reindex will do the fillna for me and then I have to revert that.

Here is an example: np.nan can just mean "not applicable". Say I have hourly data, and on weekends some calculations are simply not applicable, so I fill NaN for those columns during the weekends. Now if I reindex to a finer index, say every minute, the reindex will pick the last value from Friday and fill it out for the whole weekend. This is wrong.

If this is the behavior for time series, then maybe there should be a bug report there.

If I want to fillna I can always call fillna directly.
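
To make the weekend scenario concrete, here is a small sketch with made-up numbers: hourly data running from Friday evening into Saturday, where the Saturday rows are deliberately NaN. The column name `calc` and the values are hypothetical. It assumes a recent pandas release, where `reindex(..., method='ffill')` fills by index position only.

```python
import numpy as np
import pandas as pd

# Friday 2013-12-06 22:00 and 23:00 hold real values; Saturday 00:00 and 01:00
# are NaN on purpose ("not applicable").
hourly = pd.date_range("2013-12-06 22:00", "2013-12-07 01:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"calc": [1.0, 2.0, np.nan, np.nan]}, index=hourly)

# Finer target index: every 30 minutes over the same span.
fine = pd.date_range("2013-12-06 22:00", "2013-12-07 01:00", freq=pd.Timedelta(minutes=30))

# Fill by index: each new timestamp takes the row of the latest earlier timestamp,
# so the Saturday timestamps stay NaN.
by_index = df.reindex(fine, method="ffill")

# Reindex, then fill by value: Friday's last value is carried into Saturday.
by_value = df.reindex(fine).ffill()

print(by_index)
print(by_value)
```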

jreback · 2013-12-09T21:08:24Z

see here for the current docs: http://pandas.pydata.org/pandas-docs/dev/basics.html#filling-while-reindexing
Series currently does this, DataFrame does not

can you make your words into an example and show me what you mean?

behzadnouri · 2013-12-09T21:25:26Z

say you have trade data across markets, and you are measuring the correlations across these markets: Tokyo, London, New York, Chicago.

These markets open and close at different hours during the day, so for some periods you can measure corr(New York, London), but you just have to fill NaN for corr(New York, Tokyo) at, say, 11am EST, simply because it is not possible to measure the correlation when the market is closed.

Now, if I reindex the time series to a different frequency (say every half hour), it should not fill in the NaN values in the dataframe at the times the market is closed. It should just forward fill whatever is in the original dataframe.

is the example clear?

dbew · 2014-01-30T15:31:29Z

We came across this as well. We expected the behaviour similar to that @behzadnouri described i.e. forward fill within buckets. Hopefully the examples below explain what I mean by this.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(10.), index=range(10), columns=['A'])
df.iloc[2] = np.nan
df.iloc[5:9] = np.nan
df
#     A
# 0   0
# 1   1
# 2 NaN
# 3   3
# 4   4
# 5 NaN
# 6 NaN
# 7 NaN
# 8 NaN
# 9   9


# Straight reindex, no fill. Value for 2, 6 and 8 should be np.nan.
df.reindex(range(0, 10, 2))
#     A
# 0   0
# 2 NaN
# 4   4
# 6 NaN
# 8 NaN


# reindex with ffill - current behaviour - gets the wrong value for 2.
df.reindex(range(0, 10, 2), method='ffill')
#     A
# 0   0
# 2 NaN   # should not be NaN
# 4   4
# 6 NaN
# 8 NaN


# reindex with ffill - expected behaviour - fill within "buckets" - so we expect
# value for 2 to be 1 (ffilled from 1) but values for 6, 8 to be NaN (no data to ffill).
df.reindex(range(0, 10, 2), method='ffill')
#     A
# 0   0
# 2   1
# 4   4
# 6 NaN
# 8 NaN


# behaviour when reindexing then ffilling - note that this is different to reindex with
# method='ffill' because we ffill *after* the reindex instead of during the reindex.
# In particular the value for 2 is now 0, not 1, and for 6 and 8 we have the value 4.
df.reindex(range(0, 10, 2)).ffill()
#    A
# 0  0
# 2  0
# 4  4
# 6  4
# 8  4
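
One possible way to get the "fill within buckets" result, as a sketch only: this is not an existing `reindex` option, and the names `target`, `buckets` and `bucket_filled` are made up for illustration. Each original row is assigned to the first target label at or after it, and the last non-NaN value in each bucket is kept. The frame is rebuilt so that positions 5 through 8 are NaN, matching the printed output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10.0), index=range(10), columns=["A"])
df.iloc[2] = np.nan
df.iloc[5:9] = np.nan  # rows 5, 6, 7 and 8 are NaN

target = np.arange(0, 10, 2)  # new index: 0, 2, 4, 6, 8

# The two built-in behaviours discussed above, for comparison.
during = df.reindex(target, method="ffill")  # fill by index during the reindex
after = df.reindex(target).ffill()           # fill by value after the reindex

# Bucket fill: row i belongs to the bucket of the first target label >= i.
buckets = np.searchsorted(target, df.index.to_numpy(), side="left")

# GroupBy.last() returns the last non-NaN value per bucket (NaN if there is none).
# The trailing bucket (row 9, past the last target label) is dropped by the reindex.
bucket_filled = df.groupby(buckets).last().reindex(range(len(target)))
bucket_filled.index = target

print(bucket_filled)
```

This assumes both the original index and the target are sorted in increasing order, since `np.searchsorted` relies on that.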

jreback · 2014-01-30T15:44:18Z

Here's one way to do this; essentially your own groupby

In [36]: concat([df.loc[:4].ffill(),df.loc[5:].ffill()]).reindex(range(0,10,2))
Out[36]: 
    A
0   0
2   1
4   4
6 NaN
8 NaN

[5 rows x 1 columns]

Another way is to introduce a MultiIndex where your data 'breaks' and treat the groups separately
(this is not exactly your result, but close); MultiIndex.from_product is new in 0.13.1.

In [55]: df.index = MultiIndex.from_product([list('ab'),list(range(5))])

In [56]: df
Out[56]: 
      A
a 0   0
  1   1
  2 NaN
  3   3
  4   4
b 0 NaN
  1 NaN
  2 NaN
  3 NaN
  4   9

[10 rows x 1 columns]

In [54]: df.groupby(level=0).apply(lambda x: x.ffill().reset_index(drop=True).reindex(range(0,5,2))).reset_index(drop=True)
Out[54]: 
    A
0   0
1   1
2   4
3 NaN
4 NaN
5   9

[6 rows x 1 columns]

HTH

leungwk mentioned this issue Mar 7, 2014

xs is filling nan in index with its last item, as if sorted ascending, in the resulting index #6574

Closed

jreback modified the milestones: 0.15.0, 0.14.0 Apr 6, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

jreback mentioned this issue Oct 12, 2015

DOC: Improve reindex examples and docstring #10996

Merged

skatenerd mentioned this issue Oct 18, 2018

Pandas 0.23.4 reindexing multiindexed frame with ffill confusing output #23225

Open

mroeschke removed the "Indexing" label (Related to indexing on series/frames, not to indexes themselves) Apr 11, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
