BUG: reindex on index in a frame using a not-None method is buggy #5669
Comments
I do not think this is a bug; in my opinion the current behavior makes more sense. NaN values can be valid, actual values in some scenarios. An actual NaN value should be treated differently from a NaN that appears only because the index changed. If I have a dataframe like this:
and I want to keep all NaN as NaN, it makes much more sense to have:
just take whatever value there is ( nan or not nan ) and fill forward until the next available index. This is completely different from
which produces
TLDR: when reindexing a dataframe, forward fill means: just take whatever value there is (NaN or not NaN) and fill forward until the next available index. Otherwise, you have no choice but to fill NaN values simply because you want to reindex. Reindexing should not enforce a mandatory fillna on the data. |
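To make the distinction in this comment concrete, here is a minimal sketch. The frame and labels are made up for illustration (they are not the commenter's elided example), and the outputs assume a recent pandas release.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the NaN at label 2 is a real observation.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4, 5]

# Fill during the reindex: each new label takes the row at the last
# original label <= it, so the NaN stored at 2 is also carried to 3.
during = df.reindex(new_index, method="ffill")

# Fill after the reindex: every NaN is overwritten, real or not.
after = df.reindex(new_index).ffill()

print(during["A"].tolist())
print(after["A"].tolist())
```

In the first result the NaN survives at labels 2 and 3; in the second it is replaced by 1.0.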
your middle example can be done by taking a union of the existing indices and the new, then forward-filling. you can also specify a |
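The union-then-fill suggestion could look roughly like the sketch below. The data and the variable names (`combined`, `filled`, `result`) are hypothetical and only illustrate the idea; this is not the code from the original comment.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one real NaN at label 1.
df = pd.DataFrame({"A": [10.0, np.nan, 30.0]}, index=[0, 1, 3])
new_index = pd.Index([0, 2, 4])

# 1. Reindex onto the union of old and new labels (no fill yet).
combined = df.index.union(new_index)
# 2. Forward-fill across the combined index.
filled = df.reindex(combined).ffill()
# 3. Keep only the labels that were asked for.
result = filled.reindex(new_index)

print(result)
```

Note that this fills the pre-existing NaN at label 1 as well, which is exactly the behaviour the previous commenter objects to.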
why?! why this way? just keep reindexing as reindexing, and fillna as fillna. a "nan" value can be an actual valid value, when you reindex with ffill you want to take all the actual values ( nan or not nan) and fill forward until the next available index. Reindexing should not enforce a mandatory fillna on the data. |
I am not sure what you mean by this. by definition FYI, this behavior has been there for as long as I can remember. It IS tested for in the context of a non-monotonic index (this is an error), but not in the general case. Series works this way; the bug is on DataFrame, which does not. |
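The non-monotonic case mentioned here can be checked with a short sketch. The frame is hypothetical, and the behaviour shown assumes a recent pandas release, where a fill method on an unsorted index raises `ValueError`.

```python
import pandas as pd

# Hypothetical frame whose index is not sorted.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=[0, 2, 1])

try:
    df.reindex([0, 1, 2], method="ffill")
    raised = False
except ValueError:
    # A fill method needs a monotonic index to search in.
    raised = True

# Sorting first makes the same call valid.
ok = df.sort_index().reindex([0, 1, 2, 3], method="ffill")
print(raised)
print(ok["A"].tolist())
```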
If I want to what you are saying is that, Here is an example: If this is the behavior for the time series then maybe there should be a bug report there. If I want to |
see here for the current docs: http://pandas.pydata.org/pandas-docs/dev/basics.html#filling-while-reindexing can you make your words into an example and show me what you mean? |
say you have trade data across markets, and you are measuring the correlations across these markets: Tokyo, London, New York, Chicago. These markets open and close at different hours during the day, so for example for periods during the day you can measure now, if I is the example clear? |
We came across this as well. We expected behaviour similar to what @behzadnouri described, i.e. forward fill within buckets. Hopefully the examples below explain what I mean by this.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(10.), index=range(10), columns=['A'])
df.iloc[2] = np.nan
df.iloc[5:9] = np.nan  # rows 5 through 8, matching the output shown below
df
# A
# 0 0
# 1 1
# 2 NaN
# 3 3
# 4 4
# 5 NaN
# 6 NaN
# 7 NaN
# 8 NaN
# 9 9
# Straight reindex, no fill. Value for 2, 6 and 8 should be np.nan.
df.reindex(range(0, 10, 2))
# A
# 0 0
# 2 NaN
# 4 4
# 6 NaN
# 8 NaN
# reindex with ffill - current behaviour - gets the wrong value for 2.
df.reindex(range(0, 10, 2), method='ffill')
# A
# 0 0
# 2 NaN # should not be NaN
# 4 4
# 6 NaN
# 8 NaN
# reindex with ffill - expected behaviour - fill within "buckets" - so we expect
# value for 2 to be 1 (ffilled from 1) but values for 6, 8 to be NaN (no data to ffill).
df.reindex(range(0, 10, 2), method='ffill')
# A
# 0 0
# 2 1
# 4 4
# 6 NaN
# 8 NaN
# behaviour when reindexing then ffilling - note that this is different to reindex with
# method='ffill' because we ffill *after* the reindex instead of during the reindex.
# In particular the value for 2 is now 0, not 1, and for 6 and 8 we have the value 4.
df.reindex(range(0, 10, 2)).ffill()
# A
# 0 0
# 2 0
# 4 4
# 6 4
# 8 4 |
Here's one way to do this; essentially your own groupby
Another way is to introduce a MultiIndex where your data 'breaks' and treat the groups separately.
HTH |
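The code from this comment did not survive, so the following is only a sketch of what a hand-rolled groupby for "fill within buckets" might look like. `bucketed_ffill_reindex` is a hypothetical helper, not a pandas API, and it assumes both indexes are sorted and that rows 5 through 8 of the example frame are NaN, as in the printed output above.

```python
import numpy as np
import pandas as pd

def bucketed_ffill_reindex(df, new_index):
    """Hypothetical helper: for each new label, take the last non-NaN
    value among the original rows in (previous new label, label]."""
    new_index = pd.Index(new_index)
    new_vals = new_index.to_numpy()
    # Position of the first new label >= each original label.
    pos = np.searchsorted(new_vals, df.index.to_numpy(), side="left")
    # Original rows past the last new label belong to no bucket.
    in_range = pos < len(new_vals)
    buckets = new_vals[pos[in_range]]
    # GroupBy.last() skips NaN, giving NaN only for all-NaN buckets.
    out = df[in_range].groupby(buckets).last()
    return out.reindex(new_index)

df = pd.DataFrame(np.arange(10.), index=range(10), columns=["A"])
df.iloc[2] = np.nan
df.iloc[5:9] = np.nan  # assumption: rows 5..8 are NaN

result = bucketed_ffill_reindex(df, range(0, 10, 2))
print(result)
```

With this frame the value at 2 comes from row 1, while 6 and 8 stay NaN because their buckets hold no data.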
reported here:
http://stackoverflow.com/questions/20459782/what-is-the-functionality-of-the-filling-method-when-reindexing
there do not appear to be any tests for it, nor is the fix trivial