Series pct_change fill_method behavior #25006

Open
Loupiol opened this issue Jan 29, 2019 · 3 comments
Labels
Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Enhancement, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@Loupiol

Loupiol commented Jan 29, 2019

In[2]: import pandas as pd
  ...: import numpy as np
  ...: pd.__version__
Out[2]: u'0.23.4'

In[3]: ts = pd.Series([np.nan, 1., 2., 3., np.nan, 4., np.nan])
In[4]: ts.pct_change(fill_method = None)
Out[4]: 
0    NaN
1    NaN
2    1.0
3    0.5
4    NaN
5    NaN
6    NaN
dtype: float64

In[5]: ts.pct_change(fill_method = 'pad')
Out[5]: 
0         NaN
1         NaN
2    1.000000
3    0.500000
4    0.000000
5    0.333333
6    0.000000
dtype: float64

In[6]: ts.pct_change(fill_method = 'pad').mask(ts.isnull())
Out[6]: 
0         NaN
1         NaN
2    1.000000
3    0.500000
4         NaN
5    0.333333
6         NaN
dtype: float64

Hello,

After recently updating pandas, I noticed a change in the behavior of pct_change with missing data. This is related to #19873.

The first example, without filling (fill_method=None), is as expected. The second example is the result now, and the third is what it used to be. I think the user should be able to choose between the second and third behaviors. I agree that the second example is correct, since it forward-fills as expected, but if the time series is a stock price, for example, the returns on missing days (holidays) were not 0, and reporting them as 0 can bias some statistics.

I would suggest adding a new parameter, such as skipna. I could not find a solution using the existing parameters; if I missed something, please let me know.
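
For reference, the third behavior can also be reproduced without relying on fill_method at all, by dropping the missing values before computing the change and then reindexing back to the original index. This is only a minimal sketch: `pct_change_skipna` is a hypothetical helper name, not an existing pandas function or parameter, and it assumes that "skipping" means comparing each valid value with the previous valid one on a Series with a unique index.

```python
import numpy as np
import pandas as pd


def pct_change_skipna(s: pd.Series) -> pd.Series:
    """Hypothetical helper (not a pandas API): percent change between
    consecutive valid observations, leaving NaN where the input is NaN.
    Assumes the index is unique, since reindex requires that."""
    # Drop missing values so each change is measured against the
    # previous valid observation rather than a forward-filled copy.
    changes = s.dropna().pct_change()
    # Restore the original index; rows that were NaN in the input stay NaN.
    return changes.reindex(s.index)


ts = pd.Series([np.nan, 1., 2., 3., np.nan, 4., np.nan])
result = pct_change_skipna(ts)
print(result)
```

On the series above this gives the same values as Out[6]: NaN, NaN, 1.0, 0.5, NaN, 0.333333, NaN.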

Thanks

@WillAyd
Member

WillAyd commented Jan 30, 2019

Thanks for the issue and clear example. I think skipna could make sense as an argument here. PRs are always welcome!

WillAyd added this to the Contributions Welcome milestone Jan 30, 2019
@albertvillanova
Contributor

@WillAyd and what should be the expected result for a DataFrame with non-aligned NaNs?

df = pd.DataFrame({'a': [np.nan, 1., 2., 3., np.nan, 4., np.nan], 
                   'b': [np.nan, np.nan,  1.,  2.,  3., np.nan,  4.]})

@WillAyd
Member

WillAyd commented Feb 11, 2019

@albertvillanova I'm not sure I understand the distinction you are trying to make; this should work against each series in a DataFrame individually.
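
To make the per-column reading concrete, here is a minimal sketch on the DataFrame from the comment above. It assumes the skipna-style behavior means "compare each valid value with the previous valid value in the same column"; `pct_change_skipna` is a hypothetical helper for illustration, not a pandas API.

```python
import numpy as np
import pandas as pd


def pct_change_skipna(s: pd.Series) -> pd.Series:
    # Hypothetical helper (not a pandas API): compute the change between
    # consecutive valid values, then put NaN back where the input was NaN.
    return s.dropna().pct_change().reindex(s.index)


df = pd.DataFrame({'a': [np.nan, 1., 2., 3., np.nan, 4., np.nan],
                   'b': [np.nan, np.nan, 1., 2., 3., np.nan, 4.]})

# Apply the helper column by column, so the NaN positions in 'a'
# have no effect on the result for 'b', and vice versa.
out = df.apply(pct_change_skipna)
print(out)
```

Because each column is handled independently, the non-aligned NaNs need no special treatment: column a yields NaN, NaN, 1.0, 0.5, NaN, 0.333333, NaN and column b yields NaN, NaN, NaN, 1.0, 0.5, NaN, 0.333333.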

albertvillanova pushed a commit to albertvillanova/pandas that referenced this issue Feb 12, 2019
gfyoung added the Missing-data label Feb 16, 2019
jreback modified the milestones: Contributions Welcome, 0.25.0 May 19, 2019
jorisvandenbossche removed this from the 0.25.0 milestone Jun 30, 2019
mroeschke added the Enhancement and Algos labels and removed the API Design label Jun 26, 2021

7 participants