BUG: asfreq / pct_change strange behavior #7292

Closed
shura-v opened this issue May 31, 2014 · 8 comments

shura-v commented May 31, 2014

In the first case, are those NaNs at the end a bug or a feature? I don't understand the reason behind this behavior:

In[5]: s = Series(range(10), date_range('2014', periods=10, freq='H'))
In[6]: s
Out[6]: 
2014-05-31 00:00:00    0
2014-05-31 01:00:00    1
2014-05-31 02:00:00    2
2014-05-31 03:00:00    3
2014-05-31 04:00:00    4
2014-05-31 05:00:00    5
2014-05-31 06:00:00    6
2014-05-31 07:00:00    7
2014-05-31 08:00:00    8
2014-05-31 09:00:00    9
Freq: H, dtype: int64
In[7]: s.pct_change(periods=1, freq='5H')
Out[7]: 
2014-05-31 00:00:00         NaN
2014-05-31 01:00:00         NaN
2014-05-31 02:00:00         NaN
2014-05-31 03:00:00         NaN
2014-05-31 04:00:00         NaN
2014-05-31 05:00:00         inf
2014-05-31 06:00:00    5.000000
2014-05-31 07:00:00    2.500000
2014-05-31 08:00:00    1.666667
2014-05-31 09:00:00    1.250000
2014-05-31 10:00:00         NaN
2014-05-31 11:00:00         NaN
2014-05-31 12:00:00         NaN
2014-05-31 13:00:00         NaN
2014-05-31 14:00:00         NaN
dtype: float64

But this seems OK:

In[8]: s.pct_change(periods=5)
Out[8]: 
2014-05-31 00:00:00         NaN
2014-05-31 01:00:00         NaN
2014-05-31 02:00:00         NaN
2014-05-31 03:00:00         NaN
2014-05-31 04:00:00         NaN
2014-05-31 05:00:00         inf
2014-05-31 06:00:00    5.000000
2014-05-31 07:00:00    2.500000
2014-05-31 08:00:00    1.666667
2014-05-31 09:00:00    1.250000
Freq: H, dtype: float64

hayd added the Frequency label May 31, 2014

Contributor

jreback commented Jun 1, 2014

pct_change is essentially s divided by its 5H shift, minus 1 (slightly more complicated, as it also handles various fill methods). So while I agree this looks a bit odd, it seems correct. That said, I could also see an argument that the result should be reindexed to the original series.

In [6]: s
Out[6]: 
2014-06-01 00:00:00    0
2014-06-01 01:00:00    1
2014-06-01 02:00:00    2
2014-06-01 03:00:00    3
2014-06-01 04:00:00    4
2014-06-01 05:00:00    5
2014-06-01 06:00:00    6
2014-06-01 07:00:00    7
2014-06-01 08:00:00    8
2014-06-01 09:00:00    9
Freq: H, dtype: int64

In [7]: s.shift(freq='5H')
Out[7]: 
2014-06-01 05:00:00    0
2014-06-01 06:00:00    1
2014-06-01 07:00:00    2
2014-06-01 08:00:00    3
2014-06-01 09:00:00    4
2014-06-01 10:00:00    5
2014-06-01 11:00:00    6
2014-06-01 12:00:00    7
2014-06-01 13:00:00    8
2014-06-01 14:00:00    9
dtype: int64

Proposed (this shows the ratio only; pct_change would subtract 1 from these values):

In [10]: s.div(s.shift(freq='5H')).reindex_like(s)
Out[10]: 
2014-06-01 00:00:00         NaN
2014-06-01 01:00:00         NaN
2014-06-01 02:00:00         NaN
2014-06-01 03:00:00         NaN
2014-06-01 04:00:00         NaN
2014-06-01 05:00:00         inf
2014-06-01 06:00:00    6.000000
2014-06-01 07:00:00    3.500000
2014-06-01 08:00:00    2.666667
2014-06-01 09:00:00    2.250000
Freq: H, dtype: float64
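
For anyone who wants to reproduce the relationship described above by hand, here is a minimal sketch. It assumes a reasonably recent pandas; `pd.offsets.Hour(5)` is used in place of the `'5H'` alias from the thread, and the names `shifted`, `raw` and `trimmed` are illustrative only, not anything from the pandas source.

```python
import numpy as np
import pandas as pd

# Ten hourly points, as in the report. pd.offsets.Hour(n) stands in for 'nH'.
idx = pd.date_range("2014-05-31", periods=10, freq=pd.offsets.Hour())
s = pd.Series(np.arange(10, dtype="float64"), index=idx)

# Shifting by frequency moves the index forward 5 hours; the values stay put.
shifted = s.shift(freq=pd.offsets.Hour(5))

# Division aligns on the union of both indexes, hence the extra trailing rows.
raw = s.div(shifted) - 1

# Reindexing to the original series drops those rows again.
trimmed = raw.reindex_like(s)

print(len(raw), len(trimmed))  # 15 10
```

The five extra rows in `raw` are the timestamps that exist only in the shifted index, so they have no numerator and come out as NaN.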

Contributor

jreback commented Jun 1, 2014

Want to do a PR to fix this?

jreback added the Bug label Jun 1, 2014
jreback added this to the 0.14.1 milestone Jun 1, 2014

Contributor

jreback commented Jun 10, 2014

@shura-v how's this coming?

Contributor

jreback commented Jun 22, 2014

@shura-v ?

jreback modified the milestones: 0.15.0, 0.14.1 Jun 22, 2014
jreback changed the title pct_change strange behavior BUG: asfreq / pct_change strange behavior Jun 26, 2014
jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014

Author

shura-v commented Jul 15, 2014

I made a pull request on June 2nd. Here is my commit: https://github.com/shura-v/pandas/commit/eb250b6cead505bad3cf67e838f975b6148cdeae

Contributor

jreback commented Jul 15, 2014

Needs some tests.
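
As a rough idea of what such a test could check, here is a hypothetical sketch. The function name and the exact assertions are assumptions for illustration, not the tests that went into pandas, and it only passes on a pandas version where the `freq` result is reindexed to the original index.

```python
import numpy as np
import pandas as pd


def check_pct_change_freq_keeps_index():
    """Hypothetical regression check for pct_change(freq=...)."""
    idx = pd.date_range("2014-05-31", periods=10, freq=pd.offsets.Hour())
    s = pd.Series(np.arange(10, dtype="float64"), index=idx)

    by_freq = s.pct_change(freq=pd.offsets.Hour(5))
    by_periods = s.pct_change(periods=5)

    # Desired behaviour: no extra rows beyond the original index.
    assert by_freq.index.equals(s.index)
    # On a regular hourly index both spellings should agree.
    assert np.allclose(by_freq, by_periods, equal_nan=True)


check_pct_change_freq_keeps_index()
```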


jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014
jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

Contributor

minggli commented Jan 25, 2018

Hi, I will attempt to work on this issue and will report back.

Contributor

minggli commented Jan 26, 2018

@jreback

Since pandas.core.generic.NDFrame.pct_change essentially calls s.shift to work out the percentage change, I looked into how the different outputs are produced by different choices of parameters.

s.shift(periods=5, freq=None) uses the underlying _data block manager to shift the values without touching the index.

s.shift(freq='5H') calls s.tshift and shifts the index instead. So when pct_change is calculated (the unshifted frame divided by the shifted one), the result has a longer index when shifting by frequency than when shifting by periods, because in the latter case the index remains unchanged.

That explains the difference between the two observations discussed earlier. I'm raising a PR and will report back.
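
The two shift modes can be compared side by side with a small sketch like the one below. It assumes a recent pandas and uses `pd.offsets.Hour` rather than the `'5H'` string; the variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range("2014-05-31", periods=10, freq=pd.offsets.Hour())
s = pd.Series(range(10), index=idx)

# Shift by periods: values move down five rows, index is left alone.
by_periods = s.shift(periods=5)

# Shift by frequency: index moves forward five hours, values are left alone.
by_freq = s.shift(freq=pd.offsets.Hour(5))

# Only the frequency-based shift changes the index, so aligning it with the
# original series yields a longer, combined index.
same_index_periods = by_periods.index.equals(s.index)
same_index_freq = by_freq.index.equals(s.index)
combined = s.index.union(by_freq.index)

print(same_index_periods, same_index_freq, len(combined))  # True False 15
```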

jreback modified the milestones: Next Major Release, 0.23.0 Jan 27, 2018