
Series pct_change fill_method behavior #25291



Closed
wants to merge 31 commits into from


albertvillanova
Contributor

@albertvillanova albertvillanova commented Feb 12, 2019

@pep8speaks

pep8speaks commented Feb 12, 2019

Hello @albertvillanova! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-08-30 16:38:20 UTC

@codecov

codecov bot commented Feb 13, 2019

Codecov Report

Merging #25291 into master will decrease coverage by 50%.
The diff coverage is 16.66%.


@@             Coverage Diff             @@
##           master   #25291       +/-   ##
===========================================
- Coverage   91.72%   41.71%   -50.01%     
===========================================
  Files         173      173               
  Lines       52831    52841       +10     
===========================================
- Hits        48457    22045    -26412     
- Misses       4374    30796    +26422

Flag        Coverage Δ
#multiple   ?
#single     41.71% <16.66%> (-0.01%) ⬇️

Impacted Files                          Coverage Δ
pandas/core/generic.py                  38.24% <16.66%> (-55.93%) ⬇️
pandas/io/formats/latex.py              0% <0%> (-100%) ⬇️
pandas/core/categorical.py              0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py          0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py              0% <0%> (-100%) ⬇️
pandas/tseries/converter.py             0% <0%> (-100%) ⬇️
pandas/io/formats/html.py               0% <0%> (-99.35%) ⬇️
pandas/core/groupby/categorical.py      0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py               0% <0%> (-91.17%) ⬇️
pandas/io/sas/sas_xport.py              0% <0%> (-90.15%) ⬇️
... and 130 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d44a2a...4418bf1.

@codecov

codecov bot commented Feb 13, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@7b25463).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master   #25291   +/-   ##
=========================================
  Coverage          ?   91.68%           
=========================================
  Files             ?      174           
  Lines             ?    50751           
  Branches          ?        0           
=========================================
  Hits              ?    46531           
  Misses            ?     4220           
  Partials          ?        0

Flag        Coverage Δ
#multiple   90.19% <100%> (?)
#single     41.15% <7.14%> (?)

Impacted Files                          Coverage Δ
pandas/core/generic.py                  93.38% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b25463...fd18d04.

@jreback jreback changed the title from "Fix #25006" to "Series pct_change fill_method behavior" on Feb 13, 2019
Contributor

@jreback jreback left a comment


I haven't actually looked at the changes yet; these are just some stylistic concerns.

Albert Villanova del Moral added 2 commits February 13, 2019 23:38
@gfyoung gfyoung added the API Design and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Feb 16, 2019
@albertvillanova
Contributor Author

@jreback all checks have passed.

@albertvillanova
Contributor Author

@WillAyd @jschendel could you please have a look?

raise ValueError("cannot pass both skipna and limit")
if skipna is None and fill_method is None and limit is None:
skipna = True
if skipna and self._typ == 'dataframe':
Contributor


use isinstance here

Member


Hmm, maybe add this to the frame subclass instead? It's somewhat confusing to introspect here in the shared one.

Contributor Author


For the moment, I have added isinstance as requested by @jreback. Tell me if you both think I should do otherwise.
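The argument checks in the diff fragment quoted above (the `raise ValueError("cannot pass both skipna and limit")` line and the "all three are None, so default to skipna=True" rule) can be sketched as a standalone function. This is a minimal sketch under stated assumptions: the name `resolve_pct_change_args` is hypothetical, it assumes all three keywords default to `None` (implied but not shown by the fragment), the `skipna`/`fill_method` conflict rule is an assumption, and it leaves out the `isinstance` DataFrame check discussed in this thread.

```python
# Hypothetical sketch: resolve_pct_change_args is not a pandas API.
# Only the "skipna and limit" error and the all-None default come from the
# diff fragment; the "skipna and fill_method" conflict is an assumption.


def resolve_pct_change_args(skipna=None, fill_method=None, limit=None):
    """Return the effective (skipna, fill_method, limit) triple."""
    if skipna and fill_method is not None:
        # Assumed rule: dropping NAs and filling them are mutually exclusive.
        raise ValueError("cannot pass both skipna and fill_method")
    if skipna and limit is not None:
        raise ValueError("cannot pass both skipna and limit")
    # Nothing passed at all: default to dropping NAs before the calculation.
    if skipna is None and fill_method is None and limit is None:
        skipna = True
    return skipna, fill_method, limit


print(resolve_pct_change_args())                   # (True, None, None)
print(resolve_pct_change_args(fill_method="pad"))  # (None, 'pad', None)
```

Keeping this resolution in a plain helper would also make the subclass-versus-`isinstance` question easier to settle, since the shared code would only need to decide how to apply the resolved options, not how to validate them.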

])
def test_pct_change_skipna_raises(self, fill_method, limit):
# GH25006
if self._typ is DataFrame or self._typ is Series:
Contributor


This should not be needed any longer; Panels are gone.

Contributor

@jreback jreback left a comment


this needs a subsection in the whatsnew to show the previous and current behavior.

@albertvillanova
Contributor Author

@TomAugspurger yes, the behavior you are interested in can be achieved by setting skipna=False:

In [11]: s = pd.Series([90, 91, None, 85, None, 95, 97], index=pd.date_range('2000', periods=7))

In [12]: s
Out[12]:
2000-01-01    90.0
2000-01-02    91.0
2000-01-03     NaN
2000-01-04    85.0
2000-01-05     NaN
2000-01-06    95.0
2000-01-07    97.0
Freq: D, dtype: float64

In [13]: s.pct_change(skipna=False)
Out[13]:
2000-01-01         NaN
2000-01-02    0.011111
2000-01-03         NaN
2000-01-04         NaN
2000-01-05         NaN
2000-01-06         NaN
2000-01-07    0.021053
Freq: D, dtype: float64
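The `skipna=False` output above is what you get when no filling happens at all: each value is compared with the value `periods` positions earlier, and a NaN on either side yields NaN. A minimal pure-Python sketch of that rule, working on plain lists rather than a Series (the helper name `pct_change_no_fill` is hypothetical, and unlike pandas it does not handle zero denominators, which raise `ZeroDivisionError` here):

```python
import math


def pct_change_no_fill(values, periods=1):
    """Percent change against the value `periods` positions earlier.

    Missing values are never filled: a NaN on either side gives NaN.
    Zero denominators are not handled (Python raises ZeroDivisionError).
    """
    out = []
    for i, cur in enumerate(values):
        if i < periods:
            out.append(math.nan)  # no earlier value to compare with
            continue
        prev = values[i - periods]
        if math.isnan(cur) or math.isnan(prev):
            out.append(math.nan)
        else:
            out.append(cur / prev - 1)
    return out


nan = math.nan
print(pct_change_no_fill([90, 91, nan, 85, nan, 95, 97]))
```

Positions 3 and 5 (85 and 95) come out as NaN because their predecessors are missing, which is exactly the behavior shown for 2000-01-04 and 2000-01-06 in the session above.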

I totally agree with you that the behavior with skipna=False is much more sensible and intuitive.

Contributor

@jreback jreback left a comment


I think setting the default skipna=True is good. I believe this simplifies the checking a bit as well.

@@ -86,6 +86,7 @@ Other API Changes
- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
- ``Timestamp`` and ``Timedelta`` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
- :meth:`Timestamp.strptime` will now raise a ``NotImplementedError`` (:issue:`25016`)
- Default ``skipna=True`` for :meth:`Series.pct_change` and :meth:`DataFrame.pct_change` will drop NAs before calculation (:issue:`25006`)
Contributor


Right, maybe also mention that the skipna argument was added (as it's not obvious from the note that it is new).

@WillAyd
Member

WillAyd commented May 3, 2019

@albertvillanova can you merge master and address latest comments?

@jreback
Contributor

jreback commented May 12, 2019

can you merge master

@jreback jreback added this to the 0.25.0 milestone May 19, 2019
Contributor

@jreback jreback left a comment


@albertvillanova small comment; pls merge master and ping on green.

@@ -86,6 +86,7 @@ Other API Changes
- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
- ``Timestamp`` and ``Timedelta`` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
- :meth:`Timestamp.strptime` will now raise a ``NotImplementedError`` (:issue:`25016`)
- Default ``skipna=True`` for :meth:`Series.pct_change` and :meth:`DataFrame.pct_change` will drop NAs before calculation (:issue:`25006`)
Contributor


Can you say that this is the current behavior and that the default is NO change?

@jreback
Contributor

jreback commented Jun 3, 2019

I think only one small comment is left; can you merge master and respond?

@jreback
Contributor

jreback commented Jun 27, 2019

can you merge master here

@jorisvandenbossche jorisvandenbossche removed this from the 0.25.0 milestone Jun 30, 2019
@WillAyd
Member

WillAyd commented Aug 26, 2019

Rebased to keep current. Let's see if we can get this one in

@TomAugspurger
Contributor

Seems that CI is failing (haven't looked closely).

Is the release note accurate (just an enhancement, not an API change)?

Looks like we need a docstring example with skipna=False.

@WillAyd
Member

WillAyd commented Aug 30, 2019

Updated docstring, though on closer look this does change the default behavior compared with master:

before change:

>>> s = pd.Series([90, 91, np.nan, 85, np.nan, 95])
>>> s.pct_change()
0         NaN
1    0.011111
2    0.000000
3   -0.065934
4    0.000000
5    0.117647
dtype: float64

this branch:

>>> s = pd.Series([90, 91, np.nan, 85, np.nan, 95])
>>> s.pct_change()
0         NaN
1    0.011111
2         NaN
3   -0.065934
4         NaN
5    0.117647
dtype: float64

So I think we need to be careful here.
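The two outputs above differ only at the NaN positions (2 and 4), which a pure-Python sketch of both rules makes explicit. It assumes the branch's `skipna=True` means "drop NAs, compute on what remains, then put NaN back at the original positions"; that matches the output shown but is inferred from it rather than taken from the PR's code. The function names are hypothetical, and zero denominators are not handled.

```python
import math


def _pct(values):
    """Change versus the previous element; the first element has none."""
    if not values:
        return []
    return [math.nan] + [cur / prev - 1 for prev, cur in zip(values, values[1:])]


def pct_change_pad(values):
    """Master's default: forward-fill NaNs, then compare neighbours."""
    filled, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return _pct(filled)


def pct_change_skipna(values):
    """Branch default (as inferred): drop NaNs, compare the rest, restore NaNs."""
    kept = [(i, v) for i, v in enumerate(values) if not math.isnan(v)]
    changes = _pct([v for _, v in kept])
    out = [math.nan] * len(values)
    for (i, _), change in zip(kept, changes):
        out[i] = change
    return out


s = [90, 91, math.nan, 85, math.nan, 95]
print(pct_change_pad(s))     # 0.0 at positions 2 and 4
print(pct_change_skipna(s))  # NaN at positions 2 and 4
```

Both rules compare 85 with 91 and 95 with 85, so the non-NaN positions agree; what changes is whether a filled gap reports a 0% change or NaN, which is the backwards-compatibility concern raised here.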

@TomAugspurger
Contributor

Hmm thanks for checking. This seems like something we can do in a backwards-compatible way (possibly with a deprecation).
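One possible shape for the backwards-compatible route is to keep padding as the default but emit a `FutureWarning` when the caller relied on that default and the data actually contains NaNs, so results that would change are flagged first. This is a hypothetical sketch, not the PR's code or pandas' implementation: the `_NO_DEFAULT` sentinel, the function name, and the warning text are assumptions, only the `'pad'` and `None` fill methods are modelled, and zero denominators are not handled.

```python
import math
import warnings

_NO_DEFAULT = object()  # hypothetical sentinel: caller did not pass fill_method


def pct_change_compat(values, fill_method=_NO_DEFAULT):
    """Keep the old pad-then-compare default, but warn when it matters."""
    if fill_method is _NO_DEFAULT:
        # Only warn when the NaN handling can actually affect the result.
        if any(math.isnan(v) for v in values):
            warnings.warn(
                "the default fill_method='pad' will change in a future "
                "version; pass fill_method explicitly to silence this warning",
                FutureWarning,
                stacklevel=2,
            )
        fill_method = "pad"
    if fill_method == "pad":
        # Forward-fill: replace each NaN with the last valid value seen.
        filled, last = [], math.nan
        for v in values:
            if not math.isnan(v):
                last = v
            filled.append(last)
        values = filled
    elif fill_method is not None:
        raise ValueError(f"unsupported fill_method: {fill_method!r}")
    if not values:
        return []
    # NaN arithmetic propagates, so fill_method=None leaves gaps as NaN.
    return [math.nan] + [cur / prev - 1 for prev, cur in zip(values, values[1:])]
```

Passing `fill_method=None` explicitly opts into the non-filling behavior without a warning, while existing callers keep their current results and are told the default is going to change.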

@WillAyd
Member

WillAyd commented Sep 13, 2019

I've personally put this on the back burner for now. @albertvillanova, are you interested in picking it back up?

@WillAyd
Member

WillAyd commented Sep 20, 2019

I think this requires some effort to handle the backwards-compatibility piece, and that's not something I personally have the time or motivation to dedicate to. It looks stale otherwise, so I'm closing it, but @albertvillanova, if this is something you'd like to pick back up, please ping and we can continue!

@WillAyd WillAyd closed this Sep 20, 2019
Labels
API Design, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Development

Successfully merging this pull request may close these issues.

Series pct_change fill_method behavior
7 participants