Series pct_change fill_method behavior #25291
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9865,6 +9865,10 @@ def _check_percentile(self, q): | |
The number of consecutive NAs to fill before stopping. | ||
freq : DateOffset, timedelta, or offset alias string, optional | ||
Increment to use from time series API (e.g. 'M' or BDay()). | ||
skipna : bool, default True | ||
Exclude NA/null values before computing percent change. | ||
|
||
.. versionadded:: 0.25.0 | ||
**kwargs | ||
Additional keyword arguments are passed into | ||
`DataFrame.shift` or `Series.shift`. | ||
|
@@ -9881,6 +9885,11 @@ def _check_percentile(self, q): | |
Series.shift : Shift the index by some number of periods. | ||
DataFrame.shift : Shift the index by some number of periods. | ||
|
||
Notes | ||
----- | ||
The default `skipna=True` drops NAs before computing the percentage | ||
change, and the results are reindexed like the original calling object. | ||
|
||
Examples | ||
-------- | ||
**Series** | ||
|
@@ -9904,22 +9913,42 @@ def _check_percentile(self, q): | |
2 -0.055556 | ||
dtype: float64 | ||
|
||
See the percentage change in a Series where filling NAs with last | ||
valid observation forward to next valid. | ||
See how the percentage change is computed in a Series with NAs. | ||
With the default `skipna=True`, NAs are dropped before the | ||
computation and the results are then reindexed like the original | ||
object, thus keeping the original NAs. | ||
|
||
>>> s = pd.Series([90, 91, None, 85]) | ||
>>> s = pd.Series([90, 91, None, 85, None, 95]) | ||
Review comment: I realize this was in place before you but let's change
Review comment: can you update this
||
>>> s | ||
0 90.0 | ||
1 91.0 | ||
2 NaN | ||
3 85.0 | ||
4 NaN | ||
5 95.0 | ||
dtype: float64 | ||
|
||
>>> s.pct_change() | ||
0 NaN | ||
1 0.011111 | ||
2 NaN | ||
3 -0.065934 | ||
4 NaN | ||
5 0.117647 | ||
dtype: float64 | ||
|
||
On the other hand, if a fill method is passed, NAs are filled before | ||
the computation. For example, before the computation of percentage | ||
change, forward fill method `ffill` first fills NAs with last valid | ||
observation forward to next valid. | ||
|
||
>>> s.pct_change(fill_method='ffill') | ||
0 NaN | ||
1 0.011111 | ||
2 0.000000 | ||
3 -0.065934 | ||
4 0.000000 | ||
5 0.117647 | ||
dtype: float64 | ||
|
||
**DataFrame** | ||
|
@@ -9961,14 +9990,75 @@ def _check_percentile(self, q): | |
2016 2015 2014 | ||
GOOG NaN -0.151997 -0.086016 | ||
APPL NaN 0.337604 0.012002 | ||
|
||
In a DataFrame with NAs, when computing the percentage change with | ||
Review comment: Can you add a
Review comment: @WillAyd are you sure that example is useful?
|
||
default `skipna=True`, NAs are first dropped on each column/row, and | ||
the results are then reindexed like the original object. | ||
|
||
>>> df = pd.DataFrame({ | ||
... 'a': [90, 91, None, 85, None, 95], | ||
... 'b': [91, None, 85, None, 95, None], | ||
... 'c': [None, 85, None, 95, None, None]}) | ||
>>> df | ||
a b c | ||
0 90.0 91.0 NaN | ||
1 91.0 NaN 85.0 | ||
2 NaN 85.0 NaN | ||
3 85.0 NaN 95.0 | ||
4 NaN 95.0 NaN | ||
5 95.0 NaN NaN | ||
|
||
>>> df.pct_change() | ||
a b c | ||
0 NaN NaN NaN | ||
1 0.011111 NaN NaN | ||
2 NaN -0.065934 NaN | ||
3 -0.065934 NaN 0.117647 | ||
4 NaN 0.117647 NaN | ||
5 0.117647 NaN NaN | ||
|
||
>>> df.pct_change(axis=1) | ||
a b c | ||
0 NaN 0.011111 NaN | ||
1 NaN NaN -0.065934 | ||
2 NaN NaN NaN | ||
3 NaN NaN 0.117647 | ||
4 NaN NaN NaN | ||
5 NaN NaN NaN | ||
|
||
Otherwise, if a fill method is passed, NAs are filled before the | ||
computation. | ||
|
||
>>> df.pct_change(fill_method='ffill') | ||
a b c | ||
0 NaN NaN NaN | ||
1 0.011111 0.000000 NaN | ||
2 0.000000 -0.065934 0.000000 | ||
3 -0.065934 0.000000 0.117647 | ||
4 0.000000 0.117647 0.000000 | ||
5 0.117647 0.000000 0.000000 | ||
""" | ||
|
||
@Appender(_shared_docs['pct_change'] % _shared_doc_kwargs) | ||
def pct_change(self, periods=1, fill_method='pad', limit=None, freq=None, | ||
**kwargs): | ||
# TODO: Not sure if above is correct - need someone to confirm. | ||
def pct_change(self, periods=1, fill_method=None, limit=None, freq=None, | ||
skipna=None, **kwargs): | ||
if fill_method is not None and skipna: | ||
raise ValueError("cannot pass both fill_method and skipna") | ||
elif limit is not None and skipna: | ||
raise ValueError("cannot pass both limit and skipna") | ||
|
||
if fill_method is None and limit is None and skipna is None: | ||
skipna = True | ||
axis = self._get_axis_number(kwargs.pop('axis', self._stat_axis_name)) | ||
if fill_method is None: | ||
if skipna and isinstance(self, pd.DataFrame): | ||
Review comment: use ABCDataFrame
||
# If DataFrame, apply to each column/row | ||
return self.apply( | ||
lambda s: s.pct_change(periods=periods, freq=freq, | ||
skipna=skipna, **kwargs), | ||
axis=axis | ||
) | ||
if skipna: | ||
data = self.dropna() | ||
elif fill_method is None: | ||
data = self | ||
else: | ||
data = self.fillna(method=fill_method, limit=limit, axis=axis) | ||
|
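The argument handling in this hunk can be read in isolation. The following is a minimal sketch, assuming a hypothetical standalone helper named `resolve_pct_change_args` that is not part of pandas or of this PR; it only mirrors the checks shown in the diff, and it normalizes an unset `skipna` to `False` for readability, which the diff itself does not do.

```python
def resolve_pct_change_args(fill_method=None, limit=None, skipna=None):
    """Return the effective (fill_method, limit, skipna) triple.

    Hypothetical helper for illustration; it mirrors the checks in the diff.
    """
    # skipna cannot be combined with any filling option.
    if fill_method is not None and skipna:
        raise ValueError("cannot pass both fill_method and skipna")
    if limit is not None and skipna:
        raise ValueError("cannot pass both limit and skipna")
    # Nothing was specified, so fall back to the proposed default.
    if fill_method is None and limit is None and skipna is None:
        skipna = True
    # Assumption: report an unset skipna as False instead of None.
    return fill_method, limit, bool(skipna)


print(resolve_pct_change_args())
print(resolve_pct_change_args(fill_method="ffill"))
```

Under this reading, `skipna=True` only becomes the default when neither `fill_method` nor `limit` is given, so passing a fill method alone keeps the fill-based path.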
@@ -9979,6 +10069,8 @@ def pct_change(self, periods=1, fill_method='pad', limit=None, freq=None, | |
if freq is None: | ||
mask = isna(com.values_from_object(data)) | ||
np.putmask(rs.values, mask, np.nan) | ||
if skipna: | ||
rs = rs.reindex_like(self) | ||
return rs | ||
|
||
def _agg_by_level(self, name, axis=0, level=0, skipna=True, **kwargs): | ||
|
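The `skipna` path in the diff amounts to three steps: drop NAs, compute the change on what remains, and reindex like the caller. The sketch below emulates that with existing pandas calls only. The helper name `pct_change_skipna` is hypothetical, and it computes the ratio directly instead of calling `pct_change`, an assumption made so that the result does not depend on the `fill_method` default of any particular pandas version.

```python
import pandas as pd


def pct_change_skipna(obj, periods=1, axis=0):
    """Hypothetical helper emulating the proposed skipna=True path."""
    if isinstance(obj, pd.DataFrame):
        # Apply the Series logic to each column (axis=0) or each row (axis=1).
        return obj.apply(
            lambda s: pct_change_skipna(s, periods=periods), axis=axis
        )
    # Drop NAs, compute the change between consecutive valid values,
    # then restore the original index so the NA positions come back as NaN.
    valid = obj.dropna()
    changed = valid / valid.shift(periods) - 1
    return changed.reindex_like(obj)


df = pd.DataFrame({
    "a": [90, 91, None, 85, None, 95],
    "b": [91, None, 85, None, 95, None],
    "c": [None, 85, None, 95, None, None],
})

by_col = pct_change_skipna(df)          # column-wise, like df.pct_change()
by_row = pct_change_skipna(df, axis=1)  # row-wise, like df.pct_change(axis=1)
print(by_col)
print(by_row)
```

With the DataFrame from the docstring, this should give the same tables as the `df.pct_change()` and `df.pct_change(axis=1)` examples in the diff.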
Review comment: this sounds like this is a change, but it's actually not; you are just adding the `skipna` arg. Please make that more clear. Put this in other enhancements.
Review comment: @jreback I am not just adding the `skipna` arg; I am setting `skipna=True` as the default (before, the default was `fill_method='pad'`). I think this is an API-breaking change. Indeed I quoted what you told me:
Review comment: right, maybe also say adding the `skipna` arg (as it's not obvious that it was added in the note)
Review comment: can you say that this is current behavior and the default is NO change.
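Whether the new default counts as a behavior change can be checked on the Series from the docstring. This is a rough sketch that emulates both defaults with plain pandas operations (forward fill for the previous `fill_method='pad'`, drop-and-reindex for the proposed `skipna=True`); the variable names are illustrative and neither expression calls the code from this PR.

```python
import pandas as pd

s = pd.Series([90, 91, None, 85, None, 95])

# Previous default (fill_method='pad'): forward-fill, then compute the change.
filled = s.ffill()
old_default = filled / filled.shift(1) - 1

# Proposed default (skipna=True): drop NAs, compute, restore the original index.
valid = s.dropna()
new_default = (valid / valid.shift(1) - 1).reindex_like(s)

# Positions where the two results disagree, treating NaN and NaN as equal.
same = (old_default == new_default) | (old_default.isna() & new_default.isna())
differs = ~same
changed_positions = differs[differs].index.tolist()
print(changed_positions)
```

In this emulation the two results differ only at the positions that were NA in the input: the fill-based result reports a zero change there, while the drop-and-reindex result keeps NaN.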