Skip to content

pct change bug issue 21200 #21235

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Dec 12, 2018
Merged
Show file tree
Hide file tree
Changes from 31 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
ee2c034
pct change bug issue 21200
simonariddell May 28, 2018
47b5908
add whatsnew
simonariddell May 28, 2018
41beb4a
fix pep8 issues
simonariddell May 28, 2018
25efd37
Removed vectorization from groupby pct change
simonariddell May 30, 2018
849fac4
try using cache arguments to keep vectorization
simonariddell Jun 4, 2018
eaede34
BUG,TST: Remove case where vectorization fails in pct_change groupby …
simonariddell Jun 9, 2018
64097cc
BUG,TST: Remove case where vectorization fails in pct_change groupby …
simonariddell Jun 19, 2018
fcd7183
BUG,TST: Remove case where vectorization fails in pct_change groupby …
simonariddell Jun 19, 2018
e8681d6
Merge branch 'groupby_pctchange_bug' of https://github.com/SimonAleck…
simonariddell Jun 24, 2018
77c3935
BUG,TST: Remove case where vectorization fails in pct_change groupby …
simonariddell Jun 24, 2018
d037e65
Update test to address WillAyds code suggestion
simonariddell Jul 29, 2018
a4fcb33
incorporate Will CRs
simonariddell Sep 27, 2018
64d2e4f
cr update
simonariddell Sep 28, 2018
eabfe1f
merge upstream
simonariddell Sep 28, 2018
a63553a
incorporate CR and merge in master
simonariddell Sep 29, 2018
c72c112
Merge branch 'master' of https://github.com/pandas-dev/pandas
simonariddell Nov 23, 2018
a745209
merge in master
simonariddell Nov 23, 2018
5d3fc2d
BUG,TST: Remove case where vectorization fails in pct_change groupby …
simonariddell Nov 25, 2018
ac9d005
Merge remote-tracking branch 'upstream/master'
simonariddell Nov 25, 2018
1f1f705
Merge branch 'master' into groupby_pctchange_bug
simonariddell Nov 25, 2018
795d7be
BUG,TST: Remove case where vectorization fails in pct_change groupby …
simonariddell Nov 25, 2018
f0a9c6d
Merge branch 'groupby_pctchange_bug' of https://github.com/SimonAleck…
simonariddell Nov 27, 2018
a979f79
BUG,TST: Remove case where vectorization fails in pct_change groupby …
simonariddell Nov 27, 2018
5eb79c2
Merge branch 'groupby_pctchange_bug' of https://github.com/SimonAleck…
simonariddell Nov 27, 2018
b11d20f
BUG,TST: GH21234 groupby pct_change
simonariddell Nov 27, 2018
f480299
BUG,TST: GH21234 groupby pct_change
simonariddell Nov 28, 2018
a3c95cb
BUG,TST: GH21235 groupby pct_change whatsnew
simonariddell Dec 2, 2018
e5289a2
BUG, TST: GH 21235 groupby pct_change whatsnew
simonariddell Dec 9, 2018
1ccc507
Merge remote-tracking branch 'upstream/master'
simonariddell Dec 9, 2018
7f4a244
merge master
simonariddell Dec 9, 2018
c26cf60
BUG, TST: GH 21235 groupby pct_change whatsnew
simonariddell Dec 9, 2018
877b433
BUG, TST: GH 21235 groupby pct_change whatsnew
simonariddell Dec 9, 2018
c8b686e
Merge remote-tracking branch 'upstream/master'
simonariddell Dec 11, 2018
faf532e
Merge remote-tracking branch 'upstream/master'
simonariddell Dec 12, 2018
6d87075
Merge branch 'master' into groupby_pctchange_bug
simonariddell Dec 12, 2018
ad934b7
BUG, TST: GH 21235 groupby pct_change whatsnew
simonariddell Dec 12, 2018
01d705f
BUG, TST: GH 21235 groupby pct_change test parametrize
simonariddell Dec 12, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 23 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -378,6 +378,29 @@ Backwards incompatible API changes
- Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`)
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
- Bug where calling :func:`SeriesGroupBy.pct_change` or :func:`DataFrameGroupBy.pct_change` would ignore groups when calculating the percent change (:issue:`21235`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to a separate subsection


**New behavior**:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow the formatting of other subsections


.. ipython:: python

import pandas as pd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no import

df = pd.DataFrame({'grp': ['a', 'a', 'b'], 'foo': [1.0, 1.1, 2.2]})
df

df.groupby('grp').pct_change()


**Previous behavior**:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previous first


.. code-block:: ipython

In [1]: df.groupby('grp').pct_change()
Out[1]:
foo
0 NaN
1 0.1
2 1.0

.. _whatsnew_0240.api_breaking.deps:

Expand Down
10 changes: 8 additions & 2 deletions pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -1221,9 +1221,15 @@ def _apply_to_column_groupbys(self, func):
return func(self)

def pct_change(self, periods=1, fill_method='pad', limit=None, freq=None):
"""Calculate percent change of each value to previous entry in group"""
"""Calcuate pct_change of each value to previous entry in group"""
# TODO: Remove this conditional when #23918 is fixed
if freq:
return self.apply(lambda x: x.pct_change(periods=periods,
fill_method=fill_method,
limit=limit, freq=freq))
filled = getattr(self, fill_method)(limit=limit)
shifted = filled.shift(periods=periods, freq=freq)
fill_grp = filled.groupby(self.grouper.labels)
shifted = fill_grp.shift(periods=periods, freq=freq)

return (filled / shifted) - 1

Expand Down
9 changes: 4 additions & 5 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -2025,11 +2025,10 @@ def pct_change(self, periods=1, fill_method='pad', limit=None, freq=None,
fill_method=fill_method,
limit=limit, freq=freq,
axis=axis))

filled = getattr(self, fill_method)(limit=limit).drop(
self.grouper.names, axis=1)
shifted = filled.shift(periods=periods, freq=freq)

filled = getattr(self, fill_method)(limit=limit)
filled = filled.drop(self.grouper.names, axis=1)
fill_grp = filled.groupby(self.grouper.labels)
shifted = fill_grp.shift(periods=periods, freq=freq)
return (filled / shifted) - 1

@Substitution(name='groupby')
Expand Down
47 changes: 24 additions & 23 deletions pandas/tests/groupby/test_transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -765,36 +765,37 @@ def test_pad_stable_sorting(fill_method):


@pytest.mark.parametrize("test_series", [True, False])
@pytest.mark.parametrize("freq", [None, 'D'])
@pytest.mark.parametrize("periods,fill_method,limit", [
(1, 'ffill', None), (1, 'ffill', 1),
(1, 'bfill', None), (1, 'bfill', 1),
(-1, 'ffill', None), (-1, 'ffill', 1),
(-1, 'bfill', None), (-1, 'bfill', 1)])
def test_pct_change(test_series, periods, fill_method, limit):
vals = [np.nan, np.nan, 1, 2, 4, 10, np.nan, np.nan]
exp_vals = Series(vals).pct_change(periods=periods,
fill_method=fill_method,
limit=limit).tolist()

df = DataFrame({'key': ['a'] * len(vals) + ['b'] * len(vals),
'vals': vals * 2})
grp = df.groupby('key')

def get_result(grp_obj):
return grp_obj.pct_change(periods=periods,
fill_method=fill_method,
limit=limit)
(-1, 'bfill', None), (-1, 'bfill', 1),
])
def test_pct_change(test_series, freq, periods, fill_method, limit):
# GH 21200, 21621
if freq == 'D':
pytest.xfail("'freq' test not necessary until #23918 completed and"
"freq is used in the vectorized approach")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could put the xxfail inthe parametrize, something like this format

@pytest.mark.parametrize("engine,ext", [
    pytest.param('openpyxl', '.xlsx', marks=pytest.mark.skipif(
        not td.safe_import('openpyxl'), reason='No openpyxl')),

but of course its an xfail and not a skip and this is not an excel test :<

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this actually fail or be non-performant compared to the other argument combinations? I think the latter and if so probably OK to just remove xfail / skip altogether; we have the TODO in the actual code in regards to the perf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be non-performant, but it wouldn't fail (well it technically would fail, but because the index in the test is not a date-time). Personally, I think it makes sense to make the index date-time, and not xfail this. Let me know if you agree, and I'll do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK. No I think that would make the test here more complicated than it needs to be; if you could follow the approach as outlined by @jreback would be preferable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand this xfail and it seems like the behavior is correct bc, as @simonariddell mentioned, it should raise without a DatetimeIndex. what is the xfailed case supposed to be testing?


vals = [3, np.nan, np.nan, np.nan, 1, 2, 4, 10, np.nan, 4]
keys = ['a', 'b']
key_v = np.repeat(keys, len(vals))
df = DataFrame({'key': key_v, 'vals': vals * 2})

df_g = getattr(df.groupby('key'), fill_method)(limit=limit)
grp = df_g.groupby('key')

expected = grp['vals'].obj / grp['vals'].shift(periods) - 1

if test_series:
exp = pd.Series(exp_vals * 2)
exp.name = 'vals'
grp = grp['vals']
result = get_result(grp)
tm.assert_series_equal(result, exp)
result = df.groupby('key')['vals'].pct_change(
periods=periods, fill_method=fill_method, limit=limit, freq=freq)
tm.assert_series_equal(result, expected)
else:
exp = DataFrame({'vals': exp_vals * 2})
result = get_result(grp)
tm.assert_frame_equal(result, exp)
result = df.groupby('key').pct_change(
periods=periods, fill_method=fill_method, limit=limit, freq=freq)
tm.assert_frame_equal(result, expected.to_frame('vals'))


@pytest.mark.parametrize("func", [np.any, np.all])
Expand Down