pct change bug issue 21200 #21235
Changes from 31 commits
@@ -378,6 +378,29 @@ Backwards incompatible API changes
 - Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`)
 - ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
 - :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
+- Bug where calling :func:`SeriesGroupBy.pct_change` or :func:`DataFrameGroupBy.pct_change` would ignore groups when calculating the percent change (:issue:`21235`)
+
+**New behavior**:
[Review comment] follow the formatting of other subsections
+
+.. ipython:: python
+
+    import pandas as pd
[Review comment] no import
+    df = pd.DataFrame({'grp': ['a', 'a', 'b'], 'foo': [1.0, 1.1, 2.2]})
[jreback marked this conversation as resolved.]
+    df
+
+    df.groupby('grp').pct_change()
+
+
+**Previous behavior**:
[Review comment] previous first
+
+.. code-block:: ipython
+
+    In [1]: df.groupby('grp').pct_change()
+    Out[1]:
+       foo
+    0  NaN
+    1  0.1
+    2  1.0
 
 .. _whatsnew_0240.api_breaking.deps:
 
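To make the behavior change in the release note concrete, here is a minimal sketch using the same three-row frame. It is an illustration and not part of the PR. The names `result` and `manual` are introduced here, and the sketch assumes a pandas version that includes this fix.

```python
import pandas as pd

# Same small frame as in the release note above.
df = pd.DataFrame({'grp': ['a', 'a', 'b'], 'foo': [1.0, 1.1, 2.2]})

# Percent change computed within each group.
result = df.groupby('grp')['foo'].pct_change()

# Manual equivalent: divide each value by the previous value
# in the same group, then subtract 1.
manual = df['foo'] / df.groupby('grp')['foo'].shift(1) - 1

# Rows 0 and 2 are the first rows of groups 'a' and 'b', so both are NaN.
# Row 1 is approximately 0.1 (1.1 / 1.0 - 1).
print(result)
```

With the previous behavior, row 2 was computed against row 1 (2.2 / 1.1 - 1 = 1.0) even though the two rows belong to different groups.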
@@ -765,36 +765,37 @@ def test_pad_stable_sorting(fill_method):
 
 
 @pytest.mark.parametrize("test_series", [True, False])
+@pytest.mark.parametrize("freq", [None, 'D'])
 @pytest.mark.parametrize("periods,fill_method,limit", [
     (1, 'ffill', None), (1, 'ffill', 1),
     (1, 'bfill', None), (1, 'bfill', 1),
     (-1, 'ffill', None), (-1, 'ffill', 1),
-    (-1, 'bfill', None), (-1, 'bfill', 1)])
-def test_pct_change(test_series, periods, fill_method, limit):
-    vals = [np.nan, np.nan, 1, 2, 4, 10, np.nan, np.nan]
-    exp_vals = Series(vals).pct_change(periods=periods,
-                                       fill_method=fill_method,
-                                       limit=limit).tolist()
-
-    df = DataFrame({'key': ['a'] * len(vals) + ['b'] * len(vals),
-                    'vals': vals * 2})
-    grp = df.groupby('key')
-
-    def get_result(grp_obj):
-        return grp_obj.pct_change(periods=periods,
-                                  fill_method=fill_method,
-                                  limit=limit)
+    (-1, 'bfill', None), (-1, 'bfill', 1),
+])
+def test_pct_change(test_series, freq, periods, fill_method, limit):
+    # GH 21200, 21621
+    if freq == 'D':
[jreback marked this conversation as resolved.]
+        pytest.xfail("'freq' test not necessary until #23918 completed and"
+                     "freq is used in the vectorized approach")
[Review comment] you could put the xfail in the parametrize, something like this format
but of course it's an xfail and not a skip and this is not an excel test :<
[Review comment] Would this actually fail or be non-performant compared to the other argument combinations? I think the latter, and if so it is probably OK to just remove the xfail / skip altogether; we have the TODO in the actual code in regards to the perf
[Review comment] It would be non-performant, but it wouldn't fail (well, it technically would fail, but only because the index in the test is not a date-time). Personally, I think it makes sense to make the index date-time, and not xfail this. Let me know if you agree, and I'll do that.
[Review comment] Ah OK. No, I think that would make the test here more complicated than it needs to be; if you could follow the approach as outlined by @jreback, that would be preferable
[Review comment] Sure!
[Review comment] I'm trying to understand this xfail, and it seems like the behavior is correct because, as @simonariddell mentioned, it should raise without a DatetimeIndex. What is the xfailed case supposed to be testing?
+
+    vals = [3, np.nan, np.nan, np.nan, 1, 2, 4, 10, np.nan, 4]
+    keys = ['a', 'b']
+    key_v = np.repeat(keys, len(vals))
+    df = DataFrame({'key': key_v, 'vals': vals * 2})
+
+    df_g = getattr(df.groupby('key'), fill_method)(limit=limit)
+    grp = df_g.groupby('key')
+
+    expected = grp['vals'].obj / grp['vals'].shift(periods) - 1
 
     if test_series:
-        exp = pd.Series(exp_vals * 2)
-        exp.name = 'vals'
-        grp = grp['vals']
-        result = get_result(grp)
-        tm.assert_series_equal(result, exp)
+        result = df.groupby('key')['vals'].pct_change(
[jreback marked this conversation as resolved.]
+            periods=periods, fill_method=fill_method, limit=limit, freq=freq)
+        tm.assert_series_equal(result, expected)
     else:
-        exp = DataFrame({'vals': exp_vals * 2})
-        result = get_result(grp)
-        tm.assert_frame_equal(result, exp)
+        result = df.groupby('key').pct_change(
+            periods=periods, fill_method=fill_method, limit=limit, freq=freq)
+        tm.assert_frame_equal(result, expected.to_frame('vals'))
 
 
 @pytest.mark.parametrize("func", [np.any, np.all])
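The review thread on the `freq == 'D'` branch suggests attaching the xfail to the parameter itself instead of calling `pytest.xfail` inside the test body. The reviewer's own example was not preserved on this page, so the following is only a sketch of that general pattern using `pytest.param` with `marks=`. The test name, the reason string, and the placeholder body are hypothetical and are not the code that was merged.

```python
import pytest


@pytest.mark.parametrize("freq", [
    None,
    # Hypothetical: mark only the 'D' case as an expected failure.
    pytest.param('D', marks=pytest.mark.xfail(
        reason="placeholder reason: freq case not yet supported")),
])
def test_freq_placeholder(freq):
    # Stand-in body for illustration; the real test would call
    # groupby(...).pct_change(..., freq=freq) and compare results.
    assert freq is None
```

Marking the parameter keeps the expected failure visible in the parametrize list, so the `freq=None` cases run normally and the test body needs no conditional.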
[Review comment] move this to a separate subsection