pct change bug issue 21200 #21235
Changes from 3 commits
@@ -722,19 +722,29 @@ def interweave(list_obj):

 @pytest.mark.parametrize("test_series", [True, False])
+@pytest.mark.parametrize("shuffle", [True, False])
 @pytest.mark.parametrize("periods,fill_method,limit", [
     (1, 'ffill', None), (1, 'ffill', 1),
     (1, 'bfill', None), (1, 'bfill', 1),
     (-1, 'ffill', None), (-1, 'ffill', 1),
     (-1, 'bfill', None), (-1, 'bfill', 1)])
-def test_pct_change(test_series, periods, fill_method, limit):
-    vals = [np.nan, np.nan, 1, 2, 4, 10, np.nan, np.nan]
-    exp_vals = Series(vals).pct_change(periods=periods,
-                                       fill_method=fill_method,
-                                       limit=limit).tolist()
-
-    df = DataFrame({'key': ['a'] * len(vals) + ['b'] * len(vals),
-                    'vals': vals * 2})
+def test_pct_change(test_series, shuffle, periods, fill_method, limit):
Reviewer: Could you add a comment with the issue number for this test (like …)?
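For illustration only: the reviewer's example is cut off above, so the exact marker style they meant is an assumption. A minimal sketch, using the issue number from the PR title:

```python
def test_pct_change(test_series, shuffle, periods, fill_method, limit):
    # GH 21200 - grouped pct_change bug (assumed marker style; the reviewer's
    # exact wording is truncated in this view)
    ...
```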
+    vals = [3, np.nan, 1, 2, 4, 10, np.nan, np.nan]
+    keys = ['a', 'b']
+    key_v = [k for j in list(map(lambda x: [x] * len(vals), keys)) for k in j]
Reviewer: If you really want to do this, use …
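The reviewer's concrete suggestion is truncated. A sketch of two common, more idiomatic ways to build the repeated-key column; both produce the same values as the nested comprehension for this data (the variable names mirror the test's):

```python
import numpy as np

vals = [3, np.nan, 1, 2, 4, 10, np.nan, np.nan]
keys = ['a', 'b']

# numpy-based: array(['a', 'a', ..., 'b', 'b']) with 8 copies of each key
key_v = np.repeat(keys, len(vals))

# or a flat list comprehension: equivalent to ['a'] * 8 + ['b'] * 8
key_v = [k for k in keys for _ in vals]
```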
+    df = DataFrame({'key': key_v, 'vals': vals * 2})
+    if shuffle:
+        order = np.random.RandomState(seed=42).permutation(len(df))
Reviewer: Maybe I'm missing something, but what's the point of this? If you are seeding your random call, isn't this always going to return the same order? If so, it would be preferable to be explicit about that order, though I'm not sure it's necessary in the first place.

Author: Eh, you're right. Nixed it.
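A sketch of what "being explicit about the order" could look like, assuming the test's local `df`; illustrative only, since any fixed reordering exercises the same out-of-order grouping path:

```python
# Instead of a seeded permutation (which is deterministic anyway for a
# fixed-length frame), reorder the rows with an explicit, self-documenting
# index.
order = list(range(len(df)))[::-1]            # e.g. simply reverse the rows
df = df.reindex(order).reset_index(drop=True)
```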
+        df = df.reindex(order).reset_index(drop=True)
+
+    manual_apply = []
+    for k in keys:
Reviewer: Why aren't you just using this?

Reviewer: This loop / concat mechanism is rather non-idiomatic. What's the reasoning for this instead of just doing a comparison to the anonymous function as was done in the original issue?

Author: I chose to do this because it means the test ultimately asserts that the result of a groupby apply is the same as looping over the keys and doing a normal apply, as the two should be identical. Or, more specifically, the groupby apply ought to replicate the results of a normal apply looped over the groups.

Edit: Fixed it in the newest CR. I like your suggestion more now that I see it in code.
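For context, the anonymous-function comparison the reviewer points to (and that the author says a later revision adopts) would look roughly like this; a sketch using the test's local `df`, `periods`, `fill_method`, and `limit`, not the exact final code:

```python
# Build the expected values with a plain groupby-apply of the same
# Series.pct_change call, instead of looping over keys and concatenating.
exp = df.groupby('key')['vals'].apply(
    lambda s: s.pct_change(periods=periods,
                           fill_method=fill_method,
                           limit=limit))
```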
+        subgroup = Series(df.loc[df.key == k, 'vals'].values)
+        manual_apply.append(subgroup.pct_change(periods=periods,
+                                                fill_method=fill_method,
+                                                limit=limit))
+    exp_vals = pd.concat(manual_apply).reset_index(drop=True)
+    exp = pd.DataFrame(exp_vals, columns=['A'])
Reviewer: concat already returns a frame, use …
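The tail of that suggestion is truncated, so this is only one plausible reading: drop the extra `DataFrame(...)` wrapper around the concatenated result. A hypothetical sketch, reusing the test's `manual_apply` list:

```python
# Concatenate the per-key results and name the single column directly,
# without constructing a second DataFrame.
exp = pd.concat(manual_apply).reset_index(drop=True).to_frame('A')
```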
     grp = df.groupby('key')

     def get_result(grp_obj):

@@ -743,14 +753,18 @@ def get_result(grp_obj):
                                   limit=limit)

     if test_series:
-        exp = pd.Series(exp_vals * 2)
-        exp.name = 'vals'
+        exp = exp.loc[:, 'A']
         grp = grp['vals']
         result = get_result(grp)
Reviewer: Not that your change introduced it, but can we get rid of the …

Author: Removed.
+        df.insert(0, 'A', result)
Reviewer: What is the point of all of the insert / sort calls? I would rather just construct the expected value and simply compare that to the result of a pct_change call.

Author: Agree. Fixed.
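A sketch of the comparison style the reviewer asks for, assuming the test's local `df`, `exp`, `periods`, `fill_method`, and `limit`; the final test's exact assertions may differ:

```python
# Compare the grouped pct_change result directly against the expected Series,
# with no insert/sort/reset round-trip through the original frame.
result = df.groupby('key')['vals'].pct_change(periods=periods,
                                              fill_method=fill_method,
                                              limit=limit)
tm.assert_series_equal(result, exp, check_names=False)
```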
+        result = df.sort_values(by='key')
+        result = result.loc[:, 'A']
+        result = result.reset_index(drop=True)
Reviewer: Don't use inplace in testing.
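Presumably this refers to the `reset_index(drop=True, inplace=True)` call in the else branch below; the requested style is plain reassignment, e.g.:

```python
# Reassign instead of mutating in place.
result = result.reset_index(drop=True)  # rather than result.reset_index(drop=True, inplace=True)
```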
         tm.assert_series_equal(result, exp)
     else:
-        exp = DataFrame({'vals': exp_vals * 2})
         result = get_result(grp)
+        result.reset_index(drop=True, inplace=True)
+        result.columns = ['A']
         tm.assert_frame_equal(result, exp)
Reviewer: Isn't this essentially eliminating a vectorized solution across the board? How are the ASVs comparing?

Author: pct_change is blacklisted in benchmarks/groupby.py.
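Since pct_change is skipped by the existing groupby benchmarks, timing the apply-based approach would need a dedicated ASV case. A hypothetical sketch, not part of this PR; the class name, sizes, and parameters are invented for illustration:

```python
import numpy as np
import pandas as pd


class GroupByPctChange:
    # Hypothetical asv-style benchmark: time grouped pct_change on a frame
    # with many small groups, the case most sensitive to an apply-based path.
    def setup(self):
        n = 10 ** 5
        rng = np.random.RandomState(1234)
        self.df = pd.DataFrame({'key': rng.randint(0, 1000, n),
                                'vals': rng.randn(n)})

    def time_pct_change(self):
        self.df.groupby('key')['vals'].pct_change()
```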