Skip to content

BUG: pivot_table strings as aggfunc #18810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 23, 2017

Conversation

bobhaffner
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented Dec 17, 2017

Hello @bobhaffner! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 23, 2017 at 20:43 Hours UTC

@codecov
Copy link

codecov bot commented Dec 17, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@507157d). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18810   +/-   ##
=========================================
  Coverage          ?   91.65%           
=========================================
  Files             ?      154           
  Lines             ?    51368           
  Branches          ?        0           
=========================================
  Hits              ?    47080           
  Misses            ?     4288           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.52% <100%> (?)
#single 40.84% <0%> (?)
Impacted Files Coverage Δ
pandas/core/reshape/pivot.py 96.35% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 507157d...e24d232. Read the comment docs.

@@ -321,7 +321,8 @@ Reshaping
- Bug in :func:`DataFrame.stack` which fails trying to sort mixed type levels under Python 3 (:issue:`18310`)
- Fixed construction of a :class:`Series` from a ``dict`` containing ``NaN`` as key (:issue:`18480`)
- Bug in :func:`Series.rank` where ``Series`` containing ``NaT`` modifies the ``Series`` inplace (:issue:`18521`)
-
- Bug in :func:`Dataframe.pivot_table` which fails when you pass a string for aggfunc arg (:issue:`18713`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally don't use the second-person (or any such person-pronouns in general).

@gfyoung gfyoung added Compat pandas objects compatability with Numpy or Python functions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 17, 2017
@bobhaffner
Copy link
Contributor Author

Thanks for the feedback @gfyoung.

stds = f(np.std)
expected = concat([means, stds], keys=['mean', 'std'], axis=1)
tm.assert_frame_equal(result, expected)

Copy link
Member

@gfyoung gfyoung Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what you're testing here, but I would modify a couple of things:

  1. Add a newline after the first tm.assert_frame_equal. It makes it easier to read.
  2. Construct the output explicitly using the DataFrame constructor (and check against that too). At this point, I know for example that f('sum') and f(np.sum) should be the same, but I don't know what the actual result is. Here is how I might go about writing the first half of this test.
...
result = f('sum')
func_result = f(np.sum)
expected = DataFrame(...)
tm.assert_frame_equal(result, func_result)  # consistency
tm.assert_frame_equal(result, expected)  # correctness

Copy link
Contributor

@jreback jreback Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can parametrize this test by passing these functions in as well

something like

@pytest.mark.parametrize("f, f_numpy", [('sum', np.sum), ('mean', np.mean'), ('std', np.std])
def test_pivot_func_string(self, f, f_numpy):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, both, I'll incorporate these changes

stds = f(np.std)
expected = concat([means, stds], keys=['mean', 'std'], axis=1)
tm.assert_frame_equal(result, expected)

Copy link
Contributor

@jreback jreback Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can parametrize this test by passing these functions in as well

something like

@pytest.mark.parametrize("f, f_numpy", [('sum', np.sum), ('mean', np.mean'), ('std', np.std])
def test_pivot_func_string(self, f, f_numpy):

@bobhaffner bobhaffner force-pushed the pv_allow_str_aggfuncs branch from 6a1f4f6 to fe36303 Compare December 18, 2017 23:44
@bobhaffner
Copy link
Contributor Author

test_pivot_string_as_func is an attempt at @gfyoung's suggestion regarding comparing to a frame created from DataFrame()

test_pivot_string_func_vs_func is an attempt at @jreback's suggestion of using @pytest.mark.parametrize to compare string_funcs to numpy funcs

Separating the cases into two functions seemed like the right thing to do. As opposed to one rather long one with multiple DataFrame statements

(['sum', 'mean'], [np.sum, np.mean]),
(['sum', 'std'], [np.sum, np.std]),
(['std', 'mean'], [np.std, np.mean])]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove newline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird one, flake8 likes that newline

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using funcs, simply put this in the parametrize


@pytest.mark.parametrize("f, f_numpy", funcs)
def test_pivot_string_func_vs_func(self, f, f_numpy):
# GH #18713
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that this test is for consistency purposes.

@@ -1109,6 +1109,48 @@ def test_pivot_margins_name_unicode(self):
expected = pd.DataFrame(index=index)
tm.assert_frame_equal(table, expected)

def test_pivot_string_as_func(self):
# GH #18713
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that this test is for correctness.

@gfyoung
Copy link
Member

gfyoung commented Dec 18, 2017

Separating the cases into two functions seemed like the right thing to do. As opposed to one rather long one with multiple DataFrame statements.

Seems reasonable to me.

@bobhaffner
Copy link
Contributor Author

Just following up on this to make sure no other changes are required. No worries if folks are busy

(['sum', 'mean'], [np.sum, np.mean]),
(['sum', 'std'], [np.sum, np.std]),
(['std', 'mean'], [np.std, np.mean])]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using funcs, simply put this in the parametrize

@bobhaffner bobhaffner force-pushed the pv_allow_str_aggfuncs branch from c4921c1 to 1a9cfeb Compare December 21, 2017 16:04
@@ -341,7 +341,8 @@ Reshaping
- Bug in :func:`DataFrame.stack` which fails trying to sort mixed type levels under Python 3 (:issue:`18310`)
- Fixed construction of a :class:`Series` from a ``dict`` containing ``NaN`` as key (:issue:`18480`)
- Bug in :func:`Series.rank` where ``Series`` containing ``NaT`` modifies the ``Series`` inplace (:issue:`18521`)
-
- Bug in :func:`Dataframe.pivot_table` which fails when the ``aggfunc`` arg is of type string. The behavior is now consistent with other methods like ``agg`` and ``apply`` (:issue:`18713`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase on master. can you move to 0.23 (docs were renamed), prob easiest to just check this file from master and past in new one

@bobhaffner bobhaffner force-pushed the pv_allow_str_aggfuncs branch from 1a9cfeb to 0decc05 Compare December 21, 2017 17:57
@bobhaffner
Copy link
Contributor Author

Very interesting. Looks like the rebase handled the whatsnew change quite nicely

(pandas-dev) Bobs-MacBook-Pro:pandas-dev bob$ git rebase upstream/master
First, rewinding head to replay your work on top of it...
Applying: pivot_table strings as aggfunc
Using index info to reconstruct a base tree...
A	doc/source/whatsnew/v0.22.0.txt
.git/rebase-apply/patch:11: trailing whitespace.
- 
warning: 1 line adds whitespace errors.
Falling back to patching base and 3-way merge...
Auto-merging doc/source/whatsnew/v0.23.0
Applying: fixed pep8 issues
Applying: fixed whatsnew
Using index info to reconstruct a base tree...
A	doc/source/whatsnew/v0.22.0.txt
Falling back to patching base and 3-way merge...
Auto-merging doc/source/whatsnew/v0.23.0
Applying: updated tests
Applying: added comments to test funcs
Applying: removed list usage in test parametrize

@gfyoung
Copy link
Member

gfyoung commented Dec 21, 2017

@bobhaffner : This is git - you should expect great things from it 😄 (unless you're pushing large files OR operating at big-tech-company-scale... :0 )

@bobhaffner
Copy link
Contributor Author

@gfyoung Haha, gotcha :-)

@jreback jreback added this to the 0.23.0 milestone Dec 21, 2017
@jreback
Copy link
Contributor

jreback commented Dec 21, 2017

lgtm. ping on green.

@bobhaffner
Copy link
Contributor Author

lgtm. ping on green.

@jreback is this intended for me or some kind of shorthand for a notification process? At any rate, ping :-)

Looks like there's a conflict with the whatsnew piece being changed back to the old name. I can follow suit or wait till this is sorted out.

@jreback jreback merged commit 81adbe6 into pandas-dev:master Dec 23, 2017
@jreback
Copy link
Contributor

jreback commented Dec 23, 2017

thanks @bobhaffner

yes 'ping on green' is a notification of you to ping me when this has passed and all buttons are green.

@bobhaffner
Copy link
Contributor Author

Thanks @jreback and @gfyoung!!!

@bobhaffner bobhaffner deleted the pv_allow_str_aggfuncs branch December 24, 2017 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pivot_table aggfunc should accept string function-likes
4 participants