Skip to content

BUG: pivot_table strings as aggfunc #18810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 23, 2017
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.23.0
Original file line number Diff line number Diff line change
Expand Up @@ -341,7 +341,8 @@ Reshaping
- Bug in :func:`DataFrame.stack` which fails trying to sort mixed type levels under Python 3 (:issue:`18310`)
- Fixed construction of a :class:`Series` from a ``dict`` containing ``NaN`` as key (:issue:`18480`)
- Bug in :func:`Series.rank` where ``Series`` containing ``NaT`` modifies the ``Series`` inplace (:issue:`18521`)
-
- Bug in :func:`Dataframe.pivot_table` which fails when the ``aggfunc`` arg is of type string. The behavior is now consistent with other methods like ``agg`` and ``apply`` (:issue:`18713`)
-

Numeric
^^^^^^^
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/reshape/pivot.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,8 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
fill_value=fill_value, aggfunc=func,
margins=margins, margins_name=margins_name)
pieces.append(table)
keys.append(func.__name__)
keys.append(getattr(func, '__name__', func))

return concat(pieces, keys=keys, axis=1)

keys = index + columns
Expand Down
45 changes: 45 additions & 0 deletions pandas/tests/reshape/test_pivot.py
Original file line number Diff line number Diff line change
Expand Up @@ -1109,6 +1109,51 @@ def test_pivot_margins_name_unicode(self):
expected = pd.DataFrame(index=index)
tm.assert_frame_equal(table, expected)

def test_pivot_string_as_func(self):
# GH #18713
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that this test is for correctness.

# for correctness purposes
data = DataFrame({'A': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
'bar', 'bar', 'foo', 'foo', 'foo'],
'B': ['one', 'one', 'one', 'two', 'one', 'one',
'one', 'two', 'two', 'two', 'one'],
'C': range(11)})

result = pivot_table(data, index='A', columns='B', aggfunc='sum')
mi = MultiIndex(levels=[['C'], ['one', 'two']],
labels=[[0, 0], [0, 1]], names=[None, 'B'])
expected = DataFrame({('C', 'one'): {'bar': 15, 'foo': 13},
('C', 'two'): {'bar': 7, 'foo': 20}},
columns=mi).rename_axis('A')
tm.assert_frame_equal(result, expected)

result = pivot_table(data, index='A', columns='B',
aggfunc=['sum', 'mean'])
mi = MultiIndex(levels=[['sum', 'mean'], ['C'], ['one', 'two']],
labels=[[0, 0, 1, 1], [0, 0, 0, 0], [0, 1, 0, 1]],
names=[None, None, 'B'])
expected = DataFrame({('mean', 'C', 'one'): {'bar': 5.0, 'foo': 3.25},
('mean', 'C', 'two'): {'bar': 7.0,
'foo': 6.666666666666667},
('sum', 'C', 'one'): {'bar': 15, 'foo': 13},
('sum', 'C', 'two'): {'bar': 7, 'foo': 20}},
columns=mi).rename_axis('A')
tm.assert_frame_equal(result, expected)

@pytest.mark.parametrize('f, f_numpy',
[('sum', np.sum),
('mean', np.mean),
('std', np.std),
(['sum', 'mean'], [np.sum, np.mean]),
(['sum', 'std'], [np.sum, np.std]),
(['std', 'mean'], [np.std, np.mean])])
def test_pivot_string_func_vs_func(self, f, f_numpy):
# GH #18713
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that this test is for consistency purposes.

# for consistency purposes
result = pivot_table(self.data, index='A', columns='B', aggfunc=f)
expected = pivot_table(self.data, index='A', columns='B',
aggfunc=f_numpy)
tm.assert_frame_equal(result, expected)

Copy link
Member

@gfyoung gfyoung Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what you're testing here, but I would modify a couple of things:

  1. Add a newline after the first tm.assert_frame_equal. It makes it easier to read.
  2. Construct the output explicitly using the DataFrame constructor (and check against that too). At this point, I know for example that f('sum') and f(np.sum) should be the same, but I don't know what the actual result is. Here is how I might go about writing the first half of this test.
...
result = f('sum')
func_result = f(np.sum)
expected = DataFrame(...)
tm.assert_frame_equal(result, func_result)  # consistency
tm.assert_frame_equal(result, expected)  # correctness

Copy link
Contributor

@jreback jreback Dec 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can parametrize this test by passing these functions in as well

something like

@pytest.mark.parametrize("f, f_numpy", [('sum', np.sum), ('mean', np.mean'), ('std', np.std])
def test_pivot_func_string(self, f, f_numpy):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, both, I'll incorporate these changes


class TestCrosstab(object):

Expand Down