ENH: DataFrame.pivot accepts a list of values #18636

Merged
merged 28 commits into from
Mar 26, 2018
Commits
b74ee0f
add pivot with multi-values
ibrahimsharaf Dec 4, 2017
a36f9e0
update whatsnew
ibrahimsharaf Dec 4, 2017
5f94728
fix review comments
ibrahimsharaf Dec 5, 2017
3008d8e
PEP8 fixes
ibrahimsharaf Dec 5, 2017
b3ea1c2
merge master
ibrahimsharaf Dec 8, 2017
539ffdc
merge master
ibrahimsharaf Dec 8, 2017
d176585
Merge branch 'master' into pivot_multi
ibrahimsharaf Dec 16, 2017
6646798
remove tuple from test
ibrahimsharaf Dec 16, 2017
ea77a97
update pivot docstring
ibrahimsharaf Dec 16, 2017
1d6bf58
remove unused import
ibrahimsharaf Dec 16, 2017
c000811
Merge branch 'master' of https://github.com/pandas-dev/pandas into pi…
ibrahimsharaf Dec 19, 2017
c750807
Merge branch 'master' into pivot_multi
ibrahimsharaf Jan 2, 2018
d3a7bec
Merge branch 'master' into pivot_multi
ibrahimsharaf Jan 2, 2018
c50b2dd
Merge branch 'pivot_multi' of https://github.com/ibrahimsharaf/pandas…
ibrahimsharaf Jan 2, 2018
df2f0b0
Push requested changes
ibrahimsharaf Jan 2, 2018
bb85875
Merge remote-tracking branch 'upstream/master' into pivot_multi
ibrahimsharaf Jan 4, 2018
8f8b45f
Revert whatsnew v0.22.0
ibrahimsharaf Jan 4, 2018
99abef4
Add two more tests
ibrahimsharaf Jan 4, 2018
41ad9c0
PEP8
ibrahimsharaf Jan 4, 2018
2f5d6f7
Merge remote-tracking branch 'upstream/master' into ibrahimsharaf-piv…
TomAugspurger Feb 27, 2018
516690c
Use pytest raises instead of xfail
ibrahimsharaf Mar 1, 2018
e30fd1c
Remove unnecessary code
ibrahimsharaf Mar 1, 2018
eb9d85f
Fix review comments
ibrahimsharaf Mar 2, 2018
786e5f7
Merge and resolve
ibrahimsharaf Mar 18, 2018
8ea45f8
xfail multiindex test
ibrahimsharaf Mar 18, 2018
3825c9a
Add additional test
ibrahimsharaf Mar 18, 2018
8e54fc9
Merge remote-tracking branch 'upstream/master' into pivot_multi
ibrahimsharaf Mar 20, 2018
e293741
Review changes
ibrahimsharaf Mar 20, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -326,6 +326,7 @@ Other Enhancements
 - ``Resampler`` objects now have a functioning :attr:`~pandas.core.resample.Resampler.pipe` method.
   Previously, calls to ``pipe`` were diverted to the ``mean`` method (:issue:`17905`).
 - :func:`~pandas.api.types.is_scalar` now returns ``True`` for ``DateOffset`` objects (:issue:`18943`).
+- :func:`DataFrame.pivot` now accepts a list for the ``values=`` kwarg (:issue:`17160`).
 - Added :func:`pandas.api.extensions.register_dataframe_accessor`,
   :func:`pandas.api.extensions.register_series_accessor`, and
   :func:`pandas.api.extensions.register_index_accessor`, accessor for libraries downstream of pandas
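As a quick check of the whatsnew entry above, the following sketch pivots a small frame with a list passed to `values`. The sample data mirrors the docstring example added in this PR; the variable name `wide` is only illustrative, and the snippet assumes pandas 0.23.0 or later is installed.

```python
import pandas as pd

# Same sample data as the docstring example in this PR.
df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
    "zoo": ["x", "y", "z", "q", "w", "t"],
})

# Passing a list to `values` keeps both columns in the result.
wide = df.pivot(index="foo", columns="bar", values=["baz", "zoo"])

# The columns form a two-level MultiIndex: (value column, 'bar' label).
print(wide.columns.tolist())
print(wide.loc["two", ("zoo", "B")])
```

Individual cells are then addressed with a `(value column, 'bar' label)` tuple, as in the last line.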
31 changes: 21 additions & 10 deletions pandas/core/frame.py
@@ -4956,11 +4956,14 @@ def pivot(self, index=None, columns=None, values=None):
             existing index.
         columns : string or object
             Column to use to make new frame's columns.
-        values : string or object, optional
-            Column to use for populating new frame's values. If not
+        values : string, object or a list of the previous, optional
+            Column(s) to use for populating new frame's values. If not
             specified, all remaining columns will be used and the result will
             have hierarchically indexed columns.

+            .. versionchanged :: 0.23.0
+               Also accept list of column names.
+
         Returns
         -------
         DataFrame
@@ -4989,15 +4992,16 @@ def pivot(self, index=None, columns=None, values=None):
         >>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
         ...                            'two'],
         ...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
-        ...                    'baz': [1, 2, 3, 4, 5, 6]})
+        ...                    'baz': [1, 2, 3, 4, 5, 6],
+        ...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
         >>> df
-            foo   bar  baz
-        0   one   A    1
-        1   one   B    2
-        2   one   C    3
-        3   two   A    4
-        4   two   B    5
-        5   two   C    6
+            foo   bar  baz  zoo
+        0   one   A    1    x
+        1   one   B    2    y
+        2   one   C    3    z
+        3   two   A    4    q
+        4   two   B    5    w
+        5   two   C    6    t

         >>> df.pivot(index='foo', columns='bar', values='baz')
         bar  A   B   C
@@ -5011,6 +5015,13 @@ def pivot(self, index=None, columns=None, values=None):
         one  1   2   3
         two  4   5   6

+        >>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
+              baz       zoo
+        bar   A  B  C   A  B  C
+        foo
+        one   1  2  3   x  y  z
+        two   4  5  6   q  w  t
+
         A ValueError is raised if there are any duplicates.

         >>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
15 changes: 10 additions & 5 deletions pandas/core/reshape/reshape.py
@@ -392,16 +392,21 @@ def pivot(self, index=None, columns=None, values=None):
         cols = [columns] if index is None else [index, columns]
         append = index is None
         indexed = self.set_index(cols, append=append)
-        return indexed.unstack(columns)
     else:
         if index is None:
             index = self.index
         else:
             index = self[index]
-        indexed = self._constructor_sliced(
-            self[values].values,
-            index=MultiIndex.from_arrays([index, self[columns]]))
-        return indexed.unstack(columns)
+        index = MultiIndex.from_arrays([index, self[columns]])
+
+        if is_list_like(values) and not isinstance(values, tuple):
Contributor

can you add a comment here on why excluding tuples

+            # Exclude tuple because it is seen as a single column name
+            indexed = self._constructor(self[values].values, index=index,
+                                        columns=values)
+        else:
+            indexed = self._constructor_sliced(self[values].values,
+                                               index=index)
+    return indexed.unstack(columns)


def pivot_simple(index, columns, values):
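
The branch condition in the hunk above can be illustrated without pandas. This is a minimal sketch, not the library's implementation: `selects_multiple_columns` is a hypothetical helper, and its iterable check only approximates what `is_list_like` does in pandas.

```python
from collections.abc import Iterable


def selects_multiple_columns(values):
    """Hypothetical stand-in for the condition
    `is_list_like(values) and not isinstance(values, tuple)`."""
    # Strings are iterable but name a single column. Tuples are excluded
    # because a tuple can itself be one column label (e.g. with MultiIndex
    # columns), so it must be looked up as a single key.
    if isinstance(values, (str, bytes, tuple)):
        return False
    return isinstance(values, Iterable)


print(selects_multiple_columns(["baz", "zoo"]))   # list: several columns
print(selects_multiple_columns(("baz", "first"))) # tuple: one column label
print(selects_multiple_columns("baz"))            # string: one column label
```

Under this rule a list takes the DataFrame-building path, while a tuple or scalar takes the Series-building path, matching the two branches in the diff.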
83 changes: 83 additions & 0 deletions pandas/tests/reshape/test_pivot.py
@@ -371,6 +371,89 @@ def test_pivot_periods(self):
         pv = df.pivot(index='p1', columns='p2', values='data1')
         tm.assert_frame_equal(pv, expected)

+    @pytest.mark.parametrize('values', [
+        ['baz', 'zoo'], np.array(['baz', 'zoo']),
+        pd.Series(['baz', 'zoo']), pd.Index(['baz', 'zoo'])
+    ])
+    def test_pivot_with_list_like_values(self, values):
+        # issue #17160
+        df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
+                           'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
+                           'baz': [1, 2, 3, 4, 5, 6],
+                           'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
+
+        result = df.pivot(index='foo', columns='bar', values=values)
+
+        data = [[1, 2, 3, 'x', 'y', 'z'],
+                [4, 5, 6, 'q', 'w', 't']]
+        index = Index(data=['one', 'two'], name='foo')
+        columns = MultiIndex(levels=[['baz', 'zoo'], ['A', 'B', 'C']],
+                             labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
+                             names=[None, 'bar'])
+        expected = DataFrame(data=data, index=index,
+                             columns=columns, dtype='object')
+        tm.assert_frame_equal(result, expected)

+    @pytest.mark.parametrize('values', [
+        ['bar', 'baz'], np.array(['bar', 'baz']),
+        pd.Series(['bar', 'baz']), pd.Index(['bar', 'baz'])
+    ])
+    def test_pivot_with_list_like_values_nans(self, values):
+        # issue #17160
+        df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
+                           'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
+                           'baz': [1, 2, 3, 4, 5, 6],
+                           'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
+
+        result = df.pivot(index='zoo', columns='foo', values=values)
Contributor

add a test with values as a tuple (it should fail)


+        data = [[np.nan, 'A', np.nan, 4],
+                [np.nan, 'C', np.nan, 6],
+                [np.nan, 'B', np.nan, 5],
+                ['A', np.nan, 1, np.nan],
+                ['B', np.nan, 2, np.nan],
+                ['C', np.nan, 3, np.nan]]
Member

Where are all the NaNs coming from? They don't seem to be there in the docstring example

Member

Ah, I see, because here 'zoo' is used for index and not 'foo'.
Can you add a test for that as well?
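
The point made in this thread can be reproduced directly. The sketch below reuses the test's data and assumes pandas 0.23.0 or later; `wide` is an illustrative name. Each `zoo` label occurs with exactly one `foo` label, so every row has values under only one of `one` or `two`, and the remaining cells are filled with NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
    "zoo": ["x", "y", "z", "q", "w", "t"],
})

# 'zoo' is unique per row, so it pairs with a single 'foo' label each time.
wide = df.pivot(index="zoo", columns="foo", values=["bar", "baz"])

# Row 'x' came from foo == 'one': the 'two' cells have nothing to hold.
print(wide.loc["x"])

# Count the missing cells across the whole result.
print(int(wide.isna().sum().sum()))
```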

+        index = Index(data=['q', 't', 'w', 'x', 'y', 'z'], name='zoo')
+        columns = MultiIndex(levels=[['bar', 'baz'], ['one', 'two']],
Contributor

add a test with a MultiIndex and pass values as a tuple

Contributor Author

Are you looking for something like the following? (please correct me if I am wrong)

    bar          baz       
  first second first second
0   one      A     1      x
1   one      B     2      y
2   one      C     3      z
3   two      A     4      q
4   two      B     5      w
5   two      C     6      t 

then pivot it:

df.pivot(index=('bar', 'first'), columns=('bar', 'second'), values=('baz', 'first'))

so the output would be:

     A   B   C
one  1   2   3
two  4   5   6

Contributor Author

ping @jreback

Contributor

yep

+                             labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
+                             names=[None, 'foo'])
+        expected = DataFrame(data=data, index=index,
+                             columns=columns, dtype='object')
+        tm.assert_frame_equal(result, expected)

+    @pytest.mark.xfail(reason='MultiIndexed unstack with tuple names fails '
+                              'with KeyError #19966')
+    def test_pivot_with_multiindex(self):
+        # issue #17160
+        index = Index(data=[0, 1, 2, 3, 4, 5])
+        data = [['one', 'A', 1, 'x'],
+                ['one', 'B', 2, 'y'],
+                ['one', 'C', 3, 'z'],
+                ['two', 'A', 4, 'q'],
+                ['two', 'B', 5, 'w'],
+                ['two', 'C', 6, 't']]
+        columns = MultiIndex(levels=[['bar', 'baz'], ['first', 'second']],
+                             labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
+        df = DataFrame(data=data, index=index, columns=columns, dtype='object')
+        result = df.pivot(index=('bar', 'first'), columns=('bar', 'second'),
+                          values=('baz', 'first'))
+
+        data = {'A': Series([1, 4], index=['one', 'two']),
+                'B': Series([2, 5], index=['one', 'two']),
+                'C': Series([3, 6], index=['one', 'two'])}
+        expected = DataFrame(data)
+        tm.assert_frame_equal(result, expected)

+    def test_pivot_with_tuple_of_values(self):
+        # issue #17160
+        df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
+                           'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
+                           'baz': [1, 2, 3, 4, 5, 6],
+                           'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
+        with pytest.raises(KeyError):
+            # tuple is seen as a single column name
+            df.pivot(index='zoo', columns='foo', values=('bar', 'baz'))

     def test_margins(self):
         def _check_output(result, values_col, index=['A', 'B'],
                           columns=['C'],
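
The behaviour pinned down by `test_pivot_with_tuple_of_values` above can be seen outside the test suite. This is a small sketch assuming pandas 0.23.0 or later and a frame with flat (non-MultiIndex) columns; the `outcome` variable is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
    "zoo": ["x", "y", "z", "q", "w", "t"],
})

try:
    # The tuple is looked up as ONE column label, ('bar', 'baz'),
    # which does not exist in this frame.
    df.pivot(index="zoo", columns="foo", values=("bar", "baz"))
except KeyError:
    outcome = "KeyError"
else:
    outcome = "no error"

print(outcome)
```

Passing `["bar", "baz"]` instead selects both columns, as in the list-like tests earlier in this file.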