
ENH: DataFrame.pivot accepts a list of values #18636

Status: Merged (28 commits, merged Mar 26, 2018)
Diff shown: changes from 15 commits

Commits (28):
- b74ee0f add pivot with multi-values (ibrahimsharaf, Dec 4, 2017)
- a36f9e0 update whatsnew (ibrahimsharaf, Dec 4, 2017)
- 5f94728 fix review comments (ibrahimsharaf, Dec 5, 2017)
- 3008d8e PEP8 fixes (ibrahimsharaf, Dec 5, 2017)
- b3ea1c2 merge master (ibrahimsharaf, Dec 8, 2017)
- 539ffdc merge master (ibrahimsharaf, Dec 8, 2017)
- d176585 Merge branch 'master' into pivot_multi (ibrahimsharaf, Dec 16, 2017)
- 6646798 remove tuple from test (ibrahimsharaf, Dec 16, 2017)
- ea77a97 update pivot docstring (ibrahimsharaf, Dec 16, 2017)
- 1d6bf58 remove unused import (ibrahimsharaf, Dec 16, 2017)
- c000811 Merge branch 'master' of https://github.com/pandas-dev/pandas into pi… (ibrahimsharaf, Dec 19, 2017)
- c750807 Merge branch 'master' into pivot_multi (ibrahimsharaf, Jan 2, 2018)
- d3a7bec Merge branch 'master' into pivot_multi (ibrahimsharaf, Jan 2, 2018)
- c50b2dd Merge branch 'pivot_multi' of https://github.com/ibrahimsharaf/pandas… (ibrahimsharaf, Jan 2, 2018)
- df2f0b0 Push requested changes (ibrahimsharaf, Jan 2, 2018)
- bb85875 Merge remote-tracking branch 'upstream/master' into pivot_multi (ibrahimsharaf, Jan 4, 2018)
- 8f8b45f Revert whatsnew v0.22.0 (ibrahimsharaf, Jan 4, 2018)
- 99abef4 Add two more tests (ibrahimsharaf, Jan 4, 2018)
- 41ad9c0 PEP8 (ibrahimsharaf, Jan 4, 2018)
- 2f5d6f7 Merge remote-tracking branch 'upstream/master' into ibrahimsharaf-piv… (TomAugspurger, Feb 27, 2018)
- 516690c Use pytest raises instead of xfail (ibrahimsharaf, Mar 1, 2018)
- e30fd1c Remove unnecessary code (ibrahimsharaf, Mar 1, 2018)
- eb9d85f Fix review comments (ibrahimsharaf, Mar 2, 2018)
- 786e5f7 Merge and resolve (ibrahimsharaf, Mar 18, 2018)
- 8ea45f8 xfail multiindex test (ibrahimsharaf, Mar 18, 2018)
- 3825c9a Add additional test (ibrahimsharaf, Mar 18, 2018)
- 8e54fc9 Merge remote-tracking branch 'upstream/master' into pivot_multi (ibrahimsharaf, Mar 20, 2018)
- e293741 Review changes (ibrahimsharaf, Mar 20, 2018)
File doc/source/whatsnew/v0.22.0.txt (2 changes: 1 addition & 1 deletion)
@@ -240,4 +240,4 @@ With conda, use

Note that the inconsistency in the return value for all-*NA* series is still
there for pandas 0.20.3 and earlier. Avoiding pandas 0.21 will only help with
the empty case.
> Contributor: can you revert this file
> Contributor Author: sure

the empty case.
File doc/source/whatsnew/v0.23.0.txt (1 change: 1 addition & 0 deletions)
@@ -145,6 +145,7 @@ Other Enhancements
- ``Resampler`` objects now have a functioning :attr:`~pandas.core.resample.Resampler.pipe` method.
Previously, calls to ``pipe`` were diverted to the ``mean`` method (:issue:`17905`).
- :func:`~pandas.api.types.is_scalar` now returns ``True`` for ``DateOffset`` objects (:issue:`18943`).
- :func:`DataFrame.pivot` now accepts a list for the ``values=`` kwarg (:issue:`17160`).

.. _whatsnew_0230.api_breaking:

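The whatsnew entry mentions a list, while the new test in `pandas/tests/reshape/test_pivot.py` also parametrizes over a NumPy array, a `Series` and an `Index`. The sketch below, assuming a recent pandas release (where `pivot` is called with keyword arguments), checks that all four spellings of the same two value columns give the same frame. The sample data is the `foo`/`bar`/`baz`/`zoo` frame from the updated docstring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# The same two value columns, spelled as four different list-likes.
list_likes = [['baz', 'zoo'],
              np.array(['baz', 'zoo']),
              pd.Series(['baz', 'zoo']),
              pd.Index(['baz', 'zoo'])]

results = [df.pivot(index='foo', columns='bar', values=v) for v in list_likes]

# Every spelling should produce an identical frame with two-level columns:
# the outer level holds the value column names, the inner level the 'bar' labels.
for other in results[1:]:
    pd.testing.assert_frame_equal(results[0], other)

print(results[0])
```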
File pandas/core/frame.py (30 changes: 20 additions & 10 deletions)
@@ -4355,8 +4355,8 @@ def pivot(self, index=None, columns=None, values=None):
existing index.
columns : string or object
Column name to use to make new frame's columns
> Member: This is probably due to merging master, but can you undo this change? (There have been changes to this docstring on master, and you have accidentally reverted some of those changes.)

values : string or object, optional
Column name to use for populating new frame's values. If not
values : string, object or (0.23.0) a list of the previous, optional
> Contributor: put the version after the word list

Column name(s) to use for populating new frame's values. If not
> Contributor: add (0.23.0) after 'list of the previous'
> Member: Column name(s) -> Column(s) to be consistent with above

specified, all remaining columns will be used and the result will
have hierarchically indexed columns

@@ -4381,15 +4381,16 @@ def pivot(self, index=None, columns=None, values=None):

>>> df = pd.DataFrame({'foo': ['one','one','one','two','two','two'],
'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6]})
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
foo bar baz
0 one A 1
1 one B 2
2 one C 3
3 two A 4
4 two B 5
5 two C 6
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t

>>> df.pivot(index='foo', columns='bar', values='baz')
A B C
@@ -4401,6 +4402,15 @@ def pivot(self, index=None, columns=None, values=None):
one 1 2 3
two 4 5 6

>>> df.pivot(index='foo', columns='bar', values=['baz'])
> Contributor: oh, we already have this example, I see; OK, you can remove this one (with ['bar'])

A B C
one 1 2 3
two 4 5 6

>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
> Contributor: can you show an example using a single value first

A B C A B C
one 1 2 3 x y z
two 4 5 6 q w t
> Member: This output doesn't seem correct. Should there be multi-indexed columns?
> Contributor Author: Yes, that's right. Fixed.


> Member: Can you also undo this removal of blank lines?
> Contributor Author: All fixed now @jorisvandenbossche.

"""
from pandas.core.reshape.reshape import pivot
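A reviewer asked for the one-element-list docstring example to be dropped because it duplicated the scalar one. The two calls are not fully interchangeable, though: a one-element list keeps the value name as an outer column level, so the printed header differs from the scalar case. A minimal sketch, assuming a recent pandas release:

```python
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# Scalar column name: flat columns, taken directly from the 'bar' labels.
flat = df.pivot(index='foo', columns='bar', values='baz')

# One-element list: 'baz' survives as an extra outer column level.
nested = df.pivot(index='foo', columns='bar', values=['baz'])

print(flat.columns.nlevels)
print(nested.columns.nlevels)

# Selecting the outer level recovers the flat result.
pd.testing.assert_frame_equal(nested['baz'], flat)
```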
File pandas/core/reshape/reshape.py (17 changes: 10 additions & 7 deletions)
@@ -368,15 +368,18 @@ def pivot(self, index=None, columns=None, values=None):
cols = [columns] if index is None else [index, columns]
append = index is None
indexed = self.set_index(cols, append=append)
return indexed.unstack(columns)
else:
if index is None:
index = self.index
index = self.index if index is None else self[index]
index = MultiIndex.from_arrays([index, self[columns]])
if is_list_like(values):
> Contributor: I think you need:
>     is_list_like(values) and not isinstance(values, tuple)
> Contributor: did you need to check if it's a tuple?
> Contributor Author: Yes, I'm trying to figure out why the MultiIndex with a tuple of values test is giving me an error; it would be great if you could give me a hand!
# use DF in case of Iterable values (e.g: list, Series, np.array)
indexed = DataFrame(self[values].values,
index=index,
columns=values)
else:
index = self[index]
indexed = Series(self[values].values,
index=MultiIndex.from_arrays([index, self[columns]]))
return indexed.unstack(columns)
indexed = Series(self[values].values,
index=index)
return indexed.unstack(columns)


def pivot_simple(index, columns, values):
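The `else` branch in the reshape.py diff boils down to a dispatch on the type of `values`: several labels go into a `DataFrame`, a single label into a `Series`, and either is then unstacked on `columns`. The sketch below restates that logic with public pandas API only. `pivot_sketch` is a hypothetical name, and the condition follows the reviewer's suggestion `is_list_like(values) and not isinstance(values, tuple)` rather than reproducing any released version of the code.

```python
import pandas as pd
from pandas.api.types import is_list_like


def pivot_sketch(df, index, columns, values):
    """Hypothetical helper mirroring the values-given branch of the diff."""
    # Row labels: the frame's own index if `index` is None, else that column.
    row_labels = df.index if index is None else df[index]
    # Two-level row index: (row label, future column label).
    mi = pd.MultiIndex.from_arrays([row_labels, df[columns]])
    if is_list_like(values) and not isinstance(values, tuple):
        # Several value columns: keep them side by side in a DataFrame.
        indexed = pd.DataFrame(df[values].values, index=mi, columns=values)
    else:
        # One column label (a tuple counts as one label): a Series suffices.
        indexed = pd.Series(df[values].values, index=mi)
    # Move the `columns` level from the row index to the column axis.
    return indexed.unstack(columns)


df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

print(pivot_sketch(df, 'foo', 'bar', ['baz', 'zoo']))
```

Excluding tuples keeps `('a', 'b')` usable as a single column label, which is what a tuple means on MultiIndex columns; a list, array, `Series` or `Index` is read as several labels.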
File pandas/tests/reshape/test_pivot.py (27 changes: 27 additions & 0 deletions)
@@ -371,6 +371,33 @@ def test_pivot_periods(self):
pv = df.pivot(index='p1', columns='p2', values='data1')
tm.assert_frame_equal(pv, expected)

@pytest.mark.parametrize('values', [
['bar', 'baz'], np.array(['bar', 'baz']),
pd.Series(['bar', 'baz']), pd.Index(['bar', 'baz'])
])
def test_pivot_with_list_like_values(self, values):
# issue #17160
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
> Contributor: add the issue number as a comment

'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

result = df.pivot(index='zoo', columns='foo', values=values)
> Contributor: add a test with values as a tuple (it should fail)


data = [[np.nan, 'A', np.nan, 4],
[np.nan, 'C', np.nan, 6],
[np.nan, 'B', np.nan, 5],
['A', np.nan, 1, np.nan],
['B', np.nan, 2, np.nan],
['C', np.nan, 3, np.nan]]
> Member: Where are all the NaNs coming from? They don't seem to be there in the docstring example.
> Member: Ah, I see, because here 'zoo' is used for index and not 'foo'. Can you add a test for that as well?

index = Index(data=['q', 't', 'w', 'x', 'y', 'z'], name='zoo')
columns = MultiIndex(levels=[['bar', 'baz'], ['one', 'two']],
> Contributor: add a test with a MultiIndex and pass values as a tuple
> Contributor Author: Are you looking for something like the following? (Please correct me if I am wrong.)
>
>       bar          baz
>     first second first second
>   0   one      A     1      x
>   1   one      B     2      y
>   2   one      C     3      z
>   3   two      A     4      q
>   4   two      B     5      w
>   5   two      C     6      t
>
> then pivot it:
>
>   df.pivot(index=('bar', 'first'), columns=('bar', 'second'), values=('baz', 'first'))
>
> so the output would be:
>
>        A   B   C
>   one  1   2   3
>   two  4   5   6
>
> Contributor Author: ping @jreback
> Contributor: yep

labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[None, 'foo'])
expected = DataFrame(data=data, index=index,
columns=columns, dtype='object')
tm.assert_frame_equal(result, expected)

def test_margins(self):
def _check_output(result, values_col, index=['A', 'B'],
columns=['C'],
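The review thread asks for a test where `values` is a tuple and expects it to fail, and the commit history includes "xfail multiindex test". A small sketch of why a tuple is handled differently from a list, assuming a recent pandas release: on flat columns a tuple is looked up as one (missing) column label, so `pivot` raises `KeyError`; on two-level columns, like the frame proposed in the thread, a tuple is how a single column is addressed. The sketch only demonstrates the column lookup and makes no claim about whether pivoting by tuple labels on MultiIndex columns works.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# On flat columns a tuple is one label, not a list of labels: there is no
# column named ('bar', 'baz'), so the lookup inside pivot raises KeyError.
try:
    df.pivot(index='zoo', columns='foo', values=('bar', 'baz'))
    tuple_raised = False
except KeyError:
    tuple_raised = True
print('tuple values raised KeyError:', tuple_raised)

# On two-level columns (the layout proposed in the review thread), a tuple
# is exactly how a single column is addressed.
mi_df = df.copy()
mi_df.columns = pd.MultiIndex.from_tuples(
    [('bar', 'first'), ('bar', 'second'), ('baz', 'first'), ('baz', 'second')])
single = mi_df[('baz', 'first')]  # one column, returned as a Series
print(single.tolist())
```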