-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: DataFrame.pivot accepts a list of values #18636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: DataFrame.pivot accepts a list of values #18636
Conversation
pandas/tests/reshape/test_pivot.py
Outdated
'baz': [1, 2, 3, 4, 5, 6], | ||
'zoo': ['x', 'y', 'z', 'q', 'w', 't']}) | ||
|
||
results = df.pivot(index='zoo', columns='foo', values=['bar', 'baz']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use result=
pandas/tests/reshape/test_pivot.py
Outdated
|
||
results = df.pivot(index='zoo', columns='foo', values=['bar', 'baz']) | ||
|
||
data = [[None, 'A', None, 4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use np.nan rather than None
pandas/tests/reshape/test_pivot.py
Outdated
@@ -353,6 +353,29 @@ def test_pivot_periods(self): | |||
pv = df.pivot(index='p1', columns='p2', values='data1') | |||
tm.assert_frame_equal(pv, expected) | |||
|
|||
def test_pivot_with_multi_values(self): | |||
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number as a comment
pandas/tests/reshape/test_pivot.py
Outdated
@@ -353,6 +353,29 @@ def test_pivot_periods(self): | |||
pv = df.pivot(index='p1', columns='p2', values='data1') | |||
tm.assert_frame_equal(pv, expected) | |||
|
|||
def test_pivot_with_multi_values(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say with list_like_values rather than multi_values
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -129,6 +124,10 @@ Other API Changes | |||
- :func:`DataFrame.from_items` provides a more informative error message when passed scalar values (:issue:`17312`) | |||
- When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`) | |||
- Building from source now explicity requires ``setuptools`` in ``setup.py`` (:issue:`18113`) | |||
- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like you picked up some other commits here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those are merged on master now, so I think merging them as is won't harm (or duplicate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it will do harm or not, but can you still fix this? (it makes reviewing harder as there are unrelated changes)
In principle
git fetch upstream
git merge upstream/master
should do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged
pandas/tests/reshape/test_pivot.py
Outdated
[None, 'C', None, 6], | ||
[None, 'B', None, 5], | ||
['A', None, 1, None], | ||
['B', None, 2, None], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parmaterize this with values a list, tuple, np.array, pd.Series, pd.Index (should all act the same)
pandas/core/reshape/reshape.py
Outdated
index = self.index | ||
index = self.index if index is None else self[index] | ||
index = MultiIndex.from_arrays([index, self[columns]]) | ||
if isinstance(values, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_list_like
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here on what is going on
Hello @ibrahimsharaf! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on March 20, 2018 at 15:31 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #18636 +/- ##
==========================================
- Coverage 91.59% 91.55% -0.04%
==========================================
Files 155 155
Lines 51255 51255
==========================================
- Hits 46948 46929 -19
- Misses 4307 4326 +19
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18636 +/- ##
==========================================
- Coverage 91.8% 91.77% -0.03%
==========================================
Files 152 152
Lines 49215 49217 +2
==========================================
- Hits 45181 45171 -10
- Misses 4034 4046 +12
Continue to review full report at Codecov.
|
@jreback should the documentation page be modified at this PR? |
Yes, you need to update the docstring / tutorial docs in this PR. Can you show an actual example of what this PR can do? (eg show the result of the df in the test) As the test example result looks a bit strange from a quick look (all the NaNs) |
@jreback can you help me find the reason of test failing (Travis log)? it happens when I pass the values columns as a tuple in those lines (test, def) I tried adding tuple type in this condition (core/frame.py), it passed this test, but failed many others. |
The reason of the error is that a tuple |
So what you are saying is that dealing with a tuple of values should be split from the |
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -139,6 +139,8 @@ Other Enhancements | |||
- :func:`read_excel()` has gained the ``nrows`` parameter (:issue:`16645`) | |||
- :func:``DataFrame.to_json`` and ``Series.to_json`` now accept an ``index`` argument which allows the user to exclude the index from the JSON output (:issue:`17394`) | |||
- ``IntervalIndex.to_tuples()`` has gained the ``na_tuple`` parameter to control whether NA is returned as a tuple of NA, or NA itself (:issue:`18756`) | |||
- :func:`DataFrame.pivot` now accepts a list of values (:issue:`17160`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add for the values=
kwargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I get you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now accepts a list for the values=
kwarg.
pandas/core/frame.py
Outdated
@@ -4390,6 +4391,19 @@ def pivot(self, index=None, columns=None, values=None): | |||
one 1 2 3 | |||
two 4 5 6 | |||
|
|||
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you show an example using a single value first
pandas/core/reshape/reshape.py
Outdated
index = self.index | ||
index = self.index if index is None else self[index] | ||
index = MultiIndex.from_arrays([index, self[columns]]) | ||
if is_list_like(values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need:
is_list_like(values) and not isinstance(values, tuple)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you need to check if its a tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm trying to figure out why the MultiIndex with a tuple of values test is giving me an error, would be great if you give me a hand!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/tests/reshape/test_pivot.py
Outdated
'baz': [1, 2, 3, 4, 5, 6], | ||
'zoo': ['x', 'y', 'z', 'q', 'w', 't']}) | ||
|
||
result_list = df.pivot(index='zoo', columns='foo', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls parametrize on these types, somethign like
@pytest.mark.parametrize("constructor", [pd.Series, pd.idnex, np.array, list]))
['B', np.nan, 2, np.nan], | ||
['C', np.nan, 3, np.nan]] | ||
index = Index(data=['q', 't', 'w', 'x', 'y', 'z'], name='zoo') | ||
columns = MultiIndex(levels=[['bar', 'baz'], ['one', 'two']], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test with a MultiIndex and pass values as a tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you looking for something like the following? (please correct me if I am wrong)
bar baz
first second first second
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
then pivot it:
df.pivot(index=('bar', 'first'), columns=('bar', 'second'), values=('baz', 'first'))
so the output would be:
A B C
one 1 2 3
two 4 5 6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @jreback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to merge master and push your changes.
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -139,6 +139,8 @@ Other Enhancements | |||
- :func:`read_excel()` has gained the ``nrows`` parameter (:issue:`16645`) | |||
- :func:``DataFrame.to_json`` and ``Series.to_json`` now accept an ``index`` argument which allows the user to exclude the index from the JSON output (:issue:`17394`) | |||
- ``IntervalIndex.to_tuples()`` has gained the ``na_tuple`` parameter to control whether NA is returned as a tuple of NA, or NA itself (:issue:`18756`) | |||
- :func:`DataFrame.pivot` now accepts a list of values (:issue:`17160`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now accepts a list for the values=
kwarg.
pandas/core/frame.py
Outdated
values : string or object, optional | ||
Column name to use for populating new frame's values. If not | ||
values : string, object or a list of the previous, optional | ||
Column name(s) to use for populating new frame's values. If not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add (0.23.0) after 'list of the previous'
pandas/core/frame.py
Outdated
one 1 2 3 x y z | ||
two 4 5 6 q w t | ||
|
||
>>> df.pivot(index='zoo', columns='foo', values=['bar', 'baz']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't show the same example, rather show one with values = a single value
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -240,4 +240,4 @@ With conda, use | |||
|
|||
Note that the inconsistency in the return value for all-*NA* series is still | |||
there for pandas 0.20.3 and earlier. Avoiding pandas 0.21 will only help with | |||
the empty case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you revert this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
pandas/core/frame.py
Outdated
@@ -4355,8 +4355,8 @@ def pivot(self, index=None, columns=None, values=None): | |||
existing index. | |||
columns : string or object | |||
Column name to use to make new frame's columns | |||
values : string or object, optional | |||
Column name to use for populating new frame's values. If not | |||
values : string, object or (0.23.0) a list of the previous, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the version after the word list
pandas/core/frame.py
Outdated
@@ -4401,6 +4402,15 @@ def pivot(self, index=None, columns=None, values=None): | |||
one 1 2 3 | |||
two 4 5 6 | |||
|
|||
>>> df.pivot(index='foo', columns='bar', values=['baz']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh we already have this example, I c, ok you can remove this one (with ['bar']
)
pandas/core/reshape/reshape.py
Outdated
index = self.index | ||
index = self.index if index is None else self[index] | ||
index = MultiIndex.from_arrays([index, self[columns]]) | ||
if is_list_like(values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you need to check if its a tuple?
'baz': [1, 2, 3, 4, 5, 6], | ||
'zoo': ['x', 'y', 'z', 'q', 'w', 't']}) | ||
|
||
result = df.pivot(index='zoo', columns='foo', values=values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test with values as a tuple (it should fail)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments. looks good otherwise. ping on green.
return indexed.unstack(columns) | ||
index = MultiIndex.from_arrays([index, self[columns]]) | ||
|
||
if is_list_like(values) and not isinstance(values, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here on why excluding tuples
pandas/tests/reshape/test_pivot.py
Outdated
'zoo': ['x', 'y', 'z', 'q', 'w', 't']}) | ||
with pytest.raises(KeyError): | ||
# tuple is seen as a single column name | ||
result = df.pivot(index='zoo', columns='foo', values=('bar', 'baz')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don;t need the result = here (its a lint error)
Hi @jreback, thanks for reviewing. We still have a KeyError exception with the MultiIndex failing test, how do you think it should be fixed? |
can u paste what case is failing |
#19966 for the isssue causing that test to fail. We can see how hard that is to fix. I'd say just xfail the test for now, rather than trying to work around it. |
pandas/core/frame.py
Outdated
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo']) | ||
A B C A B C | ||
one 1 2 3 x y z | ||
two 4 5 6 q w t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This output doesn't seem correct. Should there be a mult-indexed columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. Fixed.
[np.nan, 'B', np.nan, 5], | ||
['A', np.nan, 1, np.nan], | ||
['B', np.nan, 2, np.nan], | ||
['C', np.nan, 3, np.nan]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are all the NaNs coming from? They don't seem to be there in the docstring example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, because here 'zoo' is used for index and not 'foo'.
Can you add a test for that as well?
Shall you guys take a look? @jreback @jorisvandenbossche @TomAugspurger |
lgtm @ibrahimsharaf @TomAugspurger @jorisvandenbossche might have some comments |
pandas/core/frame.py
Outdated
Column to use to make new frame's columns. | ||
values : string or object, optional | ||
Column to use for populating new frame's values. If not | ||
Column name to use to make new frame's columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably due to merging master, but you can you undo this change? (there have been changes to this docstring on master, and you have accidentally reverted some of those changes)
pandas/core/frame.py
Outdated
Column to use for populating new frame's values. If not | ||
Column name to use to make new frame's columns | ||
values : string, object or a list (0.23.0) of the previous, optional | ||
Column name(s) to use for populating new frame's values. If not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Column name(s)
-> Column(s)
to be consistent with above
pandas/core/frame.py
Outdated
values : string or object, optional | ||
Column to use for populating new frame's values. If not | ||
Column name to use to make new frame's columns | ||
values : string, object or a list (0.23.0) of the previous, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of mentioning the (0.23.0)
here, can you instead add a
.. versionchanged:: 0.23.0
Also accept list of column names.
to the parameter description?
pandas/core/frame.py
Outdated
@@ -5011,8 +5012,14 @@ def pivot(self, index=None, columns=None, values=None): | |||
one 1 2 3 | |||
two 4 5 6 | |||
|
|||
A ValueError is raised if there are any duplicates. | |||
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo']) | |||
baz zoo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alignment seems a bit off here. I think the 'baz' should align with the 'A' below. Best to check how it is in the console output.
pandas/core/frame.py
Outdated
Notice that the first two rows are the same for our `index` | ||
and `columns` arguments. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also undo this removal of blank lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All fixed now @jorisvandenbossche.
Can you @jreback @jorisvandenbossche restart Travis build? |
ping @jorisvandenbossche |
lgtm. @jorisvandenbossche merge when ok. |
@ibrahimsharaf Thanks a lot! |
DataFrame.pivot
accepts a list of column names as values #17160git diff upstream/master -u -- "*.py" | flake8 --diff