Skip to content

ENH: DataFrame.pivot accepts a list of values #18636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Mar 26, 2018

Conversation

ibrahimsharaf
Copy link
Contributor

@ibrahimsharaf ibrahimsharaf commented Dec 4, 2017

@ibrahimsharaf ibrahimsharaf changed the title add pivot with multi-values DataFrame.pivot accepts a list of values Dec 4, 2017
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

results = df.pivot(index='zoo', columns='foo', values=['bar', 'baz'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result=


results = df.pivot(index='zoo', columns='foo', values=['bar', 'baz'])

data = [[None, 'A', None, 4],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use np.nan rather than None

@@ -353,6 +353,29 @@ def test_pivot_periods(self):
pv = df.pivot(index='p1', columns='p2', values='data1')
tm.assert_frame_equal(pv, expected)

def test_pivot_with_multi_values(self):
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment

@@ -353,6 +353,29 @@ def test_pivot_periods(self):
pv = df.pivot(index='p1', columns='p2', values='data1')
tm.assert_frame_equal(pv, expected)

def test_pivot_with_multi_values(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say with list_like_values rather than multi_values

@@ -129,6 +124,10 @@ Other API Changes
- :func:`DataFrame.from_items` provides a more informative error message when passed scalar values (:issue:`17312`)
- When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`)
- Building from source now explicity requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you picked up some other commits here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those are merged on master now, so I think merging them as is won't harm (or duplicate).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if it will do harm or not, but can you still fix this? (it makes reviewing harder as there are unrelated changes)
In principle

git fetch upstream
git merge upstream/master

should do it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged

[None, 'C', None, 6],
[None, 'B', None, 5],
['A', None, 1, None],
['B', None, 2, None],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parmaterize this with values a list, tuple, np.array, pd.Series, pd.Index (should all act the same)

index = self.index
index = self.index if index is None else self[index]
index = MultiIndex.from_arrays([index, self[columns]])
if isinstance(values, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_list_like here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment here on what is going on

@jreback jreback added Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 5, 2017
@pep8speaks
Copy link

pep8speaks commented Dec 5, 2017

Hello @ibrahimsharaf! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 20, 2018 at 15:31 Hours UTC

@codecov
Copy link

codecov bot commented Dec 5, 2017

Codecov Report

Merging #18636 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18636      +/-   ##
==========================================
- Coverage   91.59%   91.55%   -0.04%     
==========================================
  Files         155      155              
  Lines       51255    51255              
==========================================
- Hits        46948    46929      -19     
- Misses       4307     4326      +19
Flag Coverage Δ
#multiple 89.42% <100%> (-0.02%) ⬇️
#single 40.67% <0%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/reshape/reshape.py 100% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.44% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
pandas/util/testing.py 82.01% <0%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a764663...5f94728. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 5, 2017

Codecov Report

Merging #18636 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18636      +/-   ##
==========================================
- Coverage    91.8%   91.77%   -0.03%     
==========================================
  Files         152      152              
  Lines       49215    49217       +2     
==========================================
- Hits        45181    45171      -10     
- Misses       4034     4046      +12
Flag Coverage Δ
#multiple 90.16% <100%> (-0.03%) ⬇️
#single 41.84% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.18% <ø> (ø) ⬆️
pandas/core/reshape/reshape.py 100% <100%> (ø) ⬆️
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01882ba...e293741. Read the comment docs.

@ibrahimsharaf
Copy link
Contributor Author

@jreback should the documentation page be modified at this PR?

@jorisvandenbossche
Copy link
Member

@jreback should the documentation page be modified at this PR?

Yes, you need to update the docstring / tutorial docs in this PR.

Can you show an actual example of what this PR can do? (eg show the result of the df in the test) As the test example result looks a bit strange from a quick look (all the NaNs)

@ibrahimsharaf
Copy link
Contributor Author

ibrahimsharaf commented Dec 8, 2017

@jreback can you help me find the reason of test failing (Travis log)? it happens when I pass the values columns as a tuple in those lines (test, def)

I tried adding tuple type in this condition (core/frame.py), it passed this test, but failed many others.

@jorisvandenbossche
Copy link
Member

The reason of the error is that a tuple ('bar', 'baz') is seen as a single column name (for a MultiIndex), which is the correct behaviour IMO. So you shouldn't try to interpret a list the same as a tuple.

@ibrahimsharaf
Copy link
Contributor Author

ibrahimsharaf commented Dec 8, 2017

So what you are saying is that dealing with a tuple of values should be split from the is_list_like in pivot function?

@@ -139,6 +139,8 @@ Other Enhancements
- :func:`read_excel()` has gained the ``nrows`` parameter (:issue:`16645`)
- :func:``DataFrame.to_json`` and ``Series.to_json`` now accept an ``index`` argument which allows the user to exclude the index from the JSON output (:issue:`17394`)
- ``IntervalIndex.to_tuples()`` has gained the ``na_tuple`` parameter to control whether NA is returned as a tuple of NA, or NA itself (:issue:`18756`)
- :func:`DataFrame.pivot` now accepts a list of values (:issue:`17160`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add for the values= kwargs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I get you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now accepts a list for the values= kwarg.

@@ -4390,6 +4391,19 @@ def pivot(self, index=None, columns=None, values=None):
one 1 2 3
two 4 5 6

>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show an example using a single value first

index = self.index
index = self.index if index is None else self[index]
index = MultiIndex.from_arrays([index, self[columns]])
if is_list_like(values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need:

is_list_like(values) and not isinstance(values, tuple)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you need to check if its a tuple?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm trying to figure out why the MultiIndex with a tuple of values test is giving me an error, would be great if you give me a hand!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

result_list = df.pivot(index='zoo', columns='foo',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls parametrize on these types, somethign like

@pytest.mark.parametrize("constructor", [pd.Series, pd.idnex, np.array, list]))

['B', np.nan, 2, np.nan],
['C', np.nan, 3, np.nan]]
index = Index(data=['q', 't', 'w', 'x', 'y', 'z'], name='zoo')
columns = MultiIndex(levels=[['bar', 'baz'], ['one', 'two']],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test with a MultiIndex and pass values as a tuple

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you looking for something like the following? (please correct me if I am wrong)

    bar          baz       
  first second first second
0   one      A     1      x
1   one      B     2      y
2   one      C     3      z
3   two      A     4      q
4   two      B     5      w
5   two      C     6      t 

then pivot it:

df.pivot(index=('bar', 'first'), columns=('bar', 'second'), values=('baz', 'first'))

so the output would be:

     A   B   C
one  1   2   3
two  4   5   6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @jreback

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to merge master and push your changes.

@@ -139,6 +139,8 @@ Other Enhancements
- :func:`read_excel()` has gained the ``nrows`` parameter (:issue:`16645`)
- :func:``DataFrame.to_json`` and ``Series.to_json`` now accept an ``index`` argument which allows the user to exclude the index from the JSON output (:issue:`17394`)
- ``IntervalIndex.to_tuples()`` has gained the ``na_tuple`` parameter to control whether NA is returned as a tuple of NA, or NA itself (:issue:`18756`)
- :func:`DataFrame.pivot` now accepts a list of values (:issue:`17160`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now accepts a list for the values= kwarg.

values : string or object, optional
Column name to use for populating new frame's values. If not
values : string, object or a list of the previous, optional
Column name(s) to use for populating new frame's values. If not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add (0.23.0) after 'list of the previous'

one 1 2 3 x y z
two 4 5 6 q w t

>>> df.pivot(index='zoo', columns='foo', values=['bar', 'baz'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't show the same example, rather show one with values = a single value

@@ -240,4 +240,4 @@ With conda, use

Note that the inconsistency in the return value for all-*NA* series is still
there for pandas 0.20.3 and earlier. Avoiding pandas 0.21 will only help with
the empty case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you revert this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -4355,8 +4355,8 @@ def pivot(self, index=None, columns=None, values=None):
existing index.
columns : string or object
Column name to use to make new frame's columns
values : string or object, optional
Column name to use for populating new frame's values. If not
values : string, object or (0.23.0) a list of the previous, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the version after the word list

@@ -4401,6 +4402,15 @@ def pivot(self, index=None, columns=None, values=None):
one 1 2 3
two 4 5 6

>>> df.pivot(index='foo', columns='bar', values=['baz'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh we already have this example, I c, ok you can remove this one (with ['bar'])

index = self.index
index = self.index if index is None else self[index]
index = MultiIndex.from_arrays([index, self[columns]])
if is_list_like(values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you need to check if its a tuple?

'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

result = df.pivot(index='zoo', columns='foo', values=values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test with values as a tuple (it should fail)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments. looks good otherwise. ping on green.

return indexed.unstack(columns)
index = MultiIndex.from_arrays([index, self[columns]])

if is_list_like(values) and not isinstance(values, tuple):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here on why excluding tuples

'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
with pytest.raises(KeyError):
# tuple is seen as a single column name
result = df.pivot(index='zoo', columns='foo', values=('bar', 'baz'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don;t need the result = here (its a lint error)

@jreback jreback added this to the 0.23.0 milestone Mar 1, 2018
@ibrahimsharaf
Copy link
Contributor Author

ibrahimsharaf commented Mar 2, 2018

Hi @jreback, thanks for reviewing. We still have a KeyError exception with the MultiIndex failing test, how do you think it should be fixed?

@jreback
Copy link
Contributor

jreback commented Mar 2, 2018

can u paste what case is failing

@ibrahimsharaf
Copy link
Contributor Author

@TomAugspurger
Copy link
Contributor

#19966 for the isssue causing that test to fail.

We can see how hard that is to fix. I'd say just xfail the test for now, rather than trying to work around it.

>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
A B C A B C
one 1 2 3 x y z
two 4 5 6 q w t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output doesn't seem correct. Should there be a mult-indexed columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. Fixed.

[np.nan, 'B', np.nan, 5],
['A', np.nan, 1, np.nan],
['B', np.nan, 2, np.nan],
['C', np.nan, 3, np.nan]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are all the NaNs coming from? They don't seem to be there in the docstring example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, because here 'zoo' is used for index and not 'foo'.
Can you add a test for that as well?

@ibrahimsharaf
Copy link
Contributor Author

Shall you guys take a look? @jreback @jorisvandenbossche @TomAugspurger

@jreback
Copy link
Contributor

jreback commented Mar 20, 2018

lgtm @ibrahimsharaf

@TomAugspurger @jorisvandenbossche might have some comments

Column to use to make new frame's columns.
values : string or object, optional
Column to use for populating new frame's values. If not
Column name to use to make new frame's columns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably due to merging master, but you can you undo this change? (there have been changes to this docstring on master, and you have accidentally reverted some of those changes)

Column to use for populating new frame's values. If not
Column name to use to make new frame's columns
values : string, object or a list (0.23.0) of the previous, optional
Column name(s) to use for populating new frame's values. If not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column name(s) -> Column(s) to be consistent with above

values : string or object, optional
Column to use for populating new frame's values. If not
Column name to use to make new frame's columns
values : string, object or a list (0.23.0) of the previous, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of mentioning the (0.23.0) here, can you instead add a

.. versionchanged:: 0.23.0
    Also accept list of column names.

to the parameter description?

@@ -5011,8 +5012,14 @@ def pivot(self, index=None, columns=None, values=None):
one 1 2 3
two 4 5 6

A ValueError is raised if there are any duplicates.
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
baz zoo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alignment seems a bit off here. I think the 'baz' should align with the 'A' below. Best to check how it is in the console output.

Notice that the first two rows are the same for our `index`
and `columns` arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also undo this removal of blank lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fixed now @jorisvandenbossche.

@ibrahimsharaf
Copy link
Contributor Author

Can you @jreback @jorisvandenbossche restart Travis build?

@ibrahimsharaf
Copy link
Contributor Author

ping @jorisvandenbossche

@jreback
Copy link
Contributor

jreback commented Mar 25, 2018

lgtm. @jorisvandenbossche merge when ok.

@jorisvandenbossche jorisvandenbossche changed the title DataFrame.pivot accepts a list of values ENH: DataFrame.pivot accepts a list of values Mar 26, 2018
@jorisvandenbossche jorisvandenbossche merged commit 6c0c277 into pandas-dev:master Mar 26, 2018
@jorisvandenbossche
Copy link
Member

@ibrahimsharaf Thanks a lot!

@ibrahimsharaf ibrahimsharaf deleted the pivot_multi branch March 26, 2018 08:14
javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants