ENH: DataFrame.pivot accepts a list of values #18636

Merged 28 commits on Mar 26, 2018. The diff below shows changes from 2 commits.

Commits
b74ee0f
add pivot with multi-values
ibrahimsharaf Dec 4, 2017
a36f9e0
update whatsnew
ibrahimsharaf Dec 4, 2017
5f94728
fix review comments
ibrahimsharaf Dec 5, 2017
3008d8e
PEP8 fixes
ibrahimsharaf Dec 5, 2017
b3ea1c2
merge master
ibrahimsharaf Dec 8, 2017
539ffdc
merge master
ibrahimsharaf Dec 8, 2017
d176585
Merge branch 'master' into pivot_multi
ibrahimsharaf Dec 16, 2017
6646798
remove tuple from test
ibrahimsharaf Dec 16, 2017
ea77a97
update pivot docstring
ibrahimsharaf Dec 16, 2017
1d6bf58
remove unused import
ibrahimsharaf Dec 16, 2017
c000811
Merge branch 'master' of https://github.com/pandas-dev/pandas into pi…
ibrahimsharaf Dec 19, 2017
c750807
Merge branch 'master' into pivot_multi
ibrahimsharaf Jan 2, 2018
d3a7bec
Merge branch 'master' into pivot_multi
ibrahimsharaf Jan 2, 2018
c50b2dd
Merge branch 'pivot_multi' of https://github.com/ibrahimsharaf/pandas…
ibrahimsharaf Jan 2, 2018
df2f0b0
Push requested changes
ibrahimsharaf Jan 2, 2018
bb85875
Merge remote-tracking branch 'upstream/master' into pivot_multi
ibrahimsharaf Jan 4, 2018
8f8b45f
Revert whatsnew v0.22.0
ibrahimsharaf Jan 4, 2018
99abef4
Add two more tests
ibrahimsharaf Jan 4, 2018
41ad9c0
PEP8
ibrahimsharaf Jan 4, 2018
2f5d6f7
Merge remote-tracking branch 'upstream/master' into ibrahimsharaf-piv…
TomAugspurger Feb 27, 2018
516690c
Use pytest raises instead of xfail
ibrahimsharaf Mar 1, 2018
e30fd1c
Remove unnecessary code
ibrahimsharaf Mar 1, 2018
eb9d85f
Fix review comments
ibrahimsharaf Mar 2, 2018
786e5f7
Merge and resolve
ibrahimsharaf Mar 18, 2018
8ea45f8
xfail multiindex test
ibrahimsharaf Mar 18, 2018
3825c9a
Add additional test
ibrahimsharaf Mar 18, 2018
8e54fc9
Merge remote-tracking branch 'upstream/master' into pivot_multi
ibrahimsharaf Mar 20, 2018
e293741
Review changes
ibrahimsharaf Mar 20, 2018
3 changes: 1 addition & 2 deletions doc/source/enhancingperf.rst
@@ -94,8 +94,7 @@ hence we'll concentrate our efforts cythonizing these two functions.
Plain cython
~~~~~~~~~~~~

-First we're going to need to import the cython magic function to ipython (for
-cython versions < 0.21 you can use ``%load_ext cythonmagic``):
+First we're going to need to import the cython magic function to ipython:

.. ipython:: python
:okwarning:
2 changes: 1 addition & 1 deletion doc/source/install.rst
@@ -228,7 +228,7 @@ Optional Dependencies
~~~~~~~~~~~~~~~~~~~~~

* `Cython <http://www.cython.org>`__: Only necessary to build development
-  version. Version 0.23 or higher.
+  version. Version 0.24 or higher.
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions, Version 0.14.0 or higher
* `xarray <http://xarray.pydata.org>`__: pandas like handling for > 2 dims, needed for converting Panels to xarray objects. Version 0.7.0 or higher is recommended.
* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage. Version 3.0.0 or higher required, Version 3.2.1 or higher highly recommended.
11 changes: 5 additions & 6 deletions doc/source/whatsnew/v0.22.0.txt
@@ -77,16 +77,13 @@ Other Enhancements
- :func:`Series.fillna` now accepts a Series or a dict as a ``value`` for a categorical dtype (:issue:`17033`)
- :func:`pandas.read_clipboard` updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python3 and multiple python-qt bindings (:issue:`17722`)
- Improved wording of ``ValueError`` raised in :func:`read_csv` when the ``usecols`` argument cannot match all columns. (:issue:`17301`)
- :func:`DataFrame.pivot` now accepts a list of values (:issue:`17160`).
Contributor
add for the values= kwargs

Contributor Author
I'm not sure I get you

Contributor
now accepts a list for the values= kwarg.
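
For reference, a minimal sketch of what the whatsnew entry describes, assuming a pandas version that includes this change. The frame and its column names are made up for illustration and are not taken from the PR.

```python
import pandas as pd

# Illustrative data only; these column names are assumptions for the sketch.
df = pd.DataFrame({
    "foo": ["one", "one", "two", "two"],
    "bar": ["A", "B", "A", "B"],
    "baz": [1, 2, 3, 4],
    "zoo": [10, 20, 30, 40],
})

# Passing a list to values= pivots several value columns in one call.
wide = df.pivot(index="foo", columns="bar", values=["baz", "zoo"])

# The columns now have two levels: the value column name, then the "bar" label.
print(wide.columns.tolist())
print(wide.loc["two", ("zoo", "A")])
```

With a single string for ``values=`` the result would have only one column level, so the list form is what introduces the extra level.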


.. _whatsnew_0220.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
- :func:`pandas.DataFrame.merge` no longer casts a ``float`` column to ``object`` when merging on ``int`` and ``float`` columns (:issue:`16572`)
- The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN``, which impacts methods that mask with NA, such as ``UInt64Index.where()`` (:issue:`18398`)

.. _whatsnew_0220.api_breaking.deps:

Dependencies have increased minimum versions
@@ -104,8 +101,6 @@ If installed, we now require:
+-----------------+-----------------+----------+




.. _whatsnew_0220.api:

Other API Changes
@@ -129,6 +124,10 @@ Other API Changes
- :func:`DataFrame.from_items` provides a more informative error message when passed scalar values (:issue:`17312`)
- When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`)
- Building from source now explicity requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
Contributor
looks like you picked up some other commits here.

Contributor Author
those are merged on master now, so I think merging them as is won't harm (or duplicate).

Member
I am not sure whether it will do harm, but can you still fix this? It makes reviewing harder, as there are unrelated changes. In principle, this should do it:

    git fetch upstream
    git merge upstream/master

Contributor Author
Merged

- :func:`pandas.DataFrame.merge` no longer casts a ``float`` column to ``object`` when merging on ``int`` and ``float`` columns (:issue:`16572`)
- The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN``, which impacts methods that mask with NA, such as ``UInt64Index.where()`` (:issue:`18398`)
- Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)

.. _whatsnew_0220.deprecations:

16 changes: 9 additions & 7 deletions pandas/core/reshape/reshape.py
@@ -374,15 +374,17 @@ def pivot(self, index=None, columns=None, values=None):
         cols = [columns] if index is None else [index, columns]
         append = index is None
         indexed = self.set_index(cols, append=append)
-        return indexed.unstack(columns)
     else:
-        if index is None:
-            index = self.index
+        index = self.index if index is None else self[index]
+        index = MultiIndex.from_arrays([index, self[columns]])
+        if isinstance(values, list):
Contributor
use is_list_like here

Contributor
add a comment here on what is going on

+            indexed = DataFrame(self[values].values,
+                                index=index,
+                                columns=values)
         else:
-            index = self[index]
-        indexed = Series(self[values].values,
-                         index=MultiIndex.from_arrays([index, self[columns]]))
-        return indexed.unstack(columns)
+            indexed = Series(self[values].values,
+                             index=index)
+    return indexed.unstack(columns)
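
The review suggestion to use ``is_list_like`` rather than ``isinstance(values, list)`` can be illustrated with a small sketch. It uses the public ``pandas.api.types.is_list_like`` helper; the sample inputs are made up and this is not the code that was merged.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

# Hypothetical inputs a caller might pass as values=.
candidates = {
    "list": ["bar", "baz"],
    "tuple": ("bar", "baz"),
    "ndarray": np.array(["bar", "baz"]),
    "Index": pd.Index(["bar", "baz"]),
    "str": "bar",
}

# is_list_like accepts any of the containers but rejects a plain string,
# whereas isinstance(value, list) would only accept the first entry.
results = {name: is_list_like(value) for name, value in candidates.items()}
print(results)
```

Note that a tuple also counts as list-like here, which matters when a tuple is meant as a single column key on a MultiIndex.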


def pivot_simple(index, columns, values):
23 changes: 23 additions & 0 deletions pandas/tests/reshape/test_pivot.py
@@ -353,6 +353,29 @@ def test_pivot_periods(self):
pv = df.pivot(index='p1', columns='p2', values='data1')
tm.assert_frame_equal(pv, expected)

def test_pivot_with_multi_values(self):
Contributor
say with list_like_values rather than multi_values

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
Contributor
add the issue number as a comment

'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

results = df.pivot(index='zoo', columns='foo', values=['bar', 'baz'])
Contributor
use result=


data = [[None, 'A', None, 4],
Contributor
use np.nan rather than None

[None, 'C', None, 6],
[None, 'B', None, 5],
['A', None, 1, None],
['B', None, 2, None],
Contributor
parametrize this with values as a list, tuple, np.array, pd.Series, pd.Index (should all act the same)

['C', None, 3, None]]
index = Index(data=['q', 't', 'w', 'x', 'y', 'z'], name='zoo')
columns = MultiIndex(levels=[['bar', 'baz'], ['one', 'two']],
Contributor
add a test with a MultiIndex and pass values as a tuple

Contributor Author
Are you looking for something like the following? (please correct me if I am wrong)

    bar          baz       
  first second first second
0   one      A     1      x
1   one      B     2      y
2   one      C     3      z
3   two      A     4      q
4   two      B     5      w
5   two      C     6      t 

then pivot it:

df.pivot(index=('bar', 'first'), columns=('bar', 'second'), values=('baz', 'first'))

so the output would be:

     A   B   C
one  1   2   3
two  4   5   6

Contributor Author
ping @jreback

Contributor
yep

labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[None, 'foo'])
expected = DataFrame(data=data, index=index,
columns=columns, dtype='object')

tm.assert_frame_equal(results, expected)
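
The request to cover several list-like containers could look roughly like the sketch below. It uses a plain loop so that it runs standalone; in the test suite the same cases would more likely go through ``pytest.mark.parametrize``. The data is made up, a pandas version that accepts list-like ``values`` is assumed, and a tuple is left out because it can also denote a single column key.

```python
import numpy as np
import pandas as pd

# Illustrative frame; names are assumptions, not the fixtures used in the PR.
df = pd.DataFrame({
    "foo": ["one", "one", "two", "two"],
    "bar": ["A", "B", "A", "B"],
    "baz": [1, 2, 3, 4],
    "zoo": [10, 20, 30, 40],
})

# Different list-like containers holding the same two column names.
list_likes = [
    ["baz", "zoo"],
    np.array(["baz", "zoo"]),
    pd.Series(["baz", "zoo"]),
    pd.Index(["baz", "zoo"]),
]

# Use the plain-list result as the reference and compare the others to it.
expected = df.pivot(index="foo", columns="bar", values=list_likes[0])
for values in list_likes[1:]:
    result = df.pivot(index="foo", columns="bar", values=values)
    pd.testing.assert_frame_equal(result, expected)
```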

def test_margins(self):
def _check_output(result, values_col, index=['A', 'B'],
columns=['C'],