ENH: Allow multi values for index and columns in df.pivot #30928

Merged
merged 24 commits into from
Feb 9, 2020
7e461a1 remove \n from docstring (charlesdong1991, Dec 3, 2018)
1314059 fix conflicts (charlesdong1991, Jan 19, 2019)
8bcb313 Merge remote-tracking branch 'upstream/master' (charlesdong1991, Jul 30, 2019)
7bd097d Merge remote-tracking branch 'upstream/master' into multi_pivot (charlesdong1991, Jan 11, 2020)
f7b37d6 Allow multi values for index and columns in pivot (charlesdong1991, Jan 11, 2020)
180194d better naming (charlesdong1991, Jan 11, 2020)
98e9730 fix linting (charlesdong1991, Jan 11, 2020)
8739957 Add whatsnew note (charlesdong1991, Jan 11, 2020)
8a2af11 Update docstring (charlesdong1991, Jan 11, 2020)
4cdd17a fix pep8 (charlesdong1991, Jan 11, 2020)
ced4ec7 fix pep (charlesdong1991, Jan 11, 2020)
7f0ea51 fix linting (charlesdong1991, Jan 12, 2020)
abe991b rebase and fix conflict (charlesdong1991, Jan 18, 2020)
d312f56 Merge remote-tracking branch 'upstream/master' into multi_pivot (charlesdong1991, Jan 18, 2020)
3ed3a60 move from 1.0.0 to 1.1 (charlesdong1991, Jan 18, 2020)
cc1826e update doc and add example (charlesdong1991, Jan 18, 2020)
311670b fix pep8 (charlesdong1991, Jan 18, 2020)
61d32b0 Merge remote-tracking branch 'upstream/master' into multi_pivot (charlesdong1991, Jan 21, 2020)
3aa04fa code change on reviews (charlesdong1991, Jan 21, 2020)
9f5f170 rename (charlesdong1991, Jan 21, 2020)
ce0e85d merge master and resolve conflict (charlesdong1991, Jan 21, 2020)
20a54ba Merge remote-tracking branch 'upstream/master' into multi_pivot (charlesdong1991, Jan 23, 2020)
ab20be2 Merge remote-tracking branch 'upstream/master' into multi_pivot (charlesdong1991, Jan 25, 2020)
c70230b Merge remote-tracking branch 'upstream/master' into multi_pivot (charlesdong1991, Feb 2, 2020)
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -192,8 +192,10 @@ Reshaping
- Bug in :meth:`DataFrame.pivot_table` when ``margin`` is ``True`` and only ``column`` is defined (:issue:`31016`)
- Fix incorrect error message in :meth:`DataFrame.pivot` when ``columns`` is set to ``None``. (:issue:`30924`)
- Bug in :func:`crosstab` when inputs are two Series and have tuple names, the output will keep dummy MultiIndex as columns. (:issue:`18321`)
- :meth:`DataFrame.pivot` can now take lists for ``index`` and ``columns`` arguments (:issue:`21425`)
- Bug in :func:`concat` where the resulting indices are not copied when ``copy=True`` (:issue:`29879`)


Sparse
^^^^^^

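As a rough illustration of the whatsnew entry for :issue:`21425`, the sketch below passes a list to `index` and leaves `values` unset, so every remaining column is pivoted. It assumes pandas 1.1.0 or later; the frame is the one used in this PR's tests, and the variable name `wide` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "lev1": [1, 1, 1, 1, 2, 2, 2, 2],
        "lev2": [1, 1, 2, 2, 1, 1, 2, 2],
        "lev3": [1, 2, 1, 2, 1, 2, 1, 2],
        "lev4": [1, 2, 3, 4, 5, 6, 7, 8],
        "values": [0, 1, 2, 3, 4, 5, 6, 7],
    }
)

# A list for `index` gives a two-level row index. With `values` omitted,
# the remaining columns (lev4, values) all end up under a column MultiIndex.
wide = df.pivot(index=["lev1", "lev2"], columns="lev3")
```

This mirrors the fourth parametrized case of `test_pivot_list_like_index` in the new test file.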
44 changes: 42 additions & 2 deletions pandas/core/frame.py
@@ -5896,11 +5896,19 @@ def groupby(

         Parameters
         ----------%s
-        index : str or object, optional
+        index : str or object or a list of str, optional
             Column to use to make new frame's index. If None, uses
             existing index.
-        columns : str or object
+
+            .. versionchanged:: 1.1.0
+               Also accept list of index names.
+
+        columns : str or object or a list of str
             Column to use to make new frame's columns.
+
+            .. versionchanged:: 1.1.0
+               Also accept list of columns names.
+
         values : str, object or a list of the previous, optional
             Column(s) to use for populating new frame's values. If not
             specified, all remaining columns will be used and the result will
@@ -5967,6 +5975,38 @@ def groupby(
         one   1  2  3   x  y  z
         two   4  5  6   q  w  t

+        You could also assign a list of column names or a list of index names.
+
+        >>> df = pd.DataFrame({
+        ...     "lev1": [1, 1, 1, 2, 2, 2],
+        ...     "lev2": [1, 1, 2, 1, 1, 2],
+        ...     "lev3": [1, 2, 1, 2, 1, 2],
+        ...     "lev4": [1, 2, 3, 4, 5, 6],
+        ...     "values": [0, 1, 2, 3, 4, 5]})
+        >>> df
+            lev1 lev2 lev3 lev4 values
+        0   1    1    1    1    0
+        1   1    1    2    2    1
+        2   1    2    1    3    2
+        3   2    1    2    4    3
+        4   2    1    1    5    4
+        5   2    2    2    6    5
+
+        >>> df.pivot(index="lev1", columns=["lev2", "lev3"],values="values")
+        lev2    1         2
+        lev3    1    2    1    2
+        lev1
+        1     0.0  1.0  2.0  NaN
+        2     4.0  3.0  NaN  5.0
+
+        >>> df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values")
+              lev3    1    2
+        lev1  lev2
+           1     1  0.0  1.0
+                 2  2.0  NaN
+           2     1  4.0  3.0
+                 2  NaN  5.0
+
         A ValueError is raised if there are any duplicates.

         >>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
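The duplicate check mentioned in the docstring should also apply when lists are passed: two rows that share the same combined index/column key cannot be reshaped. A minimal sketch, assuming pandas 1.1.0 or later; the frame `dup` and the flag `raised` are made up for illustration and are not part of the PR.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share the key (lev1=1, lev2=1, lev3=1).
dup = pd.DataFrame(
    {
        "lev1": [1, 1, 2],
        "lev2": [1, 1, 2],
        "lev3": [1, 1, 2],
        "values": [10, 20, 30],
    }
)

try:
    dup.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
    raised = False
except ValueError:
    # pandas refuses to reshape when a (row key, column key) pair repeats
    raised = True
```

If aggregation over the duplicates is wanted instead of an error, `DataFrame.pivot_table` is the usual alternative.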
22 changes: 18 additions & 4 deletions pandas/core/reshape/pivot.py
@@ -425,17 +425,31 @@ def _convert_by(by):
 def pivot(data: "DataFrame", index=None, columns=None, values=None) -> "DataFrame":
     if columns is None:
         raise TypeError("pivot() missing 1 required argument: 'columns'")
+    columns = columns if is_list_like(columns) else [columns]

Review comment (Contributor): As a follow-on, can you try to type things in the signature?

     if values is None:
-        cols = [columns] if index is None else [index, columns]
+        cols: List[str] = []

Review comment (Member): I think you want List[Label] (import from pandas._typing) to allow any column name, unless this really does have to be string-only.

Reply (Contributor): I don't think this will matter until we type the signature, so OK with handling this as a follow-on.

+        if index is None:
+            pass
+        elif is_list_like(index):
+            cols = list(index)
+        else:
+            cols = [index]
+        cols.extend(columns)
+
         append = index is None
         indexed = data.set_index(cols, append=append)
     else:
         if index is None:
-            index = data.index
+            index = [Series(data.index, name=data.index.name)]
+        elif is_list_like(index):
+            index = [data[idx] for idx in index]
         else:
-            index = data[index]
-        index = MultiIndex.from_arrays([index, data[columns]])
+            index = [data[index]]
+
+        data_columns = [data[col] for col in columns]
+        index.extend(data_columns)
+        index = MultiIndex.from_arrays(index)

         if is_list_like(values) and not isinstance(values, tuple):
             # Exclude tuple because it is seen as a single column name
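To make the `values is not None` branch easier to follow, here is a standalone sketch of the same idea: normalize `index` and `columns` to lists, build one `MultiIndex` from all key columns, then move the column keys onto the column axis. The names `as_label_list` and `pivot_sketch` are hypothetical, and the final `unstack` step is an assumption, since that part of `pivot` is outside this hunk. The `Hashable` annotation reflects the review suggestion that labels need not be strings.

```python
from typing import Hashable, List

import pandas as pd
from pandas.api.types import is_list_like


def as_label_list(labels) -> List[Hashable]:
    """Wrap a single label in a list; copy list-likes into a list."""
    return list(labels) if is_list_like(labels) else [labels]


def pivot_sketch(data: pd.DataFrame, index, columns, values) -> pd.DataFrame:
    """Illustrative stand-in for the branch above, not the pandas code."""
    index_keys = as_label_list(index)
    column_keys = as_label_list(columns)
    # One array per key: row keys first, column keys last
    arrays = [data[key] for key in index_keys + column_keys]
    keyed = pd.Series(
        data[values].to_numpy(), index=pd.MultiIndex.from_arrays(arrays)
    )
    # Assumption: the column keys reach the column axis via unstack
    return keyed.unstack(column_keys)


df = pd.DataFrame(
    {
        "lev1": [1, 1, 1, 2, 2, 2],
        "lev2": [1, 1, 2, 1, 1, 2],
        "lev3": [1, 2, 1, 2, 1, 2],
        "values": [0, 1, 2, 3, 4, 5],
    }
)
result = pivot_sketch(df, "lev1", ["lev2", "lev3"], "values")
```

Like the PR's code, the helper treats a tuple as list-like, so a single tuple-valued label would have to be wrapped in a list by the caller.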
192 changes: 192 additions & 0 deletions pandas/tests/reshape/test_pivot_multilevel.py
@@ -0,0 +1,192 @@
+import numpy as np
+import pytest
+
+import pandas as pd
+from pandas import Index, MultiIndex
+import pandas._testing as tm
+
+
+@pytest.mark.parametrize(
+    "input_index, input_columns, input_values, "
+    "expected_values, expected_columns, expected_index",
+    [
+        (
+            ["lev4"],
+            "lev3",
+            "values",
+            [
+                [0.0, np.nan],
+                [np.nan, 1.0],
+                [2.0, np.nan],
+                [np.nan, 3.0],
+                [4.0, np.nan],
+                [np.nan, 5.0],
+                [6.0, np.nan],
+                [np.nan, 7.0],
+            ],
+            Index([1, 2], name="lev3"),
+            Index([1, 2, 3, 4, 5, 6, 7, 8], name="lev4"),
+        ),
+        (
+            ["lev4"],
+            "lev3",
+            None,
+            [
+                [1.0, np.nan, 1.0, np.nan, 0.0, np.nan],
+                [np.nan, 1.0, np.nan, 1.0, np.nan, 1.0],
+                [1.0, np.nan, 2.0, np.nan, 2.0, np.nan],
+                [np.nan, 1.0, np.nan, 2.0, np.nan, 3.0],
+                [2.0, np.nan, 1.0, np.nan, 4.0, np.nan],
+                [np.nan, 2.0, np.nan, 1.0, np.nan, 5.0],
+                [2.0, np.nan, 2.0, np.nan, 6.0, np.nan],
+                [np.nan, 2.0, np.nan, 2.0, np.nan, 7.0],
+            ],
+            MultiIndex.from_tuples(
+                [
+                    ("lev1", 1),
+                    ("lev1", 2),
+                    ("lev2", 1),
+                    ("lev2", 2),
+                    ("values", 1),
+                    ("values", 2),
+                ],
+                names=[None, "lev3"],
+            ),
+            Index([1, 2, 3, 4, 5, 6, 7, 8], name="lev4"),
+        ),
+        (
+            ["lev1", "lev2"],
+            "lev3",
+            "values",
+            [[0, 1], [2, 3], [4, 5], [6, 7]],
+            Index([1, 2], name="lev3"),
+            MultiIndex.from_tuples(
+                [(1, 1), (1, 2), (2, 1), (2, 2)], names=["lev1", "lev2"]
+            ),
+        ),
+        (
+            ["lev1", "lev2"],
+            "lev3",
+            None,
+            [[1, 2, 0, 1], [3, 4, 2, 3], [5, 6, 4, 5], [7, 8, 6, 7]],
+            MultiIndex.from_tuples(
+                [("lev4", 1), ("lev4", 2), ("values", 1), ("values", 2)],
+                names=[None, "lev3"],
+            ),
+            MultiIndex.from_tuples(
+                [(1, 1), (1, 2), (2, 1), (2, 2)], names=["lev1", "lev2"]
+            ),
+        ),
+    ],
+)
+def test_pivot_list_like_index(
+    input_index,
+    input_columns,
+    input_values,
+    expected_values,
+    expected_columns,
+    expected_index,
+):
+    # GH 21425, test when index is given a list
+    df = pd.DataFrame(
+        {
+            "lev1": [1, 1, 1, 1, 2, 2, 2, 2],
+            "lev2": [1, 1, 2, 2, 1, 1, 2, 2],
+            "lev3": [1, 2, 1, 2, 1, 2, 1, 2],
+            "lev4": [1, 2, 3, 4, 5, 6, 7, 8],
+            "values": [0, 1, 2, 3, 4, 5, 6, 7],
+        }
+    )
+
+    result = df.pivot(index=input_index, columns=input_columns, values=input_values)
+    expected = pd.DataFrame(
+        expected_values, columns=expected_columns, index=expected_index
+    )
+    tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "input_index, input_columns, input_values, "
+    "expected_values, expected_columns, expected_index",
+    [
+        (
+            "lev4",
+            ["lev3"],
+            "values",
+            [
+                [0.0, np.nan],
+                [np.nan, 1.0],
+                [2.0, np.nan],
+                [np.nan, 3.0],
+                [4.0, np.nan],
+                [np.nan, 5.0],
+                [6.0, np.nan],
+                [np.nan, 7.0],
+            ],
+            Index([1, 2], name="lev3"),
+            Index([1, 2, 3, 4, 5, 6, 7, 8], name="lev4"),
+        ),
+        (
+            ["lev1", "lev2"],
+            ["lev3"],
+            "values",
+            [[0, 1], [2, 3], [4, 5], [6, 7]],
+            Index([1, 2], name="lev3"),
+            MultiIndex.from_tuples(
+                [(1, 1), (1, 2), (2, 1), (2, 2)], names=["lev1", "lev2"]
+            ),
+        ),
+        (
+            ["lev1"],
+            ["lev2", "lev3"],
+            "values",
+            [[0, 1, 2, 3], [4, 5, 6, 7]],
+            MultiIndex.from_tuples(
+                [(1, 1), (1, 2), (2, 1), (2, 2)], names=["lev2", "lev3"]
+            ),
+            Index([1, 2], name="lev1"),
+        ),
+        (
+            ["lev1", "lev2"],
+            ["lev3", "lev4"],
+            "values",
+            [
+                [0.0, 1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
+                [np.nan, np.nan, 2.0, 3.0, np.nan, np.nan, np.nan, np.nan],
+                [np.nan, np.nan, np.nan, np.nan, 4.0, 5.0, np.nan, np.nan],
+                [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 6.0, 7.0],
+            ],
+            MultiIndex.from_tuples(
+                [(1, 1), (2, 2), (1, 3), (2, 4), (1, 5), (2, 6), (1, 7), (2, 8)],
+                names=["lev3", "lev4"],
+            ),
+            MultiIndex.from_tuples(
+                [(1, 1), (1, 2), (2, 1), (2, 2)], names=["lev1", "lev2"]
+            ),
+        ),
+    ],
+)
+def test_pivot_list_like_columns(
+    input_index,
+    input_columns,
+    input_values,
+    expected_values,
+    expected_columns,
+    expected_index,
+):
+    # GH 21425, test when columns is given a list
+    df = pd.DataFrame(
+        {
+            "lev1": [1, 1, 1, 1, 2, 2, 2, 2],
+            "lev2": [1, 1, 2, 2, 1, 1, 2, 2],
+            "lev3": [1, 2, 1, 2, 1, 2, 1, 2],
+            "lev4": [1, 2, 3, 4, 5, 6, 7, 8],
+            "values": [0, 1, 2, 3, 4, 5, 6, 7],
+        }
+    )
+
+    result = df.pivot(index=input_index, columns=input_columns, values=input_values)
+    expected = pd.DataFrame(
+        expected_values, columns=expected_columns, index=expected_index
+    )
+    tm.assert_frame_equal(result, expected)
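
The first case of each parametrized test above expects the same frame whether a key is given as a scalar or as a one-element list. A quick way to check that property directly, assuming pandas 1.1.0 or later (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "lev3": [1, 2, 1, 2, 1, 2, 1, 2],
        "lev4": [1, 2, 3, 4, 5, 6, 7, 8],
        "values": [0, 1, 2, 3, 4, 5, 6, 7],
    }
)

scalar_keys = df.pivot(index="lev4", columns="lev3", values="values")
list_keys = df.pivot(index=["lev4"], columns=["lev3"], values="values")

# Raises AssertionError if the two frames differ in values, labels or dtypes
pd.testing.assert_frame_equal(scalar_keys, list_keys)
```

A one-element list therefore yields a flat `Index` rather than a one-level `MultiIndex`, which is what the expected values in the tests encode.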