Skip to content

Commit 20d5b1c

Browse files
authored
BUG: pivot_table with overlapping values (#61293)
* modified: pandas/tests/reshape/test_pivot_multilevel.py - Added two tests, :func:`test_pivot_table_values_in_columns` and :func:`test_pivot_table_values_in_index`, to ensure that the `values` param is still used when the argument is shared between the `columns` and `values` params, and `index` and `values` params. * modified: pandas/core/reshape/pivot.py - Added condition to :func:`__internal_pivot_table` to aggregate `values` explicitly if `values` were passed, otherwise aggregate all remaining columns. This allows the tests :func:`test_pivot_table_values_in_columns` and :func:`test_pivot_table_values_in_index` in test_pivot_multilevel.py to pass. * modified: pandas/tests/reshape/test_pivot.py - Added test :func:`test_pivot_table_values_as_two_params` to test that the updates in pivot.py result in expected results, satisfying GH issue #57876. * modified: pandas/tests/reshape/test_pivot.py - Added GH issue comment to test :func:`test_pivot_table_values_as_two_params`. * modified: pandas/tests/reshape/test_pivot_multilevel.py - Combined tests :func:`test_pivot_table_values_in_columns` and :func:`test_pivot_table_values_in_index` into a single parametrized test, :func:`test_pivot_table_multiindex_values_as_two_params` to reduce duplicate setup code. * modified: pandas/tests/reshape/test_pivot_multilevel.py - Added GH issue #61292 as comment to test :func:`test_pivot_table_multiindex_values_as_two_params`. * modified: pandas/core/reshape/pivot.py - Simplified proposed logic in :func:`__internal_pivot_table`. * modified: pandas/core/reshape/pivot.py - Added GH issue numbers to new logic in :func:`__internal_pivot_table`. * modified: pandas/core/reshape/pivot.py - Added ignore-comment to silence mypy error in :func:`__internal_pivot_table`. - Added TODO-comment stating that the :meth:`DataFrameGroupBy.__getitem__` should be overloaded to match the pandas-stubs type declarations, informing mypy that the type is correct given `values` is a list. * modified: doc/source/whatsnew/v3.0.0.rst - Added pivot_table bug to Bugs/Reshaping section referencing issues #57876 and #61292. * modified: pandas/core/reshape/pivot.py - Moved and simplified mypy comment per feedback. * modified: pandas/core/reshape/pivot.py - Removed comment about explicit aggregation per feedback. * modified: pandas/tests/reshape/test_pivot.py - Removed param names and updated `argnames` arg per feedback in parametrized marker. * modified: pandas/tests/reshape/test_pivot.py - Removed param names in favor of implicit args per feedback. * modified: pandas/tests/reshape/test_pivot_multilevel.py - Removed param names and updated arg for `argnames` in parametrized marker per feedback. * modified: pandas/tests/reshape/test_pivot_multilevel.py - Reduced `expected` assignments from two to one per feedback. * modified: pandas/tests/reshape/test_pivot_multilevel.py - Removed param names in favor of implicit args per feedback. * modified: pandas/tests/reshape/test_pivot_multilevel.py - Moved e_data, e_index, and e_cols to parametrized marker instead of declaring inside the test :func:`test_pivot_table_multiindex_values_as_two_params`. * modified: pandas/tests/reshape/test_pivot.py - Moved `expected` setup to parametrized marker instead of in the test :meth:`TestPivotTable.test_pivot_table_values_as_two_params`. * modified: pandas/tests/reshape/test_pivot.py - Removed walrus operator declarations in parametrized marker for test :meth:TestPivotTable.test_pivot_table_values_as_two_params`. Appears related to this mypy issue -> python/mypy#17377. * modified: pandas/tests/reshape/test_pivot_multilevel.py - Removed walrus operator declarations as I'm sure mypy would raise an issue with it given that it did in test_pivot.py (see commit 52cf560).
1 parent 1ac8552 commit 20d5b1c

File tree

4 files changed

+95
-0
lines changed

4 files changed

+95
-0
lines changed

doc/source/whatsnew/v3.0.0.rst

+1
Original file line numberDiff line numberDiff line change
@@ -835,6 +835,7 @@ Reshaping
835835
- Bug in :meth:`DataFrame.unstack` producing incorrect results when ``sort=False`` (:issue:`54987`, :issue:`55516`)
836836
- Bug in :meth:`DataFrame.merge` when merging two :class:`DataFrame` on ``intc`` or ``uintc`` types on Windows (:issue:`60091`, :issue:`58713`)
837837
- Bug in :meth:`DataFrame.pivot_table` incorrectly subaggregating results when called without an ``index`` argument (:issue:`58722`)
838+
- Bug in :meth:`DataFrame.pivot_table` incorrectly ignoring the ``values`` argument when also supplied to the ``index`` or ``columns`` parameters (:issue:`57876`, :issue:`61292`)
838839
- Bug in :meth:`DataFrame.stack` with the new implementation where ``ValueError`` is raised when ``level=[]`` (:issue:`60740`)
839840
- Bug in :meth:`DataFrame.unstack` producing incorrect results when manipulating empty :class:`DataFrame` with an :class:`ExtentionDtype` (:issue:`59123`)
840841
- Bug in :meth:`concat` where concatenating DataFrame and Series with ``ignore_index = True`` drops the series name (:issue:`60723`, :issue:`56257`)

pandas/core/reshape/pivot.py

+5
Original file line numberDiff line numberDiff line change
@@ -336,6 +336,11 @@ def __internal_pivot_table(
336336
values = list(values)
337337

338338
grouped = data.groupby(keys, observed=observed, sort=sort, dropna=dropna)
339+
if values_passed:
340+
# GH#57876 and GH#61292
341+
# mypy is not aware `grouped[values]` will always be a DataFrameGroupBy
342+
grouped = grouped[values] # type: ignore[assignment]
343+
339344
agged = grouped.agg(aggfunc, **kwargs)
340345

341346
if dropna and isinstance(agged, ABCDataFrame) and len(agged.columns):

pandas/tests/reshape/test_pivot.py

+40
Original file line numberDiff line numberDiff line change
@@ -2554,6 +2554,46 @@ def test_pivot_table_index_and_column_keys_with_nan(self, dropna):
25542554

25552555
tm.assert_frame_equal(left=result, right=expected)
25562556

2557+
@pytest.mark.parametrize(
2558+
"index, columns, e_data, e_index, e_cols",
2559+
[
2560+
(
2561+
"Category",
2562+
"Value",
2563+
[
2564+
[1.0, np.nan, 1.0, np.nan],
2565+
[np.nan, 1.0, np.nan, 1.0],
2566+
],
2567+
Index(data=["A", "B"], name="Category"),
2568+
Index(data=[10, 20, 40, 50], name="Value"),
2569+
),
2570+
(
2571+
"Value",
2572+
"Category",
2573+
[
2574+
[1.0, np.nan],
2575+
[np.nan, 1.0],
2576+
[1.0, np.nan],
2577+
[np.nan, 1.0],
2578+
],
2579+
Index(data=[10, 20, 40, 50], name="Value"),
2580+
Index(data=["A", "B"], name="Category"),
2581+
),
2582+
],
2583+
ids=["values-and-columns", "values-and-index"],
2584+
)
2585+
def test_pivot_table_values_as_two_params(
2586+
self, index, columns, e_data, e_index, e_cols
2587+
):
2588+
# GH#57876
2589+
data = {"Category": ["A", "B", "A", "B"], "Value": [10, 20, 40, 50]}
2590+
df = DataFrame(data)
2591+
result = df.pivot_table(
2592+
index=index, columns=columns, values="Value", aggfunc="count"
2593+
)
2594+
expected = DataFrame(data=e_data, index=e_index, columns=e_cols)
2595+
tm.assert_frame_equal(result, expected)
2596+
25572597

25582598
class TestPivot:
25592599
def test_pivot(self):

pandas/tests/reshape/test_pivot_multilevel.py

+49
Original file line numberDiff line numberDiff line change
@@ -250,3 +250,52 @@ def test_pivot_df_multiindex_index_none():
250250
columns=Index(["label1", "label2"], name="label"),
251251
)
252252
tm.assert_frame_equal(result, expected)
253+
254+
255+
@pytest.mark.parametrize(
256+
"index, columns, e_data, e_index, e_cols",
257+
[
258+
(
259+
"index",
260+
["col", "value"],
261+
[
262+
[50.0, np.nan, 100.0, np.nan],
263+
[np.nan, 100.0, np.nan, 200.0],
264+
],
265+
Index(data=["A", "B"], name="index"),
266+
MultiIndex.from_arrays(
267+
arrays=[[1, 1, 2, 2], [50, 100, 100, 200]], names=["col", "value"]
268+
),
269+
),
270+
(
271+
["index", "value"],
272+
"col",
273+
[
274+
[50.0, np.nan],
275+
[np.nan, 100.0],
276+
[100.0, np.nan],
277+
[np.nan, 200.0],
278+
],
279+
MultiIndex.from_arrays(
280+
arrays=[["A", "A", "B", "B"], [50, 100, 100, 200]],
281+
names=["index", "value"],
282+
),
283+
Index(data=[1, 2], name="col"),
284+
),
285+
],
286+
ids=["values-and-columns", "values-and-index"],
287+
)
288+
def test_pivot_table_multiindex_values_as_two_params(
289+
index, columns, e_data, e_index, e_cols
290+
):
291+
# GH#61292
292+
data = [
293+
["A", 1, 50, -1],
294+
["B", 1, 100, -2],
295+
["A", 2, 100, -2],
296+
["B", 2, 200, -4],
297+
]
298+
df = pd.DataFrame(data=data, columns=["index", "col", "value", "extra"])
299+
result = df.pivot_table(values="value", index=index, columns=columns)
300+
expected = pd.DataFrame(data=e_data, index=e_index, columns=e_cols)
301+
tm.assert_frame_equal(result, expected)

0 commit comments

Comments
 (0)