BUG: df.pivot_table fails when margins is True and only columns is defined #31088


Merged: 19 commits into pandas-dev:master on Jan 20, 2020

Conversation

@charlesdong1991 (Member) commented Jan 16, 2020

),
),
(
["A", "B", "C"],
Member:

Does this test anything different from the test preceding it? If not, I think it can be removed.

Member Author:

Thanks for your quick response! @WillAyd

Hmm, this one shows a different number of levels for columns; is that a bit redundant? If so, I will remove it.

Member:

I think it can be removed; the one preceding it already tests a MultiIndex.

Member Author:

OK, sure!

table_pieces.append(Series(margin[key], index=[all_key]))
# GH31016 this is to calculate the margin for each group, and assign
# the corresponding key as index
transformed_piece = DataFrame(piece.apply(aggfunc)).T
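
For context, a minimal sketch reproducing GH31016, the failure this new code path addresses. It is illustrative only: the column names are borrowed from the asv benchmark discussed later in the thread, and the data values are made up.

import pandas as pd

df = pd.DataFrame(
    {
        "key2": ["one", "one", "two", "two"],
        "key3": ["a", "b", "a", "b"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# Before this PR, pivoting with only (multi-level) columns and margins=True
# raised a KeyError (the traceback later in the thread shows KeyError: 'one'
# for similar data). With the fix it should return a table that also
# includes an "All" margin column.
result = df.pivot_table(columns=["key2", "key3"], margins=True)
print(result)
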
Member:

Does this have any performance impact?

Member Author (@charlesdong1991, Jan 16, 2020):

It might; I was thinking about that as well.

How should I measure it? Do you mean create a giant mock dataset and time it? Any suggestions? I would like to test it out!

Member:

We have asvs in benchmarks/reshape.py that would be good to run here at least.

Member Author:

Thanks! @WillAyd I am not very familiar with the asv bench, but I see some tests whose names start with time_pivot_*; shall I add a new test in there?

Member:

Run the existing ones first:

cd asv_bench
asv continuous upstream/master HEAD -b reshape.PivotTable
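
As I understand asv, this command builds and benchmarks both upstream/master and the current HEAD, restricts the run to benchmarks whose names match the -b pattern (here reshape.PivotTable), and then reports whether any of them changed significantly between the two commits.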

Member Author:

[screenshot of asv output, 2020-01-16]

Just copied and pasted and tested a bit.

Member Author:

> Run the existing ones first:
>
> cd asv_bench
> asv continuous upstream/master HEAD -b reshape.PivotTable

Oops! Thanks for the tip!! I did not see it 😅 Will run.

Member Author:

Running asv on the current branch I get: BENCHMARKS NOT SIGNIFICANTLY CHANGED.

But I did find an issue during self-review; I will investigate and fix it tomorrow. Thanks for the help on asv @WillAyd

@WillAyd added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label on Jan 16, 2020
@charlesdong1991 (Member Author):

Hi @WillAyd, I think the PR is OK to review now. Please let me know how you plan to do the asv benchmark for this piece; shall I add a test in asv_bench for it? (As mentioned above, running asv locally on the current branch looks good.)
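
For illustration, the kind of case that could be added to the existing PivotTable class in asv_bench/benchmarks/reshape.py. The setup below is a simplified stand-in (the real class builds its own, larger self.df); the method name time_pivot_table_margins_only_column matches the one that appears in the asv output later in the thread.

import numpy as np
import pandas as pd


class PivotTable:
    def setup(self):
        # Simplified stand-in for the frame built by the real benchmark class.
        N = 100_000
        self.df = pd.DataFrame(
            {
                "key2": np.random.choice(["one", "two"], size=N),
                "key3": np.random.choice(["a", "b", "c"], size=N),
                "value1": np.random.randn(N),
                "value2": np.random.randn(N),
            }
        )

    def time_pivot_table_margins_only_column(self):
        # GH 31016: margins=True with only (multi-level) columns used to raise
        self.df.pivot_table(columns=["key2", "key3"], margins=True)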

@jreback added this to the 1.1 milestone on Jan 18, 2020
@jreback (Contributor) commented Jan 18, 2020:

Looks good, can you rebase? Adding an asv would be nice as well.

@pep8speaks commented Jan 18, 2020:

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-18 18:21:01 UTC

@charlesdong1991 (Member Author):

Hmm, running asv locally with the new test added, this manipulation does perform much slower than the other tests, and the second trial even failed. Any idea how to interpret why the second one failed? Is it because it is too slow and goes above some threshold? @jreback @WillAyd

I am pretty new to asv, and any feedback would be highly appreciated! Thanks!

[screenshot of asv output, 2020-01-18]

@WillAyd (Member) commented Jan 18, 2020:

@charlesdong1991 if you can post the actual shell output instead of a screenshot, that is typically better. That said, it doesn't look like things changed that much.

Not sure why the second run failed, but you can retry with the --show-stderr option.

@charlesdong1991 (Member Author) commented Jan 18, 2020:

[ 29.17%] ··· Running (reshape.PivotTable.time_pivot_table--)......
[ 50.00%] · For pandas commit c01d7142 <fix_issue_31016> (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.7-Cython-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 54.17%] ··· reshape.PivotTable.time_pivot_table                                                                        25.8±0.5ms
[ 58.33%] ··· reshape.PivotTable.time_pivot_table_agg                                                                      49.5±2ms
[ 62.50%] ··· reshape.PivotTable.time_pivot_table_categorical                                                              19.5±5ms
[ 66.67%] ··· reshape.PivotTable.time_pivot_table_categorical_observed                                                   11.5±0.7ms
[ 70.83%] ··· reshape.PivotTable.time_pivot_table_margins                                                                  94.7±6ms
[ 75.00%] ··· reshape.PivotTable.time_pivot_table_margins_only_column                                                      271±40ms
[ 75.00%] · For pandas commit d170cc05 <multi_pivot^2> (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.7-Cython-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking virtualenv-py3.7-Cython-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 79.17%] ··· reshape.PivotTable.time_pivot_table                                                                          26.3±2ms
[ 83.33%] ··· reshape.PivotTable.time_pivot_table_agg                                                                      52.3±2ms
[ 87.50%] ··· reshape.PivotTable.time_pivot_table_categorical                                                              20.7±3ms
[ 91.67%] ··· reshape.PivotTable.time_pivot_table_categorical_observed                                                   11.5±0.6ms
[ 95.83%] ··· reshape.PivotTable.time_pivot_table_margins                                                                  106±10ms
[100.00%] ··· reshape.PivotTable.time_pivot_table_margins_only_column                                                        failed

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Sorry, is this what you mean by posting the actual output? @WillAyd

Yeah, thanks mate! And it seems the first round is running the code in my current PR, and the second one is using the code on the master branch. The --show-stderr option does provide some insight: the reason is that this code does not work on current master, so it failed. See below for the result of a new run with this option.

[100.00%] ···· Traceback (most recent call last):
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/asv/benchmark.py", line 1434, in <module>
                   main()
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/asv/benchmark.py", line 1427, in main
                   commands[mode](args)
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/asv/benchmark.py", line 1166, in main_run
                   result = benchmark.do_run()
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/asv/benchmark.py", line 573, in do_run
                   return self.run(*self._current_params)
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/asv/benchmark.py", line 672, in run
                   min_run_count=self.min_run_count)
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/asv/benchmark.py", line 704, in benchmark_timing
                   timing = timer.timeit(number)
                 File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/timeit.py", line 176, in timeit
                   timing = self.inner(it, self.timer)
                 File "<timeit-src>", line 6, in inner
                 File "/Users/cw1921/pandas-dev/asv_bench/benchmarks/reshape.py", line 165, in time_pivot_table_margins_only_column
                   self.df.pivot_table(columns=["key2", "key3"], margins=True)
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/pandas/core/frame.py", line 6104, in pivot_table
                   observed=observed,
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/pandas/core/reshape/pivot.py", line 167, in pivot_table
                   fill_value=fill_value,
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/pandas/core/reshape/pivot.py", line 237, in _add_margins
                   margins_name,
                 File "/Users/cw1921/pyvenv/pandas-dev/lib/python3.7/site-packages/pandas/core/reshape/pivot.py", line 353, in _generate_marginal_results
                   table_pieces.append(Series(margin[key], index=[all_key]))
               KeyError: 'one'
               asv: benchmark failed (exit status 1)

So I think we are good! @WillAyd @jreback Not sure why the performance even gets slightly better 😅 but at least it is not getting worse, haha.

@charlesdong1991 (Member Author):

Gentle ping @jreback @WillAyd, any follow-up? ^^

@jreback (Contributor) commented Jan 20, 2020:

lgtm. @WillAyd

@WillAyd merged commit f1aaf62 into pandas-dev:master on Jan 20, 2020
@WillAyd (Member) commented Jan 20, 2020:

Thanks @charlesdong1991

# GH31016 this is to calculate the margin for each group, and assign
# the corresponding key as index
transformed_piece = DataFrame(piece.apply(aggfunc)).T
transformed_piece.index = Index([all_key], name=piece.index.name)
Member:

@charlesdong1991 what if piece.index is a MultiIndex? .name here will be None. Do we need to use .names somewhere?

Member Author (@charlesdong1991, Jul 2, 2021):

You are right, it will be None if it is a MultiIndex. I need to have a deeper look to see whether it is reasonable to use .names, because if we use it, that is a FrozenList IIRC, and we could not assign it to index.name. Also, if there is a MultiIndex here, then maybe we also need to use MultiIndex instead of Index.
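
A rough, hypothetical sketch of the idea being discussed, just to make the .name versus .names distinction concrete. This is not what the merged PR does; the helper name and the assumption that all_key is a tuple with one entry per level are illustrative.

from pandas import DataFrame, Index, MultiIndex


def _margin_index(piece: DataFrame, all_key) -> Index:
    # Hypothetical helper: build the one-row index for the margin piece.
    if isinstance(piece.index, MultiIndex):
        # A MultiIndex keeps its per-level names in .names (a FrozenList),
        # so a one-row MultiIndex would preserve them; this assumes all_key
        # is a tuple with one entry per level.
        return MultiIndex.from_tuples([all_key], names=piece.index.names)
    # Plain Index: a single .name is enough, as in the current code.
    return Index([all_key], name=piece.index.name)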

Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Projects: None yet
Development: Successfully merging this pull request may close: BUG: pivot_table with multi-index columns only and margins=True gives wrong output or fails
6 participants