-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: df.pivot_table fails when margin is True and only columns is defined #31088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
7e461a1
1314059
8bcb313
24c3ede
dea38f2
cd9e7ac
e5e912b
f01d61e
98472ba
e13c45b
8fbee87
fea8db8
258aacd
2fc6622
436b7a2
2145d13
d1f184d
fe9e8bc
c01d714
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -226,15 +226,7 @@ def _add_margins( | |
|
||
elif values: | ||
marginal_result_set = _generate_marginal_results( | ||
table, | ||
data, | ||
values, | ||
rows, | ||
cols, | ||
aggfunc, | ||
observed, | ||
grand_margin, | ||
margins_name, | ||
table, data, values, rows, cols, aggfunc, observed, margins_name, | ||
) | ||
if not isinstance(marginal_result_set, tuple): | ||
return marginal_result_set | ||
|
@@ -303,15 +295,7 @@ def _compute_grand_margin(data, values, aggfunc, margins_name: str = "All"): | |
|
||
|
||
def _generate_marginal_results( | ||
table, | ||
data, | ||
values, | ||
rows, | ||
cols, | ||
aggfunc, | ||
observed, | ||
grand_margin, | ||
margins_name: str = "All", | ||
table, data, values, rows, cols, aggfunc, observed, margins_name: str = "All", | ||
): | ||
if len(cols) > 0: | ||
# need to "interleave" the margins | ||
|
@@ -345,12 +329,22 @@ def _all_key(key): | |
table_pieces.append(piece) | ||
margin_keys.append(all_key) | ||
else: | ||
margin = grand_margin | ||
from pandas import DataFrame | ||
|
||
cat_axis = 0 | ||
for key, piece in table.groupby(level=0, axis=cat_axis, observed=observed): | ||
all_key = _all_key(key) | ||
if len(cols) > 1: | ||
all_key = _all_key(key) | ||
else: | ||
all_key = margins_name | ||
table_pieces.append(piece) | ||
table_pieces.append(Series(margin[key], index=[all_key])) | ||
# GH31016 this is to calculate margin for each group, and assign | ||
# corresponded key as index | ||
transformed_piece = DataFrame(piece.apply(aggfunc)).T | ||
transformed_piece.index = Index([all_key], name=piece.index.name) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @charlesdong1991 what if piece.index is a MultiIndex? .name here will be None. Do we need to use .names somewhere? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you are right, it will be |
||
|
||
# append piece for margin into table_piece | ||
table_pieces.append(transformed_piece) | ||
margin_keys.append(all_key) | ||
|
||
result = concat(table_pieces, axis=cat_axis) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have any performance impacts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be? i was thinking of it as well
how to measure it? do you mean create a giant mock dataset and time it? any suggestions? would like to test it out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have asvs in benchmarks/reshape.py that would be good to run here at least
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! @WillAyd i am not very familiar with asv bench, but i see some tests starting with
time_pivot****
, shall I add a new test in there?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run the existing ones first:
cd asv_bench asv continuous upstream/master HEAD -b reshape.PivotTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just copied and pasted and tested a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops! thanks for the tip!! did not see it 😅 will run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running asv on current i get:
BENCHMARKS NOT SIGNIFICANTLY CHANGED.
but i do find an issue with self-review, will investigate and fix tomorrow. thanks for the help on asv @WillAyd