BUG: pd.crosstab not working when margin and normalize are set together #27663
pandas/core/reshape/pivot.py
Outdated
index_margin = table.loc[margins_name, :].drop(margins_name)
# separate cases between multiindex and index
if isinstance(table_index, MultiIndex):
    index_margin = table.loc[margins_name, :].drop(margins_name, axis=1)
why are we not always passing axis= here? I don't like the need for this if/then
Yeah, I figured out a way to work around it that also avoids the MultiIndex problem, and the tests even seem to run faster ^^ @jreback
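The thread does not show the revised code, so the following is only a sketch of one way to drop the if/then: slice the margins off by position instead of by label. The helper name `split_margins` and the sample data are hypothetical, and the approach assumes the margins are the last row and last column of the table.

```python
import pandas as pd

# Hypothetical sample data, not taken from the PR.
a = pd.Series(["x", "x", "y", "y", "y"], name="a")
b = pd.Series(["u", "v", "u", "v", "v"], name="b")
c = pd.Series(["p", "q", "p", "q", "p"], name="c")


def split_margins(table):
    """Split a margins=True table into (core, row_margin, column_margin).

    Assumes the margins occupy the last row and the last column.
    """
    core = table.iloc[:-1, :-1]           # counts without any margins
    row_margin = table.iloc[-1, :-1]      # last row: per-column totals
    column_margin = table.iloc[:-1, -1]   # last column: per-row totals
    return core, row_margin, column_margin


flat = pd.crosstab(a, b, margins=True)        # plain Index on the rows
multi = pd.crosstab([a, c], b, margins=True)  # MultiIndex on the rows

# The same positional code handles both index types, with no isinstance check.
for table in (flat, multi):
    core, row_margin, column_margin = split_margins(table)
    assert (core.sum(axis=0) == row_margin).all()
    assert (core.sum(axis=1) == column_margin).all()
```

Positional slicing sidesteps the question of whether `table.loc[margins_name, :]` returns a Series or a DataFrame, which is what forced the `axis=` distinction in the outdated snippet.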
pandas/core/reshape/pivot.py
Outdated
index_margin = index_margin / index_margin.sum()
# index_margin is a dataframe, and use a hacky way: sum(axis=1)[0]
# to get the normalized result, and use sum() instead for series
if isinstance(index_margin, ABCDataFrame):
why do you need to distinguish MI here?
Yes, after I changed the code above, this issue is solved as well, and I removed this if/then. Thank you for pointing it out @jreback
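As an illustration of what the normalization step in the snippet computes, the sketch below divides the row margin by its own sum and compares the result with what `pd.crosstab` returns when `normalize="index"` and `margins=True` are passed together. The sample data is hypothetical, and the comparison reflects behavior in pandas releases that include a fix for this issue.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the PR.
a = pd.Series(["x", "x", "y", "y", "y"], name="a")
b = pd.Series(["u", "v", "u", "v", "v"], name="b")

# Raw counts with margins; the last row holds the per-column totals.
counts = pd.crosstab(a, b, margins=True)
index_margin = counts.iloc[-1, :-1]

# Same arithmetic as the snippet: each total divided by the grand total.
manual = index_margin / index_margin.sum()

# crosstab with both options set; the "All" row should match `manual`.
combined = pd.crosstab(a, b, normalize="index", margins=True)
assert np.allclose(combined.loc["All"].to_numpy(), manual.to_numpy())
```

When `index_margin` is a Series, a plain `sum()` is enough, which is why the DataFrame branch in the outdated snippet is no longer needed once the margin is always extracted as a Series.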
ok this looks good, can you merge master and fix the conflict. ping on green.
Thanks for your follow-up review @jreback! I merged master and resolved the conflict.
lgtm. but a comment
pandas/core/reshape/pivot.py
Outdated
# to keep index and columns names
table_index_names = table.index.names
table_columns_names = table.columns.names
# save the column and index margin
if margins is True, are we guaranteed that margins_name is the last row / column? can you add an assert for this, that the last row.name / col.name == margins_name
Yes, since this normalization goes one step further than the output of the pivot_table function, and in that function, if margins is set to True, a new column/index ('All' or 'New_Margin_Name') is added to the end. But you are right, it's better to add an assertion for this; will do later today! @jreback
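A minimal sketch of the kind of check being requested, written against the public `pd.crosstab` output rather than the internals of `pivot.py`. The helper `check_margins_last` and the sample data are hypothetical; the assertion actually added in the PR may be worded differently.

```python
import pandas as pd

# Hypothetical sample data, not taken from the PR.
a = pd.Series(["x", "x", "y", "y", "y"], name="a")
b = pd.Series(["u", "v", "u", "v", "v"], name="b")


def check_margins_last(table, margins_name="All"):
    """Assert that the margins label is the last row and the last column."""
    last_row = table.iloc[-1, :].name   # label of the final row
    last_col = table.iloc[:, -1].name   # label of the final column
    assert last_row == margins_name, f"unexpected last row: {last_row!r}"
    assert last_col == margins_name, f"unexpected last column: {last_col!r}"


default = pd.crosstab(a, b, margins=True)
custom = pd.crosstab(a, b, margins=True, margins_name="Total")

check_margins_last(default)                       # default label is "All"
check_margins_last(custom, margins_name="Total")  # custom label via margins_name
```

Stating the invariant explicitly documents the assumption that later positional slicing relies on, which is the point the reviewer makes about future readers.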
great, this just makes it clear to a future reader, otherwise lgtm. ping on green.
I have added the assertion, please feel free to take a look @jreback
Yeah, this time it passes all checks. @jreback
…alize are set together (#27783)
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff