BUG: Crosstab bug in #18321 #19326
@@ -455,17 +455,23 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,

     from pandas import DataFrame
     df = DataFrame(data, index=common_idx)
+    common_cols_idx = df.columns
+
     if values is None:
         df['__dummy__'] = 0
         kwargs = {'aggfunc': len, 'fill_value': 0}
     else:
         df['__dummy__'] = values
         kwargs = {'aggfunc': aggfunc}

-    table = df.pivot_table('__dummy__', index=rownames, columns=colnames,
+    table = df.pivot_table(['__dummy__'], index=rownames, columns=colnames,
Passing array so that
                            margins=margins, margins_name=margins_name,
                            dropna=dropna, **kwargs)

+    if not table.empty:
+        added_cols_idx = df.columns.difference(common_cols_idx).values[0]
Is this guaranteed to be len 1? use
Yes. Guaranteed length 1 since
+        table = table[added_cols_idx]
+
     # Post-process
     if normalize is not False:
         table = _normalize(table, normalize=normalize, margins=margins,
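The mechanism this hunk relies on can be tried in isolation. The following is a minimal sketch, not the code from the PR: the frame and its `row`/`col` column names are hypothetical stand-ins for what `crosstab` builds internally, and plain string names are used instead of tuple names to keep it short. It shows that the column difference holds exactly one label, and that passing a list as `values` to `pivot_table` keeps that label as an extra top column level, which then has to be selected away.

```python
import pandas as pd

# Hypothetical stand-in for the frame that crosstab builds internally.
df = pd.DataFrame({"row": [0, 1, 2], "col": [1, 2, 3]})
common_cols_idx = df.columns  # columns before the helper column is added

df["__dummy__"] = 0

# Only one column was added, so the difference holds a single label.
added = df.columns.difference(common_cols_idx)
added_col = added[0]

# Passing a list as `values` keeps the value name as the top column level.
table = df.pivot_table([added_col], index="row", columns="col",
                       aggfunc=len, fill_value=0)
print(table.columns.nlevels)

# Selecting the top-level label leaves only the "col" level.
counts = table[added_col]
print(list(counts.columns))
```

Selecting by the label taken from the column difference, rather than by a hard-coded string, appears to be the point of the change: the label is whatever pandas actually stored for the helper column.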

@@ -1628,7 +1628,8 @@ def test_crosstab_dup_index_names(self):
         pytest.raises(ValueError, pd.crosstab, s, s)

     @pytest.mark.parametrize("names", [['a', ('b', 'c')],
-                                       [('a', 'b'), 'c']])
+                                       [('a', 'b'), 'c'],
+                                       [('a', 'b'), ('c', 'd')]])
     def test_crosstab_tuple_name(self, names):
         s1 = pd.Series(range(3), name=names[0])
         s2 = pd.Series(range(1, 4), name=names[1])
@@ -1638,3 +1639,20 @@ def test_crosstab_tuple_name(self, names):

         result = pd.crosstab(s1, s2)
         tm.assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize("names", [['a', 'b'],
+                                       [('a', 'b'), 'c'],
+                                       [('a', 'b'), ('c', 'd')],
+                                       [(1, 2, 3), ('a', 'b', 'c')]])
+    def test_crosstab_cols_output(self, names):
+        s1 = pd.Series(range(3), name=names[0])
+        s2 = pd.Series(range(1, 4), name=names[1])
+        result = pd.crosstab(s1, s2)
+        result_col_list = list(result.columns)
I would like a more explicit test here.
Thanks. Will update.
These need full comparisons on the result; in other words, construct the expected frame and compare. No need to just compare the columns.
Noted. To be updated.
+        assert result_col_list == [1, 2, 3]
+
+        s1 = pd.Series(range(0), name=names[0])
+        s2 = pd.Series(range(1, 4), name=names[1])
+        result = pd.crosstab(s1, s2)
+        result_col_list = list(result.columns)
+        assert result_col_list == []
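A full-frame comparison of the kind requested in the review could look roughly like the sketch below. It is an illustration under assumptions, not the test that ended up in the PR: the expected frame is built by hand as an identity matrix of counts, which fits these inputs because each (row, column) pair occurs exactly once, and it assumes a pandas version that already includes this fix.

```python
import numpy as np
import pandas as pd

# Two Series whose names are both tuples, as in the added parametrization.
s1 = pd.Series(range(3), name=("a", "b"))
s2 = pd.Series(range(1, 4), name=("c", "d"))

# Hand-built expected frame: one count on the diagonal, zeros elsewhere,
# with the tuple names carried over to the index and the columns.
expected = pd.DataFrame(
    np.eye(3, dtype="int64"),
    index=pd.Index([0, 1, 2], name=("a", "b")),
    columns=pd.Index([1, 2, 3], name=("c", "d")),
)

result = pd.crosstab(s1, s2)

# Compares values, dtypes, index, columns and their names in one call.
pd.testing.assert_frame_equal(result, expected)
```

Comparing whole frames also covers the index and column names, which a check on `list(result.columns)` alone would miss.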
Can you make this more informative? From a user perspective, the bug was that you got an incorrectly named output index when crosstabbing two Series with tuple names.
Noted.