BUG: Crosstab bug in #18321 #19326

Closed
wants to merge 15 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -509,6 +509,7 @@ Reshaping
 - Bug in :func:`DataFrame.merge` in which merging using ``Index`` objects as vectors raised an Exception (:issue:`19038`)
 - Bug in :func:`DataFrame.stack`, :func:`DataFrame.unstack`, :func:`Series.unstack` which were not returning subclasses (:issue:`15563`)
 - Bug in timezone comparisons, manifesting as a conversion of the index to UTC in ``.concat()`` (:issue:`18523`)
+- Bug in :func:`crosstab` where the added column is removed incorrectly (:issue:`18321`)
Contributor

Can you make this more informative? From a user perspective, the bug was that you got an incorrectly named output index when crosstabbing two Series with tuple names.

Author

Noted.
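The user-facing symptom the reviewer describes can be checked with a minimal sketch like the one below, assuming a pandas version in which GH 18321 has been fixed. It crosstabs the three name pairs from this PR's parametrized test and reports the resulting index and column names, which should match the original Series names; on an affected version the both-tuples pair came back with a wrongly named axis. The helper `crosstab_axis_names` is hypothetical.

```python
import pandas as pd

# The three name combinations exercised by the parametrized test in this PR.
NAME_PAIRS = [
    ["a", ("b", "c")],
    [("a", "b"), "c"],
    [("a", "b"), ("c", "d")],  # both names are tuples: the GH 18321 case
]


def crosstab_axis_names(names):
    """Hypothetical helper: crosstab two named Series, return (index name, columns name)."""
    s1 = pd.Series(range(3), name=names[0])
    s2 = pd.Series(range(1, 4), name=names[1])
    result = pd.crosstab(s1, s2)
    return result.index.name, result.columns.name


for names in NAME_PAIRS:
    # With the fix in place, each pair of axis names should come back unchanged.
    print(names, "->", crosstab_axis_names(names))
```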

-

Numeric
7 changes: 6 additions & 1 deletion pandas/core/reshape/pivot.py
@@ -455,17 +455,22 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,

     from pandas import DataFrame
     df = DataFrame(data, index=common_idx)
+    original_columns_set = set(df.columns.values)
     if values is None:
         df['__dummy__'] = 0
         kwargs = {'aggfunc': len, 'fill_value': 0}
     else:
         df['__dummy__'] = values
         kwargs = {'aggfunc': aggfunc}
+    added_column = list(set(df.columns.values) - original_columns_set)[0]

-    table = df.pivot_table('__dummy__', index=rownames, columns=colnames,
+    table = df.pivot_table(['__dummy__'], index=rownames, columns=colnames,
Author

Passing a list (instead of a string) so that pivot_table won't try to drop the extra '__dummy__' column level itself.
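The effect of passing a list can be seen in a minimal sketch. The toy frame and its column names `r` and `c` are hypothetical stand-ins for the frame crosstab builds internally: with `values` given as a string, pivot_table removes the `'__dummy__'` level itself, while a one-element list keeps it as the top column level for the caller to select.

```python
import pandas as pd

# Hypothetical toy frame mimicking crosstab's internal frame: a row-key column "r",
# a column-key column "c", and the "__dummy__" column that gets counted.
df = pd.DataFrame({"r": [0, 1, 2], "c": [1, 2, 3], "__dummy__": 0})

# values given as a string: pivot_table removes the "__dummy__" level itself,
# leaving the distinct values of "c" as the only column level.
scalar = df.pivot_table("__dummy__", index="r", columns="c",
                        aggfunc=len, fill_value=0)

# values given as a one-element list: "__dummy__" is kept as the top column
# level, so the caller decides when (and how) to select it.
listed = df.pivot_table(["__dummy__"], index="r", columns="c",
                        aggfunc=len, fill_value=0)

print(scalar.columns.nlevels, listed.columns.nlevels)  # 1 2

# Selecting the top level by hand recovers the same counts as the string form.
selected = listed["__dummy__"]
print((selected.to_numpy() == scalar.to_numpy()).all())  # True
```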

                            margins=margins, margins_name=margins_name,
                            dropna=dropna, **kwargs)

+    if not table.empty:
+        table = table[added_column]
Contributor

So you don't need to compute added_column up above; just do it here.

Also, just use the Index operations themselves:

df.columns.difference(common_columns)

Author

Noted. I'll look into this operation.
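The reviewer's `Index.difference` suggestion can be sketched as follows, assuming (as the tuple-name bug suggests) that crosstab's internal frame is built from a dict keyed by the Series names. With tuple keys the columns become a MultiIndex, so `df['__dummy__'] = 0` actually adds the key `('__dummy__', '')`; `Index.difference` recovers that key without the set bookkeeping. The frame contents are hypothetical, and `original_columns` plays the role of the reviewer's `common_columns`.

```python
import pandas as pd

# Hypothetical internal frame for two Series named ("a", "b") and ("c", "d"):
# tuple dict keys turn the columns into a two-level MultiIndex.
df = pd.DataFrame({("a", "b"): [0, 1, 2], ("c", "d"): [1, 2, 3]})
original_columns = df.columns  # Index objects are immutable, so this snapshot is safe

# Assigning a plain string key to MultiIndex columns pads it with "" for the
# missing levels, so the new column is really ("__dummy__", ""), not "__dummy__".
df["__dummy__"] = 0

# Index.difference returns the labels that are new since the snapshot.
added_columns = df.columns.difference(original_columns)
added_column = added_columns[0]
print(added_column)  # ('__dummy__', '')
```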


     # Post-process
     if normalize is not False:
         table = _normalize(table, normalize=normalize, margins=margins,
6 changes: 5 additions & 1 deletion pandas/tests/reshape/test_pivot.py
@@ -1628,7 +1628,8 @@ def test_crosstab_dup_index_names(self):
         pytest.raises(ValueError, pd.crosstab, s, s)

     @pytest.mark.parametrize("names", [['a', ('b', 'c')],
-                                       [('a', 'b'), 'c']])
+                                       [('a', 'b'), 'c'],
+                                       [('a', 'b'), ('c', 'd')]])
     def test_crosstab_tuple_name(self, names):
         s1 = pd.Series(range(3), name=names[0])
         s2 = pd.Series(range(1, 4), name=names[1])
@@ -1638,3 +1639,6 @@ def test_crosstab_tuple_name(self, names):

         result = pd.crosstab(s1, s2)
         tm.assert_frame_equal(result, expected)

+        result_col_list = list(result.columns)
Contributor

I would like a more explicit test here.

Author

Thanks. Will update.

Contributor

These need full comparisons on the result, IOW construct the expected frame and compare it, rather than just comparing the columns.

Author

Noted. To be updated.

+        assert result_col_list == [1, 2, 3]
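Along the lines the reviewer asks for, an explicit test for the both-tuples case could build the full expected frame and compare it with `assert_frame_equal`, instead of checking only the column labels. This is a sketch, not the PR's final test: the test name is hypothetical, it imports the public `pandas.testing` module as `tm`, and it assumes a pandas version in which the bug is fixed.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_crosstab_both_tuple_names():
    # Hypothetical test name; both Series carry tuple names (the GH 18321 case).
    s1 = pd.Series(range(3), name=("a", "b"))
    s2 = pd.Series(range(1, 4), name=("c", "d"))

    # Every row value pairs with exactly one column value, so the counts form an
    # identity matrix, and both axes must keep the original tuple names.
    expected = pd.DataFrame(
        np.eye(3, dtype="int64"),
        index=pd.Index(range(3), name=("a", "b")),
        columns=pd.Index(range(1, 4), name=("c", "d")),
    )

    result = pd.crosstab(s1, s2)
    # Compare the whole frame: values, dtypes, labels and axis names.
    tm.assert_frame_equal(result, expected)
```

Comparing the whole frame catches a wrong axis name, a wrong dtype, or wrong counts in one assertion, which a check on `list(result.columns)` alone would miss.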