crosstab shows margins in wrong order if index/columns are ordered categories #25278

Closed
oriolmirosa opened this issue Feb 12, 2019 · 6 comments · Fixed by #52645
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@oriolmirosa

Code Sample

Run a crosstab on an ordered categorical variable whose category order differs from the alphabetical default:

import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'First': ['B', 'B', 'C', 'A', 'B', 'C'],
                   'Second': ['C', 'B', 'B', 'B', 'C', 'A']})

df.First = df.First.astype(CategoricalDtype(ordered=True))

# Default order is alphabetic
df.First.cat.categories

>> Index(['A', 'B', 'C'], dtype='object')

# Change the order
df.First = df.First.cat.reorder_categories(['C', 'A', 'B'])
df.First.cat.categories

>> Index(['C', 'A', 'B'], dtype='object')

# Do a simple crosstab with margins
pd.crosstab(df.First, df.Second, margins=True)

>> Second   A   B   C   All
>> First 
>> C        1   1   0   1
>> A        0   1   0   3
>> B        0   1   2   2
>> All      1   3   2   6

Problem description

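The row margins (the "All" column) are showing the wrong values: rows C, A, B report totals of 1, 3, 2, while the counts in each row actually sum to 2, 1, 3. The reported values 1, 3, 2 are the correct totals for A, B, C, so the margins appear to be computed in the default alphabetical order and then attached to the reordered rows. This might be related to #20496, but the output here surely looks like a bug.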
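One way to cross-check the margins is to compute the row totals without `margins=True` and compare them against the "All" column. This is only a sketch built on the example above; the variable names (`order`, `plain`, `expected_rows`, `reported_rows`, `margins_ok`) are illustrative and not part of pandas.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Same data as in the report
df = pd.DataFrame({"First": ["B", "B", "C", "A", "B", "C"],
                   "Second": ["C", "B", "B", "B", "C", "A"]})
df["First"] = df["First"].astype(CategoricalDtype(ordered=True))

# Custom category order, different from the alphabetical default
order = ["C", "A", "B"]
df["First"] = df["First"].cat.reorder_categories(order)

# Row totals computed independently of the margins logic
plain = pd.crosstab(df["First"], df["Second"])
expected_rows = plain.sum(axis=1).loc[order]

# Row totals as reported by the "All" column
with_margins = pd.crosstab(df["First"], df["Second"], margins=True)
reported_rows = with_margins.loc[order, "All"]

# True when the margins line up with the rows they are attached to
margins_ok = list(reported_rows) == list(expected_rows)

print(pd.DataFrame({"expected": expected_rows.to_numpy(),
                    "reported": reported_rows.to_numpy()}, index=order))
print("margins match row sums:", margins_ok)
```

On an affected version the two columns should differ, and on a fixed version they should agree.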

Output of pd.show_versions()

commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-1032-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.7.3
Cython: 0.29.4
numpy: 1.16.1
scipy: 1.2.0
pyarrow: 0.12.0
xarray: None
IPython: 7.2.0
sphinx: 1.8.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.14
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.2.1
pandas_gbq: 0.9.0
pandas_datareader: None
gcsfs: None

@WillAyd
Member

WillAyd commented Feb 14, 2019

Thanks for the report. Investigation and PRs are always welcome.

@WillAyd WillAyd added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 14, 2019
@hwalinga
Contributor

hwalinga commented May 1, 2020

I can confirm this is now fixed on master.

>>> pd.crosstab(df.First, df.Second, margins=True)
Second  A  B  C  All
First               
C       1  1  0    2
A       0  1  0    1
B       0  1  2    3
All     1  3  2    6

So, this can be closed.

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Jun 26, 2021
@aidoskanapyanov
Contributor

Is this resolved, or are tests still needed?

@aidoskanapyanov
Contributor

take

@aidoskanapyanov
Contributor

aidoskanapyanov commented Apr 13, 2023

Hi @hwalinga! I've opened PR #52645 with a unit test for this example. It passes on the main branch.
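For readers following along, a regression test for this example could look roughly like the sketch below. This is not the actual test from PR #52645; the function name is hypothetical, and the expected values are taken from the fixed output posted earlier in this thread.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def test_crosstab_margins_ordered_categorical():
    # Hypothetical test name; data taken from the original report
    df = pd.DataFrame({"First": ["B", "B", "C", "A", "B", "C"],
                       "Second": ["C", "B", "B", "B", "C", "A"]})
    df["First"] = df["First"].astype(CategoricalDtype(ordered=True))
    df["First"] = df["First"].cat.reorder_categories(["C", "A", "B"])

    result = pd.crosstab(df["First"], df["Second"], margins=True)

    # Rows follow the custom category order, with the margin row last
    assert list(result.index) == ["C", "A", "B", "All"]
    assert list(result.columns) == ["A", "B", "C", "All"]

    # Counts and margins as shown in the fixed output
    assert result.to_numpy().tolist() == [
        [1, 1, 0, 2],
        [0, 1, 0, 1],
        [0, 1, 2, 3],
        [1, 3, 2, 6],
    ]


if __name__ == "__main__":
    test_crosstab_margins_ordered_categorical()
```

Comparing plain lists rather than whole frames keeps the sketch independent of the exact index type that `crosstab` returns.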

6 participants
@WillAyd @oriolmirosa @mroeschke @hwalinga @aidoskanapyanov and others