BUG: pd.crosstab(dropna=True, values=) drops rows and columns where any aggregation result is null. #60767

sfc-gh-mvashishtha · 2025-01-22T23:29:08Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd


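# All-float frame: the (A=1, B=3) group has 2 rows, the (A=2, B=4) group has 3.
df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [3, 3, 4, 4, 4], "C": [1, 1, 1, 1, 1]},
    dtype=float,
)

# Fails on the affected versions: dropna=True returns a smaller table than dropna=False.
assert (
    pd.crosstab(index=df["A"], columns=df["B"], values=df["C"], aggfunc="skew", dropna=True).shape
    == pd.crosstab(index=df["A"], columns=df["B"], values=df["C"], aggfunc="skew", dropna=False).shape
)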

Issue Description

When I provide dropna=True to crosstab and specify a values array to aggregate, pandas seems to drop rows and columns where all the results are NaN. In this case pandas skew gives NaN if there are fewer than 3 values in a group, so only the value for index=2, column=4 has a non-NaN value. That leaves a row of NaN and a column of NaN, so pandas drops both the all-NaN row and the all-NaN column.
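The NaN results described above can be checked without crosstab. The following is a minimal sketch, assuming that a plain groupby on the same frame is a fair stand-in for the per-cell aggregation; the names group_sizes and group_skew are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [3, 3, 4, 4, 4], "C": [1, 1, 1, 1, 1]},
    dtype=float,
)

# Number of rows in each observed (A, B) group.
group_sizes = df.groupby(["A", "B"])["C"].size()

# Skewness per group; a group with fewer than 3 values yields NaN.
group_skew = df.groupby(["A", "B"])["C"].skew()

print(group_sizes)
print(group_skew)
```

Only the (A=2, B=4) group has three values, so it is the only group with a non-NaN skew.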

Expected Behavior

The documentation says that dropna=True means, "Do not include columns whose entries are all NaN." I expect pandas to follow that rule and to keep values where the aggregation result is NaN. In the example above, I expect to see 2 rows for the 2 possible values of A, 1 and 2, and 2 columns for the 2 possible values of B, 3 and 4.
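To make the shapes under discussion concrete, here is a rough sketch that builds the full table by hand with groupby and unstack rather than crosstab, then applies the documented column-only rule and, separately, a rule that also drops all-NaN rows. It is an illustration under that assumption, not a description of how crosstab is implemented; full_table, columns_rule, and rows_and_columns are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [3, 3, 4, 4, 4], "C": [1, 1, 1, 1, 1]},
    dtype=float,
)

# Full A-by-B table of per-group skew, built without crosstab.
full_table = df.groupby(["A", "B"])["C"].skew().unstack("B")

# Documented rule: drop only the columns whose entries are all NaN.
columns_rule = full_table.dropna(axis="columns", how="all")

# Stricter rule: additionally drop rows whose entries are all NaN.
rows_and_columns = columns_rule.dropna(axis="index", how="all")

print(full_table.shape, columns_rule.shape, rows_and_columns.shape)
```

In this sketch the full table is 2 by 2, the column-only rule leaves 2 rows and 1 column, and dropping all-NaN rows as well leaves a single cell.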

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 0691c5cf90477d3503834d983f69350f250a6ff7
python                : 3.9.21
python-bits           : 64
OS                    : Darwin
OS-release            : 24.2.0
Version               : Darwin Kernel Version 24.2.0: Fri Dec  6 18:56:34 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6020
machine               : arm64
processor             : arm
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.3
numpy                 : 2.0.2
pytz                  : 2024.2
dateutil              : 2.8.2
pip                   : 24.2
Cython                : None
sphinx                : None
IPython               : 8.12.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.1
qtpy                  : None
pyqt5                 : None


rhshadrach · 2025-01-25T02:24:52Z

Thanks for the report.

In the example above, I expect to see 2 rows for the 2 possible values of A, 1 and 2, and 2 columns for the 2 possible values of B, 3 and 4.

When dropna=False, there is a column of all NaN values. Why should this not be dropped?

As for the dropped row, I expect this has some overlap with #53521, but I haven't checked.

sfc-gh-mvashishtha added the Bug and Needs Triage labels Jan 22, 2025

sfc-gh-mvashishtha mentioned this issue Jan 23, 2025: BUG: crosstab(aggfunc='skew') raises IndexError: single positional indexer is out-of-bounds #60768 (Open)

rhshadrach added the Needs Discussion and Reshaping labels and removed the Needs Triage label Jan 25, 2025

Pranav970 mentioned this issue Mar 4, 2025: Create bug.py Pranav970/pandas#2 (Merged)
