Skip to content

BUG: Correct crosstab for categorical inputs #37468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 31, 2020

Conversation

bashtage
Copy link
Contributor

Change catch types to reflect error changes

closes #37465

@bashtage bashtage added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 28, 2020
@bashtage bashtage changed the title BUG: Correct crosstabl for categorical inputs BUG: Correct crosstab for categorical inputs Oct 28, 2020
@bashtage bashtage closed this Oct 28, 2020
@bashtage bashtage reopened this Oct 28, 2020
@bashtage
Copy link
Contributor Author

Retrigger CI

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jreback jreback added this to the 1.1.4 milestone Oct 29, 2020
@jreback
Copy link
Contributor

jreback commented Oct 29, 2020

@bashtage if u can add a note to 1.1.4 can push this in

ping on greenish

@bashtage
Copy link
Contributor Author

@jreback green and note added.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code sample in #37465 passes on 1.1.3. may not need a backport

@jreback
Copy link
Contributor

jreback commented Oct 30, 2020

sorry i misread the OP

@bashtage can u move the note to 1.2

@jreback jreback modified the milestones: 1.1.4, 1.2 Oct 30, 2020
@bashtage
Copy link
Contributor Author

@jreback I confirmed that this is not a bug in the 1.1.x branch.

I have left out a release note since this code has never been released. This seems like the right thing to do, isn't it?

For posterity, if needed, the line is

- Fixed regression in :meth:crosstab that failed when the input series are categorical (:issue:37465)

so I can add to 1.20 if required.

@pep8speaks
Copy link

pep8speaks commented Oct 30, 2020

Hello @bashtage! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-31 00:09:24 UTC


def test_categoricals():
# https://github.com/pandas-dev/pandas/issues/37465
g = np.random.RandomState(25982704)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worth parameterize this to check if a is a cat and b is not (and vice versa), might be slightly tricky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These have been parameterized. I'll ping on green.

Change catch types to reflect error changes

closes pandas-dev#37465
@jreback jreback merged commit 2974e22 into pandas-dev:master Oct 31, 2020
@jreback
Copy link
Contributor

jreback commented Oct 31, 2020

thanks @bashtage

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
ukarroum pushed a commit to ukarroum/pandas that referenced this pull request Nov 2, 2020
a_is_cat = is_categorical_dtype(a.dtype)
assert not a_is_cat or a.value_counts().loc[1] == 0
result = crosstab(a, b, margins=True, dropna=False)
values = [[18, 16, 34], [0, 0, np.nan], [34, 32, 66], [52, 48, 100]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashtage im working on a branch that is failing on this test, getting a 0 instead of an nan here. is there any chance the branch is right and the test is wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is quite likley that it should be 0.0. The NaN is in the margins column, which is normalized. Not totally sure how `NaN crept in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: crosstab fails with categoricals on master
5 participants