-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Correct crosstab for categorical inputs #37468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Retrigger CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@bashtage if u can add a note to 1.1.4 can push this in ping on greenish |
b15f90f
to
ea22f10
Compare
@jreback green and note added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code sample in #37465 passes on 1.1.3. may not need a backport
sorry i misread the OP @bashtage can u move the note to 1.2 |
f5d0dd9
to
8381498
Compare
@jreback I confirmed that this is not a bug in the 1.1.x branch. I have left out a release note since this code has never been released. This seems like the right thing to do, isn't it? For posterity, if needed, the line is
so I can add to 1.20 if required. |
8381498
to
f97f78d
Compare
f97f78d
to
d428ab6
Compare
|
||
def test_categoricals(): | ||
# https://github.com/pandas-dev/pandas/issues/37465 | ||
g = np.random.RandomState(25982704) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be worth parameterize this to check if a is a cat and b is not (and vice versa), might be slightly tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have been parameterized. I'll ping on green.
d428ab6
to
ed4842b
Compare
Change catch types to reflect error changes closes pandas-dev#37465
ed4842b
to
0f2fb8d
Compare
thanks @bashtage |
a_is_cat = is_categorical_dtype(a.dtype) | ||
assert not a_is_cat or a.value_counts().loc[1] == 0 | ||
result = crosstab(a, b, margins=True, dropna=False) | ||
values = [[18, 16, 34], [0, 0, np.nan], [34, 32, 66], [52, 48, 100]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bashtage im working on a branch that is failing on this test, getting a 0 instead of an nan here. is there any chance the branch is right and the test is wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is quite likley that it should be 0.0. The NaN
is in the margins column, which is normalized. Not totally sure how `NaN crept in.
Change catch types to reflect error changes
closes #37465
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff