TST: assert_series/frame not comparing for categoricals #13076

Closed
jreback opened this issue May 3, 2016 · 5 comments
Labels
Categorical (Categorical Data Type), Testing (pandas testing functions or related to the test suite)
Milestone

Comments

Contributor

jreback commented May 3, 2016

xref dask/partd#7

In [1]: df1 = DataFrame({'A' : Series(list('aba')).astype('category', categories=list('ab'), ordered=True)})

In [2]: df2 = DataFrame({'A' : Series(list('aba')).astype('category', categories=list('ba'), ordered=True)})

In [3]: tm.assert_frame_equal(df1, df2)

In [11]: df2 = DataFrame({'A' : Series(list('aba')).astype('category', categories=list('ab'), ordered=False)})

In [12]: tm.assert_frame_equal(df1, df2)

In [13]: df2 = DataFrame({'A' : Series(list('aba')).astype('category', categories=list('ba'), ordered=False)})

In [14]: tm.assert_frame_equal(df1, df2)

All of these should fail: the categories of the ordered categoricals do not compare equal, and neither ordered nor the categories are being compared.

Furthermore, assert_series_equal has the same problem.

assert_categorical_equal DOES the right thing, but it is not being called here. (Note: shouldn't this say that the categories are not equal?)

In [17]: tm.assert_categorical_equal(df1.A.values, df2.A.values)
AssertionError: Index are different

Index values are different (100.0 %)
[left]:  Index([u'a', u'b'], dtype='object')
[right]: Index([u'b', u'a'], dtype='object')
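
For readers on newer pandas, the three comparisons from the report can be sketched with CategoricalDtype and the public pandas.testing.assert_frame_equal. This is only an illustration, not part of the original report: the helper names make_frame and frames_differ are made up here, and whether each comparison is flagged depends on the installed pandas version.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def make_frame(categories, ordered):
    """Build a one-column frame of 'a', 'b', 'a' with the given categorical dtype.

    Hypothetical helper, introduced only for this sketch.
    """
    dtype = CategoricalDtype(categories=list(categories), ordered=ordered)
    return pd.DataFrame({"A": pd.Series(list("aba")).astype(dtype)})


def frames_differ(left, right):
    """Return True if assert_frame_equal raises for the two frames."""
    try:
        pd.testing.assert_frame_equal(left, right)
    except AssertionError:
        return True
    return False


# Baseline from the report: ordered categorical with categories a, b.
df1 = make_frame("ab", ordered=True)

# The three variants compared against df1 in the report.
variants = {
    "reversed categories, ordered": make_frame("ba", ordered=True),
    "same categories, unordered": make_frame("ab", ordered=False),
    "reversed categories, unordered": make_frame("ba", ordered=False),
}

results = {name: frames_differ(df1, df2) for name, df2 in variants.items()}
for name, differs in results.items():
    print(f"{name}: flagged as different = {differs}")
```

The report argues that every one of these variants should be flagged, so a False in the printed results would indicate the behaviour described in this issue.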
jreback added the Testing, Categorical, and Difficulty Intermediate labels on May 3, 2016
jreback added this to the 0.18.2 milestone on May 3, 2016
Contributor Author

jreback commented May 3, 2016

cc @sinhrks (on the assert_categorical_equal question)

jreback changed the title from "TST: assert_* fail for categorical comparisons" to "TST: assert_series/frame not comparing for categoricals" on May 3, 2016
Member

sinhrks commented May 4, 2016

Correct. In addition, comparing a CategoricalIndex should also check its internal Categorical.

I tried this once and found lots of errors, including in stata. It needs some time to check them one by one...

Contributor Author

jreback commented May 4, 2016

Maybe let's do this in 2 stages:
first fix assert_categorical_equal, then series/frame.

Member

sinhrks commented May 4, 2016

Let me clarify... 2 stages are:

  1. Fix assert_categorical_equal to show a correct assertion message (for example, say that the categories are different)
  2. Fix assert_frame/series/index_equal to call assert_categorical_equal internally.
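
The two stages can be sketched roughly as follows. This is a toy model in plain Python, not the pandas implementation: ToyCategorical, assert_toy_categorical_equal, and assert_toy_series_equal are hypothetical names, and the strict list comparison of categories is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyCategorical:
    """Hypothetical stand-in for a categorical: codes index into categories."""
    categories: List[str]
    codes: List[int]
    ordered: bool = False


def assert_toy_categorical_equal(left, right):
    # Stage 1: name the attribute that differs instead of a generic message.
    if left.categories != right.categories:
        raise AssertionError(
            f"Categorical.categories are different: "
            f"{left.categories} != {right.categories}"
        )
    if left.ordered != right.ordered:
        raise AssertionError(
            f"Categorical.ordered is different: {left.ordered} != {right.ordered}"
        )
    if left.codes != right.codes:
        raise AssertionError(
            f"Categorical.codes are different: {left.codes} != {right.codes}"
        )


def assert_toy_series_equal(left, right):
    # Stage 2: the container-level assertion delegates to the categorical one.
    if isinstance(left, ToyCategorical) and isinstance(right, ToyCategorical):
        assert_toy_categorical_equal(left, right)
    elif left != right:
        raise AssertionError("values are different")


# Same decoded values (a, b, a) but a different category order.
a = ToyCategorical(["a", "b"], [0, 1, 0], ordered=True)
b = ToyCategorical(["b", "a"], [1, 0, 1], ordered=True)

message = "no difference detected"
try:
    assert_toy_series_equal(a, b)
except AssertionError as err:
    message = str(err)
print(message)
```

Checking categories before codes means the message points at the attribute that actually differs, which is the kind of message stage 1 asks for.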

Contributor Author

jreback commented May 4, 2016

Yep, it might be less invasive that way.

And for series/frame we could have an option that we gradually turn on, so we can make the change in pieces (not ideal, but there could be lots of test comparison issues).
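
One way to picture such a gradually enabled option is a keyword flag that defaults to the old lenient behaviour. The sketch below is a toy in plain Python: the function assert_column_equal, the dict layout, and the flag name check_categorical are assumptions chosen for illustration, not taken from the discussion above.

```python
def assert_column_equal(left, right, check_categorical=False):
    """Hypothetical column comparison with an opt-in strict categorical check.

    Each column is a plain dict with 'values', 'categories' and 'ordered' keys.
    """
    if left["values"] != right["values"]:
        raise AssertionError("values are different")
    if check_categorical:
        # Strict mode: categorical metadata must match as well.
        for attr in ("categories", "ordered"):
            if left[attr] != right[attr]:
                raise AssertionError(
                    f"{attr} are different: {left[attr]!r} != {right[attr]!r}"
                )


left = {"values": list("aba"), "categories": ["a", "b"], "ordered": True}
right = {"values": list("aba"), "categories": ["b", "a"], "ordered": True}

# Lenient default: only the values are compared, so this passes.
assert_column_equal(left, right)

# Strict mode, switched on test by test as suites are cleaned up.
try:
    assert_column_equal(left, right, check_categorical=True)
    strict_result = "passed"
except AssertionError as err:
    strict_result = f"failed: {err}"
print(strict_result)
```

Starting with the flag off keeps existing tests passing, and individual test modules can opt in as their comparisons are fixed.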

jreback pushed a commit that referenced this issue May 11, 2016
stage 1 of #13076

Author: sinhrks <[email protected]>

Closes #13080 from sinhrks/test_categorical_message and squashes the following commits:

81172ce [sinhrks] TST: fix assert_categorical_equal message
2 participants