ENH: assert_* has very superficial description if CategorialIndex are different #18056

Closed
topper-123 opened this issue Oct 31, 2017 · 3 comments · Fixed by #18069
Labels
Testing pandas testing functions or related to the test suite


topper-123 commented Oct 31, 2017

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> c1 = pd.CategoricalIndex(['a', 'b'])
>>> c2 = pd.CategoricalIndex(['c', 'd'])
>>> s1 = pd.Series([1, 2], index=c1)
>>> s2 = pd.Series([1, 2], index=c2)
>>> pd.testing.assert_series_equal(s1, s2)
AssertionError: Series.index are different

Attribute "dtype" are different
[left]:  category
[right]: category

Note that [left] and [right] show the same value, so this message doesn't help locate the difference.

Problem description

The message doesn't show where the CategoricalIndexes differ. The reason is that the categorical dtype returns "category" for str(c1.dtype); to see the details, you need repr(c1.dtype).
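
To illustrate the difference in plain Python, here is a minimal sketch using a hypothetical stand-in class, `FakeCategoricalDtype` (an assumption for illustration, not part of pandas). Its `__str__` returns a fixed label while its `__repr__` includes the categories, which mimics the behaviour described above.

```python
class FakeCategoricalDtype:
    """Hypothetical stand-in for a categorical dtype (not the pandas class)."""

    def __init__(self, categories):
        self.categories = list(categories)

    def __str__(self):
        # Short label: identical for every instance, whatever the categories.
        return "category"

    def __repr__(self):
        # Full description: includes the categories.
        return "FakeCategoricalDtype(categories={!r})".format(self.categories)


left = FakeCategoricalDtype(["a", "b"])
right = FakeCategoricalDtype(["c", "d"])

# str() hides the difference between the two objects; repr() exposes it.
print(str(left) == str(right))
print(repr(left))
print(repr(right))
```

Any message built from `str()` of such objects will show two identical lines, even though the objects differ.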

Expected Output

One solution would be to change the function raise_assert_detail in pandas/util/testing.py, replacing

msg = """{obj} are different

{message}
[left]:  {left}
[right]: {right}""".format(obj=obj, message=message, left=left, right=right)

with the repr-formatted strings:

msg = """{obj} are different

{message}
[left]:  {left!r}
[right]: {right!r}""".format(obj=obj, message=message, left=left, right=right)

Note the (!r). This will give the full repr output which you'll almost always want anyway.
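
As a rough sketch of the effect of this change, the standalone function below mirrors the two templates. The name `format_assert_detail` and the `use_repr` flag are hypothetical, and this is not the pandas implementation; it only shows how `!r` changes the rendered message. The inputs `1` and `"1"` are used because their `str()` output is identical while their `repr()` output differs.

```python
def format_assert_detail(obj, message, left, right, use_repr=True):
    """Build a difference message; hypothetical helper, not the pandas function."""
    if use_repr:
        # Proposed template: the !r conversion calls repr() on each value.
        template = """{obj} are different

{message}
[left]:  {left!r}
[right]: {right!r}"""
    else:
        # Current template: default formatting, which for these values matches str().
        template = """{obj} are different

{message}
[left]:  {left}
[right]: {right}"""
    return template.format(obj=obj, message=message, left=left, right=right)


# Without !r both value lines read "1"; with !r the string value is quoted.
print(format_assert_detail("Values", "Types differ", 1, "1", use_repr=False))
print(format_assert_detail("Values", "Types differ", 1, "1"))
```

One side effect to keep in mind: with `!r`, plain string values appear in quotes in the message.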

Alternatively, the decision to let str(c1.dtype) output "category" could be changed. That would be a breaking change, though, and would require a proper deprecation warning.

I could submit a PR if the solution with !r format option is acceptable.

topper-123 changed the title from "ENH: assert_* has very superficial description of CategorialIndex are different" to "ENH: assert_* has very superficial description if CategorialIndex are different" on Oct 31, 2017

TomAugspurger commented

> Alternatively, the decision to let str(c1.dtype) output "category" could be changed. That would be a breaking change, though, and would require a proper deprecation warning.

In the CategoricalDtype refactor, we decided to have str(categoricaldtype) be 'category', and repr be the full representation.

A PR changing this to use !r would be good, I think.

TomAugspurger added the Testing label (pandas testing functions or related to the test suite) on Oct 31, 2017
TomAugspurger added this to the Next Major Release milestone on Oct 31, 2017

topper-123 commented

Ok, I'll look into that.

Is there any chance this gets into v0.21.1? This output string comes up in quite a lot of places in my code.


jreback commented Nov 1, 2017

@topper-123 sure, this could go in 0.21.1. The criteria are bug fixes or small, not-too-invasive enhancements.
