Skip to content

Safely raise errors when object contains unicode #20593

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

janrito
Copy link
Contributor

@janrito janrito commented Apr 3, 2018

This safely turns nd.array objects that contain unicode into a
representation that can be printed

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

janrito added 2 commits April 3, 2018 14:13
This safely turns nd.array objects that contain unicode into a
representation that can be printed
@WillAyd
Copy link
Member

WillAyd commented Apr 5, 2018

This needs tests - can you add some in pandas/tests/util/test_testing.py?

@pep8speaks
Copy link

pep8speaks commented Apr 5, 2018

Hello @janrito! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 09, 2018 at 17:34 Hours UTC

@jreback jreback added Testing pandas testing functions or related to the test suite Unicode Unicode strings labels Apr 5, 2018
@@ -1165,3 +1165,4 @@ Other

- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
- Bug in accessing a :func:`pandas.get_option`, which raised ``KeyError`` rather than ``OptionError`` when looking up a non-existant option key in some cases (:issue:`19789`)
- Bug in :func:`raise_assert_detail` for Series and DataFrames with differing unicode data (:issue:`20503`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an internal function, rather say assert_series_equal and assert_frame_equal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 8f607c3

@@ -276,6 +276,19 @@ def test_numpy_array_equal_message(self):
assert_almost_equal(np.array([[1, 2], [3, 4]]),
np.array([[1, 3], [3, 4]]))

expected = """numpy array are different
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make a new test & indicate the gh issue with a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 6b087f2

@@ -678,6 +693,21 @@ def test_frame_equal_message(self):
pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 7]}),
by_blocks=True)

expected = """DataFrame\\.iloc\\[:, 1\\] are different
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 6b087f2

@janrito
Copy link
Contributor Author

janrito commented Apr 5, 2018

Thanks for the review @jreback! I've changed the message, separated the tests and added comments referencing the gh issue

@janrito
Copy link
Contributor Author

janrito commented Apr 6, 2018

Tests are failing on python3, not sure how to deal with the printing issue

It seems like error messages are converted to native str type (binary in py2 and unicode in py3) so either way we encode this will add u or b prefixes to the objects

@janrito
Copy link
Contributor Author

janrito commented Apr 9, 2018

Ok, I'm not sure if this is the most elegant way of dealing with it, but if we encode only on python2 when the objects are base_strings, it doesn't print the text type prefixes in either python version

@codecov
Copy link

codecov bot commented Apr 9, 2018

Codecov Report

Merging #20593 into master will increase coverage by 0.01%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20593      +/-   ##
==========================================
+ Coverage   91.82%   91.83%   +0.01%     
==========================================
  Files         153      153              
  Lines       49256    49269      +13     
==========================================
+ Hits        45229    45247      +18     
+ Misses       4027     4022       -5
Flag Coverage Δ
#multiple 90.22% <50%> (+0.01%) ⬆️
#single 41.9% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/util/testing.py 84.38% <50%> (-0.35%) ⬇️
pandas/plotting/_core.py 82.39% <0%> (-0.12%) ⬇️
pandas/core/frame.py 97.15% <0%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (ø) ⬆️
pandas/util/_test_decorators.py 92% <0%> (+0.1%) ⬆️
pandas/plotting/_converter.py 66.81% <0%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73cb32e...1f7e231. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. small comments. ping on green.

@@ -992,11 +992,18 @@ def raise_assert_detail(obj, message, left, right, diff=None):
left = pprint_thing(left)
elif is_categorical_dtype(left):
left = repr(left)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you import PY2 and string_types up top

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍575b2e8

if isinstance(right, np.ndarray):
right = pprint_thing(right)
elif is_categorical_dtype(right):
right = repr(right)

if compat.PY2 and isinstance(right, compat.string_types):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment on these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 45d2b8e

@@ -499,10 +517,12 @@ def _assert_not_equal(self, a, b, **kwargs):
def test_equal(self):
self._assert_equal(Series(range(3)), Series(range(3)))
self._assert_equal(Series(list('abc')), Series(list('abc')))
self._assert_equal(Series(list(u'áàä')), Series(list(u'áàä')))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test where left is unicode and right is non-unicode (but string)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 1f7e231

@jreback jreback added this to the 0.23.0 milestone Apr 9, 2018
@jreback jreback merged commit e8f206d into pandas-dev:master Apr 9, 2018
@jreback
Copy link
Contributor

jreback commented Apr 9, 2018

thanks @janrito nice patch!

@janrito janrito deleted the bug/fix-assert-frame-equals-with-unicode-data branch April 10, 2018 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite Unicode Unicode strings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

assert_frame_equal cannot handle differences in with unicode data
4 participants