Safely raise errors when object contains unicode #20593

janrito · 2018-04-03T13:38:07Z

This safely turns np.ndarray objects that contain unicode into a
representation that can be printed

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

- closes assert_frame_equal cannot handle differences in with unicode data #20503
- tests added / passed
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

WillAyd · 2018-04-05T12:45:31Z

This needs tests - can you add some in pandas/tests/util/test_testing.py?

pep8speaks · 2018-04-05T14:59:58Z

Hello @janrito! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 09, 2018 at 17:34 Hours UTC

jreback · 2018-04-05T15:06:14Z

doc/source/whatsnew/v0.23.0.txt

@@ -1165,3 +1165,4 @@ Other

 - Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
 - Bug in accessing a :func:`pandas.get_option`, which raised ``KeyError`` rather than ``OptionError`` when looking up a non-existant option key in some cases (:issue:`19789`)
+- Bug in :func:`raise_assert_detail` for Series and DataFrames with differing unicode data (:issue:`20503`)


This is an internal function; rather say assert_series_equal and assert_frame_equal, which are the public functions affected.

👍 8f607c3

jreback · 2018-04-05T15:06:33Z

pandas/tests/util/test_testing.py

@@ -276,6 +276,19 @@ def test_numpy_array_equal_message(self):
            assert_almost_equal(np.array([[1, 2], [3, 4]]),
                                np.array([[1, 3], [3, 4]]))

+        expected = """numpy array are different


Can you make this a new test and indicate the gh issue with a comment?

👍 6b087f2

jreback · 2018-04-05T15:06:42Z

pandas/tests/util/test_testing.py

@@ -678,6 +693,21 @@ def test_frame_equal_message(self):
                               pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 7]}),
                               by_blocks=True)

+        expected = """DataFrame\\.iloc\\[:, 1\\] are different


👍 6b087f2

janrito · 2018-04-05T15:48:10Z

Thanks for the review @jreback! I've changed the message, separated the tests and added comments referencing the gh issue

janrito · 2018-04-06T15:02:41Z

Tests are failing on Python 3, and I'm not sure how to deal with the printing issue.

It seems error messages are converted to the native str type (binary in py2 and unicode in py3), so whichever way we encode, this will add u or b prefixes to the objects.
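
The prefix problem can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 and using only built-ins; the sample string and variable names are illustrative and not taken from the PR:

```python
# On Python 3, str is unicode text and bytes is binary data.
text = "áàä"
encoded = text.encode("utf-8")

# repr() of the text type has no prefix on Python 3,
# but repr() of the encoded bytes carries a b prefix.
text_repr = repr(text)
bytes_repr = repr(encoded)

print(text_repr)   # 'áàä'
print(bytes_repr)  # b'\xc3\xa1\xc3\xa0\xc3\xa4'
```

On Python 2 the situation is mirrored: the repr of a unicode object carries a u prefix, while the native (binary) str has none.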

janrito · 2018-04-09T15:27:53Z

OK, I'm not sure this is the most elegant way of dealing with it, but if we encode only on Python 2, and only when the objects are string types, the text type prefixes are not printed in either Python version.
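
The approach described above might be sketched roughly as follows. This is an illustration only, not the code merged in this PR: `to_native_str` and its `py2` parameter are hypothetical names, and `PY2` / `string_types` are defined locally here as stand-ins for the `compat` equivalents mentioned in the review.

```python
import sys

# Local stand-ins (assumptions) for compat.PY2 and compat.string_types.
PY2 = sys.version_info[0] == 2
string_types = (basestring,) if PY2 else (str,)  # noqa: F821


def to_native_str(obj, py2=PY2, encoding="utf-8"):
    """Hypothetical helper: encode string objects only on Python 2.

    On Python 3 the object is returned unchanged, so it is never
    turned into bytes and never gains a b prefix when printed.
    """
    if py2 and isinstance(obj, string_types):
        return obj.encode(encoding)
    return obj


# With py2=False (the Python 3 path) the text passes through untouched.
print(to_native_str("áàä", py2=False))
```

The `py2` flag is exposed as a parameter only so that both branches can be exercised from a single interpreter.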

codecov · 2018-04-09T16:44:20Z

Codecov Report

Merging #20593 into master will increase coverage by 0.01%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##           master   #20593      +/-   ##
==========================================
+ Coverage   91.82%   91.83%   +0.01%     
==========================================
  Files         153      153              
  Lines       49256    49269      +13     
==========================================
+ Hits        45229    45247      +18     
+ Misses       4027     4022       -5

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.22% <50%> (+0.01%)` | ⬆️ |
| #single | `41.9% <0%> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/util/testing.py | `84.38% <50%> (-0.35%)` | ⬇️ |
| pandas/plotting/_core.py | `82.39% <0%> (-0.12%)` | ⬇️ |
| pandas/core/frame.py | `97.15% <0%> (ø)` | ⬆️ |
| pandas/io/gbq.py | `25% <0%> (ø)` | ⬆️ |
| pandas/util/_test_decorators.py | `92% <0%> (+0.1%)` | ⬆️ |
| pandas/plotting/_converter.py | `66.81% <0%> (+1.73%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 73cb32e...1f7e231.

jreback

lgtm. small comments. ping on green.

jreback · 2018-04-09T16:47:56Z

pandas/util/testing.py

@@ -992,11 +992,18 @@ def raise_assert_detail(obj, message, left, right, diff=None):
        left = pprint_thing(left)
    elif is_categorical_dtype(left):
        left = repr(left)
+


can you import PY2 and string_types up top

👍 575b2e8

jreback · 2018-04-09T16:49:57Z

pandas/util/testing.py

    if isinstance(right, np.ndarray):
        right = pprint_thing(right)
    elif is_categorical_dtype(right):
        right = repr(right)

+    if compat.PY2 and isinstance(right, compat.string_types):


can you add a comment on these.

👍 45d2b8e

jreback · 2018-04-09T16:50:35Z

pandas/tests/util/test_testing.py

@@ -499,10 +517,12 @@ def _assert_not_equal(self, a, b, **kwargs):
    def test_equal(self):
        self._assert_equal(Series(range(3)), Series(range(3)))
        self._assert_equal(Series(list('abc')), Series(list('abc')))
+        self._assert_equal(Series(list(u'áàä')), Series(list(u'áàä')))


can you add a test where left is unicode and right is non-unicode (but string)

👍 1f7e231

jreback · 2018-04-09T18:34:07Z

thanks @janrito nice patch!

janrito added 2 commits April 3, 2018 14:13

Safely raise errors when object contains unicode

9cd83ff

This safely turns nd.array objects that contain unicode into a representation that can be printed

Whatsnew entry

46cadc0

janrito added 2 commits April 5, 2018 15:58

Tests for comparisons of objects containing unicode

99ac0e8

Only need to pprint with the display encoding

329002c

jreback added the Testing and Unicode labels Apr 5, 2018

Linting

b506cf6

jreback requested changes Apr 5, 2018

janrito added 2 commits April 5, 2018 16:27

Whatnew update

8f607c3

Separate tests and document gh issue

6b087f2

Encode in utf-8 only in python2

51366f2

jreback requested changes Apr 9, 2018

jreback added this to the 0.23.0 milestone Apr 9, 2018

janrito added 3 commits April 9, 2018 18:02

import compat.PY2 and compat.string_types directly

575b2e8

Added documenting comments

45d2b8e

Add binary <-> unicode tests

1f7e231

jreback approved these changes Apr 9, 2018

jreback merged commit e8f206d into pandas-dev:master Apr 9, 2018

janrito deleted the bug/fix-assert-frame-equals-with-unicode-data branch April 10, 2018 10:21
