-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Safely raise errors when object contains unicode #20593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Safely raise errors when object contains unicode #20593
Conversation
This safely turns nd.array objects that contain unicode into a representation that can be printed
This needs tests - can you add some in |
Hello @janrito! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on April 09, 2018 at 17:34 Hours UTC |
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -1165,3 +1165,4 @@ Other | |||
|
|||
- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`) | |||
- Bug in accessing a :func:`pandas.get_option`, which raised ``KeyError`` rather than ``OptionError`` when looking up a non-existant option key in some cases (:issue:`19789`) | |||
- Bug in :func:`raise_assert_detail` for Series and DataFrames with differing unicode data (:issue:`20503`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an internal function, rather say assert_series_equal and assert_frame_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 8f607c3
pandas/tests/util/test_testing.py
Outdated
@@ -276,6 +276,19 @@ def test_numpy_array_equal_message(self): | |||
assert_almost_equal(np.array([[1, 2], [3, 4]]), | |||
np.array([[1, 3], [3, 4]])) | |||
|
|||
expected = """numpy array are different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make a new test & indicate the gh issue with a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 6b087f2
pandas/tests/util/test_testing.py
Outdated
@@ -678,6 +693,21 @@ def test_frame_equal_message(self): | |||
pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 7]}), | |||
by_blocks=True) | |||
|
|||
expected = """DataFrame\\.iloc\\[:, 1\\] are different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 6b087f2
Thanks for the review @jreback! I've changed the message, separated the tests and added comments referencing the gh issue |
Tests are failing on python3, not sure how to deal with the printing issue It seems like error messages are converted to native str type (binary in py2 and unicode in py3) so either way we encode this will add u or b prefixes to the objects |
Ok, I'm not sure if this is the most elegant way of dealing with it, but if we encode only on python2 when the objects are base_strings, it doesn't print the text type prefixes in either python version |
Codecov Report
@@ Coverage Diff @@
## master #20593 +/- ##
==========================================
+ Coverage 91.82% 91.83% +0.01%
==========================================
Files 153 153
Lines 49256 49269 +13
==========================================
+ Hits 45229 45247 +18
+ Misses 4027 4022 -5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. small comments. ping on green.
@@ -992,11 +992,18 @@ def raise_assert_detail(obj, message, left, right, diff=None): | |||
left = pprint_thing(left) | |||
elif is_categorical_dtype(left): | |||
left = repr(left) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you import PY2 and string_types up top
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍575b2e8
pandas/util/testing.py
Outdated
if isinstance(right, np.ndarray): | ||
right = pprint_thing(right) | ||
elif is_categorical_dtype(right): | ||
right = repr(right) | ||
|
||
if compat.PY2 and isinstance(right, compat.string_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment on these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 45d2b8e
@@ -499,10 +517,12 @@ def _assert_not_equal(self, a, b, **kwargs): | |||
def test_equal(self): | |||
self._assert_equal(Series(range(3)), Series(range(3))) | |||
self._assert_equal(Series(list('abc')), Series(list('abc'))) | |||
self._assert_equal(Series(list(u'áàä')), Series(list(u'áàä'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test where left is unicode and right is non-unicode (but string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 1f7e231
thanks @janrito nice patch! |
This safely turns nd.array objects that contain unicode into a
representation that can be printed
Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):
git diff upstream/master -u -- "*.py" | flake8 --diff