Skip to content

fix hashing string-casting error #21187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 21, 2018
Merged

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented May 24, 2018

# set sys.defaultencoding to ascii, then change it back after the test
enc = sys.getdefaultencoding()
reload(sys) # noqa:F821
sys.setdefaultencoding('ascii')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a context manager this i think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a context manager for locale in pd.util.testing. Can that be used here or do you have something else in mind? (I agree it would be prettier)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pls use that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like tm.set_locale doesn't change sys.getdefaultencoding(). I could make a new contextmanager specifically for this (which I guess would be a no-op in py3?)

@codecov
Copy link

codecov bot commented May 24, 2018

Codecov Report

Merging #21187 into master will decrease coverage by 0.01%.
The diff coverage is 11.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21187      +/-   ##
==========================================
- Coverage   91.92%    91.9%   -0.02%     
==========================================
  Files         153      153              
  Lines       49563    49572       +9     
==========================================
+ Hits        45559    45560       +1     
- Misses       4004     4012       +8
Flag Coverage Δ
#multiple 90.3% <11.11%> (-0.02%) ⬇️
#single 41.8% <11.11%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/util/testing.py 85.27% <11.11%> (-0.7%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b36b451...7ab9324. Read the comment docs.

@@ -202,6 +203,35 @@ def test_latex_repr(self):

class TestCategoricalRepr(object):

@pytest.mark.skipif(compat.PY3, reason="Decoding failure only in PY2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a test for py3 as well that uses utf8 as the encoding

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, couldn't hurt.

@jreback jreback added Bug Output-Formatting __repr__ of pandas objects, to_string labels May 24, 2018
@@ -100,6 +100,7 @@ Bug Fixes
- Bug in :meth:`Series.str.replace()` where the method throws `TypeError` on Python 3.5.2 (:issue: `21078`)
- Bug in :class:`Timedelta`: where passing a float with a unit would prematurely round the float precision (:issue: `14156`)
- Bug in :func:`pandas.testing.assert_index_equal` which raised ``AssertionError`` incorrectly, when comparing two :class:`CategoricalIndex` objects with param ``check_categorical=False`` (:issue:`19776`)
- Bug in rendering :class:`Series` with ``Categorical`` dtype in rare conditions under Python 2.7 (:issue:`21002`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to 0.23.2

# GH#21002 if len(index) > 60, sys.getdefaultencoding()=='ascii',
# and we are working in PY2, then rendering a Categorical could raise
# UnicodeDecodeError by trying to decode when it shouldn't
from pandas.core.base import StringMixin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can import at the top

str(ser)

else:
# set sys.defaultencoding to ascii, then change it back after
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this into a context manager in pandas.util.testing

@jreback jreback added this to the 0.23.2 milestone Jun 21, 2018
@jreback jreback merged commit e24da6c into pandas-dev:master Jun 21, 2018
@jreback
Copy link
Contributor

jreback commented Jun 21, 2018

thanks!

and soon enough we will blow away all PY2 code in any event! but nice patch.

@jbrockmendel jbrockmendel deleted the unicat branch June 22, 2018 03:32
jorisvandenbossche pushed a commit that referenced this pull request Jun 29, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jul 2, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rendering Series[Categorical] raises UnicodeDecodeError
3 participants