
DOC: make io.rst utf8 only #5926


Closed
wants to merge 1 commit into from


@ghost ghost commented Jan 13, 2014

#5142

@JanSchulz, can you test whether this solves the problem for you?

@jorisvandenbossche
Member

Maybe this can solve the windows building issue (I will also test), but aside: do we want this in the docs? Because the example by itself does work, it's only the building that does not work (as far as I understand).

@ghost
Author

ghost commented Jan 13, 2014

I think we want users/contributors to be able to build the docs, yeah, even if they're on Windows or a different locale.
Do you feel the workaround clutter detracts much from the example?

It took a lot of effort to get pandas to play nice with unicode, and one lesson learned is not
to mix encodings. The docs should be utf8-clean IMO.

@jorisvandenbossche
Member

I tried it, and it does not solve the issue. And in retrospect, that is maybe also logical: the problem in windows is in the building of the rst with unicode to html, and it is the output generated by the code example which causes this. With your changes, the output of the code example still contains special characters (which is also the point of the code example), and so causes the build on windows to stop.

I think @JanSchulz had another approach as a kind of hack: something along the lines of #5142 (comment). I also vaguely remember that the issue was fixed when using ipython's version of the ipython directive, but I should check that.

@ghost
Author

ghost commented Jan 13, 2014

Then I misunderstood the issue. There is a definite difference:

s1='word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'
s2=s1.decode('utf8').encode('latin-1')

s1.decode('utf8')
Out[33]: u'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

s2.decode('utf8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-34-7c1601a98c33> in <module>()
----> 1 s2.decode('utf8')

/usr/lib64/python2.7/encodings/utf_8.pyc in decode(input, errors)
     14 
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17 
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 14: invalid continuation byte

and in the case of python's "encoding utf8" preamble, that makes all the difference.
I expected sphinx to accept utf8 input; if it doesn't, that seems like a bug to me.
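
For readers on Python 3, the same difference can be sketched with bytes literals. This is an illustration rather than part of the original Python 2 session; the variable names `text`, `latin1_decodes_as_utf8` and `failing_byte` are made up for the example.

```python
# Python 3 sketch of the session above: bytes literals stand in for Python 2 str.
s1 = b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'  # UTF-8 encoded bytes
s2 = s1.decode('utf8').encode('latin-1')  # the same text re-encoded as Latin-1

# Decoding the UTF-8 bytes as UTF-8 works.
text = s1.decode('utf8')

# Decoding the Latin-1 bytes as UTF-8 fails on the first non-ASCII byte.
try:
    s2.decode('utf8')
    latin1_decodes_as_utf8 = True
except UnicodeDecodeError as exc:
    latin1_decodes_as_utf8 = False
    failing_byte = s2[exc.start]  # the byte the decoder rejected
```

The rejected byte is the Latin-1 encoding of "ä", which is not a valid start of a UTF-8 sequence when followed by an ASCII letter.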

Thanks for testing.

@ghost ghost closed this Jan 13, 2014
@ghost ghost deleted the PR_GH5142 branch January 13, 2014 20:41
@ghost
Author

ghost commented Jan 13, 2014

btw, there was some decode action in our hacked version of ipython_directive; #5925 may actually solve
the problem by sheer coincidence.

@jorisvandenbossche
Member

See also here #5142 (comment). There was indeed a .decode('utf8') in our version of the ipython directive for some other reason, but that broke the building on windows.
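
As a rough illustration of why a hard-coded .decode('utf8') can be fragile, here is a minimal Python 3 sketch. It assumes the captured output may arrive in a non-UTF-8 codec such as cp1252, which the thread does not confirm; `decode_output` and its `fallback` parameter are hypothetical and are not taken from the ipython directive.

```python
def decode_output(raw, fallback='cp1252'):
    """Hypothetical helper: try UTF-8 first, then fall back to another codec."""
    try:
        return raw.decode('utf8')
    except UnicodeDecodeError:
        # Assumption: the bytes came from a locale-specific encoding instead.
        return raw.decode(fallback)


# The same word encoded two ways, as it might arrive from different platforms.
utf8_bytes = 'Tr\u00e4umen'.encode('utf8')
cp1252_bytes = 'Tr\u00e4umen'.encode('cp1252')

from_utf8 = decode_output(utf8_bytes)
from_cp1252 = decode_output(cp1252_bytes)
```

A plain `cp1252_bytes.decode('utf8')` would raise UnicodeDecodeError here, while the fallback recovers the text in both cases.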

@jorisvandenbossche
Member

I will try out the other PR with your rebase.

@ghost
Author

ghost commented Jan 13, 2014

I'm seriously skimming past all the important bits today :), sorry.

This pull request was closed.