BLD: ipython_directive, handle non-ascii execution results #6185

ghost · 2014-01-30T04:57:14Z

@jorisvandenbossche I think this is a clean solution to the non-utf8
output problem we discussed, specifically the example in io.rst.
This PR should produce the right output on windows and be more amenable
to py3 compat work too. If you confirm that it's good I feel this is clean enough
to try and get upstream, completing the upstreaming effort.

related #5530, #5925 (comment)

jorisvandenbossche · 2014-01-30T10:34:44Z

@y-p This looks nice! And in any case, the docs build on Windows.

There are still some possible quirks, though; I commented inline.

I will try out the faster doc building later (no time left today).

jorisvandenbossche · 2014-01-30T10:50:52Z

doc/source/io.rst


-   data = b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'.decode('utf8').encode('latin-1')
+   data = b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'


Is it possible that this is still UTF-8 and not Latin-1? See your previous commit where you changed this: 3dbb9ea. So is it now UTF-8 being displayed as Latin-1?

Because now I get:

word length 0 TrÇÏumen 7 1 GrÇ¬Ç?e 5

However, if I change it back to the original latin-1 (what you get from b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'.decode('utf8').encode('latin-1')), it's also not correct:

word length 0 Tr�umen 7 1 Gr�áe 5

True, good catch.
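
For reference, the two byte strings under discussion can be compared directly. The following is a minimal Python 3 sketch (the thread itself predates the py3 work, and the variable names are illustrative only). It shows that the literal in the new io.rst line is UTF-8, that the old line converts it to Latin-1, and what each looks like when read with the wrong codec. It does not attempt to reproduce the exact Windows console output quoted above.

```python
# The literal from io.rst: these bytes are UTF-8 encoded.
utf8_data = b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'

# The original line re-encoded the same text as Latin-1.
text = utf8_data.decode('utf8')
latin1_data = text.encode('latin-1')

# Latin-1 uses one byte per non-ASCII character here, UTF-8 uses two.
assert len(utf8_data) - len(latin1_data) == 3

# UTF-8 bytes read as Latin-1: each non-ASCII character becomes two.
wrong_latin1 = utf8_data.decode('latin-1')

# Latin-1 bytes read as UTF-8 are invalid; errors='replace' substitutes U+FFFD.
wrong_utf8 = latin1_data.decode('utf8', errors='replace')

for label, value in [('correct', text),
                     ('utf8 as latin-1', wrong_latin1),
                     ('latin-1 as utf8', wrong_utf8)]:
    print(label, repr(value.split('\n')[1]))
```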

phaebz · 2014-01-30T10:57:40Z

@y-p Seems you did some of the wishlist items in #5530 (comment) yourself, nice!

@jorisvandenbossche asked me in #5530 (comment) to try on py3. Tried it on win+py3, I get an instant fail. Since this PR is about sphinx speedup, section building and non-ascii handling on platforms such as Windows, I think the py3 compat should be done in a new PR after this one is merged.

jorisvandenbossche · 2014-01-30T11:00:01Z

doc/sphinxext/ipython_directive.py

+        try:
+            encoding = [options.get('output_encoding', 'utf8')]
+            self.shell.output_encoding = encoding
+        except:


How is this except ever reached? I mean, does the try block ever fail? It only stores the encoding without actually encoding anything yet, or am I misreading it?

I expected to get such a warning if I removed the :output_encoding: 'latin1', but that's not the case.

This is a remnant of an earlier version that accepted multiple encodings to try in order and
had to use eval because of limitations in ReST directives. Will fix.
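
To illustrate the point raised in this review comment: storing an encoding name never raises, so a try/except around the assignment alone cannot trigger. If early validation were wanted, one option is to look the name up with the standard `codecs` module. This is only a sketch; `validate_encoding` is a hypothetical helper and is not part of ipython_directive.py.

```python
import codecs


def validate_encoding(name, default='utf8'):
    """Return a normalized codec name, falling back to `default` if unknown.

    Hypothetical helper for illustration; not taken from the directive.
    """
    try:
        # codecs.lookup raises LookupError for an unknown encoding name.
        return codecs.lookup(name).name
    except LookupError:
        return codecs.lookup(default).name


options = {'output_encoding': 'latin1'}
known = validate_encoding(options.get('output_encoding', 'utf8'))
unknown = validate_encoding('no-such-codec')
print(known, unknown)
```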

ghost · 2014-01-30T19:32:54Z

The bad output in the cell is the result of pprint_thing, which is used by the repr; its output
matches the console encoding. On most Linux systems that's utf8. If you try to join that to
a unicode string:

In [10]: u''+u'Träumen'.encode('utf8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-10-181286dcc4b0> in <module>()
----> 1 u''+u'Träumen'.encode('utf8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

That's the problem. I've changed things accordingly.
On Windows, the console encoding may differ, so the output may be encoded differently.
So I changed conf.py to set pandas' output encoding to utf8 unconditionally before processing
the docs. Should work.
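
The traceback above comes from Python 2, where joining a unicode string with a byte string triggers an implicit ASCII decode. Below is a minimal sketch of the same failure and the explicit fix, written for Python 3, where the decode has to be spelled out. The variable names are illustrative and not taken from the PR.

```python
encoded = u'Tr\xe4umen'.encode('utf8')  # b'Tr\xc3\xa4umen'

# Decoding with the ASCII codec fails on the first non-ASCII byte,
# which is what Python 2 attempted implicitly in u'' + encoded.
try:
    encoded.decode('ascii')
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)

# Decoding with the codec that produced the bytes round-trips cleanly.
joined = u'' + encoded.decode('utf8')
print(repr(joined))
```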

jorisvandenbossche · 2014-01-30T20:13:22Z

Hmm, it's still the same. But the docs build, and that's the most important thing for me as long as the output looks nice on Linux.

And I think you forgot to include the commit that changes io.rst (the :output_encoding:).

ghost · 2014-01-30T20:46:33Z

That's not required. So this is nicer code that's no worse than the previous attempt?
If so, I'll merge, and leave that Windows issue until I can set up a Windows box to
comfortably test things myself.

jorisvandenbossche · 2014-01-30T20:56:09Z

Indeed, I can confirm this is nicer code for the same (incorrect on windows but building) result :-)

What is the :output_encoding: option added to the ipython directive for, if it is not needed in this case?

jorisvandenbossche · 2014-01-30T21:07:03Z

BTW, I almost forgot that I tested it, and it does matter (which is what I would expect from the code). Without specifying :output_encoding: 'latin_1' I get question marks like 0 Tr�umen. If I specify it, I get something like Tr�umen, as I mentioned before. But in both cases it builds.

ghost · 2014-01-30T21:32:51Z

I inserted a decoding step on all writes to stdout from code executing
in the embedded ipython shell. That option sets the encoding to use for it.
It may actually be better to just always use utf8; I'll have to consider that before
submitting upstream.

You may get Tr�umen but that's wrong as well. I'll stop using you as a CI
server, and get to it when I can. Thanks for the feedback, much appreciated.
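
One way such a decoding step could look is a thin wrapper around the output stream that turns byte strings into text before passing them on. This is a sketch under assumptions, not the code from this PR; `DecodingWriter` and its parameters are hypothetical names.

```python
import io


class DecodingWriter(object):
    """Wrap a text stream and decode any bytes written to it.

    Hypothetical illustration; the directive's real implementation may differ.
    """

    def __init__(self, stream, encoding='utf8', errors='replace'):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def write(self, data):
        # Bytes are decoded with the configured encoding; text passes through.
        if isinstance(data, bytes):
            data = data.decode(self.encoding, self.errors)
        return self.stream.write(data)

    def flush(self):
        self.stream.flush()


buf = io.StringIO()
out = DecodingWriter(buf, encoding='latin-1')
out.write(b'Tr\xe4umen ')  # Latin-1 encoded bytes
out.write(u'Gr\xfc\xdfe')  # already text, written unchanged
out.flush()
captured = buf.getvalue()
print(repr(captured))
```

With `errors='replace'` a wrong encoding setting yields replacement characters instead of aborting the build, which matches the "incorrect but building" behaviour discussed in this thread.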

jorisvandenbossche · 2014-02-22T12:15:25Z

@y-p Do you find this good enough to push upstream? If you don't have time, I can try to do this.

jorisvandenbossche reviewed Jan 30, 2014
y-p added 2 commits January 30, 2014 21:40

BLD: ipython_directive, handle non-ascii execution results

c4fa9ff

BLD/DOC: docs conf.py, force pandas to alwayd output utf8 on eall sys…

7393a61

…tems (win)

ghost pushed a commit that referenced this pull request Jan 31, 2014

Merge pull request #6185 from y-p/PR_ip_d_unicode

b484a9f

BLD: ipython_directive, handle non-ascii execution results

ghost merged commit b484a9f into pandas-dev:master Jan 31, 2014

ghost deleted the PR_ip_d_unicode branch January 31, 2014 02:02

jorisvandenbossche mentioned this pull request Feb 22, 2014

DOC: use Sphinx extension directly from numpy/ipython instead of maintaining our own version #5221

Closed

jorisvandenbossche mentioned this pull request Feb 19, 2018

DOC/BLD: update vendored IPython.sphinxext version #19765

Merged

This pull request was closed.
