Skip to content

BLD: ipython_directive, handle non-ascii execution results #6185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
2 commits merged into from Jan 31, 2014
Merged

BLD: ipython_directive, handle non-ascii execution results #6185

2 commits merged into from Jan 31, 2014

Conversation

ghost
Copy link

@ghost ghost commented Jan 30, 2014

@jorisvandenbossche I think this is a clean solution to the non-utf8
output problem we discussed, specifically the example in io.rst.
This PR should produce the right output on windows and be more amenable
to py3 compat work too. If you confirm that it's good I feel this is clean enough
to try and get upstream, completing the upstreaming effort.

related #5530, #5925 (comment)

@jorisvandenbossche
Copy link
Member

@y-p This looks nice! And in any case, the docs build on Windows.

But yet some possible quirks, I commented inline.

For the faster doc building, will try out later (no time anymore today).


data = b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'.decode('utf8').encode('latin-1')
data = b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible that it is still utf-8, and not latin? See your previous commit where you changed this: 3dbb9ea. So now it's unicode tried to show as latin?

Because now I get:

       word  length
0  TrÇÏumen       7
1   GrǬÇ?e       5

However, if I change it back to the original latin-1 (what you get from b'word,length\nTr\xc3\xa4umen,7\nGr\xc3\xbc\xc3\x9fe,5'.decode('utf8').encode('latin-1')), it's also not correct:

      word  length
0  Tr�umen       7
1    Gr�áe       5

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, good catch.

@phaebz
Copy link
Contributor

phaebz commented Jan 30, 2014

@y-p Seems you did some of the wishlist items in #5530 (comment) yourself, nice!

@jorisvandenbossche asked me in #5530 (comment) to try on py3. Tried it on win+py3, I get an instant fail. Since this PR is about sphinx speedup, section building and non-ascii handling on platforms such as Windows, I think the py3 compat should be done in a new PR after this one is merged.

try:
encoding = [options.get('output_encoding', 'utf8')]
self.shell.output_encoding = encoding
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this except ever reached? I mean, does the try block ever fail? It's just putting the encoding, not yet really encoding it, or do I see it wrong?

Because I supposed I would get such a warning if I removed the :output_encoding: 'latin1', but that's not the case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remnant of an earlier version that accepted multiple encodings to try in order, and
had to use eval because of limitations in ReST directives. will fix.

@ghost
Copy link
Author

ghost commented Jan 30, 2014

The bad output in the cell is the result of pprint_thing used by the repr, and it''s output
matches that of the console encoding. on most linux that's utf8, if you try to join that up to
a unicode string:

In [10]: u''+u'Träumen'.encode('utf8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-10-181286dcc4b0> in <module>()
----> 1 u''+u'Träumen'.encode('utf8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

That's the problem. I've changed things accordingly.
On windows, the encoding might be different so the output might be differently encoded.
So I changed conf.py to set pandas' output encoding to utf8 regardless before processing
the docs. Should work.

@jorisvandenbossche
Copy link
Member

Hmm, it's still the same. But the docs build, so that's the most important for me if the output looks nice on linux.

And I think you forgot to put in the commit that changes io.rst (the :output_econding:).

@ghost
Copy link
Author

ghost commented Jan 30, 2014

That's not required. So this is nicer code that's no worse then the previous attempt?
If so, I'll merge, and leave that windows thing to when I can setup a windows box to
comfortably test things myself.

@jorisvandenbossche
Copy link
Member

Indeed, I can confirm this is nicer code for the same (incorrect on windows but building) result :-)

What is the addition of :output_encoding: to ipython directive for? If it is not needed in this case?

@jorisvandenbossche
Copy link
Member

BTW, I forgot almost I tested it, and it does matter (as is what I would understand from the code). Without specifying :output_encoding: 'latin_1' I get question marks like 0 Tr�umen. If I specify it I get like Tr�umen as I mentioned before. But in both cases the it builds.

@ghost
Copy link
Author

ghost commented Jan 30, 2014

I inserted a decoding step on all write to stdout from code executing
in the embedded ipython shell. That option sets the endocing to use for it.
It may actually be better to just always use utf8, I'll have to consider before
submitting upstream.

You may get Tr�umen but that's wrong as well. I'll stop using you as a CI
server, and get to it when I can. Thanks for the feedback. appreciated.

ghost pushed a commit that referenced this pull request Jan 31, 2014
BLD: ipython_directive, handle non-ascii execution results
@ghost ghost merged commit b484a9f into pandas-dev:master Jan 31, 2014
@ghost ghost deleted the PR_ip_d_unicode branch January 31, 2014 02:02
@jorisvandenbossche
Copy link
Member

@y-p Do you find this good enough to push upstream? If you don't have time, I can try to do this.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants