Unicode handling in `to_latex`. Needs encoding? #7061

jseabold · 2014-05-06T23:00:35Z

I can't seem to get this one to work and to_latex doesn't allow a user-specified encoding. I think this might need a look.

My data is already unicode, so let's try it that way.

pd.DataFrame([[u'au\xdfgangen']]).to_latex('test.tex')

Nope. OK, so let's encode it as a UTF-8 byte string.

pd.DataFrame([[u'au\xdfgangen']]).apply(lambda x : x.str.encode('utf-8')).to_latex('test.tex')

Nope. It looks like the value is coerced back to unicode in formatter._to_str_columns(), and then to_latex tries to write it as ASCII...
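
The failure mode can be reproduced without pandas. Below is a minimal sketch in Python 3 syntax (where `str` is unicode). It assumes the root cause is an implicit ASCII encode, as described above: encoding the value to ASCII raises on `ß` (U+00DF), while UTF-8 encodes it as two bytes.

```python
text = u"au\xdfgangen"  # u'außgangen'; \xdf is U+00DF, 'ß'

# An ASCII encode fails on the non-ASCII character. This is roughly what
# happens when Python 2 writes unicode to a file opened with plain open().
ascii_error = None
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    ascii_error = exc.reason  # 'ordinal not in range(128)'

# A UTF-8 encode succeeds: 'ß' becomes the two bytes 0xC3 0x9F.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'au\xc3\x9fgangen'
```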

jseabold · 2014-05-06T23:19:50Z

As a workaround, you can pass a StringIO instance as buf, then encode the result and write it to the file yourself.
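
Here is a minimal Python 3 sketch of this workaround. It assumes the renderer accepts a writable text buffer, as to_latex's `buf` argument does. `write_via_stringio` is a hypothetical name, and the demo uses a stand-in renderer so it runs without pandas.

```python
import io
import os
import tempfile


def write_via_stringio(render, path, encoding="utf-8"):
    """Hypothetical helper: render text into an in-memory buffer, then
    encode it and write the bytes ourselves.

    `render` is any callable that writes text to the file-like object it
    is given, e.g. ``lambda buf: df.to_latex(buf)``.
    """
    buf = io.StringIO()
    render(buf)  # the renderer only ever sees an in-memory text buffer
    data = buf.getvalue().encode(encoding)  # we choose the encoding here
    with open(path, "wb") as f:  # binary mode: no implicit re-encoding
        f.write(data)
    return data


# Demo with a stand-in renderer instead of pandas.
demo_path = os.path.join(tempfile.mkdtemp(), "test.tex")
written = write_via_stringio(lambda buf: buf.write(u"au\xdfgangen"), demo_path)
with open(demo_path, "rb") as f:
    on_disk = f.read()
```

With pandas you would call something like `write_via_stringio(lambda buf: df.to_latex(buf), "test.tex")`. Opening the file in binary mode keeps the encoding decision in one place instead of relying on the platform default.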

TomAugspurger · 2014-05-07T01:48:21Z

Works correctly in python 3 as well.

I've got a fix that seems to work for python 2. Changing

            with open(self.buf, 'w') as f:
                write(f, frame, column_format, strcols, longtable)

to

            import codecs
            # codecs.open encodes unicode text with `encoding` as it is written
            with codecs.open(self.buf, 'wb', encoding=encoding) as f:
                write(f, frame, column_format, strcols, longtable)

along with adding an encoding kwarg to to_latex (defaulting to utf-8?). I haven't done much with unicode, so I'm still reading up on it. Let me know if this seems wrong to you, or if it needs to be done elsewhere.
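
Here is a standalone sketch of the proposed signature change. A hypothetical `write_latex(text, path, encoding="utf-8")` helper stands in for the formatter. It uses io.open in place of codecs.open. io.open also encodes unicode text on write, works on Python 2.6+, and is the built-in open on Python 3.

```python
import io
import os
import tempfile


def write_latex(text, path, encoding="utf-8"):
    """Hypothetical stand-in for the proposed change: the caller picks the
    encoding, and the file object encodes the unicode text on write."""
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)


tmpdir = tempfile.mkdtemp()

# Default utf-8: 'ß' is written as the two bytes 0xC3 0x9F.
utf8_path = os.path.join(tmpdir, "test.tex")
write_latex(u"au\xdfgangen", utf8_path)
with open(utf8_path, "rb") as f:
    utf8_raw = f.read()

# latin-1 is a single-byte encoding, so 'ß' becomes the one byte 0xDF.
latin1_path = os.path.join(tmpdir, "test-latin1.tex")
write_latex(u"au\xdfgangen", latin1_path, encoding="latin-1")
with open(latin1_path, "rb") as f:
    latin1_raw = f.read()
```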

jseabold · 2014-05-07T02:36:30Z

Yes, I'm slowly trying to move to python 3 partially for this reason.

That seems reasonable to me. I assumed that the other functions just encoded the unicode/string according to the given encoding, but I'm not sure.

The default in to_csv and friends is encoding=None. I assume it falls back to the default encoding for the locale, but I'm not positive on that. I started to check but I'm under a deadline right now.

nbonnotte · 2015-11-20T14:01:34Z

I just encountered the same problem with pandas 0.17, so I guess the fix has not been included?

@TomAugspurger do you intend to make a PR?

TomAugspurger · 2015-11-20T14:10:29Z

I never got around to submitting a pull request. Feel free to do so if you want! My fix above might work (would need to be tested), but it might be better to tie this in with how to_csv handles encodings (not sure, haven't looked).

nbonnotte · 2015-11-20T14:27:27Z

Good point. In the same vein, I just noticed that the decimal option is available for to_csv but not for to_latex...

I'll have a look.

TomAugspurger · 2015-11-20T14:46:38Z

There’s a possibility that we’ll be able to replace some of the to_latex code with a Jinja template, similar to the Style stuff. So don’t spend too much time on it :)

nbonnotte · 2015-11-28T19:54:21Z

to_csv uses csv.writer, with an adapter that intercepts its output and converts it to utf-8. It would be possible to refactor the code so the same logic serves both to_csv and to_latex, but that would take a bit of work.
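
The adapter idea can be sketched in Python 3 as follows. The sketch is modeled loosely on the UnicodeWriter recipe from the Python 2 csv module docs, not on pandas' actual code. `EncodingCSVWriter` is a hypothetical name.

```python
import codecs
import csv
import io


class EncodingCSVWriter(object):
    """Hypothetical adapter: csv.writer formats each row into an in-memory
    text queue, which is then encoded and written to a binary stream."""

    def __init__(self, stream, encoding="utf-8", **fmtparams):
        self.queue = io.StringIO()
        self.writer = csv.writer(self.queue, **fmtparams)
        self.stream = stream
        # An incremental encoder keeps state across rows (e.g. a utf-16
        # BOM is emitted only once, not once per row).
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow(row)  # csv formats the row as text
        data = self.queue.getvalue()  # intercept the formatted text
        self.stream.write(self.encoder.encode(data))  # encode and forward
        self.queue.seek(0)  # empty the queue for the next row
        self.queue.truncate(0)


out = io.BytesIO()
w = EncodingCSVWriter(out, lineterminator="\n")
w.writerow(["word"])
w.writerow([u"au\xdfgangen"])
csv_bytes = out.getvalue()
```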

Given this, your previous remark, and the simplicity of your solution, I'll just implement your fix. However, the encoding parameter of to_csv defaults to ascii on Python 2 and to utf-8 on Python 3, so I'll use the same defaults for to_latex.
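
Here is a minimal sketch of that defaulting rule. It assumes `encoding=None` means "use the version default". `resolve_encoding` is a hypothetical name, not a pandas function.

```python
import sys


def resolve_encoding(encoding=None):
    """Hypothetical helper: fall back to the version-dependent default
    described above (ascii on Python 2, utf-8 on Python 3)."""
    if encoding is not None:
        return encoding  # an explicit choice always wins
    return "ascii" if sys.version_info[0] == 2 else "utf-8"
```

Keeping the resolution in a single helper would let to_csv and to_latex share it if their code were later factored together.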

jreback · 2015-11-29T18:16:16Z

@nbonnotte yes, this just requires an encoding argument. You may want to add a LatexFormatter (as a subclass of DataFrameFormatter), since this will make it easier to refactor the internals later on.
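
Here is a toy sketch of the suggested structure. It assumes the point of the subclass is to keep buffer and encoding handling in a shared base class. `TableFormatter`, `LatexFormatter`, `render` and `write_result` are hypothetical stand-ins, not pandas' internal classes or methods. Unlike the real to_latex, this version does no escaping of LaTeX special characters.

```python
import io


class TableFormatter(object):
    """Hypothetical base class (a stand-in, not pandas' DataFrameFormatter).
    Subclasses implement render(); output handling lives here once."""

    def __init__(self, rows, buf=None, encoding="utf-8"):
        self.rows = rows
        self.buf = buf
        self.encoding = encoding

    def render(self):
        raise NotImplementedError

    def write_result(self):
        text = self.render()
        if self.buf is None:
            return text  # no destination: return the string
        if isinstance(self.buf, str):
            # A path: open it with the requested encoding.
            with io.open(self.buf, "w", encoding=self.encoding) as f:
                f.write(text)
        else:
            self.buf.write(text)  # an already-open text buffer
        return None


class LatexFormatter(TableFormatter):
    """Only the LaTeX-specific rendering lives in the subclass."""

    def render(self):
        ncols = len(self.rows[0]) if self.rows else 0
        lines = [r"\begin{tabular}{%s}" % ("l" * ncols)]
        for row in self.rows:
            lines.append(" & ".join(str(v) for v in row) + r" \\")
        lines.append(r"\end{tabular}")
        return "\n".join(lines) + "\n"


latex = LatexFormatter([[u"au\xdfgangen"]]).write_result()

buf = io.StringIO()
LatexFormatter([[1, 2]], buf=buf).write_result()
buffered = buf.getvalue()
```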

jreback · 2016-01-15T16:24:52Z

closed by #11914

jseabold added the Data IO label May 6, 2014

jreback added Output-Formatting labels May 7, 2014

jreback added this to the 0.15.0 milestone May 7, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

jorisvandenbossche added the IO LaTeX (to_latex) label and removed the Output-Formatting (__repr__ of pandas objects, to_string) label Aug 22, 2015

nbonnotte mentioned this issue Dec 27, 2015

EHN encoding parameter for to_latex #11914

Closed

jreback modified the milestones: 0.18.0, Next Major Release Jan 11, 2016

nbonnotte mentioned this issue Jan 13, 2016

ENH missing decimal parameter in .to_latex and .to_html #12031

Closed

jreback pushed a commit that referenced this issue Jan 15, 2016

ENH in .to_latex() support for utf-8 encoding in Python 2, #7061

3a832df

jreback closed this as completed Jan 15, 2016
