Unicode handling in to_latex. Needs encoding? #7061

Closed
jseabold opened this issue May 6, 2014 · 10 comments
Labels: IO Data (IO issues that don't fit into a more specific label), IO LaTeX (to_latex), Unicode (Unicode strings)
Comments

@jseabold
Contributor

jseabold commented May 6, 2014

I can't seem to get this one to work and to_latex doesn't allow a user-specified encoding. I think this might need a look.

I have it in unicode, so I tried that first.

pd.DataFrame([[u'au\xdfgangen']]).to_latex('test.tex')

Nope. Ok, so let's encode it as a utf-8 string

pd.DataFrame([[u'au\xdfgangen']]).apply(lambda x : x.str.encode('utf-8')).to_latex('test.tex')

Nope. It looks like it's getting coerced back to unicode in formatter._to_str_columns() then tries to write it as ASCII...
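
For reference, the underlying failure can be reproduced without pandas. This is only a minimal sketch, assuming the write path ends up encoding the text as ASCII, as suspected above:

```python
# -*- coding: utf-8 -*-
# The cell value from the report; U+00DF (sharp s) has no ASCII representation.
text = u'au\xdfgangen'

try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    # This is the kind of error an implicit ASCII write would raise.
    ascii_ok = False

# UTF-8 can represent the character, so an explicit encoding succeeds.
utf8_bytes = text.encode('utf-8')
```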

@jseabold
Contributor Author

jseabold commented May 6, 2014

Can pass a StringIO instance to buf, then encode and write this yourself as a workaround.
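
A rough sketch of that workaround. `dump_latex` is a hypothetical helper name, not a pandas function, and the sketch assumes `to_latex` writes unicode text to the buffer it is given:

```python
import io

def dump_latex(frame, path, encoding='utf-8'):
    """Render `frame` to LaTeX in memory, then write it with an explicit encoding.

    Hypothetical helper; `frame` only needs a `to_latex(buf)` method.
    """
    buf = io.StringIO()
    frame.to_latex(buf)  # rendered in memory, so no implicit file encoding
    # io.open behaves the same on Python 2 and 3 and encodes on write.
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(buf.getvalue())
```

With the frame from the report this would be called as `dump_latex(pd.DataFrame([[u'au\xdfgangen']]), 'test.tex')`.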

@TomAugspurger
Contributor

Works correctly in python 3 as well.

I've got a fix that seems to work for python 2. Changing

            with open(self.buf, 'w') as f:
                write(f, frame, column_format, strcols, longtable)

to

            import codecs
            with codecs.open(self.buf, 'wb', encoding=encoding) as f:
                write(f, frame, column_format, strcols, longtable)

along with adding an encoding kwarg to to_latex (default to utf-8?). I haven't done much with unicode, so I'm still reading about it. Let me know if this seems wrong to you, or if it needs to be done elsewhere.
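
To show what the `codecs.open` change does in isolation, here is a small stdlib-only sketch. `write_encoded` is a hypothetical name standing in for the write step, and the `utf-8` default is the one proposed above, not a confirmed pandas default:

```python
import codecs
import os
import tempfile

def write_encoded(path, text, encoding='utf-8'):
    # codecs.open returns a wrapper that encodes unicode text on write,
    # so the file never goes through an implicit ASCII conversion.
    with codecs.open(path, 'wb', encoding=encoding) as f:
        f.write(text)

path = os.path.join(tempfile.mkdtemp(), 'test.tex')
write_encoded(path, u'au\xdfgangen')

# Read the raw bytes back to see what actually landed on disk.
with open(path, 'rb') as f:
    raw = f.read()
```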

@jseabold
Contributor Author

jseabold commented May 7, 2014

Yes, I'm slowly trying to move to python 3 partially for this reason.

That seems reasonable to me. I assumed that the other functions just encoded the unicode/string according to the given encoding, but I'm not sure.

The default in to_csv and friends is encoding=None. I assume it falls back to the default encoding for the locale, but I'm not positive on that. I started to check but I'm under a deadline right now.

@jreback jreback added this to the 0.15.0 milestone May 7, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@jorisvandenbossche jorisvandenbossche added IO LaTeX to_latex and removed Output-Formatting __repr__ of pandas objects, to_string labels Aug 22, 2015
@nbonnotte
Contributor

I just encountered the same problem with pandas 0.17, so I guess the fix has not been included?

@TomAugspurger do you intend to make a PR?

@TomAugspurger
Contributor

I never got around to submitting a pull request. Feel free to do so if you want! My fix above might work (would need to be tested), but it might be better to tie this in with how to_csv handles encodings (not sure, haven't looked).

@nbonnotte
Contributor

Good point. In the same vein, I just noticed that the decimal option is available for to_csv but not for to_latex...

I'll have a look.

@TomAugspurger
Contributor

There’s a possibility that we’ll be able to replace some of the to_latex code with a Jinja template, similar to the Style stuff. So don’t spend too much time on it :)

(Sent by email in reply to Nicolas Bonnotte's comment of Nov 20, 2015.)

@nbonnotte
Contributor

to_csv uses csv.writer, with an adapter to intercept the output and convert it to utf-8. It would be possible to factor out that code so that the same adapter is used for both to_csv and to_latex, but it would require a bit of work.
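
The adapter idea looks roughly like the sketch below. This is not the pandas implementation; `EncodingCSVWriter` is a hypothetical name, and the sketch assumes Python 3, where csv.writer produces text:

```python
import csv
import io

class EncodingCSVWriter(object):
    """Hypothetical adapter: csv.writer writes text to an in-memory queue,
    and each row is then encoded and forwarded to a binary stream."""

    def __init__(self, stream, encoding='utf-8', **kwargs):
        self._queue = io.StringIO()
        self._writer = csv.writer(self._queue, **kwargs)
        self._stream = stream
        self._encoding = encoding

    def writerow(self, row):
        self._writer.writerow(row)
        # Intercept the formatted row and encode it before writing.
        self._stream.write(self._queue.getvalue().encode(self._encoding))
        # Empty the queue so the next row starts clean.
        self._queue.seek(0)
        self._queue.truncate(0)

sink = io.BytesIO()
EncodingCSVWriter(sink).writerow([u'au\xdfgangen', 1])
encoded = sink.getvalue()
```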

Considering this, and your previous remark, and the simplicity of your solution, I'll just implement the latter. But the encoding parameter of to_csv defaults to ascii with Python 2 and to utf-8 for Python 3, so I'll do that for to_latex.
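
In code, that default could be picked along these lines. `default_encoding` is a hypothetical helper for illustration only; the actual parameter handling in to_latex may differ:

```python
import sys

def default_encoding():
    # Hypothetical helper mirroring the to_csv behaviour described above:
    # ascii on Python 2, utf-8 on Python 3.
    return 'ascii' if sys.version_info[0] < 3 else 'utf-8'

def resolve_encoding(encoding=None):
    # An explicit user choice wins; otherwise fall back to the default.
    return encoding if encoding is not None else default_encoding()
```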

@jreback
Contributor

jreback commented Nov 29, 2015

@nbonnotte yes, this just requires an encoding argument. You may want to add a LatexFormatter (as a sub-class of DataFrameFormatter), as this will allow some internal refactoring later on.

@jreback
Contributor

jreback commented Jan 15, 2016

closed by #11914

@jreback jreback closed this as completed Jan 15, 2016