
ENH: add math mode to formatter escape="latex" #50040


Closed · DanielHabenicht opened this issue Dec 3, 2022 · 8 comments · Fixed by #50398

Labels: Enhancement, good first issue, IO LaTeX (to_latex), Styler (conditional formatting using DataFrame.style)

Comments


DanielHabenicht commented Dec 3, 2022

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I have a DataFrame that contains both content in LaTeX math mode (e.g. $x=3$) and content that needs to be escaped (e.g. something_with_underscore).

With the current escape argument of Styler.format(), it is only possible to escape both kinds of content or neither, rather than escaping only what needs it.

import pandas as pd
df = pd.DataFrame([[1,"$x=3$"],[3,"something_with_underscore"]], columns=["A", "B"])
print(df.style.format(escape="latex").to_latex())

# Prints:
# \begin{tabular}{lrl}
#  & A & B \\
# 0 & 1 & \$x=3\$ \\
# 1 & 3 & something\_with\_underscore\\
# \end{tabular}

# or 
print(df.style.to_latex()) # no call to format function

# Prints:
# \begin{tabular}{lrl}
#  & A & B \\
# 0 & 1 & $x=3$ \\
# 1 & 3 & something_with_underscore\\
# \end{tabular}
# ... which is invalid LaTeX: the unescaped _ is only allowed in math mode

Feature Description

  1. The simplest way is to add an escape="latex-math" option that behaves like "latex" but does not escape $.

    This would be a small change to the existing _escape_latex(s) helper.

  2. Because other people might need different escaping, the escape argument could also be refactored to accept either a predefined string (e.g. "html" or "latex") or a map of characters to escape.

    Could be called like this:

    df.style.format(escape="latex-math").to_latex()
    # or
    df.style.format(escape={
       "%": "\\%",
    }).to_latex()
  3. Both
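Both options can be sketched in plain Python. This is a hypothetical helper, not pandas API: make_escaper, LATEX_ESCAPES and LATEX_MATH_ESCAPES are illustrative names, and the character table is an assumption modelled on the usual LaTeX special characters rather than a copy of pandas' internal list. Option 2 is the factory that builds an escaper from a map; option 1 is then just the "latex" map with the $ entry removed.

```python
from collections.abc import Callable, Mapping

# Hypothetical LaTeX text-mode escapes (an assumption for this sketch,
# not pandas' internal table). Every key is a single character.
LATEX_ESCAPES: dict[str, str] = {
    "\\": r"\textbackslash ",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde ",
    "^": r"\textasciicircum ",
}

# Option 1: a "latex-math" table is the same map without the "$" entry.
LATEX_MATH_ESCAPES = {k: v for k, v in LATEX_ESCAPES.items() if k != "$"}


def make_escaper(mapping: Mapping[str, str]) -> Callable[[str], str]:
    """Option 2: build an escape function from a user-supplied character map.

    str.translate substitutes every character in a single pass, so the
    backslash inserted by one replacement is never escaped a second time.
    """
    table = str.maketrans(dict(mapping))
    return lambda s: s.translate(table)


escape_latex = make_escaper(LATEX_ESCAPES)
escape_latex_math = make_escaper(LATEX_MATH_ESCAPES)
```

Either function can already be passed to Styler.format as the formatter callable, e.g. df.style.format(escape_latex_math, subset=["B"]).to_latex() for the example above (subset keeps the integer column A away from a function that expects strings). The limitation of option 1 is that characters such as _ or \ inside $...$ are still escaped, so $x_1$ would come out as $x\_1$.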

Alternative Solutions

Post-process the generated LaTeX and replace every \$ with $ afterwards.
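A minimal sketch of that post-processing step, assuming the table was produced with escape="latex"; the helper name unescape_dollars is hypothetical. Its drawback is that it cannot distinguish math delimiters from dollar signs that were meant as literal text.

```python
def unescape_dollars(latex: str) -> str:
    r"""Turn every \$ produced by escape="latex" back into a bare $.

    Caveat: a literal dollar sign in the data is un-escaped as well,
    because the two cases look identical after escaping.
    """
    return latex.replace(r"\$", "$")


# One row of the escape="latex" output from the example above.
row = r"0 & 1 & \$x=3\$ \\"
print(unescape_dollars(row))  # prints: 0 & 1 & $x=3$ \\
```

With pandas this would wrap the whole output, e.g. unescape_dollars(df.style.format(escape="latex").to_latex()).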

As mentioned by @attack68, if the content to be escaped is only in the header you can use format_index:

import pandas as pd
df = pd.DataFrame([[1,"$x=3$"],[3,"Text"]], columns=["A", "under_score"])
print(df.style.format_index(axis=1, escape="latex").to_latex())

\begin{tabular}{lrl}
 & A & under\_score \\
0 & 1 & $x=3$ \\
1 & 3 & Text \\
\end{tabular}

Additional Context

Other people seem to have the same problem:
https://stackoverflow.com/questions/48516653/how-to-prevent-pandas-dataframe-to-latex-from-adding-escape-characters-before
https://stackoverflow.com/questions/44260547/pandas-to-latex-escapes-mathmode

DanielHabenicht added the Enhancement and Needs Triage labels on Dec 3, 2022
attack68 (Contributor) commented Dec 4, 2022

The stackoverflow examples specifically reference DataFrame.to_latex which operates differently to Styler.to_latex.

Your given example is also not ideal for describing your problem, since you can actually handle the separate scenarios using format_index and format separately.

import pandas as pd
df = pd.DataFrame([[1,"$x=3$"],[3,"Text"]], columns=["A", "under_score"])
print(df.style.format_index(axis=1, escape="latex").to_latex())

\begin{tabular}{lrl}
 & A & under\_score \\
0 & 1 & $x=3$ \\
1 & 3 & Text \\
\end{tabular}

However, I take your point about math mode, and allowing a customisable map and/or a specific escape callable to be passed is quite sensible. The escapers are generally quite simple functions. The suggestion of a "latex-math" escape mode, slightly different from "latex", is also valid.

attack68 added the Styler and IO LaTeX labels and removed the Needs Triage label on Dec 4, 2022

attack68 (Contributor) commented Dec 4, 2022

Of course you can always write your own formatting function anyway and supply that directly as the callable.

DanielHabenicht (Author) commented Dec 4, 2022

You have summarized it better than I can.

> The stackoverflow examples specifically reference DataFrame.to_latex which operates differently to Styler.to_latex.

I only referenced the Stack Overflow answers to emphasize that it's not just my problem but a recurring one.

> import pandas as pd
> df = pd.DataFrame([[1,"$x=3$"],[3,"Text"]], columns=["A", "under_score"])
> print(df.style.format_index(axis=1, escape="latex").to_latex())
>
> \begin{tabular}{lrl}
>  & A & under\_score \\
> 0 & 1 & $x=3$ \\
> 1 & 3 & Text \\
> \end{tabular}

Thanks for the nice workaround!

> However, I take your point about math mode, and allowing a customisable map and/or a specific escape callable to be passed is quite sensible. The escapers are generally quite simple functions. The suggestion of a "latex-math" escape mode, slightly different from "latex", is also valid.

I edited my request to reflect the problem better.

> Of course you can always write your own formatting function anyway and supply that directly as the callable.

Nice, I didn't see that this is already implemented.
The documentation should be updated to reflect that one can use a function too!

Should I add another mode and document it?

attack68 (Contributor) commented Dec 4, 2022

My point was that one can use escape as an 'easy' way to get a valid LaTeX string.

df = pd.DataFrame([["%", "&", "$x=3$"]])
print(df.style.format(escape="latex").to_latex())

\begin{tabular}{llll}
 & 0 & 1 & 2 \\
0 & \% & \& & \$x=3\$ \\
\end{tabular}

But the format method itself is well documented to accept a callable that formats the values. For example:

def _custom_formatter(s):
    return s.replace("%", "\\%").replace("&", "\\&")
print(df.style.format(_custom_formatter).to_latex())

\begin{tabular}{llll}
 & 0 & 1 & 2 \\
0 & \% & \& & $x=3$ \\
\end{tabular}

Therefore, this is a second workaround, and I do not believe your requirements are currently beyond pandas. It is just that, perhaps, a common usage pattern (latex-math) is slightly beyond the basic input parameters.

DanielHabenicht (Author) commented:

Oh, okay. I thought you were saying that one can pass a callable as escape:

def _custom_escape(s):
    return s.replace("%", "\\%").replace("&", "\\&")
# Not supported: escape only accepts the predefined modes such as "html" and "latex"
print(df.style.format(escape=_custom_escape).to_latex())

But that doesn't seem to be the case. Using the formatter multiple times is of course possible. :)

Just let me know whether I should go ahead and provide a PR, or whether it's not worth maintaining in the future.

attack68 (Contributor) commented Dec 8, 2022

escape currently has the modes "html" and "latex". I think it would be worth adding the escape mode "latex-math", where the only difference is that $ is not escaped (and potentially \[ and \] as well, but that is worth investigating).
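One possible refinement, sketched here under assumptions: instead of merely skipping $, an escaper can copy whole $...$ and \[...\] spans through untouched and escape only the text around them, so that _, ^ or \ inside math survive too. The function name, the character table and the regular expression are hypothetical; the sketch does not handle nested or escaped delimiters (e.g. a literal \$ inside the data) and is not the pandas implementation.

```python
import re

# Hypothetical LaTeX text-mode escapes (an assumption for this sketch).
_TEXT_ESCAPES = str.maketrans({
    "\\": r"\textbackslash ",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde ",
    "^": r"\textasciicircum ",
})

# Inline math $...$ or display math \[...\]; non-greedy, no nesting.
_MATH_SPAN = re.compile(r"\$[^$]*\$|\\\[.*?\\\]", re.DOTALL)


def escape_latex_math_spans(s: str) -> str:
    """Escape s for LaTeX text mode, but copy math spans through verbatim."""
    parts = []
    pos = 0
    for m in _MATH_SPAN.finditer(s):
        parts.append(s[pos:m.start()].translate(_TEXT_ESCAPES))  # plain text
        parts.append(m.group())  # math span: keep exactly as written
        pos = m.end()
    parts.append(s[pos:].translate(_TEXT_ESCAPES))  # trailing plain text
    return "".join(parts)


print(escape_latex_math_spans(r"50% of $x_1$ and \[a^2\]"))
# prints: 50\% of $x_1$ and \[a^2\]
```

An unmatched $ (as in "costs $5") is still escaped, because it never forms a span. Like any formatter callable, it could be used via df.style.format(escape_latex_math_spans, subset=["B"]); if your pandas version includes the change from #50398, check the Styler.format documentation for a built-in escape="latex-math" option first.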

This will be a user-friendly addition to the API arguments.

natmokval (Contributor) commented:

take

attack68 (Contributor) commented:

@seyon99 you can't take an issue that has already been taken and that already has a solution in a pull request that has been submitted and approved.
