ENH: add math mode to formatter escape="latex-math" #50398

Contributor

@natmokval natmokval commented Dec 22, 2022

Added latex-math mode to avoid escaping $.
We probably shouldn't escape \[ and \]. They cannot be used between \begin{tabular} and \end{tabular}, which is the environment used for both Series and DataFrames.

@natmokval natmokval marked this pull request as draft December 23, 2022 10:42
@natmokval natmokval force-pushed the 50040-add-math-mode-formatter-escape=latex branch 2 times, most recently from 4fb4d86 to adedc02 Compare December 29, 2022 15:31
@natmokval natmokval marked this pull request as ready for review December 29, 2022 19:23
@natmokval natmokval requested a review from attack68 December 29, 2022 19:24
@natmokval
Contributor Author

Hi @attack68, could you please review this PR?

Contributor

@attack68 attack68 left a comment

  • The escape option is also configurable as a global pandas option. We should expand the documentation there to cover the new value as well.

A question: the characters _, {, }, ^ and \ are frequently used within the math mode, i.e. between $'s to specify equations. Does preserving the $'s but still escaping those math-related characters inside $'s break the representation of the math in Latex?

For example what happens to $ \alpha = \frac{\beta}{\zeta^2} $ under this escape mode?
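
To make the question concrete, here is a minimal sketch of what a "preserve $ but escape everything else" rule would do to that formula. The helper name and the replacement table are assumptions for illustration, modelled on common LaTeX escapes, and are not copied from the pandas source.

```python
# Hypothetical escape table: every LaTeX special character except "$".
ESCAPES = {
    "\\": r"\textbackslash ",
    "&": r"\&",
    "%": r"\%",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde ",
    "^": r"\textasciicircum ",
}


def naive_escape_keep_dollar(s: str) -> str:
    """Escape character by character, so inserted backslashes are never re-escaped."""
    return "".join(ESCAPES.get(ch, ch) for ch in s)


formula = r"$ \alpha = \frac{\beta}{\zeta^2} $"
escaped = naive_escape_keep_dollar(formula)
print(escaped)
```

The dollar signs survive, but the commands, braces and caret inside them are rewritten, so the result no longer contains `\frac` or grouping braces and LaTeX would not typeset the intended equation.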

Comment on lines 992 to 994
Use 'latex-math' to replace the characters ``&``, ``%``, ``#``, ``_``,
``{``, ``}``, ``~``, ``^``, and ``\``
in the cell display string with LaTeX-safe sequences.
Contributor

Whilst this is true I think it is clearer to say:

"Replaces the same characters as the latex option except $ which is preserved to allow conversion to math mode".

We should also add an example probably.

def _escape_latex_math(s):
    r"""
    Replace the characters ``&``, ``%``, ``#``, ``_``, ``{``, ``}``,
    ``~``, ``^``, and ``\`` in the string with LaTeX-safe sequences.
Contributor

I would add here: note ``$`` not escaped.

@attack68 attack68 added IO LaTeX to_latex Styler conditional formatting using DataFrame.style Enhancement labels Dec 31, 2022
@natmokval
Contributor Author

Hi @attack68, I checked the example $ \alpha = \frac{\beta}{\zeta^2} $, and it didn't work with the original fix. So I created more LaTeX expressions in math mode to understand how math mode should work. My suggestion: there is no easy way to escape characters in math mode, so we can rely on the user, who knows how to handle LaTeX expressions.

For example: on one hand, the characters { and } are used in expressions to group part of the expression. On the other hand, \{ and \} are literal curly brackets. We cannot decide for the user what they meant.

I added an example for math-mode and corrected LaTeX mode example.

@attack68
Contributor

I think we may be able to use regex to detect these. I'm not brilliant at regex, but from a search on Stack Overflow I think we can use look-behind and look-ahead. These would be the rules I think we should aim for:

  • all of the usual characters, except $, are escaped, unless:
  • any of these characters are detected as being between a left-side $ and a right-side $, in which case they form part of the equation and should not be escaped.
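
The two rules above can be sketched as follows. This is only an illustration under stated assumptions: it uses `re.split` with a capturing group rather than look-behind and look-ahead, the function name and escape table are hypothetical, and escaped dollars (`\$`) are deliberately ignored.

```python
import re

# Assumed escape table (illustrative, not the pandas source).
ESCAPES = {
    "\\": r"\textbackslash ",
    "&": r"\&",
    "%": r"\%",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde ",
    "^": r"\textasciicircum ",
}

# A math span: "$", any run of non-"$" characters, then the closing "$".
MATH_SPAN = re.compile(r"(\$[^$]*\$)")


def escape_outside_math(s: str) -> str:
    """Escape special characters everywhere except inside $...$ spans."""
    parts = MATH_SPAN.split(s)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # With one capturing group, re.split puts matches at odd indices.
            out.append(part)
        else:
            out.append("".join(ESCAPES.get(ch, ch) for ch in part))
    return "".join(out)


print(escape_outside_math(r"50% of $\frac{a_1}{b^2}$ & more_text"))
```

A lone `$` with no partner never matches the pattern, so a string such as `cost $5 & tax` is escaped as ordinary text with the dollar left in place.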

@natmokval
Contributor Author

I think we may be able to use regex to detect these. I'm not brilliant at regex, but from a search on Stack Overflow I think we can use look-behind and look-ahead. These would be the rules I think we should aim for:

* all of the usual characters, except `$`, are escaped, **unless**:

* any of these characters are detected as being between a left-side `$` and a right-side `$`, in which case they form part of the equation and should not be escaped.

Thank you, I understand. I'll try to use regex.

@natmokval
Contributor Author

Hi @attack68. As you suggested I made a solution using regex to detect math-mode. It works well for expressions that are surrounded by the character $. For example: "abc $first\$math$ middle $secondmath$"

On the other hand, math expressions can be surrounded by the characters \( and \).
For example: " ab \(firstmath\) middle \( secondmath \) %(x)^ "
Or even with some mistakes, like " \)ab \(firstmath\) middle \( secondmath \) %(x)^ \)"
For this case, I have an approach as well.

I was thinking about updating the PR but I still have one question. Should we consider the case when an expression contains both math-mode types, $ and \( \)? For example: "abc \(firstmath\) middle $secondmath$"
It doesn't look easy to process such an expression.
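
For discussion only, one possible way to treat the mixed case is a single pattern with an alternation that recognises either delimiter pair. This is a sketch under assumptions, not what this PR implements: the function name and the reduced escape table are hypothetical, and nesting one delimiter type inside the other is not handled.

```python
import re

# Reduced, assumed escape table for illustration.
ESCAPES = {
    "\\": r"\textbackslash ",
    "&": r"\&",
    "%": r"\%",
    "#": r"\#",
    "_": r"\_",
}

# Either $...$ or \(...\); the non-greedy .*? stops at the first closing \).
MATH_SPAN = re.compile(r"(\$[^$]*\$|\\\(.*?\\\))")


def escape_outside_any_math(s: str) -> str:
    """Escape text outside both kinds of math span in one left-to-right pass."""
    parts = MATH_SPAN.split(s)
    return "".join(
        # Odd indices are the captured math spans; leave them untouched.
        part if i % 2 == 1 else "".join(ESCAPES.get(ch, ch) for ch in part)
        for i, part in enumerate(parts)
    )


print(escape_outside_any_math(r"abc \(first_math\) middle $second_math$ %(x)_"))
```

An unmatched `\)` at the start of a string, as in the "with some mistakes" example, is not part of any span here, so its backslash is escaped like ordinary text.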

@attack68
Contributor

I think a basic PR just to handle and test the $ math mode case will be a good start. Once that is in place we could extend it in the future, and the tests developed here will provide a reliable base from which to work.

@natmokval natmokval force-pushed the 50040-add-math-mode-formatter-escape=latex branch from 8a329f8 to 4f9e8e1 Compare January 28, 2023 21:20
@natmokval
Contributor Author

Hi @attack68. I added a basic implementation for math mode between $ and a test.
Could you please take a look?

str :
Escaped string
"""
s = s.replace(r"\$", r"ab2§=§8yz")
Contributor

I think I understand what you are doing here: you are substituting out an escaped $ sign using a proxy UUID string, ab2§=§8yz. But this is the same string I used in the _escape_latex method, for the same reason, to deal with some other character combinations. I wonder if this might cause cross-pollution. Regardless, since it is a proxy UUID string, it would be better to change the digits, I think (using only digits that are not escaped!).
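
The placeholder technique under discussion can be sketched like this. The token value, the function names and the reduced escape table are all hypothetical; the only properties the sketch relies on are that the token contains no character the escaping step rewrites and that it is distinct from any token used by other escape helpers.

```python
import re

# Reduced, assumed escape table for illustration.
ESCAPES = {"&": r"\&", "%": r"\%", "#": r"\#", "_": r"\_"}

# Hypothetical placeholder standing in for an escaped dollar (\$).
DOLLAR_TOKEN = "qx7§=§3kd"


def escape_text(s: str) -> str:
    return "".join(ESCAPES.get(ch, ch) for ch in s)


def escape_with_protected_dollar(s: str) -> str:
    # 1. Hide escaped dollars so they are not mistaken for math delimiters.
    s = s.replace(r"\$", DOLLAR_TOKEN)
    # 2. Split on $...$ spans and escape only the text outside them.
    parts = re.split(r"(\$[^$]*\$)", s)
    parts = [p if i % 2 == 1 else escape_text(p) for i, p in enumerate(parts)]
    # 3. Restore the escaped dollars.
    return "".join(parts).replace(DOLLAR_TOKEN, r"\$")


print(escape_with_protected_dollar(r"abc $first\$math$ middle $secondmath$ 5%"))
```

If two helpers shared the same token for different character sequences and one ran on the other's intermediate output, the restore step could map a placeholder back to the wrong sequence, which is the cross-pollution risk raised above.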


- >>> df = pd.DataFrame([["123"], ["~ ^"], ["$%#"]])
+ >>> df = pd.DataFrame([["123"], ["~ ^"], ["%#"]])
Contributor

It's fine to change this example. Just curious as to the reason.

Contributor Author

I think one change worth making, otherwise I think it is good to add it. The test demonstrates the effect nicely. Didn't check the CI tests, we make sure they are green before merging.

I changed uuid string to be unique, as you suggested. And also reverted the example for latex mode. I changed it by mistake, there is nothing wrong with $ in latex mode (the online latex compiler which I used at first had some sort of bug).

CI tests look good.

Contributor

@attack68 attack68 left a comment

I think one change worth making, otherwise I think it is good to add it.
The test demonstrates the effect nicely.
Didn't check the CI tests, we make sure they are green before merging.

@attack68
Contributor

Ah, I just noticed there is no addition to the whatsnew. You might want to add a simple one-line entry.

@natmokval
Contributor Author

Thank you, I added a line to whatsnew.

Contributor

@attack68 attack68 left a comment

LGTM, thanks.

@mroeschke
Member

Could you merge in main and resolve the merge conflict?

@natmokval
Contributor Author

I merged main in and resolved the merge conflict.

@natmokval
Contributor Author

I resolved a new merge conflict and merged in main. Sorry it took so long.
Is there anything else that needs to be done for this PR?

@attack68
Contributor

attack68 commented Mar 5, 2023

@mroeschke can we still get this in to 2.0?

Member

@mroeschke mroeschke left a comment

Okay sure. This is okay for 2.0 since it looks to have been ready for a while and started before the RC release

@mroeschke mroeschke merged commit dfd5d66 into pandas-dev:main Mar 6, 2023
@mroeschke
Member

Thanks @natmokval

@natmokval
Contributor Author

Thank you for reviewing my PR.
I thought it might be useful to add to latex mode a case when a math expression starts with \( and ends with \). I'd like to do it and open a new PR.

@attack68
Contributor

attack68 commented Mar 6, 2023

I think you can use all of the following:
[image]

These are more difficult to cover I think. I'm glad we have this PR to get a basic latex-math mode in there but these will require extensive tests I think.
A new PR will probably have to be quite test driven I would expect.

I don't know which of these tend to display in MathJax either, for notebook display.

Development

Successfully merging this pull request may close these issues.

ENH: add math mode to formatter escape="latex"