ENH: .to_latex(longtable=True) latex caption and label support #25339

Closed

jeschwar
Contributor

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

When creating a latex table with DataFrame.to_latex(longtable=False) the output is written inside a latex tabular environment and stored in some file like pandas_tabular.tex; the user can conveniently typeset the table in a main report.tex file complete with caption and label as follows:

\begin{table}
\caption{the caption}
\label{the label}
\input{pandas_tabular.tex}
\end{table}

This is good because the pandas_tabular.tex file can be re-created and the main report.tex simply needs to be recompiled to get the updated output.

The problem with DataFrame.to_latex(longtable=True) is that the caption and label need to go inside the latex longtable environment, which is stored in some file like pandas_longtable.tex. Unlike the tabular environment, the latex longtable environment does not go inside a table environment; this means that setting the caption and label requires the user to edit the pandas_longtable.tex file after its creation. This does not support an automated workflow like the one we have with the tabular environment.
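To make the manual step concrete, here is a minimal sketch of the kind of post-processing a user would otherwise have to script. The function name `add_caption_to_longtable` and the sample table text are assumptions for illustration only; they are not part of pandas or of this PR.

```python
def add_caption_to_longtable(tex: str, caption: str, label: str) -> str:
    """Hypothetical helper: insert a caption/label row into longtable source.

    Inside a longtable the caption is a table row of its own, so the
    inserted line has to end with a LaTeX row terminator (two backslashes).
    """
    lines = tex.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith(r"\begin{longtable}"):
            lines.insert(i + 1, r"\caption{" + caption + r"}\label{" + label + r"}\\")
            break
    else:
        # The loop finished without finding the environment.
        raise ValueError("no longtable environment found")
    return "\n".join(lines) + "\n"


# Stand-in for the contents of pandas_longtable.tex (made-up values).
sample = "\n".join([
    r"\begin{longtable}{lr}",
    r"\toprule",
    r"{} &     0 \\",
    r"\midrule",
    r"0 &  1.23 \\",
    r"\bottomrule",
    r"\end{longtable}",
]) + "\n"

patched = add_caption_to_longtable(sample, "the caption", "tab:longtable")
print(patched)
```

A step like this would have to be rerun every time pandas_longtable.tex is regenerated, which is the extra maintenance this PR aims to avoid.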

This PR adds caption and label support to DataFrame.to_latex(longtable=True) with the arguments lt_caption and lt_label. Example usage is described below.

The following python code creates some data in a DataFrame and writes it to disk in tabular and longtable latex environments:

import numpy as np
import pandas as pd


# create some example data with more rows than would fit on a single page
df = pd.DataFrame(np.random.randn(60, 3))

# write the first 5 rows to regular table in a latex tabular environment
df.head().to_latex(
    'pandas_tabular.tex',
)

# write the whole table in the latex longtable environment, complete with caption and label
df.to_latex(
    'pandas_longtable.tex',
    longtable=True,
    lt_caption='table in \\texttt{longtable} environment',
    lt_label='tab:longtable',
)

The following latex code is contained in a main report.tex and is used to typeset both tables:

\documentclass{article}

\usepackage{longtable}
\usepackage{booktabs}

\begin{document}

% typeset the table in the tabular environment
Table \ref{tab:tabular} is a \texttt{tabular} and has 5 rows:
\begin{table}[h]
\centering
\caption{table in \texttt{tabular} environment}
\label{tab:tabular}
\input{pandas_tabular.tex}
\end{table}

% typeset the table in the longtable environment
Table \ref{tab:longtable} is a \texttt{longtable} and has 60 rows:
\input{pandas_longtable.tex}
\end{document}

Using DataFrame.to_latex(longtable=True) with the new arguments lt_caption and lt_label means we don't have to edit pandas_longtable.tex after its creation to get the caption and label working. This functionality also works with Series.to_latex(longtable=True).

PDF output is shown below:

[image: PDF output]

@codecov

codecov bot commented Feb 16, 2019

Codecov Report

Merging #25339 into master will decrease coverage by <.01%.
The diff coverage is 52.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25339      +/-   ##
==========================================
- Coverage   91.72%   91.71%   -0.01%     
==========================================
  Files         173      173              
  Lines       52831    52842      +11     
==========================================
+ Hits        48457    48462       +5     
- Misses       4374     4380       +6

Flag         Coverage Δ
#multiple    90.27% <52.94%> (-0.01%) ⬇️
#single      41.71% <0%> (-0.01%) ⬇️

Impacted Files                 Coverage Δ
pandas/core/generic.py         94.16% <0%> (ø) ⬆️
pandas/io/formats/format.py    97.99% <100%> (ø) ⬆️
pandas/io/formats/latex.py     95.52% <45.45%> (-4.48%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83fe6ca...7c105fa. Read the comment docs.

@jreback jreback added the Output-Formatting __repr__ of pandas objects, to_string label Feb 16, 2019
Contributor

@jreback jreback left a comment

is label / caption useful to a tabular write?

you would need to add a test for this

@jeschwar
Contributor Author

@jreback that is a good suggestion to allow the label and caption arguments to apply to the latex table environment when longtable=False. This means that DataFrame.to_latex() would have to write the nested latex table/tabular environments in the output; this may be ok for most uses. Here is what I am thinking for possible scenarios and corresponding behavior:

.to_latex(longtable=False, caption=None, label=None)

  • only output the latex tabular environment which is the current behavior

.to_latex(longtable=True, caption=None, label=None)

  • output the latex longtable environment without any captions or labels, which is the current behavior

.to_latex(longtable=False, caption='some caption', label='tab:some label')

  • output the latex nested table/tabular environments which include the caption and label from the user
  • this code could be added to this PR
  • if the user wants to add customized latex code inside the table environment but outside the tabular environment then they should not pass values for the caption and label arguments

.to_latex(longtable=True, caption='some caption', label='tab:some label')

  • output the latex longtable environment which includes the caption and label from the user as initially described in this PR
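The four scenarios above can be sketched as a small dispatch. This is only an illustration under stated assumptions: `sketch_to_latex` is a hypothetical stand-alone helper that takes already-formatted rows, not the pandas implementation, and because the list does not say what should happen when only one of caption and label is given, the sketch rejects that case.

```python
from typing import List, Optional


def sketch_to_latex(
    body_rows: List[str],
    column_format: str,
    longtable: bool = False,
    caption: Optional[str] = None,
    label: Optional[str] = None,
) -> str:
    """Hypothetical helper, NOT pandas code: wrap pre-formatted table rows."""
    if (caption is None) != (label is None):
        # Assumption of this sketch: caption and label come as a pair.
        raise ValueError("pass both caption and label, or neither")
    has_meta = caption is not None

    if longtable:
        lines = [r"\begin{longtable}{" + column_format + "}"]
        if has_meta:
            # Inside longtable the caption row must end with a row terminator.
            lines.append(r"\caption{" + caption + r"}\label{" + label + r"}\\")
        lines += body_rows
        lines.append(r"\end{longtable}")
    else:
        lines = [r"\begin{tabular}{" + column_format + "}"]
        lines += body_rows
        lines.append(r"\end{tabular}")
        if has_meta:
            # Nest the tabular inside a table environment carrying caption and label.
            lines = (
                [r"\begin{table}", r"\caption{" + caption + "}", r"\label{" + label + "}"]
                + lines
                + [r"\end{table}"]
            )
    return "\n".join(lines) + "\n"


rows = [r"a & 1 \\", r"b & 2 \\"]
plain = sketch_to_latex(rows, "lr")
nested = sketch_to_latex(rows, "lr", caption="some caption", label="tab:some label")
long_meta = sketch_to_latex(
    rows, "lr", longtable=True, caption="some caption", label="tab:some label"
)
print(nested)
print(long_meta)
```

With neither argument set, the output is the bare tabular or longtable environment, matching the current behavior described in the first two scenarios.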

Thoughts anyone?

@WillAyd
Member

WillAyd commented Feb 20, 2019

@jeschwar not a LaTeX expert by any means but your proposal makes sense. Can you open as an issue and reference that from this PR? That is typically easiest for change management

@jeschwar
Contributor Author

Thanks @WillAyd I created issue #25436 and will create a new PR because the scope has increased.

@jeschwar jeschwar closed this Feb 25, 2019
@jeschwar jeschwar deleted the to_latex_longtable_caption_label branch February 25, 2019 05:23