Skip to content

REF: simplify latex formatting #35872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Sep 7, 2020
Merged

Conversation

ivanovmg
Copy link
Member

@ivanovmg ivanovmg commented Aug 24, 2020

Refactor to_latex using polymorphism and the builder design pattern.

Polymorphism

Previously there was a complicated logic in multiple methods
for either longtable or regular table.
This PR implements various builders:

  • RegularTableBuilder
  • LongTableBuilder
  • TabularBuilder

Selection of the appropriate builder is carried out in LatexFormatter,
depending on the kwargs provided.

Each builder must implement construction of all the table's components:

  • Beginning and end of the environment,
  • Separators (top, middle, bottom),
  • Header,
  • Body.

This allows one eliminate complex logic throughout multiple methods.

Separate Concerns

Separated concerns: row formatting into string (separate class RowStringConverter)
is carried out independently of the table builder.
Derived from RowStringConverter are two classes used to iterate over headers or body of the tables:

  • RowHeaderIterator
  • RowBodyIterator

Extract Methods

Multiple methods were extracted to improve code readability.

Previously there was a complicated logic in multiple methods
for either longtable or regular table.

This commit implements
 - ``LatexTableFormatter``,
 - ``LatexTabularFormatter``,
 - ``LatexLongTableFormatter``,
derived from ``LatexFormatter``, based on ``LatexFormatterAbstract``.

Each of the derived classes implements its own methods for writing
 - beginning of the table;
 - caption and labels;
 - separators;
 - end of the table.

LatexFormatter changes
----------------------

 - Make the process of creating tables more readable.
 - Drop escape and bold_rows attr
New classes for iterations over headers and body:
 - RowHeaderIterator;
 - RowBodyIterator,
based on RowCreator.
This reverts ``pandas/io/formats/format.py``
to its nearly original state and enables polymorphism
under the hood inside ``pandas/io/formats/latex.py``.

Add ``pandas/tests/io/formats/test_latex.py`` to test
lower-level functions/classes declared in
``pandas/io/formats/latex.py``
@ivanovmg ivanovmg changed the title Refactor/to latex REF: Implement polymorphism in latex formatting Aug 24, 2020
Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR!

According to the code coverage bot under these changes patches of code are not covered by the tests. The thing to do there is for each patch either clean it up or add tests so that they're hit

Also, if I've got this right you're adding/removing methods. For new methods we'll need docstrings; old methods need to be deprecated prior to removal

@ivanovmg
Copy link
Member Author

ivanovmg commented Aug 25, 2020

What is not covered by the tests are the abstract methods.

    @property
    @abstractmethod
    def env_end(self):
        pass

I think that the best thing is to setup coverage in such a way to ignore simple pass in the function.
Alternatively I can simply put a docstring there, which will be considered by the bot as a check.

The methods removed (actually one, _write_tabular_begin) and renamed (_print_cline -> _compose_cline) are internal implementation. Do I need to keep them and mark deprecated? They are not used anywhere else.

Also, do you want me to add docstrings for the internal methods? Or only public ones?
I mean, it is probably not reasonable to add docstrings to _escape_symbols or _convert_to_bold.
Unless required.

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: deprecation I'm not sure. I would think it's not necessary as long as the methods are not exposed to users.

I would maybe change the title of the PR to something like "REF: simplify latex formatting" to make it clearer to other reviews what your goal is.

Otherwise looks okay to me. You'll need feedback from the core team to move forward

@ivanovmg ivanovmg changed the title REF: Implement polymorphism in latex formatting REF: simplify latex formatting Aug 26, 2020
@jreback jreback added IO LaTeX to_latex Refactor Internal refactoring of code labels Aug 27, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally this is easier to follow. It feels slightly java-esque from the naming and builder pattern., but not too overboard. ok on the inheritance; almost always prefer composition, but this is not crazy. some comments.

return self._nlevels if self.fmt.header else 0

@property
def _ilevels(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would make more descriptive properties here; they don't really need to be private, but ok

e.g.

index_nlevels is more descriptive.

Copy link
Member Author

@ivanovmg ivanovmg Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it was not clear for me what these _ilevels-like values were in the original code. So, I decided not to change the names. As you suggested, I renamed them.

Hopefully, this is reasonable:

_ilevels -> index_levels
_clevels -> column_levels
_nlevels -> header_levels -- that was difficult for me to understand what it means


buf.write("\\toprule\n")
@staticmethod
def _escape_symbols(row):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally type every parameter both in and out

buf : string or file handle
File path or object. If not specified, the result is returned as
a string.
@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a class method no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. At least not according to this https://refactoring.guru/design-patterns/builder/python/example

self._get_index_format() + self._get_column_format_based_on_dtypes()
)
elif not isinstance(input_column_format, str): # pragma: no cover
raise AssertionError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this hit in the tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the original implementation.
I have just replaced AssertionError with ValueError and added several a test on this.

Problems
--------

Could not figure out how to deal with the following.

1. Property setters
2. Inability of mypy to figure out that list can get iterator as
   argument rather than iterable only.

pandas/io/formats/latex.py:44: error: Argument 1 to "list" has incompatible type "Iterator[Tuple[Any, ...]]"; expected "Iterable[List[str]]"  [arg-type]
pandas/io/formats/latex.py:544: error: Incompatible types in assignment (expression has type "Optional[str]", variable has type "str")  [assignment]
@ivanovmg ivanovmg requested a review from jreback August 27, 2020 09:08
@ivanovmg
Copy link
Member Author

ivanovmg commented Sep 4, 2020

@jreback, is there anything else to be done from my side? It seems that I covered all the points you raised previously.

@jreback jreback added this to the 1.2 milestone Sep 5, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of comments, pls merge master and ping on green.

position=self.position,
)

def _select_builder(self) -> Type[TableBuilderAbstract]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would inline thisin the builder function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to extract this method to make mypy happy.

If I refactor to this (which was originally there).

    @property
    def builder(self) -> TableBuilderAbstract:
         kwargs = dict(
            formatter=self.fmt,
            column_format=self.column_format,
            multicolumn=self.multicolumn,
            multicolumn_format=self.multicolumn_format,
            multirow=self.multirow,
            caption=self.caption,
            label=self.label,
            position=self.position,
        )
        if self.longtable:
            return LongTableBuilder(**kwargs)
        if any([self.caption, self.label, self.position]):
            return RegularTableBuilder(**kwargs)
        return TabularBuilder(**kwargs)

then mypy complains with the following error.

pandas/io/formats/latex.py:578: error: Argument 1 to "LongTableBuilder" has incompatible type "**Dict[str, object]"; expected "DataFrameFormatter"  [arg-type]
pandas/io/formats/latex.py:578: error: Argument 1 to "LongTableBuilder" has incompatible type "**Dict[str, object]"; expected "Optional[str]"  [arg-type]
pandas/io/formats/latex.py:578: error: Argument 1 to "LongTableBuilder" has incompatible type "**Dict[str, object]"; expected "bool"  [arg-type]
pandas/io/formats/latex.py:580: error: Argument 1 to "RegularTableBuilder" has incompatible type "**Dict[str, object]"; expected "DataFrameFormatter"  [arg-type]
pandas/io/formats/latex.py:580: error: Argument 1 to "RegularTableBuilder" has incompatible type "**Dict[str, object]"; expected "Optional[str]"  [arg-type]
pandas/io/formats/latex.py:580: error: Argument 1 to "RegularTableBuilder" has incompatible type "**Dict[str, object]"; expected "bool"  [arg-type]
pandas/io/formats/latex.py:581: error: Argument 1 to "TabularBuilder" has incompatible type "**Dict[str, object]"; expected "DataFrameFormatter"  [arg-type]
pandas/io/formats/latex.py:581: error: Argument 1 to "TabularBuilder" has incompatible type "**Dict[str, object]"; expected "Optional[str]"  [arg-type]
pandas/io/formats/latex.py:581: error: Argument 1 to "TabularBuilder" has incompatible type "**Dict[str, object]"; expected "bool"  [arg-type]

Honestly, typing stuff makes life more complicated, IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap typing does make things more complicated, no arg there :->

@pep8speaks
Copy link

pep8speaks commented Sep 6, 2020

Hello @ivanovmg! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-06 17:57:25 UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ping on green.

@ivanovmg
Copy link
Member Author

ivanovmg commented Sep 7, 2020

Can anybody restart the task Windows py37_np16?
It is unlikely that the failure in TestPandasContainer.test_to_s3 is anyhow related to the present PR.

@jreback jreback merged commit 773f64d into pandas-dev:master Sep 7, 2020
@jreback
Copy link
Contributor

jreback commented Sep 7, 2020

thanks @ivanovmg for this, very nice!

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Sep 8, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
@ivanovmg ivanovmg deleted the refactor/to_latex branch November 6, 2020 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO LaTeX to_latex Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants