-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: simplify latex formatting #35872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously there was a complicated logic in multiple methods for either longtable or regular table. This commit implements - ``LatexTableFormatter``, - ``LatexTabularFormatter``, - ``LatexLongTableFormatter``, derived from ``LatexFormatter``, based on ``LatexFormatterAbstract``. Each of the derived classes implements its own methods for writing - beginning of the table; - caption and labels; - separators; - end of the table. LatexFormatter changes ---------------------- - Make the process of creating tables more readable. - Drop escape and bold_rows attr
New classes for iterations over headers and body: - RowHeaderIterator; - RowBodyIterator, based on RowCreator.
This reverts ``pandas/io/formats/format.py`` to its nearly original state and enables polymorphism under the hood inside ``pandas/io/formats/latex.py``. Add ``pandas/tests/io/formats/test_latex.py`` to test lower-level functions/classes declared in ``pandas/io/formats/latex.py``
cbe59db
to
f601598
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR!
According to the code coverage bot under these changes patches of code are not covered by the tests. The thing to do there is for each patch either clean it up or add tests so that they're hit
Also, if I've got this right you're adding/removing methods. For new methods we'll need docstrings; old methods need to be deprecated prior to removal
What is not covered by the tests are the abstract methods.
I think that the best thing is to setup coverage in such a way to ignore simple pass in the function. The methods removed (actually one, Also, do you want me to add docstrings for the internal methods? Or only public ones? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: deprecation I'm not sure. I would think it's not necessary as long as the methods are not exposed to users.
I would maybe change the title of the PR to something like "REF: simplify latex formatting" to make it clearer to other reviews what your goal is.
Otherwise looks okay to me. You'll need feedback from the core team to move forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally this is easier to follow. It feels slightly java-esque from the naming and builder pattern., but not too overboard. ok on the inheritance; almost always prefer composition, but this is not crazy. some comments.
pandas/io/formats/latex.py
Outdated
return self._nlevels if self.fmt.header else 0 | ||
|
||
@property | ||
def _ilevels(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would make more descriptive properties here; they don't really need to be private, but ok
e.g.
index_nlevels is more descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it was not clear for me what these _ilevels
-like values were in the original code. So, I decided not to change the names. As you suggested, I renamed them.
Hopefully, this is reasonable:
_ilevels -> index_levels
_clevels -> column_levels
_nlevels -> header_levels -- that was difficult for me to understand what it means
pandas/io/formats/latex.py
Outdated
|
||
buf.write("\\toprule\n") | ||
@staticmethod | ||
def _escape_symbols(row): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally type every parameter both in and out
buf : string or file handle | ||
File path or object. If not specified, the result is returned as | ||
a string. | ||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a class method no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. At least not according to this https://refactoring.guru/design-patterns/builder/python/example
pandas/io/formats/latex.py
Outdated
self._get_index_format() + self._get_column_format_based_on_dtypes() | ||
) | ||
elif not isinstance(input_column_format, str): # pragma: no cover | ||
raise AssertionError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this hit in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the original implementation.
I have just replaced AssertionError with ValueError and added several a test on this.
Problems -------- Could not figure out how to deal with the following. 1. Property setters 2. Inability of mypy to figure out that list can get iterator as argument rather than iterable only. pandas/io/formats/latex.py:44: error: Argument 1 to "list" has incompatible type "Iterator[Tuple[Any, ...]]"; expected "Iterable[List[str]]" [arg-type] pandas/io/formats/latex.py:544: error: Incompatible types in assignment (expression has type "Optional[str]", variable has type "str") [assignment]
@jreback, is there anything else to be done from my side? It seems that I covered all the points you raised previously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of comments, pls merge master and ping on green.
position=self.position, | ||
) | ||
|
||
def _select_builder(self) -> Type[TableBuilderAbstract]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would inline thisin the builder function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to extract this method to make mypy
happy.
If I refactor to this (which was originally there).
@property
def builder(self) -> TableBuilderAbstract:
kwargs = dict(
formatter=self.fmt,
column_format=self.column_format,
multicolumn=self.multicolumn,
multicolumn_format=self.multicolumn_format,
multirow=self.multirow,
caption=self.caption,
label=self.label,
position=self.position,
)
if self.longtable:
return LongTableBuilder(**kwargs)
if any([self.caption, self.label, self.position]):
return RegularTableBuilder(**kwargs)
return TabularBuilder(**kwargs)
then mypy
complains with the following error.
pandas/io/formats/latex.py:578: error: Argument 1 to "LongTableBuilder" has incompatible type "**Dict[str, object]"; expected "DataFrameFormatter" [arg-type]
pandas/io/formats/latex.py:578: error: Argument 1 to "LongTableBuilder" has incompatible type "**Dict[str, object]"; expected "Optional[str]" [arg-type]
pandas/io/formats/latex.py:578: error: Argument 1 to "LongTableBuilder" has incompatible type "**Dict[str, object]"; expected "bool" [arg-type]
pandas/io/formats/latex.py:580: error: Argument 1 to "RegularTableBuilder" has incompatible type "**Dict[str, object]"; expected "DataFrameFormatter" [arg-type]
pandas/io/formats/latex.py:580: error: Argument 1 to "RegularTableBuilder" has incompatible type "**Dict[str, object]"; expected "Optional[str]" [arg-type]
pandas/io/formats/latex.py:580: error: Argument 1 to "RegularTableBuilder" has incompatible type "**Dict[str, object]"; expected "bool" [arg-type]
pandas/io/formats/latex.py:581: error: Argument 1 to "TabularBuilder" has incompatible type "**Dict[str, object]"; expected "DataFrameFormatter" [arg-type]
pandas/io/formats/latex.py:581: error: Argument 1 to "TabularBuilder" has incompatible type "**Dict[str, object]"; expected "Optional[str]" [arg-type]
pandas/io/formats/latex.py:581: error: Argument 1 to "TabularBuilder" has incompatible type "**Dict[str, object]"; expected "bool" [arg-type]
Honestly, typing stuff makes life more complicated, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap typing does make things more complicated, no arg there :->
- TableBuilder -> GenericTableBuilder - Add doctests explaining what each table builder does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm ping on green.
Can anybody restart the task Windows py37_np16? |
thanks @ivanovmg for this, very nice! |
pandas/tests/io/formats/test_latex.py
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Refactor
to_latex
using polymorphism and the builder design pattern.Polymorphism
Previously there was a complicated logic in multiple methods
for either longtable or regular table.
This PR implements various builders:
RegularTableBuilder
LongTableBuilder
TabularBuilder
Selection of the appropriate builder is carried out in
LatexFormatter
,depending on the kwargs provided.
Each builder must implement construction of all the table's components:
This allows one eliminate complex logic throughout multiple methods.
Separate Concerns
Separated concerns: row formatting into string (separate class
RowStringConverter
)is carried out independently of the table builder.
Derived from
RowStringConverter
are two classes used to iterate over headers or body of the tables:RowHeaderIterator
RowBodyIterator
Extract Methods
Multiple methods were extracted to improve code readability.