-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN/TYP: pandas/io/formats/excel.py #37862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ivanovmg for the PR. a couple of comments. will look in more detail shortly.
pandas/io/formats/excel.py
Outdated
# "Union[Sequence[Optional[Hashable]], bool]"; expected | ||
# "Sized" [arg-type] | ||
if self._has_aliases: | ||
assert isinstance(self.header, Sized) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glad to see this. for some reason my local mypy diagrees with the CI about whether its L601 or L602 that has the problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the problem was with len(self.header), so it should be line 602, but presumably one line was added on top since then.
EDIT: I guess I am not correct here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once we add a runtime isinstance, this sort of duplicates the _has_aliases property.
since self.header is a sequence or bool, maybe _has_aliases could reflect that instead of the explicit checks for the 4 classes.
if we have a guarantee from the is_* call, a cast would be more appropriate than the assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Used casting instead of asserting.
In _has_aliases
check if not boolean rather than 4 sequence-like classes.
Is that what you meant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the types are correct for the header parameter of ExcelFormatter then this should be fine.
also cast is a no-op, so no harm using self.header = cast(Sequence, self.header)
and the other self.header -> header changes wouldn't be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, changed the casting as you suggested.
It's green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ivanovmg. some more comments (after thinking about this some more!)
pandas/io/formats/excel.py
Outdated
@property | ||
def _has_aliases(self) -> bool: | ||
"""Whether the aliases for column names are present.""" | ||
return not isinstance(self.header, bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking some more, maybe best not to use bool.
we've never really had a discussion on use of bool in type annotations, but sometimes we are actually accepting any truthy value and other times we explicitly check if x is False or if x is True.
the header parameter of the docstring of ExcelFormatter should probably be updated to match. It currently is
header : boolean or list of string, default True
Write out column names. If a list of string is given it is
assumed to be aliases for the column names
even though (tuple, list, np.ndarray, ABCIndex) is the current accepted types on master.
and the current mypy issue is that header is annotated as Union[Sequence[Label], bool]
perhaps use is_list_like here. (ignoring that a str is a Sequence)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced with is_list_like
.
Also slightly updated docstring. Will it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is still a discrepancy. Can a sequence of any hashable be passed or does it have to be a sequence of strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExcelFormatter
is called only from Styler.to_excel
and NDFrame.to_excel
.
According to these docstrings
header : bool or list of str, default True
I also looked through the test base and noticed that header is either bool or some sequence of strings (on some occasions).
I am not aware of other use cases, when header is a sequence of any hashables rather than strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok can leave as is for now.
The docstrings of public methods document the public api, the type annotations should normally be as permissible as possible for argument positions and the docstring and type annotations should match.
we don't want to change the public api without discussion, so can circle back to this.
pandas/io/formats/excel.py
Outdated
col=coloffset + i + 1, | ||
val=values[i], | ||
style=self.header_style, | ||
mergestart=lnum if spans[i] > 1 else None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe spans[i] > 1
could first be assigned to another appropriately named variable to clarify intent and (slightly) reduce duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused what spans[i] > 1
means. I guess it is whether there is a multi-level col. So, I used the variable is_multilevel
. Please let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are always multilevel in this function. maybe needs_merge_cells
or spans_multiple_cells
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are good names indeed. Thank you!
lgtm. @simonjayhawkins generally these typing PRs will defer to you so merge when ready. |
@simonjayhawkins is there anything else expected from me on this PR? |
Co-authored-by: Simon Hawkins <[email protected]>
Co-authored-by: Simon Hawkins <[email protected]>
thanks @ivanovmg |
xref TYP: investigate/fix ignored mypy errors #37715
tests added / passed
passes
black pandas
passes
git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
Handle two type ignore items in
pandas/io/formats/excel.py
De-duplicate
has_aliases
variable by extracting propertyAdd typing for iterators yielding
ExcelCell
Use named arguments for
ExcelCell
on some occasions (in two places de-duplicate by updating kwargs inplace)