Skip to content

CLN/TYP: pandas/io/formats/excel.py #37862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Nov 24, 2020
Merged

Conversation

ivanovmg
Copy link
Member

  • xref TYP: investigate/fix ignored mypy errors #37715

  • tests added / passed

  • passes black pandas

  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

  • whatsnew entry

  • Handle two type ignore items in pandas/io/formats/excel.py

  • De-duplicate has_aliases variable by extracting property

  • Add typing for iterators yielding ExcelCell

  • Use named arguments for ExcelCell on some occasions (in two places de-duplicate by updating kwargs inplace)

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivanovmg for the PR. a couple of comments. will look in more detail shortly.

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Nov 15, 2020
@simonjayhawkins simonjayhawkins added this to the 1.2 milestone Nov 15, 2020
# "Union[Sequence[Optional[Hashable]], bool]"; expected
# "Sized" [arg-type]
if self._has_aliases:
assert isinstance(self.header, Sized)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glad to see this. for some reason my local mypy diagrees with the CI about whether its L601 or L602 that has the problem

Copy link
Member Author

@ivanovmg ivanovmg Nov 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the problem was with len(self.header), so it should be line 602, but presumably one line was added on top since then.
EDIT: I guess I am not correct here...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we add a runtime isinstance, this sort of duplicates the _has_aliases property.

since self.header is a sequence or bool, maybe _has_aliases could reflect that instead of the explicit checks for the 4 classes.

if we have a guarantee from the is_* call, a cast would be more appropriate than the assert.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.
Used casting instead of asserting.
In _has_aliases check if not boolean rather than 4 sequence-like classes.
Is that what you meant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the types are correct for the header parameter of ExcelFormatter then this should be fine.

also cast is a no-op, so no harm using self.header = cast(Sequence, self.header) and the other self.header -> header changes wouldn't be needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, changed the casting as you suggested.

@ivanovmg
Copy link
Member Author

It's green.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivanovmg. some more comments (after thinking about this some more!)

@property
def _has_aliases(self) -> bool:
"""Whether the aliases for column names are present."""
return not isinstance(self.header, bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking some more, maybe best not to use bool.

we've never really had a discussion on use of bool in type annotations, but sometimes we are actually accepting any truthy value and other times we explicitly check if x is False or if x is True.

the header parameter of the docstring of ExcelFormatter should probably be updated to match. It currently is

    header : boolean or list of string, default True
        Write out column names. If a list of string is given it is
        assumed to be aliases for the column names

even though (tuple, list, np.ndarray, ABCIndex) is the current accepted types on master.

and the current mypy issue is that header is annotated as Union[Sequence[Label], bool]

perhaps use is_list_like here. (ignoring that a str is a Sequence)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced with is_list_like.
Also slightly updated docstring. Will it work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is still a discrepancy. Can a sequence of any hashable be passed or does it have to be a sequence of strings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExcelFormatter is called only from Styler.to_excel and NDFrame.to_excel.
According to these docstrings

header : bool or list of str, default True

I also looked through the test base and noticed that header is either bool or some sequence of strings (on some occasions).

I am not aware of other use cases, when header is a sequence of any hashables rather than strings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok can leave as is for now.

The docstrings of public methods document the public api, the type annotations should normally be as permissible as possible for argument positions and the docstring and type annotations should match.

we don't want to change the public api without discussion, so can circle back to this.

col=coloffset + i + 1,
val=values[i],
style=self.header_style,
mergestart=lnum if spans[i] > 1 else None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe spans[i] > 1 could first be assigned to another appropriately named variable to clarify intent and (slightly) reduce duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused what spans[i] > 1 means. I guess it is whether there is a multi-level col. So, I used the variable is_multilevel. Please let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are always multilevel in this function. maybe needs_merge_cells or spans_multiple_cells

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good names indeed. Thank you!

@jreback
Copy link
Contributor

jreback commented Nov 17, 2020

lgtm. @simonjayhawkins generally these typing PRs will defer to you so merge when ready.

@ivanovmg
Copy link
Member Author

@simonjayhawkins is there anything else expected from me on this PR?

ivanovmg and others added 3 commits November 24, 2020 18:08
@jreback jreback added the IO Excel read_excel, to_excel label Nov 24, 2020
@jreback jreback merged commit 7368b2a into pandas-dev:master Nov 24, 2020
@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

thanks @ivanovmg

@ivanovmg ivanovmg deleted the cleanup/excel branch December 3, 2020 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants