Skip to content

ENH: add ignore index to DataFrame / Series.sample #38594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 32 commits into from
Closed

ENH: add ignore index to DataFrame / Series.sample #38594

wants to merge 32 commits into from

Conversation

erfannariman
Copy link
Member

@erfannariman erfannariman commented Dec 20, 2020

@pep8speaks
Copy link

pep8speaks commented Dec 20, 2020

Hello @erfannariman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-12-31 17:42:29 UTC

@arw2019 arw2019 added Needs Review Enhancement Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply labels Dec 20, 2020
@jorisvandenbossche jorisvandenbossche removed the Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply label Dec 21, 2020
@jreback jreback added this to the 1.3 milestone Dec 22, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good small comment. pls merge master (and some checks are failing)

@erfannariman
Copy link
Member Author

erfannariman commented Dec 31, 2020

Are you sure?

pandas/_typing.py:169: error: Cannot assign multiple types to name "RandomState" without an explicit "Type[...]" annotation [misc]

doesn't seem unrelated. See here for some help on that

Apparently this error is raised because I define RandomState twice in an if else. When I remove the if else, there is no such exception.

So this will be correct:

RandomState = Union[int, ArrayLike, np.random.Generator, np.random.RandomState]

While the following will raise the type annotation error.

if np_version_under1p17:
    RandomState = Union[int, ArrayLike, np.random.RandomState]
else:
    RandomState = Union[int, ArrayLike, np.random.Generator, np.random.RandomState]

@erfannariman
Copy link
Member Author

@MarcoGorelli @arw2019 @jreback for now I removed defining the RandomState type and set it to Any. After moving all numpy versions in the tests >= 1.17 we can add:

RandomState = Union[int, ArrayLike, np.random.Generator, np.random.RandomState]

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

you can just define the type conditionally based on numpy version

do not use Any (u can also remove the Any and open an issue)

@erfannariman
Copy link
Member Author

erfannariman commented Dec 31, 2020

you can just define the type conditionally based on numpy version

Can you check my comment above about: if np_version_under1p17:, that does not seem to work.

do not use Any (u can also remove the Any and open an issue)

You mean an issue that we still have to set typehint for RandomState I assume?


edit: opened ticket #38860 and removed Any @jreback

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

@MarcoGorelli MarcoGorelli self-requested a review January 3, 2021 09:50
@MarcoGorelli
Copy link
Member

Fixing the typing here isn't so easy - there's

weights = pd.Series(weights, dtype="float64")

which probably wants renaming, because on the right you have Union[str, ArrayLike] and you're trying to cast it to Series.

Then there's

weights = weights.fillna(0)

, where the return type of Series.fillna can be Series or None (depending on the value of inplace, but mypy doesn't know that). So I think either fillna needs an overload so that when inplace is Literal[False] the return type is Series, or we can just use cast (cc @simonjayhawkins )

@MarcoGorelli
Copy link
Member

Looks like the overload approach is taken in reset_index

pandas/pandas/core/frame.py

Lines 4841 to 4872 in 9159513

@overload
# https://github.com/python/mypy/issues/6580
# Overloaded function signatures 1 and 2 overlap with incompatible return types
def reset_index( # type: ignore[misc]
self,
level: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
drop: bool = ...,
inplace: Literal[False] = ...,
col_level: Hashable = ...,
col_fill: Hashable = ...,
) -> DataFrame:
...
@overload
def reset_index(
self,
level: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
drop: bool = ...,
inplace: Literal[True] = ...,
col_level: Hashable = ...,
col_fill: Hashable = ...,
) -> None:
...
def reset_index(
self,
level: Optional[Union[Hashable, Sequence[Hashable]]] = None,
drop: bool = False,
inplace: bool = False,
col_level: Hashable = 0,
col_fill: Hashable = "",
) -> Optional[DataFrame]:

@erfannariman
Copy link
Member Author

Fixing the typing here isn't so easy - there's

weights = pd.Series(weights, dtype="float64")

which probably wants renaming, because on the right you have Union[str, ArrayLike] and you're trying to cast it to Series.

Then there's

weights = weights.fillna(0)

, where the return type of Series.fillna can be Series or None (depending on the value of inplace, but mypy doesn't know that). So I think either fillna needs an overload so that when inplace is Literal[False] the return type is Series, or we can just use cast (cc @simonjayhawkins )

Thanks for having a in-depth look at this. This turned out to be way more complex and time consuming than I thought. Would it be an idea to open a new ticket for overload on fillna?

@MarcoGorelli
Copy link
Member

Sounds reasonable to me. If this is ready apart from that, maybe it's OK to leave weights untyped here for this PR, and take care of it elsewhere?

@jreback
Copy link
Contributor

jreback commented Feb 11, 2021

@erfannariman if you can merge master and see if can get to green (we can defer the typing on the random if that's causing an issue)

@erfannariman
Copy link
Member Author

@erfannariman if you can merge master and see if can get to green (we can defer the typing on the random if that's causing an issue)

Yes sure, will do this weekend, this one is basically ready to go except the typing complexity.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@MarcoGorelli
Copy link
Member

Thanks for having a in-depth look at this. This turned out to be way more complex and time consuming than I thought. Would it be an idea to open a new ticket for overload on fillna?

This is done now, if you merge upstream/master then the typing should be a lot simpler here

@simonjayhawkins
Copy link
Member

@erfannariman can you merge master #38594 (comment)

@simonjayhawkins simonjayhawkins removed this from the 1.3 milestone May 25, 2021
@simonjayhawkins
Copy link
Member

@erfannariman Thanks for the PR. closing as stale. ping if you want to continue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: add ignore_index to DataFrame / Series.sample
7 participants