Skip to content

API: Add pandas.api.typing #48578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Apr 17, 2023
Merged

Conversation

rhshadrach
Copy link
Member

@rhshadrach
Copy link
Member Author

Marking this as a draft until the discussion in the corresponding issue is resolved.

@rhshadrach rhshadrach marked this pull request as draft September 17, 2022 01:06
@rhshadrach rhshadrach marked this pull request as ready for review September 28, 2022 22:48
@phofl
Copy link
Member

phofl commented Oct 7, 2022

Can you merge main?

@rhshadrach rhshadrach marked this pull request as draft October 10, 2022 03:26
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
@rhshadrach rhshadrach marked this pull request as ready for review October 23, 2022 14:10
@rhshadrach
Copy link
Member Author

@phofl - done.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Dec 18, 2022
@rhshadrach rhshadrach added Typing type annotations, mypy/pyright type checking and removed Groupby Stale labels Jan 2, 2023
@rhshadrach rhshadrach changed the title API: Add DataFrameGroupBy and SeriesGroupBy to top level API: Add pandas.api.typing Jan 23, 2023
@rhshadrach
Copy link
Member Author

@simonjayhawkins - friendly ping

…pby_in_api

� Conflicts:
�	doc/source/whatsnew/v2.1.0.rst
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rhshadrach lgtm pending green. and conflicts to resolve

I think this could be coupled to #27522, so as a follow-up or here if others think that the coupling is strong (e.g. need for api namespace)

…pby_in_api

� Conflicts:
�	doc/source/whatsnew/v2.1.0.rst
@rhshadrach
Copy link
Member Author

Thanks @simonjayhawkins for the review

I think this could be coupled to #27522, so as a follow-up or here if others think that the coupling is strong (e.g. need for api namespace)

The issue with the Styler is still a blocker here, and I plan to address it in a followup PR. Last I looked, it would require a bit of refactoring. Because of this and the amount of changes deprecating pandas.core would take, I'm quite resistant to including such work here.

@simonjayhawkins
Copy link
Member

I'm quite resistant to including such work here.

sure. if we deprecate core it would enforce the usage of pandas[.api].typing but I guess changing the api docs for the public classes currently in core is a step in the right direction.

@rhshadrach
Copy link
Member Author

@simonjayhawkins - friendly ping

@rhshadrach
Copy link
Member Author

@Dr-Irv - friendly ping.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Apr 9, 2023

@rhshadrach I may be being a bit pedantic here, but now the docs don't correspond to what happens at runtime, and maybe you should add tests for this.

Consider this:

>>> import pandas as pd
>>> df=pd.DataFrame({"x":[i for i in range(10)]}, index=pd.date_range("230101", periods=10))
>>> df
            x
2001-01-23  0
2001-01-24  1
2001-01-25  2
2001-01-26  3
2001-01-27  4
2001-01-28  5
2001-01-29  6
2001-01-30  7
2001-01-31  8
2001-02-01  9
>>> df.resample("2d")
<pandas.core.resample.DatetimeIndexResampler object at 0x000002425662CF40>
>>> type(df.rolling(2))
<class 'pandas.core.window.rolling.Rolling'>

So the docs are saying that df.resample() is returning a pandas.api.typing.Resampler, but that's not the user experience. The docs say that df.rolling() returns a pandas.api.typing.Rolling, but that's not the user experience. I think the same is true for the other types as well.

Now it is probably the case that the type pandas.api.typing.Resampler is the same as pandas.core.resample.DatetimeIndex.Resampler and that the type pandas.api.typingRolling is the same as pandas.core.window.rolling.Rolling, but shouldn't we have tests for each of these types you created to make sure that the types we document that we return do correspond to the actual types returned?

I'm probably suggesting this because when we change things in pandas-stubs to correspond to this change, we will end up creating those tests there (in a slightly different form)

@rhshadrach
Copy link
Member Author

Thanks @Dr-Irv - it's an interesting point. But is there something that can be done here? For example, within DataFrame.groupby replacing

from pandas.core.groupby.generic import DataFrameGroupBy

with

from pandas.api.typing import DataFrameGroupBy

I still see

print(type(pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]}).groupby('a')))
# <class 'pandas.core.groupby.generic.DataFrameGroupBy'>

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Apr 12, 2023

it's an interesting point. But is there something that can be done here?

I don't think we want to through the effort of changing the returned classes, but I think you can create a test like this:

gb = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]}).groupby('a')
assert isinstance(gb, pd.api.typing.DataFrameGroupBy)

That would at least test that you've correctly mapped the types in the typing module with the actual types.

from pandas.io.stata import StataReader

__all__ = [
"DataFrameGroupBy",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just commenting that in a follow up PR it might make sense for these objects to have their own API Reference page?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea - agreed. I can open an issue to track.

@rhshadrach
Copy link
Member Author

That would at least test that you've correctly mapped the types in the typing module with the actual types.

But there is no mapping here, yea? We import the object and then include said object in __all__. If there was a mapping, I would agree we should test it. As it stands, I think the only thing this would test is if there were two e.g. DataFrameGroupBy, that we've imported the right one. But if there were indeed two different DataFrameGroupBy objects, we'd have bigger problems I think 😆

I don't think such a test is warranted, and I'll also note we don't have tests like this for any existing pandas.api objects.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Apr 13, 2023

But there is no mapping here, yea? We import the object and then include said object in __all__. If there was a mapping, I would agree we should test it. As it stands, I think the only thing this would test is if there were two e.g. DataFrameGroupBy, that we've imported the right one. But if there were indeed two different DataFrameGroupBy objects, we'd have bigger problems I think 😆

I don't think such a test is warranted, and I'll also note we don't have tests like this for any existing pandas.api objects.

Maybe "mapping" is the wrong word.

I'm thinking of the following. Suppose that someone removed one of the types from pandas/api/typing/__init__.py or that you imported a class in that file, but forgot to include it in __all__. The test would capture these kind of errors.

Just because we don't have tests for the other pandas.api objects, doesn't mean we can't start now....

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM

@rhshadrach
Copy link
Member Author

I'm thinking of the following. Suppose that someone removed one of the types from pandas/api/typing/__init__.py or that you imported a class in that file, but forgot to include it in __all__. The test would capture these kind of errors.

Ah, I was under the false impression that this was being done. Agreed we should be testing this! I've added the ones for api.typing and plan to do the others in a followup (I've opened #52688 to track).

Copy link
Contributor

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rhshadrach . Let me know if I should do the merge, or if someone else should do it.

…pby_in_api

� Conflicts:
�	doc/source/whatsnew/v2.1.0.rst
@mroeschke mroeschke merged commit 23f74d3 into pandas-dev:main Apr 17, 2023
@mroeschke
Copy link
Member

Thanks @rhshadrach

@rhshadrach rhshadrach deleted the groupby_in_api branch April 18, 2023 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Enhancement Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Add pandas.api.typing
5 participants