Skip to content

Added __getattr__ in DataFrameGroupBy #457

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 7, 2022
Merged

Added __getattr__ in DataFrameGroupBy #457

merged 4 commits into from
Dec 7, 2022

Conversation

ramvikrams
Copy link
Contributor

@ramvikrams ramvikrams commented Dec 1, 2022

  • Closes DataFrameGroupBy attribute access for columns #439 (Replace xxxx with the Github issue number)
  • Tests added: Please use assert_type() to assert the type of any return value
    the error is saying expected Series but got SeriesGroupBy, I can't understand it

Sorry sir forgot to add the correct commit file name.

@@ -359,3 +359,4 @@ class DataFrameGroupBy(GroupBy):
ascending: bool = ...,
dropna: bool = ...,
) -> Series[float]: ...
def __getattr__(self, name: str) -> Series: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be SeriesGroupBy

>>> df = pd.DataFrame(
...         {
...             'C': [1, 5, 5, 2, 5, 5],
...             'D': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0],
...         }
...     )
>>> gb = df.groupby('C').D
>>> gb
<pandas.core.groupby.generic.SeriesGroupBy object at 0x000002E6ED25CA00>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done sir

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be SeriesGroupBy

}
)
gb = df.groupby("C").D
check(assert_type(gb, SeriesGroupBy), SeriesGroupBy)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer a test that does not import SeriesGroupBy because that is not a documented class.

Borrowing from lines 669, but modified to test the getattr:

    df = pd.DataFrame(
        data={"col1": [1, 1, 2], "col2": [3, 4, 5], "col3": [0, 1, 0], 0: [-1, -1, -1]}
    )
    check(assert_type(df.groupby("col1").col3.agg(min), pd.Series), pd.Series)
    check(
        assert_type(df.groupby("col1").col3.agg([min, max]), pd.DataFrame),
        pd.DataFrame,
    )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll do that sir

Copy link
Contributor Author

@ramvikrams ramvikrams Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sir I am failing to understand that even if we write tests check(assert_type((df.groupby("C").__getattr__("D")), pd.Series), pd.Series) like this, how will it change because the return type of __getattr__ will still be SeriesGroupBy

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that you are not testing __getattr__() directly, but indirectly, because you always use a method in SeriesGroupBy in real code. The example I gave is calling agg() on the SeriesGroupBy result. Both df.groupby("col1").col3 and df.groupby("col1")["col3"] return a SeriesGroupBy object.

See the test test_types_groupby_agg for an example where we use the second form, which is where I copied this from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sir I read the test_types_groupby_agg but here when I give the str to __getattr__() the error is there about expected series but got seriesgroupby pushing a commit so you can also check

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please implement the test as I wrote above.

Copy link
Contributor Author

@ramvikrams ramvikrams Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try it this way it shows AttributeError: 'SeriesGroupBy' object has no attribute 'col2' but this error should not be reported

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow instructions precisely

@@ -359,3 +359,4 @@ class DataFrameGroupBy(GroupBy):
ascending: bool = ...,
dropna: bool = ...,
) -> Series[float]: ...
def __getattr__(self, name: str) -> Series: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be SeriesGroupBy

"col3": [9,8,7,5,6,1]
}
)
check(assert_type(df.groupby("col1").__getattr__("col3"), pd.Series), pd.Series)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should be the same as #457 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @ramvikrams

@Dr-Irv Dr-Irv merged commit 58416f2 into pandas-dev:main Dec 7, 2022
@ramvikrams
Copy link
Contributor Author

thanks @ramvikrams

Thanks sir

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrameGroupBy attribute access for columns
2 participants