Skip to content

ENH: Named aggregations with multiple columns #33306

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

fpunny
Copy link

@fpunny fpunny commented Apr 5, 2020

@pep8speaks
Copy link

pep8speaks commented Apr 5, 2020

Hello @fpunny! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-08 02:24:07 UTC

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Apr 6, 2020

Thanks @fpunny for the PR. can you please add tests and a whatsnew and update docstrings/examples etc so users would know how to use this functionality.

@fpunny
Copy link
Author

fpunny commented Apr 8, 2020

@simonjayhawkins Hi simon, we created the whatsnew entry, added some test cases, and passed all of the checks. This being our first enhancement for Pandas, we're hoping to get some feedback on our implementation. We're fairly new to the codebase, hence we would love to learn a bit more about how we can improve - in hopes of merging this

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current aggregation code is too complicated, so we need to simplify things a bit before looking to expand. see my comments above about removing complexity.

The ideal is actually to move all the aggregation code to pandas/core/aggregation.py but creating an Aggregator sub-class depending on the types, similar to what we do with apply.py

I would take that as a refactor, or the expedient of moving some of the logic to free-functions.

all as a pre-cursor PR.

@@ -339,8 +340,22 @@ def _aggregate(self, arg, *args, **kwargs):
raise SpecificationError("nested renamer is not supported")
elif isinstance(obj, ABCSeries):
raise SpecificationError("nested renamer is not supported")
elif isinstance(obj, ABCDataFrame) and k not in obj.columns:
raise KeyError(f"Column '{k}' does not exist!")
elif isinstance(obj, ABCDataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do a pre-cursor PR to move the current code to pandas/core/aggregation.py (so that you just call a function here). this is too complicated

if k not in obj.columns:
# Check if list thingy
try:
keys = np.frombuffer(k, dtype=np.dtype("<U1"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want to do things like this, use is_list_like

raise KeyError(f"Column '{key}' does not exist!")

# Memorize operation
deserialized_keys[k] = keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we keeping state?

@erfannariman
Copy link
Member

This MR seems inactive, how complex is this issue to pick up as a first issue? @jreback

@jreback
Copy link
Contributor

jreback commented May 25, 2020

this particular issue is a bit complex

however my comments above about simplifying the existing code could be a good first issue

@WillAyd
Copy link
Member

WillAyd commented Jun 26, 2020

Closing as I think this has gone stale, but @fpunny ping if you'd like to pick back up

@WillAyd WillAyd closed this Jun 26, 2020
@erfannariman
Copy link
Member

@WillAyd If OP does not have time, I can give this a try, not sure if I can make it, but I would like to see how far I come.

@WillAyd
Copy link
Member

WillAyd commented Jun 26, 2020

Sounds good! Feel free to clone this and push a new PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Named aggregations with multiple columns
8 participants