-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Named aggregations with multiple columns #33306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks @fpunny for the PR. can you please add tests and a whatsnew and update docstrings/examples etc so users would know how to use this functionality. |
@simonjayhawkins Hi simon, we created the whatsnew entry, added some test cases, and passed all of the checks. This being our first enhancement for Pandas, we're hoping to get some feedback on our implementation. We're fairly new to the codebase, hence we would love to learn a bit more about how we can improve - in hopes of merging this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current aggregation code is too complicated, so we need to simplify things a bit before looking to expand. see my comments above about removing complexity.
The ideal is actually to move all the aggregation code to pandas/core/aggregation.py but creating an Aggregator sub-class depending on the types, similar to what we do with apply.py
I would take that as a refactor, or the expedient of moving some of the logic to free-functions.
all as a pre-cursor PR.
@@ -339,8 +340,22 @@ def _aggregate(self, arg, *args, **kwargs): | |||
raise SpecificationError("nested renamer is not supported") | |||
elif isinstance(obj, ABCSeries): | |||
raise SpecificationError("nested renamer is not supported") | |||
elif isinstance(obj, ABCDataFrame) and k not in obj.columns: | |||
raise KeyError(f"Column '{k}' does not exist!") | |||
elif isinstance(obj, ABCDataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do a pre-cursor PR to move the current code to pandas/core/aggregation.py (so that you just call a function here). this is too complicated
if k not in obj.columns: | ||
# Check if list thingy | ||
try: | ||
keys = np.frombuffer(k, dtype=np.dtype("<U1")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't want to do things like this, use is_list_like
raise KeyError(f"Column '{key}' does not exist!") | ||
|
||
# Memorize operation | ||
deserialized_keys[k] = keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we keeping state?
This MR seems inactive, how complex is this issue to pick up as a first issue? @jreback |
this particular issue is a bit complex however my comments above about simplifying the existing code could be a good first issue |
Closing as I think this has gone stale, but @fpunny ping if you'd like to pick back up |
@WillAyd If OP does not have time, I can give this a try, not sure if I can make it, but I would like to see how far I come. |
Sounds good! Feel free to clone this and push a new PR |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff