ENH: Named aggregations with multiple columns #33306

fpunny · 2020-04-05T19:28:13Z

closes Named aggregations with multiple columns #29268
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2020-04-05T19:28:17Z

Hello @fpunny! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-08 02:24:07 UTC

simonjayhawkins · 2020-04-06T10:29:50Z

Thanks @fpunny for the PR. Can you please add tests and a whatsnew entry, and update the docstrings and examples so users know how to use this functionality?

fpunny · 2020-04-08T00:30:35Z

@simonjayhawkins Hi Simon, we created the whatsnew entry, added some test cases, and passed all of the checks. This is our first enhancement for pandas, so we're hoping to get some feedback on our implementation. We're fairly new to the codebase and would love to learn how we can improve it, in hopes of getting this merged.

jreback

The current aggregation code is too complicated, so we need to simplify things a bit before looking to expand. See my inline comments about removing complexity.

The ideal is actually to move all the aggregation code to pandas/core/aggregation.py and create an Aggregator subclass depending on the types, similar to what we do with apply.py.

I would take that as a refactor, or the expedient of moving some of the logic to free functions.

All of this as a precursor PR.

jreback · 2020-04-10T21:18:39Z

pandas/core/base.py

@@ -339,8 +340,22 @@ def _aggregate(self, arg, *args, **kwargs):
                        raise SpecificationError("nested renamer is not supported")
                    elif isinstance(obj, ABCSeries):
                        raise SpecificationError("nested renamer is not supported")
-                    elif isinstance(obj, ABCDataFrame) and k not in obj.columns:
-                        raise KeyError(f"Column '{k}' does not exist!")
+                    elif isinstance(obj, ABCDataFrame):


Can you do a precursor PR to move the current code to pandas/core/aggregation.py (so that you just call a function here)? This is too complicated.

jreback · 2020-04-10T21:18:57Z

pandas/core/base.py

+                        if k not in obj.columns:
+                            # Check if list thingy
+                            try:
+                                keys = np.frombuffer(k, dtype=np.dtype("<U1"))


We don't want to do things like this; use is_list_like.
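
A minimal sketch of the check suggested here, assuming the goal is only to tell a single column label apart from a collection of labels. `is_list_like` is the public helper in `pandas.api.types`; the `normalize_key` wrapper is a hypothetical name used for illustration and is not part of the PR or of pandas.

```python
from pandas.api.types import is_list_like


def normalize_key(key):
    """Return the column labels named by ``key`` as a list.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # is_list_like treats strings as scalars, so "A" stays a single label.
    if is_list_like(key):
        return list(key)
    return [key]


print(normalize_key("A"))
print(normalize_key(("A", "B")))
```

Note that a tuple can also be a single label when the columns are a MultiIndex, so a real implementation would need to handle that case separately.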

jreback · 2020-04-10T21:19:07Z

pandas/core/base.py

+                                    raise KeyError(f"Column '{key}' does not exist!")
+
+                                # Memorize operation
+                                deserialized_keys[k] = keys


Why are we keeping state?
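
One way to read this question: the column lookup can be a pure function of its inputs, so nothing needs to be memorized between calls. The sketch below works under that assumption. `missing_columns` and `validate_keys` are hypothetical names, and plain Python containers stand in for `obj.columns`; only the error message is taken from the diff.

```python
from typing import Hashable, Iterable, List


def missing_columns(columns: Iterable[Hashable], keys: Iterable[Hashable]) -> List[Hashable]:
    """Return the keys that are not present in ``columns``, in input order."""
    known = set(columns)
    return [key for key in keys if key not in known]


def validate_keys(columns: Iterable[Hashable], keys: Iterable[Hashable]) -> None:
    """Raise KeyError for the first missing column; no state is kept."""
    missing = missing_columns(columns, keys)
    if missing:
        raise KeyError(f"Column '{missing[0]}' does not exist!")


# Passes silently because both keys exist.
validate_keys(["A", "B", "C"], ["A", "B"])
```

Because the result depends only on the arguments, a free function like this could live in a separate module and be called from `_aggregate`, which is the kind of simplification the review asks for.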

erfannariman · 2020-05-25T19:23:19Z

This PR seems inactive. How complex is this issue to pick up as a first issue? @jreback

jreback · 2020-05-25T20:55:01Z

This particular issue is a bit complex.

However, my comments above about simplifying the existing code could be a good first issue.

WillAyd · 2020-06-26T15:42:34Z

Closing as I think this has gone stale, but @fpunny, ping if you'd like to pick this back up.

erfannariman · 2020-06-26T18:31:35Z

@WillAyd If the OP does not have time, I can give this a try. I'm not sure I can finish it, but I would like to see how far I get.

WillAyd · 2020-06-26T18:40:05Z

Sounds good! Feel free to clone this and push a new PR.

Finished implementation of aggregation feature

9285b6c

simonjayhawkins added the API Design, Enhancement, and Groupby labels Apr 6, 2020

fpunny and others added 11 commits April 6, 2020 18:31

Merge branch 'master' of github.com:CSCD01/pandas-team24 into issue-2…

cd037e0

…9268

Fixed attributeError catch

37522f0

Updated documentation to reflect multi column named aggregation

96346f9

Changes to documentation

ccbc403

Fixed lint issue

046e9cb

fixed zip

1919892

Add test cases to groupby aggregate

6817d26

Fixed implementation

35d9129

Merge branch 'issue-29268' of github.com:CSCD01/pandas-team24 into is…

22499b8

…sue-29268

Run linter

7016e67

Added whatsnew entry

883a601

William Granados added 2 commits April 7, 2020 21:54

changed missing key test

f599eb5

fixed failing pip8 error

2b08514

jreback requested changes Apr 10, 2020

WillAyd closed this Jun 26, 2020
