
ENH: allow 'pad', 'backfill' and 'cumcount' in groupby.transform #31269


Closed
fujiaxiang opened this issue Jan 24, 2020 · 13 comments · Fixed by #34453

@fujiaxiang
Member

Summary

Currently on master, groupby.transform with func equal to one of ('fillna', 'pad', 'backfill', 'ffill', etc.) yields wrong results (see #30918).

With the updates from PR #31101, the incorrect outputs are fixed. However, when func is one of ('pad', 'backfill', 'cumcount'), groupby.transform raises an AttributeError similar to the one reported in #27472.

Code Sample

# On branch of PR31101
>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'1.0.0rc0+162.g56c70234e'

>>> df = pd.DataFrame(
...     {
...         "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
...         "B": [1, 2, np.nan, 3, 3, np.nan, 4],
...     }
... )
>>> df
     A    B
0  foo  1.0
1  foo  2.0
2  foo  NaN
3  foo  3.0
4  bar  3.0
5  bar  NaN
6  baz  4.0

>>> df.groupby("A").pad()  # This is ok
     B
0  1.0
1  2.0
2  2.0
3  3.0
4  3.0
5  3.0
6  4.0

>>> df.groupby("A").transform("pad")  # This raises AttributeError
Traceback (most recent call last):
...
...
AttributeError: 'Series' object has no attribute 'pad'

>>> df.groupby("A").transform("cumcount")
Traceback (most recent call last):
...
...
AttributeError: 'Series' object has no attribute 'cumcount'
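For reference, the `df.groupby("A").pad()` output above fills each NaN with the last valid value from the same group only, never carrying a value across groups. A minimal plain-Python sketch of those semantics follows; `grouped_ffill` is a hypothetical helper for illustration, not a pandas API.

```python
import math
from typing import Hashable, List, Sequence


def grouped_ffill(keys: Sequence[Hashable], values: Sequence[float]) -> List[float]:
    """Forward-fill NaNs within each group, never carrying values across groups."""
    last_valid = {}  # most recent non-NaN value seen so far, per group key
    out = []
    for key, value in zip(keys, values):
        value = float(value)
        if math.isnan(value):
            # Use the group's last valid value; stays NaN if the group has none yet
            out.append(last_valid.get(key, math.nan))
        else:
            last_valid[key] = value
            out.append(value)
    return out


print(grouped_ffill(
    ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
    [1, 2, math.nan, 3, 3, math.nan, 4],
))
# [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0]
```

This reproduces column B of the `pad()` output, which is what `transform("pad")` would be expected to return.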

Ideally, we want to allow all three of the above functions in groupby.transform.
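For cumcount, `GroupBy.cumcount()` numbers the rows of each group 0, 1, 2, ... in order of appearance, so `transform("cumcount")` would be expected to return that numbering aligned with the original rows. A hypothetical plain-Python helper (`grouped_cumcount` is not a pandas API) sketching the expected result for column A:

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def grouped_cumcount(keys: Sequence[Hashable]) -> List[int]:
    """Return each row's 0-based position within its group, in original row order."""
    seen = defaultdict(int)  # number of rows seen so far per group key
    out = []
    for key in keys:
        out.append(seen[key])
        seen[key] += 1
    return out


print(grouped_cumcount(["foo", "foo", "foo", "foo", "bar", "bar", "baz"]))
# [0, 1, 2, 3, 0, 1, 0]
```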

@punndcoder28
Contributor

Hey! I would like to work on this. Could you guide me?

@fujiaxiang
Member Author

@punndcoder28 The relevant PR #31101 has not been merged yet. Once it is merged, you can try these three functions on master and see what you can do. You can also take a look at #27472 to see what has been discussed.

@punndcoder28
Contributor

Ok. I will keep an eye out for PR #31101 and start as soon as it is merged.

@punndcoder28
Contributor

Hey @fujiaxiang. I tried out the functions on master now that PR #31101 has been merged. I am getting the AttributeError only for cumcount.

@fujiaxiang
Member Author

Are you sure? I just tried again on current master and I got AttributeError for all 3. Please note that I'm talking about df.groupby("A").transform("pad"), and not df.groupby("A").pad().

@punndcoder28
Contributor

punndcoder28 commented Jan 26, 2020

This is the code I am executing:

>>> df1 = pd.DataFrame(
...     {
...         "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
...         "B": [1, 2, np.nan, 3, 3, np.nan, 4],
...     }
... )
>>> df1.groupby("A").transform("pad")
     B
0  2.0
1  2.0
2  2.0
3  2.0
4  1.0
5  1.0
6  2.0
>>> df1.groupby("A").transform("cumcount")
Traceback (most recent call last):
  ...
AttributeError: ("'Series' object has no attribute 'cumcount'", 'occurred at index B')
>>> df1.groupby("A").transform("backfill")
     B
0  3.0
1  3.0
2  3.0
3  3.0
4  1.0
5  1.0
6  2.0

Am I missing something?

@fujiaxiang
Member Author

I think you are not on the latest master. The results you showed for both pad and backfill are incorrect and are in line with the behavior prior to PR #31101. Could you pull the latest master first?
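For comparison, a per-group backfill of df1 should only pull values from later rows of the same group, so the trailing NaN in the bar group has nothing to fill from and stays NaN. A minimal plain-Python sketch of those expected semantics follows; `grouped_bfill` is a hypothetical helper, not pandas code.

```python
import math
from typing import Hashable, List, Sequence


def grouped_bfill(keys: Sequence[Hashable], values: Sequence[float]) -> List[float]:
    """Backward-fill NaNs within each group, scanning from the last row up."""
    next_valid = {}  # nearest later non-NaN value seen so far, per group key
    out = [math.nan] * len(values)
    for i in range(len(values) - 1, -1, -1):
        key, value = keys[i], float(values[i])
        if math.isnan(value):
            # Use the group's nearest later valid value; stays NaN if there is none
            out[i] = next_valid.get(key, math.nan)
        else:
            next_valid[key] = value
            out[i] = value
    return out


print(grouped_bfill(
    ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
    [1, 2, math.nan, 3, 3, math.nan, 4],
))
# [1.0, 2.0, 3.0, 3.0, 3.0, nan, 4.0]
```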

@punndcoder28
Contributor

Yes, my mistake. I built from the latest master and I am getting the errors.

@punndcoder28
Contributor

Hey @fujiaxiang. I went through the code but couldn't figure out where to make the changes. Since you worked on a similar issue, could you give me some pointers?

@fujiaxiang
Member Author

As a general guide, try looking at how groupby.transform handles other similar methods and mimic that. Implementations of these functions already exist elsewhere in the code, so this shouldn't be too difficult.
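One way to read this advice: when transform receives a string that names a groupby method, it can call that method on the groupby object itself instead of looking the name up on each group's values. The toy sketch below illustrates only that dispatch idea; it is not pandas code, and `ToyGroupBy` and `GROUPBY_TRANSFORMS` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Union


class ToyGroupBy:
    """Hypothetical, heavily simplified groupby over parallel key/value lists."""

    # Hypothetical allowlist: string names computed by the groupby object itself.
    GROUPBY_TRANSFORMS = frozenset({"pad", "ffill", "backfill", "bfill", "cumcount"})

    def __init__(self, keys: Sequence[Hashable], values: Sequence[float]):
        self.keys = list(keys)
        self.values = [float(v) for v in values]

    def _positions(self) -> Dict[Hashable, List[int]]:
        """Map each group key to its row positions, in original order."""
        groups: Dict[Hashable, List[int]] = defaultdict(list)
        for i, key in enumerate(self.keys):
            groups[key].append(i)
        return groups

    def _fill(self, reverse: bool) -> List[float]:
        """Fill NaNs from the previous (or, if reverse, the next) valid value in the same group."""
        out = list(self.values)
        for rows in self._positions().values():
            carry = math.nan
            for i in (reversed(rows) if reverse else rows):
                if math.isnan(out[i]):
                    out[i] = carry
                else:
                    carry = out[i]
        return out

    def pad(self) -> List[float]:
        return self._fill(reverse=False)

    def backfill(self) -> List[float]:
        return self._fill(reverse=True)

    # Aliases, mirroring the pad/ffill and backfill/bfill naming
    ffill = pad
    bfill = backfill

    def cumcount(self) -> List[int]:
        """0-based position of each row within its group."""
        out = [0] * len(self.keys)
        for rows in self._positions().values():
            for n, i in enumerate(rows):
                out[i] = n
        return out

    def transform(self, func: Union[str, Callable[[List[float]], List[float]]]) -> list:
        if isinstance(func, str):
            if func in self.GROUPBY_TRANSFORMS:
                # Route the name to the groupby's own method rather than
                # looking it up on each group's values.
                return getattr(self, func)()
            raise AttributeError(f"unsupported transform name: {func!r}")
        # Callable: apply per group, then scatter results back to original rows.
        out: list = [math.nan] * len(self.values)
        for rows in self._positions().values():
            result = func([self.values[i] for i in rows])
            for i, r in zip(rows, result):
                out[i] = r
        return out


g = ToyGroupBy(
    ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
    [1, 2, math.nan, 3, 3, math.nan, 4],
)
print(g.transform("pad"))       # [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0]
print(g.transform("cumcount"))  # [0, 1, 2, 3, 0, 1, 0]
```

The design choice being illustrated is the allowlist check before falling back to per-group application; how pandas actually organizes this internally may differ.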

@fujiaxiang
Member Author

@punndcoder28 are you still interested? If not, I will take a look at it myself.

@punndcoder28
Contributor

Yes, I am still interested. I am looking at the definitions of similar functions, as you advised.

@punndcoder28
Contributor

Hey @fujiaxiang. I was unable to find the definitions of the other similar functions.
