
Shortcut functions in transform are not grouped #19354


Closed
DiegoAlbertoTorres opened this issue Jan 22, 2018 · 3 comments · Fixed by #41697
Labels: good first issue, Needs Tests (Unit test(s) needed to prevent regressions)

Comments

@DiegoAlbertoTorres
Contributor

$ ipython3
Python 3.6.3 (default, Oct  3 2017, 21:45:48) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'A': [1, 1, 2], 'B': [1, 2, 3]})

In [3]: df.groupby('A').transform('rank')
Out[3]: 
   B
0  1
1  1
2  2

In [4]: df.groupby('A').transform(lambda x: x.rank())
Out[4]: 
     B
0  1.0
1  2.0
2  1.0

Problem description

It seems that functions passed as shorthand strings, such as 'rank', do not respect the grouping of the data. In the code above, one would expect Out[3] and Out[4] to be identical. Instead, .transform('rank') appears to ignore the grouping and produces values that are not the within-group ranks. I've reproduced this with larger DataFrames as well.

@chris-b1
Contributor

Thanks for the report. It's not exactly that it ignores the grouping. Rather, it falls through to a fast path that assumes rank() is an aggregation, here:

def _transform_fast(self, result, obj):
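To see why the fast path yields 1, 1, 2, here is a minimal sketch, assuming the fast path broadcasts one result row per group back to the original rows by group number. This is a simplified illustration of that behavior, not the actual body of `_transform_fast`; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})
gb = df.groupby("A")["B"]

# rank() already returns one value per original row: [1.0, 2.0, 1.0]
ranked = gb.rank()

# Group number of each original row: [0, 0, 1]
ids = gb.ngroup().to_numpy()

# An aggregation fast path expects one row per group and copies row k
# to every member of group k. Applied to a per-row result, it instead
# picks rows 0, 0 and 1 of the ranks.
broadcast = pd.Series(ranked.to_numpy()[ids], index=df.index, name="B")

print(ranked.tolist())     # [1.0, 2.0, 1.0]
print(broadcast.tolist())  # [1.0, 1.0, 2.0]
```

The broadcast values match Out[3] from the report, which is consistent with rank() being mistaken for an aggregation.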

I think a quick fix would be to add 'rank' to the list of whitelisted transforms here, though we might need to rethink or refactor this whole dispatch:

_cython_transforms = frozenset(['cumprod', 'cumsum', 'shift',
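The quick fix amounts to routing 'rank' down the same-shape path instead of the aggregate-and-broadcast path. Below is a sketch of that dispatch using a hypothetical helper `transform_by_name` and a hypothetical `SAME_SHAPE_TRANSFORMS` set. The set is an illustrative subset, not pandas' actual whitelist.

```python
import pandas as pd

# Hypothetical, illustrative subset of method names whose result already
# has one value per input row; not the real pandas whitelist.
SAME_SHAPE_TRANSFORMS = frozenset(["cumprod", "cumsum", "shift", "rank"])


def transform_by_name(gb, name):
    """Hypothetical dispatch for SeriesGroupBy.transform(name)."""
    result = getattr(gb, name)()
    if name in SAME_SHAPE_TRANSFORMS:
        # Already aligned with the original rows, so return it unchanged.
        return result
    # Otherwise treat it as an aggregation (one row per group, in group
    # order) and broadcast each group's value back to its member rows.
    ids = gb.ngroup()
    return pd.Series(result.to_numpy()[ids.to_numpy()], index=ids.index, name=result.name)


df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})
gb = df.groupby("A")["B"]
print(transform_by_name(gb, "rank").tolist())  # [1.0, 2.0, 1.0]
print(transform_by_name(gb, "mean").tolist())  # [1.5, 1.5, 3.0]
```

A name-based whitelist is easy to patch but fragile: any per-row method missing from the set silently gets the broadcast path, which is why a broader rethink of the dispatch may be needed.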

Also xref to #15779

@chris-b1 chris-b1 added this to the Next Major Release milestone Jan 22, 2018
@jreback
Contributor

jreback commented Jan 22, 2018

xref #14741 and #11759

@DiegoAlbertoTorres a PR with a fix would be welcome here!

@mroeschke
Member

Looks like these match on master now. This could use a test.

In [99]: In [1]: import pandas as pd
    ...:
    ...: In [2]: df = pd.DataFrame({'A': [1, 1, 2], 'B': [1, 2, 3]})
    ...:
    ...: In [3]: df.groupby('A').transform('rank')
Out[99]:
     B
0  1.0
1  2.0
2  1.0

In [100]: In [4]: df.groupby('A').transform(lambda x: x.rank())
     ...:
Out[100]:
     B
0  1.0
1  2.0
2  1.0

In [101]: df
Out[101]:
   A  B
0  1  1
1  1  2
2  2  3

In [102]: pd.__version__
Out[102]: '1.1.0.dev0+1974.g0159cba6e'
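Here is a sketch of the missing regression test, in pytest style with a hypothetical test name. It checks that the string shortcut agrees with the equivalent lambda and with hand-computed per-group ranks.

```python
import pandas as pd
import pandas.testing as tm


def test_groupby_transform_rank_respects_groups():
    df = pd.DataFrame({"A": [1, 1, 2], "B": [1, 2, 3]})

    result = df.groupby("A").transform("rank")
    via_lambda = df.groupby("A").transform(lambda x: x.rank())

    # Ranks within each group of A: B=[1, 2] -> [1.0, 2.0]; B=[3] -> [1.0]
    expected = pd.DataFrame({"B": [1.0, 2.0, 1.0]})

    tm.assert_frame_equal(result, expected)
    tm.assert_frame_equal(result, via_lambda)
```

Under pytest the function is collected automatically. Comparing against explicit expected values, not only against the lambda, guards against both code paths regressing in the same way.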

@mroeschke mroeschke added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Apply (Apply, Aggregate, Transform, Map), Bug and Groupby labels Jun 28, 2020
@mroeschke mroeschke mentioned this issue May 28, 2021
@mroeschke mroeschke modified the milestones: Contributions Welcome, 1.3 May 28, 2021