Feature: Add "random" rank in the group for DataFrame.rank and similar functions. #31051
Comments
-1 on this. You simply want random numbers between min and max without replacement; np.random.randn already does this.
@jreback But I don't want any number between min and max for a certain group. I finally found a way to solve it by using np.random.shuffle.
In [1]: import numpy as np
...: import pandas as pd
...: df = pd.DataFrame([1, 2, 1, 3, 3, 2, 1, 3], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
...:
...: full = pd.concat([df, df.rank(method='first'), df.rank(method='min'), df.rank(method='max'), df.rank(method='dense'), df], axis=1, sort=False)
...: full.columns = ['value', 'first', 'min', 'max', 'dense', 'ranked']
...:
...: print(full)
...:
...: full_group = full.groupby('dense')
...:
...:
   value  first  min  max  dense  ranked
a      1    1.0  1.0  3.0    1.0       1
b      2    4.0  4.0  5.0    2.0       2
c      1    2.0  1.0  3.0    1.0       1
d      3    6.0  6.0  8.0    3.0       3
e      3    7.0  6.0  8.0    3.0       3
f      2    5.0  4.0  5.0    2.0       2
g      1    3.0  1.0  3.0    1.0       1
h      3    8.0  6.0  8.0    3.0       3
In [2]: for region_idxs in full_group.indices.values():
...:     tie_scores_shuffle = full.iloc[region_idxs, full.columns.get_loc('first')]
...:     if tie_scores_shuffle._is_view:
...:         tie_scores_shuffle = tie_scores_shuffle.copy()
...:     np.random.shuffle(tie_scores_shuffle)
...:     full.iloc[region_idxs, full.columns.get_loc('ranked')] = tie_scores_shuffle
...:
...: print(full)
...:
...:
   value  first  min  max  dense  ranked
a      1    1.0  1.0  3.0    1.0     3.0
b      2    4.0  4.0  5.0    2.0     5.0
c      1    2.0  1.0  3.0    1.0     2.0
d      3    6.0  6.0  8.0    3.0     7.0
e      3    7.0  6.0  8.0    3.0     6.0
f      2    5.0  4.0  5.0    2.0     4.0
g      1    3.0  1.0  3.0    1.0     1.0
h      3    8.0  6.0  8.0    3.0     8.0
In [3]: for region_idxs in full_group.indices.values():
...:     tie_scores_shuffle = full.iloc[region_idxs, full.columns.get_loc('first')]
...:     if tie_scores_shuffle._is_view:
...:         tie_scores_shuffle = tie_scores_shuffle.copy()
...:     np.random.shuffle(tie_scores_shuffle)
...:     full.iloc[region_idxs, full.columns.get_loc('ranked')] = tie_scores_shuffle
...:
...: print(full)
...:
...:
   value  first  min  max  dense  ranked
a      1    1.0  1.0  3.0    1.0     1.0
b      2    4.0  4.0  5.0    2.0     5.0
c      1    2.0  1.0  3.0    1.0     2.0
d      3    6.0  6.0  8.0    3.0     7.0
e      3    7.0  6.0  8.0    3.0     8.0
f      2    5.0  4.0  5.0    2.0     4.0
g      1    3.0  1.0  3.0    1.0     3.0
h      3    8.0  6.0  8.0    3.0     6.0
You can just do something like def f(x): in any event, this is out of scope for a method in pandas.
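The body of `f` is not shown in this thread, so the following is only a guess at the kind of per-group function that could work, not the maintainer's actual suggestion. It is a minimal sketch that computes `rank(method="first")` and then permutes those ranks within each group of tied values via `groupby(...).transform`. The name `random_rank_within_ties` and the `seed` parameter are hypothetical, introduced here for illustration.

```python
import numpy as np
import pandas as pd


def random_rank_within_ties(values: pd.Series, seed=None) -> pd.Series:
    """Rank `values`, assigning tied entries a random order (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Deterministic ranks first: ties are numbered in order of appearance.
    first = values.rank(method="first")

    def f(x: pd.Series) -> np.ndarray:
        # x holds the 'first' ranks of one tie group; hand them back shuffled.
        return rng.permutation(x.to_numpy())

    # Group the ranks by the original values so each tie group is shuffled
    # independently; transform keeps the original index order.
    return first.groupby(values).transform(f)


df = pd.DataFrame([1, 2, 1, 3, 3, 2, 1, 3],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(random_rank_within_ties(df[0], seed=0))
```

Each shuffled rank stays between the `min` and `max` rank of its tie group, which is the property asked for in the issue. Rows whose value is NaN come out as NaN, because `groupby` drops NaN keys by default.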
The following does what I want:
Code Sample, a copy-pastable example if possible
It would be nice if this could be implemented in pandas, as I have huge DataFrames (100G) that need this feature.
In the worst case, is there a way to do something like this with multiple pandas commands?
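One way to get this from existing pandas calls, without a Python-level loop over groups, is to shuffle the rows once, rank with `method="first"` (which breaks ties by order of appearance, now random), and then put the ranks back in the original row order. This is a sketch under the assumption that the column fits in memory; the function name `random_rank` and its `seed` parameter are made up for this example and are not part of pandas.

```python
import numpy as np
import pandas as pd


def random_rank(values: pd.Series, seed=None) -> pd.Series:
    """Rank with random tie-breaking using a single shuffle (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Random permutation of row positions.
    perm = rng.permutation(len(values))

    # In the shuffled order, 'first' resolves ties by (random) appearance.
    shuffled_ranks = values.iloc[perm].rank(method="first").to_numpy()

    # Scatter the ranks back to their original positions.
    out = np.empty(len(values), dtype=float)
    out[perm] = shuffled_ranks
    return pd.Series(out, index=values.index, name=values.name)


df = pd.DataFrame([1, 2, 1, 3, 3, 2, 1, 3],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(random_rank(df[0], seed=0))
```

Because the work is done positionally, duplicate index labels are not a problem. For data that does not fit in memory, the same idea would need a chunked or out-of-core sort, which this sketch does not attempt.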
A slightly related issue: #9481