I stumbled upon the following behavior of groupby with categoricals, which seems at least inconsistent with the way groupby usually operates.
When grouping on a string type column with sort=False, the order of the groups is the order in which the keys first appear in the column.
However, when grouping on a categorical column, the groups always seem to be ordered by the categories, even when sort=False.
import numpy as np
import pandas as pd

d = {'foo': [10, 8, 5, 6, 4, 1, 7],
     'bar': [10, 20, 30, 40, 50, 60, 70],
     'baz': ['d', 'c', 'e', 'a', 'a', 'd', 'c']}
df = pd.DataFrame(d)
cat = pd.cut(df['foo'], np.linspace(0, 10, 5))
df['range'] = cat
groups = df.groupby('range', sort=True)
# Expected behaviour
result = groups.agg('mean')
# Why are the categories still sorted in this case?
groups2 = df.groupby('range', sort=False)
result2 = groups2.agg('mean')
# I would expect an output like this one: keep the order in which the groups
# are first encountered
groups3 = df.groupby('baz', sort=False)
result3 = groups3.agg('mean')
result
           bar  foo
range
(0, 2.5]    60  1.0
(2.5, 5]    40  4.5
(5, 7.5]    55  6.5
(7.5, 10]   15  9.0
result2
           bar  foo
range
(0, 2.5]    60  1.0
(2.5, 5]    40  4.5
(5, 7.5]    55  6.5
(7.5, 10]   15  9.0
result3
     bar  foo
baz
d     35  5.5
c     45  7.5
e     30  5.0
a     45  5.0
pd.__version__
Out[110]: '0.15.1'
Setting as_index=False does not change the behavior shown above.
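To make the expected ordering concrete, here is a minimal pandas-free sketch of what "order of first appearance" means for the data above. It is not how pandas implements groupby; `group_means` and `bin_label` are hypothetical helpers written only for this illustration, and the bin edges are typed out by hand to match `np.linspace(0, 10, 5)`.

```python
foo = [10, 8, 5, 6, 4, 1, 7]
bar = [10, 20, 30, 40, 50, 60, 70]
baz = ['d', 'c', 'e', 'a', 'a', 'd', 'c']

# Hand-written equivalent of np.linspace(0, 10, 5) (assumption for this sketch).
edges = [0.0, 2.5, 5.0, 7.5, 10.0]


def bin_label(x):
    """Return the right-closed bin (lo, hi] containing x, or None if out of range."""
    for lo, hi in zip(edges, edges[1:]):
        if lo < x <= hi:
            return (lo, hi)
    return None


def group_means(keys, columns, sort):
    """Per-key mean of each column.

    Groups come back in sorted key order when sort is True, otherwise in the
    order in which each key first appears (dicts keep insertion order).
    """
    rows_by_key = {}
    for i, key in enumerate(keys):
        rows_by_key.setdefault(key, []).append(i)
    ordered = sorted(rows_by_key) if sort else list(rows_by_key)
    return [
        (key, tuple(sum(col[i] for i in rows_by_key[key]) / len(rows_by_key[key])
                    for col in columns))
        for key in ordered
    ]


ranges = [bin_label(x) for x in foo]

# String keys: first-appearance order, as in result3.
baz_unsorted = group_means(baz, (bar, foo), sort=False)
# Binned keys: the order the report expects from sort=False ...
range_unsorted = group_means(ranges, (bar, foo), sort=False)
# ... versus the sorted order that both result and result2 show.
range_sorted = group_means(ranges, (bar, foo), sort=True)

for name, rows in [('baz, sort=False', baz_unsorted),
                   ('range, sort=False', range_unsorted),
                   ('range, sort=True', range_sorted)]:
    print(name)
    for key, (bar_mean, foo_mean) in rows:
        print('  ', key, bar_mean, foo_mean)
```

Under this reading, the sort=False grouping on the binned column would start with the (7.5, 10] group, because the first value of foo falls into it, rather than with (0, 2.5].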
Currently this will only work naively, that is, the Intervals that are returned are strings, e.g. (7.5, 10]. In 0.16 there is work being done on an Interval/IntervalIndex, which will allow this to actually be sorted in a certain order; see #8707.
As always, pull requests are welcome to work on these issues (though reporting them makes good test cases too!).