You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When using pandas groupby with rolling, flag sort in groupby, sort order of group keys is changed. This occurs when we have more than 1 group key and values in key 2 are not equal for all values of key 1.
In example above we create dataframe with 4 columns, machine_id, product_id,sample_date and sales_qty. Dataframe consist of 2 different machine and 9 different products. Dataframe is sorted by: machine_id, product_id,sample_date. We want to calculate rolling mean for each machine_id-product_id group. However, if we remove some products for machine with lower machine_id (machine_id which is prior in sort, in this example 546) and then calculate rolling mean with flag sort= False in groupby order of group keys in result is actually changed. product_ids which were removed from machine with machine_id = 546 (product_id = 526,529,532) are at the end of dataframe.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Simpler reproducible example
Reproducible Example
Issue Description
When using pandas groupby with rolling, flag sort in groupby, sort order of group keys is changed. This occurs when we have more than 1 group key and values in key 2 are not equal for all values of key 1.
In example above we create dataframe with 4 columns, machine_id, product_id, sample_date and sales_qty. Dataframe consist of 2 different machine and 9 different products. Dataframe is sorted by: machine_id, product_id, sample_date. We want to calculate rolling mean for each machine_id-product_id group. However, if we remove some products for machine with lower machine_id (machine_id which is prior in sort, in this example 546) and then calculate rolling mean with flag
sort= False
in groupby order of group keys in result is actually changed. product_ids which were removed from machine with machine_id = 546 (product_id = 526,529,532) are at the end of dataframe.Expected Behavior
Because flag
sort = False
in groupby expected behaviour is that result will have same sort as the original dataframe, i.e. expected result is:Installed Versions
INSTALLED VERSIONS
commit : 91111fd
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-56-generic
Version : #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.1
numpy : 1.23.4
pytz : 2022.6
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : 8.6.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.3
numba : 0.56.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 10.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : 1.4.42
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None
The text was updated successfully, but these errors were encountered: