Major Performance regression of df.groupby(..).indices

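I'm seeing a major performance regression in `df.groupby(..).indices` with pandas 1.1.5 compared to 1.1.3. On a 10-million-row frame with about 4000 groups, the same call goes from roughly 0.46 s to roughly 57 s.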

Version 1.1.3:
```
Python 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.19.0
Python 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)] on win32
In[2]: import time
 ... : import numpy as np
 ... : import pandas as pd
 ... : pd.__version__
Out[2]: '1.1.3'
In[3]: numel = 10000000
 ... : df = pd.DataFrame(dict(a=np.random.rand(numel), b=np.random.randint(0,4000, numel)))
 ... : start = time.time()
 ... : groupby_indices = df.groupby('b').indices
 ... : time.time() - start
Out[3]: 0.46085023880004883
```

Version 1.1.5:
```
Python 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.19.0
Python 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)] on win32
In[2]: import time
 ... : import numpy as np
 ... : import pandas as pd
 ... : pd.__version__
Out[2]: '1.1.5'
In[3]: numel = 10000000
 ... : df = pd.DataFrame(dict(a=np.random.rand(numel), b=np.random.randint(0,4000, numel)))
 ... : start = time.time()
 ... : groupby_indices = df.groupby('b').indices
 ... : time.time() - start
Out[3]: 57.36550998687744
```
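
For anyone who wants to reproduce this outside an interactive console, the timing above can be wrapped in a small script. This is only a sketch under a few assumptions: the helper name `time_groupby_indices`, the fixed seed, and the use of `time.perf_counter()` instead of `time.time()` are my own choices and were not part of the sessions above.

```python
import time

import numpy as np
import pandas as pd


def time_groupby_indices(numel, n_groups, seed=0):
    """Build a frame shaped like the one in the report and time .indices.

    Hypothetical helper for illustration; returns (elapsed_seconds, indices).
    """
    rng = np.random.RandomState(seed)  # fixed seed so runs are comparable
    df = pd.DataFrame(
        {"a": rng.rand(numel), "b": rng.randint(0, n_groups, numel)}
    )
    start = time.perf_counter()  # monotonic clock, finer than time.time()
    indices = df.groupby("b").indices  # dict: group key -> row positions
    elapsed = time.perf_counter() - start
    return elapsed, indices


# Small size so the sketch finishes quickly on any version;
# the report used numel=10_000_000 with 4000 groups.
elapsed, indices = time_groupby_indices(numel=100_000, n_groups=4000)
print(f"pandas {pd.__version__}: {elapsed:.3f} s for {len(indices)} groups")
```

Calling `time_groupby_indices(10_000_000, 4000)` matches the sizes used in the report. Running it in two environments that differ only in the pandas version should make the difference easy to compare.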

Major Performance regression of df.groupby(..).indices #38495
