Groupby on empty frame different results based on dtypes. #20888

amanhanda · 2018-04-30T21:36:51Z

Code Sample, a copy-pastable example if possible

In [117]: import pandas as pd

In [118]: df = pd.DataFrame({"a":[1], "b":[2], "c":[3], "d":[4]})

In [119]: group_keys = ["a", "b", "c"]

In [120]: g = df[df.a==2].groupby(group_keys)

In [121]: g.first().index
Out[121]:
MultiIndex(levels=[[], [], []],
           labels=[[], [], []],
           names=[u'a', u'b', u'c'])
# Change the dtype for "d"
In [122]: df = pd.DataFrame({"a":[1], "b":[2], "c":[3], "d":["d"]})

In [123]: g = df[df.a==2].groupby(group_keys)

In [124]: g.first().index
Out[124]: Index([], dtype='object')

# Version 0.18
In [36]: g.first().index
Out[36]:
MultiIndex(levels=[[], [], []],
           labels=[[], [], []],
           names=[u'a', u'b', u'c'])

Problem description

The groupby should return a MultiIndex in both cases. When all columns are int64, calling `first()` on the grouped empty frame returns an empty MultiIndex. When the non-key column "d" has str (object) dtype, it returns a flat, empty `Index` instead, which is inconsistent. The previous version, 0.18.0, returned a MultiIndex in both cases.

Expected Output

In [36]: g.first().index
Out[36]:
MultiIndex(levels=[[], [], []],
           labels=[[], [], []],
           names=[u'a', u'b', u'c'])

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.36.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.0.1
pyarrow: 0.9.0
xarray: 0.10.2
IPython: 5.6.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.2
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.2.1
bs4: 4.3.2
html5lib: 0.999
sqlalchemy: 1.2.6
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


WillAyd · 2018-04-30T22:34:55Z

Can confirm this was happening in 0.22 but looks to be fine on master

amanhanda · 2018-05-01T11:06:14Z

What would be the prescribed workaround in 0.22?
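The thread does not record an answer to this question. One possible workaround is sketched below. It assumes that only the index type is wrong, since the result has no rows either way, and it has not been verified against 0.22. The idea is to replace the flat index with an empty MultiIndex built from the group keys. `with_key_multiindex` is a hypothetical helper, not a pandas API.

```python
import pandas as pd


def with_key_multiindex(result, keys):
    """Hypothetical helper (not part of pandas): if an empty groupby result
    came back with a flat Index, give it an empty MultiIndex named after keys."""
    if len(keys) > 1 and len(result) == 0 and not isinstance(result.index, pd.MultiIndex):
        result = result.copy()  # leave the caller's frame untouched
        # One empty array per key yields a zero-length MultiIndex with len(keys) levels.
        result.index = pd.MultiIndex.from_arrays([[] for _ in keys], names=keys)
    return result


df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": ["d"]})
group_keys = ["a", "b", "c"]
# On versions without the bug, the result is already a MultiIndex and is returned unchanged.
result = with_key_multiindex(df[df.a == 2].groupby(group_keys).first(), group_keys)
print(type(result.index).__name__, list(result.index.names))
```

The helper only acts when the result is empty and the index is flat, so non-empty results and already-correct indexes pass through unchanged.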

mroeschke · 2019-10-27T01:23:16Z

Yup looks to work on master. Could use a test.

In [181]: import pandas as pd
     ...: df = pd.DataFrame({"a":[1], "b":[2], "c":[3], "d":[4]})
     ...: group_keys = ["a", "b", "c"]
     ...: g = df[df.a==2].groupby(group_keys)
     ...: g.first().index
Out[181]: MultiIndex([], names=['a', 'b', 'c'])

In [182]: df = pd.DataFrame({"a":[1], "b":[2], "c":[3], "d":["d"]})

In [183]: g = df[df.a==2].groupby(group_keys)

In [184]: g.first().index
Out[184]: MultiIndex([], names=['a', 'b', 'c'])

In [185]: g.first().index
Out[185]: MultiIndex([], names=['a', 'b', 'c'])

In [186]: pd.__version__
Out[186]: '0.26.0.dev0+682.g08ab156eb'
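Since the comment notes that the fix could use a test, here is a minimal regression-test sketch. It is hypothetical and is not the test that was merged in #29455. It rebuilds the reproduction for both an int64 and an object-dtype "d" column and asserts that `first()` on the empty grouped frame returns an empty MultiIndex named after the group keys. It can be run directly or collected by pytest.

```python
import pandas as pd


def test_empty_frame_groupby_index_is_multiindex():
    # Hypothetical regression test for GH 20888 (not the test merged in #29455).
    group_keys = ["a", "b", "c"]
    # The key columns are identical; only the dtype of the value column "d" varies.
    for d_value in (4, "d"):  # int64, then object dtype
        df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [d_value]})
        # Filtering on a == 2 leaves an empty frame before grouping.
        result = df[df.a == 2].groupby(group_keys).first()
        assert len(result) == 0
        assert isinstance(result.index, pd.MultiIndex)
        assert list(result.index.names) == group_keys


if __name__ == "__main__":
    test_empty_frame_groupby_index_is_multiindex()
    print("ok")
```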


jbrockmendel added the Groupby label Jul 30, 2018

mroeschke added the good first issue and Needs Tests labels and removed the Groupby label Oct 27, 2019

ganevgv mentioned this issue Nov 7, 2019

TST: add test for empty frame groupby dtypes consistency #29455

Merged


gfyoung added this to the 1.0 milestone Nov 7, 2019

gfyoung closed this as completed in #29455 Nov 8, 2019

gfyoung pushed a commit that referenced this issue Nov 8, 2019

TST: add test for empty frame groupby dtypes consistency (#29455)

c2305d7

Closes #20888

Reksbril pushed a commit to Reksbril/pandas that referenced this issue Nov 18, 2019

TST: add test for empty frame groupby dtypes consistency (pandas-dev#…

3e7dc06

…29455) Closes pandas-dev#20888

proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019

TST: add test for empty frame groupby dtypes consistency (pandas-dev#…

1aa4c5e

…29455) Closes pandas-dev#20888

proost pushed a commit to proost/pandas that referenced this issue Dec 19, 2019

TST: add test for empty frame groupby dtypes consistency (pandas-dev#…

03a39ff

…29455) Closes pandas-dev#20888
