BUG: groupby dropna=False doesn't work for multi index columns #39895

deepers · 2021-02-18T21:09:13Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample

In [1]: df = pd.DataFrame(dict(a=[np.nan], b=[0], c=[1]))
In [2]: df
    a  b  c
0 NaN  0  1

# groupby drops groups with null by default
In [3]: df.groupby(['a', 'b']).first()
Empty DataFrame
Columns: [c]
Index: []

# pass dropna=False to keep groups with nulls
In [4]: df.groupby(['a', 'b'], dropna=False).first()
       c
a   b   
NaN 0  1

# This doesn't work if the groups are in a multi index.
In [5]: df.set_index(['a', 'b']).groupby(['a', 'b'], dropna=False).first()
Empty DataFrame
Columns: [c]
Index: []

Problem description

Setting dropna=False in groupby appears to have no effect when the grouping keys are levels of a MultiIndex rather than columns. I'm not sure whether this is intentional, but it is certainly confusing.

Expected Output

I would expect the output of [5] above to match that of [4].
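
The expectation above can be turned into a small check. This is only a sketch: the helper name `grouped_row_counts` is made up for illustration and is not part of pandas. It returns the number of groups produced by the column-based and the index-based groupby, so the two can be compared on whatever pandas version is installed.

```python
import numpy as np
import pandas as pd


def grouped_row_counts():
    """Return (rows when grouping by columns, rows when grouping by index levels).

    Hypothetical helper for illustration; not part of pandas.
    """
    df = pd.DataFrame({"a": [np.nan], "b": [0], "c": [1]})

    # Group by ordinary columns, keeping groups whose key contains NaN.
    by_columns = df.groupby(["a", "b"], dropna=False).first()

    # Group by the same keys after moving them into a MultiIndex.
    by_index = df.set_index(["a", "b"]).groupby(["a", "b"], dropna=False).first()

    return len(by_columns), len(by_index)


if __name__ == "__main__":
    print(grouped_row_counts())
```

If the two counts differ, the installed version shows the behaviour reported here; if they are equal, the index-based call is honouring dropna=False.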

Output of `pd.show_versions()`

{
  "system": {
    "commit": "7d32926db8f7541c356066dcadabf854487738de",
    "python": "3.8.5.final.0",
    "python-bits": 64,
    "OS": "Linux",
    "OS-release": "5.4.0-1035-aws",
    "Version": "#37-Ubuntu SMP Wed Jan 6 21:01:57 UTC 2021",
    "machine": "x86_64",
    "processor": "x86_64",
    "byteorder": "little",
    "LC_ALL": "C.UTF-8",
    "LANG": "C.UTF-8",
    "LOCALE": {
      "language-code": "en_US",
      "encoding": "UTF-8"
    }
  },
  "dependencies": {
    "pandas": "1.2.2",
    "numpy": "1.20.1",
    "pytz": "2021.1",
    "dateutil": "2.8.1",
    "pip": "20.3.3",
    "setuptools": "51.3.3",
    "Cython": null,
    "pytest": null,
    "hypothesis": null,
    "sphinx": null,
    "blosc": null,
    "feather": null,
    "xlsxwriter": null,
    "lxml.etree": null,
    "html5lib": null,
    "pymysql": null,
    "psycopg2": null,
    "jinja2": null,
    "IPython": "7.20.0",
    "pandas_datareader": null,
    "bs4": "4.9.3",
    "bottleneck": null,
    "fsspec": null,
    "fastparquet": "0.5.0",
    "gcsfs": null,
    "matplotlib": null,
    "numexpr": null,
    "odfpy": null,
    "openpyxl": null,
    "pandas_gbq": null,
    "pyarrow": "3.0.0",
    "pyxlsb": null,
    "s3fs": null,
    "scipy": null,
    "sqlalchemy": null,
    "tables": null,
    "tabulate": null,
    "xarray": null,
    "xlrd": null,
    "xlwt": null,
    "numba": "0.52.0"
  }
}

deepers · 2021-05-03T17:57:36Z

Attaching a proper test script.

import numpy as np
import pandas as pd


def main():
    # np.nan is the canonical spelling; the np.NaN alias was removed in NumPy 2.0
    df = pd.DataFrame(dict(a=[np.nan], b=[0], c=[1]))
    print('df', df, sep='\n')

    print('\ngroupby() drops groups with null by default')
    print(df.groupby(['a', 'b']).first())

    print('\npass dropna=False to keep groups with nulls')
    print(df.groupby(['a', 'b'], dropna=False).first())

    print("\nThis doesn't work if the groups are in a multi index.")
    print(df.set_index(['a', 'b']).groupby(['a', 'b'], dropna=False).first())


if __name__ == '__main__':
    main()

This is the output I see. I would expect the last two outputs to be the same.

df
    a  b  c
0 NaN  0  1

groupby() drops groups with null by default
Empty DataFrame
Columns: [c]
Index: []

pass dropna=False to keep groups with nulls
       c
a   b   
NaN 0  1

This doesn't work if the groups are in a multi index.
Empty DataFrame
Columns: [c]
Index: []
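
A possible workaround, not taken from this thread, is to move the index levels back into columns before grouping, which reduces the failing case to the column-based call that already keeps the NaN group. A minimal sketch, assuming the extra reset_index() step is acceptable for your data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan], "b": [0], "c": [1]})
indexed = df.set_index(["a", "b"])

# reset_index() turns the MultiIndex levels 'a' and 'b' back into columns,
# so the groupby below is the column-based case shown to work above.
result = indexed.reset_index().groupby(["a", "b"], dropna=False).first()
print(result)
```

The result is indexed by ('a', 'b') again, so downstream code that expects the MultiIndex layout should not need further changes.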

deepers added the Bug and Needs Triage labels Feb 18, 2021

mroeschke added the Groupby label and removed the Needs Triage label Aug 15, 2021

dchigarev mentioned this issue Dec 8, 2021

GroupBy ignores dropna=False parameter for aggregations implemented with the MapReduce approach modin-project/modin#3817

Closed

rhshadrach added this to the Contributions Welcome milestone Feb 3, 2022

GYHHAHA mentioned this issue Jul 14, 2022

TST: add test for groupby with dropna=False on multi-index #47717

Merged

mroeschke closed this as completed in #47717 Jul 14, 2022
