Dataframe.sum() returns an error with MultiIndex columns and skipna=False #37622

Closed
bertrandmarc opened this issue Nov 4, 2020 · 1 comment

@bertrandmarc

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np

iterables = [['bar', 'baz'], ['one', 'two'], ['dog', 'cat']]

df = pd.DataFrame(np.random.randn(2, 8), columns=pd.MultiIndex.from_product(iterables, names=['first', 'second', 'third']))

df.sum(axis=1, level=['first', 'third'], skipna=False)

>>> df
first        bar                                     baz                              
second       one                 two                 one                 two          
third        dog       cat       dog       cat       dog       cat       dog       cat
0      -1.093149  1.087846 -0.067161  1.129021 -1.024021  1.519433 -0.349039  1.215196
1       0.622675 -1.479139 -0.057143 -0.377272 -1.061766  1.606416 -0.172099 -0.564745

Problem description

DataFrame.sum raises a ValueError when called with MultiIndex columns, several levels and skipna=False. It works, however, with the transposed DataFrame (and axis=0), with a single level, or with skipna=True.

>>> df.sum(axis=1, level=['first', 'third'], skipna=False)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 368, in _get_axis_number
    return cls._AXIS_TO_AXIS_NUMBER[axis]
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 11419, in stat_func
    return self._agg_by_level(
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 10258, in _agg_by_level
    return grouped.aggregate(applyf)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/generic.py", line 959, in aggregate
    return self._python_agg_general(func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/groupby.py", line 1083, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/ops.py", line 641, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/ops.py", line 701, in _aggregate_series_pure_python
    res = func(group, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/groupby.py", line 1060, in <lambda>
    f = lambda x: func(x, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 10257, in <lambda>
    applyf = lambda x: method(x, axis=axis, skipna=skipna, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 11422, in stat_func
    return self._reduce(
  File "/usr/lib/python3/dist-packages/pandas/core/series.py", line 4223, in _reduce
    self._get_axis_number(axis)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 370, in _get_axis_number
    raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named 1 for object type Series
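
The last frames of the traceback suggest that the per-group function forwards axis=1 to a Series reduction, which only has axis 0. This is a reading of the traceback, not a confirmed diagnosis. The minimal sketch below does not use the level keyword at all and triggers the same kind of ValueError directly; the names `s` and `error_message` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for one group handed to the reduction.
s = pd.Series([1.0, 2.0, 3.0])

# A Series has a single axis (0, also called "index"), so axis=1 is rejected.
try:
    s.sum(axis=1, skipna=False)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, it is consistent with the transposed call working: there the reduction runs along axis=0, which a Series accepts.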

Expected Output

I would expect a DataFrame containing the column sums for each (first, third) group. Note that this works with a MultiIndex on the index (i.e. with the transposed DataFrame and axis=0):

>>> df.transpose().sum(level=['first', 'third'], skipna=False)
                    0         1
first third                    
bar   dog   -1.160310  0.565531
      cat    2.216866 -1.856411
baz   dog   -1.373060 -1.233865
      cat    2.734629  1.041672

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.9.0-1-amd64
Version : #1 SMP Debian 5.9.1-1 (2020-10-17)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 1.1.3
numpy : 1.19.3
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 50.3.0
Cython : None
pytest : 4.6.11
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.2.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2

@bertrandmarc added the Bug and Needs Triage labels on Nov 4, 2020
@jbrockmendel added the Reduction Operations label and removed the Needs Triage label on Jun 6, 2021
@mroeschke
Member

Thanks for the report, but the level keyword has been deprecated and removed from sum, so I don't think this bug is applicable anymore. Closing.
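
For readers on pandas versions where sum no longer accepts level, the following is a minimal sketch of one way to get the grouped column sums with NaN propagation. It is not taken from this thread: it assumes that grouping the transposed frame with groupby(level=...) and applying Series.sum(skipna=False) per group is an acceptable substitute. The deterministic data and the inserted NaN are made up for illustration.

```python
import numpy as np
import pandas as pd

iterables = [["bar", "baz"], ["one", "two"], ["dog", "cat"]]
columns = pd.MultiIndex.from_product(iterables, names=["first", "second", "third"])

# Deterministic data instead of random values, with one NaN added on purpose.
df = pd.DataFrame(np.arange(16, dtype=float).reshape(2, 8), columns=columns)
df.iloc[0, 0] = np.nan

# Group the transposed frame by the two column levels, sum each group
# without skipping NaN, then transpose back to the original orientation.
result = (
    df.T.groupby(level=["first", "third"], sort=False)
    .agg(lambda s: s.sum(skipna=False))
    .T
)

print(result)
```

The lambda is used because it relies only on Series.sum(skipna=False), which is the behaviour the original report asked for; whether a faster built-in path exists depends on the pandas version in use.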
