BUG: Multiindex.nunique raises NotImplementedError #34019

Closed
2 of 3 tasks
mizuy opened this issue May 6, 2020 · 5 comments · Fixed by #47638
Labels: good first issue; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); MultiIndex; Needs Tests (unit test(s) needed to prevent regressions)


mizuy commented May 6, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

pd.DataFrame([[1,2],[1,2]]).set_index([0,1]).index.nunique()
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-80-09716b89b157> in <module>
----> 1 pd.DataFrame([[1,2],[1,2]]).set_index([0,1]).index.nunique()

/opt/conda/lib/python3.7/site-packages/pandas/core/base.py in nunique(self, dropna)
   1284         uniqs = self.unique()
   1285         n = len(uniqs)
-> 1286         if dropna and isna(uniqs).any():
   1287             n -= 1
   1288         return n

/opt/conda/lib/python3.7/site-packages/pandas/core/dtypes/missing.py in isna(obj)
    124     Name: 1, dtype: bool
    125     """
--> 126     return _isna(obj)
    127 
    128 

/opt/conda/lib/python3.7/site-packages/pandas/core/dtypes/missing.py in _isna_new(obj)
    136     # hack (for now) because MI registers as ndarray
    137     elif isinstance(obj, ABCMultiIndex):
--> 138         raise NotImplementedError("isna is not defined for MultiIndex")
    139     elif isinstance(obj, type):
    140         return False

NotImplementedError: isna is not defined for MultiIndex

Exactly the same error is raised by pandas 0.25.3.

Problem description

The method exists, but it always raises NotImplementedError. As a workaround I can use len(df.index.unique()). The documentation for an earlier pandas version had a MultiIndex.nunique entry (https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.MultiIndex.unique.html), but it has since disappeared. So this might not be a bug but an intended exception.
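
For reference, the workaround mentioned above can be written out as a small script. This is only a sketch of the substitute described in the report: it counts distinct index entries with len() and does no NA handling, so it corresponds to what nunique(dropna=False) would be expected to return rather than to the default dropna=True behavior.

```python
import pandas as pd

# Same frame as in the report: two identical rows, both columns moved into the index.
df = pd.DataFrame([[1, 2], [1, 2]]).set_index([0, 1])

# Workaround: count the distinct index entries without calling nunique(),
# so the isna() check that raises for MultiIndex is never reached.
n_unique = len(df.index.unique())
print(n_unique)  # 1
```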

Expected Output

The call should return 1.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1
setuptools : 46.1.3.post20200325
Cython : 0.29.17
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.14.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.0
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : None
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.8
numba : 0.48.0

mizuy added the Bug and Needs Triage labels May 6, 2020
dsaxton added the Index label May 6, 2020
Member

dsaxton commented May 6, 2020

To me this feels like a bug, or at the very least it is probably not the intended behavior. Defining isna on a MultiIndex so that this doesn't raise could make sense (or, failing that, skipping the NA check for MultiIndex). It does raise the question of what it means for a MultiIndex value to be NA: do all the values in an entry need to be NA, or just at least one? If the latter, the adjustment within nunique becomes more complicated.
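
To make the "all" versus "at least one" question concrete, here is a minimal sketch that computes both counts by expanding the MultiIndex into a DataFrame. It is an illustration only, not how pandas resolved the issue; the variable names are made up for this example. It also shows why the single `n -= 1` adjustment in the traceback would not be enough under the "at least one" rule: several distinct entries can each contain an NA.

```python
import numpy as np
import pandas as pd

# A small MultiIndex with one fully-NA entry, one partially-NA entry,
# and one duplicated complete entry.
mi = pd.MultiIndex.from_tuples(
    [(1.0, "a"), (np.nan, "b"), (np.nan, np.nan), (1.0, "a")]
)

# One row per distinct index entry, one column per level.
unique_rows = mi.to_frame(index=False).drop_duplicates()
na_mask = unique_rows.isna()

n_total = len(unique_rows)                  # all distinct entries
n_any = int((~na_mask.any(axis=1)).sum())   # drop entries with at least one NA
n_all = int((~na_mask.all(axis=1)).sum())   # drop only entries that are NA in every level

print(n_total, n_any, n_all)  # 3 1 2
```

Under the "at least one" rule two of the three distinct entries are dropped, so the count cannot be obtained by subtracting one from the number of unique entries.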

dsaxton added the Needs Discussion label and removed the Needs Triage label May 6, 2020
FRidh added a commit to FRidh/pandas that referenced this issue Dec 18, 2020

FRidh commented Dec 18, 2020

Proposed fix in #38558.

jreback added the Missing-data and MultiIndex labels and removed the Needs Discussion label Dec 22, 2020
@jbrockmendel
Member

This works on master; it could use a test.
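
A regression test along these lines might look like the sketch below. The function name is hypothetical, and this is not the test that was eventually merged; it simply asserts the expected output from the report and assumes a pandas version in which MultiIndex.nunique no longer raises.

```python
import pandas as pd


def test_multiindex_nunique_does_not_raise():
    # Hypothetical test name; reuses the example from the original report.
    mi = pd.DataFrame([[1, 2], [1, 2]]).set_index([0, 1]).index

    # Both rows share the same index entry (1, 2), so there is one unique value.
    assert mi.nunique() == 1
    # With no missing values, dropna should not change the result.
    assert mi.nunique(dropna=False) == 1
```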

jbrockmendel added the Needs Tests label and removed the Index label Jun 19, 2021
@migunasekera

take

Member

noatamir commented Jul 8, 2022

take

8 participants