BUG: Discrepancy between .is_numeric() and is_numeric_dtype for bool Index #51152

Closed
3 tasks done
jrbourbeau opened this issue Feb 3, 2023 · 9 comments · Fixed by #51160
Labels: Bug, Index (related to the Index class or subclasses)

@jrbourbeau
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas.api.types import is_numeric_dtype

idx = pd.Index([True], dtype=bool)

print(f"{idx.is_numeric() = }")
print(f"{is_numeric_dtype(idx) = }")

Issue Description

With the latest nightly wheel of pandas, the following deprecation message is given:

FutureWarning: Index.is_numeric is deprecated. Use pandas.api.types.is_numeric_dtype instead

However, when accounting for this deprecation over in dask/dask, we observed that is_numeric() and is_numeric_dtype don't give the same result for an Index with bool data.

Expected Behavior

I'd expect both to give the same answer.
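The mismatch can be narrowed down with public dtype checks alone. The sketch below deliberately avoids calling the deprecated Index.is_numeric() (so it emits no FutureWarning) and only shows that is_numeric_dtype treats a bool Index as numeric, while is_bool_dtype identifies the same Index as boolean. The variable names are illustrative, and the opposite answer from idx.is_numeric() is taken from this report rather than re-run here.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

# Same Index as in the reproducible example above.
idx = pd.Index([True], dtype=bool)

# is_numeric_dtype counts boolean data as numeric.
numeric_by_dtype = is_numeric_dtype(idx)

# is_bool_dtype confirms the Index really holds boolean data.
boolean_by_dtype = is_bool_dtype(idx)

print(f"{numeric_by_dtype = }")
print(f"{boolean_by_dtype = }")
```

Both checks report True, so a caller that swaps idx.is_numeric() for is_numeric_dtype(idx) would start accepting boolean indexes unless it also filters them out.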

Installed Versions

INSTALLED VERSIONS
------------------
commit           : f06c96a93fb2e21c9f801192d9ed5896c5ce3535
python           : 3.10.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 22.2.0
Version          : Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.0.0.dev0+1401.gf06c96a93f
numpy            : 1.24.1
pytz             : 2022.1
dateutil         : 2.8.2
setuptools       : 59.8.0
pip              : 22.0.4
Cython           : None
pytest           : 7.1.3
hypothesis       : None
sphinx           : 4.5.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.2.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : None
brotli           :
fastparquet      : 2022.12.1.dev6
fsspec           : 2023.1.0+5.g012816b
gcsfs            : None
matplotlib       : 3.5.1
numba            : None
numexpr          : 2.8.0
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 11.0.0.dev316
pyreadstat       : None
pyxlsb           : None
s3fs             : 2022.10.0
scipy            : 1.9.0
snappy           :
sqlalchemy       : 1.4.35
tables           : 3.7.0
tabulate         : None
xarray           : 2022.3.0
xlrd             : None
zstandard        : None
tzdata           : None
qtpy             : None
pyqt5            : None
@jrbourbeau added the Bug and Needs Triage labels on Feb 3, 2023
@phofl
Member

phofl commented Feb 3, 2023

Hi, thanks for your report. I think the best-suited function is not public right now. is_any_numeric_dtype is equivalent to is_numeric. @mroeschke do you know why this is private? From the documentation, it looks like it could be public.

@phofl added the Index label and removed the Needs Triage label on Feb 3, 2023
@mroeschke
Member

Not sure, guessing is_any_numeric_dtype was more of a helper function initially so it was never made public. Could either make it public or add a keyword to is_numeric_dtype to include boolean.

@phofl
Member

phofl commented Feb 3, 2023

We would have to exclude complex as well, which would make the keyword name awkward I guess (or add two keywords, which is not nice either).

Having said that, is_any_numeric_dtype is not a great name either. We could rename to is_any_real_numeric_dtype before making it public?
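To make the bool and complex cases concrete, here is a minimal sketch that composes the check from existing public functions. The helper name is_real_numeric is hypothetical and is not the pandas function discussed in this thread; it only illustrates the semantics being proposed (numeric, but neither boolean nor complex).

```python
import numpy as np
from pandas.api.types import is_bool_dtype, is_complex_dtype, is_numeric_dtype


def is_real_numeric(dtype) -> bool:
    """Hypothetical helper: numeric, but neither boolean nor complex."""
    return (
        is_numeric_dtype(dtype)
        and not is_bool_dtype(dtype)
        and not is_complex_dtype(dtype)
    )


dtypes = [
    np.dtype("int64"),
    np.dtype("float64"),
    np.dtype("bool"),
    np.dtype("complex128"),
    np.dtype("object"),
]

# Map each dtype name to (is_numeric_dtype, is_real_numeric).
results = {str(dt): (is_numeric_dtype(dt), is_real_numeric(dt)) for dt in dtypes}

for name, (numeric, real_numeric) in results.items():
    print(f"{name:<12} numeric={numeric!s:<5} real_numeric={real_numeric}")
```

In this sketch int64 and float64 pass both checks, bool and complex128 pass only is_numeric_dtype, and object passes neither, which is why a single "include boolean" keyword on is_numeric_dtype would not be enough on its own.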

@ABCPAN-rank
Contributor

Hi, I deprecated is_numeric. It was my first contribution. I think is_any_numeric_dtype is a helper function, so it isn't public. See DEPR: deprecate Index.is_numeric #50769 for more information about it.

@phofl
Member

phofl commented Feb 4, 2023

Ah thx, should have checked the git info.

Reading through the discussion, I think is_any_real_numeric_dtype() makes more sense, since it excludes complex numbers as you mentioned there.

Would you be interested in making a pr to rename and make public?

@ABCPAN-rank
Contributor

Yes, I'm interested in doing it, but I am new to contributing. If I want to make the function public, what should I do? Write tests for is_any_real_numeric_dtype and update the whatsnew?

@ABCPAN-rank
Contributor

I also think is_real_dtype would be a good name for the function, because the similar functions that check numeric types are named is_xxx_dtype.

@phofl
Member

phofl commented Feb 4, 2023

I prefer is_any_real_numeric_dtype; is_real_dtype could mean anything if the user is not aware that we are talking about numeric data.

  • add a whatsnew
  • add your function to pandas/core/dtypes/api.py
  • change your FutureWarning to point to the new function
  • add to section Partially validate docstrings (EX02) in ci/code_checks.sh
  • add to Data type introspection in doc/source/reference/arrays.rst
  • adding tests would be good as well
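The third item in the checklist above (pointing the FutureWarning at the new function) can be sketched with the standard library alone. ExampleIndex, its kind attribute, and the exact message wording are hypothetical stand-ins for illustration and are not taken from the pandas source.

```python
import warnings


class ExampleIndex:
    """Hypothetical stand-in for an index class; not pandas code."""

    def __init__(self, kind: str) -> None:
        self.kind = kind

    def is_numeric(self) -> bool:
        # Name the intended replacement in the deprecation message.
        warnings.warn(
            "ExampleIndex.is_numeric is deprecated. "
            "Use is_any_real_numeric_dtype instead.",
            FutureWarning,
            stacklevel=2,
        )
        return self.kind in ("integer", "floating")


# Capture the warning so the message can be inspected, as a test would do.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ExampleIndex("boolean").is_numeric()

message = str(caught[0].message)
print(result)
print(message)
```

A test for the real change would presumably assert on the warning text in a similar way, so that the message and the replacement function cannot drift apart.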

@ABCPAN-rank
Contributor

take
