Allow 'integer' in select_dtypes include argument #927

Closed
jonyscathe opened this issue May 22, 2024 · 1 comment · Fixed by #968

Describe the bug
#900 added better typing for the select_dtypes include and exclude arguments.
However, it missed at least one allowed include value in the AstypeArgExt Literal list.
"integer" should be added to that literal list as it is allowed in the select_dtypes include/exclude arguments and will include or exclude all integer types.
A few other numpy scalar abstract base classes are likely also valid here, probably everything within a dashed-line box on the "Hierarchy of type objects" diagram located here: https://numpy.org/doc/stable/reference/arrays.scalars.html
Of those, only "number" has been added to the Literal list, but users may well want to include or exclude any of the other numpy abstract base classes.
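To make the hierarchy point concrete, here is a minimal sketch that checks which abstract numpy scalar classes a concrete dtype falls under. The list of names is a subset picked from the numpy diagram for illustration, and the helper `matching_abstract_names` is hypothetical. Whether select_dtypes accepts each of these names as a string is not verified here.

```python
import numpy as np

# A subset of the abstract scalar classes from the numpy hierarchy diagram.
# This selection is an assumption made for illustration, not an exhaustive list.
ABSTRACT_NAMES = [
    "number",
    "integer",
    "signedinteger",
    "unsignedinteger",
    "inexact",
    "floating",
    "complexfloating",
]


def matching_abstract_names(dtype) -> list[str]:
    """Return the abstract class names that the given dtype is a subtype of."""
    return [
        name
        for name in ABSTRACT_NAMES
        if np.issubdtype(dtype, getattr(np, name))
    ]


print(matching_abstract_names(np.int16))
print(matching_abstract_names(np.float64))
print(matching_abstract_names(np.bool_))
```

Under this check, int16 falls under "number", "integer" and "signedinteger", while bool falls under none of them, which matches the expectation that "integer" covers every integer column but not boolean ones.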

To Reproduce
The following code produces a mypy error:

import pandas as pd
from numpy import double, int16

col_cat = pd.Series(['a', 'b', 'c', 'a'], dtype='category', name='col_cat')
col_bool = pd.Series([True, True, False, True], dtype=bool, name='col_bool')
col_int_0 = pd.Series([1, 2, 3, 4], dtype=int, name='col_int_0')
col_int_1 = pd.Series([1, 2, 3, 4], dtype=int16, name='col_int_1')
col_float_1 = pd.Series([1.1, 2.2, 3.3, 4.4], dtype=double, name='col_float_1')

X = pd.concat([col_cat, col_bool, col_int_0, col_int_1, col_float_1], axis=1)

integer_cols_ = list(X.select_dtypes(include=['integer']).columns)

The mypy error is on the line integer_cols_ = ... and is:

error: List item 0 has incompatible type "Literal['integer']"; expected "Literal['bool', 'boolean', '?', 'b1', 'bool8', 'bool_', 'bool[pyarrow]', 'boolean[pyarrow]', 'int', 'Int8', 'Int16', 'Int32', 'Int64', 'b', 'i1', 'int8', 'byte', 'h', 'i2', 'int16', 'short', 'i', 'i4', 'int32', 'intc', 'l', 'i8', 'int64', 'int_', 'long', 'q', 'longlong', 'p', 'intp', 'int0', 'int8[pyarrow]', 'int16[pyarrow]', 'int32[pyarrow]', 'int64[pyarrow]', 'UInt8', 'UInt16', 'UInt32', 'UInt64', 'B', 'u1', 'uint8', 'ubyte', 'H', 'u2', 'uint16', 'ushort', 'I', 'u4', 'uint32', 'uintc', 'L', 'u8', 'uint', 'ulong', 'uint64', 'Q', 'ulonglong', 'P', 'uintp', 'uint0', 'uint8[pyarrow]', 'uint16[pyarrow]', 'uint32[pyarrow]', 'uint64[pyarrow]', 'str', 'string', 'U', 'str_', 'str0', 'unicode', 'unicode_', 'string[pyarrow]', 'bytes', 'S', 'a', 'bytes_', 'bytes0', 'string_', 'binary[pyarrow]', 'float', 'Float32', 'Float64', 'e', 'f2', '<f2', 'float16', 'half', 'f', 'f4', 'float32', 'single', 'd', 'f8', 'float64', 'double', 'float_', 'g', 'f16', 'float128', 'longdouble', 'longfloat', 'float[pyarrow]', 'double[pyarrow]', 'float16[pyarrow]', 'float32[pyarrow]', 'float64[pyarrow]', 'complex', 'F', 'c8', 'complex64', 'csingle', 'singlecomplex', 'D', 'c16', 'complex128', 'cdouble', 'cfloat', 'complex_', 'G', 'c32', 'complex256', 'clongdouble', 'clongfloat', 'longcomplex', 'timedelta64[Y]', 'timedelta64[M]', 'timedelta64[W]', 'timedelta64[D]', 'timedelta64[h]', 'timedelta64[m]', 'timedelta64[s]', 'timedelta64[ms]', 'timedelta64[us]', 'timedelta64[μs]', 'timedelta64[ns]', 'timedelta64[ps]', 'timedelta64[fs]', 'timedelta64[as]', 'm8[Y]', 'm8[M]', 'm8[W]', 'm8[D]', 'm8[h]', 'm8[m]', 'm8[s]', 'm8[ms]', 'm8[us]', 'm8[μs]', 'm8[ns]', 'm8[ps]', 'm8[fs]', 'm8[as]', '<m8[Y]', '<m8[M]', '<m8[W]', '<m8[D]', '<m8[h]', '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[μs]', '<m8[ns]', '<m8[ps]', '<m8[fs]', '<m8[as]', 'duration[s][pyarrow]', 'duration[ms][pyarrow]', 'duration[us][pyarrow]', 'duration[ns][pyarrow]', 
'datetime64[Y]', 'datetime64[M]', 'datetime64[W]', 'datetime64[D]', 'datetime64[h]', 'datetime64[m]', 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', 'datetime64[μs]', 'datetime64[ns]', 'datetime64[ps]', 'datetime64[fs]', 'datetime64[as]', 'M8[Y]', 'M8[M]', 'M8[W]', 'M8[D]', 'M8[h]', 'M8[m]', 'M8[s]', 'M8[ms]', 'M8[us]', 'M8[μs]', 'M8[ns]', 'M8[ps]', 'M8[fs]', 'M8[as]', '<M8[Y]', '<M8[M]', '<M8[W]', '<M8[D]', '<M8[h]', '<M8[m]', '<M8[s]', '<M8[ms]', '<M8[us]', '<M8[μs]', '<M8[ns]', '<M8[ps]', '<M8[fs]', '<M8[as]', 'date32[pyarrow]', 'date64[pyarrow]', 'timestamp[s][pyarrow]', 'timestamp[ms][pyarrow]', 'timestamp[us][pyarrow]', 'timestamp[ns][pyarrow]', 'category', 'object', 'O', 'V', 'void', 'void0', 'number', 'datetime64', 'datetime', 'timedelta', 'timedelta64', 'datetimetz'] | type[object] | dtype[generic] | ExtensionDtype"  [list-item]

Please complete the following information:

  • OS: Linux
  • OS Version: python:3.12.3-slim docker container
  • python version: 3.12.3
  • version of type checker: mypy 1.10.0
  • version of installed pandas-stubs: 2.2.2.24054


Dr-Irv (Collaborator) commented May 22, 2024

Thanks for the report. Any additional strings or types, such as "integer", that can be used in select_dtypes() but not in astype() should be added to the Literal defined here:

| Literal[
"number",
"datetime64",
"datetime",
"timedelta",
"timedelta64",
"datetimetz",
"datetime64[ns]",
]
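As a rough sketch of the suggested change, the Literal above could gain an "integer" entry. The alias name `SelectDtypesOnlyArg` and the helper `is_select_only_string` are hypothetical and exist only for this illustration; the stubs define the Literal inline as one member of a larger union, and the change actually merged may differ.

```python
from typing import Literal, get_args

# Hypothetical alias name, used only for this sketch.
SelectDtypesOnlyArg = Literal[
    "number",
    "datetime64",
    "datetime",
    "timedelta",
    "timedelta64",
    "datetimetz",
    "datetime64[ns]",
    "integer",  # proposed addition from this issue
]


def is_select_only_string(value: str) -> bool:
    """Check at runtime whether a string is one of the Literal's members."""
    return value in get_args(SelectDtypesOnlyArg)


print(is_select_only_string("integer"))
print(is_select_only_string("int16"))
```

Keeping these strings in a separate Literal, rather than in the astype() list, means a type checker would still reject something like astype("integer") while accepting it in select_dtypes().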

PR welcome that includes tests added to

def test_select_dtypes() -> None:
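A test for this could look roughly like the sketch below. It only checks runtime behaviour with plain assertions. The function name `test_select_dtypes_integer` and the sample frame are made up for illustration, and the project's own test helpers are not used here. The typing side of the test is simply that mypy accepts the two calls once "integer" is allowed.

```python
import pandas as pd


def test_select_dtypes_integer() -> None:
    # Hypothetical sample data: one integer, one float, one string
    # and one boolean column.
    df = pd.DataFrame(
        {
            "a": [1, 2, 3],
            "b": [1.0, 2.0, 3.0],
            "c": ["x", "y", "z"],
            "d": [True, False, True],
        }
    )

    # "integer" should pick up the integer column only.
    included = df.select_dtypes(include=["integer"])
    assert list(included.columns) == ["a"]

    # Excluding "integer" should keep every other column, in order.
    excluded = df.select_dtypes(exclude=["integer"])
    assert list(excluded.columns) == ["b", "c", "d"]


test_select_dtypes_integer()
```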
