BUG: incorrect type hint for subset in dropna #49702

Closed
2 of 3 tasks
stevenlis opened this issue Nov 14, 2022 · 7 comments
Labels
Bug, Missing-data, Typing

Comments

@stevenlis

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
data = {'col1': [1, 3, 4], 'col2': [2, 3, 5], 'col3': [2, 4, 4]}
df = pd.DataFrame(data)

df.dropna(subset=df.columns.drop('col1'))

Issue Description

[screenshot]

pandas/pandas/core/frame.py

Lines 6413 to 6573 in 91111fd

@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
def dropna(
    self,
    axis: Axis = 0,
    how: str | NoDefault = no_default,
    thresh: int | NoDefault = no_default,
    subset: IndexLabel = None,
    inplace: bool = False,
) -> DataFrame | None:
    """
    Remove missing values.

    See the :ref:`User Guide <missing_data>` for more on which values are
    considered missing, and how to work with missing data.

    Parameters
    ----------
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Determine if rows or columns which contain missing values are
        removed.

        * 0, or 'index' : Drop rows which contain missing values.
        * 1, or 'columns' : Drop columns which contain missing value.

        .. versionchanged:: 1.0.0

           Pass tuple or list to drop on multiple axes.
           Only a single axis is allowed.

    how : {'any', 'all'}, default 'any'
        Determine if row or column is removed from DataFrame, when we have
        at least one NA or all NA.

        * 'any' : If any NA values are present, drop that row or column.
        * 'all' : If all values are NA, drop that row or column.

    thresh : int, optional
        Require that many non-NA values. Cannot be combined with how.
    subset : column label or sequence of labels, optional
        Labels along other axis to consider, e.g. if you are dropping rows
        these would be a list of columns to include.
    inplace : bool, default False
        Whether to modify the DataFrame rather than creating a new one.

    Returns
    -------
    DataFrame or None
        DataFrame with NA entries dropped from it or None if ``inplace=True``.

    See Also
    --------
    DataFrame.isna: Indicate missing values.
    DataFrame.notna : Indicate existing (non-missing) values.
    DataFrame.fillna : Replace missing values.
    Series.dropna : Drop missing values.
    Index.dropna : Drop missing indices.

    Examples
    --------
    >>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
    ...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
    ...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
    ...                             pd.NaT]})
    >>> df
           name        toy       born
    0    Alfred        NaN        NaT
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

    Drop the rows where at least one element is missing.

    >>> df.dropna()
         name        toy       born
    1  Batman  Batmobile 1940-04-25

    Drop the columns where at least one element is missing.

    >>> df.dropna(axis='columns')
           name
    0    Alfred
    1    Batman
    2  Catwoman

    Drop the rows where all elements are missing.

    >>> df.dropna(how='all')
           name        toy       born
    0    Alfred        NaN        NaT
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

    Keep only the rows with at least 2 non-NA values.

    >>> df.dropna(thresh=2)
           name        toy       born
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

    Define in which columns to look for missing values.

    >>> df.dropna(subset=['name', 'toy'])
           name        toy       born
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

    Keep the DataFrame with valid entries in the same variable.

    >>> df.dropna(inplace=True)
    >>> df
         name        toy       born
    1  Batman  Batmobile 1940-04-25
    """
    if (how is not no_default) and (thresh is not no_default):
        raise TypeError(
            "You cannot set both the how and thresh arguments at the same time."
        )

    if how is no_default:
        how = "any"

    inplace = validate_bool_kwarg(inplace, "inplace")
    if isinstance(axis, (tuple, list)):
        # GH20987
        raise TypeError("supplying multiple axes to axis is no longer supported.")

    axis = self._get_axis_number(axis)
    agg_axis = 1 - axis

    agg_obj = self
    if subset is not None:
        # subset needs to be list
        if not is_list_like(subset):
            subset = [subset]
        ax = self._get_axis(agg_axis)
        indices = ax.get_indexer_for(subset)
        check = indices == -1
        if check.any():
            raise KeyError(np.array(subset)[check].tolist())
        agg_obj = self.take(indices, axis=agg_axis)

    if thresh is not no_default:
        count = agg_obj.count(axis=agg_axis)
        mask = count >= thresh
    elif how == "any":
        # faster equivalent to 'agg_obj.count(agg_axis) == self.shape[agg_axis]'
        mask = notna(agg_obj).all(axis=agg_axis, bool_only=False)
    elif how == "all":
        # faster equivalent to 'agg_obj.count(agg_axis) > 0'
        mask = notna(agg_obj).any(axis=agg_axis, bool_only=False)
    else:
        raise ValueError(f"invalid how option: {how}")

    if np.all(mask):
        result = self.copy()
    else:
        result = self.loc(axis=axis)[mask]

    if not inplace:
        return result
    self._update_inplace(result)
    return None

Expected Behavior

The type hint for subset (IndexLabel | None) should probably accept any array-like object, such as the Index returned by df.columns.drop('col1').
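
To make the mismatch concrete, here is a minimal sketch of the runtime behavior, assuming a small made-up frame with one missing value (the data and variable names are illustrative, not from the report). It shows that dropna accepts a pandas Index for subset at runtime, and that converting the Index to a list gives the same result. Whether the list form satisfies a given type checker is an assumption that depends on the hint in use.

```python
import pandas as pd

# Illustrative data (not from the report): col2 has one missing value.
df = pd.DataFrame(
    {"col1": [1, 3, 4], "col2": [2.0, None, 5.0], "col3": [2, 4, 4]}
)

# An Index of labels, as in the report: every column except col1.
labels = df.columns.drop("col1")

# At runtime dropna accepts the Index; a type checker working from a
# narrower hint may still flag this call.
from_index = df.dropna(subset=labels)

# Possible workaround (assumption: the hint in use accepts a list of labels).
from_list = df.dropna(subset=labels.tolist())

print(from_index.equals(from_list))
```

Both calls drop the same rows, so the complaint is about the annotation only, not about what dropna does.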

A similar issue: #48098

Installed Versions

INSTALLED VERSIONS

commit : 91111fd
python : 3.9.12.final.0
python-bits : 64
OS : Darwin
OS-release : 22.1.0
Version : Darwin Kernel Version 22.1.0: Sun Oct 9 20:15:52 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T8112
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.5.1
numpy : 1.23.1
pytz : 2022.5
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.1
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@phofl
Member

phofl commented Nov 14, 2022

Hi,

Thanks for your report. Are you using our stubs? If so, this should be reported there too.

@stevenlis
Author

I don't see it in my conda list, so I guess not?
[screenshot]

@rhshadrach
Member

The screenshot looks like VS Code. I think that's what is using the stubs, even if they aren't in your Python environment. cc @Dr-Irv

rhshadrach added the Typing and Missing-data labels on Nov 17, 2022
@Dr-Irv
Contributor

Dr-Irv commented Nov 17, 2022

> The screenshot looks like VS Code. I think that's what is using the stubs, even if they aren't in your Python environment. cc @Dr-Irv

Yes, most likely. Microsoft bundles pandas-stubs with Pylance. You can check by hovering over dropna(), then right-clicking and selecting "Go to Declaration"; it will open a PYI file somewhere under dist/bundles/stubs/pandas/ in your Pylance distribution. If that's the case, then this is an issue for the pandas-stubs project.

If that's the case, please create the issue there.
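
For the environment side of that check, a small sketch along these lines could confirm whether pandas-stubs is installed in the active Python environment. The helper name installed_version is hypothetical, and the check says nothing about a copy bundled inside an editor extension, which lives outside the environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Only inspects the active environment, not stubs bundled with an editor.
    for name in ("pandas", "pandas-stubs"):
        found = installed_version(name)
        print(f"{name}: {found or 'not installed in this environment'}")
```

A result of "not installed" for pandas-stubs is consistent with the stubs coming from the editor rather than from conda or pip.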

@stevenlis
Author

Thanks @rhshadrach @Dr-Irv. Indeed, it's Pylance in VS Code, and it does use pandas-stubs. @phofl, could you transfer the issue to pandas-stubs?

@phofl
Member

phofl commented Nov 18, 2022

You have to open another issue in the stubs repo.

rhshadrach removed the Needs Triage label on Nov 30, 2022
@Dr-Irv
Contributor

Dr-Irv commented Jan 6, 2023

Closing this, as it is handled in pandas-stubs.

4 participants