ENH: Add complement argument to (Series|DataFrame).filter to do the reverse operation #49245

Closed
@Zeroto521

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Suppose a DataFrame has many columns and I want to drop the ones whose names match a pattern.
filter can only keep the matching labels; it has no way to keep everything except them.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6

# Proposed usage: drop the columns whose names contain 'o'

>>> df.filter(like='o', complement=True)
        three
mouse       3
rabbit      6

Feature Description

def filter(
    self: NDFrameT,
    items=None,
    like: str | None = None,
    regex: str | None = None,
    axis=None,
    complement: bool = False,
):
    nkw = com.count_not_none(items, like, regex)
    if nkw > 1:
        raise TypeError(
            "Keyword arguments `items`, `like`, or `regex` "
            "are mutually exclusive"
        )

    if axis is None:
        axis = self._info_axis_name
    labels = self._get_axis(axis)

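    if items is not None:
        name = self._get_axis_name(axis)
        if complement:
            # Keep every existing label that is NOT listed in `items`.
            kept = [r for r in labels if r not in items]
        else:
            # Keep the requested items that actually exist, in `items` order.
            kept = [r for r in items if r in labels]
        return self.reindex(**{name: kept})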
    elif like:

        def f(x) -> bool_t:
            assert like is not None  # needed for mypy
            return like in ensure_str(x)
    elif regex:

        matcher = re.compile(regex)
        def f(x) -> bool_t:
            return matcher.search(ensure_str(x)) is not None
    else:
        raise TypeError("Must pass either `items`, `like`, or `regex`")

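    # Coerce to a boolean ndarray so that `~` is an element-wise logical NOT
    # even if `labels.map(f)` returns an object-dtype Index.
    mask = np.asarray(labels.map(f), dtype=bool)
    if complement:
        mask = ~mask
    return self.loc(axis=axis)[mask]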
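
To make the intended semantics concrete outside of pandas internals, here is a minimal pure-Python sketch. `select_labels` is a hypothetical helper, not a pandas function; it assumes `complement` simply inverts whichever of `items`, `like`, or `regex` was given. Unlike `filter(items=...)`, it always returns labels in their original order.

```python
import re


def select_labels(labels, items=None, like=None, regex=None, complement=False):
    """Hypothetical helper: return the labels the proposed filter would keep."""
    if sum(arg is not None for arg in (items, like, regex)) != 1:
        raise TypeError("Pass exactly one of `items`, `like`, or `regex`")

    if items is not None:
        wanted = set(items)

        def keep(label):
            return label in wanted

    elif like is not None:

        def keep(label):
            return like in str(label)

    else:
        matcher = re.compile(regex)

        def keep(label):
            return matcher.search(str(label)) is not None

    # `keep(label) != complement` is an XOR: it inverts the match when
    # complement is True and leaves it unchanged otherwise.
    return [label for label in labels if keep(label) != complement]


columns = ["one", "two", "three"]
kept = select_labels(columns, like="o")
dropped = select_labels(columns, like="o", complement=True)
```

With the example columns above, `kept` holds the labels containing 'o' and `dropped` holds the rest, which is the split the proposed `complement=True` call is meant to produce.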

Alternative Solutions

  • DataFrame.drop requires the full column names, so it cannot drop columns by substring or pattern.

The following workarounds do the job, but they are verbose:

  • Build the mask by hand with not in, e.g. df.loc[:, df.columns.map(lambda x: 'xxx' not in x)].
  • Use re to find the matching columns and then drop them.
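
For comparison, the workarounds can be sketched with existing pandas calls, using the example DataFrame from the problem description. The variable names are illustrative only, and the negative-lookahead regex is an extra variant not mentioned in the list above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, 6]]),
    index=["mouse", "rabbit"],
    columns=["one", "two", "three"],
)

# 1. Negate a boolean mask built from the column names.
by_mask = df.loc[:, ~df.columns.str.contains("o")]

# 2. Let filter find the matching columns, then drop them by full name.
by_drop = df.drop(columns=df.filter(like="o").columns)

# 3. Push the negation into the pattern with a negative lookahead.
by_regex = df.filter(regex=r"^(?!.*o)")
```

All three leave only the 'three' column, but each spells out the negation by hand, which is the boilerplate a `complement` flag would remove.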

Additional Context

No response

Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
