ENH: Improve Filter function with Filter_Columns and Filter_Rows #55289

speed650 · 2023-09-25T19:31:26Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

Using the [ ] syntax to apply filters to a DataFrame can get messy and complicated. The current filter() function is also confusing to use. I propose adding dedicated functions that quickly filter columns and rows, to make pandas easier to use.

Feature Description

I propose adding two new functions:

Filter_Columns(), Filter_Rows()

def Filter_Columns(columns: list, inverse: bool, inplace: bool)
def Filter_Rows(rows: list, inverse: bool, inplace: bool)

Usage:

Filter_Columns( ['Names', 'Ages' ], inverse=False, inplace=True)

Shows the Names and Ages columns. With inverse=True, those columns are hidden and the other columns are shown instead.

Filter_Rows([('name' == 'bob'), ('age' > 20)], inverse=False, inplace=True)

Shows the rows of the DataFrame where the value in the name column is bob and the value in the age column is greater than 20.

Chained

Filter_Columns(['Names', 'Ages'], inverse=False, inplace=True).Filter_Rows([('name' == 'bob'), ('age' > 20)], inverse=False, inplace=True)

Alternative Solutions

Use the [ ] syntax, which is more confusing.
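For comparison, here is a minimal sketch of how the operations described above can be written with the existing [ ] syntax and DataFrame.drop. The sample data and variable names are invented for illustration and are not part of the proposal.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "name": ["bob", "alice", "bob"],
        "age": [25, 30, 18],
        "city": ["NYC", "LA", "SF"],
    }
)

# Keep only the listed columns.
cols_kept = df[["name", "age"]]

# "Inverse" column selection: keep everything except the listed columns.
cols_dropped = df.drop(columns=["name", "age"])

# Row filter: name equals "bob" AND age is greater than 20.
rows = df[(df["name"] == "bob") & (df["age"] > 20)]

# Chained: select columns first, then filter rows with a callable in .loc.
chained = df[["name", "age"]].loc[lambda d: (d["name"] == "bob") & (d["age"] > 20)]
```

Each step returns a new DataFrame, so nothing here relies on an inplace argument.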

Additional Context

No response

jbrockmendel · 2023-09-25T22:47:47Z

I’d be on board with deprecating filter so we can change it to a more standard row-filter

speed650 · 2023-09-25T22:57:32Z

That’s good too. Currently it has an input called “items”, which is confusing. You could have a function that takes “rows”, “columns”, or “index” and filters depending on what you pass.


Cappuchinoo · 2023-10-05T01:32:35Z

take

phofl · 2023-10-21T19:45:15Z

cc @pandas-dev/pandas-core

rhshadrach · 2023-10-22T12:52:03Z

Most pandas methods offer an axis argument. I'm not a fan of adding fragmentation to the API by sometimes having axis and sometimes having two methods (*_rows and *_columns). I also prefer fewer methods with more arguments as opposed to more methods with fewer arguments, especially where there aren't arguments that go unused in certain cases. I can be on board if we want to move away from the axis argument to having two methods, but I think it should be done across the API rather than just for some methods.

pyspark calls our current method select and uses filter for conditions (similar to what's proposed here). It looks like polars is similar. I like this terminology.

In 2.x, I propose we:

Alias filter to select
Add filter_cond (or something similar) - in my opinion this does not need, but can have, an axis argument
Deprecate filter

In the future, we then have the option to:

Alias filter_cond to filter
Deprecate filter_cond

For the filter_cond method, it should only accept Boolean conditions (e.g. not lists of labels), does not need an inverse argument (negation is easy enough), and should not have an inplace argument (it can't be done inplace).
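To make the constraints above concrete, here is a rough sketch of what a condition-only filter could look like. The filter_cond function below is a hypothetical standalone helper written for this illustration; it is not an existing pandas method, and its signature is an assumption.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def filter_cond(df: pd.DataFrame, cond) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): keep rows where cond is True.

    cond is a Boolean Series or a callable that returns one.
    There is no inverse argument (negate with ~) and no inplace argument.
    """
    mask = cond(df) if callable(cond) else cond
    # Reject anything that is not a Boolean Series, e.g. a list of labels.
    if not isinstance(mask, pd.Series) or not is_bool_dtype(mask):
        raise TypeError("cond must be (or return) a Boolean Series")
    return df.loc[mask]


# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"name": ["bob", "alice", "bob"], "age": [25, 30, 18]})

adults_named_bob = filter_cond(df, lambda d: (d["name"] == "bob") & (d["age"] > 20))
not_over_20 = filter_cond(df, ~(df["age"] > 20))  # negation instead of inverse=True
```

The helper always returns a new DataFrame and leaves df untouched, in line with the point that row filtering cannot be done inplace.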

jreback · 2023-10-22T13:12:20Z

@rhshadrach there have been a number of issues to do somewhat similar

not averse but it would likely need a dedicated issue (pls link the original) and comprehensive schedules for this

Dr-Irv · 2023-10-24T13:35:22Z

I did a PR that didn't go anywhere about having a filter for Index that would create some better syntax for "filtering" on an index, IMHO: #51370 That might cover some of the use cases here.

OP wrote:

The current filter() function is also confusing to use. I propose adding dedicated functions to quickly filter out columns and rows to make pandas easier to use.

I guess the only benefit of this proposal is to allow a list of conditions to be applied.

So why not just add an argument to DataFrame.filter() that allows that list to be specified, and avoid filter_cond ?

bashtage · 2023-10-24T14:32:30Z

So why not just add an argument to DataFrame.filter() that allows that list to be specified, and avoid filter_cond ?

Most pandas methods offer an axis argument. I'm not a fan of adding fragmentation to the API by sometimes having axis and sometimes having two methods (*_rows and *_columns).

I fully agree with these two. DataFrame.filter already accepts axis. It seems the only suggestion here is to add an inverse argument, which would make filter behave like drop?

I'm not sure I see where the value add of this proposal lies?

Unless I am missing something, these are pretty easy to do.

Filter_Columns(columns, inverse=False) -> df.filter(columns)
Filter_Rows(index, inverse=False) -> df.filter(index, axis=0)
Filter_Columns(columns, inverse=True) -> df.drop(columns, axis=1)
Filter_Rows(index, inverse=True) -> df.drop(index, axis=0)
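As a runnable sketch of the mapping above, using only existing DataFrame.filter and DataFrame.drop calls (the sample data and row labels are invented for illustration). Note that drop acts on the index by default, so dropping columns needs columns= or axis=1.

```python
import pandas as pd

# Hypothetical sample data with explicit row labels, invented for this illustration.
df = pd.DataFrame(
    {"name": ["bob", "alice", "carol"], "age": [25, 30, 18], "city": ["NYC", "LA", "SF"]},
    index=["r1", "r2", "r3"],
)

# Filter_Columns(columns, inverse=False): keep the listed columns.
keep_cols = df.filter(["name", "age"])

# Filter_Rows(index, inverse=False): keep the listed row labels.
keep_rows = df.filter(["r1", "r3"], axis=0)

# Filter_Columns(columns, inverse=True): drop the listed columns.
drop_cols = df.drop(columns=["name", "age"])

# Filter_Rows(index, inverse=True): drop the listed row labels.
drop_rows = df.drop(index=["r1", "r3"])
```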

rhshadrach · 2023-10-24T20:13:18Z

@bashtage

I fully agree with these two. DataFrame.filter already accepts axis. It seems the only suggestion here is to add an inverse argument which would make filter drop?

I don't think this is accurate. The OP is also asking to be able to filter based on conditions.

@Dr-Irv

So why not just add an argument to DataFrame.filter() that allows that list to be specified, and avoid filter_cond ?

A few reasons:

The like and regex arguments don't make sense when filtering by condition. I think having arguments that don't make sense in the presence of values of other arguments is not good API design.
I think it is more common one would filter by labels with columns, and filter by conditions with rows. If this is the case, there isn't a good default for axis if it were all one method.
R, pyspark, and polars all use select for "filter by label" and filter for "filter by condition".

Dr-Irv · 2023-10-24T21:07:48Z

A few reasons:

The like and regex arguments don't make sense when filtering by condition. I think having arguments that don't make sense in the presence of values of other arguments is not good API design.

I agree on the design part, but at least for filter today, we already have that items, like and regex are enforced to be mutually exclusive, so adding a condition would be consistent with that (admittedly - not a great design to begin with).
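A small sketch of the mutual exclusivity mentioned here: passing more than one of items, like, and regex to DataFrame.filter raises a TypeError. The sample data is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"age_min": [1], "age_max": [2], "name": ["bob"]})

# A single selector is fine.
by_items = df.filter(items=["name"])

# Combining two selectors is rejected by pandas.
try:
    df.filter(items=["name"], regex="^age")
    raised = False
except TypeError:
    raised = True
```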

I think it is more common one would filter by labels with columns, and filter by conditions with rows. If this is the case, there isn't a good default for axis if it were all one method.

I have had use cases where filtering by condition on column names would be useful, unless you want a complex regex. Also, to filter rows by condition, you can just use query()
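For reference, a minimal sketch of the query() approach to filtering rows by condition, next to the equivalent Boolean mask. The sample data is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"name": ["bob", "alice", "bob"], "age": [25, 30, 18]})

# Row filter expressed as a query string.
rows_query = df.query("name == 'bob' and age > 20")

# The same filter expressed as a Boolean mask with the [ ] syntax.
rows_mask = df[(df["name"] == "bob") & (df["age"] > 20)]
```

Both forms return the same rows; the difference is in syntax and in how the expression is evaluated.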

R, pyspark, and polars all use select for "filter by label" and filter for "filter by condition".

A reasonable argument to change things along the lines you propose

rhshadrach · 2023-10-24T21:23:20Z

I have had use cases where filtering by condition on column names would be useful, unless you want a complex regex. Also, to filter rows by condition, you can just use query()

Agreed there are cases - but I'm curious about your perception as to how common one case is vs another, as this is discussing the default value of axis.

query is great, I use it a lot in ad-hoc analysis, but it comes with a ton of overhead and I avoid it otherwise.

Dr-Irv · 2023-10-24T21:29:35Z

Agreed there are cases - but I'm curious about your perception as to how common one case is vs another, as this is discussing the default value of axis.

Since filter has a default of columns as the axis, I'd vote for that. Coming up with a way to select a subset of columns based on properties of the column names is something that only filter can do right now.
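As an illustration of selecting columns by properties of their names with the current filter, using its default column axis (the column names below are invented for this sketch):

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"sales_2022": [1], "sales_2023": [2], "cost_2023": [3]})

# Columns whose name contains the substring "2023".
by_like = df.filter(like="2023")

# Columns whose name matches a regular expression.
by_regex = df.filter(regex=r"^sales_")
```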

query is great, I use it a lot in ad-hoc analysis, but it comes with a ton of overhead and I avoid it otherwise.

I used to think that, but after some testing I did a few years ago, I didn't see the performance difference, and the syntax is pretty clean.

speed650 added the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels Sep 25, 2023

github-actions bot assigned Cappuchinoo Oct 5, 2023

Cappuchinoo mentioned this issue Oct 19, 2023: ENH: functions filter_columns and filter_rows created #55592 (Closed)

rhshadrach added the Filters (e.g. head, tail, nth) and Needs Discussion (requires discussion from core team before further action) labels and removed the Needs Triage label Oct 22, 2023

rhshadrach mentioned this issue Apr 20, 2025: ENH: Make DataFrame.filter accept filters in new formats #61317 (Open)
