ENH: Improve Filter function with Filter_Columns and Filter_Rows #55289
Comments
I’d be on board with deprecating filter so we can change it to a more standard row-filter |
That’s good too.
Currently it takes an “items” input, which is confusing: it can act on
“rows”, “columns”, or “index”, and what it filters depends on what you
pass it.
…On Monday, September 25, 2023, jbrockmendel ***@***.***> wrote:
I’d be on board with deprecating filter so we can change it to a more
standard row-filter
|
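For readers following along, here is a minimal sketch of how the current `DataFrame.filter` behaves, which is where the confusion around “items” comes from. The sample frame and its labels are made up for illustration.

```python
import pandas as pd

# Illustrative data only; the labels are not from the issue.
df = pd.DataFrame(
    {"name": ["bob", "amy"], "age": [25, 19], "city": ["Oslo", "Rome"]},
    index=["r1", "r2"],
)

# With no axis given, a DataFrame is filtered on its column labels.
by_columns = df.filter(items=["name", "age"])

# The same `items` argument selects index labels once axis=0 is passed.
by_index = df.filter(items=["r1"], axis=0)

# `like` (and `regex`) match on labels, never on cell values.
like_cols = df.filter(like="a")  # column labels containing "a"
```

In every case the match is against labels, so none of these calls can express a condition such as `age > 20`.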
take |
cc @pandas-dev/pandas-core |
Most pandas methods offer an pyspark calls our current method In 2.x, I propose we:
In the future, we then have the option to:
For the |
@rhshadrach there have been a number of issues proposing something similar. I'm not averse, but it would likely need a dedicated issue (please link the original) and a comprehensive schedule for this |
I did a PR that didn't go anywhere about having a OP wrote:
I guess the only benefit of this proposal is to allow a list of conditions to be applied. So why not just add an argument to |
I fully agree with these two. I'm not sure I see where the value add of this proposal lies? Unless I am missing something, these are pretty easy to do.
|
I don't think this is accurate. The OP is also asking to be able to filter based on conditions.
A few reasons:
|
I agree on the design part, but at least for
I have had use cases where filtering by condition on column names would be useful, unless you want a complex regex. Also, to filter rows by condition, you can just use
A reasonable argument to change things along the lines you propose |
Agreed there are cases - but I'm curious about your perception as to how common one case is vs another, as this is discussing the default value of axis. query is great, I use it a lot in ad-hoc analysis, but it comes with a ton of overhead and I avoid it otherwise. |
Since
I used to think that, but after some testing I did a few years ago, I didn't see the performance difference, and the syntax is pretty clean. |
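To make the comparison in the last few comments concrete, here is a small sketch of the same row filter written with a boolean mask and with `query`. The data is made up, and the snippet says nothing about relative performance.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"name": ["bob", "amy", "bob"], "age": [25, 19, 18]})

# Boolean-mask form: each condition is a boolean Series, combined with &.
masked = df[(df["name"] == "bob") & (df["age"] > 20)]

# query form: the same condition written as an expression string.
queried = df.query("name == 'bob' and age > 20")
```

Both forms select the same rows; which one reads better is largely a matter of taste and context.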
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
Using the [ ] syntax to apply filters to a dataframe can get messy and complicated. The current filter() function is also confusing to use. I propose adding dedicated functions for quickly filtering columns and rows, to make pandas easier to use.
Feature Description
Propose adding two new functions:
Filter_Columns(), Filter_Rows()
def Filter_Columns(columns: list, inverse: bool, inplace: bool)
def Filter_Rows(rows: list, inverse: bool, inplace: bool)
Usage:
Filter_Columns(['Names', 'Ages'], inverse=False, inplace=True)
Shows the columns for name and age. Setting inverse=True shows the other columns instead, i.e. those that are not name and age.
Filter_Rows([('name' == 'bob'), ('age' > 20)], inverse=False, inplace=True)
Shows the rows of the dataframe where the value in the name column is bob and the value in the age column is greater than 20.
Chained
Filter_Columns(['Names', 'Ages'], inverse=False, inplace=True).Filter_Rows([('name' == 'bob'), ('age' > 20)], inverse=False, inplace=True)
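A rough sketch of how the two proposed functions could be approximated today with existing pandas calls. `filter_columns` and `filter_rows` are hypothetical helpers written for this illustration, not pandas API, and they assume each condition is passed as a boolean Series rather than the tuple pseudocode above. The `inplace` parameter is left out here: pandas methods called with inplace=True return None, so the chained form only works when each call returns a new frame.

```python
import pandas as pd


def filter_columns(df, columns, inverse=False):
    """Hypothetical helper: keep `columns`, or drop them when inverse=True."""
    return df.drop(columns=columns) if inverse else df[columns]


def filter_rows(df, conditions, inverse=False):
    """Hypothetical helper: keep rows where every boolean mask in `conditions` holds."""
    mask = pd.Series(True, index=df.index)
    for condition in conditions:
        mask = mask & condition
    return df[~mask] if inverse else df[mask]


# Illustrative data only.
df = pd.DataFrame(
    {"Names": ["bob", "amy", "bob"], "Ages": [25, 30, 18], "City": ["Oslo", "Rome", "Bonn"]}
)

# Chained usage via DataFrame.pipe, mirroring the example above.
result = (
    df.pipe(filter_columns, ["Names", "Ages"])
      .pipe(lambda d: filter_rows(d, [d["Names"] == "bob", d["Ages"] > 20]))
)
```

The lambda in the second step is needed so that the conditions are built from the frame produced by the first step.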
Alternative Solutions
Use the [ ] syntax, which is more confusing.
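For comparison, a sketch of what this alternative looks like for the name/age example, using `.loc` with a boolean mask and a column list. The data and column names are illustrative.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame(
    {"Names": ["bob", "amy", "bob"], "Ages": [25, 30, 18], "City": ["Oslo", "Rome", "Bonn"]}
)

# Rows and columns selected in a single indexing step.
mask = (df["Names"] == "bob") & (df["Ages"] > 20)
selected = df.loc[mask, ["Names", "Ages"]]

# The "inverse" selections: negate the mask, or drop the named columns.
other_rows = df.loc[~mask]
other_columns = df.drop(columns=["Names", "Ages"])
```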
Additional Context
No response