ENH: Make DataFrame.filter accept filters in new formats #61317
Comments
Partial proposal:
For UDFs, it seems to me that the usage in the OP can be readily handled by Another question is how strict we are on the values that will be filtered. Do we require these to be
I like the idea. If I understand correctly, the main use of I see your point for using In any case, what you propose seems like a great improvement.
Agreed - I should have said no new alternatives. 😆 For UDFs, one reason not to have However I do not find it intuitive that in A bit of restatement of my previous post, but it seems like Finally, if we are to have Overall, I lean toward operating by row here, but not strongly.
I agree, but I would want the deprecation to be slow. That is, first introduce filter and change the docs to discourage the use of cc @pandas-dev/pandas-core for any thoughts.
If you operate by row (or by column, if the axis argument is retained) and the function is passed a Series with Series.name set to the index label, it would be easier to filter based on the index label, which could potentially justify the removal of
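To make the by-row idea concrete, here is a minimal sketch using only existing pandas calls. The helper name `filter_by_row` is hypothetical and is not a pandas API or the proposed implementation; it relies on the fact that `DataFrame.apply(..., axis=1)` hands each row to the function as a Series whose `name` is the index label.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [2, 5, 7], "height": [90, 110, 125]},
    index=["a", "b", "c"],
)


def filter_by_row(frame, udf):
    """Hypothetical helper (not a pandas API): keep rows where udf(row) is truthy."""
    # With axis=1, each row arrives as a Series; row.name is the index label.
    mask = frame.apply(udf, axis=1).astype(bool)
    return frame[mask]


# Filter on a column value, one row at a time.
older = filter_by_row(df, lambda row: row["age"] > 3)

# Filter on the index label through Series.name.
not_b = filter_by_row(df, lambda row: row.name != "b")
```

Under these assumptions the same UDF signature covers both value-based and label-based filtering, at the cost of a Python-level call per row.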
I see your point @rhshadrach, and I think what you propose is very reasonable and maybe even the best option in theory. In practice, I would be very surprised if most users don't find the pyspark-like API of the function receiving the whole dataframe more intuitive. See this example in their docs: df.filter(df.age > 3).show() We can't compare directly with a lazy API, but I think what I propose is quite similar to this. Also, it was discussed before about adding df.filter(pd.col('age') > 3) Personally, if filter will accept both this expression and a lambda, I think it's much clearer and more intuitive that the lambda works the way I described. Let's see what other people think; maybe what's clear and intuitive to me is not to others.
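As a rough illustration of the whole-dataframe semantics described here, the sketch below uses a hypothetical helper, `filter_frame`, that accepts either a boolean mask or a callable receiving the entire DataFrame. It is not the proposed `DataFrame.filter` signature, just an approximation built on existing boolean indexing.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [2, 5, 7]})


def filter_frame(frame, condition):
    """Hypothetical helper (not a pandas API): filter rows by a mask or a callable."""
    # A callable receives the whole DataFrame and must return a boolean mask.
    mask = condition(frame) if callable(condition) else condition
    return frame[mask]


# Expression style: the mask is computed eagerly from the frame.
by_mask = filter_frame(df, df["age"] > 3)

# Lambda style: the callable gets the whole frame, not one row at a time.
by_lambda = filter_frame(df, lambda d: d["age"] > 3)
```

Both calls select the same rows. For comparison, pandas already accepts the lambda form through `df.loc[lambda d: d["age"] > 3]`.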
Maybe I'm missing something, but why deprecate Why not leave So that
That's a reasonable option. I think filter is clearer, and it is what everybody else is using. If we were to implement the API from scratch now, I think it would be the obvious choice. For backward compatibility query may be better, and we can surely consider it. But I would rather have a very long deprecation timeline than keep an API that is, IMHO, wrong because of a choice we made that is no longer ideal.
I think it'd be very nice for users to get this working:
I think implementing this is reasonably simple. The main challenge is how to design the API so that filter can be intuitive and still work with the current parameters, and in particular how to keep backward compatibility. But personally, I think this would be so useful that it is worth finding a solution.
CC: @rhshadrach