-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: small ops optimizations #28036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: small ops optimizations #28036
Conversation
pandas/core/sparse/frame.py
Outdated
@@ -540,9 +540,6 @@ def _combine_frame(self, other, func, fill_value=None, level=None): | |||
this, other = self.align(other, join="outer", level=level, copy=False) | |||
new_index, new_columns = this.index, this.columns | |||
|
|||
if self.empty and other.empty: | |||
return self._constructor(index=new_index).__finalize__(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is covered by a test?
(in general, I would rather leave sparse alone, this is deprecated anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is.
I understand the impulse to leave sparse alone, but ATM some of it is a barrier to simplifying ops code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I was just saying on the other PR as well, there is also the option to just leave sparse as is, but still go forward with simplifying the ops code for non-sparse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also make it easier for you to focus only on the ops code for normal frame (without additional complexity of sharing with sparse)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. core.ops functions call the DataFrame/SparseDataFrame methods, so anything I want to do to change the behavior of the non-sparse methods needs to be reflected in the sparse methods
Do you have an idea how much those changes matter? (anyway, even if it is almost negligible, I think eg checking the actual dtype with the
Not directly related to the ones you are doing here. I need to look it up again (I described it in the issue about the slowdown), but it was mainly about the way we iterate through the columns and how to reconstruct the DataFrame/Series back in the end, not so much the ops itself. |
Haven't measured these specifically, but I know the is_foo_dtype checks get about a 2x speedup from passing dtype #27224 |
reverted sparse changes, except for one cosmetic thing that annoyed me |
updated to change |
@@ -5318,7 +5325,7 @@ def _arith_op(left, right): | |||
|
|||
def _combine_match_index(self, other, func, level=None): | |||
left, right = self.align(other, join="outer", axis=0, level=level, copy=False) | |||
assert left.index.equals(right.index) | |||
# at this point we have `left.index.equals(right.index)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to remove the asserts here? If someone is that set on getting this optimization couldn't they just use the -OO flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i put these assertions in when trying to understand the ops code a while back. having a comment to the same effect performs the same task for the reader.
Thanks @jbrockmendel |
@jorisvandenbossche you've mentioned some ops optimizations; any suggestions for things to add here?