CLN: small ops optimizations #28036

jbrockmendel · 2019-08-20T14:42:49Z

@jorisvandenbossche you've mentioned some ops optimizations; any suggestions for things to add here?

jorisvandenbossche · 2019-08-20T15:16:55Z

pandas/core/sparse/frame.py

@@ -540,9 +540,6 @@ def _combine_frame(self, other, func, fill_value=None, level=None):
        this, other = self.align(other, join="outer", level=level, copy=False)
        new_index, new_columns = this.index, this.columns

-        if self.empty and other.empty:
-            return self._constructor(index=new_index).__finalize__(self)


Is this covered by a test?

(In general, I would rather leave sparse alone; it is deprecated anyway.)

jbrockmendel · 2019-08-20

Yes, it is.

I understand the impulse to leave sparse alone, but at the moment some of it is a barrier to simplifying the ops code.

jorisvandenbossche · 2019-08-20

As I was just saying on the other PR as well, there is also the option to leave sparse as is and still go forward with simplifying the ops code for non-sparse.

This could also make it easier for you to focus only on the ops code for the normal frame (without the additional complexity of sharing it with sparse).

jbrockmendel · 2019-08-20

Not sure I understand. The core.ops functions call the DataFrame/SparseDataFrame methods, so anything I want to do to change the behavior of the non-sparse methods needs to be reflected in the sparse methods.

jorisvandenbossche · 2019-08-20T15:21:34Z

Do you have an idea how much those changes matter?

(Anyway, even if the gain is almost negligible, I think that e.g. checking the actual dtype with the is_..._dtype functions is better (more explicit) than passing the "container that has a dtype", so it is an improvement either way.)

> you've mentioned some ops optimizations; any suggestions for things to add here?

Not directly related to the ones you are doing here. I need to look it up again (I described it in the issue about the slowdown), but it was mainly about the way we iterate through the columns and how we reconstruct the DataFrame/Series at the end, not so much the ops themselves.

jbrockmendel · 2019-08-20T15:45:29Z

> Do you have an idea how much those changes matter?

I haven't measured these specifically, but I know the is_foo_dtype checks get about a 2x speedup from being passed the dtype directly (see #27224).
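For readers who want to try the comparison themselves, here is a minimal sketch. It assumes pandas is installed and uses `is_numeric_dtype` from `pandas.api.types` as one representative of the is_foo_dtype family; the Series and the iteration count are illustrative choices, not taken from this PR. Both call styles give the same answer; only the cost differs, and the exact ratio will depend on the pandas version and the machine.

```python
import timeit

import pandas as pd
from pandas.api.types import is_numeric_dtype

ser = pd.Series([1, 2, 3])

# Passing the container: the check has to dig the dtype out of the Series first.
via_container = is_numeric_dtype(ser)

# Passing the dtype: the check can inspect it directly.
via_dtype = is_numeric_dtype(ser.dtype)

# Rough timing of each style (illustrative only; numbers vary by environment).
dtype = ser.dtype
t_container = timeit.timeit(lambda: is_numeric_dtype(ser), number=1000)
t_dtype = timeit.timeit(lambda: is_numeric_dtype(dtype), number=1000)
print(f"container: {t_container:.6f}s  dtype: {t_dtype:.6f}s")
```

Passing the dtype also makes the call site more explicit about what is actually being checked, which is the point raised in the review above.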

jbrockmendel · 2019-08-20T21:36:32Z

reverted sparse changes, except for one cosmetic thing that annoyed me

jbrockmendel · 2019-08-21T16:19:33Z

Updated to reverse the order of the two checks in `if is_extension_array_dtype(right) and not is_scalar(right)`, so that the scalar check runs first. If `right` happens to be a str, `is_extension_array_dtype(right)` can be fairly expensive.
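The effect of reordering relies on Python's short-circuit evaluation of `and`. The sketch below illustrates it with stand-in functions: `is_scalar_stub` and `is_extension_dtype_stub` are hypothetical placeholders, not the pandas implementations, and the counter exists only to show when the costly check runs.

```python
# Counter used only to observe how often the "expensive" stand-in is called.
calls = {"expensive": 0}


def is_scalar_stub(obj):
    """Hypothetical cheap check: a plain isinstance test."""
    return isinstance(obj, (str, bytes, int, float, bool, complex))


def is_extension_dtype_stub(obj):
    """Hypothetical costly check; records each invocation."""
    calls["expensive"] += 1
    return hasattr(obj, "dtype")


def check_old_order(right):
    # Expensive check first: it runs for every input, scalars included.
    return is_extension_dtype_stub(right) and not is_scalar_stub(right)


def check_new_order(right):
    # Cheap check first: for a scalar, `and` short-circuits and the
    # expensive check is never evaluated.
    return not is_scalar_stub(right) and is_extension_dtype_stub(right)


old_result = check_old_order("foo")
old_calls = calls["expensive"]

calls["expensive"] = 0
new_result = check_new_order("foo")
new_calls = calls["expensive"]

print(old_result, old_calls)
print(new_result, new_calls)
```

Both orderings return the same boolean, so the change affects only how much work is done for scalar inputs.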

WillAyd · 2019-08-25T16:23:24Z

pandas/core/frame.py

@@ -5318,7 +5325,7 @@ def _arith_op(left, right):

    def _combine_match_index(self, other, func, level=None):
        left, right = self.align(other, join="outer", axis=0, level=level, copy=False)
-        assert left.index.equals(right.index)
+        # at this point we have `left.index.equals(right.index)`


Why do we want to remove the asserts here? If someone is that set on getting this optimization couldn't they just use the -OO flag?

jbrockmendel · 2019-08-25

I put these assertions in a while back, when I was trying to understand the ops code. A comment to the same effect serves the reader just as well.
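As background on the flag mentioned above: CPython drops `assert` statements when run with `-O` or `-OO`. The sketch below reproduces that in a single process by using the `optimize` argument of the built-in `compile`. The function name `check` and the index lists are illustrative assumptions, not code from this PR.

```python
# Source for a tiny function guarded by an assert (illustrative only).
SOURCE = """
def check(left_index, right_index):
    assert left_index == right_index
    return "reached"
"""


def build(optimize):
    """Compile SOURCE at the given optimization level and return `check`."""
    namespace = {}
    code = compile(SOURCE, "<sketch>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["check"]


check_debug = build(0)      # asserts kept, as in a normal run
check_optimized = build(1)  # asserts removed, as under `python -O`

try:
    check_debug([0, 1], [0, 2])
    debug_raised = False
except AssertionError:
    debug_raised = True

# With the assert compiled out, mismatched inputs pass straight through.
optimized_result = check_optimized([0, 1], [0, 2])

print(debug_raised, optimized_result)
```

A comment, by contrast, costs nothing at runtime under any flag but also verifies nothing, which is the trade-off being discussed in this thread.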

WillAyd · 2019-08-26T23:39:19Z

Thanks @jbrockmendel

jbrockmendel added 2 commits August 20, 2019 07:41

small ops optimizations

f27fb1e

remove some assertions

7ebd943

jorisvandenbossche reviewed Aug 20, 2019

jbrockmendel added 2 commits August 20, 2019 14:35

Merge branch 'master' of https://github.com/pandas-dev/pandas into sm…

64659a4

…allopts

Revert sparse changes

ea57dba

jbrockmendel added 2 commits August 20, 2019 15:55

dummy to force Travis

eb31258

do scalar check first

cc118a6

jorisvandenbossche approved these changes Aug 21, 2019

Merge branch 'master' of https://github.com/pandas-dev/pandas into sm…

08098ce

…allopts

WillAyd reviewed Aug 25, 2019

WillAyd approved these changes Aug 26, 2019

WillAyd added the Clean label Aug 26, 2019

WillAyd added this to the 1.0 milestone Aug 26, 2019

WillAyd merged commit 3577746 into pandas-dev:master Aug 26, 2019

jbrockmendel deleted the smallopts branch August 27, 2019 00:19

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

CLN: small ops optimizations (pandas-dev#28036)

a747683

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

CLN: small ops optimizations (pandas-dev#28036)

d663226
