REF: de-duplicate DataFrame/SparseDataFrame arithmetic code #23414

jbrockmendel · 2018-10-30T00:32:01Z

Not quite all the way de-duplicated, but this is as much as I could do without it getting convoluted.

pep8speaks · 2018-10-30T00:32:07Z

Hello @jbrockmendel! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/frame.py !
There are no PEP8 issues in the file pandas/core/ops.py !
There are no PEP8 issues in the file pandas/core/sparse/frame.py !

codecov · 2018-10-30T01:10:04Z

Codecov Report

❗ No coverage uploaded for pull request base (master@592fd64).
The diff coverage is 97.22%.

@@            Coverage Diff            @@
##             master   #23414   +/-   ##
=========================================
  Coverage          ?   92.23%           
=========================================
  Files             ?      161           
  Lines             ?    51324           
  Branches          ?        0           
=========================================
  Hits              ?    47338           
  Misses            ?     3986           
  Partials          ?        0
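For readers checking the numbers in the diff above: Codecov computes coverage as hits divided by hits + misses + partials, so a partially covered line counts against the ratio. A minimal sketch that recomputes the reported 92.23% from the hit and miss counts (the function name is ours, not part of Codecov):

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage in percent: hits / (hits + misses + partials)."""
    total = hits + misses + partials
    return 100.0 * hits / total


# Figures from the report above: 47338 hits, 3986 misses, 0 partials, 51324 lines.
pct = coverage_pct(47338, 3986, 0)
print(f"{pct:.2f}%")  # 92.23%
```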

Flag	Coverage Δ
#multiple	`90.62% <97.22%> (?)`
#single	`42.3% <22.22%> (?)`

Impacted Files	Coverage Δ
pandas/core/sparse/frame.py	`94.64% <100%> (ø)`
pandas/core/frame.py	`97.03% <100%> (ø)`
pandas/core/ops.py	`94.24% <80%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 592fd64...7ecb03f.

jreback · 2018-11-01T00:14:37Z

pandas/core/sparse/frame.py

+                                columns=self.columns,
+                                default_fill_value=fill_value)
+
+        return res.__finalize__(self)


why would this call finalize? why would the DataFrame one not?

No idea, that's why I opened #23028.

ok, we basically need to always call __finalize__ after every op (binary or unary). It prob isn't done consistently. So I would do it for both here.

I'll do this, but am not wild about it. Current __finalize__/_metadata handling is quarter-assed, and I'd prefer to leave it that way until it is specifically addressed than make it half-assed. Still though, will update.

Hmm, this actually makes things less consistent within DataFrame (though slightly more consistent between DataFrame and SparseDataFrame), since lots of ops don't go through this path.

can you just call the super function here?

The DataFrame constructor doesn't take a fill_value argument. We would end up special-casing within the DataFrame method in a way equivalent to (but messier than and less performant than) overriding in the appropriate subclass.
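A quick check of the constructor point, assuming a current pandas install: DataFrame rejects a fill_value keyword, so the sparse-specific wrapping cannot simply move up into DataFrame. (SparseDataFrame itself was removed in pandas 1.0; in current pandas the fill value is part of each column's SparseDtype, which the second half of the sketch shows.) The helper name is hypothetical:

```python
import pandas as pd


def dataframe_accepts_fill_value() -> bool:
    """Return True if pd.DataFrame(...) accepts a fill_value keyword."""
    try:
        pd.DataFrame({"a": [1.0, None]}, fill_value=0)
    except TypeError:  # unexpected keyword argument
        return False
    return True


print(dataframe_accepts_fill_value())  # False

# Current pandas: the fill value lives on the column dtype, not on the frame.
sparse_df = pd.DataFrame({"a": pd.arrays.SparseArray([0, 0, 1], fill_value=0)})
print(sparse_df["a"].sparse.fill_value)  # 0
```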

jreback · 2018-11-01T00:16:33Z

pandas/core/sparse/frame.py

+
+        fill_value = self._get_op_result_fill_value(other, func, axis)
+
+        res = self._constructor(result, index=self.index,


why are you passing columns to the constructor here, whereas in DataFrame you are setting them?

I'm very specifically not changing the existing behavior.

well, let's do that then. in reality you could actually call the superclass (maybe should just do that), if you have the ability to pass a fill_value

jreback

this is too much special casing on the dispatch function, pls try to unify these.

jreback · 2018-11-03T14:20:37Z

pandas/core/frame.py

+        result = self._constructor(result, index=self.index, copy=False)
+        # Pin columns instead of passing to constructor for compat with
+        # non-unique columns case
+        result.columns = self.columns


does this need to call __finalize__?

This maintains the current behavior.

At some point we can/should make a concerted effort to be internally-consistent about calling __finalize__, but that is a large, orthogonal undertaking. Calling it here without doing it elsewhere would make things more inconsistent.
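Why pin the columns afterwards instead of passing them to the constructor: a minimal sketch, assuming the dispatched result is a dict keyed by column position (the frames and variable names are illustrative). Position keys keep duplicate labels such as two "x" columns from colliding, whereas passing columns=left.columns to the constructor would look up keys named "x" in that dict, which do not exist:

```python
import pandas as pd

# Two frames with a non-unique column label.
left = pd.DataFrame([[1, 2], [3, 4]], columns=["x", "x"])
right = pd.DataFrame([[10, 20], [30, 40]], columns=["x", "x"])

# Operate column by column using positions, so duplicate labels never collide.
new_data = {
    i: left.iloc[:, i] + right.iloc[:, i] for i in range(len(left.columns))
}

# Build with the default 0..n-1 column labels, then pin the real labels.
result = pd.DataFrame(new_data, index=left.index, copy=False)
result.columns = left.columns

print(list(result.columns))     # ['x', 'x']
print(result.values.tolist())   # [[11, 22], [33, 44]]
```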

jbrockmendel · 2018-11-03T18:30:35Z

this is too much special casing on the dispatch function, pls try to unify these.

The two versions of this method are:

    def _wrap_dispatched_op(self, result, other, func, axis=None):
        result = self._constructor(result, index=self.index, copy=False)
        result.columns = self.columns
        return result

    def _wrap_dispatched_op(self, result, other, func, axis=None):
        fill_value = self._get_op_result_fill_value(other, func, axis)

        res = self._constructor(result, index=self.index,
                                columns=self.columns,
                                default_fill_value=fill_value)

        return res.__finalize__(self)

The only way I can see to combine these would be:

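    def _wrap_dispatched_op(self, result, other, func, axis=None):
        if isinstance(self, ABCSparseDataFrame):
            fill_value = self._get_op_result_fill_value(other, func, axis)
            res = self._constructor(result, index=self.index,
                                    columns=self.columns,
                                    default_fill_value=fill_value)
            return res.__finalize__(self)
        result = self._constructor(result, index=self.index, copy=False)
        result.columns = self.columns
        return result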

The point of implementing this method is that it is the part of apply_index/apply_columns/etc. that can't be shared. With this in place, we have a real shot at de-duplicating the rest of those methods.
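The split described above is essentially a template method: the shared column loop is written once in the base class and only the wrapping step is overridden. A minimal pure-Python sketch under that reading; Frame, SparseFrame, combine, and the simplified fill-value rule are hypothetical stand-ins rather than the pandas classes (only the hook names mirror the thread):

```python
import operator
from typing import Callable, Dict, List, Optional

Op = Callable[[float, float], float]


class Frame:
    """Hypothetical stand-in for DataFrame: columns stored as lists, keyed by position."""

    def __init__(self, data: Dict[int, List[float]], columns: List[str]):
        self.data = data
        self.columns = columns

    def combine(self, other: "Frame", func: Op) -> "Frame":
        # Shared part: apply func column by column (by position), written once.
        raw = {
            i: [func(a, b) for a, b in zip(self.data[i], other.data[i])]
            for i in range(len(self.columns))
        }
        # Class-specific part: how the raw result is wrapped.
        return self._wrap_dispatched_op(raw, other, func)

    def _wrap_dispatched_op(self, result, other, func):
        return Frame(result, list(self.columns))


class SparseFrame(Frame):
    """Hypothetical sparse variant that also tracks a default fill value."""

    def __init__(self, data, columns, default_fill_value: Optional[float] = None):
        super().__init__(data, columns)
        self.default_fill_value = default_fill_value

    def _wrap_dispatched_op(self, result, other, func):
        fill_value = self._get_op_result_fill_value(other, func)
        return SparseFrame(result, list(self.columns), default_fill_value=fill_value)

    def _get_op_result_fill_value(self, other, func):
        # Simplified rule (an assumption): apply func to both fill values if both exist.
        other_fill = getattr(other, "default_fill_value", None)
        if self.default_fill_value is None or other_fill is None:
            return None
        return func(self.default_fill_value, other_fill)


dense = Frame({0: [1.0, 2.0]}, ["a"])
sparse = SparseFrame({0: [0.0, 3.0]}, ["a"], default_fill_value=0.0)

print(type(dense.combine(dense, operator.add)).__name__)  # Frame
res = sparse.combine(sparse, operator.add)
print(type(res).__name__, res.data, res.default_fill_value)  # SparseFrame {0: [0.0, 6.0]} 0.0
```

The isinstance-based single method also works, but every new subclass would then require editing the base class.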

jreback · 2018-11-11T23:59:14Z

Can you merge master? I'll look again.

jbrockmendel · 2018-11-26T23:44:25Z

The actually-useful step is the one after this: using _wrap_dispatched_op to make SparseDataFrame._combine_frame, _combine_match_index, _combine_match_columns, and _combine_const unnecessary. But for reasons I'm not clear on, doing so breaks a bunch of tests. Since SparseDataFrame may be on its way out anyway, closing.
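The follow-up step can be pictured as one dispatcher replacing the four _combine_* variants: frame with frame, frame with Series along the index, frame with Series along the columns, and frame with scalar all reduce to a position-keyed dict of column results, which a single wrapping hook turns back into a frame. A minimal sketch; dispatch_to_columns and wrap_result are hypothetical names, and the inputs are assumed to be already aligned (the real code aligns first):

```python
import operator

import pandas as pd


def wrap_result(left: pd.DataFrame, data: dict) -> pd.DataFrame:
    # The class-specific hook: for a plain DataFrame, build, then pin the labels.
    result = pd.DataFrame(data, index=left.index, copy=False)
    result.columns = left.columns
    return result


def dispatch_to_columns(left: pd.DataFrame, right, func, axis=None) -> pd.DataFrame:
    """Apply func column by column; assumes left and right are already aligned."""
    n = len(left.columns)
    if isinstance(right, pd.DataFrame):  # the _combine_frame case
        data = {i: func(left.iloc[:, i], right.iloc[:, i]) for i in range(n)}
    elif isinstance(right, pd.Series) and axis == "index":  # _combine_match_index
        data = {i: func(left.iloc[:, i], right) for i in range(n)}
    elif isinstance(right, pd.Series):  # _combine_match_columns
        data = {i: func(left.iloc[:, i], right.iloc[i]) for i in range(n)}
    else:  # _combine_const: a scalar
        data = {i: func(left.iloc[:, i], right) for i in range(n)}
    return wrap_result(left, data)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(dispatch_to_columns(df, df, operator.add).values.tolist())  # [[2, 6], [4, 8]]
print(dispatch_to_columns(df, 5, operator.mul).values.tolist())   # [[5, 15], [10, 20]]
```

Under this layout a sparse subclass would only need to override the wrapping hook.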

jbrockmendel added 10 commits October 29, 2018 10:20

implement wrap_dispatched_op

cb79d12

typo fixup

9c8a9c7

fixup copy/paste mixup

32a327b

use wrap_dispatched_op in sparse

e737d45

remove unnecessary errors kwarg

08f80fa

Merge branch 'master' of https://github.com/pandas-dev/pandas into arith2

7f7ed51

pass str_rep

ea4ce51

fix double wrapping

4d99777

revert non-central changes

f4c99ed

revert str_rep

7810dc6

sinhrks added the Sparse (Sparse Data Type) and Clean labels Oct 30, 2018

jbrockmendel commented Oct 30, 2018

pandas/core/frame.py (resolved)

jreback requested changes Oct 30, 2018

pandas/core/sparse/frame.py (resolved)

pandas/core/frame.py (resolved)

jbrockmendel added 3 commits October 30, 2018 10:13

Merge branch 'master' of https://github.com/pandas-dev/pandas into arith2

635c575

docstring fixup, validate axis

bf88f23

Merge branch 'master' of https://github.com/pandas-dev/pandas into arith2

2ddb05a

jreback requested changes Nov 1, 2018

jbrockmendel added 4 commits October 31, 2018 17:30

Merge branch 'master' of https://github.com/pandas-dev/pandas into arith2

86fa938

use dict comprehensions

3c2f323

Merge branch 'master' of https://github.com/pandas-dev/pandas into arith2

8f3e9e9

dict comprehension

63544b4

jreback requested changes Nov 3, 2018

Merge branch 'master' of https://github.com/pandas-dev/pandas into arith2

7ecb03f

jbrockmendel closed this Nov 26, 2018

jbrockmendel deleted the arith2 branch April 5, 2020 17:38
