REF: standardize usage in DataFrame vs SparseDataFrame ops #28027

jbrockmendel · 2019-08-20T03:02:38Z

I think that after this we're not far from being able to use the base class versions of _combine_frame, _combine_match_index, _combine_match_columns, and _combine_const. That'll be a good day.

The only actual behavior changed is in SparseDataFrame._combine_match_columns, which is changed to match the other methods. See #28025

jorisvandenbossche · 2019-08-20T15:27:41Z

> The only actual behavior changed is in SparseDataFrame._combine_match_columns, which is changed to match the other methods. See #28025

As mentioned in the issue: we could also leave the SparseDataFrame code alone as is (it's deprecated, so I don't think we should change behaviour, and I would personally also not put effort in cleaning it up), but still refactor the DataFrame ops (which would mean less sharing of code, for now, but I don't think that is a problem since the sparse code will be removed anyway, so then there will be no duplication anymore)

jbrockmendel · 2019-08-20T15:47:36Z

> (which would mean less sharing of code, for now, but I don't think that is a problem since the sparse code will be removed anyway, so then there will be no duplication anymore)

If I understand the suggestion correctly, this would involve copy/pasting all of the ops-based code that is currently shared. Or do you have something else in mind?

jorisvandenbossche · 2019-08-20T18:52:12Z

> If I understand the suggestion correctly, this would involve copy/pasting all of the ops-based code that is currently shared. Or do you have something else in mind?

Yes, copy/paste all ops-based code that sparse needs and is currently shared with the other frame ops to the sparse module (and then we basically don't touch that code anymore until it is removed), and then you are free to refactor the ops code without needing to care about the deprecated sparse (and eg don't need to care about the small differences like #28025)

(note this is just talk without looking at the actual code, I don't know if it is practically feasible. I just wouldn't care in this specific case about temporarily duplicating some code if that allows to refactor more easily the code that we are going to keep).

jbrockmendel · 2019-08-20T19:00:23Z

Thanks for clarifying. Does the status quo for SparseDataFrame include bugfixes until it is removed? If so, that would push towards continuing to share code.

jorisvandenbossche · 2019-08-20T19:11:23Z

> Does the status quo for SparseDataFrame include bugfixes until it is removed?

Well, if somebody wants to do a PR for a bug fix, that is of course welcome (and non-shared code can still receive bugfixes). But if you ask me if I find it worth your time to do bug fixes for SparseDataFrame, then I would say no. Even if it would be benefiting from a generic bug fix that you do in the ops code, I don't think it is a problem that sparse wouldn't automatically get that bug fix anymore

jbrockmendel · 2019-08-20T19:42:21Z

I think I understand the suggestion, thanks. Would you be OK with moving forward with the continue-sharing approach in this PR and in the next pass I'll try to use the approach you've described?

jorisvandenbossche · 2019-08-20T19:53:46Z

> Would you be OK with moving forward with the continue-sharing approach in this PR and in the next pass I'll try to use the approach you've described?

If we would go with a non-sharing approach, then I personally think we should do it already now and remove the sparse changes here. But that is still an "if" of course.
(I understood correctly that this PR was changing behaviour, right?)

jbrockmendel · 2019-08-20T19:58:32Z

> (I understood correctly that this PR was changing behaviour right?)

This PR changes behavior, but we have not established that the behavior in question was intentional.

> then I personally think we should

I understand your preference. My question is if the strength of that preference would make you -1 on moving forward with the less-preferred option (this) (conditional on resolving the behavior change)

jorisvandenbossche · 2019-08-20T20:08:11Z

> My question is if the strength of that preference would make you -1 on moving forward with the less-preferred option (this) (conditional on resolving the behavior change)

Well, that depends on the potential behaviour change .. :)

> we have not established that the behavior in question was intentional.

In the issue, I asked for a code example, to be able to see what the change is in practice (and not as a theoretical explanation that I can only follow by reading and understanding the code). But that is actually another reason to stop the sharing: then we wouldn't need to solve this question, we could just ignore the potential difference with SparseDataFrame, and use our mental energy for discussing more interesting things :-)

To be clear, I don't really know what effort it would take to not share the code (I would think that, in this PR, it would mean reverting the changes in the sparse code, and copying (the original version of) some of the methods that you changed in the non-sparse code over to the sparse code; but I didn't try it of course), so I may be talking a bit too lightly about it

jbrockmendel · 2019-08-23T00:17:34Z

@jreback if you're around, thoughts? After more experimentation, I'm increasingly convinced that a) the small behavior change is probably slightly more correct, b) it really doesn't matter, and c) it is possible to rip out a lot of the sparse ops code after implementing _construct_result

jreback · 2019-08-23T04:21:40Z

> @jreback if you're around, thoughts? After more experimentation, I'm increasingly convinced that a) the small behavior change is probably slightly more correct, b) it really doesn't matter, and c) it is possible to rip out a lot of the sparse ops code after implementing _construct_result

agree on all points here; +1 on consolidating code and then simplifying ops. trying to isolate sparse via copy paste i think is just a mess and not worth any effort.

jorisvandenbossche · 2019-08-23T06:37:29Z

> I'm increasingly convinced that a) the small behavior change is probably slightly more correct,

Then again, can you try to explain (show) the behaviour change?

jbrockmendel · 2019-08-23T15:25:12Z

> can you try to explain (show) the behaviour change?

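>>> import pandas as pd
>>> df = pd.DataFrame([0])
>>> sdf = df.to_sparse(fill_value=1)

# unchanged: explicit method call with axis=0
>>> sdf.add(sdf[0], axis=0).default_fill_value
2.0

# changed: operator form, which matches the Series against the columns
>>> (sdf + sdf[0]).default_fill_value
2.0  # <-- PR
1    # <-- master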
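
One way to read the two results above is that a binary op should be applied to the fill values as well as to the stored values, so that positions left implicit still describe the right dense result. The sketch below illustrates that idea with a hypothetical `ToySparse` class and `propagate_fill` helper; neither is a pandas object, and this is not a claim about how SparseDataFrame computes `default_fill_value` internally.

```python
import operator
from dataclasses import dataclass


@dataclass
class ToySparse:
    """Hypothetical stand-in for a sparse container; not a pandas class."""

    stored: dict        # position -> explicitly stored value
    fill_value: float   # value assumed at every position not in `stored`
    length: int

    def to_dense(self):
        # Expand implicit positions using the fill value.
        return [self.stored.get(i, self.fill_value) for i in range(self.length)]


def propagate_fill(left, right, op):
    """Apply `op` to the stored values and to the fill values alike."""
    positions = sorted(set(left.stored) | set(right.stored))
    stored = {
        i: op(
            left.stored.get(i, left.fill_value),
            right.stored.get(i, right.fill_value),
        )
        for i in positions
    }
    # The new fill value is the op applied to the old fill values.
    return ToySparse(stored, op(left.fill_value, right.fill_value), left.length)


a = ToySparse({0: 0.0}, fill_value=1.0, length=3)
result = propagate_fill(a, a, operator.add)

print(result.fill_value)  # 2.0, i.e. 1.0 + 1.0
print(result.to_dense())  # [0.0, 2.0, 2.0]
```

In this toy model, keeping the old fill value of 1 would make the implicit positions expand to 1 instead of 2, which no longer matches adding the dense values elementwise.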

TomAugspurger · 2019-08-23T20:41:53Z

Essentially, binops propagate the default fill value more consistently?

It can probably be called a bug fix (sorry about the close and reopen).

TomAugspurger · 2019-08-27T18:30:49Z

OK, looking at this again, I think the different default_fill_value for ops between two dataframes vs. a dataframe and a series is a bug.

@jbrockmendel can you merge master to fix the merge conflicts?

@jorisvandenbossche any objections to declaring #28027 (comment) a bug?

jbrockmendel · 2019-08-27T18:41:25Z

> can you merge master to fix the merge conflicts?

done

pandas/core/frame.py

jreback · 2019-09-02T21:25:11Z

+        #  non-unique columns case
+        out.columns = self.columns
+        return out
+        # TODO: finalize?  we do for SparseDataFrame


finalize doesn't really do anything for frames, prob can't hurt though

jbrockmendel · 2019-09-02

especially in the frame+frame case it isn't obvious that is correct; I'd prefer to revisit

jorisvandenbossche · 2019-09-12

In principle we should finalize I think, but since it was not done right now, fine to leave it for later?
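
The hook discussed in this review thread can be pictured as follows: the shared arithmetic path computes column results keyed by position and hands them to `_construct_result`, which a subclass may override to attach its own metadata. This is a rough sketch under stated assumptions. `ToyFrame` and `ToySparseFrame` are hypothetical classes, the method bodies are illustrative rather than the pandas implementation, and the fill-value handling in the subclass is an assumption made for the example.

```python
import operator


class ToyFrame:
    """Hypothetical, heavily simplified stand-in for DataFrame."""

    def __init__(self, columns, data):
        self.columns = list(columns)
        self.data = [list(col) for col in data]  # one list per column

    def _combine_const(self, other, func):
        # Shared path: compute column by column, keyed by position.
        new_data = {
            i: [func(value, other) for value in col]
            for i, col in enumerate(self.data)
        }
        return self._construct_result(other, new_data, func)

    def _construct_result(self, other, result, func):
        # Build from positional results, then pin the labels afterwards so
        # that duplicate column labels are preserved.
        n = len(result)
        out = type(self)(range(n), [result[i] for i in range(n)])
        out.columns = list(self.columns)
        return out


class ToySparseFrame(ToyFrame):
    """Hypothetical subclass carrying an extra fill-value attribute."""

    def __init__(self, columns, data, default_fill_value=0):
        super().__init__(columns, data)
        self.default_fill_value = default_fill_value

    def _construct_result(self, other, result, func):
        out = super()._construct_result(other, result, func)
        # Assumption for illustration: the subclass needs `func` to update
        # its own metadata, which is why the hook signature includes it.
        out.default_fill_value = func(self.default_fill_value, other)
        return out


dup = ToyFrame(["a", "a"], [[1, 2], [3, 4]])
dense_result = dup._combine_const(10, operator.add)
print(dense_result.columns)  # ['a', 'a']
print(dense_result.data)     # [[11, 12], [13, 14]]

sparse = ToySparseFrame(["x"], [[0]], default_fill_value=1)
sparse_result = sparse._combine_const(1, operator.add)
print(sparse_result.default_fill_value)  # 2
```

With a hook like this, the base class can own the arithmetic while the subclass only customizes how the result is wrapped.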

TomAugspurger

LGTM. Any other comments @jreback?

jbrockmendel · 2019-09-06T00:10:30Z

gentle ping. This and #28268 are pseudo-blockers for implementing blockwise arithmetic.

(pseudo-blockers as in not actual blockers, but they will make the implementation much easier)

jbrockmendel · 2019-09-09T01:35:28Z

@jreback if we have to choose one more to get in before the weekend productivity streak ends, this should be it

TomAugspurger · 2019-09-12T17:10:43Z

@jreback I think this is blocking #28414. Thoughts on merging?

jreback · 2019-09-12T19:29:58Z

yeah this was fine (got lost in the queue). maybe rebase @jbrockmendel to be sure.

jorisvandenbossche · 2019-09-12T20:07:51Z

Did we decide on removing SparseDataFrame?
In which case this PR might not be needed (and the one it is blocking maybe also not), but at the same time then I also don't care about changing behaviour of SparseDataFrame ;) (or fixing bugs, as it turned out to be)

TomAugspurger · 2019-09-12T20:34:14Z

I think there were no objections to removing SparseSeries and SparseDataFrame for 1.0.

On Thu, Sep 12, 2019 at 3:09 PM Joris Van den Bossche wrote:

> In pandas/core/frame.py:
>
> +        new_data = ops.dispatch_to_series(self, other, func)
> +        return self._construct_result(other, new_data, func)
> +
> +    def _construct_result(self, other, result, func):
> +        """
> +        Compat for DataFrame/SparseDataFrame op result wrapping.
> +
> +        `func` is included for compat with SparseDataFrame signature, is not
> +        needed here.
> +        """
> +        out = self._constructor(result, index=self.index, copy=False)
> +        # Pin columns instead of passing to constructor for compat with
> +        # non-unique columns case
> +        out.columns = self.columns
> +        return out
> +        # TODO: finalize?  we do for SparseDataFrame
>
> In principle we should finalize I think, but since it was not done right now, fine to leave it for later?

jbrockmendel · 2019-09-12T21:05:48Z

This should be merged regardless of the decision about removing SparseDataFrame.

1. The edits here are worthwhile for DataFrame alone
2. if we are going to remove SparseDataFrame that will take a while, during which this will block other ops work

TomAugspurger · 2019-09-12T21:06:55Z

Yep, let's merge this.


jreback · 2019-09-17T12:40:55Z

thanks @jbrockmendel

@TomAugspurger you will likely need to rebase on the removals PR

jbrockmendel added 4 commits August 19, 2019 18:55

implement _construct_result

8e1ad04

Change dispatch_to_series return, restore alignment

bdc0d72

remove comment

7a4b8c2

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

3505e6a

…ockops2

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

63c5f80

…ockops2

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

bf0d2bf

…ockops2

jbrockmendel added 2 commits August 22, 2019 19:56

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

c235b6a

…ockops2

pin columns

6cf996d

patch _default_fill_value

d867034

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

ec3b4f5

…ockops2

TomAugspurger closed this Aug 23, 2019

TomAugspurger reopened this Aug 23, 2019

jbrockmendel added 3 commits August 23, 2019 14:12

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

94b5dbd

…ockops2

pin default_fill_value after align

ca76be4

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

2620ef4

…ockops2

jreback added the Sparse Sparse Data Type label Sep 2, 2019

jreback requested changes Sep 2, 2019

jbrockmendel added 3 commits September 2, 2019 16:33

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

4c836d3

…ockops2

improve docstring

45f5419

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

ceeb1d2

…ockops2

TomAugspurger approved these changes Sep 3, 2019

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

ad53e94

…ockops2

jbrockmendel mentioned this pull request Sep 12, 2019

REF: remove unnecessary SparseDataFrame arith methods #28414

Closed

jreback added this to the 1.0 milestone Sep 12, 2019

jreback approved these changes Sep 12, 2019

Merge branch 'master' of https://github.com/pandas-dev/pandas into bl…

cffa624

…ockops2

jreback merged commit 416931e into pandas-dev:master Sep 17, 2019

jbrockmendel deleted the blockops2 branch September 17, 2019 14:03

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

REF: standardize usage in DataFrame vs SparseDataFrame ops (pandas-de…

49ddef9

…v#28027)

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

REF: standardize usage in DataFrame vs SparseDataFrame ops (pandas-de…

1b16a5e

…v#28027)
