Skip to content

API: Return sparse objects always for cumsum #14771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Nov 30, 2016

Always return SparseArray and SparseSeries for SparseArray.cumsum() and SparseSeries.cumsum() respectively, regardless of fill_value.

Close #12855.

@jreback jreback added Numeric Operations Arithmetic, Comparison, and Logical operations Sparse Sparse Data Type labels Nov 30, 2016
"""
nv.validate_cumsum(args, kwargs)

# TODO: gh-12855 - return a SparseArray here
if notnull(self.fill_value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use self._null_fill_value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


Returns
-------
cumsum : SparseSeries if `self` has a null `fill_value` and a
generic Series otherwise
cumsum : SparseSeries
"""
nv.validate_cumsum(args, kwargs)
new_array = SparseArray.cumsum(self.values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because self.values is SparseArray, this wrapping looks unnecessary.

Copy link
Member Author

@gfyoung gfyoung Nov 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.

@codecov-io
Copy link

codecov-io commented Dec 1, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14771 into master will increase coverage by <.01%

@@             master     #14771   diff @@
==========================================
  Files           144        144          
  Lines         50981      50983     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43470      43472     +2   
  Misses         7511       7511          
  Partials          0          0          

Powered by Codecov. Last update 81a2f79...83314fc

@gfyoung gfyoung force-pushed the sparse-return-type branch from 6ab170f to 39eb909 Compare December 2, 2016 03:32

Parameters
----------
axis : int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for compat? why are you writing it in the doc-string? (rather than raising on it if its not 0)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : The parameter was in the signature but undocumented. I believe it is for compat since SparseDataFrame actually uses this argument.

Copy link
Member Author

@gfyoung gfyoung Dec 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : Should we just do validation as part of this PR for both SparseArray and SparseSeries then?


Returns
-------
cumsum : SparseSeries if `self` has a null `fill_value` and a
generic Series otherwise
cumsum : SparseSeries
"""
nv.validate_cumsum(args, kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. I would do axis = self._get_axis_number(axis) (which validates).

and something similar to SparseArray.

We normally don't list these in the doc-string Parameters (even though they are in the signature)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, though it was misleading because I had initially assumed the axis parameter did something, even though it doesn't do anything in this implementation.

if notnull(self.fill_value):
return self.to_dense().cumsum()
if not self._null_fill_value:
return SparseArray(self.to_dense()).cumsum()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side thing. we should use _constructor (and define it for SparseArray).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can make a separate issue / do later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good as a follow-up.

----------
axis : int
Axis over which to perform the cumulative summation. This
parameter is ignored because `SparseSeries` is 1-D, but it
Copy link
Contributor

@jreback jreback Dec 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you are not matching the axis parameter for Series?

Help on function cumsum in module pandas.core.series:

cumsum(self, axis=None, skipna=True, *args, **kwargs)
    Return cumulative sum over requested axis.
    
    Parameters
    ----------
    axis : {index (0)}
    skipna : boolean, default True
        Exclude NA/null values. If an entire row/column is NA, the result
        will be NA
    
    Returns
    -------
    cumsum : scalar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason really. Done.

@gfyoung gfyoung force-pushed the sparse-return-type branch 2 times, most recently from ac042da to a472432 Compare December 6, 2016 16:34
@gfyoung
Copy link
Member Author

gfyoung commented Dec 8, 2016

@jreback : I addressed all of the comments, and everything is passing. Ready to merge if there are no other concerns.


Parameters
----------
axis : 0 or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it cannot be None, must be 0 (as you set the default)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# TODO: gh-12855 - return a SparseSeries here
return Series(new_array, index=self.index).__finalize__(self)
if axis is not None:
self._get_axis_number(axis) # Unused but hould be valid!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only do

axis=self._get_axis_number(axis) and remove the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Always return SparseArray and SparseSeries for
SparseArray.cumsum() and SparseSeries.cumsum()
respectively, regardless of fill_value.

Close pandas-devgh-12855.
@gfyoung
Copy link
Member Author

gfyoung commented Dec 14, 2016

@jreback : Applied requested changes, and all is green. Ready to merge if there are no other concerns.

@jreback jreback added this to the 0.20.0 milestone Dec 15, 2016
@jreback
Copy link
Contributor

jreback commented Dec 15, 2016

lgtm.

@jreback jreback closed this in 33e11ad Dec 15, 2016
@jreback
Copy link
Contributor

jreback commented Dec 15, 2016

thanks!

@gfyoung gfyoung deleted the sparse-return-type branch December 15, 2016 16:06
ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016
Always return `SparseArray` and `SparseSeries` for
`SparseArray.cumsum()` and `SparseSeries.cumsum()` respectively,
regardless of `fill_value`.

Closes pandas-dev#12855.

Author: gfyoung <[email protected]>

Closes pandas-dev#14771 from gfyoung/sparse-return-type and squashes the following commits:

83314fc [gfyoung] API: Return sparse objects always for cumsum
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants