Skip to content

BUG: fix SparseSeries reindex by using Series implementation #15461

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Feb 20, 2017

I abunded a bit with the tests for .reindex() (for .loc there were already quite a bit).

This is completely independent from the approach in #15448 , which also made sense but on which I will come back in a different PR. In the current PR, I'm just exploiting the fact that sparse SparseBlocks "know what to do", and the SparseSeries itself has nothing particular to do for reindexing.

@jreback jreback added the Sparse Sparse Data Type label Feb 20, 2017
@codecov-io
Copy link

codecov-io commented Feb 20, 2017

Codecov Report

Merging #15461 into master will increase coverage by 0.25%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15461      +/-   ##
==========================================
+ Coverage   91.07%   91.33%   +0.25%     
==========================================
  Files         136      137       +1     
  Lines       49165    51133    +1968     
==========================================
+ Hits        44775    46700    +1925     
- Misses       4390     4433      +43
Impacted Files Coverage Δ
pandas/sparse/series.py 95.07% <100%> (-0.12%)
pandas/io/gbq.py 25% <0%> (-58.34%)
pandas/util/testing.py 81.11% <0%> (-0.96%)
pandas/core/internals.py 93.64% <0%> (-0.52%)
pandas/tools/merge.py 91.78% <0%> (-0.35%)
pandas/core/frame.py 97.83% <0%> (-0.1%)
pandas/core/reshape.py 99.15% <0%> (-0.1%)
pandas/core/generic.py 96.25% <0%> (-0.07%)
pandas/formats/format.py 95.34% <0%> (-0.07%)
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04e1168...9084246. Read the comment docs.

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Feb 24, 2017
@jreback
Copy link
Contributor

jreback commented Feb 24, 2017

lgtm. just some minor doc-string issues.

@@ -570,8 +570,7 @@ def copy(self, deep=True):
return self._constructor(new_data, sparse_index=self.sp_index,
fill_value=self.fill_value).__finalize__(self)

def reindex(self, index=None, method=None, copy=True, limit=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave the signature as it was
use a shared_doc to construct the doc-strings (see how its done in Series itself).

@@ -537,6 +542,28 @@ def test_loc_slice(self):
orig.loc['A':'B'].to_sparse())
tm.assert_sp_series_equal(sparse.loc[:'B'], orig.loc[:'B'].to_sparse())

def test_reindex(self):
orig = self.orig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number

tm.assert_sp_series_equal(res, exp)

# single element list (GH 15447)
res = sparse.reindex(['A'], level=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test copy and other params (limit)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test for copy, others should be already tested in pandas/test/series/missing.py, e.g. line 659.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above, this is True for Series, but NOT for SparseSeries

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant line 640

@jreback
Copy link
Contributor

jreback commented Feb 24, 2017

can you test with multiple elements as well on the .loc indexing (unless those tests cases already exist)

@toobaz
Copy link
Member Author

toobaz commented Feb 25, 2017

.loc with multiple elements is already present (same test).

@toobaz
Copy link
Member Author

toobaz commented Mar 2, 2017

(The failed test is apparenty a bug in CircleCI)

@jreback jreback added this to the 0.20.0 milestone Mar 2, 2017
@jreback
Copy link
Contributor

jreback commented Mar 2, 2017

lgtm. ping on green (I restarted circleci).

@toobaz
Copy link
Member Author

toobaz commented Mar 2, 2017

Failed again! My code is apparently so bad it crashes the install.

@jreback
Copy link
Contributor

jreback commented Mar 3, 2017

yeah something happening, though they appear to be working now.

@toobaz
Copy link
Member Author

toobaz commented Mar 3, 2017

Don't know whether you restarted again, but it's again/still failed.

@toobaz
Copy link
Member Author

toobaz commented Mar 3, 2017

ping (still stuck)

@jreback
Copy link
Contributor

jreback commented Mar 3, 2017

why don't you push again (you can simply do git commit --all --amend -C HEAD to generate a new commit. This is the only PR where this is happening.

@toobaz toobaz force-pushed the drop_sparse_reindex branch from 75714f5 to c198fbd Compare March 3, 2017 16:29
@toobaz toobaz force-pushed the drop_sparse_reindex branch from c198fbd to d6a46da Compare March 3, 2017 17:03
@toobaz
Copy link
Member Author

toobaz commented Mar 3, 2017

That didn't help, but rebasing on master apparently did.

Returns
-------
reindexed : SparseSeries
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I think we need a check additional kwargs are not pass (IOW, tolerance, level, fill_value). I am actually not sure what happens if you pass these (prob ignored)?

so need some tests for these (should prob just raise ValueError as not supported by SparseSeries)

further need to have somethin in the doc-string that show them (for Series) or NOT for SparseSeries.

so you need a substitution that can handle this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are now using the same code as Series, why shouldn't they work?! Anyway: I have now added tests for fill_value and tolerance. It seems to me that level is untested even on normal Series, I can add tests for both in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz I don't know if it will or won't work. sure in theory it will, but w/o tests hard to say.

Copy link
Member Author

@toobaz toobaz Mar 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"prob ignored" is different from "untested". Anyway: ready to merge?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz you changed the API by allowing all of these other keywords. They were simply ignored before, now they are not. So obviously not tested. So they either need to be fully tested or raise NotImplementedError. So not ready to merge.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to be sure I understand: by just removing some obsolete and buggy code, I fix multiple bugs (ignoring a documented argument is a bug too) and leave untested only an argument of which (even in the Series case) I can't even find a working example, let alone an unambiguous documentation. If you don't even have time to at least be more specific in your replies to my comments... well I don't have time neither.

@toobaz
Copy link
Member Author

toobaz commented Mar 5, 2017

I'ìll push to my personal master this PR and other commits I have ready (but didn't want to distract you from this); I will try as long as time allows to often rebase on pandas-dev's master. Feel free to cherry-pick when you are interested.

@toobaz toobaz closed this Mar 5, 2017
@jreback
Copy link
Contributor

jreback commented Mar 5, 2017

@toobaz that is a pretty poor attitude. fixing something is great. making more bugs is not.

@pandas-dev pandas-dev locked and limited conversation to collaborators Mar 5, 2017
@pandas-dev pandas-dev unlocked this conversation Mar 5, 2017
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Mar 5, 2017

Jeff, I think @toobaz has shown good attitude in the past and put effort in a lot of issues/PRs. Please be both a bit more patient with each other.

If I understand the issue here, the thing is that now the kwargs are passed, and before they were not (so as a result, silently ignored). And those kwargs are not (fully) tested with a SparseSeries.
But at the same time, @jreback you are OK with the approach in this PR of handling things to super(SparseSeries, self).reindex(..) ? And this fixes some bugs?
If that is the case, let's go forward with this approach, but just not yet pass all the kwargs (just keep ignoring them) until someone adds tests for them? That seems like a simple way forward for this PR?

@toobaz Would you still like to make that small change? (not pass the kwargs, or ony pass those for which you added tests) Then I think we can certainly merge this?

leave untested only an argument of which (even in the Series case) I can't even find a working example, let alone an unambiguous documentation.

@toobaz about which argument are you talking here exactly?

@jreback
Copy link
Contributor

jreback commented Mar 5, 2017

@toobaz I simply wanted you to raise NotImplemented on the addtl kwargs. Sure tests for those would be nice. Your fix was fine, but expanded the API to a bunch of non-tested (and undocumented) code that should work, but needs validation.

@toobaz
Copy link
Member Author

toobaz commented Mar 6, 2017

Thanks to both for looking again into this.

The only (as far as I can see from the docs) untested argument is level, so if there are other "addtl kwargs" I'm missing, please name them. level is (as far as I can see) untested also for regular Series, and the documentation is unclear (to me at least). It is not NotImplemented, it is not tested (again, to the best of my grepping), for SparseSeries exactly as for Series. For SparseSeries, before, it was a bug. Now, being still not tested, it might possibly be buggy. As stated above, I would be happy to take care of it in another PR, so that this one can finally be merged. That said, if in the meanwhile you want me to raise NotImplemented for that specific argument - even though before it was not implemented, while now we have no specific reason to think it won't work, sure, I can do it.

(Then my problem is clearly more general. In my PRs, I see more and more often 1) multiple subsequent revisions asking for things which were not previously asked 2) comments showing zero attention to my replies/to the code being discussed 3) the assumption that every missing test is my responsibility, even though ironically none of my PRs ever extended the pandas API, and all those that fix bugs do provide related tests. I'm not asking you two to open a general discussion on this, I'm just explaining the reason why I will not be able to spend 2 weeks on getting each trivial bug fix merged: the fact that the time I spend on pandas is not paid doesn't necessarily imply I like to waste it. That's it, feel free not to reply, see you in the next PR, which I hope will see more a constructive conversation.)

@jorisvandenbossche
Copy link
Member

level is (as far as I can see) untested also for regular Series, and the documentation is unclear (to me at least).

Small example in the docs: http://pandas.pydata.org/pandas-docs/stable/advanced.html#advanced-reindexing-and-alignment. I think the behaviour of reindex(level=) is rather clear (and I also use it regularly), but agree that the docstring explanation is not .. (will open an issue for that).
Always difficult to find tests in our gigantic test suite, but: there are a few in tests/test_multilevel.py (here and here), further also in tests/indexes/test_multi.py.

For SparseSeries, before, it was a bug. Now, being still not tested, it might possibly be buggy.

I don't know anything of the sparse implementation, so I cannot assess whether it is likely to work out of the box or not. But in any case, this is enabling those keywords (i.e. introducing new functionality that was previously not possible to do), so best to add tests in such a case to assert it is indeed working.
But let's do that indeed in another PR (and let's open an issue for that), and let's just not pass the kwarg here (leaving the situation exactly how it was) or raise an error (that's the same to me).

@jorisvandenbossche
Copy link
Member

I think the behaviour of reindex(level=) is rather clear

See #15590 for issue on the docstring for level. But actually, this is not that clear as I thought it was :-)
I always used it to select a level on the calling object (and therefore I thought the docstring explanation was confusing), but actually it also works on the passed Index .. Will comment about that in #15590

@jorisvandenbossche jorisvandenbossche changed the title Drop sparse reindex BUG: fix SparseSeries reindex by using Series implementation Mar 6, 2017
@toobaz
Copy link
Member Author

toobaz commented Mar 6, 2017

Thanks for the reference to the tests and docs I had missed. Actually, while "exploring" more... I noticed that my PR does introduce a test for level, since before I was asked for it :-) And this happens (and I didn't remember) precisely because I copied the tests from Series and they just worked. So my grepping was really bad, but the PR should be good?

@jreback jreback closed this in c52ff68 Mar 7, 2017
@jreback
Copy link
Contributor

jreback commented Mar 7, 2017

thanks @toobaz

@toobaz toobaz deleted the drop_sparse_reindex branch March 7, 2017 23:19
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#15447

Author: Pietro Battiston <[email protected]>

Closes pandas-dev#15461 from toobaz/drop_sparse_reindex and squashes the following commits:

9084246 [Pietro Battiston] Test SparseSeries.reindex with fill_value and nearest
d6a46da [Pietro Battiston] Use _shared_docs for documentation
922c7b0 [Pietro Battiston] Test "copy" argument
af99190 [Pietro Battiston] Whatsnew
7945cb4 [Pietro Battiston] Tests for .loc() and .reindex() on sparse series with MultiIndex
55b99f8 [Pietro Battiston] BUG: Drop faulty and redundant reindex() for SparseSeries
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong result of pandas.sparse.series.SparseSeries.loc with indexer of length 1
4 participants