-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: fix SparseSeries reindex by using Series implementation #15461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #15461 +/- ##
==========================================
+ Coverage 91.07% 91.33% +0.25%
==========================================
Files 136 137 +1
Lines 49165 51133 +1968
==========================================
+ Hits 44775 46700 +1925
- Misses 4390 4433 +43
Continue to review full report at Codecov.
|
lgtm. just some minor doc-string issues. |
pandas/sparse/series.py
Outdated
@@ -570,8 +570,7 @@ def copy(self, deep=True): | |||
return self._constructor(new_data, sparse_index=self.sp_index, | |||
fill_value=self.fill_value).__finalize__(self) | |||
|
|||
def reindex(self, index=None, method=None, copy=True, limit=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave the signature as it was
use a shared_doc to construct the doc-strings (see how its done in Series itself).
pandas/tests/sparse/test_indexing.py
Outdated
@@ -537,6 +542,28 @@ def test_loc_slice(self): | |||
orig.loc['A':'B'].to_sparse()) | |||
tm.assert_sp_series_equal(sparse.loc[:'B'], orig.loc[:'B'].to_sparse()) | |||
|
|||
def test_reindex(self): | |||
orig = self.orig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number
tm.assert_sp_series_equal(res, exp) | ||
|
||
# single element list (GH 15447) | ||
res = sparse.reindex(['A'], level=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you test copy
and other params (limit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test for copy
, others should be already tested in pandas/test/series/missing.py
, e.g. line 659.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above, this is True for Series, but NOT for SparseSeries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant line 640
can you test with multiple elements as well on the |
|
(The failed test is apparenty a bug in CircleCI) |
lgtm. ping on green (I restarted circleci). |
Failed again! My code is apparently so bad it crashes the install. |
yeah something happening, though they appear to be working now. |
Don't know whether you restarted again, but it's again/still failed. |
ping (still stuck) |
why don't you push again (you can simply do |
75714f5
to
c198fbd
Compare
c198fbd
to
d6a46da
Compare
That didn't help, but rebasing on |
Returns | ||
------- | ||
reindexed : SparseSeries | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I think we need a check additional kwargs are not pass (IOW, tolerance
, level
, fill_value
). I am actually not sure what happens if you pass these (prob ignored)?
so need some tests for these (should prob just raise ValueError
as not supported by SparseSeries
)
further need to have somethin in the doc-string that show them (for Series
) or NOT for SparseSeries
.
so you need a substitution that can handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are now using the same code as Series
, why shouldn't they work?! Anyway: I have now added tests for fill_value
and tolerance
. It seems to me that level
is untested even on normal Series
, I can add tests for both in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz I don't know if it will or won't work. sure in theory it will, but w/o tests hard to say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"prob ignored" is different from "untested". Anyway: ready to merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz you changed the API by allowing all of these other keywords. They were simply ignored before, now they are not. So obviously not tested. So they either need to be fully tested or raise NotImplementedError. So not ready to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to be sure I understand: by just removing some obsolete and buggy code, I fix multiple bugs (ignoring a documented argument is a bug too) and leave untested only an argument of which (even in the Series
case) I can't even find a working example, let alone an unambiguous documentation. If you don't even have time to at least be more specific in your replies to my comments... well I don't have time neither.
I'ìll push to my personal |
@toobaz that is a pretty poor attitude. fixing something is great. making more bugs is not. |
Jeff, I think @toobaz has shown good attitude in the past and put effort in a lot of issues/PRs. Please be both a bit more patient with each other. If I understand the issue here, the thing is that now the kwargs are passed, and before they were not (so as a result, silently ignored). And those kwargs are not (fully) tested with a SparseSeries. @toobaz Would you still like to make that small change? (not pass the kwargs, or ony pass those for which you added tests) Then I think we can certainly merge this?
@toobaz about which argument are you talking here exactly? |
@toobaz I simply wanted you to raise |
Thanks to both for looking again into this. The only (as far as I can see from the docs) untested argument is (Then my problem is clearly more general. In my PRs, I see more and more often 1) multiple subsequent revisions asking for things which were not previously asked 2) comments showing zero attention to my replies/to the code being discussed 3) the assumption that every missing test is my responsibility, even though ironically none of my PRs ever extended the pandas API, and all those that fix bugs do provide related tests. I'm not asking you two to open a general discussion on this, I'm just explaining the reason why I will not be able to spend 2 weeks on getting each trivial bug fix merged: the fact that the time I spend on pandas is not paid doesn't necessarily imply I like to waste it. That's it, feel free not to reply, see you in the next PR, which I hope will see more a constructive conversation.) |
Small example in the docs: http://pandas.pydata.org/pandas-docs/stable/advanced.html#advanced-reindexing-and-alignment. I think the behaviour of reindex(level=) is rather clear (and I also use it regularly), but agree that the docstring explanation is not .. (will open an issue for that).
I don't know anything of the sparse implementation, so I cannot assess whether it is likely to work out of the box or not. But in any case, this is enabling those keywords (i.e. introducing new functionality that was previously not possible to do), so best to add tests in such a case to assert it is indeed working. |
See #15590 for issue on the docstring for level. But actually, this is not that clear as I thought it was :-) |
Thanks for the reference to the tests and docs I had missed. Actually, while "exploring" more... I noticed that my PR does introduce a test for |
thanks @toobaz |
closes pandas-dev#15447 Author: Pietro Battiston <[email protected]> Closes pandas-dev#15461 from toobaz/drop_sparse_reindex and squashes the following commits: 9084246 [Pietro Battiston] Test SparseSeries.reindex with fill_value and nearest d6a46da [Pietro Battiston] Use _shared_docs for documentation 922c7b0 [Pietro Battiston] Test "copy" argument af99190 [Pietro Battiston] Whatsnew 7945cb4 [Pietro Battiston] Tests for .loc() and .reindex() on sparse series with MultiIndex 55b99f8 [Pietro Battiston] BUG: Drop faulty and redundant reindex() for SparseSeries
git diff master | flake8 --diff
I abunded a bit with the tests for
.reindex()
(for.loc
there were already quite a bit).This is completely independent from the approach in #15448 , which also made sense but on which I will come back in a different PR. In the current PR, I'm just exploiting the fact that sparse
SparseBlock
s "know what to do", and theSparseSeries
itself has nothing particular to do for reindexing.