-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Series.asof fails for all NaN Series (GH15713) #15758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #15758 +/- ##
==========================================
- Coverage 91.01% 91.01% -0.01%
==========================================
Files 143 143
Lines 49395 49379 -16
==========================================
- Hits 44959 44940 -19
- Misses 4436 4439 +3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add tests for DataFrame? (both using a subset and not)?
pandas/tests/series/test_asof.py
Outdated
@@ -4,6 +4,8 @@ | |||
from pandas import (offsets, Series, notnull, | |||
isnull, date_range, Timestamp) | |||
|
|||
from pandas.util.testing import assert_series_equal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to import this, you can use it as tm.assert_series_equal(..)
as is done in the other tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/series/test_asof.py
Outdated
@@ -148,3 +150,8 @@ def test_errors(self): | |||
s = Series(np.random.randn(N), index=rng) | |||
with self.assertRaises(ValueError): | |||
s.asof(s.index[0], subset='foo') | |||
|
|||
# series is all nans | |||
result = Series([np.nan]).asof([0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this a separate test? (as it is not related to errors). Eg test_all_nans
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/generic.py
Outdated
@@ -3971,6 +3971,9 @@ def asof(self, where, subset=None): | |||
if not isinstance(where, Index): | |||
where = Index(where) if is_list else Index([where]) | |||
|
|||
if self.isnull().values.all(): | |||
return pd.Series([np.nan]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not work for a DataFrame (I mean: it will not be the correct return value, see the docstring)
Further, you can put this after the next line where nulls
is defined and reuse that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done! thanks for the comments, feels great to learn how to contribute :)
somewhat separate. but do we have a guarantee on a not-found indexer?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a whatsnew note (for 0.20 / bug fix section)
pandas/tests/frame/test_asof.py
Outdated
|
||
def test_all_nans(self): | ||
# series is all nans | ||
result = DataFrame([np.nan]).asof([0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try these with non-defualt indexes and see what happens (your test will break)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, and also, when you have a DataFrame with multiple columns, those columns should be preserved in the result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/generic.py
Outdated
@@ -3972,6 +3972,12 @@ def asof(self, where, subset=None): | |||
where = Index(where) if is_list else Index([where]) | |||
|
|||
nulls = self.isnull() if is_series else self[subset].isnull().any(1) | |||
if nulls.values.all(): | |||
if is_series: | |||
return pd.Series([np.nan]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to set the indexes here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/generic.py
Outdated
@@ -3972,6 +3972,12 @@ def asof(self, where, subset=None): | |||
where = Index(where) if is_list else Index([where]) | |||
|
|||
nulls = self.isnull() if is_series else self[subset].isnull().any(1) | |||
if nulls.values.all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the .values
is still needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values is still here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jorisvandenbossche ... I removed then put it back because I thought it generated a backward compatibility error. Currently the build breaks for Python 2.7.9. Now I saw it has nothing to do with it in Travis CI log: it's a "ci/lint.sh" exiting 1.
I will remove it again and see where the code is unformatted. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the .values
pandas/tests/series/test_asof.py
Outdated
|
||
def test_all_nans(self): | ||
# series is all nans | ||
result = Series([np.nan]).asof([0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a case not using zero as the argument?
And can you also add the case of a scalar, and of multiple values? (eg s.asof(10)
and s.asof([10, 11])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/frame/test_asof.py
Outdated
|
||
def test_all_nans(self): | ||
# series is all nans | ||
result = DataFrame([np.nan]).asof([0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, and also, when you have a DataFrame with multiple columns, those columns should be preserved in the result
…results preserve columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some minor comments. ping when all green.
@@ -930,3 +930,5 @@ Bug Fixes | |||
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`) | |||
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`) | |||
- Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI in the future, if you put the whatnew notes in a blank space in Bug Fixes (these are on purpose), you wont' get merge conflicts
pandas/tests/series/test_asof.py
Outdated
# testing scalar input | ||
date = date_range('1/1/1990', periods=N * 3, freq='25s')[0] | ||
result = Series(np.nan, index=rng).asof(date) | ||
self.assertTrue(result != result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert isnull(result)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/series/test_asof.py
Outdated
@@ -148,3 +148,23 @@ def test_errors(self): | |||
s = Series(np.random.randn(N), index=rng) | |||
with self.assertRaises(ValueError): | |||
s.asof(s.index[0], subset='foo') | |||
|
|||
def test_all_nans(self): | |||
# series is all nans |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number as a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/frame/test_asof.py
Outdated
tm.assert_frame_equal(result, expected) | ||
|
||
def test_all_nans(self): | ||
# series is all nans |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number as a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/frame/test_asof.py
Outdated
tm.assert_frame_equal(result, expected) | ||
|
||
def test_all_nans(self): | ||
# series is all nans |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment needs updating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done... thanks @jreback !!
lgtm. @chris-b1 if you can give a quick look (don't merge yet, going to merge a couple of things at once later) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small change needed for the DataFrame case I think
pandas/tests/frame/test_asof.py
Outdated
date = date_range('1/1/1990', periods=self.N * 3, freq='25s')[0] | ||
result = DataFrame(np.nan, index=self.rng, columns=['A']).asof(date) | ||
expected = DataFrame(np.nan, index=[date], columns=['A']) | ||
tm.assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a scalar input should result in a Series. That is at least the current behaviour for the working non-NaN case:
In [37]: df = pd.DataFrame(np.random.randn(2,2), index=[1,2], columns=['A', 'B'])
In [38]: df
Out[38]:
A B
1 -0.643872 1.375342
2 -0.223192 0.231439
In [39]: df.asof([3])
Out[39]:
A B
3 -0.223192 0.231439
In [40]: df.asof(3)
Out[40]:
A -0.223192
B 0.231439
Name: 3, dtype: float64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you added is not the correct result I think. It are the original columns that are the index of the resulting series, not [where]
. The where
does become the name of the Series (i.e. as if you access a row from the dataframe)
Can you add the example above (but then with NaNs instead of the random data) as a test case? The it is really clear what the expected behaviour is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added the tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the last change I think to get it merged!
pandas/core/generic.py
Outdated
return pd.DataFrame(np.nan, index=where, | ||
columns=self.columns) | ||
else: | ||
return pd.Series(np.nan, index=[where]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this should just be index=self.columns, name=where
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.. The test passes without name - no need to set it - index=self.columns is enough. Thanks @jorisvandenbossche
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is because you wrote your test without it, then of course it will pass it without it.
The name is essential for a correct result, and should be added both to the code here as to the test.
As I showed before with this simple example of the current behaviour
In [1]: df = pd.DataFrame(np.random.randn(2,2), index=[1,2], columns=['A', 'B'])
In [2]: df
Out[2]:
A B
1 0.387517 -0.571258
2 -0.376436 0.604668
In [3]: df.asof(3)
Out[3]:
A -0.376436
B 0.604668
Name: 3, dtype: float64
you can clearly see that the name is the actual value you passed to asof
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad, sorry... just fixed it, it should work now... thanks again @jorisvandenbossche
pandas/core/generic.py
Outdated
@@ -3972,6 +3972,16 @@ def asof(self, where, subset=None): | |||
where = Index(where) if is_list else Index([where]) | |||
|
|||
nulls = self.isnull() if is_series else self[subset].isnull().any(1) | |||
if nulls.all(): | |||
if is_series: | |||
return pd.Series(np.nan, index=where) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not correct; should have name=self.name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, @jreback , but when I experimented with a non-null series, I saw that it has no name. I.e.:
result = Series(np.random.randn(4), index=[1, 2, 3, 4]).asof([4, 5])
print result
returns
4 -0.558532
5 -0.558532
dtype: float64
......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and that not correct. we always want to propogate the names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let me write the test case and fix for nan and non-nan inputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback done here.. working on the request below, on simplifying the code
pandas/core/generic.py
Outdated
columns=self.columns) | ||
else: | ||
return pd.Series(np.nan, index=self.columns, name=where[0]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see if you can simplify this logic a bit (maybe set the name where is_list
is used before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @jreback , I made a small simplification, pls check if that's ok... if it's ok, now I think everything is good to go
|
||
# test name is propagated | ||
result = Series(np.nan, index=[1, 2, 3, 4], name='test').asof([4, 5]) | ||
self.assertEqual(result.name, 'test') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs a tm.assert_series_equal
with the expected result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I'll do on merge
thanks @ucals ping on green. |
All green. @jreback , @jorisvandenbossche , thanks a lot! You guys are great. This was my first ever contribution to an open source project, and you guys gave me the right direction - feels very good to contribute and learn! I'll try to tackle an intermediate bug now |
thanks! |
closes bug pandas-dev#15713 Added the test if the series is all nans Added the code that check if that's the case: if yes, return the expected output Author: Carlos Souza <[email protected]> Closes pandas-dev#15758 from ucals/bug-fix-15713 and squashes the following commits: 0765108 [Carlos Souza] First simplification, code-block in the same place bb63964 [Carlos Souza] Propagating Series name af9a29b [Carlos Souza] Setting name of asof result when scalar input and all nan b8f078a [Carlos Souza] Small code standard change 7448b96 [Carlos Souza] Fixing scalar input a080b9b [Carlos Souza] Making scalar input return in a Series 04b7306 [Carlos Souza] Removing .values and formating code PEP8 3f9c7fd [Carlos Souza] Minor comments 70c958f [Carlos Souza] Added tests for non-default indexes, scalar and multiple inputs, and results preserve columns 6b745af [Carlos Souza] Adding DataFrame tests & support, and optimizing the code 89fb6cf [Carlos Souza] BUG pandas-dev#15713 fixing failing tests 17d1d77 [Carlos Souza] BUG pandas-dev#15713 Series.asof return nan when series is all nans! 4e26ab8 [Carlos Souza] BUG pandas-dev#15713 Series.asof return nan when series is all nans. c78d687 [Carlos Souza] BUG pandas-dev#15713 Series.asof return nan when series is all nans 676a4e5 [Carlos Souza] Test
git diff upstream/master | flake8 --diff
Added the test if the series is all nans
Added the code that check if that's the case: if yes, return the expected output
As this is my first contribution, please comment if I did it right :)
Thanks!