BUG: Series.asof fails for all NaN Series (GH15713) #15758
@@ -3972,6 +3972,12 @@ def asof(self, where, subset=None):
         where = Index(where) if is_list else Index([where])
 
         nulls = self.isnull() if is_series else self[subset].isnull().any(1)
+        if nulls.values.all():

Comment: I don't think the [...]
Comment: done
Comment: The values is still here?
Comment: Hi @jorisvandenbossche ... I removed it, then put it back because I thought removing it caused a backward-compatibility error. Currently the build breaks for Python 2.7.9. Now I see in the Travis CI log that it has nothing to do with that: it's "ci/lint.sh" exiting 1.
Comment: Removed the .values

+            if is_series:
+                return pd.Series(np.nan, index=where)

Comment: this is not correct; should have [...]
Comment: I thought about that, @jreback, but when I experimented with a non-null series, I saw that it has no name. I.e.: [...] returns [...]
Comment: and that's not correct. We always want to propagate the names.
Comment: ok, let me write the test case and fix for NaN and non-NaN inputs
Comment: @jreback done here. Working on the request below, on simplifying the code.

+            else:
+                return pd.DataFrame(np.nan, index=where, columns=self.columns)
+

Comment: see if you can simplify this logic a bit (maybe set the name where [...]
Comment: hey @jreback, I made a small simplification, please check if that's ok. If it is, I think everything is good to go.

         locs = self.index.asof_locs(where, ~(nulls.values))
 
         # mask the missing
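The review threads above converge on three return shapes for all-NaN input: a Series that keeps its name for a list-like `where` on a Series, a DataFrame that keeps the original columns for a list-like `where` on a DataFrame, and a Series indexed by the original columns for a scalar `where` on a DataFrame. A minimal sketch of that shape logic follows. `all_nan_asof_result` is a hypothetical helper written for illustration, not a pandas function and not the code merged in this PR; returning a bare NaN for a scalar `where` on a Series is an assumption based on the `assert isnull(result)` check in the Series test.

```python
import numpy as np
import pandas as pd


def all_nan_asof_result(obj, where):
    """Hypothetical helper: the result shape when every row of `obj` is null."""
    is_list = pd.api.types.is_list_like(where)
    index = pd.Index(where) if is_list else pd.Index([where])

    if isinstance(obj, pd.Series):
        if is_list:
            # Propagate the Series name, as requested in review.
            return pd.Series(np.nan, index=index, name=obj.name)
        # Assumption: a scalar lookup on a Series yields a scalar NaN.
        return np.nan

    if is_list:
        # Keep every original column in the result.
        return pd.DataFrame(np.nan, index=index, columns=obj.columns)
    # Scalar lookup on a DataFrame: one value per original column.
    return pd.Series(np.nan, index=obj.columns, name=where)


named = pd.Series([np.nan, np.nan], index=[10, 20], name="price")
frame = pd.DataFrame(np.nan, index=[10, 20], columns=["A", "B"])

series_result = all_nan_asof_result(named, [15, 25])
frame_result = all_nan_asof_result(frame, [15, 25])
row_result = all_nan_asof_result(frame, 15)
```

Keeping the shape decision in one place makes it easy to compare the all-NaN branch against what the regular, non-NaN path returns for the same `where`.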
@@ -4,7 +4,6 @@
 from pandas import (DataFrame, date_range, Timestamp, Series,
                     to_datetime)
 
-from pandas.util.testing import assert_frame_equal, assert_series_equal
 import pandas.util.testing as tm
 
 from .common import TestData

@@ -14,9 +13,9 @@ class TestFrameAsof(TestData, tm.TestCase):
 
     def setUp(self):
         self.N = N = 50
-        rng = date_range('1/1/1990', periods=N, freq='53s')
+        self.rng = date_range('1/1/1990', periods=N, freq='53s')
         self.df = DataFrame({'A': np.arange(N), 'B': np.arange(N)},
-                            index=rng)
+                            index=self.rng)
 
     def test_basic(self):
 

@@ -51,19 +50,19 @@ def test_subset(self):
         # with a subset of A should be the same
         result = df.asof(dates, subset='A')
         expected = df.asof(dates)
-        assert_frame_equal(result, expected)
+        tm.assert_frame_equal(result, expected)
 
         # same with A/B
         result = df.asof(dates, subset=['A', 'B'])
         expected = df.asof(dates)
-        assert_frame_equal(result, expected)
+        tm.assert_frame_equal(result, expected)
 
         # B gives self.df.asof
         result = df.asof(dates, subset='B')
         expected = df.resample('25s', closed='right').ffill().reindex(dates)
         expected.iloc[20:] = 9
 
-        assert_frame_equal(result, expected)
+        tm.assert_frame_equal(result, expected)
 
     def test_missing(self):
         # GH 15118

@@ -75,9 +74,34 @@ def test_missing(self):
         result = df.asof('1989-12-31')
 
         expected = Series(index=['A', 'B'], name=Timestamp('1989-12-31'))
-        assert_series_equal(result, expected)
+        tm.assert_series_equal(result, expected)
 
         result = df.asof(to_datetime(['1989-12-31']))
         expected = DataFrame(index=to_datetime(['1989-12-31']),
                              columns=['A', 'B'], dtype='float64')
-        assert_frame_equal(result, expected)
+        tm.assert_frame_equal(result, expected)
+
+    def test_all_nans(self):
+        # GH 15713
+        # DataFrame is all nans
+        result = DataFrame([np.nan]).asof([0])

Comment: try these with non-default indexes and see what happens (your test will break)
Comment: Indeed, and also, when you have a DataFrame with multiple columns, those columns should be preserved in the result
Comment: done

+        expected = DataFrame([np.nan])
+        tm.assert_frame_equal(result, expected)
+
+        # testing non-default indexes, multiple inputs
+        dates = date_range('1/1/1990', periods=self.N * 3, freq='25s')
+        result = DataFrame(np.nan, index=self.rng, columns=['A']).asof(dates)
+        expected = DataFrame(np.nan, index=dates, columns=['A'])
+        tm.assert_frame_equal(result, expected)
+
+        # testing multiple columns
+        dates = date_range('1/1/1990', periods=self.N * 3, freq='25s')
+        result = DataFrame(np.nan, index=self.rng, columns=['A', 'B', 'C']).asof(dates)
+        expected = DataFrame(np.nan, index=dates, columns=['A', 'B', 'C'])
+        tm.assert_frame_equal(result, expected)
+
+        # testing scalar input
+        date = date_range('1/1/1990', periods=self.N * 3, freq='25s')[0]
+        result = DataFrame(np.nan, index=self.rng, columns=['A']).asof(date)
+        expected = DataFrame(np.nan, index=[date], columns=['A'])
+        tm.assert_frame_equal(result, expected)

Comment: I think a scalar input should result in a Series. That is at least the current behaviour for the working non-NaN case: [...]
Comment: done
Comment: What you added is not the correct result, I think. It is the original columns that are the index of the resulting Series, not [...] Can you add the example above (but with NaNs instead of the random data) as a test case? Then it is really clear what the expected behaviour is.
Comment: Done, added the tests
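The code example in the comment above did not survive on this page. The sketch below illustrates the behaviour the reviewer describes, using made-up data rather than the reviewer's original example, and assuming a pandas release that already includes this fix: a scalar `where` on a DataFrame gives a Series whose index is the original columns, for both the non-NaN and the all-NaN case.

```python
import numpy as np
import pandas as pd

# Made-up frame with a sorted, non-default integer index.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]},
                  index=[10, 20, 30])

# Scalar `where`: the last row at or before 25 is the one labelled 20.
row = df.asof(25)
print(type(row).__name__, list(row.index))

# The same call on an all-NaN frame is expected to keep that shape.
nan_df = pd.DataFrame(np.nan, index=[10, 20, 30], columns=["A", "B"])
nan_row = nan_df.asof(25)
print(type(nan_row).__name__, list(nan_row.index))
```

Comparing the two results side by side is a quick way to check that the all-NaN branch does not drift from the regular path.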
@@ -148,3 +148,24 @@ def test_errors(self):
         s = Series(np.random.randn(N), index=rng)
         with self.assertRaises(ValueError):
             s.asof(s.index[0], subset='foo')
+
+    def test_all_nans(self):
+        # GH 15713
+        # series is all nans

Comment: can you add the issue number as a comment
Comment: done

+        result = Series([np.nan]).asof([0])

Comment: Can you make this a separate test? (as it is not related to errors). Eg [...]
Comment: done
Comment: Can you also add a case not using zero as the argument?
Comment: done

+        expected = Series([np.nan])
+        tm.assert_series_equal(result, expected)
+
+        # testing non-default indexes
+        N = 50
+        rng = date_range('1/1/1990', periods=N, freq='53s')
+
+        dates = date_range('1/1/1990', periods=N * 3, freq='25s')
+        result = Series(np.nan, index=rng).asof(dates)
+        expected = Series(np.nan, index=dates)
+        tm.assert_series_equal(result, expected)
+
+        # testing scalar input
+        date = date_range('1/1/1990', periods=N * 3, freq='25s')[0]
+        result = Series(np.nan, index=rng).asof(date)
+        assert isnull(result)
Comment: FYI in the future, if you put the whatsnew notes in a blank space in Bug Fixes (these are on purpose), you won't get merge conflicts
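As a standalone check of the Series behaviour that `test_all_nans` exercises, the following sketch can be run against a pandas release that includes this fix. The integer index, the values and the name `"price"` are made up for illustration; the name check reflects the review request to always propagate names.

```python
import numpy as np
import pandas as pd

# All-NaN Series with a non-default index and a name.
s = pd.Series(np.nan, index=[10, 20, 30], name="price")

# List-like `where`: an all-NaN Series indexed by the requested labels.
result = s.asof([5, 15, 35])

# Scalar `where`: a single NaN value rather than a Series.
scalar = s.asof(15)

# Non-NaN counterpart, to confirm both paths agree on index and name.
filled = pd.Series([1.0, 2.0, 3.0], index=[10, 20, 30], name="price")
filled_result = filled.asof([15, 35])

print(result.name, list(result.index))
print(filled_result.name, list(filled_result.values))
```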