BUG: Series.asof fails for all NaN Series (GH15713) #15758

ucals · 2017-03-21T03:05:24Z

closes bug BUG: Series.asof fails when series is all nans #15713
1 tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Added a test for the case where the Series is all NaNs
Added code that checks for that case and, if so, returns the expected output
As this is my first contribution, please comment if I did it right :)
Thanks!

codecov · 2017-03-21T04:07:05Z

Codecov Report

Merging #15758 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15758      +/-   ##
==========================================
- Coverage   91.01%   91.01%   -0.01%     
==========================================
  Files         143      143              
  Lines       49395    49379      -16     
==========================================
- Hits        44959    44940      -19     
- Misses       4436     4439       +3

Impacted Files                Coverage Δ
pandas/core/generic.py        96.25% <100%> (+0.01%)
pandas/io/gbq.py              25% <0%> (-58.34%)
pandas/tseries/common.py      88.09% <0%> (-1.07%)
pandas/core/frame.py          97.86% <0%> (-0.1%)
pandas/core/algorithms.py     94.41% <0%> (-0.1%)
pandas/io/stata.py            93.47% <0%> (-0.05%)
pandas/core/strings.py        98.48% <0%> (-0.02%)
pandas/indexes/numeric.py     97.1% <0%> (-0.02%)
pandas/indexes/multi.py       96.59% <0%> (ø)
pandas/core/categorical.py    96.89% <0%> (ø)
... and 11 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92239f5...0765108. Read the comment docs.

jorisvandenbossche

Can you also add tests for DataFrame? (both using a subset and not)?

jorisvandenbossche · 2017-03-21T08:57:10Z

pandas/tests/series/test_asof.py

@@ -4,6 +4,8 @@
 from pandas import (offsets, Series, notnull,
                    isnull, date_range, Timestamp)

+from pandas.util.testing import assert_series_equal


No need to import this, you can use it as tm.assert_series_equal(..) as is done in the other tests

jorisvandenbossche · 2017-03-21T09:03:23Z

pandas/tests/series/test_asof.py

@@ -148,3 +150,8 @@ def test_errors(self):
        s = Series(np.random.randn(N), index=rng)
        with self.assertRaises(ValueError):
            s.asof(s.index[0], subset='foo')
+
+        # series is all nans
+        result = Series([np.nan]).asof([0])


Can you make this a separate test? (as it is not related to errors). Eg test_all_nans

jorisvandenbossche · 2017-03-21T09:09:49Z

pandas/core/generic.py

@@ -3971,6 +3971,9 @@ def asof(self, where, subset=None):
        if not isinstance(where, Index):
            where = Index(where) if is_list else Index([where])

+        if self.isnull().values.all():
+            return pd.Series([np.nan])


This will not work for a DataFrame (I mean: it will not be the correct return value, see the docstring)

Further, you can put this after the next line where nulls is defined and reuse that

done! thanks for the comments, feels great to learn how to contribute :)

jreback · 2017-03-22T00:18:53Z

somewhat separate. but do we have a guarantee on a not-found indexer?

@chris-b1

In [28]: Series([np.nan, 1, 2], index=[2, 3, 4]).asof([0])
Out[28]: 
0   NaN
dtype: float64
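The question about a not-found indexer can be explored with a small script. This is a sketch of what a recent pandas release returns, not a documented guarantee; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, 2.0], index=[2, 3, 4])

# No index label at or before 0, so nothing can be carried forward.
before_first = s.asof([0])

# Label 2 exists, but the only value at or before it is NaN.
at_nan_label = s.asof([2])

# Label 5 is past the end, so the last valid value is used.
after_last = s.asof([5])

print(before_first, at_nan_label, after_last, sep="\n\n")
```

In each case the result is indexed by the labels passed to asof, not by the labels of the original Series.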

jreback

can you add a whatsnew note (for 0.20 / bug fix section)

jreback · 2017-03-22T00:19:26Z

pandas/tests/frame/test_asof.py

+
+    def test_all_nans(self):
+        # series is all nans
+        result = DataFrame([np.nan]).asof([0])


try these with non-default indexes and see what happens (your test will break)

Indeed, and also, when you have a DataFrame with multiple columns, those columns should be preserved in the result
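A test along the lines requested here might look like the following sketch. The index labels and column names are made up for illustration, and it assumes a pandas version that already contains this fix.

```python
import numpy as np
import pandas as pd

# All-NaN frame with a non-default index and several columns (hypothetical data).
df = pd.DataFrame(np.nan, index=[10, 20, 30], columns=["A", "B", "C"])

result = df.asof([20, 35])

# The lookup labels become the index; the original columns are preserved.
expected = pd.DataFrame(np.nan, index=[20, 35], columns=["A", "B", "C"])
pd.testing.assert_frame_equal(result, expected)
```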

jreback · 2017-03-22T00:19:40Z

pandas/core/generic.py

@@ -3972,6 +3972,12 @@ def asof(self, where, subset=None):
            where = Index(where) if is_list else Index([where])

        nulls = self.isnull() if is_series else self[subset].isnull().any(1)
+        if nulls.values.all():
+            if is_series:
+                return pd.Series([np.nan])


need to set the indexes here

jorisvandenbossche · 2017-03-22T08:19:51Z

pandas/core/generic.py

@@ -3972,6 +3972,12 @@ def asof(self, where, subset=None):
            where = Index(where) if is_list else Index([where])

        nulls = self.isnull() if is_series else self[subset].isnull().any(1)
+        if nulls.values.all():


I don't think the .values is still needed.

The values is still here?

Hi @jorisvandenbossche ... I removed it, then put it back because I thought it caused a backward-compatibility error: the build currently breaks for Python 2.7.9. I have now seen in the Travis CI log that it has nothing to do with that; it's "ci/lint.sh" exiting with status 1.
I will remove it again and find where the code is badly formatted. Thanks

Removed the .values

jorisvandenbossche · 2017-03-22T08:24:37Z

pandas/tests/series/test_asof.py

+
+    def test_all_nans(self):
+        # series is all nans
+        result = Series([np.nan]).asof([0])


Can you also add a case not using zero as the argument?
And can you also add the case of a scalar, and of multiple values? (eg s.asof(10) and s.asof([10, 11])
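The scalar and multiple-value cases asked for above could be covered roughly as follows. This is a sketch assuming a pandas version with this fix; the index values are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[1, 2, 3])

# Scalar lookup returns a scalar NaN.
scalar_result = s.asof(10)
assert pd.isna(scalar_result)

# List lookup returns a Series indexed by the lookup labels.
list_result = s.asof([10, 11])
expected = pd.Series(np.nan, index=[10, 11])
pd.testing.assert_series_equal(list_result, expected)
```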

jorisvandenbossche · 2017-03-22T08:25:49Z

pandas/tests/frame/test_asof.py

+
+    def test_all_nans(self):
+        # series is all nans
+        result = DataFrame([np.nan]).asof([0])


Indeed, and also, when you have a DataFrame with multiple columns, those columns should be preserved in the result

jreback

just some minor comments. ping when all green.

jreback · 2017-03-23T21:42:04Z

doc/source/whatsnew/v0.20.0.txt

@@ -930,3 +930,5 @@ Bug Fixes
 - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
 - Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
 - Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
+


FYI in the future, if you put the whatsnew notes in a blank space in Bug Fixes (these are there on purpose), you won't get merge conflicts

jreback · 2017-03-23T21:43:14Z

pandas/tests/series/test_asof.py

+        # testing scalar input
+        date = date_range('1/1/1990', periods=N * 3, freq='25s')[0]
+        result = Series(np.nan, index=rng).asof(date)
+        self.assertTrue(result != result)


assert isnull(result)
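A short illustration of why an explicit null check reads better than the self-inequality trick. It uses pd.isna, the current name for the same check as isnull; the sample values are illustrative.

```python
import pandas as pd

nan_value = float("nan")

# NaN is the only float that is not equal to itself, so this passes,
# but the intent is not obvious to a reader.
assert nan_value != nan_value

# The explicit check states the intent directly.
assert pd.isna(nan_value)

# It also covers missing values that the inequality trick misses.
none_is_unequal_to_itself = (None != None)  # noqa: E711
assert not none_is_unequal_to_itself
assert pd.isna(None)
```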

jreback · 2017-03-23T21:43:32Z

pandas/tests/series/test_asof.py

@@ -148,3 +148,23 @@ def test_errors(self):
        s = Series(np.random.randn(N), index=rng)
        with self.assertRaises(ValueError):
            s.asof(s.index[0], subset='foo')
+
+    def test_all_nans(self):
+        # series is all nans


can you add the issue number as a comment

jreback · 2017-03-23T21:43:38Z

pandas/tests/frame/test_asof.py

+        tm.assert_frame_equal(result, expected)
+
+    def test_all_nans(self):
+        # series is all nans


can you add the issue number as a comment

jreback · 2017-03-23T21:43:51Z

pandas/tests/frame/test_asof.py

+        tm.assert_frame_equal(result, expected)
+
+    def test_all_nans(self):
+        # series is all nans


this comment needs updating

done... thanks @jreback !!

jreback · 2017-03-23T22:37:56Z

lgtm.

@chris-b1 if you can give a quick look (don't merge yet, going to merge a couple of things at once later)

jorisvandenbossche

Small change needed for the DataFrame case I think

jorisvandenbossche · 2017-03-23T23:30:00Z

pandas/tests/frame/test_asof.py

+        date = date_range('1/1/1990', periods=self.N * 3, freq='25s')[0]
+        result = DataFrame(np.nan, index=self.rng, columns=['A']).asof(date)
+        expected = DataFrame(np.nan, index=[date], columns=['A'])
+        tm.assert_frame_equal(result, expected)


I think a scalar input should result in a Series. That is at least the current behaviour for the working non-NaN case:

In [37]: df = pd.DataFrame(np.random.randn(2,2), index=[1,2], columns=['A', 'B'])

In [38]: df
Out[38]: 
          A         B
1 -0.643872  1.375342
2 -0.223192  0.231439

In [39]: df.asof([3])
Out[39]: 
          A         B
3 -0.223192  0.231439

In [40]: df.asof(3)
Out[40]: 
A   -0.223192
B    0.231439
Name: 3, dtype: float64

What you added is not the correct result, I think. It is the original columns that form the index of the resulting Series, not [where]. The where value becomes the name of the Series (i.e. as if you access a row from the dataframe).

Can you add the example above (but with NaNs instead of the random data) as a test case? Then it is really clear what the expected behaviour is.

Done, added the tests
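The expected behaviour described in this thread, written out with NaNs instead of random data. This is a sketch assuming a pandas version that includes this fix.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[1, 2], columns=["A", "B"])

# Scalar lookup: a Series whose index is the columns and whose name is the label.
row = df.asof(3)
expected_row = pd.Series(np.nan, index=["A", "B"], name=3)
pd.testing.assert_series_equal(row, expected_row)

# List lookup: a DataFrame indexed by the lookup labels.
frame = df.asof([3])
expected_frame = pd.DataFrame(np.nan, index=[3], columns=["A", "B"])
pd.testing.assert_frame_equal(frame, expected_frame)
```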

jorisvandenbossche

This should be the last change I think to get it merged!

jorisvandenbossche · 2017-03-25T10:43:00Z

pandas/core/generic.py

+                    return pd.DataFrame(np.nan, index=where,
+                                        columns=self.columns)
+                else:
+                    return pd.Series(np.nan, index=[where])


So this should just be index=self.columns, name=where I think.

Done.. The test passes without name - no need to set it - index=self.columns is enough. Thanks @jorisvandenbossche

That is because you wrote your test without it, so of course it passes without it.
The name is essential for a correct result, and should be added both to the code here and to the test.

As I showed before with this simple example of the current behaviour

In [1]: df = pd.DataFrame(np.random.randn(2,2), index=[1,2], columns=['A', 'B'])

In [2]: df
Out[2]: 
          A         B
1  0.387517 -0.571258
2 -0.376436  0.604668

In [3]: df.asof(3)
Out[3]: 
A   -0.376436
B    0.604668
Name: 3, dtype: float64

you can clearly see that the name is the actual value you passed to asof

my bad, sorry... just fixed it, it should work now... thanks again @jorisvandenbossche

jreback · 2017-03-25T18:36:24Z

pandas/core/generic.py

@@ -3972,6 +3972,16 @@ def asof(self, where, subset=None):
            where = Index(where) if is_list else Index([where])

        nulls = self.isnull() if is_series else self[subset].isnull().any(1)
+        if nulls.all():
+            if is_series:
+                return pd.Series(np.nan, index=where)


this is not correct; should have name=self.name

I thought about that, @jreback , but when I experimented with a non-null series, I saw that it has no name. I.e.:

result = Series(np.random.randn(4), index=[1, 2, 3, 4]).asof([4, 5])
print(result)

returns

4   -0.558532
5   -0.558532
dtype: float64
......

and that's not correct. We always want to propagate the names.

ok, let me write the test case and fix for nan and non-nan inputs

@jreback done here.. working on the request below, on simplifying the code
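Name propagation for the non-NaN case can be checked with a sketch like this one. The name "test" and the values are arbitrary, and the behaviour shown is that of a recent pandas release.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=[1, 2, 3, 4], name="test")

# Both lookups resolve to the last valid value, and the name is kept.
result = s.asof([4, 5])

print(result)
print(result.name)
```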

jreback · 2017-03-25T18:37:08Z

pandas/core/generic.py

+                                        columns=self.columns)
+                else:
+                    return pd.Series(np.nan, index=self.columns, name=where[0])
+


see if you can simplify this logic a bit (maybe set the name where is_list is used before)

hey @jreback , I made a small simplification, pls check if that's ok... if it's ok, now I think everything is good to go
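The result shapes discussed across this review can be summarised in one standalone function. This is an illustrative sketch only: all_nan_asof_result is a hypothetical helper, not the code merged into pandas/core/generic.py.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def all_nan_asof_result(obj, where):
    """Hypothetical helper: the asof result when every row of obj is NaN."""
    is_list = is_list_like(where)
    if isinstance(obj, pd.Series):
        if not is_list:
            # Scalar lookup on a Series yields a scalar.
            return np.nan
        # Keep the name of the original Series.
        return pd.Series(np.nan, index=pd.Index(where), name=obj.name)
    if is_list:
        # Keep the original columns; index by the lookup labels.
        return pd.DataFrame(np.nan, index=pd.Index(where), columns=obj.columns)
    # Scalar lookup on a DataFrame behaves like selecting one row.
    return pd.Series(np.nan, index=obj.columns, name=where)


df = pd.DataFrame(np.nan, index=[1, 2], columns=["A", "B"])
row = all_nan_asof_result(df, 3)
frame = all_nan_asof_result(df, [3, 4])
named = all_nan_asof_result(pd.Series(np.nan, index=[1, 2], name="x"), [3])
```

Deciding the scalar-versus-list question once, at the top, keeps each branch to a single constructor call.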

jreback · 2017-03-25T23:04:20Z

pandas/tests/series/test_asof.py

+
+        # test name is propagated
+        result = Series(np.nan, index=[1, 2, 3, 4], name='test').asof([4, 5])
+        self.assertEqual(result.name, 'test')


this needs a tm.assert_series_equal with the expected result

but I'll do on merge
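A version of the name-propagation test that compares against a full expected result might look like this sketch. It uses pd.testing.assert_series_equal, the public counterpart of the tm helper used in the pandas test suite, and assumes a pandas version that includes this fix.

```python
import numpy as np
import pandas as pd

result = pd.Series(np.nan, index=[1, 2, 3, 4], name="test").asof([4, 5])

# Checks values, index, dtype and name in one assertion.
expected = pd.Series(np.nan, index=[4, 5], name="test")
pd.testing.assert_series_equal(result, expected)
```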

jreback · 2017-03-25T23:05:13Z

thanks @ucals ping on green.

ucals · 2017-03-25T23:41:47Z

All green. @jreback , @jorisvandenbossche , thanks a lot! You guys are great. This was my first ever contribution to an open source project, and you guys gave me the right direction - feels very good to contribute and learn! I'll try to tackle an intermediate bug now

jreback · 2017-03-26T02:31:22Z

thanks!

closes bug pandas-dev#15713
Added the test if the series is all nans
Added the code that check if that's the case: if yes, return the expected output

Author: Carlos Souza <[email protected]>

Closes pandas-dev#15758 from ucals/bug-fix-15713 and squashes the following commits:

0765108 [Carlos Souza] First simplification, code-block in the same place
bb63964 [Carlos Souza] Propagating Series name
af9a29b [Carlos Souza] Setting name of asof result when scalar input and all nan
b8f078a [Carlos Souza] Small code standard change
7448b96 [Carlos Souza] Fixing scalar input
a080b9b [Carlos Souza] Making scalar input return in a Series
04b7306 [Carlos Souza] Removing .values and formating code PEP8
3f9c7fd [Carlos Souza] Minor comments
70c958f [Carlos Souza] Added tests for non-default indexes, scalar and multiple inputs, and results preserve columns
6b745af [Carlos Souza] Adding DataFrame tests & support, and optimizing the code
89fb6cf [Carlos Souza] BUG pandas-dev#15713 fixing failing tests
17d1d77 [Carlos Souza] BUG pandas-dev#15713 Series.asof return nan when series is all nans!
4e26ab8 [Carlos Souza] BUG pandas-dev#15713 Series.asof return nan when series is all nans.
c78d687 [Carlos Souza] BUG pandas-dev#15713 Series.asof return nan when series is all nans
676a4e5 [Carlos Souza] Test

Carlos Souza added 5 commits March 20, 2017 19:32

Test

676a4e5

BUG pandas-dev#15713 Series.asof return nan when series is all nans

c78d687

BUG pandas-dev#15713 Series.asof return nan when series is all nans.

4e26ab8

BUG pandas-dev#15713 Series.asof return nan when series is all nans!

17d1d77

BUG pandas-dev#15713 fixing failing tests

89fb6cf

jorisvandenbossche changed the title ~~Bug fix 15713~~ BUG: Series.asof fails for all NaN Series (GH15713) Mar 21, 2017

jorisvandenbossche reviewed Mar 21, 2017

jreback added the Bug, Missing-data, and Datetime labels Mar 21, 2017

Adding DataFrame tests & support, and optimizing the code

6b745af

jreback requested changes Mar 22, 2017

jorisvandenbossche reviewed Mar 22, 2017

Added tests for non-default indexes, scalar and multiple inputs, and results preserve columns

70c958f

jreback approved these changes Mar 23, 2017

jreback added this to the 0.20.0 milestone Mar 23, 2017

Minor comments

3f9c7fd

jorisvandenbossche reviewed Mar 23, 2017

Carlos Souza added 2 commits March 23, 2017 22:21

Removing .values and formating code PEP8

04b7306

Making scalar input return in a Series

a080b9b

jorisvandenbossche requested changes Mar 25, 2017

Carlos Souza added 3 commits March 25, 2017 11:24

Fixing scalar input

7448b96

Small code standard change

b8f078a

Setting name of asof result when scalar input and all nan

af9a29b

jreback reviewed Mar 25, 2017

Propagating Series name

bb63964

First simplification, code-block in the same place

0765108

jorisvandenbossche approved these changes Mar 25, 2017

jreback reviewed Mar 25, 2017

jreback closed this in d2f32a0 Mar 26, 2017

ucals deleted the bug-fix-15713 branch March 26, 2017 04:01

jreback mentioned this pull request Mar 29, 2017

BUG: Series.asof fails when series is all nans #15713

Closed
