Skip to content

BUG: first_valid_index() empty Series and DataFrame inconsistencies #9752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wholmgren opened this issue Mar 31, 2015 · 4 comments · Fixed by #30222
Closed

BUG: first_valid_index() empty Series and DataFrame inconsistencies #9752

wholmgren opened this issue Mar 31, 2015 · 4 comments · Fixed by #30222
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@wholmgren
Copy link
Contributor

I came across some inconsistencies in how first_valid_index() works on empty Series and DataFrames (and last_valid_index()).

>>> print(pd.DataFrame().first_valid_index())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/frame.py", line 3332, in first_valid_index
    return self.index[self.count(1) > 0][0]
  File "/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/index.py", line 915, in __getitem__
    return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0

>>> print(pd.Series().first_valid_index())
None

>>> print(pd.SparseDataFrame([]).first_valid_index())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/frame.py", line 3332, in first_valid_index
    return self.index[self.count(1) > 0][0]
  File "/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py", line 919, in f
    res = self._combine_const(other, func, raise_on_error=False)
TypeError: _combine_const() got an unexpected keyword argument 'raise_on_error'

>>> print(pd.SparseSeries([]).first_valid_index())
None

Is there a reason for this different behavior? Note also that the DataFrame/SparseDataFrame exceptions are different. Here's the DataFrame and Series methods. The Series method has a if len(self) == 0 check. I can make a PR if you think one of these should be changed and tell me which way you want to go.

For what it's worth, I came across this issue trying to use this function in a list comprehension to get a list of minimum times from a handful of Series, DataFrames, and SparseDataFrames that may or may not be empty. (Although it just occurred to me that I could, of course, put an if empty check in my list comprehension.)

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Apr 2, 2015
@jreback jreback added this to the 0.16.1 milestone Apr 2, 2015
@jreback
Copy link
Contributor

jreback commented Apr 2, 2015

these are prob not tested very well.

The SparseDataFrame exception looks like an actual code error somewhere.

returning None here on an empty Series is actually pretty odd in pandas code. But prob plays nice in indexing. I think these ALL should raise IndexError. Can you see if that actually breaks anything? willing to consider that as an API change.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design labels Apr 2, 2015
@wholmgren
Copy link
Contributor Author

There is a test for Series that requires it to return None if empty. test_frame doesn't have any explicit expectations about this behavior.

I changed the Series.first_valid_index behavior to if len(self) == 0: raise(IndexError) and the only test in pandas.tests that failed was test_series.test_first_last_valid, which is no surprise since I didn't find any uses of these functions in the test suite other than where they are explicitly tested.

SparseDataFrame._combine_const does indeed have a different signature than DataFrame._combine_const. Not sure how you want to resolve that part of it.

@jreback jreback modified the milestones: 0.16.1, 0.17.0 Apr 28, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.0 Aug 15, 2015
@chris-b1
Copy link
Contributor

Same issue with all missing rows. #14397

pd.Series([np.nan]).first_valid_index() # this returns None
pd.DataFrame([np.nan]).first_valid_index() # raise error

>>> pd.__version__
u'0.18.1'

@mroeschke
Copy link
Member

This looks to be fixed on master. Could use a test

In [13]: >>> print(pd.DataFrame().first_valid_index())
    ...:
None

In [14]: >>> print(pd.Series().first_valid_index())
    ...:
None

In [15]: pd.Series([np.nan]).first_valid_index() # this returns None
    ...:

In [16]: pd.DataFrame([np.nan]).first_valid_index() # raise error
    ...:

In [17]: pd.__version__
Out[17]: '0.26.0.dev0+627.gef77b5700'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed API Design Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 22, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.0 Dec 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants