Skip to content

BUG: Index of an empty Series cannot be compared to a Timestamp anymore #36983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 task
gabicca opened this issue Oct 8, 2020 · 8 comments
Closed
1 task

Comments

@gabicca
Copy link
Contributor

gabicca commented Oct 8, 2020

  • [Y] I have checked that this issue has not already been reported.

  • [Y] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

def test_series_timestamp_comparison():
    my_series = pd.Series([],  dtype=np.float64)
    date = pd.Timestamp('2015-07-15 00:00:00')

    res = my_series.index < date
    assert res.dtype == bool
    assert len(res) == 0

Problem description

In pandas 1.1.2 the above test passes, while in 1.1.3 it throws an error on line 5:

Error in 1.1.3:
TypeError: '<' not supported between instances of 'numpy.ndarray' and 'Timestamp'

As far as I can tell, this started to occur because of a change in pandas.core.indexes.base._make_comparison_op,
which change was added by this PR: #36440

My question is if the new behaviour is legit for the above comparison, is it expected?

In 1.1.2 this comparison didn't fail, because it would end up in

        else:
            with np.errstate(all="ignore"):
                result = op(self._values, np.asarray(other))

which was changed to:

        elif is_interval_dtype(self.dtype):
            with np.errstate(all="ignore"):
                result = op(self._values, np.asarray(other))

and the else part started to use ops.comparison_op which now fails in the above example (since is_interval_dtype(self.dtype) is false for the above case).

A bit more background: the above comparison is used when subselecting data in the series without knowing whether the input series has data in it or not. So e.g.:

my_series[my_series.index < date] = 0.4

Apologies if this question has been raised, I couldn't find anything similar in Open Issues.

Output of pd.show_versions()

python : 3.7.4.final.0
pandas : 1.1.3
numpy : 1.19.2

@gabicca gabicca added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 8, 2020
@dsaxton
Copy link
Member

dsaxton commented Oct 8, 2020

I'm not sure what the expected output would be here. We're comparing a RangeIndex to a single timestamp, which as far as I can tell should be raising (the 1.1.2 behavior seems like a bug). If you use a DatetimeIndex instead this works fine.

@dsaxton dsaxton removed the Needs Triage Issue that has not been reviewed by a pandas team member label Oct 8, 2020
@jbrockmendel
Copy link
Member

duplicate of #35548?

@gabicca
Copy link
Contributor Author

gabicca commented Oct 9, 2020

I'm not sure what the expected output would be here. We're comparing a RangeIndex to a single timestamp, which as far as I can tell should be raising (the 1.1.2 behavior seems like a bug). If you use a DatetimeIndex instead this works fine.

Indeed DatetimeIndex works. And as you say, it probably makes more sense than letting the comparison work between non-datetime indices and timestamps. Python, however allows it once timestamp is converted into an array: op(self._values, np.asarray(other)) this doesn't break.

So I was wondering if accidentally a new bug was introduced with the change. This was a patch release and I certainly did not expect code that was written 3 years ago to start to break at this point.

@gabicca
Copy link
Contributor Author

gabicca commented Oct 9, 2020

duplicate of #35548?

Yes, I looked at that issue. However, I know for sure that our code started to break with the newest release, and that it is breaking because of the change in the PR I linked in my submission.

In my case, I found that the mocks used in the tests we had (the ones now breaking) did not resemble the original thing well enough, which indeed uses DatetimeIndex for empty series. So after fixing that, the code works with the new pandas.

@gabicca
Copy link
Contributor Author

gabicca commented Oct 9, 2020

Please, feel free to close this issue, if you think the current behaviour is intentional and is as expected.

Thanks for the replies!

@jbrockmendel
Copy link
Member

The current behavior is correct

@dsaxton dsaxton removed the Bug label Oct 10, 2020
@dsaxton
Copy link
Member

dsaxton commented Oct 10, 2020

Closing as not a bug

@dsaxton dsaxton closed this as completed Oct 10, 2020
@jreback
Copy link
Contributor

jreback commented Oct 10, 2020

i think we could have a better error message here (and tests to lock it down)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants