
REGR: Comparing timestamp to empty Series raises TypeError starting in 0.25.0 #35548


Open
3 tasks done
pganssle opened this issue Aug 4, 2020 · 1 comment
Labels
Bug; Closing Candidate (May be closeable, needs more eyeballs); Datetime (Datetime data dtype); Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@pganssle
Contributor

pganssle commented Aug 4, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Minimal reproducing code sample

import pandas as pd

pd.Series() < pd.Timestamp.now()  # Raises TypeError

(Note: This raises a warning because the default dtype for an empty Series is changing to object, which will actually fix the issue, but no warning is raised when you slice a column from an empty DataFrame.)
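Since the outcome depends on the dtype of the empty Series, a small sketch can make the dependence visible. The explicit dtypes below are my own choice for illustration (they keep the result independent of whatever the default dtype for empty data is in a given pandas version); the script only records what happens rather than assuming it.

```python
import pandas as pd

now = pd.Timestamp.now()
outcomes = {}

# Explicit dtypes keep the result independent of the default dtype for empty data.
for dtype in ["float64", "object", "datetime64[ns]"]:
    empty = pd.Series([], dtype=dtype)
    try:
        result = empty < now
        outcomes[dtype] = f"ok: empty Series of dtype {result.dtype}"
    except TypeError as exc:
        # Record the error instead of assuming which dtypes fail.
        outcomes[dtype] = f"TypeError: {exc}"

for dtype, outcome in outcomes.items():
    print(f"{dtype:>15}: {outcome}")
```

On the versions described in this report, the float64 case is the one expected to end in TypeError, while the object and datetime64[ns] cases return an empty result.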

Problem description

Prior to 0.25.0, it was possible to compare a pd.Timestamp to an empty Series (even if that Series did not have a dtype of datetime64[ns]), but starting in the 0.25.0 release, this operation raises TypeError because of mismatched dtypes. Running git bisect, I gather that this change was introduced in GH-26916, but neither the PR nor the issue seems to suggest that this change was an explicit goal.

It might be arguable what the correct behavior should be, since it's probably valid that you shouldn't be able to compare a Timestamp to a float64, but it's also obvious what the return value should be: an empty Series of dtype bool.

The reason it would be valuable to allow this behavior is that a lot of people rely on pandas to infer their data types from the input, but no such inference can take place if the data set is empty. In the pre-0.25.0 world, it was possible to design an API like this:

from datetime import datetime
from typing import List

def get_names_in_past(names: List[str], dates: List[datetime]) -> pd.DataFrame:
    df = pd.DataFrame({'names': names, 'dates': dates})
    return df[df['dates'] < pd.Timestamp.now()]

But starting in 0.25.0, when passed an empty list, you'd get a TypeError instead of an empty DataFrame, and you need to special-case the situation where df is empty. You can imagine similar APIs, where the data are read from a CSV or some other source and the dtype is inferred from the contents, that would also start failing on empty data.
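For comparison, here is a minimal sketch of one way to avoid the failure without special-casing empty input, assuming the caller knows up front that the column holds datetimes. The function name get_names_in_past_typed is hypothetical; it simply pins the dtype instead of relying on inference.

```python
from datetime import datetime
from typing import List

import pandas as pd


def get_names_in_past_typed(names: List[str], dates: List[datetime]) -> pd.DataFrame:
    # Pinning the dtype means an empty input still yields a datetime64[ns] column.
    date_column = pd.Series(dates, dtype="datetime64[ns]")
    df = pd.DataFrame({"names": names, "dates": date_column})
    return df[df["dates"] < pd.Timestamp.now()]


empty_result = get_names_in_past_typed([], [])
some_result = get_names_in_past_typed(
    ["old", "future"], [datetime(2000, 1, 1), datetime(2200, 1, 1)]
)
print(len(empty_result), list(some_result["names"]))
```

This only helps when the dtype is known in advance; it does not cover the CSV-style cases where the dtype is meant to come from the data.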

I think it would be reasonable to short-circuit comparisons with empty Series before the dtypes are compared, to always return an empty boolean Series.
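A rough sketch of what that short-circuit could look like from user code is below. The helper compare_allowing_empty is hypothetical and lives outside pandas; it is not how pandas implements comparisons, only an illustration of the proposed behavior.

```python
import operator

import pandas as pd


def compare_allowing_empty(series: pd.Series, other, op=operator.lt) -> pd.Series:
    """Hypothetical helper: skip dtype checks entirely when there is nothing to compare."""
    if series.empty:
        # No elements, so the answer is an empty boolean Series whatever the dtype is.
        return pd.Series([], index=series.index, dtype=bool, name=series.name)
    return op(series, other)


now = pd.Timestamp.now()
empty_float = pd.Series([], dtype="float64", name="dates")
non_empty = pd.Series(pd.to_datetime(["2000-01-01"]), name="dates")

empty_mask = compare_allowing_empty(empty_float, now)
real_mask = compare_allowing_empty(non_empty, now)
print(empty_mask.dtype, len(empty_mask), real_mask.tolist())
```

Non-empty input still goes through the normal comparison, so a genuine dtype mismatch on real data would continue to raise.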

Another reasonable option, which I think would be more narrowly scoped to the particular problem of dtype inference without data, would be adding a flag to Series to indicate whether the dtype is ambiguous. For pd.Series() it would be True and for pd.Series(dtype=float) it would be False. This would be more complicated to implement, because you'd need to make sure the flag is set correctly when the Series is part of a DataFrame and in .astype. It would, however, preserve the error when comparing the wrong types if a type has been explicitly specified, while avoiding the need for clunky code that has to explicitly handle empty DataFrames.
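To make the flag idea concrete, here is a toy sketch built as a wrapper around a Series. Everything in it (TrackedSeries, dtype_is_ambiguous, make_tracked) is hypothetical and not part of pandas, and using object as the placeholder dtype for ambiguous data is my own assumption.

```python
import operator
from dataclasses import dataclass

import pandas as pd


@dataclass
class TrackedSeries:
    """Hypothetical wrapper: a Series plus a flag saying its dtype was never pinned down."""

    values: pd.Series
    dtype_is_ambiguous: bool

    def astype(self, dtype) -> "TrackedSeries":
        # An explicit cast resolves the ambiguity.
        return TrackedSeries(self.values.astype(dtype), dtype_is_ambiguous=False)

    def compare(self, other, op=operator.lt) -> pd.Series:
        if self.dtype_is_ambiguous and self.values.empty:
            # Nothing was inferred and nothing is stored: return an empty boolean result.
            return pd.Series([], index=self.values.index, dtype=bool)
        # Explicitly typed data goes through the normal comparison and may still raise.
        return op(self.values, other)


def make_tracked(data=(), dtype=None) -> TrackedSeries:
    data = list(data)
    ambiguous = dtype is None and len(data) == 0
    # object is only a placeholder dtype here when nothing could be inferred.
    series = pd.Series(data, dtype=object if ambiguous else dtype)
    return TrackedSeries(series, ambiguous)


ambiguous = make_tracked()
explicit = make_tracked(dtype="float64")
result = ambiguous.compare(pd.Timestamp.now())
print(ambiguous.dtype_is_ambiguous, explicit.dtype_is_ambiguous, result.dtype)
```

A real implementation would have to propagate the flag through DataFrame construction, slicing, and .astype, which is the extra complexity mentioned above.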

pganssle added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Aug 4, 2020
@simonjayhawkins
Member

Thanks @pganssle for the report. #31080 may be related.

simonjayhawkins added the Datetime (Datetime data dtype) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Aug 4, 2020
jbrockmendel added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Sep 21, 2020
jbrockmendel added the Closing Candidate (May be closeable, needs more eyeballs) label May 4, 2023

3 participants