REGR: Comparing timestamp to empty Series raises TypeError starting in 0.25.0 #35548
Labels: Bug, Closing Candidate, Datetime, Numeric Operations
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Minimal reproducing code sample
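A minimal sketch of the comparisons involved, offered as an illustration rather than the reporter's exact sample. It compares a `Timestamp` against empty `Series` of several dtypes and reports whether each comparison returns a `bool` result or raises `TypeError`. The helper name `compare_empty` is hypothetical. The results for `float64` and for no explicit dtype depend on the installed pandas version, so no expected output is shown.

```python
import warnings

import pandas as pd

ts = pd.Timestamp("2020-01-01")


def compare_empty(dtype):
    """Compare ts against an empty Series of `dtype` (hypothetical helper).

    Returns the result's dtype as a string, or "TypeError" if pandas
    refuses the comparison.
    """
    with warnings.catch_warnings():
        # dtype=None mimics pd.Series() with no data; some pandas versions
        # warn that its default dtype is changing from float64 to object.
        warnings.simplefilter("ignore")
        empty = pd.Series([], dtype=dtype)
    try:
        return str((empty < ts).dtype)
    except TypeError:
        return "TypeError"


for dtype in ["datetime64[ns]", "object", "float64", None]:
    print(dtype, "->", compare_empty(dtype))
```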
(Note: This raises a warning because the default `dtype` of an empty `Series` is changing to `object`, which will actually fix the issue, but no warning is raised when you slice a column from an empty `DataFrame`.)

Problem description
Prior to 0.25.0, it was possible to compare a `pd.Timestamp` to an empty `Series` (even if that series did not have a dtype of `datetime64[ns]`), but starting with the 0.25.0 release, this operation raises `TypeError` because of mismatched dtypes. Running `git bisect`, I gather that this change was introduced in GH-26916, but neither the PR nor the issue seems to suggest that this change was an explicit goal.

It is arguable what the correct behavior should be, since it's probably valid that you shouldn't be able to compare a `Timestamp` to a `float64`, but it's also obvious what the return value should be: an empty `Series` of dtype `bool`.

Allowing this behavior would be valuable because a lot of people rely on `pandas` to infer their data types from the input, and no such inference can take place if the data set is empty. In the pre-0.25.0 world, it was possible to design an API like this:

But starting in 0.25.0, when passed an empty list, you'd get a `TypeError` instead of an empty `DataFrame`, and you'd need to special-case the situation where `df` is empty. You can imagine similar APIs, where the data are read from a CSV or some other source and the `dtype` is inferred from the contents, that would start failing on empty data.

I think it would be reasonable to short-circuit comparisons with an empty `Series` before the dtypes are compared, and always return an empty boolean `Series`.

Another reasonable option, more narrowly scoped to the particular problem of `dtype` inference without data, would be to add a flag to `Series` indicating whether the `dtype` is ambiguous. For `pd.Series()` it would be `True`, and for `pd.Series(dtype=float)` it would be `False`. This would be more complicated to implement, because you'd need to make sure the flag is set correctly when the `Series` is part of a `DataFrame` and in `.astype`, but it would preserve the error when comparing mismatched types where a type has been explicitly specified, while avoiding the need for clunky code that explicitly handles empty DataFrames.
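The first proposal, short-circuiting comparisons on an empty `Series`, can be approximated in user code. A minimal sketch, assuming a hypothetical helper named `compare_or_empty` (not part of pandas): it returns an empty `bool` `Series` for empty input without looking at dtypes, and otherwise applies the comparison normally, so a genuine mismatch on non-empty data still raises.

```python
import operator

import pandas as pd


def compare_or_empty(series, other, op=operator.lt):
    """Hypothetical helper (not part of pandas).

    Short-circuits comparisons on an empty Series: returns an empty bool
    Series with the same index and name, skipping the dtype check. For
    non-empty input it applies `op` as usual, so a real dtype mismatch
    still raises TypeError.
    """
    if series.empty:
        return pd.Series([], index=series.index, dtype=bool, name=series.name)
    return op(series, other)


cutoff = pd.Timestamp("2020-01-01")

# Empty column whose dtype pandas had to guess (float64 or object,
# depending on the pandas version): no TypeError either way.
empty_df = pd.DataFrame({"when": []})
empty_mask = compare_or_empty(empty_df["when"], cutoff)

# Non-empty datetime column: behaves like a normal comparison.
df = pd.DataFrame({"when": pd.to_datetime(["2019-06-01", "2020-06-01"])})
mask = compare_or_empty(df["when"], cutoff, operator.gt)
print(df[mask])
```

Like the short-circuit proposal itself, this never raises on empty input, even when the dtype was set explicitly; distinguishing that case is what the ambiguous-dtype flag would add.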