BUG: TypeError when subtracting datetime series from NaT series #45892

Closed
2 of 3 tasks
torfsen opened this issue Feb 9, 2022 · 2 comments
torfsen commented Feb 9, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
pd.Series([pd.to_datetime('')]) - pd.Series([pd.to_datetime('2000-01-01 00:00 +00:00')])

Issue Description

When subtracting a tz-aware datetime64 series from an all-NaT series, a TypeError is raised:

Traceback (most recent call last):
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 732, in _sub_datetime_arraylike
    self._assert_tzawareness_compat(other)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 711, in _assert_tzawareness_compat
    raise TypeError(
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/ops/common.py", line 70, in new_method
    return method(self, other)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/arraylike.py", line 108, in __sub__
    return self._arith_method(other, operator.sub)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/series.py", line 5636, in _arith_method
    return base.IndexOpsMixin._arith_method(self, other, op)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/base.py", line 1295, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/ops/array_ops.py", line 216, in arithmetic_op
    res_values = op(left, right)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/ops/common.py", line 70, in new_method
    return method(self, other)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/arrays/datetimelike.py", line 1340, in __sub__
    result = self._sub_datetime_arraylike(other)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 735, in _sub_datetime_arraylike
    raise type(error)(new_message) from error
TypeError: Cannot subtract tz-naive and tz-aware datetime-like objects.

Expected Behavior

I would have expected the result to be NaT, without any error.

In my particular use case, I convert user-provided string timestamps to datetimes via pd.to_datetime. The string timestamps have a tz-marker (i.e. +00:00), so they get converted to tz-aware timestamps. However, some string timestamps are empty and hence get converted to NaT.
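For illustration, here is a minimal sketch of that conversion path. It assumes the strings can be parsed together in a single call with utc=True, which is not what the reproducible example above does (it parses each value separately), so treat it as one possible setup rather than the actual code. With utc=True the result is tz-aware even when every input is empty, so the subtraction yields NaT instead of raising.

```python
import pandas as pd

# Hypothetical user input: tz-marked strings, some of them empty.
raw = ["2000-01-01 00:00 +00:00", ""]

# utc=True makes the whole result tz-aware (UTC); empty strings become NaT.
parsed = pd.Series(pd.to_datetime(raw, utc=True))

# A second tz-aware series to subtract (values chosen for illustration).
reference = pd.Series(pd.to_datetime(["2000-01-02 00:00 +00:00"] * 2, utc=True))

# Both sides are tz-aware, so this does not raise; NaT propagates.
delta = parsed - reference

# Even an all-empty input stays tz-aware when utc=True is passed.
all_empty = pd.Series(pd.to_datetime(["", ""], utc=True))
all_empty_delta = all_empty - reference
```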

I've tried making the all-NaT series tz-aware, but that doesn't seem to work either:

>>> pd.Series([pd.to_datetime('')]).tz_convert('UTC')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/generic.py", line 9808, in tz_convert
    ax = _tz_convert(ax, tz)
  File "/home/fbrucker/apps/pyenv/versions/test-pandas/lib/python3.9/site-packages/pandas/core/generic.py", line 9790, in _tz_convert
    raise TypeError(
TypeError: index is not a valid DatetimeIndex or PeriodIndex
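A note on the error above: Series.tz_convert acts on the index of the series rather than on its values, which is why the message mentions the index. A minimal sketch of a values-based alternative uses the .dt accessor with tz_localize, assuming UTC is the intended zone for the missing timestamps (that choice is an assumption made here for illustration).

```python
import pandas as pd

# All-NaT series as in the report; its values are tz-naive.
naive_nat = pd.Series([pd.to_datetime("")])

# tz-aware series from the reproducible example.
aware = pd.Series([pd.to_datetime("2000-01-01 00:00 +00:00")])

# .dt.tz_localize attaches a timezone to the *values* (NaT stays NaT),
# whereas Series.tz_convert would try to convert the index.
aware_nat = naive_nat.dt.tz_localize("UTC")

# Both operands are now tz-aware, so the subtraction returns NaT.
result = aware_nat - aware
```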

#11718 is a similar issue but has long been fixed.

Installed Versions

INSTALLED VERSIONS

commit : bb1f651
python : 3.9.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-99-generic
Version : #112-Ubuntu SMP Thu Feb 3 13:50:55 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.0
numpy : 1.22.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 59.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

torfsen added the Bug and Needs Triage labels Feb 9, 2022
simonjayhawkins (Member) commented

> I've tried making the all-NaT series tz-aware, but that doesn't seem to work either:

Specifying the dtype explicitly would work:

>>> s2 = pd.Series([pd.to_datetime("2000-01-01 00:00 +00:00")])
>>> s1 = pd.Series([pd.to_datetime("")], dtype=s2.dtype)
>>> s1 - s2
0   NaT
dtype: timedelta64[ns]

That said, I agree it is unclear whether being strict and raising when subtracting a tz-aware datetime64 series from an all-NaT tz-naive series is the correct thing to do. @jbrockmendel, what do you think?

jbrockmendel (Member) commented

> being strict and raising when subtracting a tz-aware datetime64 series from an all-NaT tz-naive series is the correct thing to do

tznaive - tzaware should always raise
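To make that rule concrete, here is a small sketch showing that the error does not depend on NaT at all: any tz-naive series minus a tz-aware series raises TypeError, and localizing the naive side first removes the mismatch. The sample dates and the choice of UTC are assumptions for illustration.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2000-01-02"]))            # tz-naive values
aware = pd.Series(pd.to_datetime(["2000-01-01"], utc=True))  # tz-aware (UTC) values

# Mixing tz-naive and tz-aware operands is rejected, NaT or not.
try:
    naive - aware
    raised = False
except TypeError:
    raised = True

# Declaring the naive values to be UTC makes both operands tz-aware.
fixed = naive.dt.tz_localize("UTC") - aware
```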

simonjayhawkins added the Timezones and Usage Question labels and removed the Needs Triage label Feb 10, 2022
simonjayhawkins added this to the No action milestone Feb 10, 2022