diff of datetime column with NaT does not result into TimeDelta #17837

Closed
michalsustr opened this issue Oct 10, 2017 · 9 comments · Fixed by #19024
Labels
Datetime (Datetime data dtype), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

@michalsustr

Code Sample, a copy-pastable example if possible

In[40]: df = pd.DataFrame([[0]], dtype="datetime64[ns]")

In[46]: df[0] - pd.to_datetime("nat")
Out[47]: 
0   NaT
Name: 0, dtype: datetime64[ns]

In[47]: df[0] - pd.to_datetime("2017")
Out[48]: 
0   -17167 days
Name: 0, dtype: timedelta64[ns]

Problem description

Subtracting (-) one datetime from another should always return a timedelta, including when one of the operands is NaT.
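For comparison, here is a minimal sketch of the same expectation at the NumPy scalar level, where subtracting two datetime64 values gives a timedelta64 even if one operand is NaT. The variable names are illustrative only and are not taken from the report.

```python
import numpy as np

# Illustrative values; the names are not from the original report.
epoch = np.datetime64("1970-01-01")
missing = np.datetime64("NaT")

# datetime64 - datetime64 yields a timedelta64; NaT propagates as a
# "not a time" timedelta rather than staying a datetime.
delta = epoch - missing
print(type(delta).__name__, np.isnat(delta))
```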

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.2.7
Cython: None
numpy: 1.13.3
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.14
pymysql: None
psycopg2: 2.7.3 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@michalsustr
Author

This seems like an easy issue to fix, I can create a PR if this is indeed confirmed.

@chris-b1
Contributor

Can you try on master? There's a chance this is fixed by some of the recent inference changes. If it isn't, a PR would be welcome.

@michalsustr
Author

I'm building it now... but $ python setup.py develop is super slow, runs only one process...

@michalsustr
Author

Confirmed.

>>> import pandas as pd
>>> df = pd.DataFrame([[0]], dtype="datetime64[ns]")
>>> df
           0
0 1970-01-01
>>> df[0] - pd.to_datetime("nat")
0   NaT
Name: 0, dtype: datetime64[ns]
>>> df[0] - pd.to_datetime("2017")
0   -17167 days
Name: 0, dtype: timedelta64[ns]

Output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: d12a7a0
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0.dev+613.gd12a7a018
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.27.1
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@michalsustr
Author

Strangely, the following works correctly:

In [15]: s = pd.Series(pd.date_range('2012-1-1', periods=1, freq='D'))

In [17]: s.shift()
Out[17]: 
0   NaT
dtype: datetime64[ns]

In [16]: s - s.shift()
    ...: 
Out[16]: 
0   NaT
dtype: timedelta64[ns]

@michalsustr
Author

So it seems that if the datetime is inside a Series the behaviour is correct, but if it is a scalar value the result is cast to the wrong dtype:

In [24]:  df = pd.DataFrame([[0], [1]], dtype="datetime64[ns]")

In [26]: df[0] - pd.Series(pd.to_datetime("nat"))
Out[26]: 
0   NaT
1   NaT
dtype: timedelta64[ns]

In [27]: df[0] - pd.to_datetime("nat")
Out[27]: 
0   NaT
1   NaT
Name: 0, dtype: datetime64[ns]
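Based on the observation above, one possible workaround on affected versions is to broadcast the scalar into a Series with the same index before subtracting. This is only a sketch: `sub_datetime_scalar` is a hypothetical helper name, not a pandas function, and it assumes the Series-minus-Series path behaves as shown in this thread.

```python
import pandas as pd


def sub_datetime_scalar(series, scalar):
    """Hypothetical helper: subtract a datetime-like scalar (possibly NaT)
    from a datetime Series by first broadcasting it to a Series."""
    # Build a Series with the same index and dtype so that the
    # Series - Series code path is used instead of Series - scalar.
    other = pd.Series(scalar, index=series.index, dtype=series.dtype)
    return series - other


s = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-02"]))
result = sub_datetime_scalar(s, pd.to_datetime("nat"))
print(result.dtype)
```

Using the same index avoids relying on index alignment to fill missing rows with NaT, which is what happens in the `pd.Series(pd.to_datetime("nat"))` example above.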

@jreback
Contributor

jreback commented Oct 13, 2017

In [22]: df = pd.DataFrame([[0]], dtype="datetime64[ns]")

In [23]: df
Out[23]: 
           0
0 1970-01-01

In [24]: df[0] - pd.to_datetime("nat")
Out[24]: 
0   NaT
Name: 0, dtype: datetime64[ns]

In [25]: df[0] - pd.to_datetime("2017")
Out[25]: 
0   -17167 days
Name: 0, dtype: timedelta64[ns]

In [26]: pd.to_datetime("2017")
Out[26]: Timestamp('2017-01-01 00:00:00')


@michalsustr what do you think is wrong here?

@jreback added the Missing-data and Datetime labels Oct 13, 2017

@jaksmid

jaksmid commented Dec 7, 2017

It seems the issue is now fixed, at least for pandas 0.20.3.

@jreback
Contributor

jreback commented Dec 7, 2017

can u do a PR with some validation tests?
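A rough sketch of what such validation tests might look like, mirroring the examples in this thread. The test names are hypothetical and these are not the tests that were actually added in #19024. The dtype is checked by kind ("m" for timedelta) rather than by exact resolution.

```python
import pandas as pd


def test_datetime_series_minus_nat_scalar_is_timedelta():
    # Hypothetical test: datetime Series minus a NaT scalar.
    ser = pd.Series(pd.to_datetime(["1970-01-01", "1970-01-02"]))
    result = ser - pd.to_datetime("nat")
    assert result.dtype.kind == "m"  # timedelta, not datetime
    assert result.isna().all()


def test_datetime_series_minus_timestamp_is_timedelta():
    # Hypothetical test: the non-NaT case from the original report.
    ser = pd.Series(pd.to_datetime(["1970-01-01"]))
    result = ser - pd.Timestamp("2017-01-01")
    assert result.dtype.kind == "m"
    assert result.iloc[0] == pd.Timedelta(days=-17167)
```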
