DataFrame of datetimes with NaT Comparison always evaluates to True #12877

Closed
antonymayi opened this issue Apr 12, 2016 · 10 comments
Labels
Datetime (Datetime data dtype), Dtype Conversions (Unexpected or buggy dtype conversions)

Comments

@antonymayi

Code Sample

>>> import pandas as pd
>>> from datetime import date
>>> df=pd.DataFrame({'a': pd.Series([date(2016,1,1)], index=[date(2016,1,1)]), 'b': pd.Series([date(2016,1,3)], index=[date(2016,1,3)])})
>>> df
                     a           b
2016-01-01  2016-01-01         NaN
2016-01-03         NaN  2016-01-03
>>> df>date(2016,1,2)
               a     b
2016-01-01  True  True
2016-01-03  True  True
>>> df<date(2016,1,2)
               a     b
2016-01-01  True  True
2016-01-03  True  True

Expected Output

Something more like the result with NaT:

>>> df.fillna(pd.NaT, inplace=True)
>>> df>date(2016,1,2)
                a      b
2016-01-01  False  False
2016-01-03  False   True
>>> df<date(2016,1,2)
                a      b
2016-01-01   True  False
2016-01-03  False  False

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-22-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.6
pip: 1.5.6
setuptools: 18.4
Cython: None
numpy: 1.8.2
scipy: 0.14.1
statsmodels: 0.5.0
xarray: None
IPython: 2.3.0
sphinx: None
patsy: 0.4.0
dateutil: 2.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.4.2
openpyxl: 2.3.0-b1
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: 0.6.2.None
psycopg2: None
jinja2: 2.8
boto: None

@TomAugspurger
Contributor

Thanks for the report.

Pandas doesn't have a date type, just datetime64 (same with NumPy).

In [5]: df.dtypes
Out[5]:
a    object
b    object
dtype: object

If your data has a regular frequency, you can use the Period dtype; otherwise your best bet is probably datetime64.

In [15]: df=pd.DataFrame({'a': pd.Series([pd.Timestamp('2016-01-01')], index=[pd.Timestamp('2016-01-01')]), 'b': pd.Series([pd.Timestamp('2016-01-03')], index=[pd.Timestamp('2016-01-03')])})

In [16]: df
Out[16]:
                    a          b
2016-01-01 2016-01-01        NaT
2016-01-03        NaT 2016-01-03

I believe we have other issues discussing how to implement a date type; I'll edit them in here when I find them.
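A minimal sketch of the conversion suggested above, assuming a recent pandas release: an object-dtype frame holding datetime.date values is converted column by column with pd.to_datetime. For brevity the frame is built with explicit None gaps instead of index alignment, and the names df and converted are illustrative only.

```python
from datetime import date

import pandas as pd

# Object-dtype columns: pandas keeps datetime.date values as plain Python objects.
df = pd.DataFrame(
    {
        "a": [date(2016, 1, 1), None],
        "b": [None, date(2016, 1, 3)],
    },
    index=pd.to_datetime(["2016-01-01", "2016-01-03"]),
)
print(df.dtypes)  # both columns report object

# Convert each column to datetime64; the missing entries become NaT.
converted = df.apply(pd.to_datetime)
print(converted.dtypes)
print(converted)
```

Once the columns are datetime64, missing values are represented as NaT rather than a float NaN or None.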

@TomAugspurger TomAugspurger added the Datetime (Datetime data dtype) and Dtype Conversions (Unexpected or buggy dtype conversions) labels Apr 12, 2016
@TomAugspurger
Contributor

Oh, and reading a bit more carefully now (sorry), the strange comparison comes from pandas having to store dates as an object dtype.

In [52]: df > 'a'
Out[52]:
               a     b
2016-01-01  True  True
2016-01-03  True  True

We're stricter about these strange comparisons when we have specific dtypes (datetime64 vs. string will raise for example). I think this is the expected behavior. Going to double check.

@antonymayi
Author

So how should I represent timestamps so that I can filter them? Using pd.Timestamp doesn't work either:

>>> df=pd.DataFrame({'a': pd.Series([pd.Timestamp('2016-01-01')], index=[pd.Timestamp('2016-01-01')]), 'b': pd.Series([pd.Timestamp('2016-01-03')], index=[pd.Timestamp('2016-01-03')])})
>>> df
                    a          b
2016-01-01 2016-01-01        NaT
2016-01-03        NaT 2016-01-03
>>> df > pd.Timestamp('2016-01-02')
               a     b
2016-01-01  True  True
2016-01-03  True  True
>>> df < pd.Timestamp('2016-01-02')
               a     b
2016-01-01  True  True
2016-01-03  True  True

@TomAugspurger
Contributor

Ah, that looks like a bug. Thanks. Let me see if there's already another issue.

Series comparison is working correctly, by the way:

In [4]: df.a > pd.Timestamp('2016-01-02')
Out[4]:
2016-01-01    False
2016-01-03    False
Name: a, dtype: bool
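Since the Series comparison behaves as expected, one possible workaround is to compare column by column. The sketch below assumes a recent pandas release and uses illustrative names (cutoff, later); it is not taken from the thread, and whether it sidesteps the DataFrame-level problem on 0.18.0 is an assumption.

```python
import pandas as pd

# datetime64 frame with NaT gaps, mirroring the frame in the thread.
df = pd.DataFrame(
    {
        "a": [pd.Timestamp("2016-01-01"), pd.NaT],
        "b": [pd.NaT, pd.Timestamp("2016-01-03")],
    },
    index=pd.to_datetime(["2016-01-01", "2016-01-03"]),
)
cutoff = pd.Timestamp("2016-01-02")

# Apply the comparison to each column (a Series) separately.
# NaT compares as False in a Series comparison.
later = df.apply(lambda col: col > cutoff)
print(later)
```

Only the 2016-01-03 entry in column b should come out True, which matches the expected output in the original report.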

@TomAugspurger
Contributor

Related to #9005, but that seems to be partially fixed.

Let's keep this one open specifically for DataFrames, unless I'm missing some other issue.

@TomAugspurger TomAugspurger reopened this Apr 12, 2016
@TomAugspurger TomAugspurger changed the title from "auto-NaNs in time series breaking boolean filtering" to "DataFrame of datetimes with NaT Comparison always evaluates to True" Apr 12, 2016
@TomAugspurger TomAugspurger added this to the 0.18.1 milestone Apr 12, 2016
@jreback
Contributor

jreback commented Apr 12, 2016

dupe of #9005

@jreback jreback closed this as completed Apr 12, 2016
@jreback
Contributor

jreback commented Apr 12, 2016

@antonymayi Comparing a NaT against a Timestamp should always be False; in other words, the comparison isn't meaningful. You generally want to use .dropna, .fillna, or .isnull as a mask.
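A rough sketch of the masking approach mentioned above, assuming a recent pandas release. It uses .notna (the negation of .isnull) so that missing entries are forced to False regardless of how the raw comparison treats NaT; the names valid, earlier, and cutoff are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [pd.Timestamp("2016-01-01"), pd.NaT],
        "b": [pd.NaT, pd.Timestamp("2016-01-03")],
    },
    index=pd.to_datetime(["2016-01-01", "2016-01-03"]),
)
cutoff = pd.Timestamp("2016-01-02")

# True where a real timestamp is present, False where the value is NaT.
valid = df.notna()

# Combine the mask with the comparison so NaT entries can never be True.
earlier = valid & (df < cutoff)
print(earlier)

# Alternatively, drop the missing values of a single column before comparing.
a_present = df["a"].dropna()
print(a_present < cutoff)
```

The explicit mask makes the treatment of missing values visible in the code instead of relying on the comparison semantics of NaT.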

@jreback
Contributor

jreback commented Apr 12, 2016

Further, datetime.date objects are barely supported; in other words, they don't auto-convert to Timestamps and are left as object. They are a pretty useless type and there is no reason ever to use them. Pandas has loads of timestamp/period support: http://pandas.pydata.org/pandas-docs/stable/timeseries.html

@TomAugspurger
Contributor

@jreback NaT > Timestamp is evaluating to True though, when inside a DataFrame :)

In [18]: df
Out[18]:
                    a          b
2016-01-01 2016-01-01        NaT
2016-01-03        NaT 2016-01-03

In [19]: df > pd.Timestamp('2016')
Out[19]:
               a     b
2016-01-01  True  True
2016-01-03  True  True

@jreback
Contributor

jreback commented Apr 12, 2016

@TomAugspurger it's the same issue

3 participants