DataFrame of datetimes with NaT Comparison always evaluates to True #12877
Comments
Thanks for the report. Pandas doesn't have a
In [5]: df.dtypes
Out[5]:
a    object
b    object
dtype: object
If your data has a regular frequency you can use the
In [15]: df = pd.DataFrame({'a': pd.Series([pd.Timestamp('2016-01-01')], index=[pd.Timestamp('2016-01-01')]), 'b': pd.Series([pd.Timestamp('2016-01-03')], index=[pd.Timestamp('2016-01-03')])})
In [16]: df
Out[16]:
                    a          b
2016-01-01 2016-01-01        NaT
2016-01-03        NaT 2016-01-03
I believe we have other issues talking about how to implement a date type; I'll edit them in here when I find them.
Oh, and reading a bit more carefully now (sorry), the strange comparison comes from pandas having to store
In [52]: df > 'a'
Out[52]:
               a     b
2016-01-01  True  True
2016-01-03  True  True
We're stricter about these strange comparisons when we have specific dtypes (datetime64 vs. string will raise, for example). I think this is the expected behavior. Going to double check.
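To make the object-dtype point concrete, here is a minimal sketch. The reporter's original data is not shown in the thread, so the `raw` frame below is a hypothetical stand-in built from `datetime.date` objects, which pandas stores with the generic `object` dtype. Converting each column with `pd.to_datetime` gives a datetime64 dtype, where missing values become `NaT` and comparisons are type-aware.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the reporter's data (assumption, not from the thread):
# plain Python date objects mixed with None end up as object dtype.
raw = pd.DataFrame(
    {
        "a": [datetime.date(2016, 1, 1), None],
        "b": [None, datetime.date(2016, 1, 3)],
    }
)
dtypes_before = raw.dtypes  # both columns are object

# Convert every column to a datetime64 dtype; None becomes NaT.
converted = raw.apply(pd.to_datetime)
is_datetime = [
    pd.api.types.is_datetime64_any_dtype(converted[col]) for col in converted.columns
]
n_missing = int(converted.isna().sum().sum())
```

Checking `df.dtypes` before comparing is a cheap way to tell whether a frame is in the loosely typed `object` case described above.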
So how should I represent timestamps so I can filter them? using
Ah, that looks like a bug. Thanks. Let me see if there's already another issue. Series comparison is working correctly, by the way.
In [4]: df.a > pd.Timestamp('2016-01-02')
Out[4]:
2016-01-01    False
2016-01-03    False
Name: a, dtype: bool
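The Series behavior shown above follows from how `NaT` compares: like `NaN`, it is neither greater than, less than, nor equal to anything. A small sketch that rebuilds column `a` from the example and checks this (the variable names are illustrative only):

```python
import pandas as pd

# Column "a" from the example: one real timestamp and one NaT.
s = pd.Series(
    [pd.Timestamp("2016-01-01"), pd.NaT],
    index=pd.to_datetime(["2016-01-01", "2016-01-03"]),
    name="a",
)
cutoff = pd.Timestamp("2016-01-02")

# Elementwise comparison: the NaT row yields False, not True.
result = s > cutoff

# Scalar comparisons against NaT are all False.
nat_gt = pd.NaT > cutoff
nat_lt = pd.NaT < cutoff
nat_eq = pd.NaT == pd.NaT
```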
Related to #9005, but that seems to be partially fixed. Let's keep this one open specifically for DataFrames, unless I'm missing some other issue.
dupe of #9005
@antonymayi comparing a
Further
@jreback NaT > Timestamp is evaluating to True though, when inside a DataFrame :)
In [18]: df
Out[18]:
                    a          b
2016-01-01 2016-01-01        NaT
2016-01-03        NaT 2016-01-03
In [19]: df > pd.Timestamp('2016')
Out[19]:
               a     b
2016-01-01  True  True
2016-01-03  True  True
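For readers hitting the DataFrame behavior shown above, two defensive patterns may help. This is a sketch, not the fix adopted by pandas: it either masks missing values explicitly with `notna()`, or falls back to per-column Series comparison, which the thread reports as working correctly. On recent pandas versions the plain `df > cutoff` is expected to give the same result already.

```python
import pandas as pd

# Same frame as in the thread: each column has one timestamp and one NaT.
df = pd.DataFrame(
    {
        "a": pd.Series([pd.Timestamp("2016-01-01")], index=[pd.Timestamp("2016-01-01")]),
        "b": pd.Series([pd.Timestamp("2016-01-03")], index=[pd.Timestamp("2016-01-03")]),
    }
)
cutoff = pd.Timestamp("2016-01-02")

# Pattern 1: never let a missing value count as a match.
masked = (df > cutoff) & df.notna()

# Pattern 2: compare column by column, using Series semantics.
per_column = df.apply(lambda col: col > cutoff)
```

Only the `b` value on 2016-01-03 is later than the cutoff, so that is the single `True` cell in both results.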
@TomAugspurger it's the same issue
Code Sample
Expected Output
something more like with NaT:
output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-22-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.0
nose: 1.3.6
pip: 1.5.6
setuptools: 18.4
Cython: None
numpy: 1.8.2
scipy: 0.14.1
statsmodels: 0.5.0
xarray: None
IPython: 2.3.0
sphinx: None
patsy: 0.4.0
dateutil: 2.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.4.2
openpyxl: 2.3.0-b1
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: 0.6.2.None
psycopg2: None
jinja2: 2.8
boto: None