Pandas isin with empty dataframe produces epoch value on datetime type instead of boolean #15473

Closed
jamesporritt opened this issue Feb 22, 2017 · 1 comment · Fixed by #15570
Labels
Bug Datetime Datetime data dtype
Milestone

Comments

@jamesporritt

I've noticed that calling isin on a DataFrame containing datetime columns, where the argument is an empty DataFrame, produces epoch datetime values (i.e. 1970-01-01) instead of False. This seems unlikely to be correct.

The following code demonstrates this:

import pandas as pd

data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768']}
data2 = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994']}
df = pd.DataFrame(data, columns = ['date'])
df['date'] = pd.to_datetime(df['date'])
df2 = pd.DataFrame(data2, columns = ['date'])
df2['date'] = pd.to_datetime(df2['date'])
df3 = pd.DataFrame([], columns = ['date'])
df4 = pd.DataFrame()

print(df.isin(df2))
print(df.isin(df3))
print(df.isin(df4))

This outputs:

    date
0   True
1   True
2  False
    date
0 1970-01-01
1 1970-01-01
2 1970-01-01
    date
0 1970-01-01
1 1970-01-01
2 1970-01-01

I would expect a column of False values instead of 1970-01-01. I notice that with pandas 0.16.2 and numpy 1.9.2, df.isin(df3) produces the expected:

   date
0  False
1  False
2  False

But df.isin(df4) still produces the 1970-01-01 values there as well.
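The expectation described above can be written as a small check. This is only a sketch of the behaviour the report asks for, assuming a pandas release that includes the fix linked from this issue; on the affected versions (such as 0.19.2) the assertions would be expected to fail. The variable names `empty_with_column`, `completely_empty` and `results` are illustrative and not part of the original report.

```python
import pandas as pd

# Same datetime column as in the report.
df = pd.DataFrame({'date': pd.to_datetime([
    '2014-05-01 18:47:05.069722',
    '2014-05-01 18:47:05.119994',
    '2014-05-02 18:47:05.178768',
])})

# The two empty operands from the report (df3 and df4).
empty_with_column = pd.DataFrame([], columns=['date'])
completely_empty = pd.DataFrame()

results = {}
for name, other in [('empty_with_column', empty_with_column),
                    ('completely_empty', completely_empty)]:
    result = df.isin(other)
    results[name] = result
    # Expected: a boolean frame in which every entry is False,
    # not a datetime column filled with 1970-01-01.
    assert (result.dtypes == bool).all(), name
    assert not result.values.any(), name
```

If either assertion fails, the installed pandas version is probably still affected by the behaviour reported here.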

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-279.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 2.2
Cython: None
numpy: 1.12.0
scipy: None
statsmodels: None
xarray: None
IPython: 5.2.2
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@chris-b1
Contributor

Thanks for the report, this does look buggy. The problem seems to be in DataFrame.eq:

In [115]: df.eq(pd.DataFrame({'date': [np.nan] * 3}))
Out[115]: 
        date
0 1970-01-01
1 1970-01-01
2 1970-01-01
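The comparison above can be turned into a self-contained check as well. This is a hedged sketch rather than the actual regression test from the fix: it assumes a pandas release in which the problem is resolved, and the names `all_nan` and `compared` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime([
    '2014-05-01 18:47:05.069722',
    '2014-05-01 18:47:05.119994',
    '2014-05-02 18:47:05.178768',
])})

# Compare the datetime column against an all-NaN column, as in In [115].
all_nan = pd.DataFrame({'date': [np.nan] * 3})
compared = df.eq(all_nan)

# No datetime equals NaN, so the result should be boolean and all False.
assert (compared.dtypes == bool).all()
assert not compared.values.any()
```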

@chris-b1 chris-b1 added this to the Next Major Release milestone Feb 22, 2017
@chris-b1 chris-b1 added Bug Datetime Datetime data dtype labels Feb 22, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 4, 2017
3 participants