BUG: pd.Series([1]).isin(pd.Series(['1'])) != pd.Series(['1']).isin(pd.Series([1])) #39131

JMBurley · 2021-01-12T20:01:48Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

pd.Series([1, 2, 3]).isin(pd.Series(['1', '2', '3']))  # returns: True, True, True
pd.Series(['1', '2', '3']).isin(pd.Series([1, 2, 3]))  # returns: False, False, False

Problem description

The .isin() method is inconsistent when comparing numeric strings to ints: pd.Series([1,2,3]).isin(pd.Series(['1','2','3'])) does not give the same result as pd.Series(['1','2','3']).isin(pd.Series([1,2,3])).

There should be consistent handling of these cases.
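A quick way to check both directions on a given install is sketched below. It only prints what the installed pandas returns and asserts nothing about the values, since the result depends on the pandas version; the variable names are just for illustration.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
strs = pd.Series(["1", "2", "3"])

# Run the membership test in both directions.
int_in_str = ints.isin(strs)
str_in_int = strs.isin(ints)

print("pandas", pd.__version__)
print("int in str:", int_in_str.tolist())
print("str in int:", str_in_int.tolist())
# True when both directions produce the same boolean mask.
print("consistent:", int_in_str.equals(str_in_int))
```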

This bug has been touched upon in issues #16938 and #24918, but this report is distinct in identifying a smaller subset of the problem that should be solvable in a way that those issues have not been.

Expected Output

Two options:

1. For backwards compatibility, I would suggest that pandas now needs to offer .isin() matching between str and int, as some users will have relied on this in their code.
2. Alternatively, isin could raise an error if the input dtypes are not the same. This would be easier to achieve after pd.NA is fully integrated and locked down in pandas. (Why? Because people will not like it if they cannot compare superficially equivalent columns because the existence of an NA value has changed the dtype of a column, as in pandas < 1.0.0.)

Mainly, we need to make a call on the desired behaviour for isin; the current inconsistency is not valid behaviour, but it is not clear what behaviour we should have instead.
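To make the two options concrete, here is a minimal user-side sketch. Both `isin_as_str` and `isin_strict` are hypothetical helper names, not pandas API, and casting to `str` is only one possible way to get str/int matching; it is not how pandas implements isin.

```python
import pandas as pd
from pandas.api.types import is_dtype_equal


def isin_as_str(left: pd.Series, right: pd.Series) -> pd.Series:
    """Option 1 (hypothetical helper): match on the string form of each value."""
    return left.astype(str).isin(right.astype(str))


def isin_strict(left: pd.Series, right: pd.Series) -> pd.Series:
    """Option 2 (hypothetical helper): refuse to compare differing dtypes."""
    if not is_dtype_equal(left.dtype, right.dtype):
        raise TypeError(f"dtype mismatch: {left.dtype} vs {right.dtype}")
    return left.isin(right)


ints = pd.Series([1, 2, 3])
strs = pd.Series(["1", "2", "3"])

# Casting both sides makes the result independent of argument order.
forward = isin_as_str(ints, strs)
backward = isin_as_str(strs, ints)
print(forward.tolist(), backward.tolist())
```

Note that the string cast is a blunt tool: a float such as 1.0 becomes "1.0" and would not match "1", so option 1 would still need a rule for which representations count as equal.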

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
Version : Darwin Kernel Version 18.7.0: Tue Nov 10 00:07:31 PST 2020; root:xnu-4903.278.51~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.2.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.2.0
Cython : None
pytest : 3.10.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.17.0
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : 0.6.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.51.2

JMBurley · 2021-01-12T20:08:57Z

Correction: I think one of the datetime fixes for isin() between 1.1.5 and 1.2.0 may have altered this issue by forcing False if you attempt to compare different dtypes in an isin()?

If that is deliberate behaviour, that is an okay resolution, although it would be better to raise a warning when an attempt is made to compare incomparable dtypes, since such a comparison will always return False.
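The warning idea could look roughly like the wrapper below. This is a sketch only: `isin_with_warning` is a hypothetical name, and "one side numeric, the other not" is an assumed stand-in for "incomparable dtypes", not a rule pandas defines.

```python
import warnings

import pandas as pd
from pandas.api.types import is_numeric_dtype


def isin_with_warning(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical wrapper: warn when one side is numeric and the other is not."""
    if is_numeric_dtype(left.dtype) != is_numeric_dtype(right.dtype):
        warnings.warn(
            f"isin between {left.dtype} and {right.dtype} may never match",
            UserWarning,
            stacklevel=2,
        )
    return left.isin(right)


# Capture warnings so the example can show that one was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = isin_with_warning(pd.Series(["1", "2"]), pd.Series([1, 2]))

for w in caught:
    print(w.category.__name__, w.message)
```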

simonjayhawkins · 2021-01-19T11:07:12Z

Thanks @JMBurley for the report.

> Correction: I think one of the datetime fixes for isin() between 1.1.5 and 1.2.0 may have altered this issue by forcing False if you attempt to compare different dtypes in an isin()?

Closing this, as the code sample does give the expected output on 1.2.0. See also #38781.

Feel free to open a new issue to highlight any of the other points raised here.

JMBurley added the Bug and Needs Triage labels Jan 12, 2021

simonjayhawkins closed this as completed Jan 19, 2021

simonjayhawkins added the Algos and isin labels and removed the Needs Triage label Jan 19, 2021

simonjayhawkins added this to the No action milestone Jan 19, 2021
