BUG: DataFrame.isin fails when other is a categorical series #34256


Closed
3 tasks
brandon-b-miller opened this issue May 19, 2020 · 8 comments · Fixed by #34363
Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Categorical (Categorical Data Type), good first issue, Needs Tests (unit test(s) needed to prevent regressions), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@brandon-b-miller

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
print(pd.__version__)

x = pd.DataFrame.from_dict({'a':[1,2,3], 'b':[4,5,6]})
y = pd.DataFrame({'a':[1,2,3]}, dtype='category')
print(x.isin(y))

y = pd.Series([1,2,3]).astype('category')
print(x.isin(y))

1.0.3
      a      b
0  True  False
1  True  False
2  True  False
Traceback (most recent call last):
  File "/home/brmiller/repro.py", line 9, in <module>
    print(x.isin(y))
  File "/home/brmiller/anaconda3/envs/pandas1/lib/python3.7/site-packages/pandas/core/frame.py", line 8423, in isin
    return self.eq(values.reindex_like(self), axis="index")
  File "/home/brmiller/anaconda3/envs/pandas1/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 814, in f
    self, other, op, fill_value=None, axis=axis, level=level
  File "/home/brmiller/anaconda3/envs/pandas1/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 618, in _combine_series_frame
    new_data = left._combine_match_index(right, func)
  File "/home/brmiller/anaconda3/envs/pandas1/lib/python3.7/site-packages/pandas/core/frame.py", line 5317, in _combine_match_index
    new_data = func(self.values.T, other.values).T
  File "/home/brmiller/anaconda3/envs/pandas1/lib/python3.7/site-packages/pandas/core/ops/common.py", line 64, in new_method
    return method(self, other)
  File "/home/brmiller/anaconda3/envs/pandas1/lib/python3.7/site-packages/pandas/core/arrays/categorical.py", line 72, in func
    raise ValueError("Lengths must match.")
ValueError: Lengths must match.

Problem description

This operation worked in pandas 0.25.3, where it gave the same result as passing a single-column DataFrame as other.
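
A possible workaround for affected versions, sketched under the assumption that casting the categorical Series back to the dtype of its categories avoids the categorical comparison seen in the traceback. This has not been checked against 1.0.3 in this thread, and the name y_plain is only illustrative.

```python
import pandas as pd

x = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
y = pd.Series([1, 2, 3]).astype("category")

# Assumption: converting the categorical Series to the dtype of its
# categories (int64 here) lets isin compare plain integers, aligned on
# the index, instead of going through the Categorical comparison.
y_plain = y.astype(y.cat.categories.dtype)

result = x.isin(y_plain)
print(result)
```

The index-aligned semantics of passing a Series are kept, which would not necessarily hold if the values were passed as a plain list.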

Expected Output

      a      b
0  True  False
1  True  False
2  True  False
      a      b
0  True  False
1  True  False
2  True  False

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1
setuptools : 46.4.0.post20200518
Cython : 0.29.17
pytest : 5.4.2
hypothesis : 5.14.0
sphinx : 3.0.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.14.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.0
pytables : None
pytest : 5.4.2
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.49.1

@brandon-b-miller added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels on May 19, 2020
@TomAugspurger
Contributor

Seems to be fixed on master. Not sure if we have a test for it though.

@TomAugspurger added the Needs Tests (unit test(s) needed to prevent regressions) label and removed the Bug and Needs Triage labels on May 19, 2020
@TomAugspurger added this to the Contributions Welcome milestone on May 19, 2020
@saiajay5674

Can I work on this?

@MarcoGorelli
Member

@saiajay5674 PRs are welcome; see the contributing guide for how to get started.

Perhaps check first whether there's already a test for this issue, and if there isn't, you could add one.

@vampypandya
Contributor

@MarcoGorelli We can create Pytest test cases, right?

@MarcoGorelli
Member

MarcoGorelli commented May 25, 2020

> @MarcoGorelli We can create Pytest test cases, right?

Yes, that's the testing framework pandas uses. If you'd like to work on this, please comment 'take' so the issue is assigned to you
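
For anyone picking this up, a regression test could look roughly like the sketch below. It is not the test from the linked PR; the function name and the parameter name other are hypothetical, the data simply mirror the report, and where it belongs in the pandas test suite is left open.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "other",
    [
        # Single-column categorical DataFrame (worked in the report).
        pd.DataFrame({"a": [1, 2, 3]}, dtype="category"),
        # Categorical Series (raised ValueError on 1.0.3 in the report).
        pd.Series([1, 2, 3], dtype="category"),
    ],
)
def test_isin_categorical_other(other):
    # Hypothetical test name; inputs and expected output follow GH 34256.
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    expected = pd.DataFrame(
        {"a": [True, True, True], "b": [False, False, False]}
    )

    result = df.isin(other)

    pd.testing.assert_frame_equal(result, expected)
```

Parametrizing over both inputs keeps the DataFrame case, which already worked, next to the Series case that regressed, so the two are checked against the same expected frame.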

@vampypandya
Contributor

Take

@vampypandya
Contributor

I have committed the changes. What are the next steps? I am new to Open Source Contribution. Please let me know.
@MarcoGorelli

@jreback changed the milestone from Contributions Welcome to 1.1 on May 25, 2020
@jreback added the Algos, Categorical, and Reshaping labels on May 25, 2020
@vampypandya
Contributor

I have committed. Please check.
