Series.update fails with categorical types #25744

Closed
AlexRiina opened this issue Mar 16, 2019 · 9 comments · Fixed by #38873
Labels
Categorical Categorical Data Type good first issue Needs Tests Unit test(s) needed to prevent regressions Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@AlexRiina

cats = pandas.api.types.CategoricalDtype(['a', 'b', 'c', 'd'])
# cats = None
s1 = pandas.Series(['a', 'b', 'c'], index=[1, 2, 3], dtype=cats)
s2 = pandas.Series(['b', 'a'], index=[1, 2], dtype=cats)

s1.update(s2)

With dtype=None, s1 is updated to ['b', 'a', 'c'] as expected. With dtype=cats, the update fails with "ValueError: NumPy boolean array indexing assignment cannot assign 3 input values to the 2 output values where the mask is true".

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 3.3.2
pip: 18.0
setuptools: 40.5.0
Cython: 0.27.3
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.6.6
patsy: 0.5.1
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.4
feather: None
matplotlib: 3.0.1
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@AlexRiina
Author

Another version of the example code is

s1 = pandas.Series(['a', 'b', 'c'], index=[1, 2, 3]).astype("category")
s2 = pandas.Series(['b', 'a'], index=[1, 2])

s1.update(s2)

which fails just the same.
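
On affected versions such as 0.24.2, one possible workaround is to run the update on object-dtype copies and cast the result back to the original categorical dtype. This is a minimal sketch, not something proposed in this thread: `update_categorical` is a hypothetical helper name, and any value in `other` that is not among the existing categories would become NaN when cast back.

```python
import pandas as pd


def update_categorical(target, other):
    """Return a copy of `target` updated from `other`, keeping target's dtype.

    Hypothetical helper for illustration; not part of pandas.
    """
    original_dtype = target.dtype
    # Work on an object-dtype copy so the update does not touch categorical internals.
    as_object = target.astype(object)
    as_object.update(other.astype(object))
    # Cast back; values outside the original categories would turn into NaN.
    return as_object.astype(original_dtype)


s1 = pd.Series(["a", "b", "c"], index=[1, 2, 3]).astype("category")
s2 = pd.Series(["b", "a"], index=[1, 2])

result = update_categorical(s1, s2)
print(result.tolist())
```

Unlike `Series.update`, this helper returns a new Series and leaves `s1` untouched.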

@mroeschke
Member

Thanks for the report. Investigation and PRs welcome!

@mroeschke mroeschke added Bug Categorical Categorical Data Type labels Mar 16, 2019
@giuliobeseghi

This fails with dataframes too:

import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 2, 3]})
dfs = pd.concat([df] * 3, ignore_index=True)
dfs["C"] = dfs.A

dfs.head()

A B C
0 a 1 a
1 b 2 b
2 c 3 c
3 a 1 a
4 b 2 b

Now imagine I want to have just column 'C' as categorical:

print(dfs.C.astype("category").dtypes)  # -> category

dfs.update(dfs.C.astype("category"))
print(dfs.dtypes)

[object, int64, object]
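
As the output above shows, `DataFrame.update` leaves the dtype of column `C` unchanged here. If the goal is only to convert the column, assigning the converted Series back is the usual route. A minimal sketch, rebuilding the `dfs` frame from this comment:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 2, 3]})
dfs = pd.concat([df] * 3, ignore_index=True)
dfs["C"] = dfs["A"]

# Assign the converted column back instead of calling dfs.update(...).
dfs["C"] = dfs["C"].astype("category")
print(dfs.dtypes)
```

Only column `C` is converted; `A` and `B` keep their original dtypes.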

@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Apr 24, 2020
@simonjayhawkins simonjayhawkins added the Needs Tests Unit test(s) needed to prevent regressions label May 6, 2020
@simonjayhawkins
Member

fixed in master, xref #33984

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1482.g18a7e973c'
>>>
>>> cats = pd.api.types.CategoricalDtype(['a', 'b', 'c', 'd'])
>>> # cats = None
>>> s1 = pd.Series(['a', 'b', 'c'], index=[1, 2, 3], dtype=cats)
>>> s2 = pd.Series(['b', 'a'], index=[1, 2], dtype=cats)
>>>
>>> s1.update(s2)
>>>
>>> s1
1    b
2    a
3    c
dtype: category
Categories (4, object): [a, b, c, d]

@erfannariman
Member

@mroeschke which actions are still open for this issue? I see that a test has been added and that it is fixed on master.

@simonjayhawkins
Member

"I see that a test is added"

from #34030 (comment)

"I think the dtype issue for DataFrame.update with categorical columns mentioned in the first issue is still unaddressed, so likely don't want to only close and lose that (perhaps break into its own issue)?"

@erfannariman
Member

So we need a test for this? @simonjayhawkins
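
A regression test for the Series case could look roughly like the following. This is only a sketch built from the example at the top of this issue: the test name is hypothetical, it is not claimed to be the test that was actually added to pandas, and it assumes a pandas version in which the fix is present.

```python
import pandas as pd
import pandas.testing as tm


def test_series_update_categorical():
    # Hypothetical test name; mirrors the reproduction from the original report.
    dtype = pd.CategoricalDtype(["a", "b", "c", "d"])
    s1 = pd.Series(["a", "b", "c"], index=[1, 2, 3], dtype=dtype)
    s2 = pd.Series(["b", "a"], index=[1, 2], dtype=dtype)

    s1.update(s2)

    # The update should replace the overlapping labels and keep the dtype.
    expected = pd.Series(["b", "a", "c"], index=[1, 2, 3], dtype=dtype)
    tm.assert_series_equal(s1, expected)


test_series_update_categorical()
```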

@erfannariman
Member

take

@adhilcodes

take

@jreback jreback added Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 1, 2021
@jreback jreback modified the milestones: Contributions Welcome, 1.3 Jan 1, 2021
7 participants