
DataFrame.replace: TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'unicode' #16784


Closed
eromoe opened this issue Jun 28, 2017 · 8 comments · Fixed by #36202
Labels
good first issue; Needs Tests (Unit test(s) needed to prevent regressions); Numeric Operations (Arithmetic, Comparison, and Logical operations); Unicode (Unicode strings)
Milestone

Comments

@eromoe

eromoe commented Jun 28, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

path1 = '/some.xls'
df1 = pd.read_excel(path1)
columns_values_map={
    'positive': {
        '正面':1,
        '中立': 1,
        '负面':0
    }
}

df1.replace(columns_values_map)

Problem description

This raises: TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'unicode'

df1['positive'] only contains the values 0 and 1, so none of the keys match, but I don't think replace should raise an exception in that case.

@jreback
Contributor

jreback commented Jun 28, 2017

Please show a copy-pastable example; in other words, construct df1 here.

@eromoe
Author

eromoe commented Jun 28, 2017

It's simple

import numpy as np
import pandas as pd

columns_values_map={
    'positive': {
        '正面':1,
        '中立': 1,
        '负面':0
    }
}
df1 = pd.DataFrame({'positive': np.ones(10)})
df1.replace(columns_values_map)
#  TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'unicode'

df2 = pd.DataFrame({'positive': ['正面', '负面']})
df2.replace(columns_values_map)
# this works

I am using pandas to combine several Excel files that share some columns but encode the values differently.
For now I have to use a workaround like this:

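for col, v_map in self.columns_values_map.items():
    # `k in series` tests the index, not the values, so check against the values explicitly
    present = set(df[col].dropna().unique())
    cat_map = {k: v for k, v in v_map.items() if k in present}
    if cat_map:
        # values without an entry in cat_map are kept unchanged
        df[col] = df[col].map(lambda x: cat_map.get(x, x))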
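
A self-contained version of this kind of workaround might look like the sketch below. The helper name `replace_present` is hypothetical (it is not a pandas API), and the sketch assumes every mapped column holds hashable scalar values. It applies only the mapping keys that actually occur in a column and leaves everything else untouched.

```python
import numpy as np
import pandas as pd


def replace_present(df, columns_values_map):
    """Hypothetical helper: per column, apply only mapping keys found in the data."""
    out = df.copy()
    for col, v_map in columns_values_map.items():
        if col not in out.columns:
            continue
        # Collect the distinct non-null values actually present in the column.
        present = set(out[col].dropna().unique())
        cat_map = {k: v for k, v in v_map.items() if k in present}
        if cat_map:
            # Values without a mapping entry are passed through unchanged.
            out[col] = out[col].map(lambda x: cat_map.get(x, x))
    return out


columns_values_map = {"positive": {"正面": 1, "中立": 1, "负面": 0}}

numeric_df = pd.DataFrame({"positive": np.ones(10)})
text_df = pd.DataFrame({"positive": ["正面", "负面"]})

numeric_result = replace_present(numeric_df, columns_values_map)
text_result = replace_present(text_df, columns_values_map)
print(numeric_result)
print(text_result)
```

Because no key is compared against a column that cannot contain it, this avoids the mixed-type comparison entirely, at the cost of an extra pass over each column.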

@jreback
Contributor

jreback commented Jun 28, 2017

This looks correct to me. You are trying to replace integers with string-likes, none of which match. Are you objecting to the error message?

FYI

In [35]: df1['positive'].map(columns_values_map['positive'])
Out[35]: 
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
Name: positive, dtype: float64

Though in the reverse case we let this pass:

In [40]: df = DataFrame({'A': [1., 2.], 'B': ['foo', 'bar']})

In [41]: df.replace({'A':{20:1}})
Out[41]: 
     A    B
0  1.0  foo
1  2.0  bar

@jreback
Contributor

jreback commented Jun 28, 2017

@chris-b1 @jorisvandenbossche @TomAugspurger

comments?

@jreback jreback added API Design Dtype Conversions Unexpected or buggy dtype conversions labels Jun 28, 2017
@TomAugspurger
Contributor

For consistency, and since replace is a general purpose find / replace method, it'd be nice if this didn't raise a TypeError.

@TomAugspurger TomAugspurger added this to the Next Major Release milestone Jul 12, 2017
@sanjaydeo96

If you are using a Jupyter notebook, try re-running the cells above. I had the same problem, and that sorted it out.

@diegoquintanav

diegoquintanav commented Dec 28, 2017

I'm having the same problem. I think a flag like the one to_numeric has would work well here.
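
For reference, the flag in question is presumably the `errors` parameter of `pd.to_numeric`; that reading is an assumption, since the comment does not name it. The sketch below shows only how `to_numeric` behaves. `DataFrame.replace` has no such parameter, and the sample data is made up.

```python
import pandas as pd

raw = pd.Series(["1", "2", "正面"])

# Default behaviour (errors="raise"): an unparseable value raises ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

# errors="coerce": unparseable values become NaN instead of raising.
coerced = pd.to_numeric(raw, errors="coerce")

print(raised)
print(coerced)
```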

@simonjayhawkins
Member

The code sample in #16784 (comment) doesn't raise on master:

>>> pd.__version__
'1.2.0.dev0+261.g9fea06cec'
>>>
>>> columns_values_map = {"positive": {"正面": 1, "中立": 1, "负面": 0}}
>>> df1 = pd.DataFrame({"positive": np.ones(10)})
>>> df1.replace(columns_values_map)
   positive
0       1.0
1       1.0
2       1.0
3       1.0
4       1.0
5       1.0
6       1.0
7       1.0
8       1.0
9       1.0
>>>

maybe fixed by #36093? cc @jbrockmendel
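
Since the issue is labelled Needs Tests, a regression test could look roughly like the sketch below. The test name is hypothetical and this is not necessarily the test that was added to pandas. It assumes a pandas version in which the snippet above no longer raises (the session shows 1.2.0.dev0).

```python
import numpy as np
import pandas as pd


def test_replace_str_keys_on_numeric_column():
    # Hypothetical test name; string keys can never match a float column,
    # so replace should return the frame unchanged rather than raise.
    columns_values_map = {"positive": {"正面": 1, "中立": 1, "负面": 0}}
    df = pd.DataFrame({"positive": np.ones(3)})

    result = df.replace(columns_values_map)

    pd.testing.assert_frame_equal(result, df)


test_replace_str_keys_on_numeric_column()
```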

@simonjayhawkins simonjayhawkins added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed API Design Dtype Conversions Unexpected or buggy dtype conversions labels Sep 5, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.2 Sep 7, 2020
@jreback jreback added Numeric Operations Arithmetic, Comparison, and Logical operations Unicode Unicode strings labels Sep 7, 2020
7 participants