Skip to content

BUG: "replace" does not work with different numpy-int types even when values are the same #45311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
Jakobhenningjensen opened this issue Jan 11, 2022 · 9 comments · Fixed by #45797
Closed
3 tasks done
Assignees
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions replace replace method
Milestone

Comments

@Jakobhenningjensen
Copy link

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

maps = pd.Series([np.int64(0),np.int64(2),np.int64(1)]) #index-value should be mapped to series-value

# 0 0    0-->0
# 1 2    1-->2
# 2 1    2-->1

map_dict = {old:new for (old,new) in zip(maps.values,maps.index)}

map_dict == {0: 0, 2: 1, 1: 2} #True

labs = pd.Series([1,1,1,0,0,2,2,2]).astype(np.int32) #Simulate a list of observations

(labs.replace({0: 0, 2: 1, 1: 2}) == labs.replace(map_dict)).mean() #0.25

labs.replace({0: 0, 2: 1, 1: 2}) # works

# 0    2
# 1    2
# 2    2
# 3    0
# 4    0
# 5    1
# 6    1
# 7    1



labs.replace(map_dict) #doesnt work
# 0    1
# 1    1
# 2    1
# 3    0
# 4    0
# 5    2
# 6    2
# 7    2

Issue Description

Using a mapping dictionary where the keys have np.int64 as type but a series of data where the values are np.int32 then the replacement is not working.

Expected Behavior

Expected to that all integer types can be evaluated against each-other

Installed Versions

INSTALLED VERSIONS

commit : 7c48ff4
python : 3.8.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Danish_Denmark.1252

pandas : 1.2.5
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : 0.29.23
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : None
IPython : 7.24.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.20
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@Jakobhenningjensen Jakobhenningjensen added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 11, 2022
@mroeschke
Copy link
Member

mroeschke commented Jan 14, 2022

Could you try on a newer version of pandas? I get this on the main branch

In [7]: labs.replace({0: 0, 2: 1, 1: 2})
Out[7]:
0    2
1    2
2    2
3    0
4    0
5    1
6    1
7    1
dtype: int32

In [8]: labs.replace(map_dict)
Out[8]:
0    2
1    2
2    2
3    0
4    0
5    1
6    1
7    1
dtype: int32

@mroeschke mroeschke added Needs Info Clarification about behavior needed to assess issue replace replace method and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 14, 2022
@Jakobhenningjensen
Copy link
Author

I still get the same on version 1.3.5

import numpy as np
import pandas as pd

print(pd.__version__) #1.3.5

maps = pd.Series([np.int64(0),np.int64(2),np.int64(1)]) #index-value should be mapped to series-value

# 0 0    0-->0
# 1 2    1-->2
# 2 1    2-->1

map_dict = {old:new for (old,new) in zip(maps.values,maps.index)}

map_dict == {0: 0, 2: 1, 1: 2} #True

labs = pd.Series([1,1,1,0,0,2,2,2]).astype(np.int32) #Simulate a list of observations

(labs.replace({0: 0, 2: 1, 1: 2}) == labs.replace(map_dict)).mean() #0.25

@mroeschke
Copy link
Member

Thanks for checking. Looks like a lot of bug fixes were applied to Series.replace in the upcoming version, 1.4, which may have fixed this issue. Don't see this case explicitly mentions so it may be good to add a unit test

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Needs Info Clarification about behavior needed to assess issue labels Jan 14, 2022
@anupamsingh24
Copy link

I can add few unit tests for this check.

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jan 15, 2022
@simonjayhawkins
Copy link
Member

Thanks for checking. Looks like a lot of bug fixes were applied to Series.replace in the upcoming version, 1.4, which may have fixed this issue. Don't see this case explicitly mentions so it may be good to add a unit test

fixed in commit: [fb9f205] BUG: Series.setitem failing to cast numeric values (#45121)

not directly related to replace, so adding a test is probably a good idea.

@ajayd-san
Copy link

Hey, is this issue still open? I can work on it, with some guidance since it's my first time here.

@stanleycai123
Copy link

Hi! I'm a first time contributor as well, and would love to contribute if it's still open.

@stanleycai123
Copy link

Not sure if this is new - the issue is also extant when comparing np.int16 and np.int32:

import pandas as pd
import numpy as np

def test_replace_different_np_int_types():
    # Testing different numpy int types
    ser = pd.Series([1,1,1,0,0,2,2,2]).astype(np.int32) 
    
    maps = pd.Series([np.int16(0),np.int16(2),np.int16(1)]) 
    map_dict = {old:new for (old,new) in zip(maps.values,maps.index)}
    result = ser.replace(map_dict)
    expected = ser.replace({0: 0, 2: 1, 1: 2})

    return (result == expected).mean()
    
print(test_replace_different_np_int_types()) #returns 0.25, not 1

Should the unit test be added to https://github.com/pandas-dev/pandas/blob/main/pandas/tests/series/methods/test_replace.py? I can add a unit test comparing np.int16 / np.int64 / np.int32

@vaish-muk
Copy link

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions replace replace method
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants