BUG: Series.map() coerces Int64Dtype and int64[pyarrow] series which contain missing values to float64 #57189
Comments
take
I had the same issue. Based on the issue above, it seems to be an intentional change in 2.2.x.
@rohanjain101 Yeah, you're right. I believe it changed from pandas/pandas/core/arrays/base.py (line 2188 in a671b5a) in v2.1.x to pandas/pandas/core/arrays/masked.py (lines 1332 to 1333 in c3014ab) and pandas/pandas/core/arrays/arrow/array.py (lines 1426 to 1430 in c3014ab) in v2.2.0. I guess you might have to cast it back to your desired dtype, @weltenseglr.
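A minimal sketch of the cast-back suggestion, assuming a mapper that only does arithmetic (so `x * 2` behaves the same whether it is handed integers and `pd.NA` or floats and `NaN`). This restores only the dtype of the result; it does not change the values the mapper receives, and int64 values above 2**53 cannot round-trip exactly through float64.

```python
import pandas as pd

ser = pd.Series([1, 2, None], dtype="Int64")

# Arithmetic-only mapper: works for ints, floats, pd.NA and NaN alike.
doubled = ser.map(lambda x: x * 2)

# Cast the result back to the nullable integer dtype; NaN/pd.NA become <NA>.
doubled = doubled.astype("Int64")
print(doubled.tolist())
```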
Thanks for your feedback, @rohanjain101 @remiBoudreau. Then I think the documentation is outdated?
Casting back won't help in my case, as my mapper must not receive coerced values. I'm trying to figure out how to preserve the original data type. Any ideas?
@weltenseglr You could use the workaround described in #56606 (comment): use Python's built-in map function instead of Series.map. At least for now, this preserves the original type.
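A minimal sketch of that workaround, assuming a hypothetical mapper `to_url` that builds a URL from an integer ID (the `example.com` address is illustrative only). Iterating a nullable-integer Series yields the stored integers and `pd.NA` for missing entries, so the mapper never sees a float such as `1.0`.

```python
import pandas as pd

def to_url(value):
    # Hypothetical mapper: a float such as 1.0 would produce ".../items/1.0".
    # Missing values are passed through unchanged.
    if value is pd.NA:
        return pd.NA
    return f"https://example.com/items/{value}"

ser = pd.Series([1, 2, None], dtype="Int64")

# Python's built-in map iterates the Series directly (ints and pd.NA),
# bypassing the float conversion inside Series.map.
urls = pd.Series(list(map(to_url, ser)), index=ser.index, dtype="string")
print(urls.tolist())
```

The trade-off is that the mapping runs as a plain Python loop and the result dtype has to be chosen explicitly (`"string"` here).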
Thanks, @rohanjain101. I guess I will have to implement a workaround based on your suggestion or revert to a 2.1 release. Is this going to change in pandas 3.0? This doesn't seem to fit well with the project's intentions regarding PDEP-10 and its benefits.
More appropriate; also, there is a bug in map/apply that converts ints to floats and thus makes URLs invalid. See pandas-dev/pandas#57189.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Calling map on a Series with dtype Int64Dtype or int64[pyarrow] coerces the values to float if the Series contains any missing values.
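A minimal sketch of the described behavior (not the reporter's original example). It uses `Int64` only, so it runs without pyarrow; the report says `int64[pyarrow]` behaves the same way. The helper `record_type` is hypothetical and simply logs the type of each value the mapper is handed.

```python
import pandas as pd

# Names of the types the mapper receives, in call order.
seen_types = []

def record_type(value):
    seen_types.append(type(value).__name__)
    return value

# Nullable integer Series with one missing value.
ser = pd.Series([1, 2, None], dtype="Int64")
ser.map(record_type)

# Per this report, on affected versions these are float types rather than
# integers and pd.NA.
print(seen_types)
```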
Expected Behavior
Series.map() should not coerce values to float64 for these dtypes.
As stated in the documentation on working with missing data:
Installed Versions
INSTALLED VERSIONS
commit : db11e25
python : 3.12.1.final.0
python-bits : 64
OS : Linux
OS-release : 6.6.13-200.fc39.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Sat Jan 20 18:03:28 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_IE.UTF-8
LOCALE : en_IE.UTF-8
pandas : 3.0.0.dev0+197.gdb11e25d2b
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 67.7.2
pip : 23.2.1
Cython : None
pytest : 7.4.3
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.0.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.19.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None