PERF: df.astype("float64[pyarrow]") is slow, df.astype("Float64") is super slow #60066
Closed
2 of 3 tasks
Labels
Constructors: Series/DataFrame/Index/pd.array Constructors
ExtensionArray: Extending pandas with custom dtypes or arrays.
good first issue
Performance: Memory or execution speed performance
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
CPU times: user 3.87 s, sys: 185 ms, total: 4.05 s
Wall time: 4.05 s
566 ms ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
148 ms ± 984 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
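The code that produced the timings above is not preserved in this copy of the report. The following is only a minimal sketch of how such a comparison might be set up, not the reporter's original snippet: the random input data, the helper names `time_astype` and `run_benchmark`, and the use of `time.perf_counter` in place of IPython's `%timeit` are all assumptions made for illustration. The default frame is kept small so the sketch runs quickly; the report describes a 5000 x 5000 frame.

```python
import time

import numpy as np
import pandas as pd


def time_astype(df, dtype, repeats=3):
    """Return the best wall-clock time in seconds of df.astype(dtype)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        df.astype(dtype)
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmark(n_rows=500, n_cols=500, seed=0):
    """Time astype for the dtypes named in the report (hypothetical helper)."""
    # Assumption: the input is a dense float64 frame of random values.
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))

    dtypes = ["Float64", "float64"]
    try:
        import pyarrow  # noqa: F401  (only needed for the Arrow-backed dtype)
        dtypes.insert(1, "float64[pyarrow]")
    except ImportError:
        pass  # skip the Arrow-backed case when pyarrow is not installed

    return {dtype: time_astype(df, dtype) for dtype in dtypes}


if __name__ == "__main__":
    # Use run_benchmark(5000, 5000) to match the size described in the report.
    for dtype, seconds in run_benchmark().items():
        print(f"{dtype:>18}: {seconds * 1e3:8.1f} ms")
```

Absolute numbers will depend on the machine and on the pandas, NumPy and pyarrow versions, so only the relative ordering of the three conversions is meaningful.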
It takes over 4 s to convert a medium-sized 5000 x 5000 DataFrame to Float64, while float64[pyarrow] is about 7x faster. The fastest, float64, takes only about 150 ms, even with copy enabled.
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.10.14
python-bits : 64
OS : Linux
OS-release : 5.15.0-122-generic
Version : #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
pip : 24.0
Cython : 3.0.7
sphinx : 7.3.7
IPython : 8.25.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : None
matplotlib : 3.9.2
numba : 0.60.0
numexpr : 2.10.0
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : 2.9.9
pymysql : 1.4.6
pyarrow : 16.1.0
pyreadstat : None
pytest : 8.2.2
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.0
sqlalchemy : 2.0.31
tables : 3.9.2
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : 0.22.0
tzdata : 2024.1
qtpy : 2.4.1
pyqt5 : None
Prior Performance
No response