PERF: dataframe construction from recarray is slow #44826
Labels: Needs Triage, Performance
I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the master branch of pandas.
Reproducible Example
Constructing a DataFrame directly from a recarray with pd.DataFrame() is slow, while DataFrame.from_records() on the same recarray is fast. That gap seems unreasonable.
Suppose the recarray is generated from the following step:
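The original snippet is not preserved here. As a minimal stand-in, a recarray of this kind could be built as follows; the row count, field names, and values are illustrative assumptions, not the reporter's data.

```python
import numpy as np

# Hypothetical sizes and field names, chosen only for illustration.
n_rows = 100_000
rng = np.random.default_rng(0)

# Build a two-field record array from plain NumPy columns.
rec = np.rec.fromarrays(
    [np.arange(n_rows), rng.random(n_rows)],
    names="a,b",
)

print(type(rec).__name__, rec.dtype.names, len(rec))
```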
Time comparison:
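The original timings are not preserved here. A comparison along these lines could be reproduced with the sketch below; the recarray is an assumed stand-in, and the measured numbers will depend on the pandas version and machine, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed stand-in data; not the reporter's original array.
n_rows = 100_000
rec = np.rec.fromarrays(
    [np.arange(n_rows), np.arange(n_rows, dtype=float)],
    names="a,b",
)

# Time both construction paths on the same input.
t_ctor = timeit.timeit(lambda: pd.DataFrame(rec), number=5)
t_from_records = timeit.timeit(lambda: pd.DataFrame.from_records(rec), number=5)

print(f"pd.DataFrame(rec):              {t_ctor:.4f} s for 5 runs")
print(f"pd.DataFrame.from_records(rec): {t_from_records:.4f} s for 5 runs")

# Both paths should produce frames with the same shape and columns.
df_ctor = pd.DataFrame(rec)
df_from_records = pd.DataFrame.from_records(rec)
```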
This odd behaviour comes from pandas.core.internals.construction.rec_array_to_mgr. The code referenced below passes the recarray to _get_names_from_index, which runs a large and entirely unnecessary loop over every row of the array.
pandas/pandas/core/internals/construction.py
Line 179 in a3702e2
pandas/pandas/core/internals/construction.py
Line 724 in a3702e2
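The referenced source lines are not reproduced here. To show why such a loop is costly, here is a simplified sketch of a per-row scan. The helper `has_any_name` is hypothetical and is not the pandas implementation. It assumes the scan checks each element for a `name` attribute, which only makes sense for nested data such as a list of Series.

```python
import numpy as np

def has_any_name(data):
    # Hypothetical helper: visit every element in Python and look for a
    # `name` attribute. For a recarray each element is a np.record, so this
    # does O(n_rows) Python-level work and never finds a name.
    return any(getattr(row, "name", None) is not None for row in data)

rec = np.rec.fromarrays([np.arange(1000), np.arange(1000.0)], names="a,b")

scanned = has_any_name(rec)
print(scanned)
```

Because the records of a plain recarray carry no row labels, the scan cannot change the outcome; it only adds time proportional to the number of rows.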
I believe _get_names_from_index is designed for nested data, since it is also called in nested_data_to_arrays.
pandas/pandas/core/internals/construction.py
Line 518 in a3702e2
So a possible fix is to use default_index(len(data)) directly instead.
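To illustrate why a default index should be sufficient: a DataFrame built from a plain recarray ends up with an ordinary 0..n-1 range index anyway. The sketch below uses the public pd.RangeIndex as a stand-in for the internal default_index helper; treating the two as equivalent is an assumption, and the data is illustrative.

```python
import numpy as np
import pandas as pd

# Assumed stand-in data.
rec = np.rec.fromarrays([np.arange(1000), np.arange(1000.0)], names="a,b")

df = pd.DataFrame(rec)

# The resulting index is just a range over the rows, so it could be built
# from len(rec) alone without inspecting any individual record.
expected_index = pd.RangeIndex(len(rec))
print(df.index.equals(expected_index))
```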
Installed Versions
1.3.4
Prior Performance
No response