PERF: improve construct_1d_object_array_from_listlike #60461
Some mypy errors, but nice find!
mypy.....................................................................Failed
- hook id: mypy
- duration: 88.21s
- exit code: 1
pandas/core/dtypes/cast.py:1604: error: Argument 1 to "len" has incompatible type "Iterable[Any]"; expected "Sized" [arg-type]
pandas/core/common.py:256: error: Unused "type: ignore" comment [unused-ignore]
Found 2 errors in 2 files (checked 1446 source files)
pandas/core/dtypes/cast.py
@@ -1602,7 +1602,8 @@ def construct_1d_object_array_from_listlike(values: Sized) -> np.ndarray:
     # numpy will try to interpret nested lists as further dimensions, hence
     # making a 1D array that contains list-likes is a bit tricky:
     result = np.empty(len(values), dtype="object")
-    result[:] = values
+    for i, obj in enumerate(values):
Any advantage of using np.fromiter(values, dtype="object", count=len(values))?
Ah, nice, wasn't aware of that. From a quick test, that seems to be even a bit faster.
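For readers following the suggestion, here is a minimal sketch of the np.fromiter approach. The helper name to_object_array is hypothetical and is not the pandas function itself; the sketch also assumes NumPy 1.23 or later, where np.fromiter accepts an object dtype.

```python
import numpy as np


def to_object_array(values):
    # Hypothetical helper (not the pandas implementation): build a 1D
    # object array whose elements are the items of `values`, without
    # letting numpy unpack nested list-likes into extra dimensions.
    # Assumes NumPy >= 1.23, where np.fromiter supports dtype=object.
    return np.fromiter(values, dtype="object", count=len(values))


nested = [[1, 2], [3, 4], [5, 6]]
arr = to_object_array(nested)

# One element per inner list, rather than a 2D array.
print(arr.shape, arr.dtype)

# For comparison, np.array interprets the nesting as a second dimension.
print(np.array(nested, dtype=object).shape)
```

Passing count up front lets numpy allocate the result once instead of growing it while iterating.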
If I want something that is both Iterable and Sized, is that Collection or Sequence?
It appears either should work, given the inheritance structure (Sequence inherits from Collection): https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes
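A small sketch of the difference, using a hypothetical function count_and_list that needs both len() and iteration. Collection is the narrower requirement (Sized, Iterable, Container), while Sequence additionally demands indexing and ordering, so it would reject inputs such as sets.

```python
from collections.abc import Collection, Sequence
from typing import Any


def count_and_list(values: Collection[Any]) -> tuple[int, list[Any]]:
    # Hypothetical example function: it only needs len() and iteration,
    # which is exactly what Collection (Sized + Iterable + Container)
    # promises to a type checker.
    return len(values), list(values)


# Sequence is a subclass of Collection, so any Sequence is accepted.
print(issubclass(Sequence, Collection))

# A set is a Collection but not a Sequence (no indexing, no order).
example_set = {1, 2, 3}
print(isinstance(example_set, Collection), isinstance(example_set, Sequence))

print(count_and_list((10, 20, 30)))
```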
Thanks @jorisvandenbossche
…_array_from_listlike) (#60483)
Backport PR #60461: PERF: improve construct_1d_object_array_from_listlike
Co-authored-by: Joris Van den Bossche <[email protected]>
* PERF: improve construct_1d_object_array_from_listlike
* use np.fromiter and update annotation
This improves construct_1d_object_array_from_listlike, especially for the case where the objects inside the array-like are themselves array-likes with a potentially expensive conversion to numpy.

It seems that when doing result[:] = values, numpy will still check the __array__ method for each object in values, while when iterating and assigning the objects one by one, that does not happen.

And even in the case where __array__ is not expensive at all (or is absent), it seems that iterating is faster than the single assignment.

This is a useful performance improvement in general, I assume, but I am specifically doing it to fix the performance issue reported in #59657. The fix matters mostly for the 2.3.x branch, because that issue is avoided on main by #57205 (which avoids the Series construction that ends up calling construct_1d_object_array_from_listlike in the first place).
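The __array__ observation in the description can be checked with a small sketch like the one below. The Tracked class and the fill_by_slice / fill_by_loop helpers are hypothetical names made up for illustration, not pandas code; the exact number of __array__ calls made by slice assignment may vary between NumPy versions, so the script prints the counts rather than assuming them.

```python
import numpy as np


class Tracked:
    """Hypothetical array-like that counts how often __array__ is called."""

    calls = 0

    def __init__(self, data):
        self.data = data

    def __array__(self, dtype=None, copy=None):
        # In a real object this conversion could be expensive.
        Tracked.calls += 1
        return np.asarray(self.data, dtype=dtype)


def fill_by_slice(values):
    # The original approach: a single slice assignment.
    result = np.empty(len(values), dtype="object")
    result[:] = values
    return result


def fill_by_loop(values):
    # The approach from this PR's first commit: assign one by one.
    result = np.empty(len(values), dtype="object")
    for i, obj in enumerate(values):
        result[i] = obj
    return result


def count_array_calls(fill, values):
    # Reset the counter, run one fill strategy, report the call count.
    Tracked.calls = 0
    out = fill(values)
    return out, Tracked.calls


values = [Tracked([1, 2]), Tracked([3, 4]), Tracked([5, 6])]
by_slice, slice_calls = count_array_calls(fill_by_slice, values)
by_loop, loop_calls = count_array_calls(fill_by_loop, values)

print("__array__ calls, slice assignment:", slice_calls)
print("__array__ calls, element-wise loop:", loop_calls)
```

If the conversion inside __array__ is slow, every extra call shows up directly in the construction time, which is the effect the description attributes to the slice assignment.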