PERF: improve construct_1d_object_array_from_listlike #60461

Merged

jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Dec 1, 2024

This improves construct_1d_object_array_from_listlike, especially for the case where the objects inside the array-like are themselves array-likes with a potentially expensive conversion to numpy.
It seems that when doing result[:] = values, numpy will still check the __array__ method for each object in values, while when iterating and assigning the objects one by one, that does not happen.

And even in the case where __array__ is not expensive at all (or is absent), it seems that iterating is faster than the single assignment:

In [12]: class A:
    ...:     def __init__(self):
    ...:         self.data = np.random.randn(5)
    ...:     def __array__(self, dtype=None, copy=None):
    ...:         #print("calling __array__")
    ...:         return self.data

In [13]: N = 10_000

In [14]: data = [A() for _ in range(N)]

In [17]: %%timeit
    ...: arr = np.empty((N, ), dtype=object)
    ...: arr[:] = data
    ...: 
5.39 ms ± 33.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [18]: %%timeit
    ...: arr = np.empty((N, ), dtype=object)
    ...: for i, obj in enumerate(data):
    ...:     arr[i] = obj
    ...: 
424 µs ± 6.78 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
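To check the claim about `__array__` without relying on timings, a call counter can be used. This is a minimal sketch, not part of the PR: `Counted` is a hypothetical class made up for illustration, and how often numpy consults `__array__` during the slice assignment may depend on the numpy version, so only the counts are printed rather than assumed.

```python
import numpy as np


class Counted:
    """Hypothetical array-like that counts how often __array__ is called."""

    calls = 0

    def __init__(self):
        self.data = np.arange(5.0)

    def __array__(self, dtype=None, copy=None):
        type(self).calls += 1
        return self.data


data = [Counted() for _ in range(1_000)]

# Single slice assignment: numpy inspects every element while working out
# how to interpret the right-hand side.
Counted.calls = 0
bulk = np.empty(len(data), dtype=object)
bulk[:] = data
bulk_calls = Counted.calls

# Element-by-element assignment: each object is stored as-is.
Counted.calls = 0
loop = np.empty(len(data), dtype=object)
for i, obj in enumerate(data):
    loop[i] = obj
loop_calls = Counted.calls

print(f"__array__ calls, slice assignment: {bulk_calls}")
print(f"__array__ calls, element-wise loop: {loop_calls}")
```

If the counter is non-zero for the slice assignment and zero for the loop, that matches the behaviour described above.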

This is a useful performance improvement in general, I assume, but I am specifically doing it to fix the performance issue reported in #59657. The fix is mostly for the 2.3.x branch, because on main that issue is already avoided by #57205 (which avoids the Series construction, and with it the call to construct_1d_object_array_from_listlike, in the first place).

@jorisvandenbossche jorisvandenbossche added the Performance (Memory or execution speed performance) and Constructors (Series/DataFrame/Index/pd.array Constructors) labels Dec 1, 2024
@jorisvandenbossche jorisvandenbossche added this to the 2.3 milestone Dec 1, 2024
@mroeschke
Member

Some mypy errors, but nice find!

mypy.....................................................................Failed
- hook id: mypy
- duration: 88.21s
- exit code: 1

pandas/core/dtypes/cast.py:1604: error: Argument 1 to "len" has incompatible type "Iterable[Any]"; expected "Sized"  [arg-type]
pandas/core/common.py:256: error: Unused "type: ignore" comment  [unused-ignore]
Found 2 errors in 2 files (checked 1446 source files)

@@ -1602,7 +1602,8 @@ def construct_1d_object_array_from_listlike(values: Sized) -> np.ndarray:
     # numpy will try to interpret nested lists as further dimensions, hence
     # making a 1D array that contains list-likes is a bit tricky:
     result = np.empty(len(values), dtype="object")
-    result[:] = values
+    for i, obj in enumerate(values):
Member

Any advantage of using np.fromiter(values, dtype="object", count=len(values))?

Member Author

Ah, nice, I wasn't aware of that. From a quick test, that seems to be even a bit faster.
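For reference, a small sketch of what the `np.fromiter` suggestion does with nested list-likes, contrasted with plain `np.array`. The sample `values` list is made up for illustration, and object dtype in `np.fromiter` assumes NumPy 1.23 or newer.

```python
import numpy as np

values = [[1, 2], [3, 4], [5, 6]]

# np.array interprets the nested lists as a second dimension.
as_2d = np.array(values, dtype=object)

# np.fromiter consumes the iterable one item at a time, so each inner list
# becomes a single element of a 1D object array.
as_1d = np.fromiter(values, dtype=object, count=len(values))

print(as_2d.shape)  # (3, 2)
print(as_1d.shape)  # (3,)
```

Passing `count` lets numpy allocate the output once instead of growing it while iterating.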

@jorisvandenbossche
Member Author

> Some mypy errors,

If I want something that is both Iterable and Sized, is that Collection or Sequence?

@mroeschke
Member

> If I want something that is both Iterable and Sized, then that's Collection or Sequence ?

It appears either should work, given the inheritance structure (Sequence inherits from Collection): https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes
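A minimal sketch of the difference between the two annotations, using only the standard library. The `describe` helper is hypothetical and not part of pandas; it only needs `len()` and iteration, which is what `Collection` guarantees, while `Sequence` additionally requires indexing and ordering.

```python
from collections.abc import Collection, Sequence
from typing import Any


def describe(values: Collection[Any]) -> tuple[int, list[Any]]:
    """Hypothetical helper: needs both len() and iteration, nothing more."""
    return len(values), list(values)


# A dict keys view is Sized and Iterable (a Collection) but not a Sequence,
# so it would be rejected by a Sequence annotation.
keys = {"a": 1, "b": 2}.keys()
size, items = describe(keys)

print(isinstance(keys, Collection))  # True
print(isinstance(keys, Sequence))    # False
print(size, items)                   # 2 ['a', 'b']
```

Collection is the less restrictive choice when the function never indexes into its argument.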

@mroeschke mroeschke merged commit 8695401 into pandas-dev:main Dec 3, 2024
47 of 51 checks passed
@mroeschke
Member

Thanks @jorisvandenbossche

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Dec 3, 2024
mroeschke pushed a commit that referenced this pull request Dec 3, 2024
…_array_from_listlike) (#60483)

Backport PR #60461: PERF: improve construct_1d_object_array_from_listlike

Co-authored-by: Joris Van den Bossche <[email protected]>
@jorisvandenbossche jorisvandenbossche deleted the construction-1d-object branch December 3, 2024 20:51
KevsterAmp pushed a commit to KevsterAmp/pandas that referenced this pull request Mar 12, 2025
* PERF: improve construct_1d_object_array_from_listlike

* use np.fromiter and update annotation
Successfully merging this pull request may close these issues.

PERF: Melt 2x slower when future.infer_string option enabled