-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: improve iloc list indexing #15504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: improve iloc list indexing #15504
Conversation
@jorisvandenbossche : It's not ideal, but given that users have had issues with this in the past, I would prefer that you keep that in the code than remove it. If you leave the check, what is the speedup? |
@gfyoung It's not the most important part of the speed-up:
but it is called several times. In this case I think twice, which gives ca 5µs on the total of ca 50µs (for one of the examples above). So certainly not huge, but for performance sensitive code, a significant part for something that is not necessary (in this case). Other option (instead of the
or, could also do this in the
|
@jorisvandenbossche : Pre-checking |
6cbac2e
to
6d2705c
Compare
do we have micro benchmarks for this? |
@@ -2378,7 +2378,8 @@ def take(self, indices, axis=0, convert=True, is_copy=False, **kwargs): | |||
-------- | |||
numpy.ndarray.take | |||
""" | |||
nv.validate_take(tuple(), kwargs) | |||
if kwargs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assume this is going to go back after #15850 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave it in even then (for performance sensitive methods like take here). It is still 10x faster even after #15850 (although 10x is not that important anymore at those low numbers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh ok sure.
We actually already do have some, but I should run them (at the top are only some basic |
So the remaining test failure is related to the following:
So in the above (this is the behaviour of this PR), after take this preserves the object dtype with datetime.datetime values (I suppose due to using the fastpath in series creation). While the test expects a datetime64 result. I ran the indexing benchmarks:
So the iloc list_like and array improve as expected (the array case even more than I expected). The reason that the period shallow copy is much slower is not really clear to me (I don't think I touched code related to shallow_copy). But when I reran the benchmarks, both those tests with slowdown were not consistent slower, so probably noise. |
@jorisvandenbossche yeah the If you can separate out this specific case (to a separate) test (and put a reference to this issue) would be great. After you merge this, I think I will create some |
@jorisvandenbossche I think travis was flaky, can you push again |
af7a7ad
to
bf54a0b
Compare
lgtm. merge away. |
thanks! |
Did a profile of
iloc
using a list indexer, and noticed a few parts that seemingly can be optimized rather easily:np.take
)_take
method that does not do this for internal usage?)The failing tests are currently just the ones that checked for the kwarg validation (and one failing test due to the fastpath taken where object datetimes are not coerced to datetime64).
Compared to master, I get up to 40 to 60% improvement on some simple tests:
PR:
master:
There are some details to work out, but already putting it up here for discussion (and need to run the benchmark suite).
From the remaining time that iloc takes, a large part is spent in the creation of the actual result (the new series): creating a new index (~ 17%), creating a new series based on the values and index (~ 32%). So in total almost half of the time, and thus wondering if this is not possible to speedup.