
PERF: get_dummies create array of 0/1s with appropriate layout for internals #40299


Merged: 2 commits, Mar 15, 2021
pandas/core/reshape/reshape.py (3 changes: 2 additions & 1 deletion)

@@ -1039,7 +1039,8 @@ def get_empty_frame(data) -> DataFrame:
return out

else:
-        dummy_mat = np.eye(number_of_cols, dtype=dtype).take(codes, axis=0)
+        # take on axis=1 + transpose to ensure ndarray layout is column-major
Contributor:
use the order parameter here: https://numpy.org/doc/stable/reference/generated/numpy.eye.html or maybe .reshape (as .take gives a copy)

Member Author:
To clarify: the take is needed for the actual get_dummies operation.
But given your suggestion, I thought that specifying order in eye might speed up the take call, but from a quick test that doesn't seem to matter
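That the order chosen in np.eye does not survive the take can be checked directly; a minimal sketch (the codes array is made up for illustration):

```python
import numpy as np

codes = np.array([0, 2, 1, 2, 0])  # hypothetical codes, as produced by factorization
n = 3

# np.take always allocates a fresh C-ordered copy, so the memory order
# chosen when creating the identity matrix does not carry through.
c_based = np.eye(n, order="C").take(codes, axis=0)
f_based = np.eye(n, order="F").take(codes, axis=0)

assert c_based.flags["C_CONTIGUOUS"]
assert f_based.flags["C_CONTIGUOUS"]
assert np.array_equal(c_based, f_based)
```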

Contributor:
guess this is fine

+        dummy_mat = np.eye(number_of_cols, dtype=dtype).take(codes, axis=1).T
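What this changed line does to memory layout can be sketched with made-up codes (not the benchmark data):

```python
import numpy as np

number_of_cols = 3
codes = np.array([0, 2, 1, 2, 0])  # made-up codes for illustration

# take on axis=1 gives a C-contiguous (number_of_cols, len(codes)) copy;
# transposing it then yields an F-contiguous (column-major) view, the
# layout pandas' column-oriented internals prefer.
dummy_mat = np.eye(number_of_cols, dtype=np.uint8).take(codes, axis=1).T

assert dummy_mat.shape == (len(codes), number_of_cols)
assert dummy_mat.flags["F_CONTIGUOUS"]
```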
Member:
@jorisvandenbossche investigating a perf regression in reshape.GetDummies.time_get_dummies_1d current best guess is this line. Would doing arr.T.take(codes, axis=0) be equivalent here? That's about 7x faster for me locally.
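For reference, the two variants produce the same values but differently ordered results; a sketch with made-up codes (the 7x figure above is machine-dependent):

```python
import numpy as np

codes = np.array([0, 2, 1, 2, 0])  # made-up codes for illustration
eye = np.eye(3, dtype=np.uint8)

current = eye.take(codes, axis=1).T   # take wide, then transpose: F-ordered result
proposed = eye.T.take(codes, axis=0)  # take copies, so the result is C-ordered

assert np.array_equal(current, proposed)  # identical values
assert current.flags["F_CONTIGUOUS"]
assert proposed.flags["C_CONTIGUOUS"]     # only the memory layout differs
```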

Member:
@jorisvandenbossche gentle ping; time_get_dummies_1d is the biggest perf regression I'm seeing vs 1.2.0

Member:
@jorisvandenbossche gentle ping

Member Author (jorisvandenbossche, Oct 13, 2021):
Can you point to the performance regression? Is it something that showed up in the online ASV benchmarks, or just the one that Simon mentioned above?

Note that I mentioned above that this will actually cause a slowdown for BlockManager if you isolate the get_dummies call, so some performance regression was anticipated (see the timings in the top post, which include both a timing for only get_dummies and a timing for get_dummies + copy). As I noted back then, we could also special-case this to do it only for ArrayManager.

Member:
It's the same one Simon linked, yes, but I see a much worse deterioration locally (likely because I'm measuring from a different baseline):

from asv_bench.benchmarks.reshape import *
setup()
self = GetDummies()
self.setup()

%timeit self.time_get_dummies_1d()
5.79 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- 1.2.0
43.1 ms ± 842 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- master

Profiling with `%prun -s cumtime for n in range(100): self.time_get_dummies_1d()` traces essentially the whole regression back to this .take
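Outside IPython, the same measurement can be reproduced with cProfile; this helper is a sketch (`profile_calls` is a made-up name, demonstrated here on a trivial function rather than the asv benchmark):

```python
import cProfile
import io
import pstats

def profile_calls(fn, n=100, top=10):
    """Run fn n times under cProfile and return the top entries by cumtime."""
    prof = cProfile.Profile()
    prof.enable()
    for _ in range(n):
        fn()
    prof.disable()
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats("cumtime").print_stats(top)
    return buf.getvalue()

# e.g. profile_calls(self.time_get_dummies_1d) against the benchmark instance
report = profile_calls(lambda: sum(range(1000)))
```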

Member Author (jorisvandenbossche, Oct 13, 2021):
I see:

In [2]: %timeit pd.get_dummies(s, sparse=False)
4.22 ms ± 452 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- 1.2.5
12.6 ms ± 1.88 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- 1.3.3
12.7 ms ± 559 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- master

In [3]: %timeit pd.get_dummies(s, sparse=False).copy()
14.7 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- 1.2.5
13.6 ms ± 452 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- 1.3.3
13.2 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- master

(s is defined in a snippet in the top post, but it's the same as the ASV benchmark you did)

Member:
How does using arr.T.take(codes, axis=0) affect these for you? Locally, making that change does make the .copy slower, but by a much smaller margin than the speedup it gives get_dummies itself

In [3]: %timeit pd.get_dummies(s, sparse=False)
43 ms ± 812 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- master
6.36 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- transpose-before-take


In [4]: %timeit pd.get_dummies(s, sparse=False).copy()
47.6 ms ± 835 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- master
15.7 ms ± 554 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- transpose-before-take

Member Author:
How does using arr.T.take(codes, axis=0) affect these for you?

See the explanation in the top post of this PR. The intent (I didn't check again in detail to be clear, just rereading what I wrote back then) was to ensure that the data was in the best memory layout for pandas' internals.
This was more important for the ArrayManager, because there the data always needs to be contiguous by column, while a 2D Block can store it either way. But even for 2D Blocks, some operations will be faster afterwards if the layout is column-major (i.e. row-major in the transposed way we store it in the Block). That's also why I mentioned that we could add an if/else here to do something different for ArrayManager vs BlockManager.
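The contiguity point above can be checked directly; a sketch (array sizes are arbitrary):

```python
import numpy as np

arr_f = np.zeros((1000, 50), order="F")  # column-major storage
arr_c = np.zeros((1000, 50), order="C")  # row-major storage

# Column slices of the F-ordered array are contiguous in memory, which is
# what a column store (ArrayManager, or a Block stored transposed) wants.
assert arr_f[:, 0].flags["C_CONTIGUOUS"]
assert not arr_c[:, 0].flags["C_CONTIGUOUS"]
```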

But based on the timings that I got then and now, I didn't see a strong reason to do so. You get quite different timings, of course, which gives a different impression; I have no idea why there would be such a difference (what version of numpy are you using? the timings I did yesterday were with 1.20 or 1.21, depending on the pandas version)

Member:
OK, I somehow got it in my head that doing the .T before the take would keep the result F-ordered, but apparently not. Thanks for taking a look.

@simonjayhawkins a good-ish explanation here would be that the results I'm getting locally are a wild outlier. Can you give it a try to test this hypothesis?


if not dummy_na:
# reset NaN GH4446