Use memcpy / realloc more effectively in hashtable #57695

WillAyd · 2024-03-01T18:25:14Z

No description provided.

WillAyd · 2024-03-01T18:26:22Z

pandas/_libs/hashtable_class_helper.pxi.in

            if self.external_view_exists:
                raise ValueError("external reference but "
                                 "Vector.resize() needed")
-            self.resize()
+            self.resize(self.data.m * 4)


The multiplying by 4 seems somewhat arbitrary and counter-productive for large arrays, but keeping status quo for now

WillAyd · 2024-03-01T18:28:48Z

pandas/_libs/hashtable_class_helper.pxi.in

        if not self.data.data:
            raise MemoryError()
-        for i in range(m):


There are still a few other places where we are doing loop assignment in the object class, but I have not touched them because I don't know what Cython does for reference counting during assignment that a simple mempcy would not know about. Something to research futher

WillAyd · 2024-03-01T18:30:55Z

pandas/_libs/hashtable_class_helper.pxi.in

+            if self.external_view_exists:
+                raise ValueError("external reference but "
+                                 "Vector.resize() needed")
+            self.resize(new_size)  # TODO: do we want to multiply by 4?


With the extend I choose not to multiple by a factor of 4. This would favor performance when doing really large appends versus many small appends in a loop

Not sure how to best manage this. Long term I would love to get rid of this self-managed code and use the C++ standard library

WillAyd · 2024-03-01T18:37:57Z

Hashing benchmarks:

| Change   | Before [710720e6] <main>   | After [cfa5b85f] <memcpy-extend>   |   Ratio | Benchmark (Parameter)                                                                        |
|----------|----------------------------|------------------------------------|---------|----------------------------------------------------------------------------------------------|
| -        | 1.12±0.2ms                 | 901±40μs                           |    0.8  | hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'numpy.uint64'>, 1000000) |
| -        | 1.26±0.1ms                 | 893±30μs                           |    0.71 | hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'numpy.int64'>, 1000000)  |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

algorithms:

| Change   | Before [710720e6] <main>   | After [cfa5b85f] <memcpy-extend>   |   Ratio | Benchmark (Parameter)                                                       |
|----------|----------------------------|------------------------------------|---------|-----------------------------------------------------------------------------|
| +        | 2.11±0.04μs                | 2.34±0.1μs                         |    1.11 | algorithms.Duplicated.time_duplicated(True, 'last', 'duration[s][pyarrow]') |
| -        | 5.81±0.2ms                 | 5.28±0.08ms                        |    0.91 | algorithms.Hashing.time_series_string                                       |
| -        | 2.27±0.2ms                 | 1.89±0.04ms                        |    0.83 | algorithms.Quantile.time_quantile(1, 'linear', 'float64')                   |
| -        | 1.09±0.04ms                | 850±20μs                           |    0.78 | algorithms.Quantile.time_quantile(1, 'nearest', 'uint64')                   |
| -        | 1.89±0.2ms                 | 1.31±0.06ms                        |    0.7  | algorithms.Quantile.time_quantile(0, 'midpoint', 'uint64')                  |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

groupby:

| Change   | Before [710720e6] <main>   | After [cfa5b85f] <memcpy-extend>   |   Ratio | Benchmark (Parameter)                                                                   |
|----------|----------------------------|------------------------------------|---------|-----------------------------------------------------------------------------------------|
| +        | 28.5±0.1ms                 | 31.7±1ms                           |    1.11 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Float64', 'var')                       |
| -        | 29.0±2ms                   | 26.4±0.4ms                         |    0.91 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Int32', 'last')                        |
| -        | 85.1±10μs                  | 77.1±3μs                           |    0.91 | groupby.GroupByMethods.time_dtype_as_group('int16', 'nunique', 'direct', 1, 'cython')   |
| -        | 24.9±0.7μs                 | 22.6±0.5μs                         |    0.91 | groupby.GroupByMethods.time_dtype_as_group('uint', 'bfill', 'direct', 1, 'cython')      |
| -        | 35.8±3μs                   | 32.1±0.2μs                         |    0.9  | groupby.GroupByMethods.time_dtype_as_group('int', 'std', 'direct', 1, 'cython')         |
| -        | 37.3±1μs                   | 33.7±2μs                           |    0.9  | groupby.GroupByMethods.time_dtype_as_group('uint', 'std', 'direct', 1, 'cython')        |
| -        | 1.33±0.05ms                | 1.19±0.02ms                        |    0.9  | groupby.MultipleCategories.time_groupby_transform                                       |
| -        | 16.9±0.2ms                 | 15.1±0.2ms                         |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'describe', 'direct', 1, 'cython')   |
| -        | 73.9±1μs                   | 66.0±0.6μs                         |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'diff', 'direct', 1, 'cython')       |
| -        | 398±5μs                    | 355±2μs                            |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'pct_change', 'direct', 1, 'cython') |
| -        | 4.54±0.2ms                 | 4.01±0.1ms                         |    0.88 | groupby.Nth.time_groupby_nth_all('datetime')                                            |
| -        | 36.0±2μs                   | 30.9±1μs                           |    0.86 | groupby.GroupByMethods.time_dtype_as_group('int16', 'first', 'direct', 1, 'cython')     |
| -        | 9.09±0.5ms                 | 7.81±0.2ms                         |    0.86 | groupby.Nth.time_series_nth_all('datetime')                                             |
| -        | 618±100μs                  | 394±30μs                           |    0.64 | groupby.GroupByMethods.time_dtype_as_group('float', 'sum', 'direct', 1, 'numba')        |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

Not why there are some regressions with algorithms - those may just be flaky tests. I do get different results every time I run the asvs there

WillAyd · 2024-03-21T02:23:37Z

Alright sorry for letting this go stale for a while but all green and posted updated benchmarks. @mroeschke

mroeschke

The performance improvements look great. Feel free to merge once you're ready to take the PR out of draft mode

Use memcpy / realloc more effectively in hashtable

9885a51

WillAyd commented Mar 1, 2024

View reviewed changes

WillAyd added the Performance Memory or execution speed performance label Mar 1, 2024

WillAyd added 7 commits March 1, 2024 11:30

Try fix

c10cc44

skip test

b330fad

revert StringVector realloc

61955f1

fixes

40359fd

Fix missing size increment

d9b2d65

size_t -> Py_ssize_t

e40826b

cleanups

946a203

WillAyd marked this pull request as draft March 3, 2024 16:53

WillAyd added 9 commits March 19, 2024 16:14

Merge branch 'main' into memcpy-extend

422aaaa

Fixed typo

03bae72

Merge branch 'main' into memcpy-extend

fe4bb56

Fixes

7d3dee2

Merge branch 'main' into memcpy-extend

4315bcf

Merge remote-tracking branch 'upstream/main' into memcpy-extend

3f2bb40

merge fixups

6703c16

early return

dd08e40

fix merge issue

cfa5b85

mroeschke approved these changes Mar 21, 2024

View reviewed changes

WillAyd marked this pull request as ready for review March 21, 2024 18:23

WillAyd added this to the 3.0 milestone Mar 21, 2024

WillAyd merged commit 8704cfa into pandas-dev:main Mar 21, 2024
46 checks passed

WillAyd deleted the memcpy-extend branch March 21, 2024 18:29

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024

Use memcpy / realloc more effectively in hashtable (pandas-dev#57695)

213ade1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use memcpy / realloc more effectively in hashtable #57695

Use memcpy / realloc more effectively in hashtable #57695

WillAyd commented Mar 1, 2024

WillAyd Mar 1, 2024

WillAyd Mar 1, 2024

WillAyd Mar 1, 2024

WillAyd commented Mar 1, 2024 •

edited

Loading

WillAyd commented Mar 21, 2024

mroeschke left a comment

Use memcpy / realloc more effectively in hashtable #57695

Use memcpy / realloc more effectively in hashtable #57695

Conversation

WillAyd commented Mar 1, 2024

WillAyd Mar 1, 2024

Choose a reason for hiding this comment

WillAyd Mar 1, 2024

Choose a reason for hiding this comment

WillAyd Mar 1, 2024

Choose a reason for hiding this comment

WillAyd commented Mar 1, 2024 • edited Loading

WillAyd commented Mar 21, 2024

mroeschke left a comment

Choose a reason for hiding this comment

WillAyd commented Mar 1, 2024 •

edited

Loading