Skip to content

Use memcpy / realloc more effectively in hashtable #57695

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Mar 21, 2024

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Mar 1, 2024

No description provided.

if self.external_view_exists:
raise ValueError("external reference but "
"Vector.resize() needed")
self.resize()
self.resize(self.data.m * 4)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The multiplying by 4 seems somewhat arbitrary and counter-productive for large arrays, but keeping status quo for now

if not self.data.data:
raise MemoryError()
for i in range(m):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a few other places where we are doing loop assignment in the object class, but I have not touched them because I don't know what Cython does for reference counting during assignment that a simple mempcy would not know about. Something to research futher

if self.external_view_exists:
raise ValueError("external reference but "
"Vector.resize() needed")
self.resize(new_size) # TODO: do we want to multiply by 4?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the extend I choose not to multiple by a factor of 4. This would favor performance when doing really large appends versus many small appends in a loop

Not sure how to best manage this. Long term I would love to get rid of this self-managed code and use the C++ standard library

@WillAyd
Copy link
Member Author

WillAyd commented Mar 1, 2024

Hashing benchmarks:

| Change   | Before [710720e6] <main>   | After [cfa5b85f] <memcpy-extend>   |   Ratio | Benchmark (Parameter)                                                                        |
|----------|----------------------------|------------------------------------|---------|----------------------------------------------------------------------------------------------|
| -        | 1.12±0.2ms                 | 901±40μs                           |    0.8  | hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'numpy.uint64'>, 1000000) |
| -        | 1.26±0.1ms                 | 893±30μs                           |    0.71 | hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'numpy.int64'>, 1000000)  |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

algorithms:

| Change   | Before [710720e6] <main>   | After [cfa5b85f] <memcpy-extend>   |   Ratio | Benchmark (Parameter)                                                       |
|----------|----------------------------|------------------------------------|---------|-----------------------------------------------------------------------------|
| +        | 2.11±0.04μs                | 2.34±0.1μs                         |    1.11 | algorithms.Duplicated.time_duplicated(True, 'last', 'duration[s][pyarrow]') |
| -        | 5.81±0.2ms                 | 5.28±0.08ms                        |    0.91 | algorithms.Hashing.time_series_string                                       |
| -        | 2.27±0.2ms                 | 1.89±0.04ms                        |    0.83 | algorithms.Quantile.time_quantile(1, 'linear', 'float64')                   |
| -        | 1.09±0.04ms                | 850±20μs                           |    0.78 | algorithms.Quantile.time_quantile(1, 'nearest', 'uint64')                   |
| -        | 1.89±0.2ms                 | 1.31±0.06ms                        |    0.7  | algorithms.Quantile.time_quantile(0, 'midpoint', 'uint64')                  |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

groupby:

| Change   | Before [710720e6] <main>   | After [cfa5b85f] <memcpy-extend>   |   Ratio | Benchmark (Parameter)                                                                   |
|----------|----------------------------|------------------------------------|---------|-----------------------------------------------------------------------------------------|
| +        | 28.5±0.1ms                 | 31.7±1ms                           |    1.11 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Float64', 'var')                       |
| -        | 29.0±2ms                   | 26.4±0.4ms                         |    0.91 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Int32', 'last')                        |
| -        | 85.1±10μs                  | 77.1±3μs                           |    0.91 | groupby.GroupByMethods.time_dtype_as_group('int16', 'nunique', 'direct', 1, 'cython')   |
| -        | 24.9±0.7μs                 | 22.6±0.5μs                         |    0.91 | groupby.GroupByMethods.time_dtype_as_group('uint', 'bfill', 'direct', 1, 'cython')      |
| -        | 35.8±3μs                   | 32.1±0.2μs                         |    0.9  | groupby.GroupByMethods.time_dtype_as_group('int', 'std', 'direct', 1, 'cython')         |
| -        | 37.3±1μs                   | 33.7±2μs                           |    0.9  | groupby.GroupByMethods.time_dtype_as_group('uint', 'std', 'direct', 1, 'cython')        |
| -        | 1.33±0.05ms                | 1.19±0.02ms                        |    0.9  | groupby.MultipleCategories.time_groupby_transform                                       |
| -        | 16.9±0.2ms                 | 15.1±0.2ms                         |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'describe', 'direct', 1, 'cython')   |
| -        | 73.9±1μs                   | 66.0±0.6μs                         |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'diff', 'direct', 1, 'cython')       |
| -        | 398±5μs                    | 355±2μs                            |    0.89 | groupby.GroupByMethods.time_dtype_as_group('uint', 'pct_change', 'direct', 1, 'cython') |
| -        | 4.54±0.2ms                 | 4.01±0.1ms                         |    0.88 | groupby.Nth.time_groupby_nth_all('datetime')                                            |
| -        | 36.0±2μs                   | 30.9±1μs                           |    0.86 | groupby.GroupByMethods.time_dtype_as_group('int16', 'first', 'direct', 1, 'cython')     |
| -        | 9.09±0.5ms                 | 7.81±0.2ms                         |    0.86 | groupby.Nth.time_series_nth_all('datetime')                                             |
| -        | 618±100μs                  | 394±30μs                           |    0.64 | groupby.GroupByMethods.time_dtype_as_group('float', 'sum', 'direct', 1, 'numba')        |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

Not why there are some regressions with algorithms - those may just be flaky tests. I do get different results every time I run the asvs there

@WillAyd WillAyd added the Performance Memory or execution speed performance label Mar 1, 2024
@WillAyd WillAyd marked this pull request as draft March 3, 2024 16:53
@WillAyd
Copy link
Member Author

WillAyd commented Mar 21, 2024

Alright sorry for letting this go stale for a while but all green and posted updated benchmarks. @mroeschke

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance improvements look great. Feel free to merge once you're ready to take the PR out of draft mode

@WillAyd WillAyd marked this pull request as ready for review March 21, 2024 18:23
@WillAyd WillAyd added this to the 3.0 milestone Mar 21, 2024
@WillAyd WillAyd merged commit 8704cfa into pandas-dev:main Mar 21, 2024
46 checks passed
@WillAyd WillAyd deleted the memcpy-extend branch March 21, 2024 18:29
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants