-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Use memcpy / realloc more effectively in hashtable #57695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if self.external_view_exists: | ||
raise ValueError("external reference but " | ||
"Vector.resize() needed") | ||
self.resize() | ||
self.resize(self.data.m * 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The multiplying by 4 seems somewhat arbitrary and counter-productive for large arrays, but keeping status quo for now
if not self.data.data: | ||
raise MemoryError() | ||
for i in range(m): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a few other places where we are doing loop assignment in the object class, but I have not touched them because I don't know what Cython does for reference counting during assignment that a simple mempcy would not know about. Something to research futher
if self.external_view_exists: | ||
raise ValueError("external reference but " | ||
"Vector.resize() needed") | ||
self.resize(new_size) # TODO: do we want to multiply by 4? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the extend
I choose not to multiple by a factor of 4. This would favor performance when doing really large appends versus many small appends in a loop
Not sure how to best manage this. Long term I would love to get rid of this self-managed code and use the C++ standard library
Hashing benchmarks:
algorithms:
groupby:
Not why there are some regressions with algorithms - those may just be flaky tests. I do get different results every time I run the asvs there |
Alright sorry for letting this go stale for a while but all green and posted updated benchmarks. @mroeschke |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance improvements look great. Feel free to merge once you're ready to take the PR out of draft mode
No description provided.