Evaluate using WeakSet for reference tracking #55631

Open
3 tasks
phofl opened this issue Oct 22, 2023 · 1 comment

Comments

@phofl
Member

phofl commented Oct 22, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

We are currently using an exponential backoff to improve the performance of reference tracking. We should evaluate using a WeakSet instead. A known problem is that the DataFrame constructor slows down when a WeakSet is used for zero-copy operations.
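To make the trade-off concrete, here is a minimal sketch of what a list of weak references with an exponential-backoff cleanup could look like. This is not the actual pandas code: the class name `RefList`, the starting threshold, and the doubling rule are all assumptions chosen for illustration.

```python
import weakref


class RefList:
    """Hypothetical tracker: a plain list of weakrefs that is pruned lazily."""

    def __init__(self, first_threshold=4):
        self._refs = []
        # Assumed starting value; dead entries are only pruned once the
        # list has grown to this length.
        self._clear_threshold = first_threshold

    def add(self, obj):
        self._prune_if_needed()
        self._refs.append(weakref.ref(obj))

    def _prune_if_needed(self):
        if len(self._refs) < self._clear_threshold:
            return
        # Drop weakrefs whose referent has been garbage collected.
        self._refs = [r for r in self._refs if r() is not None]
        # Backoff step: if more than half of the threshold is still alive,
        # double the threshold so the next scan happens later.
        if len(self._refs) > self._clear_threshold // 2:
            self._clear_threshold *= 2

    def live_count(self):
        return sum(r() is not None for r in self._refs)
```

The point of the backoff is that adding many long-lived references does not trigger a full scan on every call. A WeakSet would remove dead entries automatically, at the cost of hashing each element on insertion.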

Feature Description

Use WeakSet

Alternative Solutions

Current implementation

Additional Context

No response

phofl added the Enhancement, Needs Triage, and Copy / view semantics labels and removed the Needs Triage label on Oct 22, 2023
@wangwillian0
Contributor

One big problem is that Index is not hashable, and WeakSet requires its elements to be hashable.

This solution is slower, but I still like it because it basically eliminates the need for a wrapper like BlockValuesRefs. Here is the benchmark for a simple implementation that replaces the current list-based method with a WeakSet (I used id() as the hash function):

| Change | Before [593fa85] | After [de6e6954] | Ratio | Benchmark (Parameter) |
|--------|------------------|------------------|-------|-----------------------|
| + | 43.5±3ms | 68.7±10ms | 1.58 | frame_ctor.FromDicts.time_nested_dict_index_columns |
| + | 7.97±2ms | 12.5±1ms | 1.57 | frame_ctor.FromArrays.time_frame_from_arrays_sparse |
| + | 1.05±0.1ms | 1.56±0.08ms | 1.49 | frame_ctor.FromRange.time_frame_from_range |
| + | 963±50μs | 1.37±0.1ms | 1.43 | frame_ctor.FromDicts.time_dict_of_categoricals |
| + | 48.4±1μs | 69.0±4μs | 1.43 | frame_ctor.FromSeries.time_mi_series |
| + | 135±20ms | 191±30ms | 1.42 | frame_ctor.FromRecords.time_frame_from_records_generator(None) |
| + | 5.47±0.3ms | 7.18±0.4ms | 1.31 | frame_ctor.FromScalar.time_frame_from_scalar_ea_float64_na |
| + | 42.3±2ms | 54.0±5ms | 1.28 | frame_ctor.FromDicts.time_nested_dict_index |
| + | 5.41±0.2ms | 6.82±0.5ms | 1.26 | frame_ctor.FromScalar.time_frame_from_scalar_ea_float64 |
| + | 1.46±0.3ms | 1.54±0.1ms | 1.06 | frame_ctor.FromRecords.time_frame_from_records_generator(1000) |
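For readers unfamiliar with the hashability issue, the sketch below reproduces it with the standard library only. `Unhashable` is a stand-in for an object such as Index (it defines `__eq__` without `__hash__`), and `IdKeyedRefs` is a hypothetical illustration of tracking weak references keyed by id(). It is not the implementation that was benchmarked above.

```python
import weakref


class Unhashable:
    """Stand-in for an unhashable object: defining __eq__ sets __hash__ to None."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Unhashable) and self.name == other.name


class IdKeyedRefs:
    """Hypothetical tracker: weak references keyed by id() instead of hash()."""

    def __init__(self):
        # Keys are plain ints, so the tracked objects never get hashed.
        # An entry disappears once its object is garbage collected.
        self._refs = weakref.WeakValueDictionary()

    def add(self, obj):
        self._refs[id(obj)] = obj

    def has_other_reference(self, obj):
        # True if any live tracked object other than obj remains.
        return any(key != id(obj) for key in list(self._refs.keys()))

    def __len__(self):
        return len(self._refs)


a = Unhashable("a")
try:
    weakref.WeakSet().add(a)  # WeakSet hashes the element, so this fails
except TypeError as exc:
    print("WeakSet rejected the object:", exc)
```

Keying by id() sidesteps hashing, but the tracked objects still need to support weak references, and every insertion goes through a dictionary update, which is one plausible source of the constructor overhead.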
