Skip to content

Potential regression induced by "Cython guard against [c|m|re]alloc failures" #57950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
DeaMariaLeon opened this issue Mar 21, 2024 · 5 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Performance Memory or execution speed performance

Comments

@DeaMariaLeon
Copy link
Member

PR #57705

If it is expected please ignore this issue.
Please be aware that the server is slow to respond some times.

@WillAyd

commit "114a84d8a0eee8fb93a4d2d701a2a6e62ebcf6d2": {
"groupby.Cumulative.time_frame_transform (Python) with dtype='Float64', method='cummax', with_nans=True": "http://57.128.112.95:5000/compare/benchmarks/065fb944a14b793780003322ee823635...065fb8e49664747b8000f70e4ec84ae6",
"groupby.Cumulative.time_frame_transform (Python) with dtype='Int64', method='cummin', with_nans=True": "http://57.128.112.95:5000/compare/benchmarks/065fb944ae8474e58000b3cec8275f57...065fb8e498ce7594800008071c2d8038",
"groupby.Cumulative.time_frame_transform (Python) with dtype='Int64', method='cummin', with_nans=False": "http://57.128.112.95:5000/compare/benchmarks/065fb944b2b47a138000cb49180477b4...065fb8e499487a9e8000b44250789920",
"algos.isin.IsInLongSeriesLookUpDominates.time_isin (Python) with MaxNumber=1000, dtype='Float64', series_type='random_misses'": "http://57.128.112.95:5000/compare/benchmarks/065fb9407ebe716e8000ed2fe62f5b83...065fb8e1b287741080003a30c54d529f",
"groupby.Cumulative.time_frame_transform (Python) with dtype='int64', method='cummin', with_nans=False": "http://57.128.112.95:5000/compare/benchmarks/065fb94491a376d180006f30866e9cb7...065fb8e4939b78d480008873c990c076",
"groupby.Cumulative.time_frame_transform (Python) with dtype='Int64', method='cummax', with_nans=False": "http://57.128.112.95:5000/compare/benchmarks/065fb944bc017c0780002137a5469d07...065fb8e49a7476d9800097b740074116",
"algorithms.Factorize.time_factorize (Python) with dtype='boolean', sort=True, unique=False": "http://57.128.112.95:5000/compare/benchmarks/065fb9400e25709d8000634602d20fa9...065fb8e14c1270388000c8005022c82f",
"algos.isin.IsInLongSeriesLookUpDominates.time_isin (Python) with MaxNumber=1000, dtype='float64', series_type='random_misses'": "http://57.128.112.95:5000/compare/benchmarks/065fb9406aa0734d8000d932d3e79d24...065fb8e1a1037615800024333e87bbf7",
"groupby.Cumulative.time_frame_transform (Python) with dtype='Float64', method='cummin', with_nans=True": "http://57.128.112.95:5000/compare/benchmarks/065fb9449c0075d7800006671aaa142b...065fb8e4954a7e678000d0cf25e62d15",
"groupby.GroupByCythonAggEaDtypes.time_frame_agg (Python) with dtype='Int32', method='any'": "http://57.128.112.95:5000/compare/benchmarks/065fb945a1cb705f800027e9e412ca99...065fb8e4baf27ba58000e2dcb9c6c91c",
"algos.isin.IsinWithArange.time_isin (Python) with M=8000, dtype=<class 'numpy.float64'>, offset_factor=-2": "http://57.128.112.95:5000/compare/benchmarks/065fb940c55774118000016187a81f4d...065fb8e1f156773380009660ce668fd7",
"algos.isin.IsInLongSeriesLookUpDominates.time_isin (Python) with MaxNumber=1000, dtype='float64', series_type='monotone_misses'": "http://57.128.112.95:5000/compare/benchmarks/065fb9406baf7c048000e4b240c9b233...065fb8e1a200710f8000cf630f8dcec9",
"groupby.GroupByCythonAggEaDtypes.time_frame_agg (Python) with dtype='Float64', method='first'": "http://57.128.112.95:5000/compare/benchmarks/065fb94566c67d2f80005785dda0523a...065fb8e4acf773d98000663b6949b9a7",
"groupby.Cumulative.time_frame_transform (Python) with dtype='float64', method='cummax', with_nans=False": "http://57.128.112.95:5000/compare/benchmarks/065fb944863a73328000100ab11e015f...065fb8e491f073228000be1a0dcb0813",
"groupby.Cumulative.time_frame_transform (Python) with dtype='Int64', method='cummax', with_nans=True": "http://57.128.112.95:5000/compare/benchmarks/065fb944b887736180002dd51b13dfc4...065fb8e499e779438000356086562483",
"groupby.Cumulative.time_frame_transform (Python) with dtype='Float64', method='cummax', with_nans=False": "http://57.128.112.95:5000/compare/benchmarks/065fb944a4cc74dd80007276e8078ead...065fb8e496fc7c4d80002e67128bc5cf"
}

Screenshot 2024-03-21 at 16 53 24
@WillAyd
Copy link
Member

WillAyd commented Mar 21, 2024

Hmm that's interesting. OK thanks - definitely unexpected. Will take a look later

@lithomas1 lithomas1 added Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 21, 2024
@WillAyd
Copy link
Member

WillAyd commented Mar 22, 2024

I checked the generated code and it generates things like:

if (unlikely(ptr == NULL)) {
    ...  // raise MemoryError
}

So I don't think this can be any more efficient. I noticed also some of the benchmarks were not really regressions when compared to values a few months back:

image

image

So I don't think there is much to do here

@lithomas1 lithomas1 added Closing Candidate May be closeable, needs more eyeballs and removed Needs Triage Issue that has not been reviewed by a pandas team member Regression Functionality that used to work in a prior pandas version labels Mar 22, 2024
@rhshadrach
Copy link
Member

@WillAyd - only question I have is how many times are these checks hit in the test? If it's an inner loop that's hit a few hundred times then a 2ms regression makes sense. But if they are just hit a few times per test run then perhaps more investigation is warranted. I'll come back and see if I can find out in a few days.

@WillAyd
Copy link
Member

WillAyd commented Mar 22, 2024

These all come after malloc calls. I don't believe any are done in a tight loop, but if they were the surprising part is that the malloc itself is where you would expect to see a bottleneck, not the subsequent call to check if malloc failed

@rhshadrach
Copy link
Member

rhshadrach commented Mar 23, 2024

Interesting - I'm not seeing any of the checks added in #57705 hit by

groupby.Cumulative.time_frame_transform (Python) with dtype='Float64', method='cummax', with_nans=True

Same with

algos.isin.IsInLongSeriesLookUpDominates.time_isin (Python) with MaxNumber=1000, dtype='Float64', series_type='random_misses'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Performance Memory or execution speed performance
Projects
None yet
Development

No branches or pull requests

4 participants