WEB: Update benchmarks page #61289


Merged: 5 commits merged into pandas-dev:main on Apr 19, 2025
Conversation

rhshadrach (Member)

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

```bash
# Forward local port 8080 to port 8080 on the benchmark server so the asv
# results it serves can be browsed at http://localhost:8080 (assumed; the
# server address is redacted as "[email protected]").
ssh -L 8080:localhost:8080 [email protected]
```
- asv-runner results: [asv](https://pandas-dev.github.io/asv-runner/)
- OVH server results: [asv](https://pandas.pydata.org/benchmarks/asv/)
@rhshadrach (Member Author), Apr 14, 2025


The last run in the OVH results is from February 10th.

Member


I think it would be good, then, to indicate which of these benchmarks is more up to date or preferred to reference.

Member


I can have a look, but I'm not planning to maintain that server myself. I'm happy to let you, @rhshadrach, decide on the direction of this. In general, a dedicated server should give more accurate results. I reached a point tuning the hardware where, even without running benchmarks multiple times, results were quite consistent. But if GitHub Actions is good enough, the maintenance burden will probably be lower.

Changes here look great. Once the direction is clear, I think it'd be valuable to answer Matt's question. But from my side, I'm happy to add it in a follow-up if it's not immediately clear what users should look at.

Member Author


@datapythonista - agreed, the physical server gives more accurate results, in part due to the precise configuration you set up in https://github.com/pandas-dev/pandas-benchmarks. I plan to take a look and see how much of that we can implement in asv-runner on the GitHub Actions workers (but if you're interested in doing this, by all means!).

However, I don't see a pattern for ways of working with these as a team (compare: https://github.com/pandas-dev/asv-runner/issues), and I worry about maintenance / usability. As I don't think anyone is maintaining or looking at these benchmarks, it would make sense to me to shut them down and remove them from the docs. Does that sound good?
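
For reference, a minimal sketch of how the ASV benchmarks discussed here are typically run locally against a pandas checkout (the `-b` filter is illustrative; see the pandas contributing guide for the full workflow):

```bash
# From the root of a pandas clone: compare HEAD against upstream/main and
# report benchmarks that changed by more than 10%. The ^groupby filter is
# just an example pattern.
cd asv_bench
asv continuous -f 1.1 upstream/main HEAD -b ^groupby
```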

Member


I don't think these configurations are possible in a virtual machine or Docker, so I don't think they are possible on GitHub Actions workers (unless we self-host them on a physical machine, which we did consider). That's the reason we have the physical server. You may be able to set some of these options, but I don't think they'll have any effect.

Also, there are some sources of noise in the CI. I don't think we are guaranteed to even run the job on the same hardware. I don't think it happens much in practice, but as an example, if GitHub Actions has an older data center with i5 cores and a newer one with i7s, the CI may sometimes run slower on the i5 and sometimes faster on the i7. I think the changes aren't so dramatic, but I guess we don't always get the exact same hardware or OS configuration. Also, what's happening in the rest of the server (in other VMs) has an effect: if it's idle, memory access will be faster and CPU cache misses will be lower, so things run faster than when lots is going on in the server. Funnily enough, empirical results showed that in very busy servers (like the CI workers), this kind of worst-case scenario is more consistent than a mostly idle environment (I find this quite counter-intuitive personally).
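
The tuning in question lives in pandas-dev/pandas-benchmarks; as a rough sketch of the kind of host-level settings being discussed (these specific commands are illustrative, not the server's actual configuration):

```bash
# Illustrative benchmark-host tuning, not the real pandas-benchmarks setup:
# fix the CPU frequency governor, disable turbo boost (Intel pstate driver),
# and pin the benchmark process to a single core to reduce noise.
sudo cpupower frequency-set --governor performance
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
taskset -c 2 asv run
```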

Do you mean removing the part about the creation of issues, or all of the GitHub Actions benchmarking?

Personally, I think having early performance-regression information would be amazing, and I was happy to put some time into it when there were funds. But I never checked the benchmarks much myself, and I can't afford to put too much effort into maintaining and improving them as a volunteer. So I'm happy with whatever you decide. Removing those issues if nobody is looking at them seems fine, and leaving them if they are already working and need no maintenance seems fine too.

@datapythonista (Member), Apr 16, 2025


Btw, I checked the OVH benchmarks server, and it seems it wasn't set up to start running the benchmarks at startup. When OVH temporarily shut down our servers at the end of our agreement is when we stopped generating results on that server. I've restarted them now. I forget the details of how it's handled, but I think the lost history should slowly be backfilled when there are no new commits to benchmark.
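
For illustration, the kind of at-boot setup being described could look like the following; the script path is hypothetical, and the thread doesn't show the actual mechanism used on the OVH server:

```bash
# Hypothetical crontab entry (crontab -e) to restart the benchmark loop
# whenever the machine boots; /home/asv/run_benchmarks.sh is made up here.
@reboot /home/asv/run_benchmarks.sh
```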

Member Author


I think there might be some confusion (due to imprecise verbiage on my part). My proposal above is to remove all references to the OVH benchmarking setup from this page, since it is not going to be maintained.

Member


I'm fine with that if it's not helpful.

@rhshadrach rhshadrach marked this pull request as ready for review April 14, 2025 13:12
@rhshadrach rhshadrach added Web pandas website Benchmark Performance (ASV) benchmarks labels Apr 14, 2025
@rhshadrach rhshadrach added this to the 3.0 milestone Apr 14, 2025
@rhshadrach (Member Author)

@datapythonista This is ready for another look.

@datapythonista (Member) left a comment


Looks good, thanks for taking care of this @rhshadrach

@datapythonista datapythonista merged commit c27a309 into pandas-dev:main Apr 19, 2025
8 checks passed
@rhshadrach rhshadrach deleted the doc_benchmarks branch April 19, 2025 12:12