CI: 310-dev build seems to be timing out #44173

Closed
jreback opened this issue Oct 24, 2021 · 15 comments · Fixed by #44280
Labels
CI Continuous Integration
Comments

@jreback
Contributor

jreback commented Oct 24, 2021

xref https://github.com/pandas-dev/pandas/runs/3990827693?check_suite_focus=true

Not sure whether something changed here recently or whether this is a numpy issue.

@jreback jreback added the CI Continuous Integration label Oct 24, 2021
@jreback jreback added this to the 1.4 milestone Oct 24, 2021
@jreback
Contributor Author

jreback commented Oct 25, 2021

cc @jbrockmendel any ideas here

@jbrockmendel
Member

Looks like it is stalling somewhere in the network tests, which is also where the Windows stalls usually are. That's all I've got.

@jbrockmendel
Member

xref #43611 not sure if @mzeitlin11 figured out what was going on there

@jreback
Contributor Author

jreback commented Oct 25, 2021

ok let's disable these for now then. It's super hard to see failures w/o green (well, it's not that hard, but it takes a lot of time).

@mzeitlin11
Member

> xref #43611 not sure if @mzeitlin11 figured out what was going on there

In that case it also looked like network tests, but it ended up actually being deadlocks in pyarrow read_csv (#43650). It's certainly possible there are other cases that deadlock that weren't found and skipped in the initial pass. I'll try running with the logging setup in #43611 and see if anything turns up. (Side note: is there a way to trigger just the 310-dev build?)

@jreback
Contributor Author

jreback commented Oct 25, 2021

you can comment out parts of the ci config to turn it off temporarily

@mzeitlin11
Member

In #44183, both the Ubuntu and macOS builds timed out. Filtering the logs down to only the lines containing "moments" gives the attached file (warning: still 15.3 MB :).
clean_mac_out.txt

Based on this, tests in pandas/tests/window/moments/ start running at 00:19:15 and are still running at 00:47:31 (at which point the timeout happens), so they take up about 28 minutes of the run. Will run again to see if the same result happens. These tests run locally on macOS for me in < 2.5 min (with the same logging enabled).
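
For anyone who wants to repeat the filtering step, here is a minimal sketch of how the time span could be pulled out of a log. It assumes each log line starts with an HH:MM:SS timestamp, which may not match the real CI log layout, and the test file names in the sample are made up for illustration. It also does not handle a run that crosses midnight.

```python
from datetime import datetime, timedelta


def span_of_matching_lines(lines, needle):
    """Return (first, last, elapsed) over the log lines containing `needle`.

    Assumption: every line begins with an HH:MM:SS timestamp.
    Returns None when no line matches.
    """
    stamps = []
    for line in lines:
        if needle not in line:
            continue
        stamps.append(datetime.strptime(line[:8], "%H:%M:%S"))
    if not stamps:
        return None
    first, last = min(stamps), max(stamps)
    return first.time(), last.time(), last - first


# Hypothetical log lines; only the two timestamps come from this thread.
sample = [
    "00:19:15 pandas/tests/window/moments/test_a.py::test_one PASSED",
    "00:30:02 pandas/tests/io/test_b.py::test_other PASSED",
    "00:47:31 pandas/tests/window/moments/test_c.py::test_two",
]

first, last, elapsed = span_of_matching_lines(sample, "moments")
print(first, last, elapsed)  # 00:19:15 00:47:31 0:28:16
```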

@jreback
Contributor Author

jreback commented Oct 26, 2021

cc @mroeschke let's move back some of the moved tests

@Dr-Irv
Contributor

Dr-Irv commented Oct 26, 2021

Might be that we have to split the tests in half like I did in #43517 and #43468

@mroeschke
Member

@jreback moving back the tests in my recent PRs wouldn't make a difference since all tests in pandas/tests/window are run on macOS and Ubuntu (tests in pandas/tests/window/moments are only ignored on Windows #37535).

There are a lot of tests in pandas/tests/window/moments that are redundant with tests in pandas/tests/window/test_*.py, and I am working on thinning them out. As a short-term fix, you can skip pandas/tests/window/moments, I suppose, while I am improving the performance of those tests.

https://github.com/pandas-dev/pandas/runs/3990827693?check_suite_focus=true looks to end specifically on a test that uses the moto_server (s3_base fixture)
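
A rough sketch of what the short-term skip mentioned above could look like, in case it helps. This is not what any PR in this thread actually does. It assumes the check lives in a conftest.py and relies on pytest's `pytest_ignore_collect` hook with the `collection_path` argument (available in pytest 7 and later; older versions pass `path` instead). The helper name `should_ignore` and the constant `SLOW_DIR` are made up for illustration.

```python
# conftest.py (hypothetical placement)
import sys
from pathlib import Path

# Directory named in this thread as the slow one.
SLOW_DIR = "pandas/tests/window/moments"


def should_ignore(path, version_info=sys.version_info):
    """True when `path` is under SLOW_DIR and the interpreter is 3.10+."""
    on_310_or_later = tuple(version_info[:2]) >= (3, 10)
    # as_posix() normalizes separators so the check also works on Windows.
    return on_310_or_later and SLOW_DIR in Path(path).as_posix()


def pytest_ignore_collect(collection_path, config):
    # Returning True tells pytest not to collect this path;
    # returning None leaves the decision to other plugins.
    return True if should_ignore(collection_path) else None
```

The same effect is available for a one-off run without touching conftest.py by passing `--ignore=pandas/tests/window/moments` to pytest on the 3.10 build only.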

@jreback
Contributor Author

jreback commented Oct 27, 2021

kk can i push a PR to skip these in 3.10?

@alimcmaster1
Member

> Might be that we have to split the tests in half like I did in #43517 and #43468

Yeah this seems to work over in #44204

@lithomas1
Member

Hm. Instead of splitting the build, can we just skip the tests instead? Adding more builds clogs up the CI pipeline since GHA has a concurrency limit.

I am planning on moving the 3.10 macOS and Windows builds to Azure and the rest to just the posix pipeline, and configuring the dev build for Python 3.11, but I'm kind of busy right now, so it'll take a while.

@alimcmaster1
Member

> Hm. Instead of splitting the build, can we just skip the tests instead? Adding more builds clogs up the CI pipeline since GHA has a concurrency limit.
>
> I am planning on moving 3.10 MacOS and Windows to Azure and the rest to just the posix pipeline, and config the dev build for Python 3.11 but I'm kind of busy right now, so it'll take a while.

It was more of a temp fix to get things working (hence I left this issue open for a longer-term solution). But the split builds should run concurrently, as they are in different concurrency groups: https://github.com/pandas-dev/pandas/pull/44204/files#diff-f6f33d5fc5cfd739f0d6efd63ea9dff886abe35cdfff5672e9a863ec8da8e72cR34

@lithomas1
Member

https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits.
GitHub has a global concurrency limit. If we exceed that limit, a huge queue starts piling up, and sometimes you have to wait multiple hours just to get CI to run.


7 participants