test: advanced test use cases #3737
Conversation
Hi @dreamorosi! I didn't focus on the tests themselves while reviewing because you have more context on that! But the timeout configuration and pipeline look good to me.
Summary
Changes
This PR updates the e2e tests for all the utilities, as described below.
The work on this item started from the assumption that our e2e tests were often timing out because the CloudFormation APIs become slow under pressure. Based on this assumption, the initial hypothesis was that distributing the test suite deployments across multiple regions within the same test run would let us parallelize the tests further without running into timeouts or slowdowns.
This assumption turned out to be wrong, and making the change did not make the timeouts go away. After talking with @leandrodamascena, who suggested the issue could instead be due to the network/load of the GitHub workers, I ran several tests on a different runner; while that helped, it still did not remove the issue completely.
Besides not entirely removing the issue, that runner is shared across the GitHub org, which raised additional questions about what would happen when we run releases and tests at the same time. For this reason I decided not to adopt the runner for now and to keep using the standard ones.
To partially mitigate the issue, I instead increased the timeout from 7 to 10 minutes and centralized the setting in the Vitest config files rather than repeating it in every single test case (hence the large diff).
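For illustration, a minimal sketch of what the centralized setting could look like, assuming a standard Vitest config file (the actual files in the repo may set additional options; 600_000 ms = 10 minutes):

```ts
// vitest.config.ts — hypothetical sketch, not the exact file from this PR
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Applies to every test case, so individual tests no longer need to
    // pass their own timeout argument.
    testTimeout: 600_000, // 10 minutes
    // before/after hooks (e.g. stack deploy/destroy) get the same budget.
    hookTimeout: 600_000,
  },
});
```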
I also reduced the max concurrency from 30 jobs to 25 and tried balancing the mix of packages whose tests consistently pass and are overall faster to run (e.g. metrics and layers) with others that fail more often or run longer (e.g. idempotency and logger). With these changes, some of the longer-running suites start immediately alongside faster ones, so that the schedule looks something like this (simplified):
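As an illustration only — the real workflow layout and package names may differ — the interleaving could look like this in a GitHub Actions matrix:

```yaml
# Hypothetical excerpt of the e2e workflow, not the actual file.
jobs:
  run-e2e-tests:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 25 # reduced from 30
      fail-fast: false
      matrix:
        # Fast/stable suites (metrics, layers) interleaved with slower or
        # flakier ones (idempotency, logger) so long runs start right away.
        package:
          - idempotency
          - metrics
          - logger
          - layers
          - tracer
```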
The PR also adds a new test suite for advanced use cases in logger that covers log buffering and correlation ID, and lays the groundwork for provisioned concurrency, which will be added in #3724.
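As a rough sketch of the kind of handler such a suite exercises — assuming the Logger utility's documented log buffering and correlation ID APIs (`logBufferOptions`, `setCorrelationId`); the actual test resources in this PR may look different:

```ts
import { Logger } from '@aws-lambda-powertools/logger';
import type { Context } from 'aws-lambda';

// Buffer debug logs and only emit them when an error is logged — the
// `logBufferOptions` shape here is an assumption based on the documented API.
const logger = new Logger({
  logLevel: 'DEBUG',
  logBufferOptions: { enabled: true, flushOnErrorLog: true },
});

export const handler = async (event: { id: string }, _context: Context) => {
  // Correlate all log entries emitted during this invocation.
  logger.setCorrelationId(event.id);
  logger.debug('buffered until flushed or an error is logged');
  try {
    // ... business logic under test ...
    return { ok: true };
  } catch (error) {
    logger.error('something failed', { error }); // triggers a buffer flush
    throw error;
  }
};
```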
This is likely not a final solution to the e2e tests being finicky, but after several hours of work I decided to put a pin in it, knowing that things are now slightly better (or at least they feel like it). In the future we should probably rework some of the tests entirely to: 1/ deploy fewer resources, 2/ incorporate retries, 3/ keep the same assertions while refactoring out problematic/finicky edge cases.
Issue number: closes #3730
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.