@@ -38,20 +38,27 @@ of how Airflow CI works.
GitHub Actions runs
-------------------

- Our builds on CI are highly optimized. They utilise some of the latest features provided by GitHub Actions
- environment that make it possible to reuse parts of the build process across different Jobs.
-
- Big part of our CI runs use Container Images. Airflow has a lot of dependencies and in order to make
- sure that we are running tests in a well configured and repeatable environment, most of the tests,
- documentation building, and some more sophisticated static checks are run inside a docker container
- environment. This environment consist of two types of images: CI images and PROD images. CI Images
- are used for most of the tests and checks where PROD images are used in the Kubernetes tests.
-
- In order to run the tests, we need to make sure that the images are built using latest sources and that it
- is done quickly (full rebuild of such image from scratch might take ~15 minutes). Therefore optimisation
- techniques have been implemented that use efficiently cache from the GitHub Docker registry - in most cases
- this brings down the time needed to rebuild the image to ~4 minutes. In some cases (when dependencies change)
- it can be ~6-7 minutes and in case base image of Python releases new patch-level, it can be ~12 minutes.
+ Our CI builds are highly optimized, leveraging the latest features provided
+ by the GitHub Actions environment to reuse parts of the build process across
+ different jobs.
+
+ A significant portion of our CI runs use container images. Given that
+ Airflow has numerous dependencies, we use Docker containers to ensure tests
+ run in a well-configured and consistent environment. This approach is used
+ for most tests, documentation building, and some advanced static checks.
+ The environment comprises two types of images: CI images and PROD images.
+ CI images are used for most tests and checks, while PROD images are used for
+ Kubernetes tests.
+
+ To run the tests, we need to ensure that the images are built using the
+ latest sources and that the build process is efficient. A full rebuild of
+ such an image from scratch might take approximately 15 minutes. Therefore,
+ we've implemented optimization techniques that efficiently use the cache
+ from the GitHub Docker registry. In most cases, this reduces the time
+ needed to rebuild the image to about 4 minutes. However, when
+ dependencies change, it can take around 6-7 minutes, and if the Python
+ base image gets a new patch-level release, it can take approximately
+ 12 minutes.

Container Registry used as cache
--------------------------------
@@ -105,7 +112,7 @@ The image names follow the patterns (except the Python image, all the images are
https://ghcr.io/ in ``apache`` organization.

The packages are available under (CONTAINER_NAME is url-encoded name of the image). Note that "/" are
- supported now in the ``ghcr.io`` as apart of the image name within ``apache`` organization, but they
+ supported now in the ``ghcr.io`` as a part of the image name within the ``apache`` organization, but they
have to be percent-encoded when you access them via UI (/ = %2F)

``https://github.com/apache/airflow/pkgs/container/<CONTAINER_NAME>``
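
The percent-encoding described above can be sketched in Python as follows. This is only an illustration: the helper name ``package_ui_url`` and the image name ``airflow/main/ci/python3.8`` are hypothetical and are not taken from the Airflow sources. Only the URL prefix and the ``/`` = ``%2F`` rule come from the text.

```python
from urllib.parse import quote

# URL prefix quoted in the text above.
PACKAGE_URL_PREFIX = "https://github.com/apache/airflow/pkgs/container/"


def package_ui_url(container_name: str) -> str:
    """Build the GitHub UI URL for a container package (hypothetical helper).

    quote() leaves "/" unescaped by default, so safe="" is passed to make
    sure every "/" in the image name becomes "%2F".
    """
    return PACKAGE_URL_PREFIX + quote(container_name, safe="")


# Hypothetical image name, used only to show the encoding.
print(package_ui_url("airflow/main/ci/python3.8"))
```

Passing ``safe=""`` is the important detail here, because the default behaviour of ``quote`` would leave the slashes in place and produce a URL that the UI does not resolve to the package.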
@@ -192,29 +199,33 @@ When you are running the CI jobs in GitHub Actions, GITHUB_TOKEN is set automati
CI run types
============

- The following CI Job run types are currently run for Apache Airflow (run by ci.yaml workflow)
- and each of the run types has different purpose and context.
+ The Apache Airflow project utilizes several types of Continuous Integration (CI)
+ jobs, each with a distinct purpose and context. These jobs are executed by the
+ ``ci.yaml`` workflow.

- Besides the regular "PR" runs we also have "Canary" runs that are able to detect most of the
- problems that might impact regular PRs early, without necessarily failing all PRs when those
- problems happen. This allows to provide much more stable environment for contributors, who
- contribute their PR, while giving a chance to maintainers to react early on problems that
- need reaction, when the "canary" builds fail.
+ In addition to the standard "PR" runs, we also execute "Canary" runs.
+ These runs are designed to detect potential issues that could affect
+ regular PRs early on, without causing all PRs to fail when such problems
+ arise. This strategy ensures a more stable environment for contributors
+ submitting their PRs. At the same time, it allows maintainers to proactively
+ address issues highlighted by the "Canary" builds.

Pull request run
----------------

- Those runs are results of PR from the forks made by contributors. Most builds for Apache Airflow fall
- into this category. They are executed in the context of the "Fork", not main
- Airflow Code Repository which means that they have only "read" permission to all the GitHub resources
- (container registry, code repository). This is necessary as the code in those PRs (including CI job
- definition) might be modified by people who are not committers for the Apache Airflow Code Repository.
+ These runs are triggered by pull requests from contributors' forks. The majority of
+ Apache Airflow builds fall into this category. They are executed in the context of
+ the contributor's "Fork", not the main Airflow Code Repository, meaning they only have
+ "read" access to all GitHub resources, such as the container registry and code repository.
+ This is necessary because the code in these PRs, including the CI job definition,
+ might be modified by individuals who are not committers to the Apache Airflow Code Repository.

- The main purpose of those jobs is to check if PR builds cleanly, if the test run properly and if
- the PR is ready to review and merge. The runs are using cached images from the Private GitHub registry -
- CI, Production Images as well as base Python images that are also cached in the Private GitHub registry.
- Also for those builds we only execute Python tests if important files changed (so for example if it is
- "no-code" change, no tests will be executed.
+ The primary purpose of these jobs is to verify if the PR builds cleanly, if the tests
+ run correctly, and if the PR is ready for review and merge. These runs utilize cached
+ images from the Private GitHub registry, including CI, Production Images, and base
+ Python images. Furthermore, for these builds, we only execute Python tests if
+ significant files have changed. For instance, if the PR involves a "no-code" change,
+ no tests will be executed.

Regular PR builds run in a "stable" environment:

@@ -232,30 +243,32 @@ and has WRITE access to the GitHub Container Registry.
Canary run
----------

- This is the flow that happens when a pull request is merged to the "main" branch or pushed to any of
- the "v2-*-test" branches. The "Canary" run attempts to upgrade dependencies to the latest versions
- and quickly pushes a preview of cache the CI/PROD images to the GitHub Registry - so that pull requests
- can quickly use the new cache - this is useful when Dockerfile or installation scripts change because such
- cache will already have the latest Dockerfile and scripts pushed even if some tests will fail.
- When successful, the run updates the constraints files in the "constraints-main" branch with the latest
- constraints and pushes both cache and latest CI/PROD images to the GitHub Registry.
-
- When "Canary" build fails, it's often a sign that some of our dependencies released a new version that
- is not compatible with current tests or Airflow code, Also it might mean that a breaking change has been
- merged to "main". Both cases should be addressed quickly by the maintainers. The "broken main" by our code
- should be fixed quickly, while the "broken dependencies" can take a bit of time to fix as until the tests
- succeeds, constraints will not be updated, which means that regular PRs will continue using the old version
- of dependencies that already passed one of the previous "Canary" runs.
-
+ This workflow is triggered when a pull request is merged into the "main" branch or pushed to any of
+ the "v2-*-test" branches. The "Canary" run aims to upgrade dependencies to their latest versions
+ and promptly pushes a preview of the CI/PROD image cache to the GitHub Registry. This allows pull
+ requests to quickly utilize the new cache, which is particularly beneficial when the Dockerfile or
+ installation scripts have been modified. Even if some tests fail, this cache will already include the
+ latest Dockerfile and scripts. Upon successful execution, the run updates the constraint files in the
+ "constraints-main" branch with the latest constraints and pushes both the cache and the latest CI/PROD
+ images to the GitHub Registry.
+
+ If the "Canary" build fails, it often indicates that a new version of our dependencies is incompatible
+ with the current tests or Airflow code. Alternatively, it could mean that a breaking change has been
+ merged into "main". Both scenarios require prompt attention from the maintainers. While a "broken main"
+ due to our code should be fixed quickly, "broken dependencies" may take longer to resolve. Until the tests
+ pass, the constraints will not be updated, meaning that regular PRs will continue using the older version
+ of dependencies that passed one of the previous "Canary" runs.

Scheduled runs
--------------

- This is the flow that happens when a scheduled run is triggered. The "scheduled" workflow is aimed to
- run regularly (overnight). Scheduled run is generally the same as "Canary" run, with the difference
- that the image is build always from the scratch and not from the cache. This way we can check that no
- "system" dependencies in debian base image have changed and that the build is still reproducible.
- No separate diagram is needed for scheduled run as it is identical to that of "Canary" run.
+ The "scheduled" workflow, which is designed to run regularly (typically overnight),
+ is triggered when a scheduled run occurs. This workflow is largely identical to the
+ "Canary" run, with one key difference: the image is always built from scratch, not
+ from a cache. This approach ensures that we can verify whether any "system" dependencies
+ in the Debian base image have changed, and confirm that the build process remains reproducible.
+ Since the process for a scheduled run mirrors that of a "Canary" run, no separate diagram is
+ necessary to illustrate it.

Workflows
=========
@@ -310,10 +323,10 @@ that triggered it.
Differences for main and release branches
-----------------------------------------

- There are a few differences of what kind of tests are run, depending on which version/branch the tests are executed for.
- While all our tests run for the "main" development branch to keep Airflow in check, only a subset of those tests is run
- in older branches when we are releasing patch-level releases. This is because we never use old branches to release
- providers and helm charts, we only use them to release Airflow and Airflow image.
+ The type of tests executed varies depending on the version or branch under test. For the "main" development branch,
+ we run all tests to maintain the quality of Airflow. However, when releasing patch-level updates on older
+ branches, we only run a subset of these tests. This is because older branches are exclusively used for releasing
+ Airflow and its corresponding image, not for releasing providers or helm charts.

This behaviour is controlled by ``default-branch`` output of the build-info job. Whenever we create a branch for old version
we update the ``AIRFLOW_BRANCH`` in ``airflow_breeze/branch_defaults.py`` to point to the new branch and there are a few