Commit 82b3c11

Improve CI.rst document clarity. (#36181)
* Improve document clarity.
* Remove whitespace at end of lines.
* Use double-backticks for code.
1 parent 6ee4d40 commit 82b3c11

1 file changed

+69
-56
lines changed
CI.rst

Lines changed: 69 additions & 56 deletions
@@ -38,20 +38,27 @@ of how Airflow CI works.
 GitHub Actions runs
 -------------------
 
-Our builds on CI are highly optimized. They utilise some of the latest features provided by GitHub Actions
-environment that make it possible to reuse parts of the build process across different Jobs.
-
-Big part of our CI runs use Container Images. Airflow has a lot of dependencies and in order to make
-sure that we are running tests in a well configured and repeatable environment, most of the tests,
-documentation building, and some more sophisticated static checks are run inside a docker container
-environment. This environment consist of two types of images: CI images and PROD images. CI Images
-are used for most of the tests and checks where PROD images are used in the Kubernetes tests.
-
-In order to run the tests, we need to make sure that the images are built using latest sources and that it
-is done quickly (full rebuild of such image from scratch might take ~15 minutes). Therefore optimisation
-techniques have been implemented that use efficiently cache from the GitHub Docker registry - in most cases
-this brings down the time needed to rebuild the image to ~4 minutes. In some cases (when dependencies change)
-it can be ~6-7 minutes and in case base image of Python releases new patch-level, it can be ~12 minutes.
+Our CI builds are highly optimized, leveraging the latest features provided
+by the GitHub Actions environment to reuse parts of the build process across
+different jobs.
+
+A significant portion of our CI runs utilize container images. Given that
+Airflow has numerous dependencies, we use Docker containers to ensure tests
+run in a well-configured and consistent environment. This approach is used
+for most tests, documentation building, and some advanced static checks.
+The environment comprises two types of images: CI images and PROD images.
+CI images are used for most tests and checks, while PROD images are used for
+Kubernetes tests.
+
+To run the tests, we need to ensure that the images are built using the
+latest sources and that the build process is efficient. A full rebuild of
+such an image from scratch might take approximately 15 minutes. Therefore,
+we've implemented optimization techniques that efficiently use the cache
+from the GitHub Docker registry. In most cases, this reduces the time
+needed to rebuild the image to about 4 minutes. However, when
+dependencies change, it can take around 6-7 minutes, and if the base
+image of Python releases a new patch-level, it can take approximately
+12 minutes.
 
 Container Registry used as cache
 --------------------------------
@@ -105,7 +112,7 @@ The image names follow the patterns (except the Python image, all the images are
 https://ghcr.io/ in ``apache`` organization.
 
 The packages are available under (CONTAINER_NAME is url-encoded name of the image). Note that "/" are
-supported now in the ``ghcr.io`` as apart of the image name within ``apache`` organization, but they
+supported now in the ``ghcr.io`` as a part of the image name within the ``apache`` organization, but they
 have to be percent-encoded when you access them via UI (/ = %2F)
 
 ``https://github.com/apache/airflow/pkgs/container/<CONTAINER_NAME>``
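Editor's note on the hunk above: the percent-encoding rule for ``<CONTAINER_NAME>`` can be illustrated with a minimal Python sketch. The helper name ``package_page_url`` and the image name ``airflow/main/ci/python3.8`` are hypothetical and chosen only for illustration; the URL prefix is the one quoted in the document. The sketch assumes the standard library's ``urllib.parse.quote`` is an acceptable way to produce the encoded name.

```python
from urllib.parse import quote

# URL prefix quoted in CI.rst; the container name is appended percent-encoded.
PACKAGES_URL = "https://github.com/apache/airflow/pkgs/container/"


def package_page_url(container_name: str) -> str:
    """Build the package page URL for an image name (hypothetical helper).

    quote() leaves "/" unescaped by default, so safe="" is passed to make
    every "/" in the image name come out as %2F.
    """
    return PACKAGES_URL + quote(container_name, safe="")


# Hypothetical image name, used only to show the encoding.
print(package_page_url("airflow/main/ci/python3.8"))
```

Passing ``safe=""`` is the important detail here: with the default arguments the slashes would be left as they are, and the resulting path would point at a different location.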
@@ -192,29 +199,33 @@ When you are running the CI jobs in GitHub Actions, GITHUB_TOKEN is set automati
 CI run types
 ============
 
-The following CI Job run types are currently run for Apache Airflow (run by ci.yaml workflow)
-and each of the run types has different purpose and context.
+The Apache Airflow project utilizes several types of Continuous Integration (CI)
+jobs, each with a distinct purpose and context. These jobs are executed by the
+``ci.yaml`` workflow.
 
-Besides the regular "PR" runs we also have "Canary" runs that are able to detect most of the
-problems that might impact regular PRs early, without necessarily failing all PRs when those
-problems happen. This allows to provide much more stable environment for contributors, who
-contribute their PR, while giving a chance to maintainers to react early on problems that
-need reaction, when the "canary" builds fail.
+In addition to the standard "PR" runs, we also execute "Canary" runs.
+These runs are designed to detect potential issues that could affect
+regular PRs early on, without causing all PRs to fail when such problems
+arise. This strategy ensures a more stable environment for contributors
+submitting their PRs. At the same time, it allows maintainers to proactively
+address issues highlighted by the "Canary" builds.
 
 Pull request run
 ----------------
 
-Those runs are results of PR from the forks made by contributors. Most builds for Apache Airflow fall
-into this category. They are executed in the context of the "Fork", not main
-Airflow Code Repository which means that they have only "read" permission to all the GitHub resources
-(container registry, code repository). This is necessary as the code in those PRs (including CI job
-definition) might be modified by people who are not committers for the Apache Airflow Code Repository.
+These runs are triggered by pull requests from contributors' forks. The majority of
+Apache Airflow builds fall into this category. They are executed in the context of
+the contributor's "Fork", not the main Airflow Code Repository, meaning they only have
+"read" access to all GitHub resources, such as the container registry and code repository.
+This is necessary because the code in these PRs, including the CI job definition,
+might be modified by individuals who are not committers to the Apache Airflow Code Repository.
 
-The main purpose of those jobs is to check if PR builds cleanly, if the test run properly and if
-the PR is ready to review and merge. The runs are using cached images from the Private GitHub registry -
-CI, Production Images as well as base Python images that are also cached in the Private GitHub registry.
-Also for those builds we only execute Python tests if important files changed (so for example if it is
-"no-code" change, no tests will be executed.
+The primary purpose of these jobs is to verify if the PR builds cleanly, if the tests
+run correctly, and if the PR is ready for review and merge. These runs utilize cached
+images from the Private GitHub registry, including CI, Production Images, and base
+Python images. Furthermore, for these builds, we only execute Python tests if
+significant files have changed. For instance, if the PR involves a "no-code" change,
+no tests will be executed.
 
 Regular PR builds run in a "stable" environment:
 
@@ -232,30 +243,32 @@ and has WRITE access to the GitHub Container Registry.
 Canary run
 ----------
 
-This is the flow that happens when a pull request is merged to the "main" branch or pushed to any of
-the "v2-*-test" branches. The "Canary" run attempts to upgrade dependencies to the latest versions
-and quickly pushes a preview of cache the CI/PROD images to the GitHub Registry - so that pull requests
-can quickly use the new cache - this is useful when Dockerfile or installation scripts change because such
-cache will already have the latest Dockerfile and scripts pushed even if some tests will fail.
-When successful, the run updates the constraints files in the "constraints-main" branch with the latest
-constraints and pushes both cache and latest CI/PROD images to the GitHub Registry.
-
-When "Canary" build fails, it's often a sign that some of our dependencies released a new version that
-is not compatible with current tests or Airflow code, Also it might mean that a breaking change has been
-merged to "main". Both cases should be addressed quickly by the maintainers. The "broken main" by our code
-should be fixed quickly, while the "broken dependencies" can take a bit of time to fix as until the tests
-succeeds, constraints will not be updated, which means that regular PRs will continue using the old version
-of dependencies that already passed one of the previous "Canary" runs.
-
+This workflow is triggered when a pull request is merged into the "main" branch or pushed to any of
+the "v2-*-test" branches. The "Canary" run aims to upgrade dependencies to their latest versions
+and promptly pushes a preview of the CI/PROD image cache to the GitHub Registry. This allows pull
+requests to quickly utilize the new cache, which is particularly beneficial when the Dockerfile or
+installation scripts have been modified. Even if some tests fail, this cache will already include the
+latest Dockerfile and scripts.Upon successful execution, the run updates the constraint files in the
+"constraints-main" branch with the latest constraints and pushes both the cache and the latest CI/PROD
+images to the GitHub Registry.
+
+If the "Canary" build fails, it often indicates that a new version of our dependencies is incompatible
+with the current tests or Airflow code. Alternatively, it could mean that a breaking change has been
+merged into "main". Both scenarios require prompt attention from the maintainers. While a "broken main"
+due to our code should be fixed quickly, "broken dependencies" may take longer to resolve. Until the tests
+pass, the constraints will not be updated, meaning that regular PRs will continue using the older version
+of dependencies that passed one of the previous "Canary" runs.
 
 Scheduled runs
 --------------
 
-This is the flow that happens when a scheduled run is triggered. The "scheduled" workflow is aimed to
-run regularly (overnight). Scheduled run is generally the same as "Canary" run, with the difference
-that the image is build always from the scratch and not from the cache. This way we can check that no
-"system" dependencies in debian base image have changed and that the build is still reproducible.
-No separate diagram is needed for scheduled run as it is identical to that of "Canary" run.
+The "scheduled" workflow, which is designed to run regularly (typically overnight),
+is triggered when a scheduled run occurs. This workflow is largely identical to the
+"Canary" run, with one key difference: the image is always built from scratch, not
+from a cache. This approach ensures that we can verify whether any "system" dependencies
+in the Debian base image have changed, and confirm that the build process remains reproducible.
+Since the process for a scheduled run mirrors that of a "Canary" run, no separate diagram is
+necessary to illustrate it.
 
 Workflows
 =========
@@ -310,10 +323,10 @@ that triggered it.
 Differences for main and release branches
 -----------------------------------------
 
-There are a few differences of what kind of tests are run, depending on which version/branch the tests are executed for.
-While all our tests run for the "main" development branch to keep Airflow in check, only a subset of those tests is run
-in older branches when we are releasing patch-level releases. This is because we never use old branches to release
-providers and helm charts, we only use them to release Airflow and Airflow image.
+The type of tests executed varies depending on the version or branch under test. For the "main" development branch,
+we run all tests to maintain the quality of Airflow. However, when releasing patch-level updates on older
+branches, we only run a subset of these tests. This is because older branches are exclusively used for releasing
+Airflow and its corresponding image, not for releasing providers or helm charts.
 
 This behaviour is controlled by ``default-branch`` output of the build-info job. Whenever we create a branch for old version
 we update the ``AIRFLOW_BRANCH`` in ``airflow_breeze/branch_defaults.py`` to point to the new branch and there are a few
