
Extending the Pipeline

Gems and Jewels to Collect

At the end of this episode you will have a CI pipeline that covers a few common CI use cases, which you can also apply to the CI pipelines of your own projects. Along the way, we explain additional GitLab CI keywords for tasks such as:

  • Conditional execution of CI jobs with rules,
  • create, store and access artifacts with artifacts,
  • reuse artifacts created in previous CI jobs with dependencies.

Introduction

In this episode you will extend the CI pipeline we developed in the last episode to cover the following CI use cases that we introduced previously:

  • Checking the license compliance,
  • checking the code style of the project,
  • testing against multiple Python versions.

We also take a closer look at the stages keyword, introduce the new keywords rules, artifacts and dependencies, and list a selection of predefined GitLab CI variables.

Additional CI Use Cases to Extend the CI Pipeline

Before we turn to optimizing the CI pipeline, we add a few more very common CI use cases that are still missing from it.

Checking the License Compliance

We will develop a CI job that checks that all files contain license and copyright information and that the full texts of all licenses used are contained in the project. First, we tell GitLab CI to run the CI job in a dedicated stage, here called lint, which you need to declare in the stages list at the beginning of your YAML file:

stages:
  - lint
  - run

To check license compliance, we use the command reuse lint of the CLI tool Reuse. Since we are working with Python's virtual environments, we need to prefix the command with poetry run so that reuse is executed inside that virtual environment. Now we are ready to write down the corresponding CI job:

my_ci_job:
  stage: lint
  script:
    - poetry run reuse lint

In our final .gitlab-ci.yml file the complete job may look like this:

license_compliance:
  image: python:3.9
  stage: lint
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run reuse lint
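To make the two checks described above more tangible, here is a minimal Python sketch of what they amount to. It is a simplified illustration under our own assumptions and not the implementation of reuse lint: it only looks for SPDX-FileCopyrightText and SPDX-License-Identifier tags inside each file and for a matching license text in LICENSES/<identifier>.txt, whereas the real tool also supports, for example, separate .license files and compound license expressions. The function name check_reuse_compliance and the sample file contents are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Simplified, hypothetical emulation of two checks that `reuse lint` performs.
COPYRIGHT_TAG = "SPDX-FileCopyrightText:"
LICENSE_ID = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def check_reuse_compliance(project: Path) -> dict:
    """Report files without SPDX header tags and licenses without a license text."""
    missing_headers = []
    used_licenses = set()
    for path in sorted(project.rglob("*")):
        rel = path.relative_to(project)
        # Only inspect regular files; skip Git metadata and the license texts themselves.
        if not path.is_file() or rel.parts[0] in (".git", "LICENSES"):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        ids = LICENSE_ID.findall(text)
        if COPYRIGHT_TAG not in text or not ids:
            missing_headers.append(rel.as_posix())
        used_licenses.update(ids)
    # Every license in use needs its full text in LICENSES/<identifier>.txt.
    missing_texts = sorted(
        lid for lid in used_licenses
        if not (project / "LICENSES" / f"{lid}.txt").is_file()
    )
    return {"missing_headers": missing_headers, "missing_license_texts": missing_texts}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.py").write_text(
        "# SPDX-FileCopyrightText: 2023 Jane Doe\n# SPDX-License-Identifier: MIT\n"
    )
    (root / "README.md").write_text("No license header here.\n")
    print(check_reuse_compliance(root))
    # {'missing_headers': ['README.md'], 'missing_license_texts': ['MIT']}
```

In this toy project, README.md lacks both tags, and the MIT license is used but its text is missing, so both checks report a problem; reuse lint would make the CI job fail in such a situation.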

Checking the Code Style of the Project

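Code style checking (or linting) should always be part of your coding projects, and it can be automated in CI pipelines. Black and isort are recommended tools for this in the Python ecosystem. The respective commands are black --check --diff . and isort --check --diff ., where . refers to the current directory. With --check, both tools leave the files untouched and exit with a non-zero status if any file would be reformatted, which makes the CI job fail; --diff additionally prints the changes that would be applied. A first approach is to copy and paste the previous lint job and exchange the commands in the script keyword: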

my_ci_job:
  stage: lint
  script:
    - poetry run black --check --diff .
    - poetry run isort --check --diff .

Our second lint job can then be added to the CI pipeline:

lint:
  image: python:3.9
  stage: lint
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run black --check --diff .
    - poetry run isort --check --diff .
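Why does a non-zero exit status matter? A CI job fails as soon as one of its script commands exits with a non-zero status, and the remaining commands of the script are then not executed. The following Python sketch models this behaviour under simplifying assumptions: each command is a plain argument list, and the function name run_script is hypothetical. It is an illustration, not how the GitLab Runner is implemented.

```python
import subprocess
import sys
from typing import List, Sequence


def run_script(commands: Sequence[List[str]]) -> int:
    """Run commands in order and stop at the first failure, like a job's script section.

    Returns 0 if every command succeeded, otherwise the failing command's exit code.
    """
    for command in commands:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            print(f"Command {command!r} failed with exit code {completed.returncode}")
            return completed.returncode  # the job fails; later commands are skipped
    return 0


# Stand-ins for `black --check --diff .` and `isort --check --diff .`:
# the first one simulates a formatting violation (exit code 1).
script = [
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "print('never reached')"],
]
print("job exit code:", run_script(script))
```

In this example, the second command never runs, which mirrors why isort is not even executed in the lint job if black already reports a problem.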

As you can see, our copy and paste approach introduces quite a bit of duplication. In later episodes we will refactor the CI pipeline to reduce this duplication again.

Testing Against Multiple Python Versions

Testing is one of the most important tasks to automate in CI pipelines. Your test suite helps ensure that you do not break anything when you push your changes to the repository. This safety net is essential for coding projects because it reduces the risk of introducing defects into your code. Pytest is a testing framework for Python projects, and you can execute your test suite with the command pytest tests/. In addition, you can create several CI jobs, each testing your application with a different version of the Python interpreter. But first, we need an additional stage called test to run the test suite:

stages:
  - lint
  - test
  - run

Now you can duplicate a previous job once per Python version, assign the new jobs to stage test and adapt the image keyword accordingly:

my_ci_job_1:
  image: python:3.8
  stage: test
  script:
    - poetry run pytest tests/

my_ci_job_2:
  image: python:3.9
  stage: test
  script:
    - poetry run pytest tests/

my_ci_job_3:
  image: python:3.10
  stage: test
  script:
    - poetry run pytest tests/

In our example, the complete jobs look like this:

test:python:3.8:
  image: python:3.8
  stage: test
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run pytest tests/

test:python:3.9:
  image: python:3.9
  stage: test
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run pytest tests/

test:python:3.10:
  image: python:3.10
  stage: test
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run pytest tests/

Again, this introduces quite a bit of repetition, which we will tackle in follow-up episodes.
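For illustration, a file in tests/ might look like the following minimal sketch. The file name and the function count_by_decade are hypothetical; the function is defined inline only to keep the example self-contained, whereas in a real project it would be imported from your package. Pytest collects files named test_*.py and runs every function whose name starts with test_, and a plain assert statement is enough to express a check.

```python
# tests/test_statistics.py (hypothetical file name)
from typing import Dict, List


def count_by_decade(ages: List[int]) -> Dict[int, int]:
    """Hypothetical helper: count ages per decade, e.g. 34 -> 30."""
    counts: Dict[int, int] = {}
    for age in ages:
        decade = age // 10 * 10
        counts[decade] = counts.get(decade, 0) + 1
    return counts


def test_count_by_decade_groups_ages() -> None:
    assert count_by_decade([34, 38, 41]) == {30: 2, 40: 1}


def test_count_by_decade_handles_empty_input() -> None:
    assert count_by_decade([]) == {}
```

Since the three test jobs run the same files under Python 3.8, 3.9 and 3.10, the code may only use features available in all of these versions. This is why the sketch uses typing.Dict and typing.List: writing a return annotation such as dict[int, int] would already fail with a TypeError on Python 3.8 when the function is defined, and the test:python:3.8 job would catch exactly this kind of problem.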

Additional Concepts and GitLab CI Keywords

In this section we would like to discuss more concepts and keywords that you may want to use in your projects.

More About Stages and Jobs

Now that we have created our first complete CI pipeline covering all of our CI use cases, let us inspect it together with the three stages and six CI jobs we defined. We observe that the stages are executed in sequence: jobs of a later stage only start once all jobs of the previous stage have completed successfully. Jobs within the same stage, such as the three jobs in stage test, run in parallel, though. This works because the jobs in stage test are independent of each other. We recommend putting jobs into the same stage whenever this independence criterion holds, because running them in parallel can speed up the pipeline significantly. In later episodes we will learn how to change this default behaviour and the running order of CI jobs with the needs keyword. We will also speed up the CI pipeline further with some additional concepts.
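The following Python sketch models the default behaviour described above: stages run one after another, all jobs of a stage run concurrently, and a failed job prevents later stages from starting. It is a simplified model under our own assumptions (jobs are plain Python callables returning True on success, and run_pipeline is a hypothetical name), not GitLab's scheduler; for instance, it ignores runner availability and the needs keyword.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical job definition: job name -> (stage name, callable returning True on success)
Jobs = Dict[str, Tuple[str, Callable[[], bool]]]


def run_pipeline(stages: List[str], jobs: Jobs) -> List[Tuple[str, Dict[str, bool]]]:
    """Run stages in sequence; jobs within one stage run concurrently.

    Returns the results of every stage that was started. Later stages are
    skipped once a stage contains a failed job.
    """
    history = []
    for stage in stages:
        stage_jobs = {name: fn for name, (s, fn) in jobs.items() if s == stage}
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(fn) for name, fn in stage_jobs.items()}
            results = {name: fut.result() for name, fut in futures.items()}
        history.append((stage, results))
        if not all(results.values()):
            break  # a failed job stops the pipeline before the next stage
    return history


jobs = {
    "license_compliance": ("lint", lambda: True),
    "lint": ("lint", lambda: True),
    "test:python:3.8": ("test", lambda: True),
    "test:python:3.9": ("test", lambda: False),  # simulate a failing test job
    "test:python:3.10": ("test", lambda: True),
    "run": ("run", lambda: True),
}
for stage, results in run_pipeline(["lint", "test", "run"], jobs):
    print(stage, results)
```

With one failing test job, the lint and test stages are executed, but the run stage is never started.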

Predefined Variables in GitLab CI

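GitLab CI provides a set of predefined variables that are automatically available in every CI job as environment variables. Their values describe the context of the pipeline, such as the branch, the commit or the project, and you can use them in your job scripts as well as in keywords like rules.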

Predefined Variables Reference

This is a compilation of a few selected CI variables:

| Variable Name | Description |
| --- | --- |
| CI_COMMIT_BRANCH | The commit branch name. Available in branch pipelines, including pipelines for the default branch. Not available in merge request pipelines or tag pipelines. |
| CI_COMMIT_REF_NAME | The branch or tag name for which the project is built. |
| CI_COMMIT_REF_SLUG | CI_COMMIT_REF_NAME in lowercase, shortened to 63 bytes, and with everything except 0-9 and a-z replaced with -. No leading / trailing -. Use in URLs, host names and domain names. |
| CI_COMMIT_SHA | The commit revision the project is built for. |
| CI_COMMIT_TAG | The commit tag name. Available only in pipelines for tags. |
| CI_DEFAULT_BRANCH | The name of the project's default branch. |
| CI_DEPLOY_PASSWORD | The authentication password of the GitLab Deploy Token, if the project has one. |
| CI_DEPLOY_USER | The authentication username of the GitLab Deploy Token, if the project has one. |
| CI_JOB_TOKEN | A token to authenticate with certain API endpoints. The token is valid as long as the job is running. |
| CI_PROJECT_DIR | The full path the repository is cloned to, and where the job runs from. |
| CI_REGISTRY_IMAGE | The address of the project's Container Registry. Only available if the Container Registry is enabled for the project. |
| CI_REGISTRY_PASSWORD | The password to push containers to the project's GitLab Container Registry. Only available if the Container Registry is enabled for the project. This password value is the same as the CI_JOB_TOKEN and is valid only as long as the job is running. Use the CI_DEPLOY_PASSWORD for long-lived access to the registry. |
| CI_REGISTRY_USER | The username to push containers to the project's GitLab Container Registry. Only available if the Container Registry is enabled for the project. |
| CI_REGISTRY | The address of the GitLab Container Registry. Only available if the Container Registry is enabled for the project. This variable includes a :port value if one is specified in the registry configuration. |
| CI_REPOSITORY_URL | The URL to clone the Git repository. |
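The transformation behind CI_COMMIT_REF_SLUG is easy to overlook, so here is a minimal Python sketch of the rules listed in the table. The function name ref_slug is hypothetical, and the exact order in which GitLab applies the rules is our assumption: we lowercase, replace, truncate and finally strip dashes. Because all remaining characters are ASCII after the replacement step, 63 characters correspond to 63 bytes.

```python
import re


def ref_slug(ref_name: str) -> str:
    """Approximate CI_COMMIT_REF_SLUG from CI_COMMIT_REF_NAME (hypothetical helper)."""
    slug = ref_name.lower()                 # lowercase
    slug = re.sub(r"[^a-z0-9]", "-", slug)  # replace everything except 0-9 and a-z with "-"
    slug = slug[:63]                        # shorten to 63 bytes (only ASCII characters remain)
    return slug.strip("-")                  # no leading or trailing "-"


for ref in ["main", "Feature/Add_Plots", "release-1.0", "-WIP-"]:
    print(f"{ref!r:22} -> {ref_slug(ref)!r}")
```

A branch called Feature/Add_Plots, for example, becomes feature-add-plots, a value that can safely be used in a URL or host name.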

Predefined Variables for Merge Request Pipelines

In addition, the following selected CI variables are only available in merge request pipelines:

| Variable Name | Description |
| --- | --- |
| CI_MERGE_REQUEST_SOURCE_BRANCH_NAME | The source branch name of the merge request. |
| CI_MERGE_REQUEST_SOURCE_BRANCH_SHA | The HEAD SHA of the source branch of the merge request. The variable is empty in merge request pipelines. The SHA is present only in merged results pipelines. |
| CI_MERGE_REQUEST_TARGET_BRANCH_NAME | The target branch name of the merge request. |
| CI_MERGE_REQUEST_TARGET_BRANCH_SHA | The HEAD SHA of the target branch of the merge request. The variable is empty in merge request pipelines. The SHA is present only in merged results pipelines. |
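Since some of these variables only exist in certain kinds of pipelines, their presence tells you which kind of pipeline a job runs in. The following Python sketch, with the hypothetical function pipeline_type, encodes exactly the availability rules from the two tables above. It is a simplified illustration; GitLab also offers the predefined variable CI_PIPELINE_SOURCE, which is not listed above and states the trigger of a pipeline more directly.

```python
import os
from typing import Mapping


def pipeline_type(env: Mapping[str, str]) -> str:
    """Guess the pipeline type from which predefined variables are present (simplified)."""
    if "CI_MERGE_REQUEST_SOURCE_BRANCH_NAME" in env:
        return "merge request pipeline"
    if "CI_COMMIT_TAG" in env:
        return "tag pipeline"
    if "CI_COMMIT_BRANCH" in env:
        return "branch pipeline"
    return "unknown (not running in GitLab CI?)"


if __name__ == "__main__":
    # Inside a CI job, os.environ contains the predefined variables.
    print(pipeline_type(os.environ))
```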

Example

To show how these predefined variables can be used inside your CI pipeline, here is an example that simply prints the values of two predefined CI variables which we will need in the next section of this episode:

stages:
  - echo

echo:
  stage: echo
  script:
    - echo "CI_COMMIT_BRANCH = '$CI_COMMIT_BRANCH'"
    - echo "CI_DEFAULT_BRANCH = '$CI_DEFAULT_BRANCH'"

This is the output that appears in the job log of the echo job:

[...]
$ echo "CI_COMMIT_BRANCH = '$CI_COMMIT_BRANCH'"
CI_COMMIT_BRANCH = 'main'
$ echo "CI_DEFAULT_BRANCH = '$CI_DEFAULT_BRANCH'"
CI_DEFAULT_BRANCH = 'main'
[...]

Conditional Execution of CI Jobs With rules

Sometimes you do not need to execute a CI job in every pipeline run, but only in pipelines that fulfil certain conditions. The rules keyword is the one to use when it comes to executing CI jobs conditionally. The keyword is quite powerful but, in our opinion, also a bit harder to understand. Here we introduce one of the most common rules: execute a job only in pipelines that run for the default branch, for example after changes have been merged into branch main. Applied to the run job of our pipeline, this looks like this:

run:
  image: python:3.9
  stage: run
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run python -m astronaut_analysis
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

As a consequence, the run job, which creates a new set of plots, is only executed if the pipeline runs for the default branch, i.e. branch main in our case. Variable $CI_COMMIT_BRANCH holds the name of the branch the pipeline runs for, for example the branch that changes have just been merged into. Variable $CI_DEFAULT_BRANCH holds the name of the project's default branch, i.e. main in this project. Since CI_COMMIT_BRANCH is not available in merge request pipelines or tag pipelines, the condition does not hold there and the job is not run. Running this job conditionally is reasonable because we only want to generate plots from the state of the default branch main.
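To make the semantics of this rule explicit, here is a small Python model of how the condition evaluates in different pipelines. The helper run_job_is_added is hypothetical and not part of GitLab; it only encodes the comparison from the rule and the availability of CI_COMMIT_BRANCH described in the table of predefined variables.

```python
from typing import Mapping


def run_job_is_added(env: Mapping[str, str]) -> bool:
    """Mimic `if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH` (simplified model).

    A missing CI_COMMIT_BRANCH (e.g. in merge request or tag pipelines)
    is treated as a non-match.
    """
    branch = env.get("CI_COMMIT_BRANCH")
    return branch is not None and branch == env.get("CI_DEFAULT_BRANCH")


scenarios = {
    "push to main": {"CI_COMMIT_BRANCH": "main", "CI_DEFAULT_BRANCH": "main"},
    "push to feature branch": {"CI_COMMIT_BRANCH": "feature", "CI_DEFAULT_BRANCH": "main"},
    "merge request pipeline": {
        "CI_MERGE_REQUEST_SOURCE_BRANCH_NAME": "feature",
        "CI_DEFAULT_BRANCH": "main",
    },
}
for name, env in scenarios.items():
    print(f"{name}: run job added = {run_job_is_added(env)}")
# push to main: run job added = True
# push to feature branch: run job added = False
# merge request pipeline: run job added = False
```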

Create, Store and Access Artifacts With artifacts

You might have asked yourself whether we can access files generated during a CI job. Fortunately, this is possible with the artifacts keyword. We specify the files and directories that should be retained from a CI job as a list under paths, like this:

run:
  image: python:3.9
  stage: run
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run python -m astronaut_analysis
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  artifacts:
    paths:
      - results/

After the job has completed, the plots are stored as job artifacts for a period of 30 days by default. So-called latest artifacts, i.e. those of the most recent successful pipeline on a branch, are not deleted until newer artifacts replace them. You can access and, for example, download them by opening the job log of your CI job and clicking Download in the Job artifacts section of the right sidebar.

Job artifacts
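Which files end up in the artifacts? Paths are interpreted relative to the project directory (CI_PROJECT_DIR), and a directory entry such as results/ selects every file below it. The following Python sketch illustrates this selection with the hypothetical helper collect_artifacts. It is a simplification: GitLab additionally supports glob patterns and other options that are not modelled here, and the file extra.png is made up for the example.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_artifacts(project_dir: Path, paths: List[str]) -> List[str]:
    """Hypothetical helper: list the files selected by an artifacts:paths list.

    A directory entry selects every file below it (recursively);
    a file entry selects just that file. Glob patterns are not handled.
    """
    collected = set()
    for entry in paths:
        target = project_dir / entry
        if target.is_dir():
            collected.update(p for p in target.rglob("*") if p.is_file())
        elif target.is_file():
            collected.add(target)
    return sorted(p.relative_to(project_dir).as_posix() for p in collected)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "results" / "plots").mkdir(parents=True)
    (root / "results" / "age_histogram.png").write_bytes(b"")
    (root / "results" / "plots" / "extra.png").write_bytes(b"")
    (root / "notes.txt").write_text("not an artifact")
    print(collect_artifacts(root, ["results/"]))
    # ['results/age_histogram.png', 'results/plots/extra.png']
```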

Reuse Artifacts Created in Previous CI Jobs With dependencies

What if a later CI job needs artifacts that have already been generated in a previous CI job? Do we need to generate them again? No, artifacts can be passed on from one job to jobs in later stages. By default, a job downloads the artifacts of all jobs from earlier stages; with the dependencies keyword you can restrict this to a list of specific jobs. Here we tell the pages job to fetch only the job artifacts of the run job:

stages:
  - run
  - deploy

run:
  image: python:3.9
  stage: run
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run python -m astronaut_analysis
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  artifacts:
    paths:
      - results/

pages:
  stage: deploy
  script:
    - mkdir public/
    - cp results/age_histogram.png public/age_histogram.png
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  artifacts:
    paths:
      - public/
  dependencies:
    - run

Note

This special pages CI job, which runs on changes to branch main, needs some explanation. GitLab can host static web pages consisting of files such as HTML, JavaScript or CSS files. A special CI job called pages deploys your static web page to GitLab. During the pipeline run you need to copy your generated page into the public folder and list this folder in the artifacts section of the pages job. GitLab then takes all files contained in it and hosts them as a static web page, provided this feature is activated in the settings of your GitLab project. To activate GitLab Pages, navigate to Settings > General > Visibility, Project Features, Permissions and enable the Pages feature. After the first pipeline run you can find the URL of your static web page in the project settings under Settings > Pages. With this configuration, all logged-in GitLab users can access these Pages. It is also possible to make the Pages private so that only project members can access them.
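With only age_histogram.png in the public folder, the plot is reachable under its file name relative to the Pages URL, but the site has no start page, which GitLab Pages serves from a file named index.html. If you would like one, a small script could generate it. The following Python sketch is a hypothetical replacement for the mkdir and cp lines of the pages job; it assumes that all plots are PNG files located directly in results/, and the function name build_public is our own. A pages job running it would additionally need a Python image such as python:3.9.

```python
import html
import shutil
import tempfile
from pathlib import Path


def build_public(results_dir: Path, public_dir: Path) -> Path:
    """Copy all PNG plots into public/ and write a minimal index.html showing them."""
    public_dir.mkdir(parents=True, exist_ok=True)
    images = []
    for png in sorted(results_dir.glob("*.png")):
        shutil.copy(png, public_dir / png.name)
        images.append(f'<img src="{html.escape(png.name)}" alt="{html.escape(png.stem)}">')
    index = public_dir / "index.html"
    index.write_text(
        "<!DOCTYPE html>\n<html><body>\n" + "\n".join(images) + "\n</body></html>\n",
        encoding="utf-8",
    )
    return index


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "results").mkdir()
    (root / "results" / "age_histogram.png").write_bytes(b"")
    build_public(root / "results", root / "public")
    print(sorted(p.name for p in (root / "public").iterdir()))
    # ['age_histogram.png', 'index.html']
```

Generating the landing page from the content of results/ means that new plots show up on the page without changing the CI configuration.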

Exercise

Exercise 1: Create a Complete CI Pipeline for the Exercise Project

By now we have introduced the keywords and concepts needed to cover all CI use cases discussed so far. In the following exercise you should try to develop a CI pipeline for the exercise project which includes all CI use cases from the previous exercise. These were:

  1. Check license compliance.
  2. Lint the source code.
  3. Build the executable.
  4. Run the existing test cases.
  5. Run the executable.

The pipeline might contain jobs like license_compliance, lint, build, test and run. To get you started, these are the relevant commands for the script sections of the CI jobs:

  • License compliance can be checked with the previously mentioned reuse tool: reuse lint
  • Linting can be done with a tool called cpplint: cpplint --recursive src/ tests/
  • The application is built with CMake: cmake -S . -B build and cmake --build build
  • The test suite, written with GoogleTest, can be run with CTest: cd build && ctest
  • Finally, we want to run the application on the command line without any arguments: ./build/bin/helloWorld

Take Home Messages

In this episode we explored additional common CI use cases such as license compliance checks, linting and testing, introduced the new GitLab CI keywords rules, artifacts and dependencies, and listed a few predefined GitLab CI variables.

Next Episodes

Next, we will take the CI pipeline we have written so far and optimize and polish it so that it is easier to read, much easier to maintain, and runs faster and more efficiently.