Kubernetes Cluster API

Cluster API is a Kubernetes sub-project focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters.

Started by the Kubernetes Special Interest Group (SIG) Cluster Lifecycle, the Cluster API project uses Kubernetes-style APIs and patterns to automate cluster lifecycle management for platform operators. The supporting infrastructure, like virtual machines, networks, load balancers, and VPCs, as well as the Kubernetes cluster configuration, are all defined in the same declarative way that application developers use to deploy and manage their workloads. This enables consistent and repeatable cluster deployments across a wide variety of infrastructure environments.

Note: This feature is gated by the ClusterTopology feature gate, which can be enabled or disabled via the CLUSTER_TOPOLOGY variable.

Writing a ClusterClass

A ClusterClass becomes more useful and valuable when it can be used to create many Clusters of a similar shape. The goal of this document is to explain how ClusterClasses can be written in a way that makes them flexible enough to be used in as many Clusters as possible, by supporting variants of the same base Cluster shape.

Basic ClusterClass

The following example shows a basic ClusterClass. It contains templates to shape the control plane, infrastructure and workers of a Cluster. When a Cluster is using this ClusterClass, the templates are used to generate the objects of the managed topology of the Cluster.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: docker-clusterclass-v0.1.0
      namespace: default
    machineInfrastructure:
      ref:
        kind: DockerMachineTemplate
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        name: docker-clusterclass-v0.1.0-control-plane
        namespace: default
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DockerClusterTemplate
      name: docker-clusterclass-v0.1.0
      namespace: default
  workers:
    machineDeployments:
    - class: default-worker
      template:
        bootstrap:
          ref:
            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
            kind: KubeadmConfigTemplate
            name: docker-clusterclass-v0.1.0-default-worker
            namespace: default
        infrastructure:
          ref:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: DockerMachineTemplate
            name: docker-clusterclass-v0.1.0-default-worker
            namespace: default

The following example shows a Cluster using this ClusterClass. In this case a KubeadmControlPlane with the corresponding DockerMachineTemplate, a DockerCluster and a MachineDeployment with the corresponding KubeadmConfigTemplate and DockerMachineTemplate will be created. This basic ClusterClass is already very flexible. Via the topology on the Cluster the following can be configured:

  • .spec.topology.version: the Kubernetes version of the Cluster
  • .spec.topology.controlPlane: ControlPlane replicas and their metadata
  • .spec.topology.workers: MachineDeployments and their replicas, metadata and failure domain
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-docker-cluster
spec:
  topology:
    class: docker-clusterclass-v0.1.0
    version: v1.22.4
    controlPlane:
      replicas: 3
      metadata:
        labels:
          cpLabel: cpLabelValue 
        annotations:
          cpAnnotation: cpAnnotationValue
    workers:
      machineDeployments:
      - class: default-worker
        name: md-0
        replicas: 4
        metadata:
          labels:
            mdLabel: mdLabelValue
          annotations:
            mdAnnotation: mdAnnotationValue
        failureDomain: region

Best practices:

  • The ClusterClass name should be generic enough to make sense across multiple clusters, i.e. a name which corresponds to a single Cluster, e.g. “my-cluster”, is not recommended.
  • Try to keep the ClusterClass names short and consistent (if you publish multiple ClusterClasses).
  • As a ClusterClass usually evolves over time and you might want to rebase Clusters from one version of a ClusterClass to another, consider including a version suffix in the ClusterClass name. For more information about changing a ClusterClass please see: Changing a ClusterClass.
  • Prefix the templates used in a ClusterClass with the name of the ClusterClass.
  • Don’t reuse the same template in multiple ClusterClasses. This is automatically taken care of by prefixing the templates with the name of the ClusterClass.

ClusterClass with MachineHealthChecks

MachineHealthChecks can be configured in the ClusterClass for the control plane and for a MachineDeployment class. The following configuration makes sure a MachineHealthCheck is created for the control plane and for every MachineDeployment using the default-worker class.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  controlPlane:
    ...
    machineHealthCheck:
      maxUnhealthy: 33%
      nodeStartupTimeout: 15m
      unhealthyConditions:
      - type: Ready
        status: Unknown
        timeout: 300s
      - type: Ready
        status: "False"
        timeout: 300s
  workers:
    machineDeployments:
    - class: default-worker
      ...
      machineHealthCheck:
        unhealthyRange: "[0-2]"
        nodeStartupTimeout: 10m
        unhealthyConditions:
        - type: Ready
          status: Unknown
          timeout: 300s
        - type: Ready
          status: "False"
          timeout: 300s

ClusterClass with patches

As shown above, basic ClusterClasses are already very powerful. But there are cases where more powerful mechanisms are required. Let’s assume you want to manage multiple Clusters with the same ClusterClass, but they require different values for a field in one of the referenced templates of a ClusterClass.

A concrete example would be to deploy Clusters with different registries. In this case, every cluster needs a Cluster-specific value for .spec.kubeadmConfigSpec.clusterConfiguration.imageRepository in KubeadmControlPlane. Use cases like this can be implemented with ClusterClass patches.

Defining variables in the ClusterClass

The following example shows how variables can be defined in the ClusterClass. A variable definition specifies the name and the schema of a variable and if it is required. The schema defines how a variable is defaulted and validated. It supports a subset of the schema of CRDs. For more information please see the godoc.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  ...
  variables:
  - name: imageRepository
    required: true
    schema:
      openAPIV3Schema:
        type: string
        description: ImageRepository is the container registry to pull images from.
        default: registry.k8s.io
        example: registry.k8s.io

Defining patches in the ClusterClass

The variable can then be used in a patch to set a field on a template referenced in the ClusterClass. The selector specifies on which template the patch should be applied. jsonPatches specifies which JSON patches should be applied to that template. In this case we set the imageRepository field of the KubeadmControlPlaneTemplate to the value of the variable imageRepository. For more information please see the godoc.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  ...
  patches:
  - name: imageRepository
    definitions:
    - selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
      jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/imageRepository
        valueFrom:
          variable: imageRepository

Setting variable values in the Cluster

After creating a ClusterClass with a variable definition, the user can now provide a value for the variable in the Cluster as in the example below.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-docker-cluster
spec:
  topology:
    ...
    variables:
    - name: imageRepository
      value: my.custom.registry

ClusterClass with custom naming strategies

The controller needs to generate names for new objects when a Cluster is being created from a ClusterClass. These names have to be unique within each namespace. The naming strategy ensures this by concatenating the cluster name with a random suffix.

It is possible to provide a custom template for the name generation of ControlPlane, MachineDeployment and MachinePool objects.

The generated names must comply with the RFC 1123 standard.

Defining a custom naming strategy for ControlPlane objects

The naming strategy for ControlPlane supports the following properties:

  • template: Custom template which is used when generating the name of the ControlPlane object.

The following variables can be referenced in templates:

  • .cluster.name: The name of the cluster object.
  • .random: A random alphanumeric string, without vowels, of length 5.

Example which would match the default behavior:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  controlPlane:
    ...
    namingStrategy:
      template: "{{ .cluster.name }}-{{ .random }}"
  ...

Defining a custom naming strategy for MachineDeployment objects

The naming strategy for MachineDeployments supports the following properties:

  • template: Custom template which is used when generating the name of the MachineDeployment object.

The following variables can be referenced in templates:

  • .cluster.name: The name of the cluster object.
  • .random: A random alphanumeric string, without vowels, of length 5.
  • .machineDeployment.topologyName: The name of the MachineDeployment topology (Cluster.spec.topology.workers.machineDeployments[].name)

Example which would match the default behavior:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  controlPlane:
    ...
  workers:
    machineDeployments:
    - class: default-worker
      ...
      namingStrategy:
        template: "{{ .cluster.name }}-{{ .machineDeployment.topologyName }}-{{ .random }}"

Defining a custom naming strategy for MachinePool objects

The naming strategy for MachinePools supports the following properties:

  • template: Custom template which is used when generating the name of the MachinePool object.

The following variables can be referenced in templates:

  • .cluster.name: The name of the cluster object.
  • .random: A random alphanumeric string, without vowels, of length 5.
  • .machinePool.topologyName: The name of the MachinePool topology (Cluster.spec.topology.workers.machinePools[].name).

Example which would match the default behavior:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  controlPlane:
    ...
  workers:
    machinePools:
    - class: default-worker
      ...
      namingStrategy:
        template: "{{ .cluster.name }}-{{ .machinePool.topologyName }}-{{ .random }}"

Advanced features of ClusterClass with patches

This section will explain more advanced features of ClusterClass patches.

MachineDeployment variable overrides

If you want to use many variations of MachineDeployments in Clusters, you can either define a MachineDeployment class for every variation or you can define patches and variables to make a single MachineDeployment class more flexible.

In the following example we make the instanceType of an AWSMachineTemplate customizable. First we define the workerMachineType variable and the corresponding patch:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: aws-clusterclass-v0.1.0
spec:
  ...
  variables:
  - name: workerMachineType
    required: true
    schema:
      openAPIV3Schema:
        type: string
        default: t3.large
  patches:
  - name: workerMachineType
    definitions:
    - selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        matchResources:
          machineDeploymentClass:
            names:
            - default-worker
      jsonPatches:
      - op: add
        path: /spec/template/spec/instanceType
        valueFrom:
          variable: workerMachineType
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
metadata:
  name: aws-clusterclass-v0.1.0-default-worker
spec:
  template:
    spec:
      # instanceType: workerMachineType will be set by the patch.
      iamInstanceProfile: "nodes.cluster-api-provider-aws.sigs.k8s.io"
---
...

In the Cluster resource the workerMachineType variable can then be set cluster-wide and it can also be overridden for an individual MachineDeployment.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-aws-cluster
spec:
  ...
  topology:
    class: aws-clusterclass-v0.1.0
    version: v1.22.0
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
      - class: "default-worker"
        name: "md-small-workers"
        replicas: 3
        variables:
          overrides:
          # Overrides the cluster-wide value with t3.small.
          - name: workerMachineType
            value: t3.small
      # Uses the cluster-wide value t3.large.
      - class: "default-worker"
        name: "md-large-workers"
        replicas: 3
    variables:
    - name: workerMachineType
      value: t3.large

Builtin variables

In addition to variables specified in the ClusterClass, the following builtin variables can be referenced in patches:

  • builtin.cluster.{name,namespace}
  • builtin.cluster.topology.{version,class}
  • builtin.cluster.network.{serviceDomain,services,pods,ipFamily}
  • builtin.controlPlane.{replicas,version,name}
    • Please note, these variables are only available when patching control plane or control plane machine templates.
  • builtin.controlPlane.machineTemplate.infrastructureRef.name
    • Please note, these variables are only available when using a control plane with machines and when patching control plane or control plane machine templates.
  • builtin.machineDeployment.{replicas,version,class,name,topologyName}
    • Please note, these variables are only available when patching the templates of a MachineDeployment and contain the values of the current MachineDeployment topology.
  • builtin.machineDeployment.{infrastructureRef.name,bootstrap.configRef.name}
    • Please note, these variables are only available when patching the templates of a MachineDeployment and contain the values of the current MachineDeployment topology.

Builtin variables can be referenced just like regular variables, e.g.:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  ...
  patches:
  - name: clusterName
    definitions:
    - selector:
      ...
      jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/controllerManager/extraArgs/cluster-name
        valueFrom:
          variable: builtin.cluster.name

Tips & Tricks

Builtin variables can be used to dynamically calculate image names. The version used in the patch will always be the same as the one we set in the corresponding MachineDeployment (works the same way with .builtin.controlPlane.version).

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  ...
  patches:
  - name: customImage
    description: "Sets the container image that is used for running dockerMachines."
    definitions:
    - selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        matchResources:
          machineDeploymentClass:
            names:
            - default-worker
      jsonPatches:
      - op: add
        path: /spec/template/spec/customImage
        valueFrom:
          template: |
            kindest/node:{{ .builtin.machineDeployment.version }}

Complex variable types

Variables can also be objects, maps and arrays. An object is specified with the type object and by the schemas of the fields of the object. A map is specified with the type object and the schema of the map values. An array is specified via the type array and the schema of the array items.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  ...
  variables:
  - name: httpProxy
    schema:
      openAPIV3Schema:
        type: object
        properties: 
          # Schema of the url field.
          url: 
            type: string
          # Schema of the noProxy field.
          noProxy:
            type: string
  - name: mdConfig
    schema:
      openAPIV3Schema:
        type: object
        additionalProperties:
          # Schema of the map values.
          type: object
          properties:
            osImage:
              type: string
  - name: dnsServers
    schema:
      openAPIV3Schema:
        type: array
        items:
          # Schema of the array items.
          type: string

Objects, maps and arrays can be used in patches either directly by referencing the variable name, or by accessing individual fields. For example:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  ...
  jsonPatches:
  - op: add
    path: /spec/template/spec/httpProxy/url
    valueFrom:
      # Use the url field of the httpProxy variable.
      variable: httpProxy.url
  - op: add
    path: /spec/template/spec/customImage
    valueFrom:
      # Use the osImage field of the mdConfig variable for the current MD class.
      template: "{{ (index .mdConfig .builtin.machineDeployment.class).osImage }}"
  - op: add
    path: /spec/template/spec/dnsServers
    valueFrom:
      # Use the entire dnsServers array.
      variable: dnsServers
  - op: add
    path: /spec/template/spec/dnsServer
    valueFrom:
      # Use the first item of the dnsServers array.
      variable: dnsServers[0]

Tips & Tricks

Complex variables can be used to make references in templates configurable, e.g. the identityRef used in AzureCluster. Of course it’s also possible to only make the name of the reference configurable, including restricting the valid values to a pre-defined enum.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: azure-clusterclass-v0.1.0
spec:
  ...
  variables:
  - name: clusterIdentityRef
    schema:
      openAPIV3Schema:
        type: object
        properties:
          kind:
            type: string
          name:
            type: string

Even though the OpenAPI schema allows defining free-form objects, e.g.

variables:
  - name: freeFormObject
    schema:
      openAPIV3Schema:
        type: object

Users should be aware that the lack of validation of user-provided data could lead to problems when those values are used in a patch or when the generated templates are created (see e.g. 6135).

As a consequence, we recommend avoiding this practice while we are considering alternatives to make it explicit for ClusterClass authors to opt in to this feature, thus accepting the implied risks.

Using variable values in JSON patches

We already saw above that it’s possible to use variable values in JSON patches. It’s also possible to calculate values via Go templating or to use hard-coded values.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  ...
  patches:
  - name: etcdImageTag
    definitions:
    - selector:
      ...
      jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/etcd
        valueFrom:
          # This template is first rendered with Go templating, then parsed by 
          # a YAML/JSON parser and then used as value of the JSON patch.
          # For example, if the variable etcdImageTag is set to `3.5.1-0` the 
          # .../clusterConfiguration/etcd field will be set to:
          # {"local": {"imageTag": "3.5.1-0"}}
          template: |
            local:
              imageTag: {{ .etcdImageTag }}
  - name: imageRepository
    definitions:
    - selector:
      ...
      jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/imageRepository
        # This hard-coded value is used directly as value of the JSON patch.
        value: "my.custom.registry"

Tips & Tricks

Templates can be used to implement defaulting behavior during JSON patch value calculation. This can be used if the simple constant default value which can be specified in the schema is not enough.

        valueFrom:
          # If .vnetName is set, it is used. Otherwise, we will use `{{.builtin.cluster.name}}-vnet`.  
          template: "{{ if .vnetName }}{{.vnetName}}{{else}}{{.builtin.cluster.name}}-vnet{{end}}"

When writing templates, a subset of functions from the Sprig library can be used to write expressions, e.g., {{ .name | upper }}. Only functions that are guaranteed to evaluate to the same result for a given input are allowed (e.g. upper or max can be used, while now or randAlpha cannot be used).
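For instance, a deterministic function can be combined with a builtin variable when computing a patch value (an illustrative sketch; the field path is hypothetical):

```yaml
      jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/controllerManager/extraArgs/cluster-name
        valueFrom:
          # upper always evaluates to the same result for a given input,
          # so it is allowed in patch templates.
          template: "{{ .builtin.cluster.name | upper }}"
```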

Optional patches

Patches can also be conditionally enabled. This can be done by configuring a Go template via enabledIf. The patch is then only applied if the Go template evaluates to true. In the following example the httpProxy patch is only applied if the httpProxy variable is set (and not empty).

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  ...
  variables:
  - name: httpProxy
    schema:
      openAPIV3Schema:
        type: string
  patches:
  - name: httpProxy
    enabledIf: "{{ if .httpProxy }}true{{end}}"
    definitions:
    ...  

Tips & Tricks:

Hard-coded values can be used to test the impact of a patch during development, to gradually roll out patches, etc.

    enabledIf: "false"

A boolean variable can be used to enable/disable a patch (or “feature”). This can have opt-in or opt-out behavior depending on the default value of the variable.

    enabledIf: "{{ .httpProxyEnabled }}"

Of course the same is possible by adding a boolean variable to a configuration object.

    enabledIf: "{{ .httpProxy.enabled }}"

Builtin variables can be leveraged to apply a patch only for a specific Kubernetes version.

    enabledIf: '{{ semverCompare "1.21.1" .builtin.controlPlane.version }}'

With semverCompare and coalesce a feature can be enabled in newer versions of Kubernetes for both KubeadmConfigTemplate and KubeadmControlPlane.

    enabledIf: '{{ semverCompare "^1.22.0" (coalesce .builtin.controlPlane.version .builtin.machineDeployment.version )}}'

Version-aware patches

In some cases the ClusterClass authors want a patch to be computed according to the Kubernetes version in use.

While this is not a problem per se, and it does not differ from writing any other patch, it is important to keep in mind that there could be different Kubernetes versions in a Cluster at any time, all of them accessible via builtin variables:

  • builtin.cluster.topology.version defines the Kubernetes version from cluster.topology, and it acts as the desired Kubernetes version for the entire cluster. However, during an upgrade workflow it could happen that some objects in the Cluster are still at the older version.
  • builtin.controlPlane.version represents the desired version for the control plane object; usually this version changes immediately after cluster.topology.version is updated (unless there are other operations in progress preventing the upgrade from starting).
  • builtin.machineDeployment.version represents the desired version for each specific MachineDeployment object; this version changes only after the upgrade for the control plane is completed, and when there are many MachineDeployments in the same cluster, they are upgraded sequentially.

This information provides the basis for developing version-aware patches, allowing the patch author to determine when a patch should adapt to the new Kubernetes version by choosing one of the above variables. In practice, the following rules apply to the most common use cases:

  • When developing a version-aware patch for the control plane, builtin.controlPlane.version must be used.
  • When developing a version-aware patch for MachineDeployments, builtin.machineDeployment.version must be used.

Tips & Tricks:

Sometimes users need to define variables to be used by version-aware patches, and in this case it is important to keep in mind that there could be different Kubernetes versions in a Cluster at any time.

A simple approach to solve this problem is to define a map of version-aware variables, with the key of each item being the Kubernetes version. A patch can then use the proper builtin variable as a lookup key to fetch the value corresponding to the Kubernetes version in use by each object.
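For example, a hypothetical etcdImageTagByVersion map variable, keyed by Kubernetes version, could be looked up with the builtin control plane version (a sketch; the variable name is an assumption, the patched field follows the etcd example shown earlier):

```yaml
  variables:
  - name: etcdImageTagByVersion
    schema:
      openAPIV3Schema:
        # Map of Kubernetes version -> etcd image tag.
        type: object
        additionalProperties:
          type: string
  patches:
  - name: etcdImageTag
    definitions:
    - selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
      jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/etcd
        valueFrom:
          # Fetch the entry matching the version currently in use
          # by the control plane object.
          template: |
            local:
              imageTag: {{ index .etcdImageTagByVersion .builtin.controlPlane.version }}
```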

JSON patches tips & tricks

The JSON patch specification (RFC 6902) requires that the target of an add operation exists.

As a consequence, ClusterClass authors should pay special attention when the following conditions apply, in order to prevent errors when a patch is applied:

  • the patch tries to add a value to an array (which is a slice in the corresponding go struct)
  • the slice was defined with omitempty
  • the slice currently does not exist

A workaround in this particular case is to create the array in the patch instead of adding to the non-existing one. Note that creating the slice overwrites any existing values, so this should only be done when the slice does not exist.

The following example shows both cases to consider while writing a patch that adds a value to a slice. This patch aims to add a file to the files slice of a KubeadmConfigTemplate, which has omitempty set.

This patch requires the key .spec.template.spec.files to exist in order to succeed.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: my-clusterclass
spec:
  ...
  patches:
  - name: add file
    definitions:
    - selector:
        apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
        kind: KubeadmConfigTemplate
      jsonPatches:
      - op: add
        path: /spec/template/spec/files/-
        value:
          content: Some content.
          path: /some/file
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: "quick-start-default-worker-bootstraptemplate"
spec:
  template:
    spec:
      ...
      files:
      - content: Some other content
        path: /some/other/file

This patch would overwrite an existing slice at .spec.template.spec.files.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: my-clusterclass
spec:
  ...
  patches:
  - name: add file
    definitions:
    - selector:
        apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
        kind: KubeadmConfigTemplate
      jsonPatches:
      - op: add
        path: /spec/template/spec/files
        value:
        - content: Some content.
          path: /some/file
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: "quick-start-default-worker-bootstraptemplate"
spec:
  template:
    spec:
      ...

Changing a ClusterClass

Selecting a strategy

When planning a change to a ClusterClass, users should always take into consideration how those changes might impact the existing Clusters already using the ClusterClass, if any.

There are two strategies for defining how a ClusterClass change rolls out to existing Clusters:

  • Roll out ClusterClass changes to existing Clusters in a controlled/incremental fashion.
  • Roll out ClusterClass changes to all existing Clusters immediately.

The first strategy is the recommended choice for people starting with ClusterClass; it requires the users to create a new ClusterClass with the expected changes, and then rebase each Cluster to use the newly created ClusterClass.

By splitting the change to the ClusterClass and its rollout to Clusters into separate steps, the user reduces the risk of introducing unexpected changes on existing Clusters, or at least limits the blast radius of those changes to the small number of Clusters already rebased (in fact, it is similar to a canary deployment).

The second strategy listed above instead requires changing a ClusterClass “in place”, which can be simpler and faster than creating a new ClusterClass. However, this approach means that changes are immediately propagated to all the Clusters already using the modified ClusterClass. Any operation involving many Clusters at the same time has intrinsic risks, and it can have a heavy impact on the underlying infrastructure if the operation triggers a machine rollout across the entire fleet of Clusters.

However, regardless of which strategy you choose to implement your changes to a ClusterClass, please make sure to plan the change before applying it.

If instead you are interested in understanding more about which kind of effects you should expect on the Clusters, or in additional details about the internals of the topology reconciler, you can start by reading the notes in the Planning ClusterClass changes documentation or by looking at the reference documentation at the end of this page.

Changing ClusterClass templates

Templates are an integral part of a ClusterClass, and thus the same considerations described in the previous paragraph apply. When changing a template referenced in a ClusterClass users should also always plan for how the change should be propagated to the existing Clusters and choose the strategy that best suits expectations.

According to the Cluster API operational practices, the recommended way for updating templates is by template rotation:

  • Create a new template
  • Update the template reference in the ClusterClass
  • Delete the old template
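Assuming the versioned naming convention recommended above, a rotation could look like this (hypothetical names):

```yaml
# 1. Create a new template, e.g. docker-clusterclass-v0.2.0, alongside
#    the existing docker-clusterclass-v0.1.0 template.
# 2. Update the reference in the ClusterClass to point to the new template:
spec:
  controlPlane:
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: docker-clusterclass-v0.2.0 # was: docker-clusterclass-v0.1.0
      namespace: default
# 3. Delete the old template once it is no longer referenced.
```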

Also in case of changes to the ClusterClass templates, please make sure to plan the change before applying it.

You can learn more about this by reading the notes in the Planning ClusterClass changes documentation or by looking at the reference documentation at the end of this page.

Rebase

Rebasing is an operational practice for transitioning a Cluster from one ClusterClass to another, and the operation can be triggered by simply changing the value in Cluster.spec.topology.class.
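For example, a Cluster using docker-clusterclass-v0.1.0 could be rebased to a hypothetical docker-clusterclass-v0.2.0 by updating a single field:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-docker-cluster
spec:
  topology:
    # Changing this value triggers the rebase to the new ClusterClass.
    class: docker-clusterclass-v0.2.0
    ...
```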

Also in this case, please make sure to plan the change before applying it.

You can learn more about this by reading the notes in the Planning ClusterClass changes documentation or by looking at the reference documentation at the end of this page.

Compatibility Checks

When changing a ClusterClass, the system validates the required changes according to a set of “compatibility rules” in order to prevent changes which would lead to a non-functional Cluster, e.g. changing the InfrastructureProvider from AWS to Azure.

If the proposed changes are evaluated as dangerous, the operation is rejected.

For additional info see compatibility rules defined in the ClusterClass proposal.

Planning ClusterClass changes

It is highly recommended to always generate a plan for ClusterClass changes before applying them, no matter if you are creating a new ClusterClass and rebasing Clusters or if you are changing your ClusterClass in place.

The clusterctl tool provides a new alpha command for this operation, clusterctl alpha topology plan.

The output of this command provides all the details about how those changes would impact Clusters. The following notes can help you understand what you should expect when planning your ClusterClass changes:

  • Users should expect the resources in a Cluster (e.g. MachineDeployments) to behave consistently no matter if a change is applied via a ClusterClass or directly as you do in a Cluster without a ClusterClass. In other words, if someone changes something on a KCP object triggering a control plane Machines rollout, you should expect the same to happen when the same change is applied to the KCP template in ClusterClass.

  • Users should expect the Cluster topology to change consistently irrespective of how the change has been implemented inside the ClusterClass or applied to the ClusterClass. In other words, if you change a template field “in place”, if you rotate the template referenced in the ClusterClass by pointing to a new template with the same field changed, or if you change the same field via a patch, the effects on the Cluster are the same.

See reference for more details.

Reference

Effects on the Clusters

The following table documents the effects each ClusterClass change can have on a Cluster; similar considerations apply to changes introduced in Cluster.spec.topology or by patches.

NOTE: for people used to operating Cluster API without ClusterClass, it could also help to keep in mind that the underlying objects, like control plane and MachineDeployment, act in the same way with and without a ClusterClass.

Changed field → Effects on Clusters

  • infrastructure.ref: Corresponding InfrastructureCluster objects are updated (in-place update).

  • controlPlane.metadata: If labels/annotations are added, changed or deleted, the ControlPlane objects are updated (in-place update). In the case of KCP, corresponding control plane Machines, KubeadmConfigs and InfrastructureMachines are updated in-place.

  • controlPlane.ref: Corresponding ControlPlane objects are updated (in-place update). If updating ControlPlane objects implies changes in the spec, the corresponding control plane Machines are updated accordingly (rollout).

  • controlPlane.machineInfrastructure.ref: If the referenced template has changes only in metadata labels or annotations, the corresponding InfrastructureMachineTemplates are updated (in-place update). If the referenced template has changes in the spec:
    - corresponding InfrastructureMachineTemplates are rotated (create new, delete old);
    - corresponding ControlPlane objects are updated with the reference to the newly created template (in-place update);
    - corresponding control plane Machines are updated accordingly (rollout).

  • controlPlane.nodeDrainTimeout: If the value is changed, the ControlPlane object is updated in-place. In the case of KCP, the change is propagated in-place to control plane Machines.

  • controlPlane.nodeVolumeDetachTimeout: If the value is changed, the ControlPlane object is updated in-place. In the case of KCP, the change is propagated in-place to control plane Machines.

  • controlPlane.nodeDeletionTimeout: If the value is changed, the ControlPlane object is updated in-place. In the case of KCP, the change is propagated in-place to control plane Machines.

  • workers.machineDeployments: If a new MachineDeploymentClass is added, no changes are triggered in the Clusters. If an existing MachineDeploymentClass is changed, the effect depends on the type of change (see the fields below).

  • workers.machineDeployments[].template.metadata: If labels/annotations are added, changed or deleted, the MachineDeployment objects are updated (in-place update) and the corresponding worker Machines are updated (in-place).

  • workers.machineDeployments[].template.bootstrap.ref: If the referenced template has changes only in metadata labels or annotations, the corresponding BootstrapTemplates are updated (in-place update). If the referenced template has changes in the spec:
    - corresponding BootstrapTemplates are rotated (create new, delete old);
    - corresponding MachineDeployment objects are updated with the reference to the newly created template (in-place update);
    - corresponding worker Machines are updated accordingly (rollout).

  • workers.machineDeployments[].template.infrastructure.ref: If the referenced template has changes only in metadata labels or annotations, the corresponding InfrastructureMachineTemplates are updated (in-place update). If the referenced template has changes in the spec:
    - corresponding InfrastructureMachineTemplates are rotated (create new, delete old);
    - corresponding MachineDeployment objects are updated with the reference to the newly created template (in-place update);
    - corresponding worker Machines are updated accordingly (rollout).

  • workers.machineDeployments[].template.nodeDrainTimeout: If the value is changed, the MachineDeployment is updated in-place. The change is propagated in-place to the MachineDeployment Machines.

  • workers.machineDeployments[].template.nodeVolumeDetachTimeout: If the value is changed, the MachineDeployment is updated in-place. The change is propagated in-place to the MachineDeployment Machines.

  • workers.machineDeployments[].template.nodeDeletionTimeout: If the value is changed, the MachineDeployment is updated in-place. The change is propagated in-place to the MachineDeployment Machines.

  • workers.machineDeployments[].template.minReadySeconds: If the value is changed, the MachineDeployment is updated in-place.

How the topology controller reconciles template fields

The topology reconciler enforces values defined in the ClusterClass templates into the topology owned objects in a Cluster.

More specifically, the topology controller uses Server Side Apply (SSA) to write/patch topology-owned objects; using SSA allows other controllers to co-author the generated objects, e.g. adding info for subnets in CAPA.

A corollary of the behaviour described above is that it is technically possible to change fields in the objects which are not derived from the templates and patches, but we advise against making such ad-hoc changes in generated objects unless strictly needed as a workaround. It is always preferable to improve ClusterClasses by supporting new Cluster variants in a reusable way.

Operating a managed Cluster

The spec.topology field added to the Cluster object as part of ClusterClass allows changes made on the Cluster to be propagated across all relevant objects. This means the Cluster object can be used as a single point of control for making changes to objects that are part of the Cluster, including the ControlPlane and MachineDeployments.

A managed Cluster can be used to perform the operations described in the following sections.

Upgrade a Cluster

Using a managed topology the operation to upgrade a Kubernetes cluster is a one-touch operation. Let’s assume we have created a CAPD cluster with ClusterClass and specified Kubernetes v1.21.2 (as documented in the Quick Start guide). Specifying the version is done when running clusterctl generate cluster. Looking at the cluster, the version of the control plane and the MachineDeployments is v1.21.2.

> kubectl get kubeadmcontrolplane,machinedeployments
NAME                                                                              CLUSTER                   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/clusterclass-quickstart-XXXX    clusterclass-quickstart   true          true                   1          1       1         0             2m21s   v1.21.2

NAME                                                                             CLUSTER                   REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE     VERSION
machinedeployment.cluster.x-k8s.io/clusterclass-quickstart-linux-workers-XXXX    clusterclass-quickstart   1          1       1         0             Running   2m21s   v1.21.2

To update the Cluster the only change needed is to the version field under spec.topology in the Cluster object.

Change 1.21.2 to 1.22.0 as below.

kubectl patch cluster clusterclass-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/version", "value": "v1.22.0"}]'

The patch will make the following change to the Cluster yaml:

   spec:
     topology:
      class: quick-start
+     version: v1.22.0
-     version: v1.21.2 

Important Note: A +2 minor Kubernetes version upgrade is not allowed in Cluster Topologies. This is to align with existing control plane providers, like KubeadmControlPlane provider, that limit a +2 minor version upgrade. Example: Upgrading from 1.21.2 to 1.23.0 is not allowed.

The upgrade will take some time to roll out as it will take place machine by machine with older versions of the machines only being removed after healthy newer versions come online.

To watch the update progress run:

watch kubectl get kubeadmcontrolplane,machinedeployments

After a few minutes the upgrade will be complete and the output will be similar to:

NAME                                                                              CLUSTER                   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE     VERSION
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/clusterclass-quickstart-XXXX    clusterclass-quickstart   true          true                   1          1       1         0             7m29s   v1.22.0

NAME                                                                             CLUSTER                   REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE     VERSION
machinedeployment.cluster.x-k8s.io/clusterclass-quickstart-linux-workers-XXXX    clusterclass-quickstart   1          1       1         0             Running   7m29s   v1.22.0

Scale a MachineDeployment

When using a managed topology, scaling of MachineDeployments, both up and down, should be done through the Cluster topology.

Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the Quick Start guide). Initially we should have a MachineDeployment with 3 replicas. Running

kubectl get machinedeployments

will give us:

NAME                                                            CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
machinedeployment.cluster.x-k8s.io/capi-quickstart-md-0-XXXX   capi-quickstart   3          3       3         0             Running   21m   v1.23.3

We can scale this MachineDeployment up or down through the Cluster object by changing the replicas field under /spec/topology/workers/machineDeployments/0/replicas. The 0 in the path refers to the position of the target MachineDeployment in the list of our Cluster topology. As we only have one MachineDeployment, we’re targeting the first item in the list under /spec/topology/workers/machineDeployments/.

To change this value with a patch:

kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/workers/machineDeployments/0/replicas", "value": 1}]'

This patch will make the following changes on the Cluster yaml:

   spec:
     topology:
       workers:
         machineDeployments:
         - class: default-worker
           name: md-0
           metadata: {}
+          replicas: 1
-          replicas: 3

After a minute the MachineDeployment will have scaled down to 1 replica:

NAME                         CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
capi-quickstart-md-0-XXXXX  capi-quickstart   1          1       1         0             Running   25m   v1.23.3

As well as scaling a MachineDeployment, Cluster operators can edit the labels and annotations applied to a running MachineDeployment using the Cluster topology as a single point of control.

Add a MachineDeployment

MachineDeployments in a managed Cluster are defined in the Cluster’s topology. Cluster operators can add a MachineDeployment to a living Cluster by adding it to the cluster.spec.topology.workers.machineDeployments field.

Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the Quick Start guide). Initially we should have a single MachineDeployment with 3 replicas. Running

kubectl get machinedeployments

will give us:

NAME                                                            CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
machinedeployment.cluster.x-k8s.io/capi-quickstart-md-0-XXXX   capi-quickstart   3          3       3         0             Running   21m   v1.23.3

A new MachineDeployment can be added to the Cluster by adding a new MachineDeployment spec under /spec/topology/workers/machineDeployments/. To do so we can patch our Cluster with:

kubectl patch cluster capi-quickstart --type json --patch '[{"op": "add", "path": "/spec/topology/workers/machineDeployments/-", "value": {"name": "second-deployment", "replicas": 1, "class": "default-worker"} }]'

This patch will make the below changes on the Cluster yaml:

   spec:
     topology:
       workers:
         machineDeployments:
         - class: default-worker
           metadata: {}
           replicas: 3
           name: md-0
+        - class: default-worker
+          metadata: {}
+          replicas: 1
+          name: second-deployment

After a minute, once the new MachineDeployment has scaled up, we get:

NAME                                      CLUSTER           REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
capi-quickstart-md-0-XXXX                 capi-quickstart   1          1       1         0             Running   39m   v1.23.3
capi-quickstart-second-deployment-XXXX    capi-quickstart   1          1       1         0             Running   99s   v1.23.3

Our second deployment uses the same underlying MachineDeployment class default-worker as our initial deployment. In this case they will both have exactly the same underlying machine templates. In order to modify the templates MachineDeployments are based on, take a look at Changing a ClusterClass.

A similar process as that described here - removing the MachineDeployment from cluster.spec.topology.workers.machineDeployments - can be used to delete a running MachineDeployment from an active Cluster.
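
Concretely, the deletion can be expressed as the inverse JSON patch; the following (illustrative) patch document, passed to kubectl patch --type json in the same way as above, removes the second entry in the list (index 1, i.e. second-deployment):

```json
[
  { "op": "remove", "path": "/spec/topology/workers/machineDeployments/1" }
]
```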

Scale a ControlPlane

When using a managed topology that includes ControlPlane MachineInfrastructure, scaling of ControlPlane Machines should also be done through the Cluster topology.

This is done by changing the ControlPlane replicas field at /spec/topology/controlPlane/replicas in the Cluster object. The command is:

kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/controlPlane/replicas", "value": 1}]'

This patch will make the below changes on the Cluster yaml:

   spec:
      topology:
        controlPlane:
          metadata: {}
+         replicas: 1
-         replicas: 3

As well as scaling a ControlPlane, Cluster operators can edit the labels and annotations applied to a running ControlPlane using the Cluster topology as a single point of control.

Use variables

A ClusterClass can use variables and patches in order to allow flexible customization of Clusters derived from a ClusterClass. Variable definition allows two or more Cluster topologies derived from the same ClusterClass to have different specs, with the differences controlled by variables in the Cluster topology.
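
As a sketch of how such a variable can be defined, the following (illustrative) ClusterClass fragment declares an etcdImageTag variable and a patch that writes it into the KubeadmControlPlaneTemplate; unrelated fields are elided:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
spec:
  # ... controlPlane, infrastructure and workers elided ...
  variables:
  - name: etcdImageTag
    required: false
    schema:
      openAPIV3Schema:
        type: string
        default: ""
        description: etcdImageTag sets the tag for the etcd image.
  patches:
  - name: etcdImageTag
    definitions:
    - selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
      jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/etcd/local/imageTag
        valueFrom:
          variable: etcdImageTag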

Assume we have created a CAPD cluster with ClusterClass and Kubernetes v1.23.3 (as documented in the Quick Start guide). Our Cluster has a variable etcdImageTag as defined in the ClusterClass. The variable is not set on our Cluster. Some variables, depending on their definition in a ClusterClass, may need to be specified by the Cluster operator for every Cluster created using a given ClusterClass.

In order to specify the value of a variable all we have to do is set the value in the Cluster topology.

We can see the current unset variable with:

kubectl get cluster capi-quickstart -o jsonpath='{.spec.topology.variables[1]}'                                     

Which will return something like:

{"name":"etcdImageTag","value":""}

In order to run a different version of etcd in new ControlPlane machines - the part of the spec this variable sets - change the value using the below patch:

kubectl patch cluster capi-quickstart --type json --patch '[{"op": "replace", "path": "/spec/topology/variables/1/value", "value": "3.5.0"}]'

Running the patch makes the following change to the Cluster yaml:

   spec:
     topology:
       variables:
       - name: imageRepository
         value: registry.k8s.io
       - name: etcdImageTag
+        value: "3.5.0"
-        value: ""
       - name: coreDNSImageTag
         value: ""

Retrieving the variable value from the Cluster object, with kubectl get cluster capi-quickstart -o jsonpath='{.spec.topology.variables[1]}' we can see:

{"name":"etcdImageTag","value":"3.5.0"}

Note: Changing the etcd version may have unintended impacts on a running Cluster. For safety the cluster should be reapplied after running the above variable patch.

Rebase a Cluster

To perform more significant changes using a Cluster as a single point of control, it may be necessary to change the ClusterClass that the Cluster is based on. This is done by changing the class referenced in /spec/topology/class.

To read more about changing an underlying class please refer to ClusterClass rebase.

Tips and tricks

Users should always aim at ensuring the stability of the Cluster and of the applications hosted on it while using spec.topology as a single point of control for making changes to the objects that are part of the Cluster.

The following recommendations apply:

  • If possible, avoid concurrent changes to control-plane and/or MachineDeployments to prevent excessive turnover on the underlying infrastructure or bottlenecks in the Cluster trying to move workloads from one machine to the other.
  • Keep machine labels and annotations stable, because changing those values requires Machine rollouts; also, please note that machine labels and annotations are not propagated to Kubernetes nodes; see metadata propagation.
  • While upgrading a Cluster, if possible avoid any other concurrent change to the Cluster; please note that you can rely on version-aware patches to ensure the Cluster adapts to the new Kubernetes version in sync with the upgrade workflow.

For more details about how changes can affect a Cluster, please look at reference.

Upgrading Cluster API

There are some special considerations for ClusterClass regarding Cluster API upgrades when the upgrade includes a bump of the apiVersion of infrastructure, bootstrap or control plane provider CRDs.

The recommended approach is to first upgrade Cluster API and then update the apiVersions in the ClusterClass references afterwards. By following the above steps, there won’t be any disruption of reconciliation, as the Cluster topology controller is able to reconcile the Cluster even with the old apiVersions in the ClusterClass.

Note: The apiVersions in ClusterClass cannot be updated before Cluster API because the new apiVersions don’t exist in the management cluster before the Cluster API upgrade.

In general the Cluster topology controller always uses exactly the versions of the CRDs referenced in the ClusterClass. This means in the following example the Cluster topology controller will always use v1beta1 when reconciling/applying patches for the infrastructure ref, even if the DockerClusterTemplate already has a v1beta2 apiVersion.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start
  namespace: default
spec:
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DockerClusterTemplate
...

Experimental Feature: Runtime SDK (alpha)

The Runtime SDK feature provides an extensibility mechanism that allows systems, products, and services built on top of Cluster API to hook into a workload cluster’s lifecycle.

Feature gate name: RuntimeSDK

Variable name to enable/disable the feature gate: EXP_RUNTIME_SDK

Additional documentation:

Implementing Runtime Extensions

Introduction

As a developer building systems on top of Cluster API, if you want to hook into the Cluster’s lifecycle via a Runtime Hook, you have to implement a Runtime Extension handling requests according to the OpenAPI specification for the Runtime Hook you are interested in.

Runtime Extensions by design are very powerful and flexible, however given that with great power comes great responsibility, a few key considerations should always be kept in mind (more details in the following sections):

  • Runtime Extensions are components that should be designed, written and deployed with great caution given that they can affect the proper functioning of the Cluster API runtime.
  • Cluster administrators should carefully vet any Runtime Extension registration, thus preventing malicious components from being added to the system.

Please note that following similar practices is already commonly accepted in the Kubernetes ecosystem for Kubernetes API server admission webhooks. Runtime Extensions share the same foundation and most of the same considerations/concerns apply.

Implementation

As mentioned above, as a developer building systems on top of Cluster API, if you want to hook into the Cluster’s lifecycle via a Runtime Extension, you have to implement an HTTPS server handling a discovery request and a set of additional requests according to the OpenAPI specification for the Runtime Hook you are interested in.

The following shows a minimal example of a Runtime Extension server implementation:

package main

import (
	"context"
	"flag"
	"net/http"
	"os"

	"github.com/spf13/pflag"
	cliflag "k8s.io/component-base/cli/flag"
	"k8s.io/component-base/logs"
	logsv1 "k8s.io/component-base/logs/api/v1"
	"k8s.io/klog/v2"
	ctrl "sigs.k8s.io/controller-runtime"

	runtimecatalog "sigs.k8s.io/cluster-api/exp/runtime/catalog"
	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
	"sigs.k8s.io/cluster-api/exp/runtime/server"
)

var (
	// catalog contains all information about RuntimeHooks.
	catalog = runtimecatalog.New()

	// Flags.
	profilerAddress string
	webhookPort     int
	webhookCertDir  string
	logOptions      = logs.NewOptions()
)

func init() {
	// Adds to the catalog all the RuntimeHooks defined in cluster API.
	_ = runtimehooksv1.AddToCatalog(catalog)
}

// InitFlags initializes the flags.
func InitFlags(fs *pflag.FlagSet) {
	// Initialize logs flags using Kubernetes component-base machinery.
	logsv1.AddFlags(logOptions, fs)

	// Add test-extension specific flags
	fs.StringVar(&profilerAddress, "profiler-address", "",
		"Bind address to expose the pprof profiler (e.g. localhost:6060)")

	fs.IntVar(&webhookPort, "webhook-port", 9443,
		"Webhook Server port")

	fs.StringVar(&webhookCertDir, "webhook-cert-dir", "/tmp/k8s-webhook-server/serving-certs/",
		"Webhook cert dir, only used when webhook-port is specified.")
}

func main() {
	// Creates a logger to be used during the main func.
	setupLog := ctrl.Log.WithName("setup")

	// Initialize and parse command line flags.
	InitFlags(pflag.CommandLine)
	pflag.CommandLine.SetNormalizeFunc(cliflag.WordSepNormalizeFunc)
	pflag.CommandLine.AddGoFlagSet(flag.CommandLine)
	// Set log level 2 as default.
	if err := pflag.CommandLine.Set("v", "2"); err != nil {
		setupLog.Error(err, "failed to set default log level")
		os.Exit(1)
	}
	pflag.Parse()

	// Validates logs flags using Kubernetes component-base machinery and applies them
	if err := logsv1.ValidateAndApply(logOptions, nil); err != nil {
		setupLog.Error(err, "unable to start extension")
		os.Exit(1)
	}

	// Add the klog logger in the context.
	ctrl.SetLogger(klog.Background())

	// Initialize the golang profiler server, if required.
	if profilerAddress != "" {
		klog.Infof("Profiler listening for requests at %s", profilerAddress)
		go func() {
			klog.Info(http.ListenAndServe(profilerAddress, nil))
		}()
	}

	// Create a http server for serving runtime extensions
	webhookServer, err := server.New(server.Options{
		Catalog: catalog,
		Port:    webhookPort,
		CertDir: webhookCertDir,
	})
	if err != nil {
		setupLog.Error(err, "error creating webhook server")
		os.Exit(1)
	}

	// Register extension handlers.
	if err := webhookServer.AddExtensionHandler(server.ExtensionHandler{
		Hook:        runtimehooksv1.BeforeClusterCreate,
		Name:        "before-cluster-create",
		HandlerFunc: DoBeforeClusterCreate,
	}); err != nil {
		setupLog.Error(err, "error adding handler")
		os.Exit(1)
	}
	if err := webhookServer.AddExtensionHandler(server.ExtensionHandler{
		Hook:        runtimehooksv1.BeforeClusterUpgrade,
		Name:        "before-cluster-upgrade",
		HandlerFunc: DoBeforeClusterUpgrade,
	}); err != nil {
		setupLog.Error(err, "error adding handler")
		os.Exit(1)
	}

	// Setup a context listening for SIGINT.
	ctx := ctrl.SetupSignalHandler()

	// Start the https server.
	setupLog.Info("Starting Runtime Extension server")
	if err := webhookServer.Start(ctx); err != nil {
		setupLog.Error(err, "error running webhook server")
		os.Exit(1)
	}
}

func DoBeforeClusterCreate(ctx context.Context, request *runtimehooksv1.BeforeClusterCreateRequest, response *runtimehooksv1.BeforeClusterCreateResponse) {
	log := ctrl.LoggerFrom(ctx)
	log.Info("BeforeClusterCreate is called")
	// Your implementation
}

func DoBeforeClusterUpgrade(ctx context.Context, request *runtimehooksv1.BeforeClusterUpgradeRequest, response *runtimehooksv1.BeforeClusterUpgradeResponse) {
	log := ctrl.LoggerFrom(ctx)
	log.Info("BeforeClusterUpgrade is called")
	// Your implementation
}

For a full example see our test extension.

Please note that a Runtime Extension server can serve multiple Runtime Hooks (in the example above BeforeClusterCreate and BeforeClusterUpgrade) at the same time. Each of them is handled at a different path, like the Kubernetes API server does for different API resources. The exact format of those paths is handled by the server automatically in accordance with the OpenAPI specification of the Runtime Hooks.

There is an additional Discovery endpoint which is automatically served by the Server. The Discovery endpoint returns a list of extension handlers to inform Cluster API which Runtime Hooks are implemented by this Runtime Extension server.

Please note that Cluster API is only able to enforce the correct request and response types as defined by a Runtime Hook version. Developers are fully responsible for all other elements of the design of a Runtime Extension implementation, including:

  • Choosing which programming language to use; please note that Golang is the language of choice, and we are not planning to test or provide tooling and libraries for other languages. Nevertheless, given that we rely on OpenAPI and plain HTTPS calls, other languages should just work, but support will be provided on a best-effort basis.
  • Choosing whether a dedicated or a shared HTTPS server is used for the Runtime Extension (it can e.g. also be used to serve a metrics endpoint).

When using Golang the Runtime Extension developer can benefit from the following packages (provided by the sigs.k8s.io/cluster-api module) as shown in the example above:

  • exp/runtime/hooks/api/v1alpha1 contains the Runtime Hook Golang API types, which are also used to generate the OpenAPI specification.
  • exp/runtime/catalog provides the Catalog object to register Runtime Hook definitions. The Catalog is then used by the server package to handle requests. Catalog is similar to the runtime.Scheme of the k8s.io/apimachinery/pkg/runtime package, but it is designed to store Runtime Hook registrations.
  • exp/runtime/server provides a Server object which makes it easy to implement a Runtime Extension server. The Server will automatically handle tasks like Marshalling/Unmarshalling requests and responses. A Runtime Extension developer only has to implement a strongly typed function that contains the actual logic.

Guidelines

While writing a Runtime Extension the following important guidelines must be considered:

Timeouts

Runtime Extension processing adds to reconcile durations of Cluster API controllers. They should respond to requests as quickly as possible, typically in milliseconds. Runtime Extension developers can decide how long the Cluster API Runtime should wait for a Runtime Extension to respond before treating the call as a failure (max is 30s) by returning the timeout during discovery. Of course a Runtime Extension can trigger long-running tasks in the background, but they shouldn’t block synchronously.

Availability

Runtime Extension failure could result in errors in handling the workload clusters lifecycle, and so the implementation should be robust, have proper error handling, avoid panics, etc. Failure policies can be set up to mitigate the negative impact of a Runtime Extension on the Cluster API Runtime, but this option can’t be used in all cases (see Error Management).

Blocking Hooks

A Runtime Hook can be defined as “blocking” - e.g. the BeforeClusterUpgrade hook allows a Runtime Extension to prevent the upgrade from starting. A Runtime Extension registered for the BeforeClusterUpgrade hook can block by returning a non-zero retryAfterSeconds value. The following considerations apply:

  • The system might decide to retry the same Runtime Extension even before the retryAfterSeconds period expires, e.g. due to other changes in the Cluster, so retryAfterSeconds should be considered as an approximate maximum time before the next reconcile.
  • If there is more than one Runtime Extension registered for the same Runtime Hook and more than one returns retryAfterSeconds, the shortest non-zero value will be used.
  • If there is more than one Runtime Extension registered for the same Runtime Hook and at least one returns retryAfterSeconds, all Runtime Extensions will be called again.

A detailed description of what “blocking” means for each specific Runtime Hook is documented case by case in the hook-specific implementation documentation (e.g. Implementing Lifecycle Hook Runtime Extensions).
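
As a sketch of what blocking looks like on the wire, a BeforeClusterUpgrade response asking to be retried might carry a body like the following (field values are illustrative):

```json
{
  "apiVersion": "hooks.runtime.cluster.x-k8s.io/v1alpha1",
  "kind": "BeforeClusterUpgradeResponse",
  "status": "Success",
  "message": "waiting for database backup to complete",
  "retryAfterSeconds": 30
}
```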

Side Effects

It is recommended that Runtime Extensions should avoid side effects if possible, which means they should operate only on the content of the request sent to them, and not make out-of-band changes. If side effects are required, rules defined in the following sections apply.

Idempotence

An idempotent Runtime Extension is able to succeed even in case it has already been completed before (the Runtime Extension checks the current state and changes it only if necessary). This is necessary because a Runtime Extension may be called many times after it already succeeded, because other Runtime Extensions for the same hook may not succeed in the same reconcile.

A practical example that explains why idempotence is relevant is the fact that extensions could be called more than once for the same lifecycle transition, e.g.

  • Two Runtime Extensions are registered for the BeforeClusterUpgrade hook.
  • Before a Cluster upgrade is started both extensions are called, but one of them temporarily blocks the operation by asking to retry after 30 seconds.
  • After 30 seconds the system retries the lifecycle transition, and both extensions are called again to re-evaluate if it is now possible to proceed with the Cluster upgrade.
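
The check-then-change pattern can be sketched in plain Go, independent of any Cluster API types; ensureLabel below is an illustrative helper, not part of the Runtime SDK:

```go
package main

import "fmt"

// ensureLabel reconciles a single label towards a desired value and reports
// whether anything changed. Calling it again after it succeeded is a no-op,
// which is exactly the property an idempotent Runtime Extension needs when
// it is invoked repeatedly for the same lifecycle transition.
func ensureLabel(labels map[string]string, key, value string) bool {
	if current, ok := labels[key]; ok && current == value {
		return false // already in the desired state; nothing to do
	}
	labels[key] = value
	return true
}

func main() {
	labels := map[string]string{}
	fmt.Println(ensureLabel(labels, "backup-verified", "true")) // first call mutates: true
	fmt.Println(ensureLabel(labels, "backup-verified", "true")) // retry is a no-op: false
}
```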

Avoid dependencies

Each Runtime Extension should accomplish its task without depending on other Runtime Extensions. Introducing dependencies across Runtime Extensions makes the system fragile, and it is probably a consequence of poor “Separation of Concerns” between extensions.

Deterministic result

A deterministic Runtime Extension is implemented in such a way that given the same input it will always return the same output.

Some Runtime Hooks, e.g. like external patches, might explicitly request for corresponding Runtime Extensions to support this property. But we encourage developers to follow this pattern more generally given that it fits well with practices like unit testing and generally makes the entire system more predictable and easier to troubleshoot.

Error messages

Runtime Extension authors should be aware that error messages are surfaced as conditions in Kubernetes resources and recorded in the Cluster API controller’s logs. As a consequence:

  • Error messages must not contain any sensitive information.
  • Error messages must be deterministic, and must avoid including timestamps or values that change on every call.
  • Error messages must not contain external errors when it is not clear whether those errors are deterministic (e.g. errors returned from cloud APIs).
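A minimal sketch of keeping hook error messages deterministic (the function name and message below are illustrative): log the raw error inside the extension, and return a stable, non-sensitive message in the response instead.

```go
package main

// responseMessage maps a possibly non-deterministic external error string to
// a stable message suitable for a hook response.
func responseMessage(rawErr string) string {
	if rawErr == "" {
		return ""
	}
	// Do not embed rawErr: it may contain timestamps, request IDs or other
	// values that change on every call; surface a stable message instead.
	return "failed to prepare add-ons, see extension logs for details"
}
```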

ExtensionConfig

To register your Runtime Extension, apply the ExtensionConfig resource in the management cluster, including your CA certs, the ClusterIP service associated with the extension and its namespace, and the target namespace for the given extension. Once created, the associated service is detected and the hooks served by the extension are discovered; you can check the status of the ExtensionConfig for confirmation. Below is an example ExtensionConfig:

apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  annotations:
    runtime.cluster.x-k8s.io/inject-ca-from-secret: default/test-runtime-sdk-svc-cert
  name: test-runtime-sdk-extensionconfig
spec:
  clientConfig:
    service:
      name: test-runtime-sdk-svc
      namespace: default # Note: this assumes the test extension gets deployed in the default namespace
      port: 443
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
          - default # Note: this assumes the test extension is used by Clusters in the default namespace only

Settings

Settings can be added to the ExtensionConfig object in the form of a map with string keys and values. These settings are sent with each request to hooks registered by that ExtensionConfig. Extension developers can use these settings to alter the behavior of their extensions. Settings should be well documented by extension developers so that ClusterClass authors can understand their usage and expected behaviour.

Settings can also be provided for individual external patches via the ClusterClass .spec.patches[*].external.settings field. These override settings at the ExtensionConfig level for that patch.
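As a hedged sketch of both levels (field values are illustrative, and the ExtensionConfig's clientConfig is omitted for brevity):

```yaml
# ExtensionConfig-level settings, sent with every request to hooks
# registered by this ExtensionConfig.
apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  name: test-runtime-sdk-extensionconfig
spec:
  settings:
    defaultImageRepository: registry.example.com # illustrative key/value
---
# Per-patch override in the ClusterClass; for this patch the value below
# takes precedence over the ExtensionConfig-level setting.
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: docker-clusterclass-v0.1.0
spec:
  patches:
  - name: lbImageRepository
    external:
      generateExtension: generate-patches.test-runtime-sdk-extensionconfig
      settings:
        defaultImageRepository: other-registry.example.com
```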

Error management

In case a Runtime Extension returns an error, the error will be handled according to the corresponding failure policy defined in the response of the Discovery call.

If the failure policy is Ignore, the error is recorded in the controller’s logs and processing continues. However, this failure policy cannot be used in most use cases, because Runtime Extension implementers usually want to ensure that the task implemented by an extension is completed before continuing with the cluster’s lifecycle.

If instead the failure policy is Fail, the system will retry the operation until it succeeds. The following general considerations apply:

  • It is the responsibility of Cluster API components to surface Runtime Extension errors using conditions.
  • Operations will be retried with an exponential backoff or whenever the state of a Cluster changes (relying on controller-runtime’s exponential backoff and watches).
  • If there is more than one Runtime Extension registered for the same Runtime Hook and at least one of them fails, all the registered Runtime Extensions will be retried. See Idempotence.

Additional considerations about errors that apply only to a specific Runtime Hook will be documented in the hook-specific implementation documentation.

Tips & tricks

After you have implemented and deployed a Runtime Extension you can manually test it by sending HTTP requests, for example via kubectl:

Via kubectl create --raw:

# Send a Discovery Request to the webhook-service in namespace default with protocol https on port 443:
kubectl create --raw '/api/v1/namespaces/default/services/https:webhook-service:443/proxy/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovery' \
  -f <(echo '{"apiVersion":"hooks.runtime.cluster.x-k8s.io/v1alpha1","kind":"DiscoveryRequest"}') | jq

Via kubectl proxy and curl:

# Open a proxy with kubectl and then use curl to send the request
## First terminal:
kubectl proxy
## Second terminal:
curl -X 'POST' 'http://127.0.0.1:8001/api/v1/namespaces/default/services/https:webhook-service:443/proxy/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovery' \
  -d '{"apiVersion":"hooks.runtime.cluster.x-k8s.io/v1alpha1","kind":"DiscoveryRequest"}' | jq

For more details about the API of the Runtime Extensions, please see the Draft OpenAPI spec.
For more details on proxy support please see Proxies in Kubernetes.

Implementing Lifecycle Hook Runtime Extensions

Introduction

The lifecycle hooks allow hooking into the Cluster lifecycle. The following diagram provides an overview:

Lifecycle Hooks overview

Please see the corresponding CAEP for additional background information.

Guidelines

All guidelines defined in Implementing Runtime Extensions apply to the implementation of Runtime Extensions for lifecycle hooks as well.

In summary, Runtime Extensions are components that should be designed, written and deployed with great caution given that they can affect the proper functioning of the Cluster API runtime. A poorly implemented Runtime Extension could potentially block lifecycle transitions from happening.

The recommendations about idempotence, avoiding dependencies, deterministic results and error messages are especially relevant.

Definitions

BeforeClusterCreate

This hook is called after the Cluster object has been created by the user, immediately before all the objects which are part of a Cluster topology(*) are going to be created. Runtime Extension implementers can use this hook to determine/prepare add-ons for the Cluster and block the creation of those objects until everything is ready.

Example Request:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: BeforeClusterCreateRequest
settings: <Runtime Extension settings>
cluster:
  apiVersion: cluster.x-k8s.io/v1beta1
  kind: Cluster
  metadata:
   name: test-cluster
   namespace: test-ns
  spec:
   ...
  status:
   ...

Example Response:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: BeforeClusterCreateResponse
status: Success # or Failure
message: "error message if status == Failure"
retryAfterSeconds: 10

For additional details, you can see the full schema in the Draft OpenAPI spec.

(*) The objects which are part of a Cluster topology are the infrastructure Cluster, the Control Plane, the MachineDeployments and the templates derived from the ClusterClass.

AfterControlPlaneInitialized

This hook is called after the Control Plane for the Cluster is marked as available for the first time. Runtime Extension implementers can use this hook to execute tasks, for example component installation on workload clusters, that are only possible once the Control Plane is available. This hook does not block any further changes to the Cluster.

Example Request:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: AfterControlPlaneInitializedRequest
settings: <Runtime Extension settings>
cluster:
  apiVersion: cluster.x-k8s.io/v1beta1
  kind: Cluster
  metadata:
   name: test-cluster
   namespace: test-ns
  spec:
   ...
  status:
   ...

Example Response:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: AfterControlPlaneInitializedResponse
status: Success # or Failure
message: "error message if status == Failure"

For additional details, you can see the full schema in the Draft OpenAPI spec.

BeforeClusterUpgrade

This hook is called after the Cluster object has been updated with a new spec.topology.version by the user, and immediately before the new version is going to be propagated to the control plane (*). Runtime Extension implementers can use this hook to execute pre-upgrade add-on tasks and block upgrades of the ControlPlane and Workers.

Note: While the upgrade is blocked, changes made to the Cluster topology are delayed from propagating to the underlying objects while the object is waiting for the upgrade. For example, modifying the ControlPlane or MachineDeployments (think scale up), or creating new MachineDeployments, will be delayed until the target ControlPlane/MachineDeployment is ready to pick up the upgrade. This ensures that the ControlPlane and MachineDeployments do not perform a rollout prematurely while waiting to be rolled out again for the version upgrade (no double rollouts). It also ensures that any version-specific changes are only pushed to the underlying objects at the correct version.

Example Request:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: BeforeClusterUpgradeRequest
settings: <Runtime Extension settings>
cluster:
  apiVersion: cluster.x-k8s.io/v1beta1
  kind: Cluster
  metadata:
   name: test-cluster
   namespace: test-ns
  spec:
   ...
  status:
   ...
fromKubernetesVersion: "v1.21.2"
toKubernetesVersion: "v1.22.0"

Example Response:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: BeforeClusterUpgradeResponse
status: Success # or Failure
message: "error message if status == Failure"
retryAfterSeconds: 10

For additional details, you can see the full schema in the Draft OpenAPI spec.

(*) Under normal circumstances spec.topology.version gets propagated to the control plane immediately; however if previous upgrades or worker machine rollouts are still in progress, the system waits for those operations to complete before starting the new upgrade.

AfterControlPlaneUpgrade

This hook is called after the control plane has been upgraded to the version specified in spec.topology.version, and immediately before the new version is going to be propagated to the MachineDeployments of the Cluster. Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks and block upgrades to workers until everything is ready.

Note: While the MachineDeployments upgrade is blocked, changes to existing MachineDeployments and the creation of new MachineDeployments will be delayed. For example, modifying MachineDeployments (think scale up) or creating new MachineDeployments will be delayed until the target MachineDeployment is ready to pick up the upgrade. This ensures that the MachineDeployments do not perform a rollout prematurely while waiting to be rolled out again for the version upgrade (no double rollouts). It also ensures that any version-specific changes are only pushed to the underlying objects at the correct version.

Example Request:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: AfterControlPlaneUpgradeRequest
settings: <Runtime Extension settings>
cluster:
  apiVersion: cluster.x-k8s.io/v1beta1
  kind: Cluster
  metadata:
   name: test-cluster
   namespace: test-ns
  spec:
   ...
  status:
   ...
kubernetesVersion: "v1.22.0"

Example Response:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: AfterControlPlaneUpgradeResponse
status: Success # or Failure
message: "error message if status == Failure"
retryAfterSeconds: 10

For additional details, you can see the full schema in the Draft OpenAPI spec.

AfterClusterUpgrade

This hook is called after the Cluster, control plane and workers have been upgraded to the version specified in spec.topology.version. Runtime Extension implementers can use this hook to execute post-upgrade add-on tasks. This hook does not block any further changes or upgrades to the Cluster.

Example Request:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: AfterClusterUpgradeRequest
settings: <Runtime Extension settings>
cluster:
  apiVersion: cluster.x-k8s.io/v1beta1
  kind: Cluster
  metadata:
   name: test-cluster
   namespace: test-ns
  spec:
   ...
  status:
   ...
kubernetesVersion: "v1.22.0"

Example Response:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: AfterClusterUpgradeResponse
status: Success # or Failure
message: "error message if status == Failure"

For additional details, refer to the Draft OpenAPI spec.

BeforeClusterDelete

This hook is called after the Cluster deletion has been triggered by the user and immediately before the topology of the Cluster is going to be deleted. Runtime Extension implementers can use this hook to execute cleanup tasks for the add-ons and block deletion of the Cluster and descendant objects until everything is ready.

Example Request:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: BeforeClusterDeleteRequest
settings: <Runtime Extension settings>
cluster:
  apiVersion: cluster.x-k8s.io/v1beta1
  kind: Cluster
  metadata:
   name: test-cluster
   namespace: test-ns
  spec:
   ...
  status:
   ...

Example Response:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: BeforeClusterDeleteResponse
status: Success # or Failure
message: "error message if status == Failure"
retryAfterSeconds: 10

For additional details, you can see the full schema in the Draft OpenAPI spec.

Implementing Topology Mutation Hook Runtime Extensions

Introduction

Three different hooks are called as part of Topology Mutation - two in the Cluster topology reconciler and one in the ClusterClass reconciler.

Cluster topology reconciliation

  • GeneratePatches: GeneratePatches is responsible for generating patches for the entire Cluster topology.
  • ValidateTopology: ValidateTopology is called after all patches have been applied, which allows validating the resulting objects.

ClusterClass reconciliation

  • DiscoverVariables: DiscoverVariables is responsible for providing variable definitions for a specific external patch.

Cluster topology reconciliation

Please see the corresponding CAEP for additional background information.

Inline vs. external patches

Inline patches have the following advantages:

  • Inline patches are easier when getting started with ClusterClass as they are built into the Cluster API core controller; no external component has to be developed and managed.

External patches have the following advantages:

  • External patches can be individually written, unit tested and released/versioned.
  • External patches can leverage the full feature set of a programming language and are thus not limited to the capabilities of JSON patches and Go templating.
  • External patches can use external data (e.g. from cloud APIs) during patch generation.
  • External patches can be easily reused across ClusterClasses.

External variable definitions

The DiscoverVariables hook can be used to supply variable definitions for use in external patches. These variable definitions are added to the status of any applicable ClusterClasses. Clusters using the ClusterClass can then set values for those variables.

External variable discovery in the ClusterClass

External variable definitions are discovered by calling the DiscoverVariables runtime hook. This hook is called from the ClusterClass reconciler. Once discovered, the variable definitions are validated and stored in the ClusterClass status.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
# metadata
spec:
    # Inline variable definitions
    variables:
    # This variable is unique and can be accessed globally.
    - name: no-proxy
      required: true
      schema:
        openAPIV3Schema:
          type: string
          default: "internal.com"
          example: "internal.com"
          description: "comma-separated list of machine or domain names excluded from using the proxy."
    # This variable is also defined by an external DiscoverVariables hook.
    - name: http-proxy
      schema:
        openAPIV3Schema:
          type: string
          default: "proxy.example.com"
          example: "proxy.example.com"
          description: "proxy for http calls."
    # External patch definitions.
    patches:
    - name: lbImageRepository
      external:
          generateExtension: generate-patches.k8s-upgrade-with-runtimesdk
          validateExtension: validate-topology.k8s-upgrade-with-runtimesdk
          ## Call variable discovery for this patch.
          discoverVariablesExtension: discover-variables.k8s-upgrade-with-runtimesdk
status:
    # observedGeneration is used to check that the current version of the ClusterClass is the same as that when the Status was previously written.
    # if metadata.generation isn't the same as observedGeneration, Clusters using the ClusterClass should not reconcile.
    observedGeneration: xx
    # variables contains a list of all variable definitions, both inline and from external patches, that belong to the ClusterClass.
    variables:
      - name: no-proxy
        definitions:
          - from: inline
            required: true
            schema:
              openAPIV3Schema:
                type: string
                default: "internal.com"
                example: "internal.com"
                description: "comma-separated list of machine or domain names excluded from using the proxy."
      - name: http-proxy
        # definitionsConflict is true if there are non-equal definitions for a variable.
        definitionsConflict: true
        definitions:
          - from: inline
            schema:
              openAPIV3Schema:
                type: string
                default: "proxy.example.com"
                example: "proxy.example.com"
                description: "proxy for http calls."
          - from: lbImageRepository
            schema:
              openAPIV3Schema:
                type: string
                default: "different.example.com"
                example: "different.example.com"
                description: "proxy for http calls."

Variable definition conflicts

Variable definitions can be inline in the ClusterClass or from any number of external DiscoverVariables hooks. The source of a variable definition is recorded in the from field in ClusterClass .status.variables. Variables that are defined by an external DiscoverVariables hook will have the name of the patch they are associated with as the value of from. Variables that are defined in the ClusterClass .spec.variables will have inline as the value of from. Note: inline is a reserved name for patches. It cannot be used as the name of an external patch to avoid conflicts.

If all variables that share a name have equivalent schemas, the variable definitions are not in conflict. These variables can be set without providing a definitionFrom value (see below). The CAPI components will consider variable definitions to be equivalent when they share a name and their schemas are exactly equal.

Setting values for variables in the Cluster

Setting variables that are defined with external variable definitions requires attention to be paid to variable definition conflicts, as exposed in the ClusterClass status. Variable values are set in Cluster .spec.topology.variables.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
#metadata 
spec:
    topology:
      variables:
        # `definitionFrom` is not needed as this variable does not have conflicting definitions.
        - name: no-proxy
          value: "internal.domain.com"
        # variables with the same name but different definitions require values for each individual schema.
        - name: http-proxy
          definitionFrom: inline
          value: http://proxy.example2.com:1234
        - name: http-proxy
          definitionFrom: lbImageRepository
          value:
            host: proxy.example2.com
            port: 1234

Using one or multiple external patch extensions

Some considerations:

  • In general a single external patch extension is simpler than many, as only one extension then has to be built, deployed and managed.
  • A single extension also requires fewer HTTP round-trips between the CAPI controller and the extension(s).
  • With a single extension it is still possible to implement multiple logical features using different variables.
  • When implementing multiple logical features in one extension it’s recommended that they can be conditionally enabled/disabled via variables (either via certain values or by their existence).
  • Conway’s law might make it not feasible in large organizations to use a single extension. In those cases it’s important that boundaries between extensions are clearly defined.
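For example, one logical feature inside a multi-feature extension could be gated behind a variable like the following (the variable name and schema are illustrative assumptions):

```yaml
# Variable returned by DiscoverVariables; the corresponding patch only
# generates changes when .enabled is true, effectively switching the
# feature on or off per Cluster.
- name: proxyConfig
  required: false
  schema:
    openAPIV3Schema:
      type: object
      properties:
        enabled:
          type: boolean
          default: false
          description: "enabled switches the proxy feature on or off."
```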

Guidelines

For general Runtime Extension developer guidelines please refer to the guidelines in Implementing Runtime Extensions. This section outlines considerations specific to Topology Mutation hooks.

Patch extension guidelines

  • Input validation: An External Patch Extension must always validate its input, i.e. it must validate that all variables exist and have the right type, and it must validate the kind and apiVersion of the templates which should be patched.
  • Timeouts: As External Patch Extensions are called during each Cluster topology reconciliation, they must respond as fast as possible (<=200ms) to avoid delaying individual reconciles and congestion.
  • Availability: An External Patch Extension must always be available, otherwise Cluster topologies won’t be reconciled anymore.
  • Side Effects: An External Patch Extension must not make out-of-band changes. If necessary, external data can be retrieved, but be aware of the performance impact.
  • Deterministic results: For a given request (a set of templates and variables) an External Patch Extension must always return the same response (a set of patches). Otherwise the Cluster topology will never reach a stable state.
  • Idempotence: An External Patch Extension must only return patches if changes to the templates are required, i.e. unnecessary patches when the template is already in the desired state must be avoided.
  • Avoid Dependencies: An External Patch Extension must be independent of other External Patch Extensions. However if dependencies cannot be avoided, it is possible to control the order in which patches are executed via the ClusterClass.
  • Error messages: For a given request (a set of templates and variables) an External Patch Extension must always return the same error message. Otherwise the system might become unstable due to controllers being overloaded by continuous changes to Kubernetes resources as these messages are reported as conditions. See error messages.

Variable discovery guidelines

  • Distinctive variable names: Names should be carefully chosen, and if possible generic names should be avoided. Using a generic name could lead to conflicts if the variables defined for this patch are used in combination with other patches providing variables with the same name.
  • Avoid breaking changes to variable definitions: Changing a variable definition can lead to problems on existing clusters because reconciliation will stop if variable values do not match the updated definition. When more than one variable with the same name is defined, changes to variable definitions can require explicit values for each patch. Updates to the variable definition should be carefully evaluated, and very well documented in extension release notes, so ClusterClass authors can evaluate impacts of changes before performing an upgrade.

Definitions

GeneratePatches

A GeneratePatches call generates patches for the entire Cluster topology. Accordingly the request contains all templates, the global variables and the template-specific variables. The response contains generated patches.

Example request:

  • Generating patches for a Cluster topology is done via a single call to allow External Patch Extensions a holistic view of the entire Cluster topology. Additionally this allows us to reduce the number of round-trips.
  • Each item in the request will contain the template as a raw object. Additionally information about where the template is used is provided via holderReference.
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: GeneratePatchesRequest
settings: <Runtime Extension settings>
variables:
- name: <variable-name>
  value: <variable-value>
  ...
items:
- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9
  holderReference:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    namespace: default
    name: cluster-md1-xyz
    fieldPath: spec.template.spec.infrastructureRef
  object:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachineTemplate
    spec:
    ...
  variables:
  - name: <variable-name>
    value: <variable-value>
    ...

Example Response:

  • The response contains patches instead of full objects to reduce the payload.
  • Templates in the request and patches in the response will be correlated via UIDs.
  • Like inline patches, external patches are only allowed to change fields in spec.template.spec.
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: GeneratePatchesResponse
status: Success # or Failure
message: "error message if status == Failure"
items:
- uid: 7091de79-e26c-4af5-8be3-071bc4b102c9
  patchType: JSONPatch
  patch: <JSON-patch>
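For illustration, a JSONPatch payload follows RFC 6902; a hypothetical patch that only modifies fields under spec.template.spec (the field path and value below are made up) could look like:

```yaml
# RFC 6902 JSON Patch operations; path and value are illustrative.
- op: add
  path: /spec/template/spec/customImage
  value: kindest/node:v1.23.0
```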

For additional details, you can see the full schema in the Draft OpenAPI spec.

We are considering introducing a library to facilitate the development of External Patch Extensions. It would provide capabilities like:

  • Accessing builtin variables
  • Extracting certain templates from a GeneratePatches request (e.g. all bootstrap templates)

If you are interested in contributing to this library please reach out to the maintainer team or feel free to open an issue describing your idea or use case.

ValidateTopology

A ValidateTopology call validates the topology after all patches have been applied. The request contains all templates of the Cluster topology, the global variables and the template-specific variables. The response contains the result of the validation.

Example Request:

  • The request is the same as the GeneratePatches request except it doesn’t have uid fields. We don’t need them as we don’t have to correlate patches in the response.
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: ValidateTopologyRequest
settings: <Runtime Extension settings>
variables:
- name: <variable-name>
  value: <variable-value>
  ...
items:
- holderReference:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    namespace: default
    name: cluster-md1-xyz
    fieldPath: spec.template.spec.infrastructureRef
  object:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachineTemplate
    spec:
    ...
  variables:
  - name: <variable-name>
    value: <variable-value>
    ...

Example Response:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: ValidateTopologyResponse
status: Success # or Failure
message: "error message if status == Failure"

For additional details, you can see the full schema in the Draft OpenAPI spec.

DiscoverVariables

A DiscoverVariables call returns definitions for one or more variables.

Example Request:

  • The request is a simple call to the Runtime hook.
apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: DiscoverVariablesRequest
settings: <Runtime Extension settings>

Example Response:

apiVersion: hooks.runtime.cluster.x-k8s.io/v1alpha1
kind: DiscoverVariablesResponse
status: Success # or Failure
message: ""
variables:
  - name: etcdImageTag 
    required: true
    schema:
      openAPIV3Schema:
        type: string
        default: "3.5.3-0" 
        example: "3.5.3-0"
        description: "etcdImageTag sets the tag for the etcd image."
  - name: preLoadImages
    required: false
    schema:
      openAPIV3Schema:
        default: []
        type: array
        items:
          type: string
        description: "preLoadImages sets the images for the Docker machines to preload."
  - name: podSecurityStandard
    required: false
    schema:
      openAPIV3Schema:
        type: object
        properties:
          enabled:
            type: boolean
            default: true
            description: "enabled enables the patches to enable Pod Security Standard via AdmissionConfiguration."
          enforce:
            type: string
            default: "baseline"
            description: "enforce sets the level for the enforce PodSecurityConfiguration mode. One of privileged, baseline, restricted."
          audit:
            type: string
            default: "restricted"
            description: "audit sets the level for the audit PodSecurityConfiguration mode. One of privileged, baseline, restricted."
          warn:
            type: string
            default: "restricted"
            description: "warn sets the level for the warn PodSecurityConfiguration mode. One of privileged, baseline, restricted."
...

For additional details, you can see the full schema in the Draft OpenAPI spec. TODO: Add openAPI definition to the SwaggerUI

Dealing with Cluster API upgrades with apiVersion bumps

There are some special considerations regarding Cluster API upgrades when the upgrade includes a bump of the apiVersion of infrastructure, bootstrap or control plane provider CRDs.

When calling external patches, the Cluster topology controller always sends the templates in the apiVersion of the references in the ClusterClass.

While inline patches are always referring to one specific apiVersion, external patch implementations are more flexible. They can be written in a way that they are able to handle multiple apiVersions of a CRD. This can be done by calculating patches differently depending on which apiVersion is received by the external patch implementation.
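Such an implementation might be sketched as follows (the apiVersions and field paths below are made-up examples, not real provider fields):

```go
package main

// imageFieldPath computes the JSON patch path for an image field depending
// on which apiVersion of the template the extension received. Unknown
// apiVersions are skipped rather than patched with a guess.
func imageFieldPath(apiVersion string) (string, bool) {
	switch apiVersion {
	case "infrastructure.cluster.x-k8s.io/v1alpha4":
		// old apiVersion: hypothetical old field location
		return "/spec/template/spec/image", true
	case "infrastructure.cluster.x-k8s.io/v1beta1":
		// new apiVersion: hypothetical new field location
		return "/spec/template/spec/customImage", true
	default:
		return "", false
	}
}
```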

This allows users more flexibility during Cluster API upgrades:

Variant 1: External patch implementation supporting two apiVersions at the same time

  1. Update Cluster API
  2. Update the external patch implementation to be able to handle custom resources with the old and the new apiVersion
  3. Update the references in ClusterClasses to use the new apiVersion

Note: In this variant it doesn’t matter if Cluster API or the external patch implementation is updated first.

Variant 2: Deploy an additional instance of the external patch implementation which can handle the new apiVersion

  1. Upgrade Cluster API
  2. Deploy the new external patch implementation which is able to handle the new apiVersion
  3. Update ClusterClasses to use the new apiVersion and the new external patch implementation
  4. Remove the old external patch implementation as it’s not used anymore

Note: In this variant it doesn’t matter if Cluster API is updated or the new external patch implementation is deployed first.

Deploy Runtime Extensions

Cluster API requires that each Runtime Extension is deployed to an endpoint accessible from the Cluster API controllers. The recommended deployment model is to deploy a Runtime Extension in the management cluster by:

  • Packing the Runtime Extension in a container image.
  • Using a Kubernetes Deployment to run the above container inside the Management Cluster.
  • Using a ClusterIP Service to make the Runtime Extension instances accessible via a stable DNS name.
  • Using a cert-manager generated Certificate to protect the endpoint.
  • Registering the Runtime Extension using an ExtensionConfig.
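A minimal sketch of such a Service, matching the ExtensionConfig example above (the selector labels and target port are illustrative assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: test-runtime-sdk-svc
  namespace: default
spec:
  selector:
    app: test-runtime-extension # must match the Deployment's Pod labels
  ports:
  - port: 443        # port referenced by the ExtensionConfig clientConfig
    targetPort: 9443 # port the extension's HTTPS server listens on
```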

For an example, please see our test extension which follows, as closely as possible, the kubebuilder setup used for controllers in Cluster API.

There are a set of important guidelines that must be considered while choosing the deployment method:

Availability

It is recommended that Runtime Extensions leverage some form of load balancing to provide high availability and performance benefits. You can run multiple Runtime Extension servers behind a Kubernetes Service to leverage the load balancing that Services support.

Identity and access management

The security model for each Runtime Extension should be carefully defined, similar to any other application deployed in the Cluster. If the Runtime Extension requires access to the apiserver, the deployment must use a dedicated service account with limited RBAC permissions. Otherwise, no service account should be used.

On top of that, the container image for the Runtime Extension should be carefully designed in order to avoid privilege escalation (e.g. using distroless base images). The Pod spec in the Deployment manifest should enforce security best practices (e.g. do not use privileged pods).

Alternative deployment methods

Alternative deployment methods can be used as long as the HTTPS endpoint is accessible, for example:

  • deploying the HTTPS Server as a part of another component, e.g. a controller.
  • deploying the HTTPS Server outside the Management Cluster.

In those cases recommendations about availability and identity and access management still apply.

Experimental Feature: Ignition Bootstrap Config (alpha)

The default configuration engine for bootstrapping workload cluster machines is cloud-init. Ignition is an alternative engine used by Linux distributions such as Flatcar Container Linux and Fedora CoreOS and therefore should be used when choosing an Ignition-based distribution as the underlying OS for workload clusters.

This guide explains how to deploy an AWS workload cluster using Ignition.

Prerequisites

  • kubectl installed locally
  • clusterawsadm installed locally - download from the releases page of the AWS provider
  • kind and Docker installed locally (when using kind to create a management cluster)

Configure a management cluster

Follow this section of the quick start guide to deploy a Kubernetes cluster or connect to an existing one.

Follow this section of the quick start guide to install clusterctl.

Initialize the management cluster

Before workload clusters can be deployed, Cluster API components must be deployed to the management cluster.

Initialize the management cluster:

export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

# Workload clusters need to call the AWS API as part of their normal operation.
# The following command creates a CloudFormation stack which provisions the
# necessary IAM resources to be used by workload clusters.
clusterawsadm bootstrap iam create-cloudformation-stack

# The management cluster needs to call the AWS API in order to manage cloud
# resources for workload clusters. The following command tells clusterctl to
# store the AWS credentials provided before in a Kubernetes secret where they
# can be retrieved by the AWS provider running on the management cluster.
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

# Enable the feature gates controlling Ignition bootstrap.
export EXP_KUBEADM_BOOTSTRAP_FORMAT_IGNITION=true # Used by the kubeadm bootstrap provider
export EXP_BOOTSTRAP_FORMAT_IGNITION=true # Used by the AWS provider

# Initialize the management cluster.
clusterctl init --infrastructure aws

Generate a workload cluster configuration

# Deploy the workload cluster in the following AWS region.
export AWS_REGION=us-east-1

# Authorize the following SSH public key on cluster nodes.
export AWS_SSH_KEY_NAME=my-key

# Ignition bootstrap data needs to be stored in an S3 bucket so that nodes can
# read them at boot time. Store Ignition bootstrap data in the following bucket.
export AWS_S3_BUCKET_NAME=my-bucket

# Set the EC2 machine size for controllers and workers.
export AWS_CONTROL_PLANE_MACHINE_TYPE=t3a.small
export AWS_NODE_MACHINE_TYPE=t3a.small

clusterctl generate cluster ignition-cluster \
    --from https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/templates/cluster-template-flatcar.yaml \
    --kubernetes-version v1.28.0 \
    --worker-machine-count 2 \
    > ignition-cluster.yaml

NOTE: Only certain Kubernetes versions have pre-built Kubernetes AMIs. See list of published pre-built Kubernetes AMIs.

Apply the workload cluster

kubectl apply -f ignition-cluster.yaml

Wait for the control plane of the workload cluster to become initialized:

kubectl get kubeadmcontrolplane ignition-cluster-control-plane

This could take a while. When the control plane is initialized, the INITIALIZED field should be true:

NAME                             CLUSTER            INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE    VERSION
ignition-cluster-control-plane   ignition-cluster   true                                 1                  1         1             7m7s   v1.22.2

Connect to the workload cluster

Generate a kubeconfig for the workload cluster:

clusterctl get kubeconfig ignition-cluster > ./kubeconfig

Set kubectl to use the generated kubeconfig:

export KUBECONFIG=$(pwd)/kubeconfig

Verify connectivity with the workload cluster’s API server:

kubectl cluster-info

Sample output:

Kubernetes control plane is running at https://ignition-cluster-apiserver-284992524.us-east-1.elb.amazonaws.com:6443
CoreDNS is running at https://ignition-cluster-apiserver-284992524.us-east-1.elb.amazonaws.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Deploy a CNI plugin

A CNI plugin must be deployed to the workload cluster for the cluster to become ready. We use Calico here; however, other CNI plugins could be used as well.

kubectl apply -f https://docs.projectcalico.org/v3.20/manifests/calico.yaml

Ensure all cluster nodes become ready:

kubectl get nodes

Sample output:

NAME                                            STATUS   ROLES                  AGE   VERSION
ip-10-0-122-154.us-east-1.compute.internal   Ready    control-plane,master   14m   v1.22.2
ip-10-0-127-59.us-east-1.compute.internal    Ready    <none>                 13m   v1.22.2
ip-10-0-89-169.us-east-1.compute.internal    Ready    <none>                 13m   v1.22.2

Clean up

Delete the workload cluster (from a shell connected to the management cluster):

kubectl delete cluster ignition-cluster

Caveats

Supported infrastructure providers

Cluster API has multiple infrastructure providers which can be used to deploy workload clusters.

The following infrastructure providers already have Ignition support:

Ignition support will be added to more providers in the future.

Running multiple providers

Cluster API supports running multiple infrastructure/bootstrap/control plane providers on the same management cluster. It’s highly recommended to rely on clusterctl init command in this case. clusterctl will help ensure that all providers support the same API Version of Cluster API (contract).

Verification of CAPI artifacts

Requirements

You will need to have the following tools installed:

CAPI Images

Each release of the Cluster API project includes the following container images:

  • cluster-api-controller
  • kubeadm-bootstrap-controller
  • kubeadm-control-plane-controller
  • clusterctl

Verifying Image Signatures

All four images are hosted by registry.k8s.io. In order to verify the authenticity of the images, you can use the cosign verify command with the appropriate image name and version:

$ cosign verify registry.k8s.io/cluster-api/cluster-api-controller:v1.5.0 --certificate-identity krel-trust@k8s-releng-prod.iam.gserviceaccount.com --certificate-oidc-issuer https://accounts.google.com | jq .
Verification for registry.k8s.io/cluster-api/cluster-api-controller:v1.5.0 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - The code-signing certificate was verified using trusted certificate authority certificates
[
  {
    "critical": {
      "identity": {
        "docker-reference": "registry.k8s.io/cluster-api/cluster-api-controller"
      },
      "image": {
        "docker-manifest-digest": "sha256:f34016d3a494f9544a16137c9bba49d8756c574a0a1baf96257903409ef82f77"
      },
      "type": "cosign container image signature"
    },
    "optional": {
      "1.3.6.1.4.1.57264.1.1": "https://accounts.google.com",
      "Bundle": {
        "SignedEntryTimestamp": "MEYCIQDtxr/v3uRl2QByVfYo1oopruADSaH3E4wThpmkibJs8gIhAIe0odbk99na5GBdYGjJ6IwpFzhlTlicgWOrsgxZH8LC",
        "Payload": {
          "body": "eyJhcGlWZXJzaW9uIjoiMC4wLjEiLCJraW5kIjoiaGFzaGVkcmVrb3JkIiwic3BlYyI6eyJkYXRhIjp7Imhhc2giOnsiYWxnb3JpdGhtIjoic2hhMjU2IiwidmFsdWUiOiIzMDMzNzY0MTQwZmI2OTE5ZjRmNDg2MDgwMDZjYzY1ODU2M2RkNjE0NWExMzVhMzE5MmQyYTAzNjE1OTRjMTRlIn19LCJzaWduYXR1cmUiOnsiY29udGVudCI6Ik1FUUNJQ3RtcGdHN3RDcXNDYlk0VlpXNyt6Rm5tYWYzdjV4OTEwcWxlWGppdTFvbkFpQS9JUUVSSDErdit1a0hrTURSVnZnN1hPdXdqTTN4REFOdEZyS3NUMHFzaUE9PSIsInB1YmxpY0tleSI6eyJjb250ZW50IjoiTFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVTTJha05EUVc1SFowRjNTVUpCWjBsVldqYzNUbGRSV1VacmQwNTVRMk13Y25GWWJIcHlXa3RyYURjMGQwTm5XVWxMYjFwSmVtb3dSVUYzVFhjS1RucEZWazFDVFVkQk1WVkZRMmhOVFdNeWJHNWpNMUoyWTIxVmRWcEhWakpOVWpSM1NFRlpSRlpSVVVSRmVGWjZZVmRrZW1SSE9YbGFVekZ3WW01U2JBcGpiVEZzV2tkc2FHUkhWWGRJYUdOT1RXcE5kMDU2U1RGTlZHTjNUa1JOTlZkb1kwNU5hazEzVG5wSk1VMVVZM2hPUkUwMVYycEJRVTFHYTNkRmQxbElDa3R2V2tsNmFqQkRRVkZaU1V0dldrbDZhakJFUVZGalJGRm5RVVZ4VEdveFJsSmhLM2RZTUVNd0sxYzFTVlZWUW14UmRsWkNWM2xLWTFRcmFWaERjV01LWTA4d1prVmpNV2s0TVUxSFQwRk1lVXB2UXpGNk5TdHVaRGxFUnpaSGNFSmpOV0ZJYXpoU1QxaDBOV2h6U21wa1VVdFBRMEZhUVhkblowZE5UVUUwUndwQk1WVmtSSGRGUWk5M1VVVkJkMGxJWjBSQlZFSm5UbFpJVTFWRlJFUkJTMEpuWjNKQ1owVkdRbEZqUkVGNlFXUkNaMDVXU0ZFMFJVWm5VVlYxTVRoMENqWjVWMWxNVlU5RVR5dEVjek52VVU1RFNsYzNZMUJWZDBoM1dVUldVakJxUWtKbmQwWnZRVlV6T1ZCd2VqRlphMFZhWWpWeFRtcHdTMFpYYVhocE5Ga0tXa1E0ZDFGQldVUldVakJTUVZGSUwwSkVXWGRPU1VWNVlUTktiR0pETVRCamJsWjZaRVZDY2s5SVRYUmpiVlp6V2xjMWJreFlRbmxpTWxGMVlWZEdkQXBNYldSNldsaEtNbUZYVG14WlYwNXFZak5XZFdSRE5XcGlNakIzUzFGWlMwdDNXVUpDUVVkRWRucEJRa0ZSVVdKaFNGSXdZMGhOTmt4NU9XaFpNazUyQ21SWE5UQmplVFZ1WWpJNWJtSkhWWFZaTWpsMFRVTnpSME5wYzBkQlVWRkNaemM0ZDBGUlowVklVWGRpWVVoU01HTklUVFpNZVRsb1dUSk9kbVJYTlRBS1kzazFibUl5T1c1aVIxVjFXVEk1ZEUxSlIwdENaMjl5UW1kRlJVRmtXalZCWjFGRFFraDNSV1ZuUWpSQlNGbEJNMVF3ZDJGellraEZWRXBxUjFJMFl3cHRWMk16UVhGS1MxaHlhbVZRU3pNdmFEUndlV2RET0hBM2J6UkJRVUZIU21wblMxQmlkMEZCUWtGTlFWSjZRa1pCYVVKSmJXeGxTWEFyTm05WlpVWm9DbWRFTTI1Uk5sazBSV2g2U25SVmMxRTRSSEJrWTFGeU5FSk1XRE41ZDBsb1FVdFhkV05tYmxCUk9GaExPWGRZYkVwcVNWQTBZMFpFT0c1blpIazRkV29LYldreGN6RkRTa
mczTW1zclRVRnZSME5EY1VkVFRUUTVRa0ZOUkVFeVkwRk5SMUZEVFVoaU9YRjBSbGQxT1VGUU1FSXpaR3RKVkVZNGVrazRZVEkxVUFwb2IwbFBVVlJLVWxKeGFsVmlUMkUyVnpOMlRVZEJOWFpKTlZkVVJqQkZjREZwTWtGT2QwbDNSVko0TW5ocWVtWjNjbmRPYmxoUVpEQjRjbmd3WWxoRENtUmpOV0Z4WWxsWlVsRXdMMWhSVVdONFRFVnRkVGwzUnpGRlYydFNNWE01VEdaUGVHZDNVMjRLTFMwdExTMUZUa1FnUTBWU1ZFbEdTVU5CVkVVdExTMHRMUW89In19fX0=",
          "integratedTime": 1690304684,
          "logIndex": 28719030,
          "logID": "c0d23d6ad406973f9559f3ba2d1ca01f84147d8ffc5b8445c224f98b9591801d"
        }
      },
      "Issuer": "https://accounts.google.com",
      "Subject": "krel-trust@k8s-releng-prod.iam.gserviceaccount.com",
      "org.kubernetes.kpromo.version": "kpromo-v4.0.3-5-ge99897c"
    }
  }
]
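To check all four images of a release in one go, the same command can be wrapped in a small shell loop (a sketch; the release version is the one from the example above):

```shell
# Verify the signatures of all four CAPI images for a given release.
VERSION=v1.5.0
for image in cluster-api-controller kubeadm-bootstrap-controller \
             kubeadm-control-plane-controller clusterctl; do
  cosign verify "registry.k8s.io/cluster-api/${image}:${VERSION}" \
    --certificate-identity krel-trust@k8s-releng-prod.iam.gserviceaccount.com \
    --certificate-oidc-issuer https://accounts.google.com
done
```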

Diagnostics

Introduction

With CAPI v1.6 we introduced new flags that allow serving metrics, the pprof endpoint, and an endpoint to dynamically change log levels securely in production.

This feature is enabled by default via:

          args:
            - "--diagnostics-address=${CAPI_DIAGNOSTICS_ADDRESS:=:8443}"
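The ${CAPI_DIAGNOSTICS_ADDRESS:=:8443} expression relies on clusterctl’s variable substitution, which follows shell-style defaulting: if the variable is unset, the default after := applies. A quick shell illustration of the same expansion:

```shell
# If the variable is unset, the default after := is used...
unset CAPI_DIAGNOSTICS_ADDRESS
echo "--diagnostics-address=${CAPI_DIAGNOSTICS_ADDRESS:=:8443}"   # -> --diagnostics-address=:8443

# ...and an exported value overrides the default.
export CAPI_DIAGNOSTICS_ADDRESS="localhost:8080"
echo "--diagnostics-address=${CAPI_DIAGNOSTICS_ADDRESS:=:8443}"   # -> --diagnostics-address=localhost:8080
```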

As soon as the feature is enabled, the metrics endpoint is served via HTTPS and protected via authentication and authorization. This works the same way as metrics in core Kubernetes components: Metrics in Kubernetes.

To continue serving metrics via http the following configuration can be used:

          args:
            - "--diagnostics-address=localhost:8080"
            - "--insecure-diagnostics"

The same can be achieved via clusterctl:

export CAPI_DIAGNOSTICS_ADDRESS="localhost:8080"
export CAPI_INSECURE_DIAGNOSTICS="true"
clusterctl init ...

Note: If insecure serving is configured the pprof and log level endpoints are disabled for security reasons.

Scraping metrics

A ServiceAccount token is now required to scrape metrics. The corresponding ServiceAccount needs permissions on the /metrics path. This can be achieved e.g. by following the Kubernetes documentation.

via Prometheus

With the Prometheus Helm chart it is as easy as using the following config for the Prometheus job scraping the Cluster API controllers:

    scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      # The diagnostics endpoint is using a self-signed certificate, so we don't verify it.
      insecure_skip_verify: true

For more details please see our Prometheus development setup: Prometheus

Note: The Prometheus Helm chart deploys the required ClusterRole out-of-the-box.

via kubectl

First deploy the following RBAC configuration:

cat << EOT | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: default-metrics
rules:
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: default-metrics
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
EOT

Then let’s open a port-forward, create a ServiceAccount token and scrape the metrics:

# Terminal 1
kubectl -n capi-system port-forward deployments/capi-controller-manager 8443

# Terminal 2
TOKEN=$(kubectl create token default)
curl https://localhost:8443/metrics --header "Authorization: Bearer $TOKEN" -k

Collecting profiles

via Parca

Parca can be used to continuously scrape profiles from CAPI providers. For more details please see our Parca development setup: parca

via kubectl

First deploy the following RBAC configuration:

cat << EOT | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: default-pprof
rules:
- nonResourceURLs:
  - "/debug/pprof/*"
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-pprof
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: default-pprof
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
EOT

Then let’s open a port-forward, create a ServiceAccount token and scrape the profile:

# Terminal 1
kubectl -n capi-system port-forward deployments/capi-controller-manager 8443

# Terminal 2
TOKEN=$(kubectl create token default)

# Get a goroutine dump
curl "https://localhost:8443/debug/pprof/goroutine?debug=2" --header "Authorization: Bearer $TOKEN" -k > ./goroutine.txt

# Get a profile
curl "https://localhost:8443/debug/pprof/profile?seconds=10" --header "Authorization: Bearer $TOKEN" -k > ./profile.out
go tool pprof -http=:8080 ./profile.out

Changing the log level

via kubectl

First deploy the following RBAC configuration:

cat << EOT | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: default-loglevel
rules:
- nonResourceURLs:
  - "/debug/flags/v"
  verbs:
  - put
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-loglevel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: default-loglevel
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
EOT

Then let’s open a port-forward, create a ServiceAccount token and change the log level to 8:

# Terminal 1
kubectl -n capi-system port-forward deployments/capi-controller-manager 8443

# Terminal 2
TOKEN=$(kubectl create token default)
curl "https://localhost:8443/debug/flags/v" --header "Authorization: Bearer $TOKEN" -X PUT -d '8' -k

Security Guidelines

This section provides security guidelines for provisioning clusters that are secure by default, following the secure defaults guidelines for cloud native apps.

Pod Security Standards

Pod Security Admission allows applying Pod Security Standards during creation of pods at the cluster level.

The flavor development-topology for the Docker provider used in Quick Start already includes a basic Pod Security Standard configuration. It is using ClusterClass variables and patches to inject the configuration.

Adding a basic Pod Security Standards configuration to a ClusterClass

By adding the following variables and patches, Pod Security Standards can be added to every ClusterClass that references a kubeadm-based control plane.

Adding the variables to a ClusterClass

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
spec:
  variables:
  - name: podSecurityStandard
    required: false
    schema:
      openAPIV3Schema:
        type: object
        properties: 
          enabled: 
            type: boolean
            default: true
            description: "enabled enables the patches to enable Pod Security Standard via AdmissionConfiguration."
          enforce:
            type: string
            default: "baseline"
            description: "enforce sets the level for the enforce PodSecurityConfiguration mode. One of privileged, baseline, restricted."
            pattern: "privileged|baseline|restricted"
          audit:
            type: string
            default: "restricted"
            description: "audit sets the level for the audit PodSecurityConfiguration mode. One of privileged, baseline, restricted."
            pattern: "privileged|baseline|restricted"
          warn:
            type: string
            default: "restricted"
            description: "warn sets the level for the warn PodSecurityConfiguration mode. One of privileged, baseline, restricted."
            pattern: "privileged|baseline|restricted"
  ...
  • The version field in Pod Security Admission Config defaults to latest.
  • The kube-system namespace is exempt from Pod Security Standards enforcement, because it runs control-plane pods that need higher privileges.

Adding the patches to a ClusterClass

The following snippet contains the patch to be added to the ClusterClass.

Due to limitations of ClusterClass with patches there are two versions for this patch.

Use this patch if the following keys already exist inside the KubeadmControlPlaneTemplate referred by the ClusterClass:

  • .spec.template.spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraVolumes
  • .spec.template.spec.kubeadmConfigSpec.files
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
spec:
  ...
  patches:
  - name: podSecurityStandard
    description: "Adds an admission configuration for PodSecurity to the kube-apiserver."
    definitions:
    - selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
      jsonPatches:
      - op: add
        path: "/spec/template/spec/kubeadmConfigSpec/clusterConfiguration/apiServer/extraArgs"
        value:
          admission-control-config-file: "/etc/kubernetes/kube-apiserver-admission-pss.yaml"
      - op: add
        path: "/spec/template/spec/kubeadmConfigSpec/clusterConfiguration/apiServer/extraVolumes/-"
        value:
          name: admission-pss
          hostPath: /etc/kubernetes/kube-apiserver-admission-pss.yaml
          mountPath: /etc/kubernetes/kube-apiserver-admission-pss.yaml
          readOnly: true
          pathType: "File"
      - op: add
        path: "/spec/template/spec/kubeadmConfigSpec/files/-"
        valueFrom:
          template: |
            content: |
              apiVersion: apiserver.config.k8s.io/v1
              kind: AdmissionConfiguration
              plugins:
              - name: PodSecurity
                configuration:
                  apiVersion: pod-security.admission.config.k8s.io/v1{{ if semverCompare "< v1.25" .builtin.controlPlane.version }}beta1{{ end }}
                  kind: PodSecurityConfiguration
                  defaults:
                    enforce: "{{ .podSecurity.enforce }}"
                    enforce-version: "latest"
                    audit: "{{ .podSecurity.audit }}"
                    audit-version: "latest"
                    warn: "{{ .podSecurity.warn }}"
                    warn-version: "latest"
                  exemptions:
                    usernames: []
                    runtimeClasses: []
                    namespaces: [kube-system]
            path: /etc/kubernetes/kube-apiserver-admission-pss.yaml
    enabledIf: "{{ .podSecurityStandard.enabled }}"
...

Use this patch if the following keys do not exist inside the KubeadmControlPlaneTemplate referred by the ClusterClass:

  • .spec.template.spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraVolumes
  • .spec.template.spec.kubeadmConfigSpec.files

Attention: Existing values inside the KubeadmControlPlaneTemplate at the mentioned keys will be replaced by this patch.

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
spec:
  ...
  patches:
  - name: podSecurityStandard
    description: "Adds an admission configuration for PodSecurity to the kube-apiserver."
    definitions:
    - selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
      jsonPatches:
      - op: add
        path: "/spec/template/spec/kubeadmConfigSpec/clusterConfiguration/apiServer/extraArgs"
        value:
          admission-control-config-file: "/etc/kubernetes/kube-apiserver-admission-pss.yaml"
      - op: add
        path: "/spec/template/spec/kubeadmConfigSpec/clusterConfiguration/apiServer/extraVolumes"
        value:
        - name: admission-pss
          hostPath: /etc/kubernetes/kube-apiserver-admission-pss.yaml
          mountPath: /etc/kubernetes/kube-apiserver-admission-pss.yaml
          readOnly: true
          pathType: "File"
      - op: add
        path: "/spec/template/spec/kubeadmConfigSpec/files"
        valueFrom:
          template: |
            - content: |
                apiVersion: apiserver.config.k8s.io/v1
                kind: AdmissionConfiguration
                plugins:
                - name: PodSecurity
                  configuration:
                    apiVersion: pod-security.admission.config.k8s.io/v1{{ if semverCompare "< v1.25" .builtin.controlPlane.version }}beta1{{ end }}
                    kind: PodSecurityConfiguration
                    defaults:
                      enforce: "{{ .podSecurity.enforce }}"
                      enforce-version: "latest"
                      audit: "{{ .podSecurity.audit }}"
                      audit-version: "latest"
                      warn: "{{ .podSecurity.warn }}"
                      warn-version: "latest"
                    exemptions:
                      usernames: []
                      runtimeClasses: []
                      namespaces: [kube-system]
              path: /etc/kubernetes/kube-apiserver-admission-pss.yaml
    enabledIf: "{{ .podSecurityStandard.enabled }}"
...

Create a secure Cluster using the ClusterClass

After adding the variables and patches, the Pod Security Standards are applied by default. It is also possible to disable this patch or configure different levels using variables.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: "my-cluster"
spec:
  ...
  topology:
    ...
    class: my-secure-cluster-class
    variables:
    - name: podSecurityStandard
      value: 
        enabled: true
        enforce: "restricted"

Overview of clusterctl

The clusterctl CLI tool handles the lifecycle of a Cluster API management cluster.

The clusterctl command line interface is specifically designed for providing a simple “day 1 experience” and a quick start with Cluster API. It automates fetching the YAML files defining provider components and installing them.

Additionally, it encodes a set of best practices for managing providers that helps users avoid misconfigurations and manage day 2 operations such as upgrades.

Below you can find a list of main clusterctl commands:

For the full list of clusterctl commands please refer to commands.

Avoiding GitHub rate limiting

While using providers hosted on GitHub, clusterctl calls the GitHub API, which is rate limited; the free tier is enough for normal usage, but users who use clusterctl extensively might hit the rate limit.

To avoid rate limiting for the public repos set the GITHUB_TOKEN environment variable. To generate a token follow this documentation. The token only needs repo scope for clusterctl.

By default clusterctl uses a Go proxy to detect the available versions and prevent additional calls to the GitHub API. The Go proxy URL can be configured via the GOPROXY variable, as for Go itself (defaults to https://proxy.golang.org). To fall back immediately to the GitHub client and not use a Go proxy, set the variable to GOPROXY=off or GOPROXY=direct. If a provider does not follow Go’s semantic versioning, clusterctl may fail to detect the correct version. In such cases, consider disabling the Go proxy functionality via GOPROXY=off.
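For example, to bypass the Go proxy entirely and query the GitHub API directly:

```shell
# Query the GitHub API directly instead of going through the Go proxy.
export GOPROXY=off
clusterctl init --infrastructure aws
```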

Installing clusterctl

Instructions are available in the Quick Start.

clusterctl commands

  • clusterctl alpha rollout - Manages the rollout of Cluster API resources. For example: MachineDeployments.
  • clusterctl alpha topology plan - Describes the changes to a cluster topology for a given input.
  • clusterctl completion - Output shell completion code for the specified shell (bash or zsh).
  • clusterctl config - Display clusterctl configuration.
  • clusterctl delete - Delete one or more providers from the management cluster.
  • clusterctl describe cluster - Describe workload clusters.
  • clusterctl generate cluster - Generate templates for creating workload clusters.
  • clusterctl generate provider - Generate templates for provider components.
  • clusterctl generate yaml - Process yaml using clusterctl’s yaml processor.
  • clusterctl get kubeconfig - Gets the kubeconfig file for accessing a workload cluster.
  • clusterctl help - Help about any command.
  • clusterctl init - Initialize a management cluster.
  • clusterctl init list-images - Lists the container images required for initializing the management cluster.
  • clusterctl move - Move Cluster API objects and all their dependencies between management clusters.
  • clusterctl upgrade plan - Provide a list of recommended target versions for upgrading Cluster API providers in a management cluster.
  • clusterctl upgrade apply - Apply new versions of Cluster API core and providers in a management cluster.
  • clusterctl version - Print clusterctl version.

clusterctl init

The clusterctl init command installs the Cluster API components and transforms the Kubernetes cluster into a management cluster.

This document provides more detail on how clusterctl init works and on the supported options for customizing your management cluster.

Defining the management cluster

The clusterctl init command accepts as input a list of providers to install.

Automatically installed providers

The clusterctl init command automatically adds the cluster-api core provider, the kubeadm bootstrap provider, and the kubeadm control-plane provider to the list of providers to install. This allows users to use a concise command syntax for initializing a management cluster. For example, to get a fully operational management cluster with the aws infrastructure provider, the cluster-api core provider, the kubeadm bootstrap, and the kubeadm control-plane provider, use the command:

clusterctl init --infrastructure aws

Provider version

The clusterctl init command by default installs the latest version available for each selected provider.
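To pin a specific provider version instead, the provider:version syntax can be used, the same way it is used with clusterctl generate cluster later in this book (the version below is illustrative):

```shell
# Install a specific version of the AWS infrastructure provider
# instead of the latest available one (version is illustrative).
clusterctl init --infrastructure aws:v2.3.0
```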

Target namespace

The clusterctl init command by default installs each provider in the default target namespace defined by each provider, e.g. capi-system for the Cluster API core provider.

See the provider documentation for more details.

Provider repositories

To access provider-specific information, such as the components YAML to be used for installing a provider, clusterctl init accesses the provider repositories, which are well-known places where the release assets for a provider are published.

By default clusterctl uses a Go proxy to detect the available versions and prevent additional calls to the GitHub API. The Go proxy URL can be configured via the GOPROXY variable, as for Go itself (defaults to https://proxy.golang.org). To fall back immediately to the GitHub client and not use a Go proxy, set the variable to GOPROXY=off or GOPROXY=direct. If a provider does not follow Go’s semantic versioning, clusterctl may fail to detect the correct version. In such cases, consider disabling the Go proxy functionality via GOPROXY=off.

See clusterctl configuration for more info about provider repository configurations.

Variable substitution

Providers can use variables in the components YAML published in the provider’s repository.

During clusterctl init, those variables are replaced with environment variables or with variables read from the clusterctl configuration.

Additional information

When installing a provider, the clusterctl init command executes a set of steps to simplify the lifecycle management of the provider’s components.

  • All the provider’s components are labeled, so they can be easily identified in subsequent moments of the provider’s lifecycle, e.g. upgrades.
labels:
  clusterctl.cluster.x-k8s.io: ""
  cluster.x-k8s.io/provider: "<provider-name>"
  • An additional Provider object is created in the target namespace where the provider is installed. This object keeps track of the provider version, and other useful information for the inventory of the providers currently installed in the management cluster.
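For instance, this inventory can be inspected with kubectl (a sketch; the short name providers maps to the Provider objects mentioned above, and output columns may vary by version):

```shell
# List the Provider inventory objects created by clusterctl
# in all namespaces of the management cluster.
kubectl get providers -A
```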

Cert-manager

Cluster API providers require a cert-manager version supporting the cert-manager.io/v1 API to be installed in the cluster.

While doing init, clusterctl checks whether a version of cert-manager is already installed. If not, clusterctl installs a default version (currently cert-manager v1.14.4). See clusterctl configuration for available options to customize this operation.
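For example, a different cert-manager version can be requested via the clusterctl configuration file (a sketch; see the clusterctl configuration documentation for the exact fields, and the version shown is illustrative):

```yaml
# clusterctl configuration file (e.g. ~/.cluster-api/clusterctl.yaml)
cert-manager:
  version: "v1.14.4"
```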

Avoiding GitHub rate limiting

Follow this

clusterctl generate cluster

The clusterctl generate cluster command returns a YAML template for creating a workload cluster.

For example

clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 --control-plane-machine-count=3 --worker-machine-count=3 > my-cluster.yaml

Generates a YAML file named my-cluster.yaml with a predefined list of Cluster API objects (Cluster, Machines, Machine Deployments, etc.) to be deployed in the current namespace (if needed, use the --target-namespace flag to specify a different target namespace).

Then, the file can be modified using your editor of choice; when ready, run the following command to apply the cluster manifest.

kubectl apply -f my-cluster.yaml

Selecting the infrastructure provider to use

The clusterctl generate cluster command uses smart defaults in order to simplify the user experience; in the example above, it detects that there is only an aws infrastructure provider in the current management cluster and so it automatically selects a cluster template from the aws provider’s repository.

In case there is more than one infrastructure provider, the following syntax can be used to select which infrastructure provider to use for the workload cluster:

clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
    --infrastructure aws > my-cluster.yaml

or

clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
    --infrastructure aws:v0.4.1 > my-cluster.yaml

Flavors

Infrastructure provider authors can provide different types of cluster templates, or flavors; use the --flavor flag to specify which flavor to use, e.g.

clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
    --flavor high-availability > my-cluster.yaml

Please refer to the providers documentation for more info about available flavors.

Alternative source for cluster templates

clusterctl uses the provider’s repository as a primary source for cluster templates; the following alternative sources for cluster templates can be used as well:

ConfigMaps

Use the --from-config-map flag to read cluster templates stored in a Kubernetes ConfigMap; e.g.

clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
    --from-config-map my-templates > my-cluster.yaml

The following flags are also available: --from-config-map-namespace (defaults to the current namespace) and --from-config-map-key (defaults to template).

GitHub, raw template URL, local file system folder or standard input

Use the --from flag to read cluster templates stored in a GitHub repository, raw template URL, in a local file system folder, or from the standard input; e.g.

clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
   --from https://github.com/my-org/my-repository/blob/main/my-template.yaml > my-cluster.yaml

or

clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
   --from https://foo.bar/my-template.yaml > my-cluster.yaml

or

clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
   --from ~/my-template.yaml > my-cluster.yaml

or

cat ~/my-template.yaml | clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
    --from - > my-cluster.yaml

Variables

If the selected cluster template expects some environment variables, the user should ensure those variables are set in advance.

E.g. if the AWS_CREDENTIALS variable is expected by a cluster template targeting the aws infrastructure, you should ensure the corresponding environment variable is set before executing clusterctl generate cluster.

Please refer to the provider's documentation for more info about the required variables, or use the clusterctl generate cluster --list-variables flag to get a list of variable names required by a cluster template.
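
For instance, to inspect which variables a template expects before generating any manifest (the aws provider here is only an example):

```shell
# Print the variables (and their defaults, if any) used by the selected
# cluster template; no manifest is generated.
clusterctl generate cluster my-cluster --kubernetes-version v1.28.0 \
    --infrastructure aws --list-variables
```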

The clusterctl configuration file can be used as an alternative to environment variables.
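
As a sketch, the same values can be stored in the clusterctl configuration file (typically $XDG_CONFIG_HOME/cluster-api/clusterctl.yaml; older releases used $HOME/.cluster-api/clusterctl.yaml). The variable names below are illustrative; check your provider's documentation for the exact variables it requires:

```yaml
# clusterctl.yaml - variable values used for template substitution.
# AWS_B64ENCODED_CREDENTIALS is an example variable name, not a guaranteed one.
AWS_B64ENCODED_CREDENTIALS: "base64-encoded-credentials"
CONTROL_PLANE_MACHINE_COUNT: "3"
WORKER_MACHINE_COUNT: "3"
```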

clusterctl generate provider

Generate templates for provider components.

clusterctl fetches the provider components from the provider repository and performs variable substitution.

Variable values are either sourced from the clusterctl config file or from environment variables.

Usage: clusterctl generate provider [flags]

Current usage of the command is as follows:

# Generates a yaml file for creating provider with variable values using
# components defined in the provider repository.
clusterctl generate provider --infrastructure aws

# Generates a yaml file for creating provider for a specific version with variable values using
# components defined in the provider repository.
clusterctl generate provider --infrastructure aws:v0.4.1

# Displays information about a specific infrastructure provider.
# If applicable, prints out the list of required environment variables.
clusterctl generate provider --infrastructure aws --describe

# Displays information about a specific version of the infrastructure provider.
clusterctl generate provider --infrastructure aws:v0.4.1 --describe

# Generates a yaml file for creating provider for a specific version.
# No variables will be processed and substituted using this flag
clusterctl generate provider --infrastructure aws:v0.4.1 --raw

clusterctl generate yaml

The clusterctl generate yaml command processes yaml using clusterctl’s yaml processor.

The intent of this command is to allow users who may have specific templates to leverage clusterctl’s yaml processor for variable substitution. For example, this command can be leveraged in local and CI scripts or for development purposes.

clusterctl ships with a simple yaml processor that performs variable substitution, taking default values into account. Under the hood, clusterctl’s yaml processor uses drone/envsubst to replace variables and apply defaults where necessary.
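
As a sketch of the substitution syntax, a template can reference variables with or without defaults using the bash-like expressions supported by drone/envsubst (the variable names below are illustrative, not part of any real template):

```yaml
# Fragment of a hypothetical cluster template.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  # ${CLUSTER_NAME} must be supplied via an env var or the clusterctl config file.
  name: ${CLUSTER_NAME}
  # ${NAMESPACE:=default} falls back to "default" when the variable is unset.
  namespace: ${NAMESPACE:=default}
```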

Variable values are either sourced from the clusterctl config file or from environment variables.

Current usage of the command is as follows:

# Generates a configuration file with variable values using a template from a
# specific URL as well as a GitHub URL.
clusterctl generate yaml --from https://github.com/foo-org/foo-repository/blob/main/cluster-template.yaml

clusterctl generate yaml --from https://foo.bar/cluster-template.yaml

# Generates a configuration file with variable values using
# a template stored locally.
clusterctl generate yaml --from ~/workspace/cluster-template.yaml

# Prints list of variables used in the local template
clusterctl generate yaml --from ~/workspace/cluster-template.yaml --list-variables

# Prints list of variables from template passed in via stdin
cat ~/workspace/cluster-template.yaml | clusterctl generate yaml --from - --list-variables

# Default behavior for this sub-command is to read from stdin.
# Generate configuration from stdin
cat ~/workspace/cluster-template.yaml | clusterctl generate yaml

clusterctl get kubeconfig

This command prints the kubeconfig of an existing workload cluster to stdout. This functionality is available in clusterctl v0.3.9 or newer.

Examples

Get the kubeconfig of a workload cluster named foo.

clusterctl get kubeconfig foo

Get the kubeconfig of a workload cluster named foo in the namespace bar.

clusterctl get kubeconfig foo --namespace bar

Get the kubeconfig of a workload cluster named foo using a specific context bar.

clusterctl get kubeconfig foo --kubeconfig-context bar

clusterctl describe cluster

The clusterctl describe cluster command provides an “at a glance” view of a Cluster API cluster, designed to help the user quickly understand whether there are problems and where.

For example clusterctl describe cluster capi-quickstart will provide an output similar to:

The “at a glance” view is based on the idea that clusterctl should avoid overloading the user with information, but instead surface problems, if any.

In practice, if you look at the ControlPlane node, you might notice that the underlying machines are grouped together, because all of them have the same state (Ready equal to True), so it is not necessary to repeat the same information three times.

If this is not the case, and machines have different states, the visualization is going to use different lines:

You might also notice that the visualization does not represent the infrastructure machine or the bootstrap object linked to a machine, unless their state differs from the machine’s state.

Customizing the visualization

By default, the visualization generated by clusterctl describe cluster hides details for the sake of simplicity and brevity. However, if required, the user can ask to show all the details:

By using --grouping=false, the user can force the visualization to show all the machines on separate lines, regardless of whether they have the same state:

By using the --echo flag, the user can force the visualization to show infrastructure machines and bootstrap objects linked to machines, regardless of whether their state differs from the machine’s state:

It is also possible to force the visualization to show all the conditions for an object, instead of showing only the ready condition; e.g. with --show-conditions KubeadmControlPlane you get:

Please note that this option is flexible: you can pass a comma-separated list of kind or kind/name for which the command should show all the object’s conditions (use ‘all’ to show conditions for everything).

clusterctl move

The clusterctl move command allows you to move the Cluster API objects that define workload clusters (e.g. Cluster, Machines, MachineDeployments) from one management cluster to another.

You can use:

clusterctl move --to-kubeconfig="path-to-target-kubeconfig.yaml"

This moves the Cluster API objects existing in the current namespace of the source management cluster; if you want to move the Cluster API objects defined in another namespace, use the --namespace flag.
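
For example, to move objects defined in a namespace other than the current one (foo below is just a placeholder):

```shell
# Move the Cluster API objects from the "foo" namespace of the source
# management cluster to the target management cluster.
clusterctl move --to-kubeconfig="path-to-target-kubeconfig.yaml" --namespace foo
```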

The discovery mechanism for determining the objects to be moved is defined in the provider contract.

Pivot

Pivoting is a process for moving the provider components and declared Cluster API resources from a source management cluster to a target management cluster.

This can now be achieved with the following procedure:

  1. Use clusterctl init to install the provider components into the target management cluster
  2. Use clusterctl move to move the Cluster API resources from the source management cluster to the target management cluster

Bootstrap & Pivot

The pivot process can be combined with the creation of a temporary bootstrap cluster used to provision a target management cluster.

This can now be achieved with the following procedure:

  1. Create a temporary bootstrap cluster, e.g. using kind or minikube
  2. Use clusterctl init to install the provider components
  3. Use clusterctl generate cluster ... | kubectl apply -f - to provision a target management cluster
  4. Wait for the target management cluster to be up and running
  5. Get the kubeconfig for the new target management cluster
  6. Use clusterctl init with the new cluster’s kubeconfig to install the provider components
  7. Use clusterctl move to move the Cluster API resources from the bootstrap cluster to the target management cluster
  8. Delete the bootstrap cluster
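
The steps above can be sketched as a shell session; the cluster names, file names, and the choice of kind and the docker provider are all placeholders for illustration:

```shell
# 1. Create a temporary bootstrap cluster (kind is one option).
kind create cluster --name bootstrap

# 2. Install the provider components into the bootstrap cluster.
clusterctl init --infrastructure docker

# 3. Provision the target management cluster.
clusterctl generate cluster target-mgmt --kubernetes-version v1.28.0 | kubectl apply -f -

# 4./5. Once the target cluster is up and running, fetch its kubeconfig.
clusterctl get kubeconfig target-mgmt > target-mgmt.kubeconfig

# 6. Install the provider components into the new management cluster.
clusterctl init --kubeconfig target-mgmt.kubeconfig --infrastructure docker

# 7. Move the Cluster API resources to the target management cluster.
clusterctl move --to-kubeconfig target-mgmt.kubeconfig

# 8. Delete the bootstrap cluster.
kind delete cluster --name bootstrap
```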

Note: It’s required to have at least one worker node to schedule Cluster API workloads (i.e. controllers). A cluster with a single control plane node won’t be sufficient due to the NoSchedule taint. If a worker node isn’t available, clusterctl init will timeout.

Dry run

With the --dry-run option you can dry-run the move action, which only prints logs without taking any actual action. Use the log verbosity flag -v to see different levels of information.
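
For example (the kubeconfig path is a placeholder):

```shell
# Print what the move would do without modifying either cluster;
# -v5 raises the log verbosity.
clusterctl move --to-kubeconfig="path-to-target-kubeconfig.yaml" --dry-run -v5
```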

clusterctl upgrade

The clusterctl upgrade command can be used to upgrade the version of the Cluster API providers (CRDs, controllers) installed into a management cluster.

upgrade plan

The clusterctl upgrade plan command can be used to identify possible targets for upgrades.

clusterctl upgrade plan

Produces an output similar to this:

Checking cert-manager version...
Cert-Manager will be upgraded from "v1.5.0" to "v1.5.3"

Checking new release availability...

Management group: capi-system/cluster-api, latest release available for the v1beta1 API Version of Cluster API (contract):

NAME                    NAMESPACE                           TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-kubeadm       capi-kubeadm-bootstrap-system       BootstrapProvider        v0.4.0           v1.0.0
control-plane-kubeadm   capi-kubeadm-control-plane-system   ControlPlaneProvider     v0.4.0           v1.0.0
cluster-api             capi-system                         CoreProvider             v0.4.0           v1.0.0
infrastructure-docker   capd-system                         InfrastructureProvider   v0.4.0           v1.0.0

You can now apply the upgrade by executing the following command:

   clusterctl upgrade apply --contract v1beta1

The output contains the latest release available for each API Version of Cluster API (contract) available at the moment.

upgrade apply

After choosing the desired upgrade target, you can run the following command to upgrade all the providers in the management cluster to the latest stable releases:

clusterctl upgrade apply --contract v1beta1

The upgrade process is composed of three steps:

  • Check the cert-manager version, and if necessary, upgrade it.
  • Delete the current version of the provider components, while preserving the namespace where the provider components are hosted and the provider’s CRDs.
  • Install the new version of the provider components.

Please note that clusterctl does not upgrade Cluster API objects (Clusters, MachineDeployments, Machines, etc.); upgrading such objects is the responsibility of the provider’s controllers.

It is also possible to explicitly upgrade one or more components to specific versions.

clusterctl upgrade apply \
    --core cluster-api:v1.2.4 \
    --infrastructure docker:v1.2.4