Build the operator

Before you start, you first need a functional Development Setup.

Whenever you create new files, always add the following copyright and license header at the beginning:

#
# Copyright (c) 2020-<current-year> The Yaook Authors.
#
# This file is part of Yaook.
# See https://yaook.cloud for further info.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

The order of the following steps is not meant to be taken strictly. You will probably find yourself jumping back and forth between some of them.

  1. Reference Docker images

    Add required Docker images to ./yaook/assets/pinned_version.yml. Usually there should be one entry for each minor version.

  2. Create the CR class

    Create a class inheriting sm.ReleaseAwareCustomResource and add the class for the CR:

    ./yaook/op/<newcomponent>/<newcomponent>.py
     import yaook.statemachine as sm
    
    
     class NewComponent(sm.ReleaseAwareCustomResource):
         API_GROUP = "yaook.cloud"
         API_GROUP_VERSION = "v1"
         PLURAL = "newcomponentdeployments"  # changeme
         KIND = "NewComponentDeployment"  # changeme
         RELEASES = ["2025.1"]  # changeme
         # Usually all supported releases except the lowest one; keep it empty if you only support a single release.
         VALID_UPGRADE_TARGETS = []
    
         def __init__(self, **kwargs):
             super().__init__(assemble_sm=True, **kwargs)
    
    
     sm.register(NewComponent)
    

    Create the __init__.py file:

    ./yaook/op/<newcomponent>/__init__.py
     from .newcomponent import NewComponent  # noqa:F401
    
  3. Add subresources to the CR class

    Add instances of the statemachine classes as class members to NewComponent, matching the resources you evaluated beforehand.
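
    A rough sketch of what such subresources can look like is shown below. The resource class names (TemplatedDeployment, TemplatedService) and their arguments are only illustrative assumptions, not the definitive statemachine API; pick the sm classes that match the resources your component actually needs.

    ./yaook/op/<newcomponent>/<newcomponent>.py
     class NewComponent(sm.ReleaseAwareCustomResource):
         # ... attributes and __init__ from the previous step ...

         # Hypothetical resource classes and names -- replace them with the
         # sm classes that match the resources your component needs.
         api_deployment = sm.TemplatedDeployment(
             metadata=lambda ctx: f"{ctx.parent_name}-api",
             # ... further arguments ...
         )
         api_service = sm.TemplatedService(
             metadata=lambda ctx: f"{ctx.parent_name}-api",
             # ... further arguments ...
         )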

  4. Create jinja templates

    Create the corresponding jinja templates for the Kubernetes manifests. These will be placed inside ./yaook/op/<newcomponent>/templates or ./yaook/op/infra/templates for infra resources.

    If the user is not set inside the Docker image, also add the securityContext.runAsUser, securityContext.runAsGroup and securityContext.fsGroup directives, together with the user ID you determined inside Containers, to each statefulset, deployment and job.

  5. Add cue files for configuration

    For OpenStack components, first add the packages listed inside the upstream config generator configuration (e.g. barbican.conf) to ./buildcue.py. This is used to generate the cue template for the configuration.

    To learn how to add a new cue configuration template and, if necessary, cue layers, read Working with CUE.

    After you create a cue template, add <newcomponent> to the variable cue_schema_dsts inside ./GNUmakefile. Afterwards, and each time you change the cue template, build the template by running:

    make cue-templates
    

    To inject values into the cue templates during rendering, make sure to specify target="<newcomponent>" for each sm.CueLayer inside CueSecret.add_cue_layers / CueConfig.add_cue_layers.
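
    A rough sketch of how this can be wired, assuming the configuration object is called newcomponent_config; the layer class shown here and its remaining arguments are placeholders, the real layer types are described in Working with CUE:

    ./yaook/op/<newcomponent>/<newcomponent>.py
     # Sketch only -- everything besides target= is a placeholder.
     newcomponent_config.add_cue_layers(
         sm.CueLayer(
             target="<newcomponent>",
             # ... further layer-specific arguments ...
         ),
     )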

    Now you can also add the configuration to the add_dependencies=[newcomponent_config] argument of templated deployments, statefulsets or jobs and reference it inside the jinja template:

    {{ dependencies['newcomponent_config'].resource_name() }}
    
  6. Create the Kubernetes CRD and verify functionality

    Before you can start testing your operator, you also need to create a K8s CRD. For OpenStack components, you can copy this minimal CRD template and adjust it according to your requirements:

    cp ./docs/developer/guides/create_operator/newcomponent-crd.cue ./yaook/helm_builder/Charts/crds/cue-templates/<newcomponent>-crd.cue
    

    Then run make k8s_helm_install_crds to install the CRD inside your cluster. Similar to the configuration templates, this needs to be run after each change.

    Create an example manifest ./docs/examples/<newcomponent>.yaml for a NewComponent instance and apply it to your K8s cluster:

    newcomponent.yaml
    apiVersion: yaook.cloud/v1
    kind: NewComponentDeployment
    metadata:
      name: my-component
    spec:
      api:
        ingress:
          fqdn: "mycomponent.yaook.cloud"
          port: 32443
      database:
        replicas: 1
        timeoutClient: 300
        storageSize: 8Gi
        proxy:
          replicas: 1
        backup:
          schedule: "0 * * * *"
      issuerRef:
        name: ca-issuer
      messageQueue: 
        replicas: 1
      keystoneRef:
        name: keystone
      region:
        name: MyRegion
      targetRelease: <LATEST_RELEASE>
      newcomponentConfig:
        DEFAULT:
          debug: True
    

    Now run the operator:

    python3 -m yaook.op -vv newcomponent run
    

    If everything is set up correctly, the newly created operator should start to reconcile the K8s CR and you can start testing and debugging the main functionality.

    The next steps are necessary to adjust the operator for production environments.

  7. Add scheduling keys

    Define a scheduling key for each statefulset, deployment and job as well as the <NEWCOMPONENT>_ANY_SERVICE scheduling key inside ./yaook/op/scheduling_keys.py and add the .. autoattribute:: directives for sphinx.

    Scheduling keys for templated Kubernetes resources need to be defined as follows for jobs:

    ./yaook/op/<newcomponent>/<newcomponent>.py
     [
         scheduling_keys.SchedulingKey.OPERATOR_<NEWCOMPONENT>.value,
         scheduling_keys.SchedulingKey.OPERATOR_ANY.value,
     ]
    

    and as follows for deployments and statefulsets:

    ./yaook/op/<newcomponent>/<newcomponent>.py
    [
       scheduling_keys.SchedulingKey.<NEWCOMPONENT_SERVICE>.value,
       scheduling_keys.SchedulingKey.<NEWCOMPONENT>_ANY_SERVICE.value,
    ]
    

    Additionally, API deployments require the yaook.op.scheduling_keys.SchedulingKey.ANY_API.value scheduling key.

    These scheduling keys will be assigned the following strings:

    ./yaook/op/scheduling_keys.py
    OPERATOR_<NEWCOMPONENT> = "operator.yaook.cloud/<newcomponent>"
    <NEWCOMPONENT_SERVICE> = \
      "<newcomponent>.yaook.cloud/<newcomponent-service>"
    <NEWCOMPONENT>_ANY_SERVICE = \
      "<newcomponent>.yaook.cloud/<newcomponent>-any-service"
    

    Then add the corresponding scheduling keys to each templated resource using the scheduling_keys property.
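
    For a templated deployment this could look roughly as follows; the class name and the remaining constructor arguments are placeholders, and passing the keys via a scheduling_keys argument is an assumption derived from the property name:

    ./yaook/op/<newcomponent>/<newcomponent>.py
     # Sketch only -- attach the keys to the resources you defined in step 3.
     api_deployment = sm.TemplatedDeployment(
         # ... other arguments as defined in step 3 ...
         scheduling_keys=[
             scheduling_keys.SchedulingKey.<NEWCOMPONENT_SERVICE>.value,
             scheduling_keys.SchedulingKey.<NEWCOMPONENT>_ANY_SERVICE.value,
         ],
     )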

    For a deployment or statefulset manifest, the scheduling keys will be injected with the following nodeSelectorTerms:

    Example
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: <newcomponent>.yaook.cloud/<newcomponent-service>
                operator: Exists
              - key: namespace.yaook.cloud
                operator: In
                values:
                - <yaook_namespace>
            - matchExpressions:
              - key: <newcomponent>.yaook.cloud/<newcomponent>-any-service
                operator: Exists
    

    Don’t forget to label your K8s nodes before you test this.

  8. SSL encrypt internal traffic

    If possible, this should be achieved by configuring the service accordingly. If the software does not support encrypted communication natively, you will have to add additional containers to the k8s resource which handle the encryption. For OpenStack components, you usually only need to configure this multi-container approach inside the API deployment, since communication between services is usually achieved via AMQP, which Yaook configures to use SSL by default.

    For sidecar encryption, you will need the following containers, enabled based on spec.api.internal:

    • ssl-terminator

    • ssl-terminator-external

    • ssl-terminator-internal

    You also need to add a corresponding service-reload container for each ssl-terminator container. As a reference, you can use a template like ./yaook/op/barbican/templates/barbican-deployment-api.yaml.

    The value of the LOCAL_PORT environment variable for the ssl-terminator should be the default port as listed under OpenStack firewall default ports. The LOCAL_PORT of ssl-terminator-internal should then increase this port number by 1 and that of ssl-terminator-external by 2 (for example 9311, 9312 and 9313 for a component whose default API port is 9311).

  9. Allow resource configuration

    Ensure you can configure the Kubernetes resources (requests and limits) for each job, deployment and statefulset and for every container that is part of them. Do this by adding crd.#containerresources entries for each deployment/statefulset and a jobResources section to ./yaook/helm_builder/Charts/crds/cue-templates/<newcomponent>-crd.cue.

    This is a snippet of the Cinder CRD cue file to showcase how this is structured for a subset of Cinder services:

    ./yaook/helm_builder/Charts/crds/cue-templates/cinder-crd.cue
    api: {
       description: "Cinder API deployment configuration"
       properties: resources: {
          type:        "object"
          description: "Resource requests/limits for containers related to the Cinder API."
          properties: {
             "cinder-api":              crd.#containerresources
             "ssl-terminator":          crd.#containerresources
             "ssl-terminator-external": crd.#containerresources
             "ssl-terminator-internal": crd.#containerresources
             "service-reload":          crd.#containerresources
             "service-reload-external": crd.#containerresources
             "service-reload-internal": crd.#containerresources
          }
       }
    }
    scheduler: {
       description: "Cinder Scheduler deployment configuration"
       properties: resources: {
          type:        "object"
          description: "Resource requests/limits for containers related to the Cinder Scheduler."
          properties: "cinder-scheduler": crd.#containerresources
       }
    }
    jobResources: {
       type:        "object"
       description: "Resource limits for Job Pod containers spawned by the Operator"
       properties: {
          "cinder-db-sync-job":         crd.#containerresources
          "cinder-db-upgrade-pre-job":  crd.#containerresources
          "cinder-db-upgrade-post-job": crd.#containerresources
          "cinder-db-cleanup-cronjob":  crd.#containerresources
       }
    }
    

    To inject these resources inside the Jinja templates, you must use the resources Jinja filter for each container's .spec.containers[@].resources field. Inside the Cinder templates this is achieved like this:

    ./yaook/op/cinder/templates/cinder-deployment-api.yaml
    resources: {{ crd_spec | resources('api.cinder-api') }}
    ...
    resources: {{ crd_spec | resources('api.ssl-terminator') }}
    ...
    resources: {{ crd_spec | resources('api.ssl-terminator-external') }}
    # and so on
    
    ./yaook/op/cinder/templates/cinder-statefulset-scheduler.yaml
    resources: {{ crd_spec | resources('scheduler.cinder-scheduler') }}
    

    As you can see, the parameter of the resources filter always consists of the service's key from the CR manifest .spec and the key of crd.#containerresources, separated by a dot. For jobs, there is a slight difference: the first part uses the string job, not jobResources:

    ./yaook/op/cinder/templates/cinder-job-db-sync.yaml
    resources: {{ crd_spec | resources('job.cinder-db-sync-job') }}
    
  10. Implement high availability

    The replicas of K8s deployments and statefulsets need to be configurable via the CRD manifest. The setup needs to distribute load and stay functional during rolling restarts. Note that there are exceptions where only a single replica is supported.

    This configuration can be supported by adding the following under properties inside the <newcomponent>-crd.cue for each service:

    <service-name>: crd.replicated
    

    Potential additional steps depend on the component you want to deploy.

  11. Policy validation (OpenStack only)

    Validate optional policy configuration from the K8s manifest by adding sm.PolicyValidator, including its dependencies, to NewComponent. You can use ./yaook/op/cinder/__init__.py as a reference.

  12. Add a QuorumPodDisruptionBudget for each deployment and statefulset

    Example
     api_deployment_pdb = sm.QuorumPodDisruptionBudget(
         metadata=lambda ctx: f"{ctx.parent_name}-api-deployment-pdb",
         replicated=api_deployment,
     )
    
  13. Setup monitoring

    Use sm.GeneratedServiceMonitor or a more suitable class from ./yaook/statemachine/resources/prometheus.py to set up monitoring.

    For OpenStack API monitoring, you need to create 3 service monitors:

    • internal_ssl_service_monitor

    • external_ssl_service_monitor

    • internal_ingress_ssl_service_monitor (using sm.Optional(condition=_internal_endpoint_configured, ...))

  14. Adjust the default config

    This depends on your specific requirements. Make sure that the setup can withstand the expected traffic and amount of created datasets and adjust configuration values like worker counts, limits, quotas, etc. accordingly.

  15. Add IPv6 support

    If possible, use dual-stack sockets inside configurations and adjust additional configuration if needed by the component.

Additional file changes

  • Reference templates and static files inside ./MANIFEST.in

  • Adjust the file ./docs/examples/<newcomponent>.yaml

Additional file changes for infra operator CRs

  • Add constants you want to reference in other operators to ./yaook/op/common.py

  • add your CRD to AVAILABLE_WATCHERS inside ./yaook/op/daemon.py

  • add the following to ./yaook/statemachine/resources/yaook_infra.py:
    • add another class NewComponent inheriting from YaookReadyResource and add all k8s manifest keys of the CRD that require other Kubernetes resources to be updated to the _needs_update method

    • create a TemplatedNewComponent class inside ./yaook/statemachine/resources/yaook_infra.py

    • reference TemplatedNewComponent inside .. autoclass:: and .. autosummary::

  • reference the TemplatedNewComponent class inside ./yaook/statemachine/resources/__init__.py

  • add a method NewComponent._interface to ./yaook/statemachine/interfaces.py

Set up tests

Unit tests

Unit tests need to be put inside ./tests/op/<newcomponent>/test_api.py for OpenStack CRs and inside ./tests/op/infra/test_<newcomponent>.py for infra resources. For OpenStack CRs, make use of the test cases defined inside ./tests/op/common_tests.py. If you created new cue layers during development, also write tests for those.

Integration tests

  • add an example manifest for the CR to ./ci/devel_integration_tests/deploy/<newcomponent>.yaml

  • add the component to the CLASS_INFO dictionary inside ./ci/devel_integration_tests/os_services.py (OpenStack only)

  • add the CR to the wait_and_test method inside ci/devel_integration_tests/run-tests.sh and configure test cases that confirm functionality

Add support for tempest tests (OpenStack only)

Add the component to the method _get_tempest_suffix_for_service inside ./yaook/op/tempest/__init__.py to return the correct suffix. You either need to provide the module name from the tempest repository or the name and module path of a separate plugin, e.g. the barbican-tempest-plugin for Barbican. You can confirm that tempest tests are running by creating a TempestJob (make sure the tempest-operator is running). You can use ./docs/examples/tempest-job.yaml to run the tests by adjusting .spec.target.service and .spec.tempestConfig.service_available (depending on your service, you will also have to adjust other configuration values). Inspect the logs after the job has terminated to ensure everything is working as intended. As long as you are working inside a development cluster, it is not necessary that all of these tests pass each time, but they can serve as an indicator of where things might require additional tweaking. Also beware that some tempest test cases may fail simply because they contain bugs.

Create the helm chart

Stop the local operator process and create and install the helm chart as described here. Recreate the CR to make sure the resource still reconciles successfully.

Update scripts and documentation

  • add the node labels to ./docs/handbook/user-guide.rst and ./docs/developer/guides/dev_setup.rst

  • add node labels inside ./ci/devel_integration_tests/label-nodes.sh to the variable all_node_labels

  • add the user and group name with their Docker image ID to ./docs/developer/explanations/containers.rst

  • create a user guide inside ./docs/handbook if deploying the CRD involves manual steps (optional)

If you add new documentation pages, reference them inside the correct index.rst file.