Build the operator
==================

Before you start, you first need a functional :doc:`../dev_setup`.
Whenever you create new files, add the following copyright disclaimer at the
beginning:

.. code-block:: python

   #
   # Copyright (c) 2020- The Yaook Authors.
   #
   # This file is part of Yaook.
   # See https://yaook.cloud for further info.
   #
   # Licensed under the Apache License, Version 2.0 (the "License");
   # you may not use this file except in compliance with the License.
   # You may obtain a copy of the License at
   #
   #     http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing, software
   # distributed under the License is distributed on an "AS IS" BASIS,
   # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   # See the License for the specific language governing permissions and
   # limitations under the License.
   #

The order of the following steps is not meant to be taken strictly; you will
probably often jump back and forth between some of them.

#. **Reference Docker images**

   Add the required Docker images to ``./yaook/assets/pinned_version.yml``.
   Usually there should be one entry for each minor version.

#. **Create the CR class**

   Create a class inheriting from ``sm.ReleaseAwareCustomResource`` and add
   the class for the CR:

   .. code-block:: python
      :caption: ``./yaook/op/<component>/<component>.py``

      import yaook.statemachine as sm


      class NewComponent(sm.ReleaseAwareCustomResource):
          API_GROUP = "yaook.cloud"
          API_GROUP_VERSION = "v1"
          PLURAL = "newcomponentdeployments"  # changeme
          KIND = "NewComponentDeployment"  # changeme
          RELEASES = ["2025.1"]  # changeme
          # Usually the supported versions except for the lowest one.
          # If you only support a single version, keep it empty.
          VALID_UPGRADE_TARGETS = []

          def __init__(self, **kwargs):
              super().__init__(assemble_sm=True, **kwargs)


      sm.register(NewComponent)

   Create the ``__init__.py`` file:

   .. code-block:: python
      :caption: ``./yaook/op/<component>/__init__.py``

      from .newcomponent import NewComponent  # noqa:F401

#. **Add subresources to the CR class**

   Add class instances of the ``statemachine`` module as class members to
   ``NewComponent``, as previously evaluated.

#. **Create jinja templates**

   Create the corresponding jinja templates for the Kubernetes manifests.
   These are placed inside ``./yaook/op/<component>/templates``, or inside
   ``./yaook/op/infra/templates`` for infra resources.

   If the user is not set inside the Docker image, also add the
   ``securityContext.runAsUser``, ``securityContext.runAsGroup`` and
   ``securityContext.fsGroup`` directives, together with the user ID you
   determined inside :doc:`../../explanations/containers`, to each
   statefulset, deployment and job.

#. **Add cue files for configuration**

   For OpenStack components, first add the packages listed inside the
   upstream config generator configuration (e.g. ``barbican.conf``) to
   ``./buildcue.py``. This is used to generate the cue template for the
   configuration.

   To learn how to add a new cue configuration template and, if necessary,
   cue layers, read :doc:`../working_with_cue`. After you create a cue
   template, add ``<component>`` to the variable ``cue_schema_dsts`` inside
   ``./GNUmakefile``. Afterwards, and each time you change the cue template,
   build the template by running:

   .. code-block:: bash

      make cue-templates

   To inject values into the cue templates during rendering, make sure to
   specify the ``target`` argument for each ``sm.CueLayer`` inside
   ``CueSecret.add_cue_layers`` / ``CueConfig.add_cue_layers``.
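   How exactly the layers are wired up differs between components. As a
   rough, hypothetical sketch (only the ``target`` argument is taken from
   this guide; the other constructor arguments are placeholders, so use an
   existing operator such as ``./yaook/op/cinder/__init__.py`` as the
   authoritative reference), this could look roughly like:

   .. code-block:: python

      # Hypothetical sketch: argument names besides ``target`` are
      # illustrative and may differ from the actual statemachine API.
      import yaook.statemachine as sm

      newcomponent_config = sm.CueSecret(
          metadata=lambda ctx: f"{ctx.parent_name}-config",
          add_cue_layers=[
              # target selects the section of the cue template that this
              # layer is merged into during rendering
              sm.CueLayer(target="newcomponent_conf"),
          ],
      )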
   Now you can also add the configuration to the
   ``add_dependencies=[newcomponent_config]`` of templated deployments,
   statefulsets or jobs and reference it inside the jinja template:

   .. code-block:: yaml

      {{ dependencies['newcomponent_config'].resource_name() }}

#. **Create the Kubernetes CRD and verify functionality**

   Before you can start testing your operator, you also need to create a K8s
   CRD. For OpenStack components, you can copy this minimal CRD template and
   adjust it according to your requirements:

   .. code-block:: bash

      cp ./docs/developer/guides/create_operator/newcomponent-crd.cue ./yaook/helm_builder/Charts/crds/cue-templates/<component>-crd.cue

   Then run ``make k8s_helm_install_crds`` to install the CRD inside your
   cluster. Similar to the configuration templates, this needs to be run
   after each change.

   Create an example manifest ``./docs/examples/<component>.yaml`` for a
   ``NewComponent`` instance and apply it to your K8s cluster:

   .. collapse:: newcomponent.yaml

      .. literalinclude:: newcomponent.yaml
         :language: yaml

   Now run the operator:

   .. code-block:: bash

      python3 -m yaook.op -vv newcomponent run

   If everything is set up correctly, the newly created operator should start
   to reconcile the K8s CR and you can start testing and debugging the main
   functionality. The following steps are necessary to adjust the operator
   for production environments.

#. **Add scheduling keys**

   Define a scheduling key for each statefulset, deployment and job as well
   as the ``<COMPONENT>_ANY_SERVICE`` scheduling key inside
   ``./yaook/op/scheduling_keys.py`` and add the ``.. autoattribute::``
   directives for sphinx.

   Scheduling keys for templated Kubernetes resources need to be defined as
   follows for jobs:

   .. code-block:: python
      :caption: ``./yaook/op/<component>/<component>.py``

      [
          scheduling_keys.SchedulingKey.OPERATOR_<COMPONENT>.value,
          scheduling_keys.SchedulingKey.OPERATOR_ANY.value,
      ]

   and as follows for deployments and statefulsets:

   .. code-block:: python
      :caption: ``./yaook/op/<component>/<component>.py``

      [
          scheduling_keys.SchedulingKey.<SERVICE>.value,
          scheduling_keys.SchedulingKey.<COMPONENT>_ANY_SERVICE.value,
      ]

   Additionally, API deployments require the
   ``yaook.op.scheduling_keys.SchedulingKey.ANY_API.value`` scheduling key.

   These scheduling keys will be assigned the following strings (for Cinder,
   for example, the operator key is ``operator.yaook.cloud/cinder``):

   .. code-block:: python
      :caption: ``./yaook/op/scheduling_keys.py``

      OPERATOR_<COMPONENT> = "operator.yaook.cloud/<component>"
      <SERVICE> = \
          "<component>.yaook.cloud/<service>"
      <COMPONENT>_ANY_SERVICE = \
          "<component>.yaook.cloud/<component>-any-service"

   Then add the corresponding scheduling keys to each templated resource
   using the ``scheduling_keys`` property. For a deployment or statefulset
   manifest, the scheduling keys will be injected with the following
   ``nodeSelectorTerms``:

   .. code-block:: yaml
      :caption: Example

      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: <component>.yaook.cloud/<service>
                  operator: Exists
                - key: namespace.yaook.cloud
                  operator: In
                  values:
                  - <namespace>
              - matchExpressions:
                - key: <component>.yaook.cloud/<component>-any-service
                  operator: Exists

   Don't forget to label your K8s nodes before you test this.
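   As an illustration (the node name, component and namespace are
   placeholders), the labels for the affinity terms above could be set with
   plain ``kubectl``. The ``-any-service`` key only needs to exist
   (``operator: Exists``), so its value does not matter; the
   ``namespace.yaook.cloud`` label must match the namespace listed under
   ``values``:

   .. code-block:: bash

      # allow any service of the component on this node
      kubectl label node <node-name> "<component>.yaook.cloud/<component>-any-service=true"

      # restrict scheduling to CRs living in a specific namespace
      kubectl label node <node-name> "namespace.yaook.cloud=<namespace>"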
#. **SSL encrypt internal traffic**

   If possible, this should be achieved by configuring the service
   accordingly. If the software does not support encrypted communication
   natively, you will have to add additional containers to the K8s resource
   which handle the encryption. For OpenStack components, you usually only
   need to configure the multi-container approach inside the API deployment,
   since communication between services is usually achieved via AMQP, which
   Yaook configures to use SSL by default.

   For sidecar encryption, you will need the following containers and enable
   them based on ``spec.api.internal``:

   * ssl-terminator
   * ssl-terminator-external
   * ssl-terminator-internal

   You also need to add a corresponding ``service-reload`` container for each
   ``ssl-terminator`` container. As a reference, you can use a template like
   ``./yaook/op/barbican/templates/barbican-deployment-api.yaml``.

   The value of the ``LOCAL_PORT`` environment variable for the
   ``ssl-terminator`` should use the default port as listed in the OpenStack
   firewall default ports documentation. The ``LOCAL_PORT`` of
   ``ssl-terminator-internal`` should then increase this port number by 1 and
   the one of ``ssl-terminator-external`` by 2.

#. **Allow resource configuration**

   Ensure you can configure the Kubernetes resources (requests and limits)
   for each job, deployment and statefulset and for every container that is
   part of those. Do this by adding ``crd.#containerresources`` for each
   deployment/statefulset and ``jobResources`` to
   ``./yaook/helm_builder/Charts/crds/cue-templates/<component>-crd.cue``.
   This is a snippet of the Cinder CRD cue file to showcase how this is
   structured for a subset of Cinder services:

   .. code-block:: text
      :caption: ./yaook/helm_builder/Charts/crds/cue-templates/cinder-crd.cue

      api: {
          description: "Cinder API deployment configuration"
          properties: resources: {
              type: "object"
              description: "Resource requests/limits for containers related to the Cinder API."
              properties: {
                  "cinder-api": crd.#containerresources
                  "ssl-terminator": crd.#containerresources
                  "ssl-terminator-external": crd.#containerresources
                  "ssl-terminator-internal": crd.#containerresources
                  "service-reload": crd.#containerresources
                  "service-reload-external": crd.#containerresources
                  "service-reload-internal": crd.#containerresources
              }
          }
      }
      scheduler: {
          description: "Cinder Scheduler deployment configuration"
          properties: resources: {
              type: "object"
              description: "Resource requests/limits for containers related to the Cinder Scheduler."
              properties: "cinder-scheduler": crd.#containerresources
          }
      }
      jobResources: {
          type: "object"
          description: "Resource limits for Job Pod containers spawned by the Operator"
          properties: {
              "cinder-db-sync-job": crd.#containerresources
              "cinder-db-upgrade-pre-job": crd.#containerresources
              "cinder-db-upgrade-post-job": crd.#containerresources
              "cinder-db-cleanup-cronjob": crd.#containerresources
          }
      }

   To inject these resources into the Jinja templates, you must use the
   ``resources`` Jinja filter for ``.spec.containers[@].resources`` of each
   container. Inside the Cinder templates this is achieved like this:

   .. code-block:: yaml
      :caption: ./yaook/op/cinder/templates/cinder-deployment-api.yaml

      resources: {{ crd_spec | resources('api.cinder-api') }}
      ...
      resources: {{ crd_spec | resources('api.ssl-terminator') }}
      ...
      resources: {{ crd_spec | resources('api.ssl-terminator-external') }}
      # and so on

   .. code-block:: yaml
      :caption: ./yaook/op/cinder/templates/cinder-statefulset-scheduler.yaml

      resources: {{ crd_spec | resources('scheduler.cinder-scheduler') }}

   As you can see, the parameter to the ``resources`` filter always consists
   of the key of the service from the CR manifest ``.spec`` and the key of
   ``crd.#containerresources``, separated by a dot. For jobs, there is a
   slight difference, as the first part uses the substring ``job``, not
   ``jobResources``:

   .. code-block:: yaml
      :caption: ./yaook/op/cinder/templates/cinder-job-db-sync.yaml

      resources: {{ crd_spec | resources('job.cinder-db-sync-job') }}
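   From the user's perspective, the corresponding section of the CR manifest
   might then look roughly like this (illustrative values only; the exact
   field layout is defined by the CRD cue file above):

   .. code-block:: yaml

      spec:
        api:
          resources:
            cinder-api:
              requests:
                cpu: 500m
                memory: 512Mi
              limits:
                memory: 1Gi
        jobResources:
          cinder-db-sync-job:
            requests:
              cpu: 100m
              memory: 256Mi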
#. **Implement high availability**

   The replicas of K8s deployments and statefulsets need to be configurable
   via the CRD manifest. The setup needs to distribute load and stay
   functional during rolling restarts. Note that there are exceptions where
   only a single replica is supported.

   The configuration can be supported by adding the following to the
   ``properties`` of each service inside the ``<component>-crd.cue``:

   .. code-block:: text

      replicas: crd.replicated

   Potential additional steps depend on the component you want to deploy.

#. **Policy validation (OpenStack only)**

   Validate optional policy configuration from the K8s manifest by adding
   ``sm.PolicyValidator`` including its dependencies to ``NewComponent``. You
   can use ``./yaook/op/cinder/__init__.py`` as a reference.

#. **Add a QuorumPodDisruptionBudget for each deployment and statefulset**

   .. code-block:: python
      :caption: Example

      api_deployment_pdb = sm.QuorumPodDisruptionBudget(
          metadata=lambda ctx: f"{ctx.parent_name}-api-deployment-pdb",
          replicated=api_deployment,
      )

#. **Set up monitoring**

   Use ``sm.GeneratedServiceMonitor`` or a more suitable class from
   ``./yaook/statemachine/resources/prometheus.py`` to set up monitoring.

   For OpenStack API monitoring, you need to create 3 service monitors:

   * internal_ssl_service_monitor
   * external_ssl_service_monitor
   * internal_ingress_ssl_service_monitor (using
     ``sm.Optional(condition=_internal_endpoint_configured, ...)``)

#. **Adjust the default config**

   This depends on your specific requirements. Make sure that the setup can
   withstand the expected traffic and amount of created datasets, and adjust
   configuration values like worker counts, limits, quotas, etc. accordingly.

#. **Add IPv6 support**

   If possible, use dual-stack sockets inside configurations and adjust
   additional configuration if needed by the component.

Additional file changes
-----------------------

* Reference templates and static files inside ``./MANIFEST.in``
* Adjust the file ``./docs/examples/<component>.yaml``

Additional file changes for infra operator CRs
----------------------------------------------

* Add constants you want to reference in other operators to
  ``./yaook/op/common.py``
* Add your CRD to ``AVAILABLE_WATCHERS`` inside ``./yaook/op/daemon.py``
* Add the following to ``./yaook/statemachine/resources/yaook_infra.py``:

  * Add another class ``NewComponent`` inheriting ``YaookReadyResource``.
    Add all K8s manifest keys of the CRD that require other Kubernetes
    resources to be updated to the ``_needs_update`` method.
  * Create a ``TemplatedNewComponent`` class inside
    ``./yaook/statemachine/resources/yaook_infra.py``
  * Reference ``TemplatedNewComponent`` inside ``.. autoclass::`` and
    ``.. autosummary::``

* Reference the ``TemplatedNewComponent`` class inside
  ``./yaook/statemachine/resources/__init__.py``
* Add a method ``NewComponent._interface`` to
  ``./yaook/statemachine/interfaces.py``

Set up tests
------------

Unit tests
^^^^^^^^^^

Unit tests need to be put inside ``./tests/op/<component>/test_api.py`` for
OpenStack CRs and inside ``./tests/op/infra/test_<component>.py`` for infra
resources. For OpenStack CRs, make use of the test cases defined inside
``./tests/op/common_tests.py``. If you created new cue layers during
development, also write tests for those.
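Assuming the test suite is driven by pytest (adjust the invocation to
whatever runner your environment actually uses), you can iterate on the new
module directly:

.. code-block:: bash

   python3 -m pytest ./tests/op/<component>/test_api.py -v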
Integration tests
^^^^^^^^^^^^^^^^^

* Add an example manifest for the CR to
  ``./ci/devel_integration_tests/deploy/<component>.yaml``
* Add the component to the ``CLASS_INFO`` dictionary inside
  ``./ci/devel_integration_tests/os_services.py`` (OpenStack only)
* Add the CR to the ``wait_and_test`` method inside
  ``ci/devel_integration_tests/run-tests.sh`` and configure test cases that
  confirm functionality

Add support for tempest tests (OpenStack only)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Add the component to the method ``_get_tempest_suffix_for_service`` inside
``./yaook/op/tempest/__init__.py`` so that it returns the correct suffix. You
either need to provide the module name from the tempest repository or provide
the name and module path of a separate plugin, e.g. the
``barbican-tempest-plugin`` for Barbican.

You can confirm that tempest tests are running by creating a TempestJob (make
sure the tempest-operator is running). You can use
``./docs/examples/tempest-job.yaml`` to run the tests by adjusting
``.spec.target.service`` and ``.spec.tempestConfig.service_available``
(depending on your service, you will also have to adjust other configuration
values). Inspect the logs after the job has terminated to ensure everything
is working as intended.

As long as you are working inside a development cluster, it is not necessary
that all of these tests pass each time, but they can serve as an indicator of
where things might require additional tweaking. Also beware that some tempest
test cases may fail simply because they contain bugs.

Create the helm chart
---------------------

Stop the local operator process, then create and install the helm chart as
described in the corresponding documentation. Recreate the CR to make sure
the resource still reconciles successfully.

Update scripts and documentation
--------------------------------

* Add the node labels to ``./docs/handbook/user-guide.rst`` and
  ``./docs/developer/guides/dev_setup.rst``
* Add the node labels to the variable ``all_node_labels`` inside
  ``./ci/devel_integration_tests/label-nodes.sh``
* Add the user and group name with their Docker image ID to
  ``./docs/developer/explanations/containers.rst``
* Create a user guide inside ``./docs/handbook`` if deploying the CRD
  involves manual steps (optional)

If you add new documentation pages, reference them inside the correct
``index.rst`` file.
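For example, a hypothetical handbook page ``newcomponent-guide.rst`` would be
referenced by adding its file name (without the extension) to the ``toctree``
of the ``index.rst`` in the same directory:

.. code-block:: rst

   .. toctree::
      :maxdepth: 2

      newcomponent-guide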