Safe eviction
#############

Motivation
==========

- Avoid data loss at all cost
- Avoid data plane disruption wherever possible
- Recover from data plane disruption

Affected Services
=================

- Nova Compute
- OVN Agent
- OVN BGP Agent

Surroundings / Kubernetes Tools
===============================

Finalizers
----------

- Finalizers DO prevent deletion of an object from the API
- Finalizers DO NOT prevent termination of containers in a Pod
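
The two statements above can be illustrated with a toy model. This is a sketch, not API server or kubelet code: ``ToyPod``, ``request_delete`` and ``remove_finalizer`` are hypothetical names, and the grace period is ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyPod:
    """Hypothetical stand-in for a Pod; not a real Kubernetes client type."""
    finalizers: List[str] = field(default_factory=list)
    deletion_timestamp: Optional[str] = None
    containers_running: bool = True
    exists_in_api: bool = True


def request_delete(pod: ToyPod) -> None:
    # A delete request marks the object for deletion ...
    pod.deletion_timestamp = "2024-01-01T00:00:00Z"
    # ... and container termination starts because of that mark,
    # regardless of any finalizers (grace period omitted in this model).
    pod.containers_running = False
    # The object only disappears from the API once no finalizers are left.
    if not pod.finalizers:
        pod.exists_in_api = False


def remove_finalizer(pod: ToyPod, name: str) -> None:
    pod.finalizers.remove(name)
    if pod.deletion_timestamp is not None and not pod.finalizers:
        pod.exists_in_api = False
```

In this model, a Pod with a finalizer stays visible in the API after ``request_delete``, but its containers are already stopped, which is why finalizers alone cannot protect workload state.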

Container Lifecycle Hooks
-------------------------

.. seealso::

    `Official Kubernetes documentation on Container Lifecycle Hooks <https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/>`_

- The ``preStop`` hook allows executing code inside a container or sending an
  HTTP request to it
- Its execution time is bounded by the termination grace period (per-Pod
  setting or deletion request setting)
- The docs say:

        Users should make their hook handlers as lightweight as possible.
        There are cases, however, when long running commands make sense, such
        as when saving state prior to stopping a Container.
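
As an illustration, the relevant Pod fields could look like the manifest below, written here as a Python dict. Only the field names (``terminationGracePeriodSeconds``, ``lifecycle.preStop.exec.command``) come from the Kubernetes Pod API; the Pod name, image, script path and grace period value are hypothetical.

```python
# Hypothetical value; choose it long enough for the hook to finish.
GRACE_PERIOD_SECONDS = 600

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example-agent"},  # hypothetical name
    "spec": {
        # Per-Pod setting that bounds the preStop hook's execution time.
        "terminationGracePeriodSeconds": GRACE_PERIOD_SECONDS,
        "containers": [
            {
                "name": "agent",
                "image": "registry.example.invalid/agent:latest",  # hypothetical
                "lifecycle": {
                    "preStop": {
                        # Runs inside the container before it is stopped.
                        "exec": {
                            "command": [
                                "/bin/sh",
                                "-c",
                                "/usr/local/bin/save-state.sh",  # hypothetical script
                            ],
                        },
                    },
                },
            },
        ],
    },
}
```

A deletion request that carries its own grace period overrides the per-Pod value, so the hook cannot rely on always getting the full configured time.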

Node Draining
-------------

- Implemented by adding Evictions for matching Pods
- kubectl's protection of DaemonSet Pods is HARDCODED against the built-in
  DaemonSet resource. Our CDSes are therefore unprotected and will be evicted
  by a drain!
- Not useful at all for our use cases
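
The following is a simplified model of that check, assuming the decision depends only on the kind of the Pod's controlling owner reference. It is a sketch, not kubectl's actual code; the function names are made up, and ``ConfiguredDaemonSet`` is assumed to be the kind behind our CDS.

```python
from typing import Any, Dict, Optional


def controller_kind(pod: Dict[str, Any]) -> Optional[str]:
    """Return the kind of the owner reference flagged as controller, if any."""
    for ref in pod.get("metadata", {}).get("ownerReferences", []):
        if ref.get("controller"):
            return ref.get("kind")
    return None


def skipped_by_drain(pod: Dict[str, Any]) -> bool:
    """Simplified rule: only Pods controlled by a built-in DaemonSet are skipped."""
    return controller_kind(pod) == "DaemonSet"


# A Pod owned by a built-in DaemonSet versus one owned by a CDS (assumed kind).
daemonset_pod = {
    "metadata": {"ownerReferences": [{"kind": "DaemonSet", "controller": True}]}
}
cds_pod = {
    "metadata": {
        "ownerReferences": [{"kind": "ConfiguredDaemonSet", "controller": True}]
    }
}
```

Under this model ``skipped_by_drain(cds_pod)`` is false, so the CDS-managed Pod receives an Eviction like any other Pod.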

Implementation
==============

Nova Compute
------------

- Finalizers on Pods are not sufficient to protect anything worth saving
- We need to prevent the CDS from descheduling Pods until the node has been
  evicted
- Approach:

    - Subclass :class:`ConfiguredDaemonSetState` to prevent descheduling of
      nodes which still have state that needs to be saved
    - Track nodes with state using finalizers? or annotations?
    - On reconcile, trigger task (in the queue? how?) which performs the
      eviction (via OpenStack API)
    - Once eviction is complete, trigger reconcile which will then allow the
      node to be cleared
    - If the node is unreachable (detect how? OpenStack API +
      k8s pod/node status?), do a hard eviction and delete the pod in
      parallel. If ``ironicNodeShutdown`` is specified in the eviction, the
      node gets shut down.

- Challenges:

    - Trying to mess with lifecycle management of CDS is probably not a wise
      idea
    - Need to keep track of the state the nodes are in separately (where?)

To overcome those challenges, we decided to split the Nova Operator into two
Operators: One for the big picture (``nova``) and one which manages the
individual compute nodes (``nova_compute``). This also means that there is a
new resource (``NovaComputeNode``).
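
The eviction flow from the approach list above can be condensed into a small decision function. This is a sketch under assumptions: ``Action``, ``ComputeNodeView`` and all field names are hypothetical and do not reflect the operator's real types. How reachability is detected is left open, as in the notes above, and applying ``ironicNodeShutdown`` only to unreachable nodes is an assumption as well.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW_REMOVAL = "allow-removal"
    SOFT_EVICT = "soft-evict"
    HARD_EVICT_AND_DELETE_POD = "hard-evict-and-delete-pod"


@dataclass
class ComputeNodeView:
    """Hypothetical summary of what a reconcile run knows about one node."""
    has_state: bool  # state that still needs saving, e.g. instances on the node
    reachable: bool  # outcome of some reachability check (left open above)


def next_action(node: ComputeNodeView) -> Action:
    if not node.has_state:
        # Nothing left to save: the node may be cleared.
        return Action.ALLOW_REMOVAL
    if node.reachable:
        # Regular eviction via the OpenStack API, then reconcile again.
        return Action.SOFT_EVICT
    # Unreachable node: hard eviction and Pod deletion happen in parallel.
    return Action.HARD_EVICT_AND_DELETE_POD


def should_shut_down(node: ComputeNodeView, ironic_node_shutdown: bool) -> bool:
    # Assumption: the shutdown option only matters for unreachable nodes.
    return ironic_node_shutdown and not node.reachable
```

Each reconcile run would call ``next_action`` again, so a node moves from ``SOFT_EVICT`` to ``ALLOW_REMOVAL`` once its state is gone.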

Neutron OVN Agents
------------------

- As with Nova Compute, finalizers on Pods don't help us here
- Components that depend on ``OVNAgent`` are ``NovaComputeNode`` and
  ``OVNBGPAgent``.
- Approach to safely schedule or evict them:

    - Use annotations and labels on the nodes running the agents/services
    - The OVN Operator sets the label
      ``maintenance.yaook.cloud/maintenance-required-l2-agent: False``
      after the L2 agent has been successfully created.
    - If the L2 agent needs to be removed (e.g. for updating configuration),
      the operator sets
      ``maintenance.yaook.cloud/maintenance-required-l2-agent: True``
    - Operators (nova, neutron) creating resources that need the L2 agent
      won't schedule them on nodes which don't have the label
      ``maintenance.yaook.cloud/maintenance-required-l2-agent: False``
      (so both a missing label and the value ``True`` prevent resources from
      being scheduled on the node)
    - Operators responsible for agents/services that need the L2 agent set an
      annotation ``l2-lock.maintenance.yaook.cloud/*: ''`` on the node at the
      very beginning of reconcile
    - The annotation ``l2-lock.maintenance.yaook.cloud/*: ''`` is removed
      after the agent/service has been deleted. This way, each agent/service
      can be safely evicted before the L2 agent is touched.
    - The OVN Operator waits until all
      ``l2-lock.maintenance.yaook.cloud/*: ''`` annotations have been removed
      from the node. Until then, the operator won't touch the L2 agent
    - Once all ``l2-lock.maintenance.yaook.cloud/*: ''`` annotations are gone,
      the OVN Operator deletes the L2 agent
    - After the L2 agent has been updated/recreated, the label
      ``maintenance.yaook.cloud/maintenance-required-l2-agent: False``
      is set again
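
The consumer side of this protocol (operators whose resources need the L2 agent) can be sketched with plain dictionaries standing in for a node's labels and annotations. The function names are hypothetical, and using an owner name as the suffix for the ``*`` in the lock annotation is an assumption made for this example.

```python
from typing import Dict

L2_LABEL = "maintenance.yaook.cloud/maintenance-required-l2-agent"
L2_LOCK_PREFIX = "l2-lock.maintenance.yaook.cloud/"


def may_schedule_on(node_labels: Dict[str, str]) -> bool:
    """Only an explicit "False" allows scheduling; a missing label or "True" blocks it."""
    return node_labels.get(L2_LABEL) == "False"


def acquire_l2_lock(node_annotations: Dict[str, str], owner: str) -> None:
    """Set the lock annotation at the very beginning of reconcile."""
    # Assumption: the wildcard part of the key identifies the lock owner.
    node_annotations[L2_LOCK_PREFIX + owner] = ""


def release_l2_lock(node_annotations: Dict[str, str], owner: str) -> None:
    """Remove the lock annotation after the agent/service has been deleted."""
    node_annotations.pop(L2_LOCK_PREFIX + owner, None)


def l2_agent_may_be_removed(node_annotations: Dict[str, str]) -> bool:
    """The L2 agent may only be touched once no lock annotation is left."""
    return not any(key.startswith(L2_LOCK_PREFIX) for key in node_annotations)
```

Treating a missing label the same as ``True`` means a fresh node is never used before its L2 agent has been created.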

We decided to retain the maintenance-required label on the node, even
after the OVN agent has been deleted. That way, if the OVN agent isn't
deleted right away by k8s and the pods are still there, other operators
still see that maintenance is required.

- Implementation details:

    - We added a :class:`~.L2Lock` that is used by each operator needing
      L2 to set the ``l2-lock.maintenance.yaook.cloud/*: ''`` annotation.
    - We introduced :class:`~.L2AwareStatefulAgentResource`, a subclass of
      ``StatefulAgentResource``, which every agent resource that needs L2
      inherits from. It checks whether the label
      ``maintenance.yaook.cloud/maintenance-required-l2-agent: False`` is
      set, so that the agent/service can be scheduled on the node.
    - The OVN Operator has its own :class:`L2StateResource` instead of
      inheriting from ``APIStateResource`` so the specific behavior can be
      implemented there. This class adds the label
      ``maintenance.yaook.cloud/maintenance-required-l2-agent: False``
      to the node after the L2 agent is created and changes it to ``True`` on
      deletion. It also waits until all the maintenance locks are gone from
      the node.
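
The label lifecycle handled on the OVN Operator side can be modelled as follows. This is a toy sketch and not the code of ``L2StateResource``: ``ToyNode`` and both functions are hypothetical, and the lock annotation suffix used in the usage note is made up.

```python
from dataclasses import dataclass, field
from typing import Dict

L2_LABEL = "maintenance.yaook.cloud/maintenance-required-l2-agent"
L2_LOCK_PREFIX = "l2-lock.maintenance.yaook.cloud/"


@dataclass
class ToyNode:
    """Hypothetical in-memory stand-in for a Kubernetes node."""
    labels: Dict[str, str] = field(default_factory=dict)
    annotations: Dict[str, str] = field(default_factory=dict)
    l2_agent_present: bool = False


def on_l2_agent_created(node: ToyNode) -> None:
    # After successful creation, consumers may schedule on the node.
    node.l2_agent_present = True
    node.labels[L2_LABEL] = "False"


def try_remove_l2_agent(node: ToyNode) -> bool:
    """Return True once the agent has been removed, False while locks remain."""
    # Announce the pending maintenance first so nothing new is scheduled.
    node.labels[L2_LABEL] = "True"
    if any(key.startswith(L2_LOCK_PREFIX) for key in node.annotations):
        return False  # wait; a later reconcile tries again
    node.l2_agent_present = False
    # The label deliberately stays at "True" after the deletion.
    return True
```

A typical sequence is ``on_l2_agent_created(node)``, then repeated calls to ``try_remove_l2_agent(node)`` that return ``False`` until consumers have dropped annotations such as ``l2-lock.maintenance.yaook.cloud/nova-compute``.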

The name ``maintenance.yaook.cloud/maintenance-required-l2-agent`` is a
historical leftover from the neutron-l2-agent. We kept it for compatibility
during the switchover to OVN.