Safe eviction

Motivation

  • Avoid data loss at all costs

  • Avoid data plane disruption wherever possible

  • Recover from data plane disruption

Affected Services

  • Nova Compute

  • Neutron L3 Agent

  • Neutron DHCP Agent

Surroundings / Kubernetes Tools

Finalizers

  • Finalizers DO prevent deletion of an object from the API

  • Finalizers DO NOT prevent termination of containers in a Pod
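The two finalizer bullets can be illustrated with a toy model. This is a minimal sketch, not the API server's or kubelet's code. It assumes that a deletion request immediately stops the containers (the grace period is ignored), and the Pod and finalizer names are hypothetical. It shows that a Pod with a finalizer stays in the API after its containers are gone.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PodModel:
    """Toy model of a Pod as seen by the API server and the kubelet."""
    name: str
    finalizers: List[str] = field(default_factory=list)
    deletion_requested: bool = False   # stands in for metadata.deletionTimestamp
    containers_running: bool = True


class ApiServerModel:
    def __init__(self) -> None:
        self.pods: Dict[str, PodModel] = {}

    def create(self, pod: PodModel) -> None:
        self.pods[pod.name] = pod

    def delete(self, name: str) -> None:
        pod = self.pods[name]
        pod.deletion_requested = True
        # The kubelet reacts to the deletion request by stopping the
        # containers; finalizers play no part in that decision.
        pod.containers_running = False
        self._maybe_remove(name)

    def remove_finalizer(self, name: str, finalizer: str) -> None:
        self.pods[name].finalizers.remove(finalizer)
        self._maybe_remove(name)

    def _maybe_remove(self, name: str) -> None:
        pod = self.pods[name]
        # The object only disappears from the API once it is marked for
        # deletion AND every finalizer has been removed.
        if pod.deletion_requested and not pod.finalizers:
            del self.pods[name]
```

With a hypothetical finalizer `example.com/save-state`, `api.delete(...)` leaves the object in `api.pods`, but with `containers_running == False`. Whatever state lived in the containers is already gone by the time the finalizer's owner gets to act.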

Container Lifecycle Hooks

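  • The preStop hook allows executing a command inside a container, or sending an HTTP request to it, before the container is stopped

  • Its execution time is bounded by the termination grace period (the per-Pod terminationGracePeriodSeconds, or the grace period given in the deletion request)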

  • Docs say:

    Users should make their hook handlers as lightweight as possible. There are cases, however, when long running commands make sense, such as when saving state prior to stopping a Container.
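The following sketch builds a Pod spec fragment with a preStop hook and computes the time budget that bounds it. It uses the real Kubernetes field names (`terminationGracePeriodSeconds`, `lifecycle.preStop.exec.command`) and the default grace period of 30 seconds. The container image and the `/usr/local/bin/save-state` script are hypothetical placeholders.

```python
from typing import Optional

# Kubernetes default when terminationGracePeriodSeconds is not set.
DEFAULT_GRACE_PERIOD_SECONDS = 30


def compute_pod_spec(grace_period_seconds: int) -> dict:
    """Pod spec fragment with a preStop hook (image and script are hypothetical)."""
    return {
        "terminationGracePeriodSeconds": grace_period_seconds,
        "containers": [
            {
                "name": "nova-compute",
                "image": "example.invalid/nova-compute:latest",  # hypothetical
                "lifecycle": {
                    "preStop": {
                        # Runs inside the container before SIGTERM is sent.
                        "exec": {"command": ["/usr/local/bin/save-state"]},  # hypothetical
                    }
                },
            }
        ],
    }


def effective_grace_period(pod_spec: dict,
                           delete_grace_seconds: Optional[int] = None) -> int:
    """Upper bound for the preStop hook plus the container shutdown.

    A grace period given in the deletion request overrides the per-Pod setting.
    """
    if delete_grace_seconds is not None:
        return delete_grace_seconds
    return pod_spec.get("terminationGracePeriodSeconds", DEFAULT_GRACE_PERIOD_SECONDS)
```

A spec built with `compute_pod_spec(3600)` grants the hook up to an hour. A single deletion request with a 10-second grace period still cuts the budget to 10 seconds. This is why a long-running state save inside preStop cannot be relied upon.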

Node Draining

  • Implemented by creating Evictions (Eviction API) for matching Pods

  • Protection of DaemonSet Pods is HARDCODED in kubectl: only Pods whose controller is the DaemonSet resource are skipped -> our CDSes are unprotected and their Pods will be evicted by a drain!

  • Therefore not usable as-is for our use cases
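The DaemonSet check can be reduced to a few lines. This is a simplified model of the filter, not kubectl's actual code. It also assumes that CDS-managed Pods carry a controller owner reference of some kind other than `DaemonSet`; the `ConfiguredDaemonSet` kind and the `example.com/v1` API version below are illustrative. The point is that the check compares the owner's kind name, so no custom controller can opt in.

```python
from typing import Optional


def controller_of(pod: dict) -> Optional[dict]:
    """Return the ownerReference marked as controller, if any."""
    for ref in pod.get("metadata", {}).get("ownerReferences", []):
        if ref.get("controller"):
            return ref
    return None


def skipped_by_drain(pod: dict) -> bool:
    """Simplified model: a drain leaves a Pod alone only if its controller
    is a DaemonSet; the decision is based on the kind name alone."""
    ref = controller_of(pod)
    return ref is not None and ref.get("kind") == "DaemonSet"


def pod_owned_by(api_version: str, kind: str) -> dict:
    """Build a minimal Pod object with a single controller owner reference."""
    return {"metadata": {"ownerReferences": [
        {"apiVersion": api_version, "kind": kind, "name": "owner", "controller": True}
    ]}}
```

Under this model, `skipped_by_drain(pod_owned_by("apps/v1", "DaemonSet"))` is `True`. The same call for a hypothetical `ConfiguredDaemonSet` owner is `False`, so that Pod is evicted.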

Implementation

Nova Compute

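  • Finalizers on Pods are not sufficient to protect anything worth saving, since they do not prevent the containers from being terminated

  • We need to prevent the CDS from descheduling Pods until the node has been evicted (i.e. its instances have been moved away)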

  • Approach:

    • Subclass ConfiguredDaemonSetState to prevent descheduling Pods from nodes that still hold state that needs to be saved

    • Track nodes with state using finalizers or annotations?

    • On reconcile, trigger a task (in the queue? how?) that performs the eviction via the OpenStack API

    • Once the eviction is complete, trigger a reconcile, which then allows the node to be cleared

    • If the node is unreachable (detect how? OpenStack API + k8s Pod/Node status?), do a hard eviction and delete the Pod in parallel

  • Challenges:

    • Interfering with the lifecycle management of the CDS is probably not a wise idea

    • The state each node is in must be tracked separately (where?)
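The reconcile decision from the approach above can be written as a small pure function. This is a sketch under several assumptions. The per-node state is stored in a hypothetical Node annotation `example.com/eviction-state` with the values `has-state`, `evicting` and `evicted`. The reachability heuristic is hypothetical too. The code does not show how it would plug into ConfiguredDaemonSetState; that class's real interface is not reproduced here.

```python
from enum import Enum
from typing import Optional

# Hypothetical annotation key on the k8s Node holding the eviction state.
EVICTION_STATE_ANNOTATION = "example.com/eviction-state"
KNOWN_STATES = {"has-state", "evicting", "evicted"}


class Action(Enum):
    ALLOW_DESCHEDULE = "allow-deschedule"
    START_EVICTION = "start-eviction"
    WAIT = "wait"
    HARD_EVICT_AND_DELETE = "hard-evict-and-delete"


def node_reachable(nova_service_up: bool, k8s_node_ready: bool) -> bool:
    # Conservative heuristic (hypothetical): only treat the node as
    # unreachable when BOTH OpenStack and Kubernetes report it as down.
    return nova_service_up or k8s_node_ready


def next_action(state: Optional[str], reachable: bool) -> Action:
    """Decide what a reconcile does for a node whose Pod the CDS wants to remove."""
    if state is not None and state not in KNOWN_STATES:
        raise ValueError(f"unknown eviction state: {state!r}")
    if state is None or state == "evicted":
        # Nothing left to save: the Pod may be descheduled.
        return Action.ALLOW_DESCHEDULE
    if not reachable:
        # State cannot be saved gracefully: hard eviction plus Pod deletion.
        return Action.HARD_EVICT_AND_DELETE
    if state == "has-state":
        # Kick off the eviction task (via the OpenStack API); keep the Pod.
        return Action.START_EVICTION
    # "evicting": the task is still running; a later reconcile re-checks.
    return Action.WAIT
```

Requiring both signals to agree before taking the hard path is a deliberate choice in this sketch. A hard eviction of instances on a host that is in fact still running them is exactly the kind of data loss the motivation rules out.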

To overcome these challenges, we decided to split the Nova Operator into two operators: one for the big picture (nova) and one that manages the individual compute nodes (nova_compute). This also introduces a new resource, NovaComputeNode.
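A hypothetical sketch of the big-picture side of the split follows. The nova operator only decides which NovaComputeNode objects should exist and leaves the per-node eviction to nova_compute. The API group/version, the `spec.nodeName` field and the `example.com/evict-first` finalizer are all illustrative assumptions, not the real resource definition. The finalizer sits on the NovaComputeNode rather than on a Pod. Per the notes above, finalizers DO block deletion from the API, so a deletion request would wait until nova_compute has finished evicting the node and removes the finalizer.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Plan:
    create: List[str]  # NovaComputeNode names to create
    delete: List[str]  # NovaComputeNode names to request deletion for


def plan_compute_nodes(desired: Set[str], existing: Set[str]) -> Plan:
    """nova side: one NovaComputeNode per compute node. Deletion only
    *requests* removal; nova_compute decides when it may complete."""
    return Plan(create=sorted(desired - existing),
                delete=sorted(existing - desired))


def novacomputenode_body(node_name: str) -> dict:
    """Hypothetical object body; group, fields and finalizer are illustrative."""
    return {
        "apiVersion": "example.com/v1",  # hypothetical group/version
        "kind": "NovaComputeNode",
        "metadata": {
            "name": node_name,
            # Hypothetical finalizer removed by nova_compute after eviction.
            "finalizers": ["example.com/evict-first"],
        },
        "spec": {"nodeName": node_name},
    }
```

For example, with desired nodes `{"a", "b"}` and existing objects `{"b", "c"}`, the plan creates `a` and requests deletion of `c`.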