Avoid data loss at all costs
Avoid data plane disruption wherever possible
Recover from data plane disruption
Neutron L3 Agent
Neutron DHCP Agent
Surroundings / Kubernetes Tools
Finalizers DO prevent deletion of an object from the API
Finalizers DO NOT prevent termination of containers in a Pod
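The two finalizer statements above can be illustrated with a toy model. This is only a sketch of the semantics, not real client code: ToyApiServer, ApiObject and the finalizer name are made up for illustration. The model assumes, as in Kubernetes, that the kubelet starts terminating containers once the deletion timestamp is set, regardless of finalizers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ApiObject:
    """Toy stand-in for a Pod; not a real Kubernetes client type."""
    name: str
    finalizers: List[str] = field(default_factory=list)
    deletion_timestamp: Optional[str] = None
    containers_running: bool = True


class ToyApiServer:
    """Minimal model of how deletion interacts with finalizers."""

    def __init__(self) -> None:
        self.objects: Dict[str, ApiObject] = {}

    def create(self, obj: ApiObject) -> None:
        self.objects[obj.name] = obj

    def delete(self, name: str) -> None:
        obj = self.objects[name]
        # The deletion request marks the object; the kubelet reacts to this
        # mark and stops the containers, finalizers or not.
        obj.deletion_timestamp = "now"
        obj.containers_running = False
        if not obj.finalizers:
            del self.objects[name]

    def remove_finalizer(self, name: str, finalizer: str) -> None:
        obj = self.objects[name]
        obj.finalizers.remove(finalizer)
        # Once the last finalizer is gone, a marked object disappears.
        if obj.deletion_timestamp is not None and not obj.finalizers:
            del self.objects[name]


api = ToyApiServer()
# "example.invalid/save-state" is a hypothetical finalizer name.
api.create(ApiObject("l3-agent-pod", finalizers=["example.invalid/save-state"]))
api.delete("l3-agent-pod")

pod = api.objects["l3-agent-pod"]
print("still in API:", "l3-agent-pod" in api.objects)
print("containers running:", pod.containers_running)
```

The object stays visible in the API after the delete call, but its containers are already gone, which is why a finalizer alone cannot protect state held by the running process.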
Container Lifecycle Hooks
The preStop hook allows executing code inside a container or sending an HTTP request to it
Execution time is bounded by the termination grace period (per-Pod setting or deletion request setting)
Users should make their hook handlers as lightweight as possible. There are cases, however, when long running commands make sense, such as when saving state prior to stopping a Container.
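As a minimal sketch of the preStop hook and the grace period that bounds it, the following builds a Pod manifest as a plain dictionary. The field names (lifecycle.preStop.exec.command, terminationGracePeriodSeconds) are the ones used by the Kubernetes Pod spec; the pod name, image and script path are hypothetical placeholders.

```python
import json


def build_pod_manifest(name, image, pre_stop_command, grace_period_seconds):
    """Return a Pod manifest whose single container has an exec preStop hook."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Upper bound for preStop hook plus container shutdown.
            "terminationGracePeriodSeconds": grace_period_seconds,
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "lifecycle": {
                        "preStop": {"exec": {"command": list(pre_stop_command)}}
                    },
                }
            ],
        },
    }


# All values below are illustrative assumptions, not taken from a real deployment.
manifest = build_pod_manifest(
    name="l3-agent",
    image="example.invalid/l3-agent:latest",
    pre_stop_command=["/bin/sh", "-c", "/usr/local/bin/save-state.sh"],
    grace_period_seconds=300,
)
print(json.dumps(manifest, indent=2))
```

A deletion request can carry its own grace period, which takes the place of the per-Pod value, so a hook cannot rely on the per-Pod setting always being honored.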
Implemented by adding Evictions for matching pods
Protection of DaemonSets is HARDCODED in kubectl against the DaemonSet resource! -> Pods of our CDSes are unprotected and will be evicted by drain!
Not useful at all for our use cases
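The drain behavior noted above can be sketched as a simplified model. This is not kubectl's actual code: it only mimics the idea that a pod counts as DaemonSet-managed when its controlling owner reference has kind DaemonSet, and that every other pod gets an Eviction. The owner kind ConfiguredDaemonSet for CDS pods and all pod names are assumptions made for illustration.

```python
def is_daemonset_managed(pod):
    """True if the pod's controlling owner reference has kind DaemonSet."""
    for ref in pod.get("metadata", {}).get("ownerReferences", []):
        if ref.get("controller") and ref.get("kind") == "DaemonSet":
            return True
    return False


def eviction_body(pod):
    """Build the body of an Eviction for the given pod (policy/v1)."""
    meta = pod["metadata"]
    return {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
    }


def plan_drain(pods):
    """Return Eviction bodies for all pods not recognized as DaemonSet pods."""
    return [eviction_body(p) for p in pods if not is_daemonset_managed(p)]


def make_pod(name, owner_kind):
    # Helper for the example only; names and namespace are hypothetical.
    return {
        "metadata": {
            "name": name,
            "namespace": "example",
            "ownerReferences": [{"kind": owner_kind, "controller": True}],
        }
    }


pods = [
    make_pod("node-exporter-abc", "DaemonSet"),
    make_pod("l3-agent-xyz", "ConfiguredDaemonSet"),
]
evictions = plan_drain(pods)
print([e["metadata"]["name"] for e in evictions])
```

Because the check is tied to the owner kind, a pod owned by a custom resource is treated like any other pod and receives an Eviction.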
Finalizers on Pods are not sufficient to protect anything worth saving
We need to prevent the CDS from descheduling pods until the node has been evicted
ConfiguredDaemonSetState to prevent descheduling of nodes that still have state which needs to be saved
Track nodes with state using finalizers? or annotations?
On reconcile, trigger task (in the queue? how?) which performs the eviction (via OpenStack API)
Once eviction is complete, trigger reconcile which will then allow the node to be cleared
If the node is unreachable (detect how? OpenStack API + k8s pod/node status?), do hard eviction and delete the pod in parallel.
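The reconcile flow described in the last three points could look roughly like the following decision function. It is a sketch under assumptions: the state names, action names and function names are invented here, and how reachability is detected or where the state is stored is left open, as in the notes.

```python
from enum import Enum


class NodeState(Enum):
    HAS_STATE = "has_state"  # node still holds state that must be saved
    EVICTING = "evicting"    # eviction task is running
    EVICTED = "evicted"      # eviction finished, node may be cleared


def reconcile_node(state, removal_requested, reachable):
    """Return (new_state, actions) for one node during a reconcile run."""
    if not removal_requested:
        return state, []
    if state is NodeState.HAS_STATE:
        if reachable:
            # Normal path: trigger the eviction task (via OpenStack API).
            return NodeState.EVICTING, ["start_eviction"]
        # Unreachable node: hard eviction and pod deletion in parallel.
        return NodeState.EVICTING, ["start_hard_eviction", "delete_pod"]
    if state is NodeState.EVICTING:
        # Nothing to do; task completion triggers the next reconcile.
        return state, []
    # EVICTED: the node no longer holds state and may be cleared.
    return state, ["allow_descheduling"]


def on_eviction_complete(state):
    """Called by the (hypothetical) eviction task when it finishes."""
    return NodeState.EVICTED if state is NodeState.EVICTING else state


state, actions = reconcile_node(NodeState.HAS_STATE, True, True)
print(state, actions)
state = on_eviction_complete(state)
state, actions = reconcile_node(state, True, True)
print(state, actions)
```

Keeping the decision logic free of side effects makes it easy to test each transition without a cluster or an OpenStack API.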
Trying to mess with lifecycle management of CDS is probably not a wise idea
Need to keep track of the state the nodes are in separately (where?)
To overcome those challenges, we decided to split the nova operator into two
operators: one for the big picture (nova) and one which manages the individual
compute nodes (nova_compute). This also means that there is a new resource (