Kubernetes Node Failover

Emir Özbir
Jan 17, 2021


Hello,

Today I’m going to tell a story about tuning the failover behaviour of Kubernetes clusters. In a Kubernetes cluster, worker nodes can go down for a couple of reasons, such as a network outage or resource exhaustion.

What is the Problem?

In a Kubernetes cluster, the control plane fetches the state of the worker nodes periodically. The default monitoring period (the node monitor grace period) is 40 seconds.
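This interval corresponds to the kube-controller-manager’s `--node-monitor-grace-period` flag. A minimal sketch of where it could be tuned in a kubeadm static-pod manifest (the file path and surrounding fields are assumptions for illustration; 40s is the default):

```yaml
# /etc/kubernetes/manifests/kube-controller-manager.yaml (typical kubeadm path)
# Only the relevant parts are shown.
spec:
  containers:
  - command:
    - kube-controller-manager
    # How long a node may be unresponsive before it is marked NotReady:
    - --node-monitor-grace-period=40s
```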


If anything unusual happens, the node controller updates the node’s state and taints the node. The taint prevents any further scheduling on that node.

Let’s have a look at this node:

ip-172-31-44-210   NotReady   <none>                 7h31m   v1.20.2

Let’s describe it:

Taints:             node.kubernetes.io/unreachable:NoExecute

The node controller has tainted the node, but the pods are still there, and the ReplicationController does not detect the missing replicas on the failed node.

How Did I Fix the Problem?

I checked the Kubernetes GitHub issues and Google results and found the `pod-eviction-timeout` parameter, but it did not work properly.

Notice: Please check this issue: https://github.com/kubernetes-sigs/kubespray/issues/7112
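For reference, `pod-eviction-timeout` is a kube-controller-manager flag; as the issue above discusses, it is effectively ignored once taint-based evictions are active (the default since Kubernetes 1.18), because evictions are then driven by pod tolerations instead. A sketch of the flag as it would appear in the controller-manager arguments:

```yaml
# kube-controller-manager argument; ignored when taint-based evictions
# handle the not-ready/unreachable cases instead:
- --pod-eviction-timeout=40s
```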

After further research, I found the DefaultTolerationSeconds admission plugin and enabled it like this:

- --enable-admission-plugins=NodeRestriction,DefaultTolerationSeconds
- --default-not-ready-toleration-seconds=40
- --default-unreachable-toleration-seconds=40

The `default-not-ready-toleration-seconds` and `default-unreachable-toleration-seconds` parameters make it possible to evict a pod whose node has failed within the specified time period; in this example the value is set to 40 seconds.

But there is another option, at the deployment level:

- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 40
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 40

With these tolerations in place, the controller knows how long to wait before creating a new replica once one of these taints is added to a worker node.
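Putting it together, here is a sketch of a Deployment carrying these tolerations. The name and image mirror the output below; the labels and replica count are assumptions for illustration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1                     # assumed; match this to your workload
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19         # assumed image tag
      # Evict pods from a failed node after 40 seconds instead of the default:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 40
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 40
```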

nginx-deployment-7944668857-2pdnt 1/1 => ip-172-31-46-223

After I stop the node, the lost replica is recreated on one of the available worker nodes:

nginx-deployment-7944668857-2pdnt 1/1 => ip-172-31-46-223
nginx-deployment-7944668857-2pdnt 0/1 => TERMINATING
nginx-deployment-7944668857-cwggz 0/1 => PENDING
nginx-deployment-7944668857-cwggz 0/1 => CONTAINER-CREATING
nginx-deployment-7944668857-cwggz 1/1 => RUNNING ip-172-31-41-27


Kubernetes’ default configuration becomes aware of a node failure very quickly, but evicting the pods and recreating the missing replicas takes about 5 minutes; the approach in this blog post allows us to decrease this period.

But not every solution is applicable to all environments and applications, so you should customize this approach to fit your own workloads.