Emir Özbir

Aug 29, 2020

ETCD Quorum and Consensus #k8s

Hello everyone, today I want to explain one of the essential dynamics in the Kubernetes universe. It is based on a true story.

I have a cluster for the dev-test environment, used for development and as a playground.

This cluster was running with 1 master and N workers, and the master count was updated to 3. The expectation was that the new masters would join the cluster, but that did not happen, and the team could not make any deployments.

So, what is going on when we scale the masters?

The master count actually determines the number of etcd members if you set up your cluster with self-hosted etcd, where each master runs its own etcd member.

Let’s have a look at this diagram.

Currently, our cluster has a single master instance; that is enough for a cluster that was provisioned for dev-test purposes.

In our case, one developer scaled our master instances. So yes, it is possible to scale the masters.

Single Master Node

With my cluster's topology, adding another master instance means adding a new etcd member to the cluster. That is normal and it works: we can scale them up, and the result looks like the diagram below.

Scaling Environment

The new etcd members start heartbeating and taking part in leader elections, but this can sometimes cause issues for the health of the etcd cluster.

In our case, developers scaled the masters down from 3 to 1, and we could no longer get any response from the cluster.

Why can't we scale the master nodes up and down like worker nodes?

1-) Raft Consensus Algorithm

2-) Quorum Number

Quorum Number:

Voting/HeartBeat

Each etcd member needs to be able to reach the others, and the members continuously track the leader of the etcd cluster, term by term.
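To make the heartbeat and term idea concrete, here is a minimal sketch in Python of how a Raft-style follower might react to heartbeats. It is a simplification for illustration, not etcd's actual implementation: the `Member` class, its fields, and the timeout value are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """A hypothetical, heavily simplified Raft-style member."""
    name: str
    election_timeout: float  # seconds without a heartbeat before an election starts
    term: int = 0
    role: str = "follower"
    last_heartbeat: float = 0.0

    def on_heartbeat(self, leader_term: int, now: float) -> None:
        # Accept the heartbeat only if it comes from a current (or newer) term.
        if leader_term >= self.term:
            self.term = leader_term
            self.role = "follower"
            self.last_heartbeat = now

    def tick(self, now: float) -> None:
        # No heartbeat for too long: assume the leader is gone,
        # move to the next term and stand as a candidate.
        if self.role != "leader" and now - self.last_heartbeat >= self.election_timeout:
            self.term += 1
            self.role = "candidate"
            self.last_heartbeat = now  # restart the election timer


member = Member(name="etcd-2", election_timeout=1.0)
member.on_heartbeat(leader_term=1, now=0.0)
member.tick(now=0.5)
print(member.role, member.term)  # still a follower in term 1
member.tick(now=1.5)
print(member.role, member.term)  # heartbeat missed: candidate in term 2
```

When a member stops hearing from the leader, it does not wait forever: it opens a new term and asks the others for votes.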

List of Votes

Each member votes for a leader candidate during each election term, and that is how consistency is provided.
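The vote counting for a single term can be sketched roughly as follows. This is an illustrative Python example, not etcd code: `elect_leader` and the member names are made up here, and the only rule it encodes is that a candidate needs votes from a majority of all cluster members, not just of the members that happen to be reachable.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], cluster_size: int) -> Optional[str]:
    """Return the winner of one election term, or None if nobody has a majority.

    votes maps each voting member to the candidate it voted for.
    cluster_size is the total number of members in the cluster.
    """
    if not votes:
        return None
    needed = cluster_size // 2 + 1  # majority of the whole cluster
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if count >= needed else None


# All three members vote: etcd-1 gets 2 of 3 votes and wins the term.
print(elect_leader({"etcd-1": "etcd-1", "etcd-2": "etcd-1", "etcd-3": "etcd-3"}, 3))

# Only one member is left to vote in a 3-member cluster: no leader can be elected.
print(elect_leader({"etcd-1": "etcd-1"}, 3))
```

The second call is the interesting one: a lone member can vote for itself, but one vote out of three is never a majority.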

You can look at the simple vote and election table on the left-hand side.

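If we lose one member of a 3-member etcd cluster, the cluster can still provide consistency.

That is because the quorum is `N/2+1` with integer division: for 3 members, `3/2+1` equals 2, so two healthy members are enough to continue. But if we lose more members than that and drop below the quorum, the cluster can no longer provide consistency, and it stops accepting writes.

So N is the number of members registered in the etcd cluster, starting with the count that joined when you set up your cluster on day one. Shutting down a master instance does not by itself remove its etcd member, so N stays the same; that is why scaling from 3 down to 1 left a single member that could not reach the quorum of 2.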
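The quorum arithmetic is easy to check with a few lines of Python. This is only a sketch of the formula discussed above; the function names `quorum`, `fault_tolerance`, and `has_quorum` are hypothetical helpers written for this post, not part of any etcd tooling.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


def has_quorum(registered: int, healthy: int) -> bool:
    """True if the healthy members still form a majority of the registered ones."""
    return healthy >= quorum(registered)


for n in range(1, 6):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={fault_tolerance(n)}")

# Our incident: 3 registered members, only 1 still running.
print(has_quorum(registered=3, healthy=1))  # False
```

Note that 2 members tolerate no failure at all, and 4 members tolerate only as many failures as 3 do, which is why odd member counts are the usual choice.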

Consensus algorithms help us find the leader of the system. Leader election is basically decided by the members' votes.

This strategy provides consistency for the cluster state.

Summary

The story I have told is a rare situation, but anything can happen, and we must carefully watch the state of our cluster.
