Redeploying Kubernetes master server

Goal: redeploy the Kubernetes master node without redeploying the worker nodes, while keeping the same x509 certificates and all of the Kubernetes configuration. No client/consumer action is required; this is solely a server-side operation.

The following has been tested in a single-master Kubernetes deployment (deployed with kubeadm).

If you have tested this in a multi-master node deployment, please share your observations.

With a single master server, the impact of the redeployment should be minimal as long as the internal DNS is not actively used, since it will be the only unavailable service (along with the Kubernetes API itself, of course).

Backup Kubernetes PKI

The x509 certificates in the /etc/kubernetes/pki directory are created when the Kubernetes cluster is built and have to be maintained (renewed) so that they do not expire. These certificates are mainly used for authentication and authorization, and also for securing the connections between core services such as the kubelets and etcd.

The Kubernetes API server certificate, signed by the Kubernetes CA, contains the DNS names and IP addresses of the Kubernetes master server(s). The same holds for the etcd server certificate, except that the DNS names and IP addresses are those of the server(s) running the etcd cluster.

ssh k8s-master "sudo tar czf - -C /etc/kubernetes pki" > pki.tar.gz
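Before destroying the old master, it may be worth confirming that the archive really contains the key material. The following is a minimal sketch, not part of the tested procedure: archive_has is a hypothetical helper name, and the file names in the usage line are assumed to be the kubeadm defaults under pki/.

```shell
# archive_has <archive.tar.gz> <member>...
# Succeeds only if every listed member is present in the gzip-compressed tar archive.
archive_has() {
    archive=$1
    shift
    # List the archive once; fail if it cannot be read at all.
    listing=$(tar tzf "$archive") || return 1
    for f in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string.
        printf '%s\n' "$listing" | grep -qxF -- "$f" || {
            echo "missing from $archive: $f" >&2
            return 1
        }
    done
}

# Usage (file names assumed to be kubeadm defaults):
# archive_has pki.tar.gz pki/ca.crt pki/ca.key pki/sa.key pki/sa.pub
```

If the function reports a missing file, take the backup again before touching the old server.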

Backup etcd

etcd stores all Kubernetes configuration and objects, such as PodSecurityPolicy, Deployments, Pods, ServiceAccounts, and Secrets: pretty much everything.

Make sure you save snapshot.db somewhere safe, as you will need that file to restore your Kubernetes cluster. Run the following on k8s-master as root:

curl -L https://github.com/coreos/etcd/releases/download/v3.1.11/etcd-v3.1.11-linux-amd64.tar.gz -o /tmp/etcd-v3.1.11-linux-amd64.tar.gz
tar xvf /tmp/etcd-v3.1.11-linux-amd64.tar.gz 'etcd-v3.1.11-linux-amd64/etcdctl' --strip-components=1
mv etcdctl /usr/bin/

ETCDCTL_API=3 etcdctl --endpoints http://127.0.0.1:2379 snapshot save /tmp/snapshot.db
exit

scp k8s-master:/tmp/snapshot.db .
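Since the whole restore depends on this one file, a quick integrity check of the copy can help. This is a sketch under assumptions: verify_copy is a hypothetical helper, sha256sum is assumed to be installed on both machines, and the remote file is assumed to be readable by the SSH user.

```shell
# verify_copy <host> <remote_path> <local_path>
# Succeeds if the remote file and the local copy have the same SHA-256 digest.
verify_copy() {
    # sha256sum prints "<digest>  <name>"; keep only the digest.
    remote=$(ssh "$1" sha256sum "$2" | cut -d' ' -f1)
    local_sum=$(sha256sum "$3" | cut -d' ' -f1)
    # An empty remote digest means the remote command failed.
    [ -n "$remote" ] && [ "$remote" = "$local_sum" ]
}

# Usage:
# verify_copy k8s-master /tmp/snapshot.db snapshot.db && echo "snapshot copy is intact"
```

On the master itself, ETCDCTL_API=3 etcdctl snapshot status /tmp/snapshot.db should additionally print summary information about the snapshot, which is a cheap way to see that it is not empty.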

Now that you have backups of PKI & etcd, you can destroy and redeploy the k8s-master server.

Restore Kubernetes PKI

Right before you install Kubernetes on your new k8s-master node, restore the previously backed-up x509 certificates:

ssh-keygen -R k8s-master
ssh k8s-master "sudo mkdir -p /etc/kubernetes"
cat pki.tar.gz | ssh k8s-master "sudo tar xvzf - -C /etc/kubernetes"

Restore etcd

If you are using kubeadm to deploy Kubernetes, restore etcd only after kubeadm init has completed; if you restore it earlier, kubeadm init aborts, complaining that the /var/lib/etcd directory is not empty.

ssh-keygen -R k8s-master
scp snapshot.db k8s-master:/tmp/

ssh k8s-master
sudo -i

curl -L https://github.com/coreos/etcd/releases/download/v3.1.11/etcd-v3.1.11-linux-amd64.tar.gz -o /tmp/etcd-v3.1.11-linux-amd64.tar.gz
tar xvf /tmp/etcd-v3.1.11-linux-amd64.tar.gz 'etcd-v3.1.11-linux-amd64/etcdctl' --strip-components=1
mv etcdctl /usr/bin/

docker stop k8s_etcd_etcd-k8s-master_kube-system_d0ddbed1539cb679a70d43b61fa403c5_0
rm -rf /var/lib/etcd

ETCDCTL_API=3 etcdctl snapshot restore /tmp/snapshot.db \
  --name k8s-master \
  --initial-cluster k8s-master=http://127.0.0.1:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://127.0.0.1:2380 \
  --data-dir=/var/lib/etcd

systemctl restart kubelet

Do not use docker start to start the etcd Docker container. It is started automatically after you restart the kubelet service.
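The container name passed to docker stop above contains a pod UID and will differ on your server. The sketch below shows one way to look it up; etcd_containers is a hypothetical helper, and it assumes the kubelet's Docker naming scheme k8s_<container>_<pod>_<namespace>_<uid>_<attempt> seen in the name above.

```shell
# Read container names (one per line) on stdin and print only the etcd
# containers. The pod's pause container is named k8s_POD_etcd-... and is
# deliberately not matched.
etcd_containers() {
    grep -E '^k8s_etcd_'
}

# Usage on the master:
# docker ps --format '{{.Names}}' | etcd_containers
```

Pass the name printed by the usage line to docker stop instead of the one shown in this article.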

If your snapshot was taken with an older Kubernetes version and this redeployment gave you a newer one, upgrade your etcd with the kubeadm upgrade commands:

kubeadm upgrade plan
kubeadm upgrade apply vX.Y.Z

Caveats

kubeadm might not initialize if the server gets a different IP address, e.g. 10.0.0.4 instead of 10.0.0.5, which was previously written to the Kubernetes API server certificate.

In that case, you will see the following error in the logs:

# journalctl -u kubelet -f
Mar 11 11:47:12 k8s-master kubelet[11330]: E0311 11:47:12.362418   11330 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:465: Failed to list *v1.Service: Get https://10.0.0.4:6443/api/v1/services?limit=500&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 10.0.0.5, not 10.0.0.4

While it is possible to regenerate the x509 certificate, re-sign everything that depends on it, and distribute it across your environment, it is better to avoid that and instead reuse the internal IP address that the k8s-master instance had before.
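To find out which addresses a certificate is valid for without waiting for the kubelet error, you can read its Subject Alternative Name extension. This is a sketch, not part of the tested procedure: san_ips and cert_covers_ip are hypothetical helper names, and the certificate path in the usage line is assumed to be the kubeadm default.

```shell
# Read `openssl x509 -noout -text` output on stdin and print the IP
# addresses listed in the Subject Alternative Name extension, one per line.
san_ips() {
    grep -A1 'Subject Alternative Name' \
        | grep -oE 'IP Address:[0-9a-fA-F.:]+' \
        | cut -d: -f2-
}

# cert_covers_ip <cert.pem> <ip>
# Succeeds if the certificate lists the given IP address as a SAN.
cert_covers_ip() {
    openssl x509 -in "$1" -noout -text | san_ips | grep -qxF -- "$2"
}

# Usage (path assumed to be the kubeadm default):
# cert_covers_ip /etc/kubernetes/pki/apiserver.crt 10.0.0.4 || echo "10.0.0.4 is not covered"
```

Running this against the backed-up certificate before you redeploy tells you which internal IP address the new instance has to receive.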

To assign the previous IP address to your new instance on OpenStack, create a port with that fixed IP and attach it to the server:

openstack port create --network private1 --fixed-ip subnet=private1,ip-address=10.0.0.5 k8s-master-int-ip

openstack server create ... --port k8s-master-int-ip ... # or "--nic port-id=k8s-master-int-ip", but without the "--network private1" flag.
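Once the new instance is up, and before running kubeadm init, you may want to confirm that it really received the old address. A minimal sketch, assuming a Linux host with the iproute2 ip command; ipv4_addresses and host_has_ip are hypothetical helper names.

```shell
# Read `ip -4 -o addr show` output on stdin and print the bare IPv4
# addresses (field 4 looks like 10.0.0.5/24; the prefix length is dropped).
ipv4_addresses() {
    awk '{ split($4, a, "/"); print a[1] }'
}

# host_has_ip <ip>
# Succeeds if one of the local interfaces carries the given address.
host_has_ip() {
    ip -4 -o addr show | ipv4_addresses | grep -qxF -- "$1"
}

# Usage on the new k8s-master:
# host_has_ip 10.0.0.5 || echo "this host does not have 10.0.0.5"
```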