Redeploying Kubernetes master server
Goal: redeploy the Kubernetes master node without redeploying the worker nodes, while keeping the same x509 certificates and all of the Kubernetes configuration. No client/consumer action is required; this is solely a server-side operation.
The following has been tested in a single-master Kubernetes deployment scenario (deployed with kubeadm). If you have tested this in a multi-master deployment, please share your observations.
With a single master server, the impact should be minimal as long as internal DNS is not actively used, since that will be the only unavailable service (along with the Kubernetes API itself, of course).
Backup Kubernetes PKI
The x509 certificates in the /etc/kubernetes/pki directory are created when the Kubernetes cluster is built and are maintained so that they do not expire. These certificates are mainly used for identification and authorization, and for securing the connections between core services such as the kubelet and etcd.
The Kubernetes CA certificate contains the DNS names and IP addresses of the Kubernetes master server(s). The same is true for the etcd CA certificate, except that its DNS names and IPs are those of the server(s) running the etcd cluster.
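You can inspect which names and addresses a certificate covers with openssl. The sketch below generates a throwaway self-signed certificate so that it is runnable anywhere; on the real master you would point the last command at /etc/kubernetes/pki/apiserver.crt instead (the SAN values here are illustrative, and -addext/-ext require OpenSSL 1.1.1+).

```shell
# Create a throwaway cert with SAN entries similar to what kubeadm
# writes into the API server certificate (demo only).
dir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$dir/key.pem" -out "$dir/cert.pem" \
  -subj "/CN=kube-apiserver" \
  -addext "subjectAltName=DNS:k8s-master,IP:10.96.0.1,IP:10.0.0.5"
# List the DNS names and IPs the certificate is valid for; on a real
# master, run this against /etc/kubernetes/pki/apiserver.crt.
openssl x509 -in "$dir/cert.pem" -noout -ext subjectAltName
rm -rf "$dir"
```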
ssh k8s-master "sudo tar czf - -C /etc/kubernetes pki" > pki.tar.gz
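Before relying on the archive, it is worth listing its contents: ca.crt and ca.key are part of the standard kubeadm PKI layout and should be present. The sketch below builds a stand-in archive so that it is self-contained; on your workstation, run the tar tzf line against the real pki.tar.gz.

```shell
# Build a stand-in pki archive the same way the backup command does
# (demo only; your real pki.tar.gz comes from the ssh command above).
dir=$(mktemp -d)
mkdir -p "$dir/etc/kubernetes/pki"
touch "$dir/etc/kubernetes/pki/ca.crt" "$dir/etc/kubernetes/pki/ca.key"
tar czf "$dir/pki.tar.gz" -C "$dir/etc/kubernetes" pki
# The archive should contain at least the cluster CA key pair:
tar tzf "$dir/pki.tar.gz" | grep -E 'pki/ca\.(crt|key)'
rm -rf "$dir"
```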
Backup etcd
etcd keeps all of the Kubernetes configuration and objects: PodSecurityPolicies, Deployments, Pods, ServiceAccounts, Secrets... pretty much everything.
Make sure you save snapshot.db somewhere safe, as you will need to restore your Kubernetes cluster from that file.
ssh k8s-master
sudo -i
curl -L https://github.com/coreos/etcd/releases/download/v3.1.11/etcd-v3.1.11-linux-amd64.tar.gz -o /tmp/etcd-v3.1.11-linux-amd64.tar.gz
tar xvf /tmp/etcd-v3.1.11-linux-amd64.tar.gz 'etcd-v3.1.11-linux-amd64/etcdctl' --strip-components=1
mv etcdctl /usr/bin/
ETCDCTL_API=3 etcdctl --endpoints http://127.0.0.1:2379 snapshot save /tmp/snapshot.db
exit
scp k8s-master:/tmp/snapshot.db .
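Before destroying the master, confirm that the copy on your workstation is intact: an empty snapshot.db would make the rest of the procedure unrecoverable. If you have etcdctl installed locally, ETCDCTL_API=3 etcdctl snapshot status snapshot.db gives a fuller report (hash, revision, total keys); the minimal check below only verifies that the file exists and is non-empty, using a stand-in file so the sketch is self-contained.

```shell
# Minimal integrity check (demo uses a stand-in file; point "snap"
# at the real snapshot.db from the scp step).
snap=$(mktemp)
printf 'stand-in snapshot data' > "$snap"
if [ -s "$snap" ]; then
  echo "snapshot present: $(wc -c < "$snap") bytes"
else
  echo "ERROR: snapshot missing or empty" >&2
  exit 1
fi
rm -f "$snap"
```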
Now that you have backups of the PKI and etcd, you can destroy and redeploy the k8s-master server.
Restore Kubernetes PKI
Right before you install Kubernetes on your new k8s-master node, restore the previously backed-up x509 certificates:
ssh-keygen -R k8s-master
ssh k8s-master "sudo mkdir /etc/kubernetes"
cat pki.tar.gz | ssh k8s-master "sudo tar xvzf - -C /etc/kubernetes"
Restore etcd
If you are using kubeadm to deploy Kubernetes, restore etcd right after kubeadm init has completed; if you restore it earlier, kubeadm init will abort, complaining that the /var/lib/etcd directory is not empty.
ssh-keygen -R k8s-master
scp snapshot.db k8s-master:/tmp/
ssh k8s-master
sudo -i
curl -L https://github.com/coreos/etcd/releases/download/v3.1.11/etcd-v3.1.11-linux-amd64.tar.gz -o /tmp/etcd-v3.1.11-linux-amd64.tar.gz
tar xvf /tmp/etcd-v3.1.11-linux-amd64.tar.gz 'etcd-v3.1.11-linux-amd64/etcdctl' --strip-components=1
mv etcdctl /usr/bin/
docker stop k8s_etcd_etcd-k8s-master_kube-system_d0ddbed1539cb679a70d43b61fa403c5_0
rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore /tmp/snapshot.db \
--name k8s-master \
--initial-cluster k8s-master=http://127.0.0.1:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://127.0.0.1:2380 \
--data-dir=/var/lib/etcd
systemctl restart kubelet
Do not use docker start to start the etcd Docker container. It will be started automatically after you restart the kubelet service.
If your snapshot was taken with an older Kubernetes version and this redeployment installed a newer one, upgrade etcd using the kubeadm upgrade commands:
kubeadm upgrade plan
kubeadm upgrade apply vX.Y.Z
Caveats
kubeadm might not initialize if the server gets a different IP address, e.g. 10.0.0.4 instead of the 10.0.0.5 that was previously written into the Kubernetes CA certificate.
In that case, you will see the following error in the logs:
# journalctl -u kubelet -f
Mar 11 11:47:12 k8s-master kubelet[11330]: E0311 11:47:12.362418 11330 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:465: Failed to list *v1.Service: Get https://10.0.0.4:6443/api/v1/services?limit=500&resourceVersion=0: x509: certificate is valid for 10.96.0.1, 10.0.0.5, not 10.0.0.4
While it is possible to regenerate the x509 certificates, re-sign everything related to them, and distribute them across your environment, it is better not to go that way and instead reuse the internal IP address that the k8s-master instance had before.
To set the previous IP address to your new instance, use the following commands:
openstack port create --network private1 --fixed-ip subnet=private1,ip-address=10.0.0.5 k8s-master-int-ip
openstack server create ... --port k8s-master-int-ip ... # or "--nic port-id=k8s-master-int-ip", but without the "--network private1" flag.