{"id":22764,"date":"2024-06-17T11:59:46","date_gmt":"2024-06-17T08:59:46","guid":{"rendered":"https:\/\/kifarunix.com\/?p=22764"},"modified":"2024-06-18T19:17:37","modified_gmt":"2024-06-18T16:17:37","slug":"disaster-recovery-in-kubernetes-etcd-backup-and-restore-with-etcdctl-and-etcdutl","status":"publish","type":"post","link":"https:\/\/kifarunix.com\/disaster-recovery-in-kubernetes-etcd-backup-and-restore-with-etcdctl-and-etcdutl\/","title":{"rendered":"Disaster Recovery in Kubernetes: etcd Backup and Restore with etcdctl and etcdutl"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1073\" height=\"602\" src=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-etcd-cluster-backup-and-restore.png\" alt=\"backup and restore kubernetes etcd cluster\" class=\"wp-image-22893\" title=\"\" srcset=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-etcd-cluster-backup-and-restore.png?v=1718614716 1073w, https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-etcd-cluster-backup-and-restore-768x431.png?v=1718614716 768w\" sizes=\"(max-width: 1073px) 100vw, 1073px\" \/><\/figure>\n\n\n\n<p>In this blog post, we will dive into Kubernetes disaster recovery and show how to back up and restore etcd using the <code>etcdctl<\/code> and <code>etcdutl<\/code> tools. Even the most robust Kubernetes clusters aren&#8217;t immune to accidents. Data loss or corruption can bring your applications to a screeching halt. That&#8217;s where backups come in \u2013 your safety net for disaster recovery. At the heart of Kubernetes is <strong>etcd<\/strong>, the central data store that holds all cluster data, including the state of nodes, pods, services, etc. 
Hence, mastering the backup and restoration process of <code>etcd<\/code> data is a crucial skill for any Kubernetes administrator.<\/p>\n\n\n\n<div class=\"wp-block-rank-math-toc-block\" id=\"rank-math-toc\"><h2>Table of Contents<\/h2><nav><ul><li><a href=\"#backup-and-restore-kubernetes-etcd-with-etcdctl-and-etcdutl\">Backup and Restore Kubernetes etcd with etcdctl and etcdutl<\/a><ul><li><a href=\"#what-is-etcd-in-kubernetes-and-why-is-it-important\">What is etcd in Kubernetes and why is it important?<\/a><\/li><li><a href=\"#backup-etcd-cluster-with-etcdctl-command-line-tool\">Backup etcd cluster with etcdctl command line tool<\/a><ul><li><a href=\"#install-etcdctl-and-etcdutl-command-line-tools\">Install etcdctl and etcdutl command line tools<\/a><\/li><li><a href=\"#find-the-required-details-about-etcd-pods\">Find the Required Details about etcd Pods<\/a><\/li><li><a href=\"#backup-etcd-cluster-with-etcdctl\">Backup etcd cluster with etcdctl<\/a><\/li><li><a href=\"#check-status-of-the-etcd-snapshot-backup-file\">Check Status of the etcd Snapshot Backup File<\/a><\/li><\/ul><\/li><li><a href=\"#restoring-kubernetes-etcd-from-snapshot\">Restoring Kubernetes etcd from Snapshot<\/a><ul><li><a href=\"#restore-etcd-backup-in-a-single-node-control-plane-kubernetes-cluster\">Restore etcd Backup in a Single Node Control Plane Kubernetes Cluster<\/a><\/li><li><a href=\"#restore-etcd-in-a-stacked-etcd-kubernetes-ha-cluster\">Restore etcd in a Stacked etcd Kubernetes HA Cluster<\/a><\/li><\/ul><\/li><li><a href=\"#conclusion\">Conclusion<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"backup-and-restore-kubernetes-etcd-with-etcdctl-and-etcdutl\">Backup and Restore Kubernetes etcd with etcdctl and etcdutl<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"what-is-etcd-in-kubernetes-and-why-is-it-important\">What is etcd in Kubernetes and why is it important?<\/h3>\n\n\n\n<p><strong>etcd<\/strong> is a distributed, consistent key-value 
store used by Kubernetes to manage cluster state. It acts as the central data store for all the essential data that governs the state and configuration of your Kubernetes cluster.<\/p>\n\n\n\n<p>So, what exactly does <strong>etcd<\/strong> store in a Kubernetes cluster?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cluster state<\/strong>: Information about all pods, deployments, services, and other Kubernetes resources deployed in your cluster.<\/li>\n\n\n\n<li><strong>Desired state vs. Actual state<\/strong>: etcd stores both the desired state (as defined by your deployments) and the actual state (the current running state of your applications). This allows Kubernetes to maintain consistency and take corrective actions if there are discrepancies.<\/li>\n\n\n\n<li><strong>Configuration<\/strong>: etcd holds configuration data for various Kubernetes components, including the API server, scheduler, and controllers.<\/li>\n<\/ul>\n\n\n\n<p>Why is <strong>etcd<\/strong> important in a Kubernetes cluster?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Single Source of Truth:<\/strong> etcd serves as a centralized location for all cluster data, ensuring consistency and simplifying cluster management.<\/li>\n\n\n\n<li><strong>Highly Available:<\/strong> Designed to be highly available, etcd can tolerate failures and maintain data integrity. 
This is crucial for ensuring your Kubernetes cluster remains operational even if individual etcd nodes experience issues.<\/li>\n\n\n\n<li><strong>Scalability:<\/strong> etcd can be scaled horizontally by adding more nodes to the cluster, allowing it to handle the growing demands of your deployments.<\/li>\n\n\n\n<li><strong>Centralized Configuration Management:<\/strong> With configuration data stored in etcd, changes can be made in a single location, simplifying cluster administration.<\/li>\n\n\n\n<li><strong>Scheduling and Coordination:<\/strong> etcd does not schedule workloads itself, but the scheduler and controllers rely on the state stored in it (through the API server) to place pods on worker nodes and coordinate with each other, ensuring smooth operation of your applications.<\/li>\n<\/ul>\n\n\n\n<p>While there are more advanced tools for backing up and restoring a Kubernetes cluster, this guide focuses on the basic tools, <strong>etcdctl<\/strong> and <strong>etcdutl<\/strong>.<\/p>\n\n\n\n<p>To back up and restore etcd in a Kubernetes cluster, follow along.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"backup-etcd-cluster-with-etcdctl-command-line-tool\">Backup etcd cluster with <strong>etcdctl<\/strong> command line tool<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"install-etcdctl-and-etcdutl-command-line-tools\">Install <strong>etcdctl<\/strong> and <strong>etcdutl<\/strong> command line tools<\/h4>\n\n\n\n<p>If the <strong>etcdctl\/etcdutl<\/strong> tools are not already installed, you can install them by following the guides linked below.<\/p>\n\n\n\n<p><a href=\"https:\/\/kifarunix.com\/how-to-install-etcdctl-on-kubernetes-cluster\/#install-etcdctl-command-line-tool\">Install etcdctl command line tool on Kubernetes control plane<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/kifarunix.com\/how-to-install-etcdctl-on-kubernetes-cluster\/#install-etcdutl-command-line-tool\">Install etcdutl command line tool on Kubernetes control plane<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" 
id=\"find-the-required-details-about-etcd-pods\">Find the Required Details about <strong>etcd<\/strong> Pods<\/h4>\n\n\n\n<p>You will need to know the endpoint URL of the <strong>etcd<\/strong> pods, as well as the TLS certificates and keys required to secure the connection to the <strong>etcd<\/strong> cluster.<\/p>\n\n\n\n<p>To begin with, list the <strong>etcd<\/strong> pods in the <strong>kube-system<\/strong> namespace;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pods -n kube-system -l component=etcd<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME             READY   STATUS    RESTARTS        AGE\netcd-master-01   1\/1     Running   1 (2d10h ago)   2d12h\netcd-master-02   1\/1     Running   0               2d12h\netcd-master-03   1\/1     Running   0               2d12h\n<\/code><\/pre>\n\n\n\n<p>In a Kubernetes setup with a multi-node etcd cluster, it is important to understand that etcd uses a distributed architecture where data is replicated across all nodes in the cluster. This ensures high availability and redundancy. 
When it comes to taking snapshots for backups, the snapshot can be taken from any one of the healthy etcd nodes, as the data will be consistent across the cluster.<\/p>\n\n\n\n<p>So, let&#8217;s get the details of one of the pods, say <strong>etcd-master-01<\/strong>;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl describe pod etcd-master-01 -n kube-system<\/code><\/pre>\n\n\n\n<p>Sample output;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>Name:                 etcd-master-01\nNamespace:            kube-system\nPriority:             2000001000\nPriority Class Name:  system-node-critical\nNode:                 master-01\/192.168.122.58\nStart Time:           Sat, 08 Jun 2024 07:44:21 +0000\nLabels:               component=etcd\n                      tier=control-plane\nAnnotations:          kubeadm.kubernetes.io\/etcd.advertise-client-urls: https:\/\/192.168.122.58:2379\n                      kubernetes.io\/config.hash: af3280d5bddb0c05b28b8bdde858c3e6\n                      kubernetes.io\/config.mirror: af3280d5bddb0c05b28b8bdde858c3e6\n                      kubernetes.io\/config.seen: 2024-06-08T05:30:06.933599803Z\n                      kubernetes.io\/config.source: file\nStatus:               Running\nSeccompProfile:       RuntimeDefault\nIP:                   192.168.122.58\nIPs:\n  IP:           192.168.122.58\nControlled By:  Node\/master-01\nContainers:\n  etcd:\n    Container ID:  containerd:\/\/521ed0ce231aedfa7479f8a73bc0761e0468fd5d3398682043e05d3802eb9239\n    Image:         registry.k8s.io\/etcd:3.5.12-0\n    Image ID:      registry.k8s.io\/etcd@sha256:44a8e24dcbba3470ee1fee21d5e88d128c936e9b55d4bc51fbef8086f8ed123b\n    Port:          <none>\n    Host Port:     <none>\n    Command:\n      etcd\n      --advertise-client-urls=https:\/\/192.168.122.58:2379\n      --cert-file=\/etc\/kubernetes\/pki\/etcd\/server.crt\n      --client-cert-auth=true\n      --data-dir=\/var\/lib\/etcd\n      --experimental-initial-corrupt-check=true\n      
--experimental-watch-progress-notify-interval=5s\n      --initial-advertise-peer-urls=https:\/\/192.168.122.58:2380\n      --initial-cluster=master-01=https:\/\/192.168.122.58:2380\n      --key-file=\/etc\/kubernetes\/pki\/etcd\/server.key\n      --listen-client-urls=https:\/\/127.0.0.1:2379,https:\/\/192.168.122.58:2379\n      --listen-metrics-urls=http:\/\/127.0.0.1:2381\n      --listen-peer-urls=https:\/\/192.168.122.58:2380\n      --name=master-01\n      --peer-cert-file=\/etc\/kubernetes\/pki\/etcd\/peer.crt\n      --peer-client-cert-auth=true\n      --peer-key-file=\/etc\/kubernetes\/pki\/etcd\/peer.key\n      --peer-trusted-ca-file=\/etc\/kubernetes\/pki\/etcd\/ca.crt\n      --snapshot-count=10000\n      --trusted-ca-file=\/etc\/kubernetes\/pki\/etcd\/ca.crt\n    State:          Running\n      Started:      Sat, 08 Jun 2024 07:44:22 +0000\n    Last State:     Terminated\n      Reason:       Unknown\n      Exit Code:    255\n      Started:      Sat, 08 Jun 2024 05:30:02 +0000\n      Finished:     Sat, 08 Jun 2024 07:44:20 +0000\n    Ready:          True\n    Restart Count:  1\n    Requests:\n      cpu:        100m\n      memory:     100Mi\n    Liveness:     http-get http:\/\/127.0.0.1:2381\/health%3Fexclude=NOSPACE&serializable=true delay=10s timeout=15s period=10s #success=1 #failure=8\n    Startup:      http-get http:\/\/127.0.0.1:2381\/health%3Fserializable=false delay=10s timeout=15s period=10s #success=1 #failure=24\n    Environment:  <none>\n    Mounts:\n      \/etc\/kubernetes\/pki\/etcd from etcd-certs (rw)\n      \/var\/lib\/etcd from etcd-data (rw)\nConditions:\n  Type                        Status\n  PodReadyToStartContainers   True \n  Initialized                 True \n  Ready                       True \n  ContainersReady             True \n  PodScheduled                True \nVolumes:\n  etcd-certs:\n    Type:          HostPath (bare host directory volume)\n    Path:          \/etc\/kubernetes\/pki\/etcd\n    HostPathType:  DirectoryOrCreate\n  
etcd-data:\n    Type:          HostPath (bare host directory volume)\n    Path:          \/var\/lib\/etcd\n    HostPathType:  DirectoryOrCreate\nQoS Class:         Burstable\nNode-Selectors:    <none>\nTolerations:       :NoExecute op=Exists\nEvents:            <none>\n<\/code><\/pre>\n\n\n\n<p>So, what we need is:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>listen-client-urls<\/strong>: https:\/\/127.0.0.1:2379,https:\/\/192.168.122.58:2379<\/li>\n\n\n\n<li><strong>cert-file<\/strong>: \/etc\/kubernetes\/pki\/etcd\/server.crt<\/li>\n\n\n\n<li><strong>key-file<\/strong>: \/etc\/kubernetes\/pki\/etcd\/server.key<\/li>\n\n\n\n<li><strong>trusted-ca-file<\/strong>: \/etc\/kubernetes\/pki\/etcd\/ca.crt<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"backup-etcd-cluster-with-etcdctl\">Backup etcd cluster with <strong>etcdctl<\/strong><\/h4>\n\n\n\n<p><strong>etcdctl<\/strong> can be used to create a backup of the <strong>etcd<\/strong> cluster by taking a <strong>snapshot<\/strong> of the current state of the cluster. 
The snapshot captures the data and metadata of the entire etcd cluster at a specific point in time and is typically stored as a binary file.<\/p>\n\n\n\n<p>Thus, to create a snapshot of the current <strong>etcd<\/strong> cluster state, use the <strong>etcdctl<\/strong> command as follows;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>etcdctl snapshot save &lt;filename&gt; &#91;flags]<\/code><\/pre>\n\n\n\n<p>You can get the required flags from the help page;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>etcdctl snapshot save --help<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME:\n\tsnapshot save - Stores an etcd node backend snapshot to a given file\n\nUSAGE:\n\tetcdctl snapshot save &lt;filename&gt; [flags]\n\nOPTIONS:\n  -h, --help[=false]\thelp for save\n\nGLOBAL OPTIONS:\n      --cacert=\"\"\t\t\t\tverify certificates of TLS-enabled secure servers using this CA bundle\n      --cert=\"\"\t\t\t\t\tidentify secure client using this TLS certificate file\n      --command-timeout=5s\t\t\ttimeout for short running command (excluding dial timeout)\n      --debug[=false]\t\t\t\tenable client-side debug logging\n      --dial-timeout=2s\t\t\t\tdial timeout for client connections\n  -d, --discovery-srv=\"\"\t\t\tdomain name to query for SRV records describing cluster endpoints\n      --discovery-srv-name=\"\"\t\t\tservice name to query when using DNS discovery\n      --endpoints=[127.0.0.1:2379]\t\tgRPC endpoints\n      --hex[=false]\t\t\t\tprint byte strings as hex encoded strings\n      --insecure-discovery[=true]\t\taccept insecure SRV records describing cluster endpoints\n      --insecure-skip-tls-verify[=false]\tskip server certificate verification (CAUTION: this option should be enabled only for testing purposes)\n      --insecure-transport[=true]\t\tdisable transport security for client connections\n      --keepalive-time=2s\t\t\tkeepalive time for client connections\n      --keepalive-timeout=6s\t\t\tkeepalive timeout for client connections\n      
--key=\"\"\t\t\t\t\tidentify secure client using this TLS key file\n      --password=\"\"\t\t\t\tpassword for authentication (if this option is used, --user option shouldn't include password)\n      --user=\"\"\t\t\t\t\tusername[:password] for authentication (prompt if password is not supplied)\n  -w, --write-out=\"simple\"\t\t\tset the output format (fields, json, protobuf, simple, table)\n<\/code><\/pre>\n\n\n\n<p>Note that from etcd v3.4, API version 3 is the default, so it is no longer necessary to prefix the <strong>etcdctl<\/strong> commands with <strong>ETCDCTL_API=3<\/strong>.<\/p>\n\n\n\n<p>Now that we have the required details to back up the etcd cluster, proceed as follows;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>sudo etcdctl snapshot save \/mnt\/backups\/snapshot_v1-`date +%FT%T`.db \\\n\t --cacert=\/etc\/kubernetes\/pki\/etcd\/ca.crt \\\n\t --cert=\/etc\/kubernetes\/pki\/etcd\/server.crt \\\n\t --key=\/etc\/kubernetes\/pki\/etcd\/server.key\n<\/code><\/pre>\n\n\n\n<p>If you don&#8217;t specify the endpoint, localhost port 2379 will be used (https:\/\/127.0.0.1:2379).<\/p>\n\n\n\n<p>This command will create an etcd snapshot file, such as <strong>snapshot_v1-2024-06-10T21:18:51.db<\/strong>, under the <strong>\/mnt\/backups<\/strong> directory. 
This directory must exist before running the command.<\/p>\n\n\n\n<p>Sample output;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>{\"level\":\"info\",\"ts\":\"2024-06-10T18:23:08.050843Z\",\"caller\":\"snapshot\/v3_snapshot.go:65\",\"msg\":\"created temporary db file\",\"path\":\"\/mnt\/backups\/snapshot_v1-2024-06-10T18:23:08.db.part\"}\n{\"level\":\"info\",\"ts\":\"2024-06-10T18:23:08.060964Z\",\"logger\":\"client\",\"caller\":\"v3@v3.5.12\/maintenance.go:212\",\"msg\":\"opened snapshot stream; downloading\"}\n{\"level\":\"info\",\"ts\":\"2024-06-10T18:23:08.061344Z\",\"caller\":\"snapshot\/v3_snapshot.go:73\",\"msg\":\"fetching snapshot\",\"endpoint\":\"127.0.0.1:2379\"}\n{\"level\":\"info\",\"ts\":\"2024-06-10T18:23:08.127634Z\",\"logger\":\"client\",\"caller\":\"v3@v3.5.12\/maintenance.go:220\",\"msg\":\"completed snapshot read; closing\"}\n{\"level\":\"info\",\"ts\":\"2024-06-10T18:23:08.139728Z\",\"caller\":\"snapshot\/v3_snapshot.go:88\",\"msg\":\"fetched snapshot\",\"endpoint\":\"127.0.0.1:2379\",\"size\":\"8.0 MB\",\"took\":\"now\"}\n{\"level\":\"info\",\"ts\":\"2024-06-10T18:23:08.139803Z\",\"caller\":\"snapshot\/v3_snapshot.go:97\",\"msg\":\"saved\",\"path\":\"\/mnt\/backups\/snapshot_v1-2024-06-10T18:23:08.db\"}\nSnapshot saved at \/mnt\/backups\/snapshot_v1-2024-06-10T18:23:08.db\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"check-status-of-the-etcd-snapshot-backup-file\">Check Status of the etcd Snapshot Backup File<\/h4>\n\n\n\n<p>You can use the <strong>etcdctl snapshot status<\/strong> command to check the status of the snapshot file. However, this command is deprecated and will be removed in etcd v3.6. 
Thus, you can use <strong>etcdutl snapshot status<\/strong> instead.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>etcdutl snapshot status &lt;filename&gt; &#91;flags]<\/code><\/pre>\n\n\n\n<p>For example;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo etcdutl snapshot status \/mnt\/backups\/snapshot_v1-2024-06-10T18:23:08.db<\/code><\/pre>\n\n\n\n<p>Or<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo etcdutl snapshot status \/mnt\/backups\/snapshot_v1-2024-06-10T18:23:08.db -w table<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>+----------+----------+------------+------------+\n|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |\n+----------+----------+------------+------------+\n| 545378d6 |   593444 |       1352 |     8.0 MB |\n+----------+----------+------------+------------+\n<\/code><\/pre>\n\n\n\n<div class=\"info-panel\">\n    <div class=\"info-panel-header\">Info\n    <\/div>\n    <div class=\"info-panel-content\">Security is paramount. Store your backups securely, ideally in an offsite location in an encrypted format, to prevent them from being affected by the same incident that disrupts your cluster.<br>\n\nSimilarly, consider encrypting your snapshot backup files at rest!\n    <\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"restoring-kubernetes-etcd-from-snapshot\">Restoring Kubernetes etcd from Snapshot<\/h3>\n\n\n\n<p>In case a disaster strikes, and you need to restore your cluster, you can use <strong>etcdutl snapshot restore<\/strong> command.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>etcdutl snapshot restore &lt;filename&gt; --data-dir {output dir} &#91;options] &#91;flags]<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>etcdutl snapshot restore --help<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>Usage:\n  etcdutl snapshot restore <filename> --data-dir {output dir} [options] [flags]\n\nFlags:\n      --bump-revision uint                   How much to increase the latest revision after restore\n      
--data-dir string                      Path to the output data directory\n  -h, --help                                 help for restore\n      --initial-advertise-peer-urls string   List of this member's peer URLs to advertise to the rest of the cluster (default \"http:\/\/localhost:2380\")\n      --initial-cluster string               Initial cluster configuration for restore bootstrap (default \"default=http:\/\/localhost:2380\")\n      --initial-cluster-token string         Initial cluster token for the etcd cluster during restore bootstrap (default \"etcd-cluster\")\n      --mark-compacted                       Mark the latest revision after restore as the point of scheduled compaction (required if --bump-revision > 0, disallowed otherwise)\n      --name string                          Human-readable name for this member (default \"default\")\n      --skip-hash-check                      Ignore snapshot integrity hash value (required if copied from data directory)\n      --wal-dir string                       Path to the WAL directory (use --data-dir if none given)\n\nGlobal Flags:\n  -w, --write-out string   set the output format (fields, json, protobuf, simple, table) (default \"simple\")\n<\/code><\/pre>\n\n\n\n<p>So, how do we confirm that a restore works as expected? 
As an example, I have an Nginx app running in the <strong>apps<\/strong> namespace;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pods,svc,configmaps,deployment -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME                             READY   STATUS    RESTARTS   AGE\npod\/nginx-app-6ff7b5d8f6-7mldr   1\/1     Running   0          6m43s\n\nNAME                TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE\nservice\/nginx-app   NodePort   10.103.175.198   &lt;none&gt;        80:32189\/TCP   6m28s\n\nNAME                         DATA   AGE\nconfigmap\/html-page          1      9m4s\nconfigmap\/kube-root-ca.crt   1      9m16s\n\nNAME                        READY   UP-TO-DATE   AVAILABLE   AGE\ndeployment.apps\/nginx-app   1\/1     1            1           8m55s\n<\/code><\/pre>\n\n\n\n<p>The Nginx app is exposed on NodePort <strong>32189<\/strong> on the cluster;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get svc nginx-app -n apps -o wide<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE     SELECTOR\nnginx-app   NodePort   10.103.175.198   &lt;none&gt;        80:32189\/TCP   8m46s   app=nginx-app\n<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pods -n apps -o wide --selector=app=nginx-app<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>\nNAME                         READY   STATUS    RESTARTS   AGE     IP               NODE        NOMINATED NODE   READINESS GATES\nnginx-app-6ff7b5d8f6-7mldr   1\/1     Running   0          9m33s   10.100.202.199   worker-03   &lt;none&gt;           &lt;none&gt;\n<\/code><\/pre>\n\n\n\n<p>And this is how the app looks in the browser;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1477\" height=\"577\" src=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-nginx-app.png?v=1718051887\" alt=\"\" 
class=\"wp-image-22776\" style=\"width:820px;height:auto\" title=\"\" srcset=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-nginx-app.png?v=1718051887 1477w, https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-nginx-app-768x300.png?v=1718051887 768w\" sizes=\"(max-width: 1477px) 100vw, 1477px\" \/><\/figure>\n\n\n\n<p>I have already taken a backup of our Kubernetes cluster etcd.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo etcdutl snapshot status \/mnt\/backups\/snapshot_v4-2024-06-11T10:37:24.db -w table<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>+---------+----------+------------+------------+\n|  HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |\n+---------+----------+------------+------------+\n| 78eacca |   680597 |        949 |     8.0 MB |\n+---------+----------+------------+------------+\n<\/code><\/pre>\n\n\n\n<p>So, let&#8217;s delete the whole namespace;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl delete ns apps<\/code><\/pre>\n\n\n\n<p>Confirm;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get ns<\/code><\/pre>\n\n\n\n<p>If you now try to access the web server, it is no longer reachable;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1603\" height=\"645\" src=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/delete-nginx-app-service-k8s.png?v=1718052343\" alt=\"\" class=\"wp-image-22779\" title=\"\" srcset=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/delete-nginx-app-service-k8s.png?v=1718052343 1603w, https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/delete-nginx-app-service-k8s-768x309.png?v=1718052343 768w, https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/delete-nginx-app-service-k8s-1536x618.png?v=1718052343 1536w\" sizes=\"(max-width: 1603px) 100vw, 1603px\" \/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"restore-etcd-backup-in-a-single-node-control-plane-kubernetes-cluster\">Restore etcd Backup in a Single Node 
Control Plane Kubernetes Cluster<\/h4>\n\n\n\n<p>If you have only a single control plane node in your Kubernetes cluster with a single <strong>etcd<\/strong> data store, then the restoration is as follows.<\/p>\n\n\n\n<p>Ensure you have worked around this <a href=\"https:\/\/kifarunix.com\/kubernetes-nodes-maintenance-drain-vs-cordon-demystified\/#kubectl-drain-node-gets-stuck-forever-apparmor-bug\" target=\"_blank\" rel=\"noreferrer noopener\">AppArmor bug<\/a> that prevents pods from terminating.<\/p>\n\n\n\n<p>Next, stop the API server and the other control plane components. They run as static Pods, so you can simply &#8220;<strong>remove<\/strong>&#8221; (rename) the static Pod manifests directory, <strong>\/etc\/kubernetes\/manifests<\/strong>;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mv \/etc\/kubernetes\/manifests{,.01}<\/code><\/pre>\n\n\n\n<p>The static pods will now be shut down.<\/p>\n\n\n\n<p>You can check with the command below;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo crictl -r unix:\/\/\/run\/containerd\/containerd.sock ps<\/code><\/pre>\n\n\n\n<p>Ensure that none of the core Kubernetes pod containers are running before you proceed.<\/p>\n\n\n\n<p>Stop kubelet;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo systemctl stop kubelet<\/code><\/pre>\n\n\n\n<p>Next, move the initial etcd data directory, <strong>\/var\/lib\/etcd<\/strong>, out of the way. 
If you want to keep the original data directory, you can restore the snapshot to a different directory.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mv \/var\/lib\/etcd{,.01}<\/code><\/pre>\n\n\n\n<p>Restore etcd from the backup snapshot;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo etcdutl snapshot restore \/mnt\/backups\/snapshot_v4-2024-06-11T10:37:24.db --data-dir \/var\/lib\/etcd<\/code><\/pre>\n\n\n\n<p>Sample output;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>2024-06-11T10:48:42Z\tinfo\tsnapshot\/v3_snapshot.go:260\trestoring snapshot\t{\"path\": \"\/mnt\/backups\/snapshot_v4-2024-06-11T10:37:24.db\", \"wal-dir\": \"\/var\/lib\/etcd\/member\/wal\", \"data-dir\": \"\/var\/lib\/etcd\/\", \"snap-dir\": \"\/var\/lib\/etcd\/member\/snap\"}\n2024-06-11T10:48:42Z\tinfo\tmembership\/store.go:141\tTrimming membership information from the backend...\n2024-06-11T10:48:42Z\tinfo\tmembership\/cluster.go:421\tadded member\t{\"cluster-id\": \"cdf818194e3a8c32\", \"local-member-id\": \"0\", \"added-peer-id\": \"8e9e05c52164694d\", \"added-peer-peer-urls\": [\"http:\/\/localhost:2380\"]}\n2024-06-11T10:48:42Z\tinfo\tsnapshot\/v3_snapshot.go:287\trestored snapshot\t{\"path\": \"\/mnt\/backups\/snapshot_v4-2024-06-11T10:37:24.db\", \"wal-dir\": \"\/var\/lib\/etcd\/member\/wal\", \"data-dir\": \"\/var\/lib\/etcd\/\", \"snap-dir\": \"\/var\/lib\/etcd\/member\/snap\"}\n<\/code><\/pre>\n\n\n\n<p>We have restored the data to the default etcd data dir, <strong>\/var\/lib\/etcd<\/strong>.<\/p>\n\n\n\n<p>You can now move the static pods manifests to the right place.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mv \/etc\/kubernetes\/manifests{.01,}<\/code><\/pre>\n\n\n\n<p>If you restored etcd data to a different data directory, you will have to update the default data directory path in the etcd deployment manifest file, <strong>\/etc\/kubernetes\/manifests\/etcd.yaml<\/strong>. Replace <strong>\/var\/lib\/etcd<\/strong> with your current restore data path. 
Save and exit the file.<\/p>\n\n\n\n<p>Start kubelet service.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo systemctl start kubelet<\/code><\/pre>\n\n\n\n<p>The Pods should now be coming up.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pods -n kube-system<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME                                READY   STATUS    RESTARTS        AGE\ncoredns-7db6d8ff4d-8859g            1\/1     Running   0               66m\ncoredns-7db6d8ff4d-lkhgp            1\/1     Running   0               66m\netcd-master-01                      1\/1     Running   0               67m\nkube-apiserver-master-01            1\/1     Running   0               67m\nkube-controller-manager-master-01   1\/1     Running   0               67m\nkube-proxy-6vwk7                    1\/1     Running   7 (4m37s ago)   60m\nkube-proxy-chrxf                    1\/1     Running   10 (96s ago)    60m\nkube-proxy-ttdc6                    1\/1     Running   0               66m\nkube-proxy-wvvns                    1\/1     Running   9 (2m16s ago)   60m\nkube-scheduler-master-01            1\/1     Running   0               67m\n<\/code><\/pre>\n\n\n\n<p>You can watch the events unfold!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pods -n kube-system -w<\/code><\/pre>\n\n\n\n<p>Confirm that your namespace is available again after the restore;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get ns<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME               STATUS   AGE\napps               Active   60m\ncalico-apiserver   Active   66m\ncalico-system      Active   66m\ndefault            Active   68m\nkube-node-lease    Active   68m\nkube-public        Active   68m\nkube-system        Active   68m\ntigera-operator    Active   67m\n<\/code><\/pre>\n\n\n\n<p>So, we have our namespace, <strong>apps<\/strong>. 
Check all the resources in the namespace;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get all -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME                             READY   STATUS    RESTARTS      AGE\npod\/nginx-app-6ff7b5d8f6-msvld   1\/1     Running   3 (55m ago)   62m\n\nNAME                TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE\nservice\/nginx-app   NodePort   10.109.243.106   &lt;none&gt;        80:32189\/TCP   62m\n\nNAME                        READY   UP-TO-DATE   AVAILABLE   AGE\ndeployment.apps\/nginx-app   1\/1     1            1           62m\n\nNAME                                   DESIRED   CURRENT   READY   AGE\nreplicaset.apps\/nginx-app-676b458f4f   0         0         0       62m\nreplicaset.apps\/nginx-app-6ff7b5d8f6   1         1         1       62m\n<\/code><\/pre>\n\n\n\n<p>Confirm the nodes are ready to handle the workload;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get nodes<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        STATUS   ROLES           AGE   VERSION\nmaster-01   Ready    control-plane   75m   v1.30.1\nworker-01   Ready    &lt;none&gt;          68m   v1.30.1\nworker-02   Ready    &lt;none&gt;          68m   v1.30.1\nworker-03   Ready    &lt;none&gt;          68m   v1.30.1\n<\/code><\/pre>\n\n\n\n<p>You should now be able to access your web server the way it was before the disaster.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"restore-etcd-in-a-stacked-etcd-kubernetes-ha-cluster\">Restore etcd in a Stacked etcd Kubernetes HA Cluster<\/h4>\n\n\n\n<p>If you are running a stacked etcd HA cluster and you have taken a snapshot as shown above, then you can restore it in case of unforeseen disaster as follows.<\/p>\n\n\n\n<p>As usual, you need to ensure that your backup is secure and not corrupted!<\/p>\n\n\n\n<p>In my setup, I have 3 control plane nodes and 3 worker nodes.<\/p>\n\n\n\n<p><a 
href=\"https:\/\/kifarunix.com\/setup-highly-available-kubernetes-cluster-with-haproxy-and-keepalived\/\" target=\"_blank\" rel=\"noreferrer noopener\">Setup Highly Available Kubernetes Cluster with Haproxy and Keepalived<\/a><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get nodes<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        STATUS   ROLES           AGE   VERSION\nmaster-01   Ready    control-plane   17h   v1.30.1\nmaster-02   Ready    control-plane   17h   v1.30.1\nmaster-03   Ready    control-plane   17h   v1.30.1\nworker-01   Ready    &lt;none&gt;          16h   v1.30.1\nworker-02   Ready    &lt;none&gt;          16h   v1.30.1\nworker-03   Ready    &lt;none&gt;          33m   v1.30.1\n<\/code><\/pre>\n\n\n\n<p>The procedure to restore an etcd backup in a stacked etcd HA cluster is more or less similar to the procedure used above.<\/p>\n\n\n\n<p>To begin with, <strong>ensure the snapshots are stored safely and pass the integrity check!<\/strong><\/p>\n\n\n\n<p>We have our snapshot file under <strong>\/mnt\/backups\/snapshot_v3.db<\/strong>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo etcdutl snapshot status \/mnt\/backups\/snapshot_v3.db -w table<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>+----------+----------+------------+------------+\n|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |\n+----------+----------+------------+------------+\n| 993414e1 |    23609 |       2443 |      10 MB |\n+----------+----------+------------+------------+\n<\/code><\/pre>\n\n\n\n<p>When the snapshot was taken, we had a sample Nginx app running under the <strong>apps<\/strong> namespace.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get all -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME                             READY   STATUS    RESTARTS   AGE\npod\/nginx-app-6ff7b5d8f6-9qhlh   1\/1     Running   0          15h\n\nNAME                TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE\nservice\/nginx-app   NodePort   
10.98.82.32   <none>        80:31501\/TCP   15h\n\nNAME                        READY   UP-TO-DATE   AVAILABLE   AGE\ndeployment.apps\/nginx-app   1\/1     1            1           15h\n\nNAME                                   DESIRED   CURRENT   READY   AGE\nreplicaset.apps\/nginx-app-676b458f4f   0         0         0       15h\nreplicaset.apps\/nginx-app-6ff7b5d8f6   1         1         1       15h\n<\/code><\/pre>\n\n\n\n<p>Let&#8217;s delete the namespace and everything within it.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl delete ns apps<\/code><\/pre>\n\n\n\n<p>Confirm;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get ns<\/code><\/pre>\n\n\n\n<p>As you can see, there is no namespace named <strong>apps<\/strong>.<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>NAME               STATUS   AGE\ncalico-apiserver   Active   17h\ncalico-system      Active   17h\ndefault            Active   17h\nkube-node-lease    Active   17h\nkube-public        Active   17h\nkube-system        Active   17h\ntigera-operator    Active   17h\n<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get all -n apps<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>No resources found in apps namespace.<\/code><\/pre>\n\n\n\n<p>So, let&#8217;s try to restore the etcd snapshot in the Kubernetes HA cluster.<\/p>\n\n\n\n<p>Stop all Kubernetes core components (<strong>kube-apiserver, etcd, scheduler <\/strong>and<strong> controller-manager<\/strong>) on <strong>all control plane instances<\/strong>.<\/p>\n\n\n\n<p>But how can you stop these components? Well, you can shut them down by temporarily removing their respective manifest yaml files from <strong>\/etc\/kubernetes\/manifests<\/strong> directory.<\/p>\n\n\n\n<div class=\"info-panel\">\n    <div class=\"info-panel-header\">Info\n    <\/div>\n    <div class=\"info-panel-content\">Kubernetes runs the core components as static pods. 
Static pods are managed directly by the kubelet daemon using the manifest YAML files stored in the \/etc\/kubernetes\/manifests\/ directory. Each component has its own manifest file, so the kubelet can manage it without the need for the API server.<br>\nThe kubelet continuously monitors the contents of \/etc\/kubernetes\/manifests\/. If it detects any changes (additions, modifications, or deletions) to the pod manifests within this directory, it automatically manages the corresponding pods&#8217; lifecycle on the node.<br>\nTherefore, to stop these components, you can move the manifest files out of the \/etc\/kubernetes\/manifests\/ directory to another location.\n    <\/div>\n<\/div>\n\n\n\n<p>State of the Pod containers on one of the control plane nodes before removing the manifest files;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo crictl -r unix:\/\/\/run\/containerd\/containerd.sock ps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>CONTAINER           IMAGE               CREATED             STATE               NAME                        ATTEMPT             POD ID              POD\n8345f5a431daf       7820c83aa1394       32 seconds ago      Running             kube-scheduler              1                   6988e08d0e41e       kube-scheduler-master-01\neae8fa68f43ad       e874818b3caac       37 seconds ago      Running             kube-controller-manager     1                   c8866eb2013fc       kube-controller-manager-master-01\n080e2fdbfe0fc       6c07591fd1cfa       26 minutes ago      Running             calico-apiserver            0                   b9d58e1e01a1c       calico-apiserver-6d69f8d89f-jcplx\ne08b9afc6d710       0f80feca743f4       26 minutes ago      Running             csi-node-driver-registrar   0                   2fda38a106385       csi-node-driver-bv7cs\n5514f4bbe2bdc       1a094aeaf1521       26 minutes ago      Running             calico-csi                  0                   2fda38a106385       
csi-node-driver-bv7cs\nbac154fa9b086       4e42b6f329bc1       26 minutes ago      Running             calico-node                 0                   d79ecf4326e92       calico-node-zzsqs\ne4238345bfbbd       a9372c0f51b54       27 minutes ago      Running             calico-typha                0                   288729a1f918b       calico-typha-f64f84658-5zsst\n8410ce8bb7647       53c535741fb44       30 minutes ago      Running             kube-proxy                  0                   098fea76f0dca       kube-proxy-4c699\n46baba197fc08       3861cfcd7c04c       31 minutes ago      Running             etcd                        0                   fcadbd893b13c       etcd-master-01\n5b7e68be6694d       56ce0fd9fb532       31 minutes ago      Running             kube-apiserver              0                   9e438a9bf57ff       kube-apiserver-master-01\n<\/code><\/pre>\n\n\n\n<p>So, on all control plane nodes, let&#8217;s remove the core component manifest files.<\/p>\n\n\n\n<div class=\"info-panel\">\n    <div class=\"info-panel-header\">Note\n    <\/div>\n    <div class=\"info-panel-content\">You can start with the etcd cluster follower nodes and finish with the leader node.<br>\nYou can find which node is the leader or a follower using the command below.<br>\n<code>sudo etcdctl endpoint status --cacert=\/etc\/kubernetes\/pki\/etcd\/ca.crt \\\n \t--cert=\/etc\/kubernetes\/pki\/etcd\/server.crt \\\n \t--key=\/etc\/kubernetes\/pki\/etcd\/server.key \\\n \t--cluster -w table<\/code><\/div><\/div>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mv \/etc\/kubernetes\/manifests \/mnt\/backups\/<\/code><\/pre>\n\n\n\n<p>After a while, all static pod containers are gone!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo crictl -r unix:\/\/\/run\/containerd\/containerd.sock ps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>CONTAINER           IMAGE               CREATED             STATE               NAME                        ATTEMPT           
  POD ID              POD\n080e2fdbfe0fc       6c07591fd1cfa       31 minutes ago      Running             calico-apiserver            0                   b9d58e1e01a1c       calico-apiserver-6d69f8d89f-jcplx\ne08b9afc6d710       0f80feca743f4       32 minutes ago      Running             csi-node-driver-registrar   0                   2fda38a106385       csi-node-driver-bv7cs\n5514f4bbe2bdc       1a094aeaf1521       32 minutes ago      Running             calico-csi                  0                   2fda38a106385       csi-node-driver-bv7cs\nbac154fa9b086       4e42b6f329bc1       32 minutes ago      Running             calico-node                 0                   d79ecf4326e92       calico-node-zzsqs\ne4238345bfbbd       a9372c0f51b54       32 minutes ago      Running             calico-typha                0                   288729a1f918b       calico-typha-f64f84658-5zsst\n8410ce8bb7647       53c535741fb44       36 minutes ago      Running             kube-proxy                  0                   098fea76f0dca       kube-proxy-4c699\n<\/code><\/pre>\n\n\n\n<p>Similarly, move the <strong>etcd<\/strong> data directory out of the way on all the control plane nodes. This will allow us to restore the backup into the original data directory. You can, however, restore to a different path and update the data directory path in the manifest file for <strong>etcd<\/strong> (<strong>etcd.yaml<\/strong>).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mv \/var\/lib\/etcd \/mnt\/backups\/<\/code><\/pre>\n\n\n\n<p>At this point, you won&#8217;t be able to access the cluster!<\/p>\n\n\n\n<p>Stop the kubelet on all control plane nodes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo systemctl stop kubelet<\/code><\/pre>\n\n\n\n<p>Ensure the snapshot file is accessible on all the control plane nodes. 
If you want, you can use shared storage for easy access to the snapshot files across the control plane nodes.<\/p>\n\n\n\n<p>For now, let&#8217;s just copy our snapshot from master-01 to the other control plane nodes, master-02 and master-03.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for i in 02 03; do sudo rsync -avP \/mnt\/backups\/snapshot_v3.db root@master-$i:\/mnt\/backups\/; done<\/code><\/pre>\n\n\n\n<p>Restore the snapshot on all the control plane nodes using the <strong>etcdutl<\/strong> command. Remember, you can still use <strong>etcdctl snapshot restore<\/strong>, even though it is deprecated in favor of <strong>etcdutl<\/strong>.<\/p>\n\n\n\n<p>See <a href=\"https:\/\/kifarunix.com\/how-to-install-etcdctl-on-kubernetes-cluster\/#install-etcdutl-command-line-tool\" target=\"_blank\" rel=\"noreferrer noopener\">how to install etcdutl command line tool<\/a>.<\/p>\n\n\n\n<p>Restore the etcd snapshot on every control plane node. We will use the <strong>etcdutl<\/strong> command, which is set to replace <strong>etcdctl<\/strong> for restoration.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>etcdutl snapshot restore --help<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>Usage:\n  etcdutl snapshot restore &lt;filename&gt; --data-dir {output dir} [options] [flags]\n\nFlags:\n      --bump-revision uint                   How much to increase the latest revision after restore\n      --data-dir string                      Path to the output data directory\n  -h, --help                                 help for restore\n      --initial-advertise-peer-urls string   List of this member's peer URLs to advertise to the rest of the cluster (default \"http:\/\/localhost:2380\")\n      --initial-cluster string               Initial cluster configuration for restore bootstrap (default \"default=http:\/\/localhost:2380\")\n      --initial-cluster-token string         Initial cluster token for the etcd cluster during restore bootstrap (default \"etcd-cluster\")\n      --mark-compacted                       Mark the latest revision after 
restore as the point of scheduled compaction (required if --bump-revision > 0, disallowed otherwise)\n      --name string                          Human-readable name for this member (default \"default\")\n      --skip-hash-check                      Ignore snapshot integrity hash value (required if copied from data directory)\n      --wal-dir string                       Path to the WAL directory (use --data-dir if none given)\n\nGlobal Flags:\n  -w, --write-out string   set the output format (fields, json, protobuf, simple, table) (default \"simple\")\n<\/code><\/pre>\n\n\n\n<p>As already mentioned, we have three control plane nodes (master-01,02,03)<\/p>\n\n\n\n<p>The value of <strong>initial-cluster<\/strong>, <strong>name<\/strong>, <strong>initial-advertise-peer-urls<\/strong> MUST match what is defined on each control plane <strong>etcd.yaml<\/strong> manifest file.<\/p>\n\n\n\n<p>On first control plane node (master-01)<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>etcdutl snapshot restore \/mnt\/backups\/snapshot_v3.db \\\n\t--name master-01 \\\n\t--initial-cluster master-01=https:\/\/192.168.122.58:2380,master-02=https:\/\/192.168.122.59:2380,master-03=https:\/\/192.168.122.60:2380 \\\n\t--data-dir \/var\/lib\/etcd \\\n\t--initial-advertise-peer-urls https:\/\/192.168.122.58:2380\n<\/code><\/pre>\n\n\n\n<p>Second control plane (master-02)<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>etcdutl snapshot restore \/mnt\/backups\/snapshot_v3.db \\\n\t--name master-02 \\\n\t--initial-cluster master-01=https:\/\/192.168.122.58:2380,master-02=https:\/\/192.168.122.59:2380,master-03=https:\/\/192.168.122.60:2380 \\\n\t--data-dir \/var\/lib\/etcd \\\n\t--initial-advertise-peer-urls https:\/\/192.168.122.59:2380\n<\/code><\/pre>\n\n\n\n<p>Next control plane (master-03)<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>etcdutl snapshot restore \/mnt\/backups\/snapshot_v3.db \\\n\t--name master-03 \\\n\t--initial-cluster 
master-01=https:\/\/192.168.122.58:2380,master-02=https:\/\/192.168.122.59:2380,master-03=https:\/\/192.168.122.60:2380 \\\n\t--data-dir \/var\/lib\/etcd \\\n\t--initial-advertise-peer-urls https:\/\/192.168.122.60:2380\n<\/code><\/pre>\n\n\n\n<p>Next, move the Kubernetes core component manifest YAML files back into the right directory on all control plane nodes;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo mv \/mnt\/backups\/manifests \/etc\/kubernetes\/<\/code><\/pre>\n\n\n\n<p>Start the kubelet.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo systemctl start kubelet<\/code><\/pre>\n\n\n\n<p>The Kubernetes core component pod containers should be coming up now;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sudo crictl -r unix:\/\/\/run\/containerd\/containerd.sock ps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>CONTAINER           IMAGE               CREATED                  STATE               NAME                        ATTEMPT             POD ID              POD\n3ee7db5199d21       3861cfcd7c04c       Less than a second ago   Running             etcd                        0                   0ac470d386f96       etcd-master-03\n685d11faf98cf       e874818b3caac       Less than a second ago   Running             kube-controller-manager     0                   97e8a01818631       kube-controller-manager-master-03\nb0f70b7246d87       7820c83aa1394       Less than a second ago   Running             kube-scheduler              0                   f829c946d19ca       kube-scheduler-master-03\nf1bcd51cc8269       56ce0fd9fb532       Less than a second ago   Running             kube-apiserver              0                   4390ce5db8a47       kube-apiserver-master-03\n65ba3c8a76bfc       0f80feca743f4       51 minutes ago           Running             csi-node-driver-registrar   1                   c1539a2211bf5       csi-node-driver-dql9n\n7c0fd95753436       1a094aeaf1521       51 minutes ago           Running             calico-csi                  1 
                  c1539a2211bf5       csi-node-driver-dql9n\nd540ad32d70f7       cbb01a7bd410d       51 minutes ago           Running             coredns                     1                   396cca65c3afa       coredns-7db6d8ff4d-4xb99\n8e9ba7a50b35e       428d92b022539       51 minutes ago           Running             calico-kube-controllers     1                   b7a8e1b6ec67b       calico-kube-controllers-584469688f-7lb9z\nb87f929f56aad       cbb01a7bd410d       51 minutes ago           Running             coredns                     1                   52c20065e9cdd       coredns-7db6d8ff4d-48x7h\n767cf9f1a2aba       4e42b6f329bc1       52 minutes ago           Running             calico-node                 1                   76e48bfd7fd37       calico-node-9jcv6\n2322ab0046f40       53c535741fb44       52 minutes ago           Running             kube-proxy                  1                   3a4dcd0a0f6f0       kube-proxy-fdz5w\n<\/code><\/pre>\n\n\n\n<p>Cluster should be healthy;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>sudo etcdctl endpoint status --cacert=\/etc\/kubernetes\/pki\/etcd\/ca.crt \\\n--cert=\/etc\/kubernetes\/pki\/etcd\/server.crt \\\n--key=\/etc\/kubernetes\/pki\/etcd\/server.key \\\n--cluster -w table\n<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+\n|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |\n+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+\n| https:\/\/192.168.122.60:2379 | 71adc069c6babb75 |  3.5.12 |   13 MB |     false |      false |         2 |        992 |                992 |        |\n| https:\/\/192.168.122.58:2379 | 7f56434149e2cc7f |  3.5.12 |   13 MB |      
true |      false |         2 |        992 |                992 |        |\n| https:\/\/192.168.122.59:2379 | db4091f30b21595e |  3.5.12 |   13 MB |     false |      false |         2 |        992 |                992 |        |\n+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+\n<\/code><\/pre>\n\n\n\n<p>Check that the cluster nodes are up;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get nodes<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        STATUS   ROLES           AGE    VERSION\nmaster-01   Ready    control-plane   18h    v1.30.1\nmaster-02   Ready    control-plane   18h    v1.30.1\nmaster-03   Ready    control-plane   18h    v1.30.1\nworker-01   Ready    &lt;none&gt;          17h    v1.30.1\nworker-02   Ready    &lt;none&gt;          17h    v1.30.1\nworker-03   Ready    &lt;none&gt;          102m   v1.30.1\n<\/code><\/pre>\n\n\n\n<p>You can check the core services;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pods -n kube-system<\/code><\/pre>\n\n\n\n<p>List namespaces to confirm that the apps namespace is restored;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get ns<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME               STATUS   AGE\napps               Active   17h\ncalico-apiserver   Active   18h\ncalico-system      Active   18h\ndefault            Active   18h\nkube-node-lease    Active   18h\nkube-public        Active   18h\nkube-system        Active   18h\ntigera-operator    Active   18h\n<\/code><\/pre>\n\n\n\n<p>We can see that our namespace is available.<\/p>\n\n\n\n<p>Get resources in the namespace;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get all -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME                             READY   STATUS    RESTARTS   AGE\npod\/nginx-app-6ff7b5d8f6-njkcr   1\/1     Running   0          54m\n\nNAME                TYPE       
CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE\nservice\/nginx-app   NodePort   10.109.64.134   &lt;none&gt;        80:30833\/TCP   53m\n\nNAME                        READY   UP-TO-DATE   AVAILABLE   AGE\ndeployment.apps\/nginx-app   1\/1     1            1           54m\n\nNAME                                   DESIRED   CURRENT   READY   AGE\nreplicaset.apps\/nginx-app-676b458f4f   0         0         0       54m\nreplicaset.apps\/nginx-app-6ff7b5d8f6   1         1         1       54m\n<\/code><\/pre>\n\n\n\n<p>Looks good.<\/p>\n\n\n\n<p>Confirm access to our app!<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1477\" height=\"577\" src=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-nginx-app.png\" alt=\"kubernetes multi-master node etcd cluster restore\" class=\"wp-image-22776\" title=\"\" srcset=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-nginx-app.png?v=1718051887 1477w, https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/06\/kubernetes-nginx-app-768x300.png?v=1718051887 768w\" sizes=\"(max-width: 1477px) 100vw, 1477px\" \/><\/figure>\n\n\n\n<p>And there you go!<\/p>\n\n\n\n<p>The Kubernetes multi-master HA cluster is restored from the snapshot successfully.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"conclusion\">Conclusion<\/h3>\n\n\n\n<p>You have learnt how to back up and restore an etcd cluster in Kubernetes. A Kubernetes etcd cluster backup can be taken using the <strong>etcdctl snapshot save<\/strong> command.<\/p>\n\n\n\n<p>Similarly, you have also learnt how to restore a Kubernetes etcd cluster backup for both single control plane and multi-control plane Kubernetes clusters using the <strong>etcdutl snapshot restore<\/strong> command.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog post, we will dive into Kubernetes disaster recovery strategies, backup and restore etcd, using etcdctl and etcdutl tools. 
Even the most robust<\/p>\n","protected":false},"author":10,"featured_media":22893,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"rank_math_lock_modified_date":false,"footnotes":""},"categories":[1668,1076,121],"tags":[7530,7532,7531],"class_list":["post-22764","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kubernetes","category-containers","category-howtos","tag-backup-etcd-cluster-kubernetes","tag-etcdutl-restore-multi-control-plane","tag-restore-etcd-snapshot-in-kubernetes","generate-columns","tablet-grid-50","mobile-grid-100","grid-parent","grid-50","resize-featured-image"],"_links":{"self":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/22764"}],"collection":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/comments?post=22764"}],"version-history":[{"count":54,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/22764\/revisions"}],"predecessor-version":[{"id":22932,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/22764\/revisions\/22932"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/media\/22893"}],"wp:attachment":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/media?parent=22764"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/categories?post=22764"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/tags?post=22764"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}