{"id":19382,"date":"2023-11-25T12:17:17","date_gmt":"2023-11-25T09:17:17","guid":{"rendered":"https:\/\/kifarunix.com\/?p=19382"},"modified":"2024-03-10T11:46:31","modified_gmt":"2024-03-10T08:46:31","slug":"setup-highly-available-elasticsearch-cluster-with-keepalived","status":"publish","type":"post","link":"https:\/\/kifarunix.com\/setup-highly-available-elasticsearch-cluster-with-keepalived\/","title":{"rendered":"Setup Highly Available Elasticsearch Cluster with Keepalived"},"content":{"rendered":"<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1052\" height=\"582\" src=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2023\/11\/highly-available-elasticsearch-cluster.png\" alt=\"setup highly available Elasticsearch cluster with Keepalived\" class=\"wp-image-19423\" title=\"\" srcset=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2023\/11\/highly-available-elasticsearch-cluster.png?v=1700903619 1052w, https:\/\/kifarunix.com\/wp-content\/uploads\/2023\/11\/highly-available-elasticsearch-cluster-768x425.png?v=1700903619 768w\" sizes=\"(max-width: 1052px) 100vw, 1052px\" \/><\/figure><\/div>\n\n\n<p>Step through this guide to learn how to set up a highly available <a href=\"https:\/\/www.elastic.co\/\" target=\"_blank\" rel=\"noreferrer noopener\">Elasticsearch<\/a> cluster with <a href=\"https:\/\/www.keepalived.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Keepalived<\/a>. Setting up a highly available Elasticsearch cluster with Keepalived is a key step toward a robust and reliable Elasticsearch infrastructure. Elasticsearch is a distributed search and analytics engine that spreads data and load across multiple nodes, but its clients still need a single address that stays reachable when any one node goes down. 
Keepalived adds an extra layer of high availability by providing IP failover and health monitoring.<\/p>\n\n\n\n<div class=\"wp-block-rank-math-toc-block\" id=\"rank-math-toc\"><h2>Table of Contents<\/h2><nav><ul><li><a href=\"#setting-up-highly-available-elasticsearch-cluster-with-keepalived\">Setting up Highly Available Elasticsearch Cluster with Keepalived<\/a><ul><li><a href=\"#setup-elasticsearch-cluster\">Setup Elasticsearch Cluster<\/a><\/li><li><a href=\"#configure-elasticsearch-to-listen-on-all-interfaces\">Configure Elasticsearch to Listen on All Interfaces<\/a><\/li><li><a href=\"#install-keepalived-on-cluster-nodes\">Install Keepalived on Cluster Nodes<\/a><\/li><li><a href=\"#configure-non-local-ip-binding\">Configure non-local IP binding<\/a><\/li><li><a href=\"#configure-keepalived-high-availability\">Configure Keepalived High Availability<\/a><\/li><li><a href=\"#running-keepalived\">Running Keepalived<\/a><\/li><li><a href=\"#simulate-high-availability\">Simulate High Availability<\/a><\/li><li><a href=\"#send-logs-to-elasticsearch-cluster-vip-address\">Send Logs to Elasticsearch Cluster VIP Address<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"setting-up-highly-available-elasticsearch-cluster-with-keepalived\">Setting up Highly Available Elasticsearch Cluster with Keepalived<\/h2>\n\n\n\n<p>So, how can you set up a highly available Elasticsearch cluster with Keepalived?<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"setup-elasticsearch-cluster\">Setup Elasticsearch Cluster<\/h3>\n\n\n\n<p>Ensure you have a running cluster. 
Check our guide below on how to set up a multinode Elasticsearch cluster.<\/p>\n\n\n\n<p><a href=\"https:\/\/kifarunix.com\/setup-multinode-elasticsearch-8-x-cluster\/\" target=\"_blank\" rel=\"noreferrer noopener\">Setup Multinode Elasticsearch 8.x Cluster<\/a><\/p>\n\n\n\n<p>We already have a healthy three-node Elasticsearch cluster:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>curl -k -XGET \"https:\/\/es-node01:9200\/_cat\/health?v\" -u elastic<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>Enter host password for user 'elastic':\nepoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent\n1700763329 18:15:29  kifarunix-demo green           3         3      2   1    0    0        0             0                  -                100.0%<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"configure-elasticsearch-to-listen-on-all-interfaces\">Configure Elasticsearch to Listen on All Interfaces<\/h3>\n\n\n\n<p>For the cluster to be highly available, Elasticsearch on every node must accept connections on the VIP, whichever node currently holds it.<\/p>\n\n\n\n<p>Therefore, edit the Elasticsearch configuration file <strong>on each cluster node<\/strong> and make it listen on all interfaces.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>vim \/etc\/elasticsearch\/elasticsearch.yml<\/code><\/pre>\n\n\n\n<p>In my current setup, each Elasticsearch instance is configured to listen on its respective node IP:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># ---------------------------------- Network -----------------------------------\n#\n# By default Elasticsearch is only accessible on localhost. 
Set a different\n# address here to expose this node on the network:\n#\n<strong>network.host: 192.168.122.50<\/strong><\/code><\/pre>\n\n\n\n<p>To make Elasticsearch listen on all interfaces, update this line and set the address to <strong>0.0.0.0<\/strong>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># ---------------------------------- Network -----------------------------------\n#\n# By default Elasticsearch is only accessible on localhost. Set a different\n# address here to expose this node on the network:\n#\nnetwork.host: 0.0.0.0<\/code><\/pre>\n\n\n\n<p>Save and exit the file.<\/p>\n\n\n\n<p>Restart Elasticsearch on each node:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>systemctl restart elasticsearch<\/code><\/pre>\n\n\n\n<p>Confirm the service is up and listening on all interfaces:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ss -altnp | grep :9200<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>LISTEN 0      4096               *:9200            *:*    users:((\"java\",pid=1356,fd=488))<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"install-keepalived-on-cluster-nodes\">Install Keepalived on Cluster Nodes<\/h3>\n\n\n\n<p>Keepalived is open-source software that provides high availability and fault tolerance on Linux systems. It uses the Virtual Router Redundancy Protocol (VRRP), together with optional health-check scripts, to monitor the nodes in a cluster. When the active node fails, Keepalived automatically moves a virtual IP address (VIP) to a healthy node, so clients can keep connecting to the same address.<\/p>\n\n\n\n<p>This minimizes downtime and improves the overall reliability of the applications and services behind the VIP. 
Keepalived is also often used to manage the Linux Virtual Server (LVS) kernel module, which adds load balancing on top of failover by distributing network traffic across multiple servers.<\/p>\n\n\n\n<p>Install Keepalived on all cluster nodes using your distribution&#8217;s package manager.<\/p>\n\n\n\n<p>Ubuntu\/Debian:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apt install keepalived<\/code><\/pre>\n\n\n\n<p>CentOS\/RHEL distros:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>yum install keepalived<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"configure-non-local-ip-binding\">Configure non-local IP binding<\/h3>\n\n\n\n<p>Allow services on each node to bind to a non-local IP address, that is, an address not currently assigned to a local interface, such as the failover IP address (floating IP or VIP). This lets a service bind to the VIP even on a node that does not hold it at the moment. It is not strictly required when the service listens on 0.0.0.0, as Elasticsearch does in this setup.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">echo \"net.ipv4.ip_nonlocal_bind = 1\" &gt;&gt; \/etc\/sysctl.conf<\/pre>\n\n\n\n<p>Reload sysctl settings:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sysctl -p<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"configure-keepalived-high-availability\">Configure Keepalived High Availability<\/h3>\n\n\n\n<p>Keepalived can operate in two primary modes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Active\/Passive (Master\/Backup) Mode<\/strong>: In this mode, one node serves as the active (or master) node, handling traffic for the virtual IP address (VIP). The other nodes remain in a passive (or backup) state, ready to take over, in order of priority, if the active node fails. This mode is commonly used where high availability is the primary goal and only one node actively processes traffic at a time.<\/li>\n\n\n\n<li><strong>Active\/Active Mode<\/strong>: In this mode, several VRRP instances are configured, each with its own VIP, and each node acts as master for a different VIP, so all nodes handle traffic at the same time. 
This mode is commonly used where load balancing is a priority and traffic should be distributed across multiple nodes.<\/li>\n<\/ul>\n\n\n\n<p>This guide uses an active\/passive Keepalived configuration.<\/p>\n\n\n\n<p>By default, Keepalived reads its configuration from <code><strong>\/etc\/keepalived\/keepalived.conf<\/strong><\/code>. On Ubuntu\/Debian systems, however, a sample configuration file, <strong><code>\/etc\/keepalived\/keepalived.conf.sample<\/code><\/strong>, is provided instead.<\/p>\n\n\n\n<p>You can therefore copy the sample configuration file into place as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cp \/etc\/keepalived\/keepalived.conf{.sample,}<\/code><\/pre>\n\n\n\n<p>This is what the sample configuration file looks like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cat \/etc\/keepalived\/keepalived.conf.sample<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>! Configuration File for keepalived\n\nglobal_defs {\n   notification_email {\n     acassen@firewall.loc\n     failover@firewall.loc\n     sysadmin@firewall.loc\n   }\n   notification_email_from Alexandre.Cassen@firewall.loc\n   smtp_server 192.168.200.1\n   smtp_connect_timeout 30\n   router_id LVS_DEVEL\n   vrrp_skip_check_adv_addr\n   vrrp_strict\n   vrrp_garp_interval 0\n   vrrp_gna_interval 0\n}\n\nvrrp_instance VI_1 {\n    state MASTER\n    interface eth0\n    virtual_router_id 51\n    priority 100\n    advert_int 1\n    authentication {\n        auth_type PASS\n        auth_pass 1111\n    }\n    virtual_ipaddress {\n        192.168.200.16\n        192.168.200.17\n        192.168.200.18\n    }\n}\n\nvirtual_server 192.168.200.100 443 {\n    delay_loop 6\n    lb_algo rr\n    lb_kind NAT\n    persistence_timeout 50\n    protocol TCP\n\n    real_server 192.168.201.100 443 {\n        weight 1\n        SSL_GET {\n            url {\n              path \/\n              digest ff20ad2481f97b1754ef3e12ecd3a9cc\n            }\n            
url {\n              path \/mrtg\/\n              digest 9b3a0c85a887a256d6939da88aabd8cd\n            }\n            connect_timeout 3\n            retry 3\n            delay_before_retry 3\n        }\n    }\n}\n\nvirtual_server 10.10.10.2 1358 {\n    delay_loop 6\n    lb_algo rr\n    lb_kind NAT\n    persistence_timeout 50\n    protocol TCP\n\n    sorry_server 192.168.200.200 1358\n\n    real_server 192.168.200.2 1358 {\n        weight 1\n        HTTP_GET {\n            url {\n              path \/testurl\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            }\n            url {\n              path \/testurl2\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            }\n            url {\n              path \/testurl3\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            }\n            connect_timeout 3\n            retry 3\n            delay_before_retry 3\n        }\n    }\n\n    real_server 192.168.200.3 1358 {\n        weight 1\n        HTTP_GET {\n            url {\n              path \/testurl\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334c\n            }\n            url {\n              path \/testurl2\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334c\n            }\n            connect_timeout 3\n            retry 3\n            delay_before_retry 3\n        }\n    }\n}\n\nvirtual_server 10.10.10.3 1358 {\n    delay_loop 3\n    lb_algo rr\n    lb_kind NAT\n    persistence_timeout 50\n    protocol TCP\n\n    real_server 192.168.200.4 1358 {\n        weight 1\n        HTTP_GET {\n            url {\n              path \/testurl\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            }\n            url {\n              path \/testurl2\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            }\n            url {\n              path \/testurl3\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            
}\n            connect_timeout 3\n            retry 3\n            delay_before_retry 3\n        }\n    }\n\n    real_server 192.168.200.5 1358 {\n        weight 1\n        HTTP_GET {\n            url {\n              path \/testurl\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            }\n            url {\n              path \/testurl2\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            }\n            url {\n              path \/testurl3\/test.jsp\n              digest 640205b7b0fc66c1ea91c463fac6334d\n            }\n            connect_timeout 3\n            retry 3\n            delay_before_retry 3\n        }\n    }\n}\n<\/code><\/pre>\n\n\n\n<p>You can then edit the configuration, replacing the sample contents with settings that suit your cluster setup.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">vim \/etc\/keepalived\/keepalived.conf<\/pre>\n\n\n\n<p>Below are the Keepalived configurations we use on each node in the cluster:<\/p>\n\n\n\n<p>Node 01:<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>vrrp_script check_elasticsearch {\n    script \"\/usr\/bin\/systemctl is-active elasticsearch.service\"\n    interval 5\n    weight 10\n}\n\nvrrp_instance ES_HA {\n    state MASTER\n    interface enp1s0\n    virtual_router_id 100\n    priority 200\n    advert_int 1\n\n    unicast_src_ip 192.168.122.12\n    unicast_peer {\n        192.168.122.73\/24\n        192.168.122.50\/24\n    }\n\n    virtual_ipaddress {\n        192.168.122.100\/24\n    }\n\n    authentication {\n        auth_type PASS\n        auth_pass YOUR_PASSWORD_HERE\n    }\n\n    track_script {\n        check_elasticsearch\n    }\n}\n<\/code><\/pre>\n\n\n\n<p>Node 02:<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>vrrp_script check_elasticsearch {\n    script \"\/usr\/bin\/systemctl is-active elasticsearch.service\"\n    interval 5\n    weight 10\n}\n\nvrrp_instance ES_HA {\n    state BACKUP\n    interface enp1s0\n    virtual_router_id 100\n    priority 199\n    advert_int 1\n\n    unicast_src_ip 
192.168.122.73\n    unicast_peer {\n        192.168.122.12\/24\n        192.168.122.50\/24\n    }\n\n    virtual_ipaddress {\n        192.168.122.100\/24\n    }\n\n    authentication {\n        auth_type PASS\n        auth_pass YOUR_PASSWORD_HERE\n    }\n\n    track_script {\n        check_elasticsearch\n    }\n}\n<\/code><\/pre>\n\n\n\n<p>Node 03:<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>vrrp_script check_elasticsearch {\n    script \"\/usr\/bin\/systemctl is-active elasticsearch.service\"\n    interval 5\n    weight 10\n}\n\nvrrp_instance ES_HA {\n    state BACKUP\n    interface enp1s0\n    virtual_router_id 100\n    priority 198\n    advert_int 1\n\n    unicast_src_ip 192.168.122.50\n    unicast_peer {\n        192.168.122.12\/24\n        192.168.122.73\/24\n    }\n\n    virtual_ipaddress {\n        192.168.122.100\/24\n    }\n\n    authentication {\n        auth_type PASS\n        auth_pass YOUR_PASSWORD_HERE\n    }\n\n    track_script {\n        check_elasticsearch\n    }\n}\n<\/code><\/pre>\n\n\n\n<p>The configuration has two sections: the <strong>VRRP Script<\/strong> and <strong>VRRP Instance<\/strong> sections.<\/p>\n\n\n\n<p>The <strong>VRRP script<\/strong> section:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><code>check_elasticsearch<\/code>:<\/strong> This is the user-defined name for the VRRP script.<\/li>\n\n\n\n<li><strong><code>script \"\/usr\/bin\/systemctl is-active elasticsearch.service\"<\/code>:<\/strong> Specifies the script or command to run. In this case, it uses <code>systemctl is-active<\/code>, which exits with status 0 only when the Elasticsearch service (<code>elasticsearch.service<\/code>) is active; Keepalived treats a zero exit status as success.<\/li>\n\n\n\n<li><strong><code>interval 5<\/code>:<\/strong> Sets how often the script runs; in this example, every 5 seconds.<\/li>\n\n\n\n<li><strong><code>weight 10<\/code>:<\/strong> The weight assigned to the script. 
If the script succeeds (Elasticsearch is active), this positive weight is added to the node&#8217;s priority; if the script fails, the bonus is not applied. A negative weight works the other way round: that number is subtracted from the priority when the check fails.<\/li>\n<\/ul>\n\n\n\n<p>You can also use other types of tracking, for example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>process<\/strong> tracking: Monitors the status of a specified process on a node. If the process is running, the node is considered healthy, and its priority is increased by the weight value.<\/li>\n\n\n\n<li><strong>interface<\/strong> tracking: Monitors the status of a network interface. If the specified interface is up, the node&#8217;s priority is increased by the weight value.<\/li>\n\n\n\n<li><strong>kernel table<\/strong> tracking: Monitors the existence of a specified kernel routing table entry. If the entry is present, the node&#8217;s priority is increased by the weight value.<\/li>\n<\/ul>\n\n\n\n<p>The <strong>VRRP Instance<\/strong> section:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>vrrp_instance &lt;STRING&gt;:<\/strong>&nbsp;Defines a VRRP instance and its name, <code>ES_HA<\/code> in this setup.<\/li>\n\n\n\n<li><strong><code>state MASTER<\/code>:<\/strong> Sets the initial state of this node to master. The other possible state is <code>BACKUP<\/code>.<\/li>\n\n\n\n<li><strong><code>interface enp1s0<\/code>:<\/strong> Specifies the network interface associated with this VRRP instance.<\/li>\n\n\n\n<li><strong><code>virtual_router_id 100<\/code>:<\/strong> A numeric identifier for this VRRP instance. Nodes with the same <code>virtual_router_id<\/code> belong to the same VRRP group.<\/li>\n\n\n\n<li><strong><code>priority 200<\/code>:<\/strong> The base priority of this node in the VRRP group. The node with the highest effective priority becomes the master, and the script&#8217;s weight adjusts this priority dynamically. 
<strong>Keep the gap between the nodes&#8217; base priorities smaller than the tracking script\/process weight<\/strong>. If the gap is larger, the node with the highest base priority keeps the VIP even after its service check fails. For example, with a weight of 10, a master with priority 220 and a failed check would still outrank a healthy backup with 199 + 10 = 209.<\/li>\n\n\n\n<li><strong><code>advert_int 1<\/code>:<\/strong> The advertisement interval, in seconds, determines how often the master node sends advertisements to other nodes.<\/li>\n\n\n\n<li><strong>unicast_src_ip &lt;IP&gt;<\/strong>: Specifies the source IP address for unicast VRRP messages; here, the IP address of the respective node.<\/li>\n\n\n\n<li><strong><code>unicast_peer<\/code>:<\/strong> Lists the unicast peers, that is, the other cluster nodes in the VRRP group. VRRP advertisements are sent directly to these addresses instead of to a multicast group.<\/li>\n\n\n\n<li><strong><code>virtual_ipaddress<\/code>:<\/strong> The virtual IP address associated with this VRRP instance. Clients connect to this IP, which is held by the current master node.<\/li>\n\n\n\n<li><strong><code>authentication<\/code>:<\/strong> Configures authentication for VRRP messages. Here it uses a simple password, which is stored and sent in plain text, so make sure access to your systems and network is secured.\n<ul class=\"wp-block-list\">\n<li><strong>auth_type:<\/strong>&nbsp;Specifies the authentication type; here, PASS.<\/li>\n\n\n\n<li><strong>auth_pass:<\/strong>&nbsp;Specifies the authentication password. 
In this setup, the password is YOUR_PASSWORD_HERE; replace it with your own and keep in mind that Keepalived uses only the first eight characters.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong><code>track_script { check_elasticsearch }<\/code>:<\/strong> Associates the <code>check_elasticsearch<\/code> script with this VRRP instance, meaning the VRRP priority will be dynamically adjusted based on the script&#8217;s result.<\/li>\n<\/ul>\n\n\n\n<p>Read more in <strong><code>man keepalived.conf<\/code><\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"running-keepalived\">Running Keepalived<\/h3>\n\n\n\n<p>You can now start Keepalived and enable it to run at system boot <strong>on all nodes:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">systemctl enable --now keepalived<\/pre>\n\n\n\n<p>If it is already running, restart it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>systemctl restart keepalived<\/code><\/pre>\n\n\n\n<p>Check the status on the master node, node01 in our case:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">systemctl status keepalived<\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>\u25cf keepalived.service - Keepalive Daemon (LVS and VRRP)\n     Loaded: loaded (\/lib\/systemd\/system\/keepalived.service; enabled; preset: enabled)\n     Active: active (running) since Thu 2023-11-23 16:27:02 EST; 7s ago\n       Docs: man:keepalived(8)\n             man:keepalived.conf(5)\n             man:genhash(1)\n             https:\/\/keepalived.org\n   Main PID: 1811 (keepalived)\n      Tasks: 2 (limit: 4645)\n     Memory: 3.0M\n        CPU: 26ms\n     CGroup: \/system.slice\/keepalived.service\n             \u251c\u25001811 \/usr\/sbin\/keepalived --dont-fork\n             \u2514\u25001814 \/usr\/sbin\/keepalived --dont-fork\n\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived[1811]: Starting VRRP child process, pid=1814\nNov 23 16:27:02 es-node01.kifarunix-demo.com systemd[1]: keepalived.service: Got notification message from PID 1814, but reception only permitted for main PID 1811\nNov 23 16:27:02 es-node01.kifarunix-demo.com 
Keepalived_vrrp[1814]: Script user 'keepalived_script' does not exist\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived[1811]: Startup complete\nNov 23 16:27:02 es-node01.kifarunix-demo.com systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: (ES_HA) Entering BACKUP STATE (init)\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: VRRP_Script(check_elasticsearch) succeeded\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: (ES_HA) Changing effective priority from 200 to 210\nNov 23 16:27:05 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: (ES_HA) Entering MASTER STATE\n<\/code><\/pre>\n\n\n\n<p>You can also check the status on the other nodes.<\/p>\n\n\n\n<p>The master node, node01 in our case, should now have the VIP assigned:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ip a<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>1: lo: &lt;LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000\n    link\/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00\n    inet 127.0.0.1\/8 scope host lo\n       valid_lft forever preferred_lft forever\n    inet6 ::1\/128 scope host noprefixroute \n       valid_lft forever preferred_lft forever\n2: enp1s0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000\n    link\/ether 52:54:00:df:44:43 brd ff:ff:ff:ff:ff:ff\n    inet 192.168.122.12\/24 brd 192.168.122.255 scope global dynamic enp1s0\n       valid_lft 3047sec preferred_lft 3047sec\n    inet <strong>192.168.122.100\/24<\/strong> scope global secondary enp1s0\n       valid_lft forever preferred_lft forever\n    inet6 fe80::5054:ff:fedf:4443\/64 scope link \n       valid_lft forever preferred_lft 
forever\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"simulate-high-availability\">Simulate High Availability<\/h3>\n\n\n\n<p>To simulate a failover, stop the Elasticsearch service on the node with the highest priority, in this case, node01. Only do this if it is safe for you to do so!<\/p>\n\n\n\n<p>We stop Elasticsearch because our VRRP script uses the status of the Elasticsearch service to drive Keepalived. When the check fails, node01 loses the +10 weight bonus and its effective priority drops from 210 to 200, below node02 (209) and node03 (208). This triggers a failover that moves the VIP to node02, the node with the highest remaining priority.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>systemctl stop elasticsearch<\/code><\/pre>\n\n\n\n<p>Meanwhile, watch the Keepalived logs on the remaining nodes:<\/p>\n\n\n\n<p>Node02:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>journalctl -f -u keepalived.service<\/code><\/pre>\n\n\n\n<pre class=\"scroll-sz\"><code>Nov 24 06:33:13 es-node02.kifarunix-demo.com Keepalived_vrrp[12643]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:14 es-node02.kifarunix-demo.com Keepalived_vrrp[12643]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:15 es-node02.kifarunix-demo.com Keepalived_vrrp[12643]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:16 es-node02.kifarunix-demo.com Keepalived_vrrp[12643]: (ES_HA) Entering MASTER STATE\n<\/code><\/pre>\n\n\n\n<p>Node02 has entered the MASTER state and should now hold the VIP:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>root@es-node02:~# ip a<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>1: lo: &lt;LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000\n    link\/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00\n    inet 127.0.0.1\/8 scope host lo\n       valid_lft forever preferred_lft forever\n    inet6 ::1\/128 scope host noprefixroute \n       valid_lft forever preferred_lft 
forever\n2: enp1s0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000\n    link\/ether 52:54:00:05:b7:40 brd ff:ff:ff:ff:ff:ff\n    inet 192.168.122.73\/24 brd 192.168.122.255 scope global dynamic enp1s0\n       valid_lft 2279sec preferred_lft 2279sec\n    inet 192.168.122.100\/24 scope global secondary enp1s0\n       valid_lft forever preferred_lft forever\n    inet6 fe80::5054:ff:fe05:b740\/64 scope link \n       valid_lft forever preferred_lft forever\n<\/code><\/pre>\n\n\n\n<p>On node03, the effective priority (198 + 10 = 208) is still lower than node02&#8217;s (199 + 10 = 209), so it remains in the BACKUP state:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>journalctl -f -u keepalived.service<\/code><\/pre>\n\n\n\n<pre class=\"scroll-sz\"><code>Nov 24 06:33:13 en-node03.kifarunix-demo.com Keepalived_vrrp[12627]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:14 en-node03.kifarunix-demo.com Keepalived_vrrp[12627]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:15 en-node03.kifarunix-demo.com Keepalived_vrrp[12627]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\n<\/code><\/pre>\n\n\n\n<p>If you stop Elasticsearch on both node01 and node02, node03 becomes the master and is assigned the VIP.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"send-logs-to-elasticsearch-cluster-vip-address\">Send Logs to Elasticsearch Cluster VIP Address<\/h3>\n\n\n\n<p>You can now configure your agents, whatever they are, to send logs to the Elasticsearch cluster via the VIP address.<\/p>\n\n\n\n<p>For example, I am using Filebeat to send logs to the Elasticsearch cluster, so I have to edit its configuration file and point the Elasticsearch output at the cluster VIP.<\/p>\n\n\n\n<p>See the example below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>vim \/etc\/filebeat\/filebeat.yml<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>...\n# ---------------------------- Elasticsearch Output 
----------------------------\noutput.elasticsearch:\n  # Array of hosts to connect to.\n  hosts: [\"elk.kifarunix-demo.com:9200\"]\n\n  # Protocol - either `http` (default) or `https`.\n  protocol: \"https\"\n  ssl.certificate_authorities: \"\/etc\/filebeat\/es-ca.crt\"\n  # Authentication credentials - either API key or username\/password.\n  #api_key: \"id:api_key\"\n  username: \"USER\"\n  password: \"PASS\"\n...\n<\/code><\/pre>\n\n\n\n<p>The hostname elk.kifarunix-demo.com resolves to the Elasticsearch VIP:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ping elk.kifarunix-demo.com -c 4<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>PING elk.kifarunix-demo.com (192.168.122.100) 56(84) bytes of data.\n64 bytes from elk.kifarunix-demo.com (192.168.122.100): icmp_seq=1 ttl=64 time=0.301 ms\n64 bytes from elk.kifarunix-demo.com (192.168.122.100): icmp_seq=2 ttl=64 time=0.329 ms\n64 bytes from elk.kifarunix-demo.com (192.168.122.100): icmp_seq=3 ttl=64 time=0.404 ms\n64 bytes from elk.kifarunix-demo.com (192.168.122.100): icmp_seq=4 ttl=64 time=0.359 ms\n\n--- elk.kifarunix-demo.com ping statistics ---\n4 packets transmitted, 4 received, 0% packet loss, time 3098ms\nrtt min\/avg\/max\/mdev = 0.301\/0.348\/0.404\/0.040 ms\n<\/code><\/pre>\n\n\n\n<p>Make sure the SSL\/TLS certificate on every Elasticsearch node is valid for the hostname clients use to reach the VIP, for example by using a wildcard certificate. Clients can then verify whichever node currently holds the VIP, without being reconfigured to use that node&#8217;s hostname after every failover.<\/p>\n\n\n\n<p>See the guide below on how to generate wildcard SSL certificates for Elasticsearch.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-kifarunix-com wp-block-embed-kifarunix-com\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"Gi52zyxFW2\"><a href=\"https:\/\/kifarunix.com\/generate-wildcard-ssl-certificates-for-elasticsearch\/\">Generate Wildcard SSL Certificates for Elasticsearch<\/a><\/blockquote><iframe 
loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" title=\"&#8220;Generate Wildcard SSL Certificates for Elasticsearch&#8221; &#8212; kifarunix.com\" src=\"https:\/\/kifarunix.com\/generate-wildcard-ssl-certificates-for-elasticsearch\/embed\/#?secret=m1za9EFJch#?secret=Gi52zyxFW2\" data-secret=\"Gi52zyxFW2\" width=\"600\" height=\"338\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Once your clients are configured with the right SSL\/TLS certificates, test the connection.<\/p>\n\n\n\n<p>For example, for Filebeat:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>filebeat test output<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>elasticsearch: https:\/\/elk.kifarunix-demo.com:9200...\n  parse url... OK\n  connection...\n    parse host... OK\n    dns lookup... OK\n    addresses: 192.168.122.100, 192.168.122.100\n    dial up... OK\n  TLS...\n    security: server's certificate chain verification is enabled\n    handshake... OK\n    TLS version: TLSv1.3\n    dial up... OK\n  talk to server... OK\n  version: 8.11.1\n<\/code><\/pre>\n\n\n\n<p>Perfect!<\/p>\n\n\n\n<p>Next, run Filebeat in the foreground with <code>-e<\/code>, which sends its logs to standard error, and make sure it can connect to Elasticsearch:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>filebeat -e<\/code><\/pre>\n\n\n\n<p>Watch the output for the connection message.<\/p>\n\n\n\n<p>If you see the line below, you are all set; 
otherwise, troubleshoot the issue.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\"log.level\":\"info\",\"@timestamp\":\"2023-11-25T08:51:30.695Z\",\"log.logger\":\"publisher_pipeline_output\",\"log.origin\":{\"file.name\":\"pipeline\/client_worker.go\",\"file.line\":145},\"message\":\"<strong>Connection to backoff(elasticsearch(https:\/\/elk.kifarunix-demo.com:9200)) established<\/strong>\",\"service.name\":\"filebeat\",\"ecs.version\":\"1.6.0\"}<\/code><\/pre>\n\n\n\n<p>And that is how you can set up a highly available Elasticsearch cluster with Keepalived.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Step through this guide to learn how to set up a highly available Elasticsearch cluster with Keepalived. Setting up a highly available Elasticsearch cluster with Keepalived is<\/p>\n","protected":false},"author":10,"featured_media":19423,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"rank_math_lock_modified_date":false,"footnotes":""},"categories":[121,910,92],"tags":[7327,7328,7325,1665,7326],"class_list":["post-19382","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-howtos","category-elastic-stack","category-load-balancers","tag-elasticsearch-cluster-and-keepalived","tag-elasticsearch-cluster-high-availability","tag-elasticsearch-high-availability","tag-keepalived","tag-keepalived-and-elasticsearch","generate-columns","tablet-grid-50","mobile-grid-100","grid-parent","grid-50","resize-featured-image"],"_links":{"self":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/19382"}],"collection":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/comments?post=19382"}],"version-history":[{"count":16,"href
":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/19382\/revisions"}],"predecessor-version":[{"id":20868,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/19382\/revisions\/20868"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/media\/19423"}],"wp:attachment":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/media?parent=19382"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/categories?post=19382"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/tags?post=19382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}