! Configuration File for keepalived\n\nglobal_defs {\n notification_email {\n acassen@firewall.loc\n failover@firewall.loc\n sysadmin@firewall.loc\n }\n notification_email_from Alexandre.Cassen@firewall.loc\n smtp_server 192.168.200.1\n smtp_connect_timeout 30\n router_id LVS_DEVEL\n vrrp_skip_check_adv_addr\n vrrp_strict\n vrrp_garp_interval 0\n vrrp_gna_interval 0\n}\n\nvrrp_instance VI_1 {\n state MASTER\n interface eth0\n virtual_router_id 51\n priority 100\n advert_int 1\n authentication {\n auth_type PASS\n auth_pass 1111\n }\n virtual_ipaddress {\n 192.168.200.16\n 192.168.200.17\n 192.168.200.18\n }\n}\n\nvirtual_server 192.168.200.100 443 {\n delay_loop 6\n lb_algo rr\n lb_kind NAT\n persistence_timeout 50\n protocol TCP\n\n real_server 192.168.201.100 443 {\n weight 1\n SSL_GET {\n url {\n path \/\n digest ff20ad2481f97b1754ef3e12ecd3a9cc\n }\n url {\n path \/mrtg\/\n digest 9b3a0c85a887a256d6939da88aabd8cd\n }\n connect_timeout 3\n retry 3\n delay_before_retry 3\n }\n }\n}\n\nvirtual_server 10.10.10.2 1358 {\n delay_loop 6\n lb_algo rr\n lb_kind NAT\n persistence_timeout 50\n protocol TCP\n\n sorry_server 192.168.200.200 1358\n\n real_server 192.168.200.2 1358 {\n weight 1\n HTTP_GET {\n url {\n path \/testurl\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n url {\n path \/testurl2\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n url {\n path \/testurl3\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n connect_timeout 3\n retry 3\n delay_before_retry 3\n }\n }\n\n real_server 192.168.200.3 1358 {\n weight 1\n HTTP_GET {\n url {\n path \/testurl\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334c\n }\n url {\n path \/testurl2\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334c\n }\n connect_timeout 3\n retry 3\n delay_before_retry 3\n }\n }\n}\n\nvirtual_server 10.10.10.3 1358 {\n delay_loop 3\n lb_algo rr\n lb_kind NAT\n persistence_timeout 50\n protocol TCP\n\n real_server 192.168.200.4 1358 {\n weight 1\n HTTP_GET {\n url {\n 
path \/testurl\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n url {\n path \/testurl2\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n url {\n path \/testurl3\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n connect_timeout 3\n retry 3\n delay_before_retry 3\n }\n }\n\n real_server 192.168.200.5 1358 {\n weight 1\n HTTP_GET {\n url {\n path \/testurl\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n url {\n path \/testurl2\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n url {\n path \/testurl3\/test.jsp\n digest 640205b7b0fc66c1ea91c463fac6334d\n }\n connect_timeout 3\n retry 3\n delay_before_retry 3\n }\n }\n}\n<\/code><\/pre>\n\n\n\nYou can then edit and update the configuration to suit your cluster setup.<\/p>\n\n\n\n
vim \/etc\/keepalived\/keepalived.conf<\/pre>\n\n\n\nBelow are the Keepalived configurations for each node in our cluster;<\/p>\n\n\n\n
Node 01;<\/p>\n\n\n\n
vrrp_script check_elasticsearch {\n script \"\/usr\/bin\/systemctl is-active elasticsearch.service\"\n interval 5\n weight 10\n}\n\nvrrp_instance ES_HA {\n state MASTER\n interface enp1s0\n virtual_router_id 100\n priority 200\n advert_int 1\n\n unicast_src_ip 192.168.122.12\n unicast_peer {\n 192.168.122.73\/24\n 192.168.122.50\/24\n }\n\n virtual_ipaddress {\n 192.168.122.100\/24\n }\n\n authentication {\n auth_type PASS\n auth_pass YOUR_PASSWORD_HERE\n }\n\n track_script {\n check_elasticsearch\n }\n}\n<\/code><\/pre>\n\n\n\nNode 02<\/p>\n\n\n\n
vrrp_script check_elasticsearch {\n script \"\/usr\/bin\/systemctl is-active elasticsearch.service\"\n interval 5\n weight 10\n}\n\nvrrp_instance ES_HA {\n state BACKUP\n interface enp1s0\n virtual_router_id 100\n priority 199\n advert_int 1\n\n unicast_src_ip 192.168.122.73\n unicast_peer {\n 192.168.122.12\/24\n 192.168.122.50\/24\n }\n\n virtual_ipaddress {\n 192.168.122.100\/24\n }\n\n authentication {\n auth_type PASS\n auth_pass YOUR_PASSWORD_HERE\n }\n\n track_script {\n check_elasticsearch\n }\n}\n<\/code><\/pre>\n\n\n\nNode 03;<\/p>\n\n\n\n
vrrp_script check_elasticsearch {\n script \"\/usr\/bin\/systemctl is-active elasticsearch.service\"\n interval 5\n weight 10\n}\n\nvrrp_instance ES_HA {\n state BACKUP\n interface enp1s0\n virtual_router_id 100\n priority 198\n advert_int 1\n\n unicast_src_ip 192.168.122.50\n unicast_peer {\n 192.168.122.12\/24\n 192.168.122.73\/24\n }\n\n virtual_ipaddress {\n 192.168.122.100\/24\n }\n\n authentication {\n auth_type PASS\n auth_pass YOUR_PASSWORD_HERE\n }\n\n track_script {\n check_elasticsearch\n }\n}\n<\/code><\/pre>\n\n\n\nThe configuration has two sections; the VRRP Script<\/strong> and VRRP Instance<\/strong> sections.<\/p>\n\n\n\nThe VRRP script<\/strong> section:<\/p>\n\n\n\n\ncheck_elasticsearch<\/code>:<\/strong> This is the user-defined name for the VRRP script.<\/li>\n\n\n\nscript \"\/usr\/bin\/systemctl is-active elasticsearch.service\"<\/code>:<\/strong> Specifies the script or command to be executed. In this case, it checks if the Elasticsearch service (elasticsearch.service<\/code>) is active using the systemctl<\/code> command.<\/li>\n\n\n\ninterval 5<\/code>:<\/strong> Sets the interval at which the script is executed. In this example, it checks the status every 5 seconds.<\/li>\n\n\n\nweight 10<\/code>:<\/strong> The weight assigned to the script. A positive weight is added to the node’s priority while the check succeeds (Elasticsearch is active). A negative weight is subtracted from the priority while the check fails.<\/li>\n<\/ul>\n\n\n\nYou can use other types of tracking, for example:<\/p>\n\n\n\n
\nprocess<\/strong> tracking: Monitors the status of a specified process on a node. If the process is running, the node is considered healthy, and its priority is increased by the weight value.<\/li>\n\n\n\ninterface<\/strong> tracking: Monitors the status of a network interface. If the specified interface is up, the node’s priority is increased by the weight value.<\/li>\n\n\n\nkernel table<\/strong> tracking: Monitors the existence of a specified kernel routing table entry. If the entry is present, the node’s priority is increased by the weight value.<\/li>\n<\/ul>\n\n\n\nThe VRRP Instance<\/strong> section:<\/p>\n\n\n\n\nvrrp_instance <STRING>:<\/strong> Defines the name of the VRRP instance.<\/li>\n\n\n\nstate MASTER<\/code>:<\/strong> Sets the initial state of this node to be the master. The other possible state is BACKUP<\/code>.<\/li>\n\n\n\ninterface enp1s0<\/code>:<\/strong> Specifies the network interface associated with this VRRP instance.<\/li>\n\n\n\nvirtual_router_id 100<\/code>:<\/strong> A numeric identifier for this VRRP instance. Nodes with the same virtual_router_id<\/code> belong to the same VRRP group.<\/li>\n\n\n\npriority 200<\/code>:<\/strong> The priority of this node in the VRRP group. The node with the highest priority becomes the master. The script’s weight dynamically adjusts this priority. Keep the gap between the cluster nodes’ priority values smaller than the tracking script\/process weight<\/strong>. With a larger gap, the node with the highest priority keeps the VIP even after its service check fails.<\/li>\n\n\n\nadvert_int 1<\/code>:<\/strong> The advertisement interval, in seconds, determines how often the master node sends advertisements to other nodes.<\/li>\n\n\n\nunicast_src_ip <IP><\/strong>. Specifies the source IP address for unicast communication. 
In this case, the IP of the respective node.<\/li>\n\n\n\nunicast_peer<\/code>:<\/strong> Specifies the unicast peers, that is, the rest of the cluster nodes in the VRRP group.<\/li>\n\n\n\nvirtual_ipaddress<\/code>:<\/strong> The virtual IP address associated with this VRRP instance. Clients connect to this IP, which will be hosted on the master node.<\/li>\n\n\n\nauthentication<\/code>:<\/strong> Configures authentication for VRRP messages. In this case, it uses a simple password. Plain text credentials are used here, so make sure access to your system is secured.\n\nauth_type:<\/strong> This parameter specifies the authentication type. In this case, the authentication type is PASS.<\/li>\n\n\n\nauth_pass:<\/strong> This parameter specifies the authentication password. In this case, the password is YOUR_PASSWORD_HERE.<\/li>\n<\/ul>\n<\/li>\n\n\n\ntrack_script { check_elasticsearch }<\/code>:<\/strong> Associates the check_elasticsearch<\/code> script with this VRRP instance, meaning the VRRP priority will be dynamically adjusted based on the script’s result.<\/li>\n<\/ul>\n\n\n\nRead more in man keepalived.conf<\/code><\/strong>.<\/p>\n\n\n\nRunning Keepalived<\/h3>\n\n\n\n You can now start Keepalived and enable it to run on system boot on all nodes;<\/strong><\/p>\n\n\n\nsystemctl enable --now keepalived<\/pre>\n\n\n\nIf it is already running, restart it;<\/p>\n\n\n\n
systemctl restart keepalived<\/code><\/pre>\n\n\n\nCheck the status on Master Node, which is node01 for us;<\/p>\n\n\n\n
systemctl status keepalived<\/pre>\n\n\n\n\u25cf keepalived.service - Keepalive Daemon (LVS and VRRP)\n Loaded: loaded (\/lib\/systemd\/system\/keepalived.service; enabled; preset: enabled)\n Active: active (running) since Thu 2023-11-23 16:27:02 EST; 7s ago\n Docs: man:keepalived(8)\n man:keepalived.conf(5)\n man:genhash(1)\n https:\/\/keepalived.org\n Main PID: 1811 (keepalived)\n Tasks: 2 (limit: 4645)\n Memory: 3.0M\n CPU: 26ms\n CGroup: \/system.slice\/keepalived.service\n \u251c\u25001811 \/usr\/sbin\/keepalived --dont-fork\n \u2514\u25001814 \/usr\/sbin\/keepalived --dont-fork\n\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived[1811]: Starting VRRP child process, pid=1814\nNov 23 16:27:02 es-node01.kifarunix-demo.com systemd[1]: keepalived.service: Got notification message from PID 1814, but reception only permitted for main PID 1811\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: Script user 'keepalived_script' does not exist\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived[1811]: Startup complete\nNov 23 16:27:02 es-node01.kifarunix-demo.com systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: (ES_HA) Entering BACKUP STATE (init)\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: VRRP_Script(check_elasticsearch) succeeded\nNov 23 16:27:02 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: (ES_HA) Changing effective priority from 200 to 210\nNov 23 16:27:05 es-node01.kifarunix-demo.com Keepalived_vrrp[1814]: (ES_HA) Entering MASTER STATE\n<\/code><\/pre>\n\n\n\nYou can as well check the status on the other nodes;<\/p>\n\n\n\n
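To compare the nodes quickly, the current VRRP state can be read from the last state transition recorded in the journal. The following is a minimal sketch, assuming the log lines keep the "(ES_HA) Entering ... STATE" format shown in the output above; the function name last_vrrp_state is hypothetical and is not part of Keepalived.

```shell
# Hypothetical helper (not part of Keepalived): read Keepalived log lines
# from stdin and print the last VRRP state that was entered, e.g. MASTER
# or BACKUP. Prints nothing if no state transition is found.
last_vrrp_state() {
    # Keep only the state name from lines such as
    # "(ES_HA) Entering BACKUP STATE (init)", then take the most recent one.
    sed -n 's/.*Entering \([A-Z]*\) STATE.*/\1/p' | tail -n 1
}

# Usage on any node (assumes the unit name used in this guide):
# journalctl -u keepalived.service --no-pager | last_vrrp_state
```

Fed the status output above, the helper would report MASTER for node01, since "Entering MASTER STATE" is the last transition logged.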
The master node, which in our case is node01, should now have the VIP assigned.<\/p>\n\n\n\n
ip a<\/code><\/pre>\n\n\n\n1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000\n link\/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00\n inet 127.0.0.1\/8 scope host lo\n valid_lft forever preferred_lft forever\n inet6 ::1\/128 scope host noprefixroute \n valid_lft forever preferred_lft forever\n2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000\n link\/ether 52:54:00:df:44:43 brd ff:ff:ff:ff:ff:ff\n inet 192.168.122.12\/24 brd 192.168.122.255 scope global dynamic enp1s0\n valid_lft 3047sec preferred_lft 3047sec\n inet 192.168.122.100\/24<\/strong> scope global secondary enp1s0\n valid_lft forever preferred_lft forever\n inet6 fe80::5054:ff:fedf:4443\/64 scope link \n valid_lft forever preferred_lft forever\n<\/code><\/pre>\n\n\n\nSimulate High Availability<\/h3>\n\n\n\n To simulate high availability, stop the Elasticsearch service on the node with the highest priority, in this case, node01. Only do this if it is safe for you to do so!<\/p>\n\n\n\n
We are stopping Elasticsearch because our VRRP script uses the status of the Elasticsearch service to guide Keepalived into taking the appropriate actions: updating the node’s priority, triggering a failover, and reassigning the VIP to the node that now has the highest priority.<\/p>\n\n\n\n
systemctl stop elasticsearch<\/code><\/pre>\n\n\n\nAt the same time, check the logs on the rest of the nodes;<\/p>\n\n\n\n
Node02;<\/p>\n\n\n\n
journalctl -f -u keepalived.service<\/code><\/pre>\n\n\n\nNov 24 06:33:13 es-node02.kifarunix-demo.com Keepalived_vrrp[12643]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:14 es-node02.kifarunix-demo.com Keepalived_vrrp[12643]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:15 es-node02.kifarunix-demo.com Keepalived_vrrp[12643]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:16 es-node02.kifarunix-demo.com Keepalived_vrrp[12643]: (ES_HA) Entering MASTER STATE\n<\/code><\/pre>\n\n\n\nIt has entered master state and should now have VIP;<\/p>\n\n\n\n
root@es-node02:~# ip a<\/code><\/pre>\n\n\n\n1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000\n link\/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00\n inet 127.0.0.1\/8 scope host lo\n valid_lft forever preferred_lft forever\n inet6 ::1\/128 scope host noprefixroute \n valid_lft forever preferred_lft forever\n2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000\n link\/ether 52:54:00:05:b7:40 brd ff:ff:ff:ff:ff:ff\n inet 192.168.122.73\/24 brd 192.168.122.255 scope global dynamic enp1s0\n valid_lft 2279sec preferred_lft 2279sec\n inet 192.168.122.100\/24 scope global secondary enp1s0\n valid_lft forever preferred_lft forever\n inet6 fe80::5054:ff:fe05:b740\/64 scope link \n valid_lft forever preferred_lft forever\n<\/code><\/pre>\n\n\n\nNode03’s priority is still lower than node02’s, so it remains in the backup state.<\/p>\n\n\n\n
journalctl -f -u keepalived.service<\/code><\/pre>\n\n\n\nNov 24 06:33:13 en-node03.kifarunix-demo.com Keepalived_vrrp[12627]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:14 en-node03.kifarunix-demo.com Keepalived_vrrp[12627]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\nNov 24 06:33:15 en-node03.kifarunix-demo.com Keepalived_vrrp[12627]: (ES_HA) received lower priority (200) advert from 192.168.122.12 - discarding\n<\/code><\/pre>\n\n\n\nIf you stop Elasticsearch on both node01 and node02, then Node03 will become master and be assigned the VIP.<\/p>\n\n\n\n
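The failover order seen in these logs follows from simple arithmetic on the priorities. The sketch below is only an illustration and is not part of Keepalived: the helper names effective_priority and elect_master are hypothetical. It assumes that each node's effective priority is its base priority plus the script weight (10) while the check succeeds, and that the node with the highest effective priority wins the election.

```shell
# Illustrative only: these helpers are hypothetical, not Keepalived commands.

# effective_priority BASE WEIGHT EXIT_STATUS
# With a positive weight, the weight is added only while the check exits 0.
effective_priority() {
    local base="$1" weight="$2" status="$3"
    if [ "$status" -eq 0 ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

# elect_master NAME:PRIORITY...
# Prints the name of the node with the highest effective priority.
elect_master() {
    local best_name="" best_prio=-1 entry name prio
    for entry in "$@"; do
        name=${entry%%:*}
        prio=${entry##*:}
        if [ "$prio" -gt "$best_prio" ]; then
            best_name=$name
            best_prio=$prio
        fi
    done
    echo "$best_name"
}

# Scenario from this guide: Elasticsearch is stopped on node01 only.
n1=$(effective_priority 200 10 1)   # check fails: stays at 200
n2=$(effective_priority 199 10 0)   # check succeeds: 209
n3=$(effective_priority 198 10 0)   # check succeeds: 208
elect_master "node01:$n1" "node02:$n2" "node03:$n3"   # prints node02
```

The same helpers show why the gap between base priorities matters: with a base priority of 250 on node01, its 250 would still beat node02's 209 after the check fails, and the VIP would not move.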
Send Logs to Elasticsearch Cluster VIP Address<\/h3>\n\n\n\n You can now configure your agents, whatever they are, to send logs to the Elasticsearch cluster via the VIP address.<\/p>\n\n\n\n
For example, I am using Filebeat to send logs to the Elasticsearch cluster, so I have to edit its configuration file and point the Elasticsearch output at the cluster VIP;<\/p>\n\n\n\n
See example;<\/p>\n\n\n\n
vim \/etc\/filebeat\/filebeat.yml<\/code><\/pre>\n\n\n\n...\n# ---------------------------- Elasticsearch Output ----------------------------\noutput.elasticsearch:\n # Array of hosts to connect to.\n hosts: [\"elk.kifarunix-demo.com:9200\"]\n\n # Protocol - either `http` (default) or `https`.\n protocol: \"https\"\n ssl.certificate_authorities: \"\/etc\/filebeat\/es-ca.crt\"\n # Authentication credentials - either API key or username\/password.\n #api_key: \"id:api_key\"\n username: \"USER\"\n password: \"PASS\"\n...\n<\/code><\/pre>\n\n\n\nThe hostname elk.kifarunix-demo.com is configured to resolve to the ES VIP;<\/p>\n\n\n\n
ping elk.kifarunix-demo.com -c 4<\/code><\/pre>\n\n\n\nPING elk.kifarunix-demo.com (192.168.122.100) 56(84) bytes of data.\n64 bytes from elk.kifarunix-demo.com (192.168.122.100): icmp_seq=1 ttl=64 time=0.301 ms\n64 bytes from elk.kifarunix-demo.com (192.168.122.100): icmp_seq=2 ttl=64 time=0.329 ms\n64 bytes from elk.kifarunix-demo.com (192.168.122.100): icmp_seq=3 ttl=64 time=0.404 ms\n64 bytes from elk.kifarunix-demo.com (192.168.122.100): icmp_seq=4 ttl=64 time=0.359 ms\n\n--- elk.kifarunix-demo.com ping statistics ---\n4 packets transmitted, 4 received, 0% packet loss, time 3098ms\nrtt min\/avg\/max\/mdev = 0.301\/0.348\/0.404\/0.040 ms\n<\/code><\/pre>\n\n\n\nEnsure you are using wildcard Elasticsearch SSL\/TLS certificates so that the agents\/clients can connect to any of the ES cluster nodes without having to be reconfigured with the respective node hostname each time.<\/p>\n\n\n\n
You can check the guide below on how to generate Wildcard SSL certs for Elasticsearch.<\/p>\n\n\n\n