Add Controller Nodes into Existing OpenStack Cluster using Kolla-Ansible

Is it possible to add controller nodes to an existing OpenStack cluster using Kolla-Ansible? Of course, yes! In this blog post, you will learn how to add controller nodes to an existing OpenStack cluster to boost performance and provide high availability. Kolla-Ansible is a powerful tool that streamlines the deployment and management of OpenStack services using Docker containers. If you have deployed your OpenStack using Kolla-Ansible, then this guide is for you!

Use Kolla-Ansible to Add Controller Nodes into Existing OpenStack Cluster

What is a Controller Node and Why is it Important?

In an OpenStack environment, a controller node plays a crucial role in managing and coordinating various services that make up the cloud infrastructure. It acts as the central hub, overseeing functions like authentication, API requests, and overall orchestration of resources.

Here’s why a controller node is important:

  • Centralized Management: The controller node serves as a centralized point for managing core OpenStack services like Nova (compute), Neutron (networking), Keystone (identity), Glance (image), and others. This centralization simplifies administration and control.
  • Orchestration: It coordinates actions among different services to ensure they work together seamlessly. For example, when you launch a new virtual machine, the controller node manages the process, instructing the compute nodes to carry out the request.
  • High Availability: Deploying multiple controller nodes allows for high availability configurations. If one controller node fails, others can take over, ensuring that essential services remain operational, minimizing downtime.
  • Scalability: As your OpenStack cloud grows, adding more controller nodes becomes important for distributing the management load. This scalability ensures efficient handling of increased workloads and resources.
  • Security and Identity Management: The controller node is responsible for user authentication and authorization through Keystone. This centralized identity management enhances security by ensuring consistent access controls across the OpenStack environment.

The recommended number of controller nodes for an OpenStack deployment depends on various factors, including the size of the cloud environment, the anticipated workload, and the desired level of high availability and redundancy. While there isn’t a one-size-fits-all answer, here are a few key points to consider:

  • Minimum Configuration: In a small or development environment, a single controller node might be sufficient for running all the required OpenStack services. This, however, lacks redundancy and high availability.
  • High Availability (HA) Configuration: For production environments where high availability is crucial, a minimum of three controller nodes is often recommended. This allows for redundancy, ensuring that if one controller node fails, the others can continue to operate. Failover and load distribution are typically handled by HAProxy together with Keepalived.
  • Scalability: As your OpenStack environment grows, you can scale the number of controller nodes to handle increased load and provide better performance. Adding more controller nodes allows for a distributed and load-balanced architecture.
  • Separation of Services: In larger deployments, it’s common to separate services onto different controller nodes based on their functions. For example, Keystone (identity service) and Horizon (dashboard) might be on one controller node, while Nova (compute) and Neutron (networking) are on another. This separation can help manage resources more efficiently.
  • Resource Considerations: Ensure that each controller node has sufficient resources (CPU, RAM, and disk space) to handle the services running on it. The specific resource requirements depend on the size of your deployment and the services you’re running.
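
The three-node recommendation above can be made concrete with majority-quorum arithmetic. The sketch below assumes that the clustered control-plane services (for example, the MariaDB Galera cluster that appears later in this guide) need a strict majority of members to keep operating. The function names are illustrative only and are not part of Kolla-Ansible.

```shell
# Majority quorum for a cluster of N voting members: floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the rest still form a majority.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "controllers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Under this assumption, two controllers tolerate no more failures than one, which is why going from one controller straight to three is the usual step.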

Ideal Timing for Adding Controller Nodes in a Production Environment:

So, what is the ideal time to expand your OpenStack infrastructure?

  • Low Traffic Window:
    • Plan the addition of new controller nodes during a low-traffic window or a maintenance window to minimize the impact on users.
  • Off-Peak Hours:
    • If possible, schedule the addition during off-peak hours when user activity is minimal. This helps reduce the impact on ongoing operations.
  • Backup and Snapshot:
    • Before making any changes, take a backup or snapshot of critical components, including databases and configurations. This ensures a quick recovery in case of unexpected issues.
  • Communication:
    • Notify users and stakeholders in advance about the planned maintenance window, highlighting potential disruptions and assuring them of the temporary nature of the changes.
  • Monitoring and Testing:
    • Implement robust monitoring tools to keep a close eye on the existing infrastructure during the addition of new controller nodes. Test the changes in a staging environment before applying them in production.
  • Rolling Upgrade:
    • If your OpenStack deployment method supports it, consider a rolling upgrade approach. This involves adding new nodes, ensuring they work seamlessly, and then gradually migrating services to the new nodes without disrupting the entire environment.
  • Verify High Availability:
    • If your OpenStack deployment is designed for high availability, make sure that the addition of new controller nodes aligns with the HA configuration. Verify that services can fail over and operate as expected.
  • Rollback Plan:
    • Have a well-defined rollback plan in case issues arise during the addition of controller nodes. This includes reverting to the previous state and ensuring minimal impact on users.
  • Post-Deployment Verification:
    • After the new controller nodes are added, perform thorough testing to ensure that OpenStack services are functioning correctly. Monitor the environment to identify and address any issues promptly.
  • Documentation:
    • Update documentation to reflect the changes made, including details about the new controller nodes. This helps maintain a clear record for future reference.
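
As a minimal sketch of the backup item in the checklist above, the helper below archives a configuration directory into a timestamped tarball. It assumes the Kolla configuration lives in /etc/kolla (the path used later in this guide); the function name and the backup destination are hypothetical.

```shell
# Archive a directory into <dest_dir>/<name>-<timestamp>.tar.gz and print the archive path.
backup_dir() {
  local src="$1" dest_dir="$2" stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${dest_dir}/$(basename "$src")-${stamp}.tar.gz"
  mkdir -p "$dest_dir" &&
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
    echo "$archive"
}

# Example (run as a user that can read /etc/kolla; the destination is an assumption):
#   backup_dir /etc/kolla /root/backups
# Database backups are a separate step. Kolla-Ansible provides a mariadb_backup
# command for that, which requires Mariabackup to be enabled in globals.yml first.
```

Keeping the archive name timestamped makes it easy to tell which backup belongs to which maintenance window.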

So, how can you add controller nodes to an existing OpenStack cluster using Kolla-Ansible?

Prepare the Nodes for Addition into OpenStack

When using Kolla-Ansible for deployment, most of the prerequisites are taken care of by Kolla-Ansible.

You do, however, need to perform a fresh installation of the OS, assign the initial IP addresses and hostname, and create the first user accounts on each node. (You can automate this if you want.)

Also, I would recommend that you use the same OS version on all nodes for uniformity across the cluster and easier management. We are running Ubuntu 22.04 LTS.

This is our basic deployment architecture;

------------------+---------------------------------------------+--------------------------------+----------------------------------+
                  |                                             |                                |                                  |
+-----------------+-------------------------+     +-------------+-------------+     +------------+--------------+     +-------------+-------------+
|        [ Controller Node ]                |     |     [ Compute01 Node ]    |     |    [ Storage01 Node ]     |     |     [ Compute02 Node ]    |
|                                           |     |                           |     |                           |     |                           |
|        br0: VIP and Mgt IP                |     |  enp1s0: 192.168.200.202  |     |  enp1s0: 192.168.200.201  |     |  enp1s0: 192.168.200.203  |
|             VIP: 192.168.200.254          |     |  enp2s0: 10.100.0.110/24  |     |                           |     |  enp2s0: 10.100.0.111/24  |
|             Mgt IP: 192.168.200.200       |     +---------------------------+     +---------------------------+     +---------------------------+
|        br-ex: Provider Network            |
|               10.100.0.100/24             |
+-------------------------------------------+

In this guide, we have added two controller nodes, assigned their IP addresses, and created the required user account with the required sudo rights. Our basic architecture now looks like this;

------------------+---------------------------------------------+--------------------------------+----------------------------------+
                  |                                             |                                |                                  |
+-----------------+-------------------------+     +-------------+-------------+     +------------+--------------+     +-------------+-------------+
|        [  Controller 01  ]                |     |     [ Compute01 Node ]    |     |    [ Storage01 Node ]     |     |     [ Compute02 Node ]    |
|                                           |     |                           |     |                           |     |                           |
|        br0: VIP and Mgt IP                |     |  enp1s0: 192.168.200.202  |     |  enp1s0: 192.168.200.201  |     |  enp1s0: 192.168.200.203  |
|             VIP: 192.168.200.254          |     |  enp2s0: 10.100.0.110/24  |     |                           |     |  enp2s0: 10.100.0.111/24  |
|             Mgt IP: 192.168.200.200       |     +---------------------------+     +---------------------------+     +---------------------------+
|        br-ex: Provider Network            |
|               10.100.0.100/24             |
+-----------------+-------------------------+
                  |
+-----------------+-------------------------+
|        [  Controller 02  ]                |
|                                           |
|        br0: Mgt IP: 192.168.200.204       |
|        br-ex: Provider Network            |
|               10.100.0.101/24             |
+-----------------+-------------------------+
                  |
+-----------------+-------------------------+
|        [  Controller 03  ]                |
|                                           |
|        br0: Mgt IP: 192.168.200.205       |
|        br-ex: Provider Network            |
|               10.100.0.102/24             |
+-----------------+-------------------------+

Check our controller nodes network configuration.

Are you using Shared Storage for Glance Images?

Chances are you are using shared storage for Glance images. If that is the case in your environment, you need to ensure that all controller nodes have access to the shared storage.

For example, in our demo environment, we are using an NFS share for storing Glance images;

(kolla-ansible) kifarunix@controller01:~$ docker inspect glance_api
           {
                "Type": "bind",
                "Source": "/mnt/glance",
                "Destination": "/var/lib/glance",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],

df -hT -P /mnt/glance
Filesystem                  Type  Size  Used Avail Use% Mounted on
192.168.200.201:/mnt/glance nfs4  100G  747M  100G   1% /mnt/glance

Hence, configure the new controller nodes to also mount the share by updating their /etc/fstab files accordingly.

sudo vim /etc/fstab

Add the line to mount the appropriate NFS share;

192.168.200.201:/mnt/glance /mnt/glance nfs _netdev,defaults 0 0

Where:

  • 192.168.200.201:/mnt/glance: This part indicates the NFS server’s IP address (192.168.200.201) and the exported path (/mnt/glance) on the NFS server.
  • /mnt/glance: This is the local mount point on the current machine where the NFS share will be mounted.
  • nfs: This specifies the file system type to be mounted, in this case, NFS (Network File System).
  • _netdev: This option indicates that the filesystem is a network device and should be mounted after the network has been enabled. This is typically used for network file systems like NFS.
  • defaults: This applies the default mount options. On Linux these are rw, suid, dev, exec, auto, nouser, and async.
  • 0: The dump field. It is used by the dump utility to decide whether the file system should be backed up; 0 means it is skipped.
  • 0: The pass field. It is used by the fsck utility to determine the order in which file systems are checked at boot; 0 means the file system is not checked.

Save and exit the file.

Ensure that the NFS client packages are installed on your system.

On Debian/Ubuntu, you might need to install the nfs-common package:

sudo apt update
sudo apt install nfs-common

On Red Hat-based systems, you might need to install the nfs-utils package:

sudo yum install nfs-utils

Then mount the share;

sudo mount -a
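
If you prefer to script the /etc/fstab change on several nodes, the entry can be appended idempotently so that repeated runs do not duplicate it. This is only a sketch: the ensure_line helper is hypothetical, and the demo below writes to a temporary file rather than the real /etc/fstab.

```shell
# Append a line to a file only if an identical line is not already present.
ensure_line() {
  local file="$1" line="$2"
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    echo "already present"
  else
    printf '%s\n' "$line" >> "$file" && echo "added"
  fi
}

# Demo against a scratch file; on a real node you would target /etc/fstab as root
# and make sure the mount point exists first (sudo mkdir -p /mnt/glance).
demo="$(mktemp)"
entry="192.168.200.201:/mnt/glance /mnt/glance nfs _netdev,defaults 0 0"
ensure_line "$demo" "$entry"   # prints: added
ensure_line "$demo" "$entry"   # prints: already present
rm -f "$demo"
```

The exact-line match (grep -x -F) is deliberately strict; an entry for the same share with different options would not be detected and should be reviewed by hand.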

Copy Deployment User SSH Keys from Control Node to New Controller Node

Your control node is the node where you are running Kolla-Ansible. In our setup, we are running Kolla-Ansible on our controller01 node.

If your deployment user does not use SSH keys, you need to define the username and password for each new controller node in the multinode inventory file so that Kolla-Ansible knows how to log in and configure that node.

In this guide, we are using the SSH keys that we already generated while creating the Kolla-Ansible deployment user account.

Hence, let's just copy the SSH keys to the new controller nodes.

First of all, let’s ensure the new controller nodes are reachable via their hostnames from the control node.

sudo tee -a /etc/hosts << EOL
192.168.200.204 controller02
192.168.200.205 controller03
EOL

Let’s confirm reachability;

ping controller02 -c 4
PING controller02 (192.168.200.204) 56(84) bytes of data.
64 bytes from controller02 (192.168.200.204): icmp_seq=1 ttl=64 time=0.260 ms
64 bytes from controller02 (192.168.200.204): icmp_seq=2 ttl=64 time=0.286 ms
64 bytes from controller02 (192.168.200.204): icmp_seq=3 ttl=64 time=0.334 ms
64 bytes from controller02 (192.168.200.204): icmp_seq=4 ttl=64 time=0.312 ms

--- controller02 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3056ms
rtt min/avg/max/mdev = 0.260/0.298/0.334/0.027 ms

ping controller03 -c 4
PING controller03 (192.168.200.205) 56(84) bytes of data.
64 bytes from controller03 (192.168.200.205): icmp_seq=1 ttl=64 time=0.452 ms
64 bytes from controller03 (192.168.200.205): icmp_seq=2 ttl=64 time=0.397 ms
64 bytes from controller03 (192.168.200.205): icmp_seq=3 ttl=64 time=0.398 ms
64 bytes from controller03 (192.168.200.205): icmp_seq=4 ttl=64 time=0.487 ms

--- controller03 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3076ms
rtt min/avg/max/mdev = 0.397/0.433/0.487/0.038 ms

Next, copy the keys;

for i in 02 03; do ssh-copy-id kifarunix@controller$i; done
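
After copying the keys, it can be worth confirming that key-based login works without a password prompt, since Ansible will not be able to answer one. The helper below is a hypothetical sketch; BatchMode and ConnectTimeout are standard OpenSSH client options, and the SSH_BIN override exists only to make the function easy to dry-run.

```shell
# Try a non-interactive SSH login to each host and report the result.
# Returns non-zero if any host fails.
check_ssh() {
  local ssh_bin="${SSH_BIN:-ssh}" user="$1" failed=0 host
  shift
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true; then
      echo "OK ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  return "$failed"
}

# Example, using the deployment user and hostnames from this guide:
#   check_ssh kifarunix controller02 controller03
```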

Update Kolla-Ansible Inventory

Next, you need to add the new controller nodes to the inventory file. Since we are running a multinode deployment, open the multinode inventory and add your new controller nodes.

This is what our multinode inventory looks like before adding the new controller nodes;

cat multinode
# These initial groups are the only groups required to be modified. The
# additional groups are for more control of the environment.
[control]
controller01 ansible_connection=local neutron_external_interface=vethext

# The above can also be specified as follows:
#control[01:03]     ansible_user=kolla

# The network nodes are where your l3-agent and loadbalancers will run
# This can be the same as a host in the control group
[network]
controller01 ansible_connection=local neutron_external_interface=vethext network_interface=br0

[compute]
compute01 neutron_external_interface=enp2s0 network_interface=enp1s0
compute02 neutron_external_interface=enp2s0 network_interface=enp1s0


[monitoring]
controller01 ansible_connection=local neutron_external_interface=vethext

# When compute nodes and control nodes use different interfaces,
# you need to comment out "api_interface" and other interfaces from the globals.yml
# and specify like below:
#compute01 neutron_external_interface=eth0 api_interface=em1 tunnel_interface=em1

[storage]
storage01 neutron_external_interface=enp10s0 network_interface=enp1s0

[deployment]
localhost       ansible_connection=local

[baremetal:children]
control
network
compute
storage
monitoring

[tls-backend:children]
control

# You can explicitly specify which hosts run each project by updating the
# groups in the sections below. Common services are grouped together.

[common:children]
control
network
compute
storage
monitoring

[collectd:children]
compute

[grafana:children]
monitoring

[etcd:children]
control

[influxdb:children]
monitoring

[prometheus:children]
monitoring

[kafka:children]
control

[telegraf:children]
compute
control
monitoring
network
storage

[hacluster:children]
control

[hacluster-remote:children]
compute

[loadbalancer:children]
network

[mariadb:children]
control

[rabbitmq:children]
control

[outward-rabbitmq:children]
control

[monasca-agent:children]
compute
control
monitoring
network
storage

[monasca:children]
monitoring

[storm:children]
monitoring

[keystone:children]
control

[glance:children]
control

[nova:children]
control

[neutron:children]
network

[openvswitch:children]
network
compute
manila-share

[cinder:children]
control

[cloudkitty:children]
control

[freezer:children]
control

[memcached:children]
control

[horizon:children]
control

[swift:children]
control

[barbican:children]
control

[heat:children]
control

[murano:children]
control

[solum:children]
control

[ironic:children]
control

[magnum:children]
control

[sahara:children]
control

[mistral:children]
control

[manila:children]
control

[ceilometer:children]
control

[aodh:children]
control

[cyborg:children]
control
compute

[gnocchi:children]
control

[tacker:children]
control

[trove:children]
control

[senlin:children]
control

[vitrage:children]
control

[watcher:children]
control

[octavia:children]
control

[designate:children]
control

[placement:children]
control

[bifrost:children]
deployment

[zookeeper:children]
control

[zun:children]
control

[skyline:children]
control

[redis:children]
control

[blazar:children]
control

[venus:children]
monitoring

# Additional control implemented here. These groups allow you to control which
# services run on which hosts at a per-service level.
#
# Word of caution: Some services are required to run on the same host to
# function appropriately. For example, neutron-metadata-agent must run on the
# same host as the l3-agent and (depending on configuration) the dhcp-agent.

# Common
[cron:children]
common

[fluentd:children]
common

[kolla-logs:children]
common

[kolla-toolbox:children]
common

[opensearch:children]
control

# Opensearch dashboards
[opensearch-dashboards:children]
opensearch

# Glance
[glance-api:children]
glance

# Nova
[nova-api:children]
nova

[nova-conductor:children]
nova

[nova-super-conductor:children]
nova

[nova-novncproxy:children]
nova

[nova-scheduler:children]
nova

[nova-spicehtml5proxy:children]
nova

[nova-compute-ironic:children]
nova

[nova-serialproxy:children]
nova

# Neutron
[neutron-server:children]
control

[neutron-dhcp-agent:children]
neutron

[neutron-l3-agent:children]
neutron

[neutron-metadata-agent:children]
neutron

[neutron-ovn-metadata-agent:children]
compute
network

[neutron-bgp-dragent:children]
neutron

[neutron-infoblox-ipam-agent:children]
neutron

[neutron-metering-agent:children]
neutron

[ironic-neutron-agent:children]
neutron

[neutron-ovn-agent:children]
compute
network

# Cinder
[cinder-api:children]
cinder

[cinder-backup:children]
storage

[cinder-scheduler:children]
cinder

[cinder-volume:children]
storage

# Cloudkitty
[cloudkitty-api:children]
cloudkitty

[cloudkitty-processor:children]
cloudkitty

# Freezer
[freezer-api:children]
freezer

[freezer-scheduler:children]
freezer

# iSCSI
[iscsid:children]
compute
storage
ironic

[tgtd:children]
storage

# Manila
[manila-api:children]
manila

[manila-scheduler:children]
manila

[manila-share:children]
network

[manila-data:children]
manila

# Swift
[swift-proxy-server:children]
swift

[swift-account-server:children]
storage

[swift-container-server:children]
storage

[swift-object-server:children]
storage

# Barbican
[barbican-api:children]
barbican

[barbican-keystone-listener:children]
barbican

[barbican-worker:children]
barbican

# Heat
[heat-api:children]
heat

[heat-api-cfn:children]
heat

[heat-engine:children]
heat

# Murano
[murano-api:children]
murano

[murano-engine:children]
murano

# Monasca
[monasca-agent-collector:children]
monasca-agent

[monasca-agent-forwarder:children]
monasca-agent

[monasca-agent-statsd:children]
monasca-agent

[monasca-api:children]
monasca

[monasca-log-persister:children]
monasca

[monasca-log-metrics:children]
monasca

[monasca-thresh:children]
monasca

[monasca-notification:children]
monasca

[monasca-persister:children]
monasca

# Storm
[storm-worker:children]
storm

[storm-nimbus:children]
storm

# Ironic
[ironic-api:children]
ironic

[ironic-conductor:children]
ironic

[ironic-inspector:children]
ironic

[ironic-tftp:children]
ironic

[ironic-http:children]
ironic

# Magnum
[magnum-api:children]
magnum

[magnum-conductor:children]
magnum

# Sahara
[sahara-api:children]
sahara

[sahara-engine:children]
sahara

# Solum
[solum-api:children]
solum

[solum-worker:children]
solum

[solum-deployer:children]
solum

[solum-conductor:children]
solum

[solum-application-deployment:children]
solum

[solum-image-builder:children]
solum

# Mistral
[mistral-api:children]
mistral

[mistral-executor:children]
mistral

[mistral-engine:children]
mistral

[mistral-event-engine:children]
mistral

# Ceilometer
[ceilometer-central:children]
ceilometer

[ceilometer-notification:children]
ceilometer

[ceilometer-compute:children]
compute

[ceilometer-ipmi:children]
compute

# Aodh
[aodh-api:children]
aodh

[aodh-evaluator:children]
aodh

[aodh-listener:children]
aodh

[aodh-notifier:children]
aodh

# Cyborg
[cyborg-api:children]
cyborg

[cyborg-agent:children]
compute

[cyborg-conductor:children]
cyborg

# Gnocchi
[gnocchi-api:children]
gnocchi

[gnocchi-statsd:children]
gnocchi

[gnocchi-metricd:children]
gnocchi

# Trove
[trove-api:children]
trove

[trove-conductor:children]
trove

[trove-taskmanager:children]
trove

# Multipathd
[multipathd:children]
compute
storage

# Watcher
[watcher-api:children]
watcher

[watcher-engine:children]
watcher

[watcher-applier:children]
watcher

# Senlin
[senlin-api:children]
senlin

[senlin-conductor:children]
senlin

[senlin-engine:children]
senlin

[senlin-health-manager:children]
senlin

# Octavia
[octavia-api:children]
octavia

[octavia-driver-agent:children]
octavia

[octavia-health-manager:children]
octavia

[octavia-housekeeping:children]
octavia

[octavia-worker:children]
octavia

# Designate
[designate-api:children]
designate

[designate-central:children]
designate

[designate-producer:children]
designate

[designate-mdns:children]
network

[designate-worker:children]
designate

[designate-sink:children]
designate

[designate-backend-bind9:children]
designate

# Placement
[placement-api:children]
placement

# Zun
[zun-api:children]
zun

[zun-wsproxy:children]
zun

[zun-compute:children]
compute

[zun-cni-daemon:children]
compute

# Skyline
[skyline-apiserver:children]
skyline

[skyline-console:children]
skyline

# Tacker
[tacker-server:children]
tacker

[tacker-conductor:children]
tacker

# Vitrage
[vitrage-api:children]
vitrage

[vitrage-notifier:children]
vitrage

[vitrage-graph:children]
vitrage

[vitrage-ml:children]
vitrage

[vitrage-persistor:children]
vitrage

# Blazar
[blazar-api:children]
blazar

[blazar-manager:children]
blazar

# Prometheus
[prometheus-node-exporter:children]
monitoring
control
compute
network
storage

[prometheus-mysqld-exporter:children]
mariadb

[prometheus-haproxy-exporter:children]
loadbalancer

[prometheus-memcached-exporter:children]
memcached

[prometheus-cadvisor:children]
monitoring
control
compute
network
storage

[prometheus-alertmanager:children]
monitoring

[prometheus-openstack-exporter:children]
monitoring

[prometheus-elasticsearch-exporter:children]
opensearch

[prometheus-blackbox-exporter:children]
monitoring

[prometheus-libvirt-exporter:children]
compute

[prometheus-msteams:children]
prometheus-alertmanager

[masakari-api:children]
control

[masakari-engine:children]
control

[masakari-hostmonitor:children]
control

[masakari-instancemonitor:children]
compute

[ovn-controller:children]
ovn-controller-compute
ovn-controller-network

[ovn-controller-compute:children]
compute

[ovn-controller-network:children]
network

[ovn-database:children]
control

[ovn-northd:children]
ovn-database

[ovn-nb-db:children]
ovn-database

[ovn-sb-db:children]
ovn-database

[venus-api:children]
venus

[venus-manager:children]
venus

So, we will update the [control], [network] and [monitoring] groups to add our new nodes such that the configuration looks like;

vim multinode

Note the controller[02:03] entries.

# These initial groups are the only groups required to be modified. The
# additional groups are for more control of the environment.
[control]
controller01 ansible_connection=local neutron_external_interface=vethext
controller[02:03] neutron_external_interface=vethext

# The network nodes are where your l3-agent and loadbalancers will run
# This can be the same as a host in the control group
[network]
controller01 ansible_connection=local neutron_external_interface=vethext network_interface=br0
controller[02:03] neutron_external_interface=vethext network_interface=br0

[compute]
compute01 neutron_external_interface=enp2s0 network_interface=enp1s0
compute02 neutron_external_interface=enp2s0 network_interface=enp1s0

[monitoring]
controller01 ansible_connection=local neutron_external_interface=vethext
controller[02:03] neutron_external_interface=vethext

# When compute nodes and control nodes use different interfaces,
# you need to comment out "api_interface" and other interfaces from the globals.yml
# and specify like below:
#compute01 neutron_external_interface=eth0 api_interface=em1 tunnel_interface=em1

[storage]
storage01 neutron_external_interface=enp10s0 network_interface=enp1s0

[deployment]
localhost       ansible_connection=local

[baremetal:children]
control
network
compute
storage
monitoring

[tls-backend:children]
control

# You can explicitly specify which hosts run each project by updating the
# groups in the sections below. Common services are grouped together.

[common:children]
control
network
compute
storage
monitoring
...
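
The bracketed ranges in the inventory are expanded by Ansible into individual hostnames, so the text before the bracket must match the real hostname prefix exactly. The following sketch mimics that expansion for a simple numeric range to show what a pattern resolves to. It is an illustration written for this guide, not Ansible's own implementation, and it only handles the prefix[start:end] form.

```shell
# Expand a pattern such as controller[02:03] into one hostname per line.
# Zero-padding follows the width of the start value, as in the pattern itself.
expand_host_range() {
  local pattern="$1" prefix range start end
  prefix="${pattern%%\[*}"     # text before '['
  range="${pattern#*\[}"       # "02:03]"
  range="${range%%\]*}"        # "02:03"
  start="${range%%:*}"
  end="${range##*:}"
  seq -f "${prefix}%0${#start}g" "$start" "$end"
}

expand_host_range 'controller[02:03]'
# prints:
# controller02
# controller03
```

A pattern with a mistyped prefix, such as control[02:03], would expand to control02 and control03, which do not match the controller02 and controller03 entries added to /etc/hosts earlier.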

Enable HAProxy for High Availability and Load Balancing

The globals.yml file in Kolla-Ansible is a configuration file where you can set various global parameters for your OpenStack deployment. The parameter enable_haproxy in globals.yml is used to specify whether or not HAProxy should be enabled as part of the deployment.

Set the value of enable_haproxy to yes to configure HAProxy to provide load balancing and high availability for your OpenStack services across the controller nodes.

vim /etc/kolla/globals.yml
...
enable_haproxy: "yes"
enable_keepalived: "{{ enable_haproxy | bool }}"
...

Save and exit the file.

The above configuration ties enable_keepalived to enable_haproxy: the bool filter converts the value to a boolean, so Keepalived is enabled whenever HAProxy is enabled (enable_haproxy is "yes") and disabled otherwise.

While Keepalived manages virtual IP addresses associated with the active controller node using VRRP (Virtual Router Redundancy Protocol), HAProxy load-balances traffic to service backends.

HAProxy regularly checks the health of each controller node by sending health-check requests. If a node becomes unreachable or fails these health checks, HAProxy stops directing traffic to that node.

Keepalived continuously monitors the health of the active controller node. If it detects a failure (for example, if the controller node becomes unreachable), Keepalived triggers a failover process and automatically transfers the VIP to one of the standby controller nodes that are still healthy based on their priority setting.

You can check the HAProxy configuration in /etc/kolla/haproxy/haproxy.cfg and the Keepalived configuration in /etc/kolla/keepalived/keepalived.conf.
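
To see which controller currently holds the VIP, you can look for the address on the node's interfaces, since Keepalived adds the VIP to the interface of the active node. The helper below is a hypothetical sketch that reads the one-line-per-address output of ip -o -4 addr show on standard input; the VIP value is the one from our architecture diagram.

```shell
# Succeeds if the given IPv4 address appears in `ip -o -4 addr show` output on stdin.
holds_vip() {
  # Match "inet <address>/" as a fixed string so that 192.168.200.25
  # does not match 192.168.200.254.
  grep -qF "inet $1/"
}

# Example, run on each controller:
#   ip -o -4 addr show | holds_vip 192.168.200.254 && echo "VIP is on $(hostname)"
```

Running this before and after stopping Keepalived on the active node is a simple way to confirm that the VIP actually moves.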

Activate Kolla-Ansible Virtual Environment

Activate your respective virtual environment;

source ~/kolla-ansible/bin/activate

Test Connectivity to the Node

Execute the Ansible command below to check the reachability of the new nodes in your inventory using the Ansible ping module.

ansible -i multinode -m ping controller02,controller03

Sample output;

controller03 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
controller02 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

Bootstrap the new Controller Nodes

You need to bootstrap the controller nodes with the Kolla deployment dependencies by running the command below.

kolla-ansible -i <inventory> bootstrap-servers [ --limit <limit> ]

Replace <inventory> with your inventory file. When adding controller nodes, <limit> needs to specify the controller nodes group, in this case control, since you need the bootstrap command to generate the Fernet keys that are used for encrypting tokens in Keystone and provide security for authentication tokens, and then distribute them across the Keystone hosts/controller nodes to ensure consistency.

Be cautious about re-bootstrapping a cloud that has already been bootstrapped. See some considerations for re-bootstrapping.

Thus, our command will look like;

kolla-ansible -i multinode bootstrap-servers --limit control

This command may restart Docker containers on other nodes. Hence, also review the ideal timing for adding controller nodes to a production deployment as outlined above.

If at some point during the deployment MariaDB gets stuck in the starting state after a restart;

docker ps | grep mariadb
fa28f34cbad9   quay.io/openstack.kolla/mariadb-server:2023.1-ubuntu-jammy                  "dumb-init -- kolla_…"   3 hours ago     Up 24 seconds (health: starting)             mariadb

And such errors appear in the logs;

tail -f /var/log/kolla/mariadb/mariadb.log
2023-11-12  8:19:18 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50534S), skipping check
2023-11-12  8:19:47 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2023-11-12  8:19:47 0 [Note] WSREP: view((empty))
2023-11-12  8:19:47 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
	 at ./gcomm/src/pc.cpp:connect():160
2023-11-12  8:19:47 0 [ERROR] WSREP: ./gcs/src/gcs_core.cpp:gcs_core_open():221: Failed to open backend connection: -110 (Connection timed out)
2023-11-12  8:19:48 0 [ERROR] WSREP: ./gcs/src/gcs.cpp:gcs_open():1669: Failed to open channel 'openstack' at 'gcomm://192.168.200.200:4567,192.168.200.204:4567,192.168.200.205:4567': -110 (Connection timed out)
2023-11-12  8:19:48 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2023-11-12  8:19:48 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.200.200:4567,192.168.200.204:4567,192.168.200.205:4567) failed: 7
2023-11-12  8:19:48 0 [ERROR] Aborting
231112 08:19:48 mysqld_safe mysqld from pid file /var/lib/mysql/mariadb.pid ended

You can execute the database recovery;

kolla-ansible -i multinode mariadb_recovery

Once that is completed, you can re-run the deployment command.
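
Before re-running the deployment, you may want to wait until the mariadb container reports a healthy state instead of "starting". The sketch below polls the container health status with docker inspect. The function name, retry count and delay are assumptions, and DOCKER_BIN is only an override hook for dry runs.

```shell
# Poll a container's health status until it is "healthy" or the retries run out.
# Usage: wait_healthy <container> [tries] [delay_seconds]
wait_healthy() {
  local name="$1" tries="${2:-30}" delay="${3:-10}"
  local docker_bin="${DOCKER_BIN:-docker}" status i=0
  while [ "$i" -lt "$tries" ]; do
    status="$("$docker_bin" inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)"
    [ "$status" = "healthy" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example, run on a controller node (prefix docker with sudo if your user needs it):
#   wait_healthy mariadb && echo "mariadb is healthy" || echo "mariadb is still not healthy"
```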

Run Pre-Deployment Checks on the New Controller Nodes

Next, run the pre-deployment checks on the new nodes;

kolla-ansible -i multinode prechecks --limit controller02,controller03

Deploy Required Services Docker Containers on the New Controller Nodes

Next, deploy Docker containers for the required services on the new controller nodes.

To begin with, download the container images onto the new nodes;

kolla-ansible -i multinode pull --limit controller02,controller03

When the command completes, you can log in to the node and list the container images;

docker images

or just list them using Ansible from the control node;

ansible -i multinode -m raw -a "sudo docker images" controller02

Sample output;

controller02 | CHANGED | rc=0 >>
REPOSITORY                                              TAG                   IMAGE ID       CREATED       SIZE
quay.io/openstack.kolla/cinder-scheduler                2023.1-ubuntu-jammy   41d2c120dcaf   5 hours ago   1.33GB
quay.io/openstack.kolla/cinder-api                      2023.1-ubuntu-jammy   dc89b0b74a64   5 hours ago   1.33GB
quay.io/openstack.kolla/neutron-server                  2023.1-ubuntu-jammy   dd92163f698e   5 hours ago   1.05GB
quay.io/openstack.kolla/neutron-l3-agent                2023.1-ubuntu-jammy   47e8b6290070   5 hours ago   1.05GB
quay.io/openstack.kolla/aodh-listener                   2023.1-ubuntu-jammy   4f103cd2aa96   5 hours ago   892MB
quay.io/openstack.kolla/aodh-notifier                   2023.1-ubuntu-jammy   577c5461b901   5 hours ago   892MB
quay.io/openstack.kolla/aodh-api                        2023.1-ubuntu-jammy   80311624cca1   5 hours ago   892MB
quay.io/openstack.kolla/aodh-evaluator                  2023.1-ubuntu-jammy   82850f3f9b82   5 hours ago   892MB
quay.io/openstack.kolla/neutron-dhcp-agent              2023.1-ubuntu-jammy   3c9ab4533eb5   5 hours ago   1.04GB
quay.io/openstack.kolla/keystone                        2023.1-ubuntu-jammy   d4c445793a75   5 hours ago   942MB
quay.io/openstack.kolla/keystone-ssh                    2023.1-ubuntu-jammy   946e8484c8a5   5 hours ago   948MB
quay.io/openstack.kolla/neutron-openvswitch-agent       2023.1-ubuntu-jammy   8c8468ded47c   5 hours ago   1.04GB
quay.io/openstack.kolla/neutron-metadata-agent          2023.1-ubuntu-jammy   87d457f01029   5 hours ago   1.04GB
quay.io/openstack.kolla/keystone-fernet                 2023.1-ubuntu-jammy   0d92e47e97b5   5 hours ago   946MB
quay.io/openstack.kolla/nova-novncproxy                 2023.1-ubuntu-jammy   14934aa7795f   5 hours ago   1.22GB
quay.io/openstack.kolla/placement-api                   2023.1-ubuntu-jammy   fb19639b8875   5 hours ago   894MB
quay.io/openstack.kolla/horizon                         2023.1-ubuntu-jammy   2c893a67ae9f   5 hours ago   1.11GB
quay.io/openstack.kolla/heat-api-cfn                    2023.1-ubuntu-jammy   7ec11e8fefb9   5 hours ago   960MB
quay.io/openstack.kolla/heat-api                        2023.1-ubuntu-jammy   e925a6c8b000   5 hours ago   960MB
quay.io/openstack.kolla/heat-engine                     2023.1-ubuntu-jammy   048d6ebe056e   5 hours ago   960MB
quay.io/openstack.kolla/nova-scheduler                  2023.1-ubuntu-jammy   ab1b447c7021   5 hours ago   1.11GB
quay.io/openstack.kolla/nova-conductor                  2023.1-ubuntu-jammy   ca809956e88b   5 hours ago   1.11GB
quay.io/openstack.kolla/nova-api                        2023.1-ubuntu-jammy   137618290a77   5 hours ago   1.11GB
quay.io/openstack.kolla/glance-api                      2023.1-ubuntu-jammy   4e47347129e3   5 hours ago   1.04GB
quay.io/openstack.kolla/zun-wsproxy                     2023.1-ubuntu-jammy   6004d566ab20   5 hours ago   1.01GB
quay.io/openstack.kolla/zun-api                         2023.1-ubuntu-jammy   f93535599eba   5 hours ago   1.01GB
quay.io/openstack.kolla/gnocchi-api                     2023.1-ubuntu-jammy   4e8e97ea301a   5 hours ago   1.07GB
quay.io/openstack.kolla/gnocchi-metricd                 2023.1-ubuntu-jammy   c8cefb46f796   5 hours ago   1.07GB
quay.io/openstack.kolla/gnocchi-statsd                  2023.1-ubuntu-jammy   b6b3c5404e55   5 hours ago   1.07GB
quay.io/openstack.kolla/ceilometer-central              2023.1-ubuntu-jammy   52e724bda206   5 hours ago   895MB
quay.io/openstack.kolla/ceilometer-notification         2023.1-ubuntu-jammy   24affe5317f4   5 hours ago   895MB
quay.io/openstack.kolla/kolla-toolbox                   2023.1-ubuntu-jammy   47fb2da898a6   5 hours ago   819MB
quay.io/openstack.kolla/mariadb-server                  2023.1-ubuntu-jammy   08f31f386d7b   5 hours ago   605MB
quay.io/openstack.kolla/mariadb-clustercheck            2023.1-ubuntu-jammy   df9693cf6795   5 hours ago   322MB
quay.io/openstack.kolla/prometheus-blackbox-exporter    2023.1-ubuntu-jammy   3786777ba92f   5 hours ago   278MB
quay.io/openstack.kolla/prometheus-alertmanager         2023.1-ubuntu-jammy   9dc45a653636   5 hours ago   314MB
quay.io/openstack.kolla/prometheus-node-exporter        2023.1-ubuntu-jammy   29854ef51590   5 hours ago   276MB
quay.io/openstack.kolla/prometheus-memcached-exporter   2023.1-ubuntu-jammy   bccdd901d45b   5 hours ago   271MB
quay.io/openstack.kolla/prometheus-mysqld-exporter      2023.1-ubuntu-jammy   142d34487ae1   5 hours ago   272MB
quay.io/openstack.kolla/prometheus-openstack-exporter   2023.1-ubuntu-jammy   00708fff486b   5 hours ago   267MB
quay.io/openstack.kolla/prometheus-cadvisor             2023.1-ubuntu-jammy   57e64a35a36d   5 hours ago   295MB
quay.io/openstack.kolla/prometheus-haproxy-exporter     2023.1-ubuntu-jammy   8da898f5509f   5 hours ago   272MB
quay.io/openstack.kolla/prometheus-v2-server            2023.1-ubuntu-jammy   34d5a456575e   5 hours ago   469MB
quay.io/openstack.kolla/openvswitch-vswitchd            2023.1-ubuntu-jammy   da575b11b8d8   5 hours ago   274MB
quay.io/openstack.kolla/grafana                         2023.1-ubuntu-jammy   9485136dab77   5 hours ago   673MB
quay.io/openstack.kolla/openvswitch-db-server           2023.1-ubuntu-jammy   45b86c2748cb   5 hours ago   274MB
quay.io/openstack.kolla/fluentd                         2023.1-ubuntu-jammy   f0b67ea51417   5 hours ago   529MB
quay.io/openstack.kolla/keepalived                      2023.1-ubuntu-jammy   34c6f0412dca   5 hours ago   269MB
quay.io/openstack.kolla/rabbitmq                        2023.1-ubuntu-jammy   a9e21182aae7   5 hours ago   314MB
quay.io/openstack.kolla/cron                            2023.1-ubuntu-jammy   f0bc61f82d64   5 hours ago   258MB
quay.io/openstack.kolla/memcached                       2023.1-ubuntu-jammy   366a6292a66e   5 hours ago   259MB
quay.io/openstack.kolla/etcd                            2023.1-ubuntu-jammy   37bc7fc289fe   5 hours ago   297MB
quay.io/openstack.kolla/haproxy                         2023.1-ubuntu-jammy   b64892cee984   5 hours ago   266MB
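
With this many images, a quick consistency check on the tags can save you some scrolling. The snippet below is only a sketch: images_with_other_tag is a hypothetical helper name, and it assumes it receives repository:tag lines on stdin, such as those printed by docker images --format '{{.Repository}}:{{.Tag}}';

```shell
# Reads "repository:tag" lines on stdin and prints every image whose tag
# differs from the expected one. Prints nothing when all tags match.
# The tag is taken as the last ':'-separated field, so registries with a
# port number (host:5000/image:tag) are handled too.
images_with_other_tag() {
    expected="$1"
    awk -F ':' -v want="$expected" '$NF != want { print $0 }'
}
```

On the node itself, you could run sudo docker images --format '{{.Repository}}:{{.Tag}}' | images_with_other_tag 2023.1-ubuntu-jammy. Run it on the node rather than through an Ansible ad-hoc command, since Ansible would try to interpret the curly braces as a template.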

Deploy the containers on the new controller nodes. Again, you have to specify all the controller nodes here, hence --limit control.

kolla-ansible -i multinode deploy --limit control

Ensure there are no errors. If there are any, fix them before proceeding.

Verify New Node Addition to OpenStack

You can now verify whether the new controller nodes have been successfully added to the OpenStack cluster.

To begin with, you can list Docker containers running on the node (the command below is executed from the control node);

ansible -i multinode -m raw -a "sudo docker ps" controller02

Sample output;

controller02 | CHANGED | rc=0 >>
CONTAINER ID   IMAGE                                                                       COMMAND                  CREATED       STATUS                   PORTS     NAMES
ae81cdd54526   quay.io/openstack.kolla/zun-wsproxy:2023.1-ubuntu-jammy                     "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               zun_wsproxy
559251e9067c   quay.io/openstack.kolla/zun-api:2023.1-ubuntu-jammy                         "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               zun_api
703c89f0d130   quay.io/openstack.kolla/grafana:2023.1-ubuntu-jammy                         "dumb-init --single-…"   2 hours ago   Up 2 hours                         grafana
d55aa3751b1d   quay.io/openstack.kolla/aodh-notifier:2023.1-ubuntu-jammy                   "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               aodh_notifier
b7f1ade8cf50   quay.io/openstack.kolla/aodh-listener:2023.1-ubuntu-jammy                   "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               aodh_listener
5a2f03854727   quay.io/openstack.kolla/aodh-evaluator:2023.1-ubuntu-jammy                  "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               aodh_evaluator
ca5054ffcb1c   quay.io/openstack.kolla/aodh-api:2023.1-ubuntu-jammy                        "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               aodh_api
b3fe5de9c0d7   quay.io/openstack.kolla/ceilometer-central:2023.1-ubuntu-jammy              "dumb-init --single-…"   2 hours ago   Up 2 hours (unhealthy)             ceilometer_central
96e41a027685   quay.io/openstack.kolla/ceilometer-notification:2023.1-ubuntu-jammy         "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               ceilometer_notification
407a3b083aa9   quay.io/openstack.kolla/gnocchi-statsd:2023.1-ubuntu-jammy                  "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               gnocchi_statsd
8847afd527c6   quay.io/openstack.kolla/gnocchi-metricd:2023.1-ubuntu-jammy                 "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               gnocchi_metricd
6409e89ac7c9   quay.io/openstack.kolla/gnocchi-api:2023.1-ubuntu-jammy                     "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               gnocchi_api
9f89e8ea58cd   quay.io/openstack.kolla/horizon:2023.1-ubuntu-jammy                         "dumb-init --single-…"   2 hours ago   Up 2 hours (healthy)               horizon
853c7bbaea53   quay.io/openstack.kolla/mariadb-server:2023.1-ubuntu-jammy                  "dumb-init -- kolla_…"   2 hours ago   Up 2 hours (healthy)               mariadb
76651604bdf0   quay.io/openstack.kolla/heat-engine:2023.1-ubuntu-jammy                     "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               heat_engine
c6ef2875a2b4   quay.io/openstack.kolla/heat-api-cfn:2023.1-ubuntu-jammy                    "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               heat_api_cfn
12dd70306101   quay.io/openstack.kolla/heat-api:2023.1-ubuntu-jammy                        "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               heat_api
2cddac565e6d   quay.io/openstack.kolla/neutron-metadata-agent:2023.1-ubuntu-jammy          "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               neutron_metadata_agent
ae50f1adecbe   quay.io/openstack.kolla/neutron-l3-agent:2023.1-ubuntu-jammy                "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               neutron_l3_agent
a14026c69184   quay.io/openstack.kolla/neutron-dhcp-agent:2023.1-ubuntu-jammy              "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               neutron_dhcp_agent
803595b10a5f   quay.io/openstack.kolla/neutron-openvswitch-agent:2023.1-ubuntu-jammy       "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               neutron_openvswitch_agent
bc1c0f6b268d   quay.io/openstack.kolla/neutron-server:2023.1-ubuntu-jammy                  "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               neutron_server
c0cd45da353d   quay.io/openstack.kolla/openvswitch-vswitchd:2023.1-ubuntu-jammy            "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               openvswitch_vswitchd
4d452111b3ba   quay.io/openstack.kolla/openvswitch-db-server:2023.1-ubuntu-jammy           "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               openvswitch_db
33cb79ec0a55   quay.io/openstack.kolla/nova-novncproxy:2023.1-ubuntu-jammy                 "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               nova_novncproxy
6ff84d56ce4c   quay.io/openstack.kolla/nova-conductor:2023.1-ubuntu-jammy                  "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               nova_conductor
a6d168bca4c9   quay.io/openstack.kolla/nova-api:2023.1-ubuntu-jammy                        "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               nova_api
e338faeeeee0   quay.io/openstack.kolla/nova-scheduler:2023.1-ubuntu-jammy                  "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               nova_scheduler
9988ebb9618a   quay.io/openstack.kolla/placement-api:2023.1-ubuntu-jammy                   "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               placement_api
9a6e5585150f   quay.io/openstack.kolla/cinder-scheduler:2023.1-ubuntu-jammy                "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               cinder_scheduler
77d6e4115252   quay.io/openstack.kolla/cinder-api:2023.1-ubuntu-jammy                      "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               cinder_api
e0590534d6c8   quay.io/openstack.kolla/glance-api:2023.1-ubuntu-jammy                      "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               glance_api
e08bb362be08   quay.io/openstack.kolla/keystone:2023.1-ubuntu-jammy                        "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               keystone
21534f07df6a   quay.io/openstack.kolla/keystone-fernet:2023.1-ubuntu-jammy                 "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               keystone_fernet
06726634c6eb   quay.io/openstack.kolla/keystone-ssh:2023.1-ubuntu-jammy                    "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               keystone_ssh
8fba633812b7   quay.io/openstack.kolla/etcd:2023.1-ubuntu-jammy                            "dumb-init --single-…"   7 hours ago   Up 2 hours                         etcd
bce5a05dcbba   quay.io/openstack.kolla/rabbitmq:2023.1-ubuntu-jammy                        "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               rabbitmq
f1ff17d3b41a   quay.io/openstack.kolla/prometheus-blackbox-exporter:2023.1-ubuntu-jammy    "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_blackbox_exporter
5d900639936d   quay.io/openstack.kolla/prometheus-openstack-exporter:2023.1-ubuntu-jammy   "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_openstack_exporter
0f5bd3e873b0   quay.io/openstack.kolla/prometheus-alertmanager:2023.1-ubuntu-jammy         "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_alertmanager
aa9ef3408ee0   quay.io/openstack.kolla/prometheus-cadvisor:2023.1-ubuntu-jammy             "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_cadvisor
85443d572966   quay.io/openstack.kolla/prometheus-memcached-exporter:2023.1-ubuntu-jammy   "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_memcached_exporter
77df867d2b80   quay.io/openstack.kolla/prometheus-haproxy-exporter:2023.1-ubuntu-jammy     "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_haproxy_exporter
f4e0e360bdad   quay.io/openstack.kolla/prometheus-mysqld-exporter:2023.1-ubuntu-jammy      "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_mysqld_exporter
1394ba79ee7c   quay.io/openstack.kolla/prometheus-node-exporter:2023.1-ubuntu-jammy        "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_node_exporter
43136ca82078   quay.io/openstack.kolla/prometheus-v2-server:2023.1-ubuntu-jammy            "dumb-init --single-…"   7 hours ago   Up 2 hours                         prometheus_server
87595629490c   quay.io/openstack.kolla/memcached:2023.1-ubuntu-jammy                       "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               memcached
184454c7d93b   quay.io/openstack.kolla/mariadb-clustercheck:2023.1-ubuntu-jammy            "dumb-init --single-…"   7 hours ago   Up 2 hours                         mariadb_clustercheck
27690e469dc2   quay.io/openstack.kolla/keepalived:2023.1-ubuntu-jammy                      "dumb-init --single-…"   7 hours ago   Up 2 hours                         keepalived
0d61e2f1edc6   quay.io/openstack.kolla/haproxy:2023.1-ubuntu-jammy                         "dumb-init --single-…"   7 hours ago   Up 2 hours (healthy)               haproxy
b9477be14773   quay.io/openstack.kolla/cron:2023.1-ubuntu-jammy                            "dumb-init --single-…"   7 hours ago   Up 2 hours                         cron
291f972ce16c   quay.io/openstack.kolla/kolla-toolbox:2023.1-ubuntu-jammy                   "dumb-init --single-…"   7 hours ago   Up 2 hours                         kolla_toolbox
9e3bffb135fa   quay.io/openstack.kolla/fluentd:2023.1-ubuntu-jammy                         "dumb-init --single-…"   7 hours ago   Up 2 hours                         fluentd
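
In a listing this long it is easy to miss a container that is not healthy; in the sample output above, ceilometer_central reports (unhealthy). The sketch below filters for such containers. containers_needing_attention is a hypothetical helper name, and it assumes name|status lines on stdin, as produced by docker ps --format '{{.Names}}|{{.Status}}';

```shell
# Reads "name|status" lines on stdin and prints the names of containers
# whose status reports "(unhealthy)" or "(health: starting)".
# Containers without a health check, or marked "(healthy)", are skipped.
containers_needing_attention() {
    awk -F '|' '$2 ~ /\(unhealthy\)|\(health: starting\)/ { print $1 }'
}
```

On the node, sudo docker ps --format '{{.Names}}|{{.Status}}' | containers_needing_attention would then print only the containers worth a closer look.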

Well, I am not sure there is an easy way to list the controller nodes :-). But from what I can see, they are listed under the internal availability zone.

Load the credentials;

source $HOME/kolla-ansible/bin/activate
source /etc/kolla/admin-openrc.sh

Let’s see;

openstack availability zone list --compute --long
+-----------+-------------+---------------+--------------+----------------+----------------------------------------+
| Zone Name | Zone Status | Zone Resource | Host Name    | Service Name   | Service Status                         |
+-----------+-------------+---------------+--------------+----------------+----------------------------------------+
| internal  | available   |               | controller02 | nova-scheduler | enabled :-) 2023-11-14T15:22:10.000000 |
| internal  | available   |               | controller02 | nova-conductor | enabled :-) 2023-11-14T15:22:14.000000 |
| internal  | available   |               | controller03 | nova-scheduler | enabled :-) 2023-11-14T15:22:11.000000 |
| internal  | available   |               | controller03 | nova-conductor | enabled :-) 2023-11-14T15:22:11.000000 |
| internal  | available   |               | controller01 | nova-scheduler | enabled :-) 2023-11-14T15:22:13.000000 |
| internal  | available   |               | controller01 | nova-conductor | enabled :-) 2023-11-14T15:22:12.000000 |
| nova      | available   |               | compute02    | nova-compute   | enabled :-) 2023-11-14T15:22:09.000000 |
| nova      | available   |               | compute01    | nova-compute   | enabled :-) 2023-11-14T15:22:07.000000 |
+-----------+-------------+---------------+--------------+----------------+----------------------------------------+
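
If you want just the host names, the same data can be reduced to a plain list. This is a rough sketch rather than an official way to enumerate controllers: hosts_in_zone is a hypothetical helper name, and it assumes "zone host" pairs on stdin, for example from openstack availability zone list --compute --long -f value -c "Zone Name" -c "Host Name";

```shell
# Reads "zone host" pairs on stdin and prints the unique hosts that belong
# to the given zone (default: internal), sorted by name.
hosts_in_zone() {
    zone="${1:-internal}"
    awk -v z="$zone" '$1 == z { print $2 }' | sort -u
}
```

With the output shown above, hosts_in_zone internal would print controller01, controller02 and controller03, one per line.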

If you also list the compute services, you will see some services distributed across the controller nodes!

openstack compute service list

Output;

+--------------------------------------+----------------+--------------+----------+---------+-------+----------------------------+
| ID                                   | Binary         | Host         | Zone     | Status  | State | Updated At                 |
+--------------------------------------+----------------+--------------+----------+---------+-------+----------------------------+
| b4d36484-cd27-4f5b-bf18-c93d7184890d | nova-scheduler | controller01 | internal | enabled | up    | 2023-11-12T15:42:45.000000 |
| 7be8d02c-8a76-48a0-bfa0-3edfc583bc9c | nova-scheduler | controller02 | internal | enabled | up    | 2023-11-12T15:42:44.000000 |
| 24253619-07d6-479f-975c-b2876c81d12f | nova-scheduler | controller03 | internal | enabled | up    | 2023-11-12T15:42:42.000000 |
| 5efddcac-fdf4-4ce3-8843-43e1784dc8d2 | nova-conductor | controller01 | internal | enabled | up    | 2023-11-12T15:42:48.000000 |
| 77ea6f87-5144-476b-9685-ebb6f1765b09 | nova-compute   | compute02    | nova     | enabled | up    | 2023-11-12T15:42:42.000000 |
| 546d891d-04f8-41f7-b2a6-714569e4bc52 | nova-compute   | compute01    | nova     | enabled | up    | 2023-11-12T15:42:46.000000 |
| 3d1fa2a8-0637-4c39-9941-d80a011a6be1 | nova-conductor | controller03 | internal | enabled | up    | 2023-11-12T15:42:42.000000 |
| 22df4a17-2d4b-4413-b81e-32856c5ba2c5 | nova-conductor | controller02 | internal | enabled | up    | 2023-11-12T15:42:42.000000 |
+--------------------------------------+----------------+--------------+----------+---------+-------+----------------------------+

As you can see, nova-scheduler and nova-conductor now run on all three controller nodes.
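
To turn that visual check into a number, you can count how many hosts report each service as up. A minimal sketch follows; up_count_per_binary is a hypothetical helper name, and it assumes "binary host state" triples on stdin, for example from openstack compute service list -f value -c Binary -c Host -c State;

```shell
# Reads "binary host state" triples on stdin and prints, for each binary,
# how many hosts report it as up, for example "nova-scheduler 3".
# Services in any other state are not counted.
up_count_per_binary() {
    awk '$3 == "up" { n[$1]++ } END { for (b in n) print b, n[b] }' | sort
}
```

After adding two controllers to a single-controller cluster, you would expect nova-scheduler and nova-conductor to each show a count of 3.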

Testing Virtual IP (VIP) High Availability on Controller Nodes

At this point, you might want to simulate a scenario in which the active node (the one holding the VIP) fails, and verify that the failover works as expected, with minimal or no disruption to the services.

Each of our three controller nodes has its own priority number defined.

What is a priority number in Keepalived? In Keepalived, the priority number determines which node in a High Availability (HA) setup is preferred for the master (active) role. The node with the highest priority is typically elected as the master and, if it fails, the node with the next highest priority takes over. The priority is an integer; the VRRP priority field ranges from 0 to 255, but 0 is reserved for a master that is giving up the role, so configured values start at 1.
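
The election rule can be illustrated with a tiny sketch. This is only a model of the rule described above, not how Keepalived is implemented: preferred_master is a hypothetical helper name, it assumes "host priority" pairs on stdin, and it does not handle ties;

```shell
# Reads "host priority" pairs on stdin and prints the host with the
# highest priority, i.e. the node preferred for the master role.
# Ties are not handled in this sketch.
preferred_master() {
    sort -k2,2nr | head -n 1 | awk '{ print $1 }'
}
```

Feeding it only the hosts that are still reachable shows who should take over when a node drops out.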

Let’s check our nodes' priorities;

Controller01;

sudo grep priority /etc/kolla/keepalived/keepalived.conf
    priority 1

Controller02;

sudo grep priority /etc/kolla/keepalived/keepalived.conf
    priority 2

Controller03;

sudo grep priority /etc/kolla/keepalived/keepalived.conf
    priority 3

From the command output above, controller03 is the most preferred node and, in our case, it currently has the VIP (192.168.200.254) assigned;

root@controller03:~# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
enp1s0           UP             
enp7s0           UP             
br-ex            UP             10.100.0.102/24 fe80::c866:d5ff:feba:3cf8/64 
br0              UP             192.168.200.205/24 192.168.200.254/32 fe80::f04e:aaff:fe5c:90f9/64 
vethext@vethint  UP             fe80::44ea:9aff:fe7f:b1e0/64 
vethint@vethext  UP             
ovs-system       DOWN           
br-int           DOWN           
br-tun           DOWN           
root@controller03:~#
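
Rather than eyeballing the interface list on every node, you can test for the VIP directly. The sketch below is an assumption-laden helper, not part of any tool used in this guide: vip_interface is a hypothetical name, and it expects the output of ip -br a on stdin;

```shell
# Reads `ip -br a` output on stdin and prints the interface(s) carrying
# the given VIP. Exit status is 0 when the VIP is present, 1 otherwise.
# Addresses are compared without their prefix length (the part after "/").
vip_interface() {
    vip="$1"
    awk -v vip="$vip" '
        {
            for (i = 3; i <= NF; i++) {
                split($i, a, "/")
                if (a[1] == vip) { print $1; found = 1 }
            }
        }
        END { exit(found ? 0 : 1) }'
}
```

For example, ip -br a | vip_interface 192.168.200.254 prints br0 on the node that currently holds the VIP and prints nothing (with a non-zero exit status) on the others.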

There are multiple ways to simulate the failover here. For example, you can pause the node from the virtualization host, take down its network interface, or do anything else that makes it temporarily unreachable. If you are doing this in a production environment, be cautious!

So, before I take down controller03, I will tail the logs on controller01 and controller02;

Controller02;

root@controller02:~# docker logs --tail 10 -f keepalived

Controller01;

kifarunix@controller01:~$ docker logs --tail 10 -f keepalived

Now, temporarily take down controller03.

Logs on controller02, which has a higher priority than controller01 (see the last line, Sun Nov 12 16:46:39 2023: (kolla_internal_vip_51) Entering MASTER STATE);

Sun Nov 12 12:58:27 2023:    Reset ARP config counter 0
Sun Nov 12 12:58:27 2023:    Original arp_ignore 0
Sun Nov 12 12:58:27 2023:    Original arp_filter 0
Sun Nov 12 12:58:27 2023:    Original promote_secondaries 1
Sun Nov 12 12:58:27 2023:    Reset promote_secondaries counter 0
Sun Nov 12 12:58:27 2023: Script `check_alive` now returning 1
Sun Nov 12 12:58:27 2023: VRRP_Script(check_alive) failed (exited with status 1)
Sun Nov 12 12:58:31 2023: Script `check_alive` now returning 0
Sun Nov 12 12:58:49 2023: VRRP_Script(check_alive) succeeded
Sun Nov 12 12:58:49 2023: (kolla_internal_vip_51) Entering BACKUP STATE



Sun Nov 12 16:46:39 2023: (kolla_internal_vip_51) Entering MASTER STATE

Nothing much on controller01.

Hence, controller02 should now have the VIP;

root@controller02:~# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
enp1s0           UP             
enp7s0           UP             
br-ex            UP             10.100.0.101/24 fe80::5cfc:faff:fe5e:9a93/64 
br0              UP             192.168.200.204/24 192.168.200.254/32 fe80::b04a:bbff:fece:1c6/64 
vethext@vethint  UP             fe80::d458:43ff:fe9c:23a2/64 
vethint@vethext  UP             
ovs-system       DOWN           
br-int           DOWN           
br-tun           DOWN           
vxlan_sys_4789   UNKNOWN        fe80::98c7:18ff:fe88:4f14/64 

And all services are working as expected for me! Hence, at this point, I believe it is safe to conclude that the three-node controller cluster is working as expected.

When controller03, which has the highest priority, is back up, it reclaims the master state.

Controller02 Keepalived logs;

Sun Nov 12 16:56:52 2023: (kolla_internal_vip_51) Master received advert from 192.168.200.205 with higher priority 3, ours 2
Sun Nov 12 16:56:52 2023: (kolla_internal_vip_51) Entering BACKUP STATE
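
When you repeat the test a few times, the Keepalived logs get noisy, so it can help to keep only the state changes. Here is a small sketch: vrrp_transitions is a hypothetical helper name, and it assumes log lines in the format shown above arrive on stdin;

```shell
# Reads Keepalived log lines on stdin and prints only the VRRP state
# changes as "<timestamp> <STATE>", for example
# "Sun Nov 12 16:46:39 2023 MASTER". All other lines are dropped.
vrrp_transitions() {
    sed -n -E 's/^(.*[0-9]{4}): \(.*\) Entering (MASTER|BACKUP|FAULT) STATE.*$/\1 \2/p'
}
```

You could use it as docker logs keepalived 2>&1 | vrrp_transitions on each controller and compare the timestamps to see how long the failover took.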

Heads up! Before you can conclude that your cluster is working as expected, perform thorough testing!

Define Preferred Master Controller Node

To define a preferred master controller node in a Keepalived setup, you typically need to adjust the priority configuration for the nodes. In Keepalived, the node with the highest priority is elected as the master.

Here is how to define the preferred master controller node:

Identify the Keepalived configuration file on each controller node. In our Kolla-Ansible deployment, it is /etc/kolla/keepalived/keepalived.conf.

Edit the Keepalived configuration file on the node you want to set as the preferred master. Look for the vrrp_instance section, and specifically, the priority parameter.

sudo vim /etc/kolla/keepalived/keepalived.conf

Change the priority value to a higher number for the preferred master node. Nodes with higher priority values are preferred for the master role. Here, we set the priority of controller02 to 20.

vrrp_script check_alive {
    script "/check_alive.sh"
    interval 2
    fall 2
    rise 10
}

vrrp_instance kolla_internal_vip_51 {
    state BACKUP
    nopreempt
    interface br0
    virtual_router_id 51
    priority 20
    advert_int 1
    virtual_ipaddress {
        192.168.200.254 dev br0
    }
    authentication {
        auth_type PASS
        auth_pass 9MC1BOSy764sBcZFHxzniiLwMrBz6iz3HiRcOLWv
    }
    track_script {
        check_alive
    }
}

Save the changes to the configuration file.
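
If you would rather script the change than edit the file by hand, something along these lines could work. It is only a sketch under a few assumptions: set_keepalived_priority is a hypothetical helper name, it rewrites every priority line in the file it is given, and it must run as a user allowed to write to that file;

```shell
# Hypothetical helper: rewrites every "priority <n>" line in the given
# Keepalived config to use the new value, keeping a .bak copy of the
# original next to it. Rejects values that are not plain numbers.
set_keepalived_priority() {
    conf="$1"
    new="$2"
    case "$new" in
        ''|*[!0-9]*) echo "priority must be a number" >&2; return 1 ;;
    esac
    cp "$conf" "$conf.bak" &&
    sed -E "s/^([[:space:]]*priority)[[:space:]]+[0-9]+[[:space:]]*$/\1 $new/" "$conf.bak" > "$conf"
}
```

From a root shell, set_keepalived_priority /etc/kolla/keepalived/keepalived.conf 20 would make the same change as the manual edit, and the .bak file lets you roll back if needed.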

Restart Keepalived on the node where you made the configuration changes:

sudo docker restart keepalived

Check the logs or status to confirm that Keepalived has successfully restarted.

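Because the instance is configured with nopreempt, a node with a higher priority does not take the VIP away from a healthy master on its own. On the node that currently holds the VIP, you can restart Keepalived to relieve it of the master role.

Check that the controller node with the highest priority has assumed the master role. Sample logs on my controller node;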

docker logs --tail 5 -f keepalived
Wed Nov 15 05:19:20 2023:    Original arp_filter 0
Wed Nov 15 05:19:20 2023:    Original promote_secondaries 1
Wed Nov 15 05:19:20 2023:    Reset promote_secondaries counter 0
Wed Nov 15 05:19:20 2023: VRRP_Script(check_alive) succeeded
Wed Nov 15 05:19:20 2023: (kolla_internal_vip_51) Entering BACKUP STATE
Wed Nov 15 05:19:39 2023: (kolla_internal_vip_51) Entering MASTER STATE

Check the Virtual IP (VIP) to ensure it is associated with the preferred master node:

root@controller02:~# ip a
...
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
    link/ether 52:54:00:0c:c4:bf brd ff:ff:ff:ff:ff:ff
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-ex state UP group default qlen 1000
    link/ether 52:54:00:5c:50:31 brd ff:ff:ff:ff:ff:ff
4: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:fc:fa:5e:9a:93 brd ff:ff:ff:ff:ff:ff
    inet 10.100.0.101/24 brd 10.100.0.255 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fe80::5cfc:faff:fe5e:9a93/64 scope link 
       valid_lft forever preferred_lft forever
5: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b2:4a:bb:ce:01:c6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.204/24 brd 192.168.200.255 scope global br0
       valid_lft forever preferred_lft forever
    inet 192.168.200.254/32 scope global br0
       valid_lft forever preferred_lft forever
    inet6 fe80::b04a:bbff:fece:1c6/64 scope link 
       valid_lft forever preferred_lft forever
...

Repeat these steps on each controller node where you want to set or adjust the priority, choosing values that reflect your desired order of preference. Keep in mind that Kolla-Ansible generates the files under /etc/kolla, so a later deploy or reconfigure run may overwrite manual edits.

And that concludes our guide on how to add controller nodes to an existing OpenStack cluster using Kolla-Ansible.

Kifarunix
Linux Certified Engineer, with a passion for open-source technology and a strong understanding of Linux systems. With experience in system administration, troubleshooting, and automation, I am skilled in maintaining and optimizing Linux infrastructure.
