StatefulSets in Kubernetes: Everything You Need to Know

StatefulSets in Kubernetes: Everything You Need to Know

In this blog post, you will learn about statefulsets in Kubernetes and everything you need to know: the definition and purpose of StatefulSets, their importance in handling stateful applications, and how they differ from other Kubernetes objects like Deployments. By the end, you’ll have a thorough understanding of how to create, manage, and scale StatefulSets.

Understanding StatefulSets in Kubernetes

What are StatefulSets?

A StatefulSet in Kubernetes is a workload API object specifically designed for managing stateful applications that require persistent storage and stable network identities. They ensure order, identity and persistence of your stateful application. Imagine a database in a Kubernetes cluster. Each database Pod represents an instance, but the data itself needs to persist across restarts or Pod scaling. This is where StatefulSets come into play.

Core Concepts in Kubernetes StatefulSets

There are a number of Kubernetes concepts that are used in StatefulSets that you should be familiar with. These include:

  • Pod: As you already know, a Pod is the smallest deployable unit that represents a set of one or more containers running together on a Kubernetes cluster. Pods are the fundamental building blocks of Kubernetes applications and serve as the basic unit of execution.
  • Cluster DNS: The name of a StatefulSet object must be a valid DNS label. Cluster DNS facilitates service discovery by assigning DNS names to Services and Pods.
  • Headless Service: In Kubernetes, regular Services use clusterIP to provide a unified virtual IP for load balancing requests across all Pods, suited for stateless applications. In contrast, headless services, used by StatefulSets, do not use clusterIP (clusterIP: None) and provide direct access to each Pod’s IP address or DNS name via internal DNS records served by the cluster’s DNS service. This setup allows stateful applications to maintain stable and direct connections to specific Pods, crucial for tasks like data replication and configuration where individual Pod identity is essential for maintaining consistency and stability across the cluster.
  • PersistentVolumes (PVs): Kubernetes objects that represent persistent storage resources available in the cluster (e.g., local storage, cloud storage). They exist independently of Pods and persist data even when Pods are deleted or rescheduled.
  • PersistentVolume Claims (PVCs): PVCs act as requests for storage by Pods, specifying the access mode and storage requirements. StatefulSets manage PVCs to ensure Pods have persistent storage across restarts and scaling operations.

Stateful vs Stateless Applications

So, what exactly is a stateful application and how does it differ with stateless application? Before diving further into StatefulSets, it’s essential to grasp the fundamental differences between stateful and stateless applications:

Stateful Applications

These are applications that maintain data and state between sessions. They require persistent storage and unique identities to preserve data integrity. Examples include databases (MySQL, PostgreSQL), message queues (Kafka), and caching solutions (Redis) etc.

Managing stateful applications introduces complexities related to data persistence, reliable network communication, and lifecycle management, which StatefulSets address effectively in Kubernetes.

Stateless Applications

In contrast, stateless applications do not store session data locally. They can be easily scaled horizontally by adding or removing instances without concerns about data persistence or state management. Think of a web servers serving static content, that is an example of a stateless application.

In summary;

FeatureStateful ApplicationsStateless Applications
State ManagementMaintains state and data between sessionsDoes not maintain state locally
Data PersistenceRequires persistent storage for data integrityData is ephemeral and does not persist
ScalabilityScaling requires careful management to maintain data consistencyEasily scale horizontally by adding or removing instances
Network IdentityTypically have stable network identities (hostnames)Network identity is not crucial and may change
Restart BehaviorPods may retain state across restarts and reschedulingEach instance is independent; no state retained
Use CasesDatabases (MySQL, PostgreSQL), distributed systems (Kafka, Cassandra)Web servers, load balancers, API gateways
StatefulSet ManagementManaged by StatefulSets for ordered deployment, scaling, and identityManaged by Deployments for simple scaling and restarts

StatefulSets vs. Deployments

How does StatefulSets compare to Kubernetes Deployments? While both are Kubernetes controllers used to manage Pods, they serve different purposes:

FeatureStatefulSetDeployment
Target ApplicationsUsed with applications that require stable identities and persistent storage for data such databases, message queues, caching solutionsIdeal for stateless applications that are easily scalable and don’t rely on persistent data such as web servers, API gateways, load balancers
Pod IdentityStatefulSets ensure each Pod has a unique and persistent identifier (0,1,2..) that remains constant across restarts or rescheduling. This is crucial for stateful applications that depend on specific Pod order or communicationDeployment treat Pods as interchangeable units. That is, they do not maintain local state or session data that needs to be preserved between restarts or rescheduling. They also receive non-ordinal, random identifiers (Pod-abcdef1234-xyz) that may change across restarts making them simpler to manage
Data PersistenceRequires persistent storage (integrates with PVs). This guarantees data survives Pod restarts or scaling operationsData is ephemeral (not persisted)
Deployment and ScalingWhen you scale a StatefulSet (adding or removing Pods), it follows a predictable order. This is essential for applications with dependencies between Pods or specific initialization sequences.Pods are independent of each other. Deployments does not impose any specific order for these operations, hence, Pods can start or stop in any sequence based on cluster conditions and scheduling
UpdatesFacilitates rolling updates, but due to the ordered nature of StatefulSets, updates might require more planning to ensure data consistency across Pods.Rolling updates: New Pods with the updated configuration are launched gradually, while old Pods are terminated.

Creating and Managing StatefulSets

Which method to Create StatefulSet. Declarative or Imperative?

You can create a Kubernetes StatefulSet using declarative or imperative method.

  • Declarative method allows you to define the desired state of your StatefulSet in a YAML manifest file which specifies the configuration details like container image, replicas, storage needs, and labels. You then use the kubectl apply command to apply the configurations defined in the YAML file and create the StatefulSet.
  • Imperative method allows you directly issue kubectl create command to configure and create the StatefulSet. In the command you can specify parameters like image, replicas, storage, and labels.

While both methods work fine, the disadvantages of the imperative method is that it is:

  • Error Prone: Manual configuration through commands increases the risk of typos or syntax errors.
  • Difficult to Reproduce: It’s challenging to recreate the exact configuration later or deploy it consistently across environments without a manifest file.
  • Limited Functionality: Imperative commands might not offer the same level of flexibility and detail as a well-defined YAML manifest.

Configure Persistent Storage Provisioner

Stateful applications often require persistent storage to store data that survives Pod restarts or rescheduling. For that reason, you need to configure a PersistentVolume StorageClass with a specific provisioner, which defines how storage volumes will be provisioned in your Kubernetes cluster. This StorageClass provisioner takes care of dynamically allocating storage based on your Persistent Volume Claim (PVC) requests within the StatefulSet. Some popular provisioners include:

  • Local (local storage path on a node for testing purposes only)
  • Ceph RBD (Ceph block storage): provisioner for block storage using Ceph.
  • AWS Elastic Block Store (EBS) (for AWS deployments)
  • Azure Managed Disks (for Azure deployments)
  • Google Persistent Disk (GPD): Provisioner for Google Cloud Persistent Disks in Google Cloud Platform (GCP).
  • OpenEBS: OpenEBS provisioner for block storage using iSCSI.
  • NFS Client Provisioner: Provisioner for NFS volumes provided by an NFS server.
  • e.t.c

In this guide, we are running a Kubernetes cluster on an on-premise local server and for that reason, we will use local storage provisioner.

Info
Using local storage provisioner is recommended for development or testing only. For production environments, consider using Persistent Volumes (PVs) with storage classes for data persistence and better handling of pod failures or scaling. According to Kubernetes, local volumes are subject to the availability of the underlying node and are not suitable for all applications. If a node becomes unhealthy, then the local volume becomes inaccessible by the pod. The pod using this volume is unable to run. Applications using local volumes must be able to tolerate this reduced availability, as well as potential data loss, depending on the durability characteristics of the underlying disk.

Check to see if there are any existing StorageClasses in your Kubernetes cluster:

kubectl get storageclass -A

If no suitable StorageClass exists in your respective namespace, create one based on your storage solution or requirements.

Example StorageClass for local volumes:

cat local-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

Where:

  • metadata.name: Specifies the name of the StorageClass (local-storage in this example).
  • provisioner: Set to kubernetes.io/no-provisioner since local volumes are not provisioned by a storage provider.
  • volumeBindingMode: Determines when a PersistentVolume (PV) is bound. WaitForFirstConsumer enables binding of the PV when the first PVC using this StorageClass is created.

Apply the YAML definition to create the StorageClass in your cluster:

kubectl apply -f <filename>.yaml

For example;

kubectl apply -f local-storageclass.yaml

Confirm;

kubectl get storageclass
NAME            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-storage   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  26s

In Kubernetes versions 1.30 and above, the default reclaimPolicy for StorageClasses created without explicitly specifying the reclaim policy type, is set to “Delete.” This means by default, Kubernetes will attempt to reclaim the volume when the last Persistent Volume Claim (PVC) referencing it is deleted i.e, when a PVC is deleted, the corresponding PV is automatically deleted. However, since we are using the kubernetes.io/no-provisioner which implies no dynamic storege provisioning, there won’t be any PVs automatically created for this StorageClass and as such, the reclaimPolicy setting in the StorageClass becomes less relevant because there are no PVs that will be dynamically provisioned. Therefore, when you create a PV manually, you can define the reclaim policy to apply to your PVCs.

Prepare Persistent Volume (PV) for each Node

Next, you need to manually create Persistent Volume (PV) that matches the StorageClass used above for the local volume.

If you have a single worker node, it is enough to create a PV for that single node only. However, if you have multiple worker nodes and want to schedule pods on each node, then you need to create a PV for each node.

By creating a PV for each node, you define that a particular PV is bound to the specific worker node in the cluster. This ensures that when a pod is scheduled on a node, it will use the PV attached to that node. This node affinity prevents data loss because if a pod is rescheduled due to node failure or maintenance, Kubernetes will attempt to reschedule it on the same node where its PV exists.

Note that, you need to define the nodeAffinity when using local volumes. Reason being:

  • Local volumes are tied to specific directories on worker nodes. Without nodeAffinity, a pod requesting a local volume could potentially be scheduled to any node in the cluster, even if the directory containing the data doesn’t exist there. This can lead to pod scheduling failures or unexpected behavior.
  • Enforcing nodeAffinity for local volumes ensures explicit scheduling of pods to the intended nodes. This promotes clarity and consistency in your deployments, especially when dealing with multiple worker nodes.

We have three worker nodes;

kubectl get nodes
NAME        STATUS   ROLES           AGE   VERSION
master-01   Ready    control-plane   12d   v1.30.2
master-02   Ready    control-plane   12d   v1.30.2
master-03   Ready    control-plane   12d   v1.30.2
worker-01   Ready    <none>      33h   v1.30.2
worker-02   Ready    <none>      33h   v1.30.2
worker-03   Ready    <none>      33h   v1.30.2

Here is our sample PV manifest file for each of our three worker nodes.

cat mysql-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv-worker-01
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/k8s/mysql/data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-01
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv-worker-02
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/k8s/mysql/data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-02
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv-worker-03
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/k8s/mysql/data/
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-03

Where:

  • apiVersion: Specifies the Kubernetes API version (v1).
  • kind: Specifies the resource type (PersistentVolume).
  • metadata: Provides metadata for the PV, including its name (mysql-pv-NODE-NAME).
  • spec: Specifies the specifications for the PV:
    • capacity: Defines the storage capacity of the PV (10Gi).
    • volumeMode: Indicates the volume mode (Filesystem). Can also be set to Block is using raw block devices
    • accessModes: Specifies the access mode (ReadWriteOnce), allowing read and write access by a single node concurrently.
  • persistentVolumeReclaimPolicy: Sets the reclaim policy for the PV to Retain, meaning that when the associated PersistentVolumeClaim (PVC) is deleted, the PV is not automatically deleted.
  • storageClassName: Associates the PV with a StorageClass named local-storage created before.
  • local: Specifies that this PV uses local storage on the Kubernetes nodes.
    • path: Defines the local path on the node (/mnt/k8s/mysql/data) where the PV will be mounted. It must already exist on the node when you deploy your application that uses it.
  • nodeAffinity: Defines node affinity rules, ensuring the PV is bound only to nodes that match specific criteria:
    • required: Specifies that the PV should be available on nodes that match the following conditions.
      • nodeSelectorTerms: Specifies a list of node selectors.
        • matchExpressions: Defines how nodes are selected based on label keys and values.
          • key: Specifies the label key (kubernetes.io/hostname).
          • operator: Defines the operator for matching (In, meaning the label value must match one of the specified values).
          • values: Lists the specific worker node hostnames (worker-01, worker-02, worker-03) where the PV should be available.

Now, create the PV;;

kubectl apply -f mysql-pv.yaml

Confirm;

kubectl get pv
NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM   STORAGECLASS    VOLUMEATTRIBUTESCLASS   REASON   AGE
mysql-pv-worker-01     10Gi       RWO            Retain           Available             local-storage   <unset>                          5s
mysql-pv-worker-02     10Gi       RWO            Retain           Available             local-storage   <unset>                          5s
mysql-pv-worker-03     10Gi       RWO            Retain           Available             local-storage   <unset>                          5s

Create a PersistentVolumeClaim (PVC)

Now, you would create a PersistentVolumeClaim that your application pods in your StatefulSet will use to request storage.

However, to automate the PVC creation and ensure consistent storage configurations across all pods managed by the StatefulSet, we will utilize volumeClaimTemplates. volumeClaimTemplates allow you to define PVC specifications directly within the StatefulSet configuration. Kubernetes uses these templates to automatically create PVCs when the StatefulSet is deployed or updated.

Creating a StatefulSet in Kubernetes Cluster

Now that we have the StorageClass, PV and a PVC, you can now create a StatefulSet that provides a persistent storage for each pod.

So, we will combine two resources in one manifest file; the service to expose MySQL app, the MySQL app StatefulSet.

This is our sample manifest file;

cat mysql-app.yaml

apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: db
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
          name: db
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-data
        persistentVolumeClaim:
          claimName: mysql-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage

This sample YAML configuration, consists of a Kubernetes Service (mysql) and a StatefulSet (mysql) that deploys MySQL Pods with persistent storage.

In summary:

  • This StatefulSet, named mysql, orchestrates a MySQL deployment with three replicas, each labeled app: mysql. Pods use the mysql:latest Docker image and expose MySQL on port 3306. Persistent storage of 10Gi is requested per pod using mysql-persistent-storage VolumeClaimTemplates with ReadWriteOnce access mode, ensuring reliable data persistence and scalability in Kubernetes environments.

Ensure the local storage path already exists on ALL worker nodes:

sudo mkdir -p /mnt/k8s/mysql/data

Then apply the StatefulSet configuration:

kubectl apply -f mysql-app.yaml

Listing StatefulSets

You can list statefulsets;

kubectl get statefulset

This shows the statefulsets in the default namespace. To check specific namespace, add the option [–namespace <namespace name>|-n <namespace-name>]

Sample output;

NAME    READY   AGE
mysql   3/3     30s

To print a detailed description;

kubectl describe statefulset mysql
Name:               mysql
Namespace:          default
CreationTimestamp:  Sat, 29 Jun 2024 07:33:20 +0000
Selector:           app=mysql
Labels:             
Annotations:        
Replicas:           3 desired | 3 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=mysql
  Containers:
   mysql:
    Image:      mysql:8.0
    Port:       3306/TCP
    Host Port:  0/TCP
    Environment:
      MYSQL_ROOT_PASSWORD:  password
    Mounts:
      /var/lib/mysql from mysql-data (rw)
  Volumes:         
  Node-Selectors:  
  Tolerations:     
Volume Claims:
  Name:          mysql-data
  StorageClass:  local-storage
  Labels:        
  Annotations:   
  Capacity:      10Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age    From                    Message
  ----    ------            ----   ----                    -------
  Normal  SuccessfulCreate  3m27s  statefulset-controller  create Claim mysql-data-mysql-0 Pod mysql-0 in StatefulSet mysql success
  Normal  SuccessfulCreate  3m27s  statefulset-controller  create Pod mysql-0 in StatefulSet mysql successful
  Normal  SuccessfulCreate  3m25s  statefulset-controller  create Claim mysql-data-mysql-1 Pod mysql-1 in StatefulSet mysql success
  Normal  SuccessfulCreate  3m25s  statefulset-controller  create Pod mysql-1 in StatefulSet mysql successful
  Normal  SuccessfulCreate  3m22s  statefulset-controller  create Claim mysql-data-mysql-2 Pod mysql-2 in StatefulSet mysql success
  Normal  SuccessfulCreate  3m22s  statefulset-controller  create Pod mysql-2 in StatefulSet mysql successful

You can also get PVCs;

kubectl get pvc
NAME                   STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
mysql-data-mysql-0     Bound    mysql-pv-worker-03   10Gi       RWO            local-storage   <unset>                 5m1s
mysql-data-mysql-1     Bound    mysql-pv-worker-02   10Gi       RWO            local-storage   <unset>                 4m59s
mysql-data-mysql-2     Bound    mysql-pv-worker-01   10Gi       RWO            local-storage   <unset>                 4m56s
kubectl get pv
NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS    VOLUMEATTRIBUTESCLASS   REASON   AGE
mysql-pv-worker-01   10Gi       RWO            Retain           Bound    default/mysql-data-mysql-2   local-storage   <unset>                          7m43s
mysql-pv-worker-02   10Gi       RWO            Retain           Bound    default/mysql-data-mysql-1   local-storage   <unset>                          7m43s
mysql-pv-worker-03   10Gi       RWO            Retain           Bound    default/mysql-data-mysql-0   local-storage   <unset>                          7m43s

And the Pods associated with StatefulSet;

kubectl get pod
NAME                                  READY   STATUS    RESTARTS   AGE
mysql-0                               1/1     Running   0          11m
mysql-1                               1/1     Running   0          11m
mysql-2                               1/1     Running   0          11m

Scaling StatefulSets

It is also possible to scale StatefulSets. As you can see in the output, we have three statefulsets and all of them in READY state;

kubectl get statefulset
NAME    READY   AGE
mysql   3/3     15m

For example;

kubectl scale statefulset mysql --replicas=6

However, note that as per the setup of that we have used here where we have PV for each worker node, then you have to also create additional PVs to meet your requirements. Otherwise, that is how you can do stateful set scaling, especially where dynamic provisioning is supported.

You can also scale down;

kubectl scale statefulset mysql --replicas=2

Deleting StatefulSets

When deleting a StatefulSet, ensure that PVCs and the data they hold are managed correctly to avoid data loss.

Otherwise, you can delete using the manifest file;

kubectl delete -f mysql-app.yaml

Assuming that the statefulset was defined in the manifest file called mysql-app.yaml.

Similarly, use the name of the statefulset. For example, to delete just the statefulset without touching the pods or data stored in the PersistentVolumes;

kubectl delete statefulset mysql --cascade=orphan

Just omit –cascade=orphan to delete everything associated with statefulset.

And that brings us to the end of our tutorial on understanding statefulsets in Kubernetes

Conclusion

StatefulSets provide stable identities, persistent storage, and controlled scaling for stateful applications, making them essential for running databases, distributed systems, and other stateful workloads in Kubernetes environments.

Read more on;

Kubernetes StatefulSets Documentation

Kubernetes Volumes

SUPPORT US VIA A VIRTUAL CUP OF COFFEE

We're passionate about sharing our knowledge and experiences with you through our blog. If you appreciate our efforts, consider buying us a virtual coffee. Your support keeps us motivated and enables us to continually improve, ensuring that we can provide you with the best content possible. Thank you for being a coffee-fueled champion of our work!

Photo of author
Kifarunix
Linux Certified Engineer, with a passion for open-source technology and a strong understanding of Linux systems. With experience in system administration, troubleshooting, and automation, I am skilled in maintaining and optimizing Linux infrastructure.

Leave a Comment