{"id":23008,"date":"2024-07-02T00:43:06","date_gmt":"2024-07-01T21:43:06","guid":{"rendered":"https:\/\/kifarunix.com\/?p=23008"},"modified":"2024-07-12T22:24:51","modified_gmt":"2024-07-12T19:24:51","slug":"mastering-kubernetes-autoscaling-horizontal-vs-vertical-scaling","status":"publish","type":"post","link":"https:\/\/kifarunix.com\/mastering-kubernetes-autoscaling-horizontal-vs-vertical-scaling\/","title":{"rendered":"Mastering Kubernetes Autoscaling: Horizontal vs Vertical Scaling"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1063\" height=\"588\" src=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/07\/kubernetes-hpa-autoscaling.png\" alt=\"Mastering Kubernetes Autoscaling: Horizontal vs Vertical Scaling\" class=\"wp-image-23028\" title=\"\" srcset=\"https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/07\/kubernetes-hpa-autoscaling.png?v=1719869986 1063w, https:\/\/kifarunix.com\/wp-content\/uploads\/2024\/07\/kubernetes-hpa-autoscaling-768x425.png?v=1719869986 768w\" sizes=\"(max-width: 1063px) 100vw, 1063px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-left\">This tutorial serves as a guide to mastering <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/workloads\/autoscaling\/\" target=\"_blank\" rel=\"noreferrer noopener\">Kubernetes Autoscaling<\/a>. We&#8217;ll explore the two main techniques of Kubernetes scaling: <strong>horizontal<\/strong> scaling and <strong>vertical<\/strong> scaling. Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) are the tools that implement these concepts within Kubernetes, respectively. By understanding both horizontal and vertical scaling, along with HPA and VPA, you&#8217;ll be equipped to achieve peak performance and efficient resource management for your containerized applications. Ensuring optimal resource utilization is crucial for cloud applications built with Kubernetes. 
However, fluctuating workloads can quickly turn manual scaling into a tedious and inefficient task. This is where Kubernetes autoscaling comes in, offering a dynamic approach to resource management. Let&#8217;s get started.<\/p>\n\n\n\n<div class=\"wp-block-rank-math-toc-block aligncenter\" id=\"rank-math-toc\"><h2>Table of Contents<\/h2><nav><ul><li><a href=\"#kubernetes-autoscaling-hpa-vs-vpa\">Kubernetes Autoscaling: HPA vs VPA<\/a><ul><li><a href=\"#what-is-autoscaling\">What is Autoscaling?<\/a><\/li><li><a href=\"#benefits-of-autoscaling\">Benefits of Autoscaling<\/a><\/li><li><a href=\"#types-of-autoscaling-in-kubernetes\">Types of Autoscaling in Kubernetes<\/a><ul><li><a href=\"#horizontal-scaling-with-horizontal-pod-autoscaler-hpa\">Horizontal Scaling with Horizontal Pod Autoscaler (HPA)<\/a><\/li><li><a href=\"#vertical-scaling-with-vertical-pod-autoscaler-vpa\">Vertical Scaling with Vertical Pod Autoscaler (VPA)<\/a><\/li><\/ul><\/li><li><a href=\"#prerequisites-for-enabling-autoscaling-in-kubernetes\">Prerequisites for Enabling Autoscaling in Kubernetes<\/a><\/li><li><a href=\"#creating-horizontal-pod-autoscalers-hpa-in-kubernetes-cluster\">Creating Horizontal Pod Autoscalers (HPA) in Kubernetes Cluster<\/a><ul><li><a href=\"#install-kubernetes-metrics-server\">Install Kubernetes Metrics Server<\/a><\/li><li><a href=\"#deploy-an-application\">Deploy an Application<\/a><\/li><li><a href=\"#define-resource-requests-and-limits-in-pod-specifications\">Define Resource Requests and Limits in Pod Specifications<\/a><\/li><li><a href=\"#create-horizontal-pod-autoscaler-hpa-resource\">Create HorizontalPodAutoscaler (HPA) Resource<\/a><\/li><li><a href=\"#list-available-hp-as\">List Available HPAs<\/a><\/li><\/ul><\/li><li><a href=\"#simulating-events-to-trigger-horizontal-scaling\">Simulating Events to Trigger Horizontal Scaling<\/a><\/li><li><a href=\"#using-vpa-to-autoscale-kubernetes-containers\">Using VPA to Autoscale Kubernetes 
Containers<\/a><\/li><\/ul><\/li><li><a href=\"#conclusion\">Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"kubernetes-autoscaling-hpa-vs-vpa\">Kubernetes Autoscaling: HPA vs VPA<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"what-is-autoscaling\">What is Autoscaling?<\/h3>\n\n\n\n<p class=\"has-text-align-left\">Autoscaling is a technique for automatically adjusting the resources allocated to an application based on predefined metrics. A workload such as a Deployment or StatefulSet can be scaled up or down dynamically to meet fluctuating demand, ensuring efficient resource utilization and optimal application performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"benefits-of-autoscaling\">Benefits of Autoscaling<\/h3>\n\n\n\n<p class=\"has-text-align-left\">Autoscaling is needed in cloud-native environments like Kubernetes for several reasons:<\/p>\n\n\n\n<ol class=\"wp-block-list\" style=\"list-style-type:rich\">\n<li><strong>Dynamic Workload Demands:<\/strong> Applications often experience fluctuating traffic patterns throughout the day or across different seasons. Autoscaling allows resources to be added or removed automatically based on these variations, ensuring that the application can handle peak loads without being over-provisioned during quieter periods.<\/li>\n\n\n\n<li><strong>Optimal Resource Utilization:<\/strong> Without autoscaling, resources may be provisioned based on peak loads, leading to underutilization during off-peak times and increased costs. Autoscaling adjusts resources dynamically, optimizing resource utilization and reducing unnecessary expenses.<\/li>\n\n\n\n<li><strong>Improved Performance and Availability:<\/strong> By scaling resources in response to workload changes, autoscaling helps maintain consistent performance levels and availability. 
It ensures that applications remain responsive even under heavy traffic conditions, enhancing user experience and minimizing downtime.<\/li>\n\n\n\n<li><strong>Cost Efficiency:<\/strong> Autoscaling helps control infrastructure costs by scaling resources up only when needed and scaling down during periods of lower demand. This elasticity allows organizations to pay for resources based on actual usage rather than maintaining fixed-capacity infrastructure.<\/li>\n\n\n\n<li><strong>Operational Efficiency:<\/strong> Manual scaling processes can be time-consuming and prone to human error. Autoscaling automates the scaling process, reducing the burden on operations teams and enabling faster response to workload changes.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"types-of-autoscaling-in-kubernetes\">Types of Autoscaling in Kubernetes<\/h3>\n\n\n\n<p class=\"has-text-align-left\">There are several types of workload autoscaling in Kubernetes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Horizontal workload autoscaling<\/li>\n\n\n\n<li>Vertical workload autoscaling<\/li>\n\n\n\n<li>Cluster-size based autoscaling<\/li>\n\n\n\n<li>Event-driven autoscaling<\/li>\n\n\n\n<li>Schedule-based autoscaling<\/li>\n<\/ul>\n\n\n\n<p class=\"has-text-align-left\">In this guide, we focus on horizontal and vertical workload autoscaling.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"horizontal-scaling-with-horizontal-pod-autoscaler-hpa\">Horizontal Scaling with Horizontal Pod Autoscaler (HPA)<\/h4>\n\n\n\n<p class=\"has-text-align-left\">In Kubernetes, horizontal scaling refers to increasing or decreasing the number of replicas (instances) of a workload to meet demand, based on metrics such as CPU utilization, memory usage, or custom metrics (CPU utilization is the most common choice). 
This is achieved using the <strong>Horizontal Pod Autoscaler (HPA)<\/strong>, a Kubernetes API resource and controller that automatically adjusts the number of Pods in a ReplicationController, Deployment, ReplicaSet, or StatefulSet.<\/p>\n\n\n\n<p class=\"has-text-align-left\">How does horizontal scaling work in Kubernetes?<\/p>\n\n\n\n<p class=\"has-text-align-left\">First of all, you need to enable horizontal scaling by defining an HPA resource that specifies the target workload (Deployment, ReplicaSet, StatefulSet, etc.) and the metrics against which scaling decisions should be made (e.g., CPU utilization, memory usage).<\/p>\n\n\n\n<p class=\"has-text-align-left\">When the observed metric values rise above their targets, the HPA controller increases the number of replicas (Pods) to distribute the workload across more instances, maintaining performance and responsiveness. Conversely, when the workload decreases and the metrics fall below their targets, it scales down the number of replicas to conserve resources and reduce costs.<\/p>\n\n\n\n<p class=\"has-text-align-left\">The HPA API resource and controller are available out of the box in a standard Kubernetes cluster.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"vertical-scaling-with-vertical-pod-autoscaler-vpa\">Vertical Scaling with Vertical Pod Autoscaler (VPA)<\/h4>\n\n\n\n<p class=\"has-text-align-left\">In Kubernetes, vertical scaling refers to adjusting the CPU and memory requests and limits of the Pods managed by a Deployment, StatefulSet, or ReplicaSet so that they have sufficient resources to meet their workload requirements.<\/p>\n\n\n\n<p class=\"has-text-align-left\">Vertical scaling is achieved using the <strong>Vertical Pod Autoscaler<\/strong> (VPA). Kubernetes does not ship with the VPA API resource and controller by default. 
Hence, you need to install it (its Custom Resource Definitions and controller components) if you want to use it.<\/p>\n\n\n\n<p class=\"has-text-align-left\">So, how does vertical scaling work in Kubernetes?<\/p>\n\n\n\n<p class=\"has-text-align-left\">Unlike HPA, which adds or removes Pods to meet demand, VPA analyzes the resource utilization (CPU, memory) of individual Pods and recommends optimal requests and limits. Requests define the resources reserved for a Pod&#8217;s containers, while limits cap the maximum they can use. VPA can apply these recommendations automatically, typically by evicting Pods so that they are recreated with the updated values, ensuring Pods have the resources they need without being over-provisioned.<\/p>\n\n\n\n<p class=\"has-text-align-left\">Read more on Kubernetes vertical scaling in the guide <a href=\"https:\/\/kifarunix.com\/kubernetes-resource-optimization-with-vertical-pod-autoscaler-vpa\/\" target=\"_blank\" rel=\"noreferrer noopener\">Kubernetes Resource Optimization with Vertical Pod Autoscaler (VPA)<\/a>.<\/p>\n\n\n\n
<h3 class=\"wp-block-heading has-text-align-left\" id=\"prerequisites-for-enabling-autoscaling-in-kubernetes\">Prerequisites for Enabling Autoscaling in Kubernetes<\/h3>\n\n\n\n<p class=\"has-text-align-left\">Enabling autoscaling in Kubernetes typically requires the following prerequisites:<\/p>\n\n\n\n<ol class=\"wp-block-list\" style=\"list-style-type:rich\">\n<li><strong>Metrics Server:<\/strong> A functional Kubernetes Metrics Server is essential for gathering resource utilization metrics such as CPU and memory usage from cluster nodes and Pods. It provides the data that autoscaling controllers need to make scaling decisions.<br>Follow the link below to learn how to install and configure Metrics Server in Kubernetes.<br><a href=\"https:\/\/kifarunix.com\/install-kubernetes-metrics-server-on-a-kubernetes-cluster\/\" target=\"_blank\" rel=\"noreferrer noopener\">Install Kubernetes Metrics Server on a Kubernetes Cluster<\/a><\/li>\n\n\n\n<li><strong>Resource Metrics:<\/strong> Ensure that Pods or applications within the cluster are configured to expose relevant resource metrics. 
CPU and memory metrics are collected from the kubelets by Metrics Server without any changes to your applications; for custom metrics, this means defining metrics endpoints and using a metrics provider compatible with Kubernetes (e.g., Prometheus with a metrics adapter).<\/li>\n\n\n\n<li><strong>Autoscaler Installation:<\/strong> Depending on the type of autoscaling (horizontal or vertical), install the appropriate autoscaler components:\n<ul class=\"wp-block-list\">\n<li><strong>Horizontal Pod Autoscaler (HPA):<\/strong> Included by default in Kubernetes; the HPA controller runs as part of the kube-controller-manager.<\/li>\n\n\n\n<li><strong>Vertical Pod Autoscaler (VPA):<\/strong> Install the VPA Custom Resource Definitions (CRDs) and components (recommender, updater, and admission controller) if they are not already included in your Kubernetes distribution.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>A running Kubernetes cluster<\/strong>: for example, Minikube, GKE, EKS, etc.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-left\" id=\"creating-horizontal-pod-autoscalers-hpa-in-kubernetes-cluster\">Creating Horizontal Pod Autoscalers (HPA) in Kubernetes Cluster<\/h3>\n\n\n\n<p class=\"has-text-align-left\">To horizontally scale workload resources such as Deployments, StatefulSets, and ReplicaSets, you need to create a Horizontal Pod Autoscaler (HPA).<\/p>\n\n\n\n<h4 class=\"wp-block-heading has-text-align-left\" id=\"install-kubernetes-metrics-server\">Install Kubernetes Metrics Server<\/h4>\n\n\n\n<p class=\"has-text-align-left\">Before you proceed, ensure Metrics Server has been installed as outlined <a href=\"https:\/\/kifarunix.com\/install-kubernetes-metrics-server-on-a-kubernetes-cluster\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<p class=\"has-text-align-left\">To confirm that Metrics Server is up and running, let&#8217;s check the resource usage of the Pods in the default namespace.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl top pod<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Sample output;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>NAME             
                     CPU(cores)   MEMORY(bytes)   \nexample-deployment-77d66d9f6f-h9nwn   0m           2Mi             \nexample-deployment-77d66d9f6f-hhq4t   0m           2Mi             \nexample-deployment-77d66d9f6f-lnpm8   0m           2Mi             \nmysql-0                               9m           350Mi           \nmysql-1                               9m           350Mi\n<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">If Metrics Server is not installed yet, you will get an error such as <strong>error: Metrics API not available<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"deploy-an-application\">Deploy an Application<\/h4>\n\n\n\n<p class=\"has-text-align-left\">If you already have an application in place that you want to autoscale, then you are good to go.<\/p>\n\n\n\n<p class=\"has-text-align-left\">We already have a sample Nginx app running under a namespace called <strong>apps<\/strong>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get deployment -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        READY   UP-TO-DATE   AVAILABLE   AGE\nnginx-app   3\/3     3            3           13d\noperator    1\/1     1            1           11d\n<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">This is the deployment that we will try to scale horizontally. It currently has three replicas. 
So, first of all, let&#8217;s scale it down to a single Pod.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl scale deployment nginx-app --replicas=1 -n apps<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Confirm;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get deployment -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        READY   UP-TO-DATE   AVAILABLE   AGE\nnginx-app   1\/1     1            1           13d\noperator    1\/1     1            1           11d\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"define-resource-requests-and-limits-in-pod-specifications\">Define Resource Requests and Limits in Pod Specifications<\/h4>\n\n\n\n<p class=\"has-text-align-left\">When deploying your application, you need to specify CPU and memory <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/configuration\/manage-resources-containers\/\" target=\"_blank\" rel=\"noreferrer noopener\">requests and limits<\/a> in the Pod&#8217;s container specifications. This helps the Kubernetes scheduler make placement decisions based on available resources and ensures fair resource allocation among Pods. The HPA also depends on them: resource utilization targets are calculated as a percentage of the containers&#8217; requests.<\/p>\n\n\n\n<p class=\"has-text-align-left\"><strong>Requests<\/strong> specify the minimum resources (CPU and memory) reserved for a container, while <strong>limits<\/strong> define the maximum resources the container can consume.<\/p>\n\n\n\n<p class=\"has-text-align-left\">To check if the resource requests and limits are set for a deployment, you can use the <strong>kubectl get<\/strong> command. 
For example, to check the resource definitions for my <strong>nginx-app<\/strong> deployment in the <strong>apps<\/strong> namespace, run:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get deployment nginx-app -n apps -o yaml<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Under the deployment <strong>container template specifications,<\/strong> you should see the resource requests and limits for the Pods.<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>  template:\n    metadata:\n      creationTimestamp: null\n      labels:\n        app: nginx-app\n    spec:\n      containers:\n      - image: nginx:latest\n        imagePullPolicy: Always\n        name: nginx\n        <strong>resources:\n          limits:\n            cpu: 500m\n            memory: 512Mi\n          requests:\n            cpu: 100m\n            memory: 256Mi<\/strong>\n        terminationMessagePath: \/dev\/termination-log\n        terminationMessagePolicy: File\n        volumeMounts:\n        - mountPath: \/usr\/share\/nginx\/html\n          name: html-volume\n<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">If you don&#8217;t see them in your Deployment, then resource requests and limits are not defined.<\/p>\n\n\n\n<p class=\"has-text-align-left\">You can edit the deployment and set them;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl edit deployment nginx-app -n apps<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">And add your resource requests and limits as shown above.<\/p>\n\n\n\n<p class=\"has-text-align-left\">You can also use the <strong>kubectl set<\/strong> command to set the CPU and memory requests and limits.<\/p>\n\n\n\n<p class=\"has-text-align-left\">To set the requests;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl set resources deployment nginx-app -n apps --requests=cpu=100m<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl set resources deployment nginx-app -n apps --requests=memory=256Mi<\/code><\/pre>\n\n\n\n<p 
class=\"has-text-align-left\">Limits;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl set resources deployment nginx-app -n apps --limits=cpu=500m<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl set resources deployment nginx-app -n apps --limits=memory=512Mi<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Or you can combine the resources;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl set resources deployment nginx-app -n apps --requests=cpu=100m,memory=256Mi<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl set resources deployment nginx-app -n apps --limits=cpu=500m,memory=512Mi<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Without the resource requests defined, you HPA may show <strong>unknown<\/strong> for the target metric being checked.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get hpa -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        REFERENCE              TARGETS              MINPODS   MAXPODS   REPLICAS   AGE\nnginx-app   Deployment\/nginx-app   cpu: &lt;unknown&gt;\/80%   1         3         1          5m29s\n<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Similarly, you may see that <strong>the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu in container<\/strong> when you render the HPA;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl describe hpa nginx-app -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>Name:                                                  nginx-app\nNamespace:                                             apps\nLabels:                                                &lt;none&gt;\nAnnotations:                                           &lt;none&gt;\nCreationTimestamp:                                     Sun, 30 Jun 2024 13:17:00 +0000\nReference:                                             Deployment\/nginx-app\nMetrics:                                         
      ( current \/ target )\n  resource cpu on pods  (as a percentage of request):  &lt;unknown&gt; \/ 80%\nMin replicas:                                          1\nMax replicas:                                          3\nDeployment pods:                                       1 current \/ 0 desired\nConditions:\n  Type           Status  Reason                   Message\n  ----           ------  ------                   -------\n  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale\n  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu in container nginx of Pod nginx-app-6ff7b5d8f6-k5k48\nEvents:\n  Type     Reason                        Age                    From                       Message\n  ----     ------                        ----                   ----                       -------\n  Warning  FailedComputeMetricsReplicas  4m3s (x12 over 6m48s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: missing request for cpu in container nginx of Pod nginx-app-6ff7b5d8f6-k5k48\n  Warning  FailedGetResourceMetric       108s (x21 over 6m48s)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu in container nginx of Pod nginx-app-6ff7b5d8f6-k5k48\n<\/code><\/pre>\n\n\n\n<p>Once the resource requests are defined on the Deployment, the HPA can compute the utilization and should show the current and target metric values.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get hpa -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%   1         3         1          27s\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" 
id=\"create-horizontal-pod-autoscaler-hpa-resource\">Create HorizontalPodAutoscaler (HPA) Resource<\/h4>\n\n\n\n<p class=\"has-text-align-left\">You can create an HPA resource declaratively using manifests YAML file or imperatively via the <strong>kubectl autoscale<\/strong> command.<\/p>\n\n\n\n<div class=\"info-panel\">\n    <div class=\"info-panel-header\">Note\n    <\/div>\n    <div class=\"info-panel-content\">Each HPA is associated with a specific workload controller (Deployment, ReplicaSet, or StatefulSet), and there isn&#8217;t a direct way to create a generic HPA that scales multiple deployments simultaneously.\n    <\/div>\n<\/div>\n\n\n\n<p class=\"has-text-align-left\">Let&#8217;s see how to create an HPA imperatively via the <strong>kubectl autoscale<\/strong> command.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl autoscale --help<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Sample command to create an HPA that autoscales my deployment, <strong>nginx-apps<\/strong>, in the <strong>apps<\/strong> namespace;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl autoscale deployment nginx-app --cpu-percent=80 --min=1 --max=3 -n apps<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>--cpu-percent=80<\/code>: This flag sets the target CPU utilization for the deployment. The HPA will aim to maintain the average CPU usage across all pods in the deployment at or below 80%.<\/li>\n\n\n\n<li><code>--min=1<\/code>: This flag sets the minimum number of replicas allowed for the deployment. Even if the HPA determines scaling down is necessary based on CPU usage, it won&#8217;t go below 1 replica.<\/li>\n\n\n\n<li><code>--max=3<\/code>: This flag sets the maximum number of replicas allowed for the deployment. The HPA won&#8217;t scale the deployment beyond 3 replicas regardless of how high the CPU usage climbs.<\/li>\n<\/ul>\n\n\n\n<p>The HPA will take the name of the deployment if you don&#8217;t specify the name. 
Use <strong>--name &lt;name&gt;<\/strong> to set a custom HPA name.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl autoscale deployment nginx-app --name nginx-app --cpu-percent=80 --min=1 --max=3 -n apps<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">In essence:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Scaling Up:<\/strong> When the average CPU utilization across all Pods in the deployment rises above the 80% target, the HPA triggers a scale-up event. By default, there is no stabilization window for scaling up, so replicas are added as soon as the controller observes the higher utilization (it evaluates metrics every 15 seconds by default). The added replicas distribute the workload and bring the average CPU utilization back down towards the 80% target.<\/li>\n\n\n\n<li><strong>Scaling Down:<\/strong> The same <code class=\"\">--cpu-percent<\/code> target also drives scaling down, subject to several factors, including:\n<ul class=\"wp-block-list\">\n<li><strong>Target CPU Utilization:<\/strong> The HPA computes the desired number of replicas as ceil(current replicas * current utilization \/ target utilization). When the average CPU utilization drops far enough below 80% that fewer replicas would keep it under the target, the HPA recommends scaling down.<\/li>\n\n\n\n<li><strong>Minimum Replicas:<\/strong> The HPA won&#8217;t scale the deployment below the minimum number of replicas specified by the <code class=\"\">--min<\/code> flag (1 in this case).<\/li>\n\n\n\n<li><strong>Scale-Down Stabilization:<\/strong> Even if the CPU utilization falls below 80%, the HPA won&#8217;t immediately scale down. 
It uses a scale-down stabilization window (300 seconds by default) to make sure the drop in resource usage is not a temporary fluctuation.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p class=\"has-text-align-left\">To create an HPA declaratively, create a manifest file;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cat hpa.yaml<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>apiVersion: autoscaling\/v2\nkind: HorizontalPodAutoscaler\nmetadata:\n  name: nginx-app\n  namespace: apps\nspec:\n  metrics:\n  - type: Resource\n    resource:\n      name: cpu\n      target:\n        averageUtilization: 80\n        type: Utilization\n  minReplicas: 1\n  maxReplicas: 3\n  scaleTargetRef:\n    apiVersion: apps\/v1\n    kind: Deployment\n    name: nginx-app\n<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Then apply;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl apply -f hpa.yaml<\/code><\/pre>\n\n\n\n<p>If you want to use both CPU and memory usage to scale the Deployment Pods, edit the manifest file and define both resource metrics;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cat hpa.yaml<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>apiVersion: autoscaling\/v2\nkind: HorizontalPodAutoscaler\nmetadata:\n  name: nginx-app\n  namespace: apps\nspec:\n  metrics:\n  - type: Resource\n    resource:\n      name: cpu\n      target:\n        averageUtilization: 80\n        type: Utilization\n  - type: Resource\n    resource:\n      name: memory\n      target:\n        type: Utilization\n        averageUtilization: 80\n  minReplicas: 1\n  maxReplicas: 3\n  scaleTargetRef:\n    apiVersion: apps\/v1\n    kind: Deployment\n    name: nginx-app\n<\/code><\/pre>\n\n\n\n<p>Once you apply it, you should see both metrics on the HPA.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"list-available-hp-as\">List Available HPAs<\/h4>\n\n\n\n<p class=\"has-text-align-left\">Use the command below to list available HPAs. 
Omit <strong>-n &lt;namespace&gt;<\/strong> to list the HPAs in your current namespace, or use <strong>-A<\/strong> to list them across all namespaces.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get hpa -n apps<\/code><\/pre>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%   1         3         1          12m\n<\/code><\/pre>\n\n\n\n<p>If you had defined both CPU and memory metrics, the <strong>TARGETS<\/strong> column would show the current and target values for both.<\/p>\n\n\n\n<p class=\"has-text-align-left\">Get the details of an HPA:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl describe hpa &lt;name&gt; -n &lt;namespace&gt;<\/code><\/pre>\n\n\n\n<div class=\"info-panel\">\n    <div class=\"info-panel-header\">Note\n    <\/div>\n    <div class=\"info-panel-content\">When you use more than one resource metric, the HPA evaluates all of them to decide whether to scale the target Deployment up or down. It calculates a desired replica count for each metric and uses the largest one. 
For example, if CPU utilization is at 90% (target 80%) and memory utilization is at 81% (target 80%), the HPA will scale based on CPU utilization because it is further from its target and therefore yields the higher replica count.\n    <\/div>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"simulating-events-to-trigger-horizontal-scaling\">Simulating Events to Trigger Horizontal Scaling<\/h3>\n\n\n\n<p class=\"has-text-align-left\">Now, let&#8217;s see how we can stress our app to test the horizontal scaling.<\/p>\n\n\n\n<p class=\"has-text-align-left\">We will use <strong>ApacheBench<\/strong> (<strong>ab<\/strong>) to perform load testing on our web app.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>while true; do sleep 0.01; ab -n 500000 -c 1000 http:\/\/192.168.122.62:30833\/; done<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Each <strong>ab<\/strong> run sends 500,000 HTTP requests to <code>http:\/\/192.168.122.62:30833\/<\/code> with a concurrency level of 1000, and the <strong>while<\/strong> loop repeats the run indefinitely, pausing for 0.01 seconds between runs. Replace the URL with the address of your own application.<\/p>\n\n\n\n<p class=\"has-text-align-left\">Before you execute the command, run the command below in a separate terminal (for example, on the control plane node) to watch how the HPA responds to the load on the web app;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get hpa -n apps -w<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Before we executed the load testing command, this was the status of the HPA;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%   1         3         1          27m\n<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">Now, execute the load testing command and see what happens.<\/p>\n\n\n\n<p class=\"has-text-align-left\">Sample output of the load testing command;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>This is ApacheBench, Version 2.3 <$Revision: 1903618 $>\nCopyright 1996 Adam Twiss, Zeus 
Technology Ltd, http:\/\/www.zeustech.net\/\nLicensed to The Apache Software Foundation, http:\/\/www.apache.org\/\n\nBenchmarking 192.168.122.62 (be patient)\nCompleted 50000 requests\nCompleted 100000 requests\nCompleted 150000 requests\nCompleted 200000 requests\nCompleted 250000 requests\nCompleted 300000 requests\nCompleted 350000 requests\nCompleted 400000 requests\nCompleted 450000 requests\napr_pollset_poll: The timeout specified has expired (70007)\nTotal of 499996 requests completed\nThis is ApacheBench, Version 2.3 <$Revision: 1903618 $>\nCopyright 1996 Adam Twiss, Zeus Technology Ltd, http:\/\/www.zeustech.net\/\nLicensed to The Apache Software Foundation, http:\/\/www.apache.org\/\n\nBenchmarking 192.168.122.62 (be patient)\nCompleted 50000 requests\nCompleted 100000 requests\nCompleted 150000 requests\nCompleted 200000 requests\nCompleted 250000 requests\nCompleted 300000 requests\nCompleted 350000 requests\nCompleted 400000 requests\nCompleted 450000 requests\napr_pollset_poll: The timeout specified has expired (70007)\nTotal of 499995 requests completed\nThis is ApacheBench, Version 2.3 <$Revision: 1903618 $>\nCopyright 1996 Adam Twiss, Zeus Technology Ltd, http:\/\/www.zeustech.net\/\nLicensed to The Apache Software Foundation, http:\/\/www.apache.org\/\n\nBenchmarking 192.168.122.62 (be patient)\nCompleted 50000 requests\nCompleted 100000 requests\nCompleted 150000 requests\nCompleted 200000 requests\nCompleted 250000 requests\nCompleted 300000 requests\nCompleted 350000 requests\nCompleted 400000 requests\nCompleted 450000 requests\napr_pollset_poll: The timeout specified has expired (70007)\nTotal of 499997 requests completed\nThis is ApacheBench, Version 2.3 <$Revision: 1903618 $>\nCopyright 1996 Adam Twiss, Zeus Technology Ltd, http:\/\/www.zeustech.net\/\nLicensed to The Apache Software Foundation, http:\/\/www.apache.org\/\n\nBenchmarking 192.168.122.62 (be patient)\nCompleted 50000 requests\nCompleted 100000 requests\nCompleted 150000 
requests\n<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">HPA response to load;<\/p>\n\n\n\n<pre class=\"scroll-box\"><code>NAME        REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%   1         3         1          27m\nnginx-app   Deployment\/nginx-app   cpu: 124%\/80%   1         3         1          44m\nnginx-app   Deployment\/nginx-app   cpu: 64%\/80%    1         3         2          44m\nnginx-app   Deployment\/nginx-app   cpu: 135%\/80%   1         3         2          44m\nnginx-app   Deployment\/nginx-app   cpu: 179%\/80%   1         3         3          44m\nnginx-app   Deployment\/nginx-app   cpu: 160%\/80%   1         3         3          45m\nnginx-app   Deployment\/nginx-app   cpu: 159%\/80%   1         3         3          45m\nnginx-app   Deployment\/nginx-app   cpu: 157%\/80%   1         3         3          45m\nnginx-app   Deployment\/nginx-app   cpu: 30%\/80%    1         3         3          45m\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%     1         3         3          46m\nnginx-app   Deployment\/nginx-app   cpu: 70%\/80%    1         3         3          46m\nnginx-app   Deployment\/nginx-app   cpu: 156%\/80%   1         3         3          46m\nnginx-app   Deployment\/nginx-app   cpu: 154%\/80%   1         3         3          46m\nnginx-app   Deployment\/nginx-app   cpu: 153%\/80%   1         3         3          47m\nnginx-app   Deployment\/nginx-app   cpu: 118%\/80%   1         3         3          47m\nnginx-app   Deployment\/nginx-app   cpu: 31%\/80%    1         3         3          47m\nnginx-app   Deployment\/nginx-app   cpu: 14%\/80%    1         3         3          47m\nnginx-app   Deployment\/nginx-app   cpu: 98%\/80%    1         3         3          48m\nnginx-app   Deployment\/nginx-app   cpu: 158%\/80%   1         3         3          48m\nnginx-app   Deployment\/nginx-app   cpu: 160%\/80%   1         3         3          
48m\nnginx-app   Deployment\/nginx-app   cpu: 156%\/80%   1         3         3          48m\nnginx-app   Deployment\/nginx-app   cpu: 103%\/80%   1         3         3          49m\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%     1         3         3          49m\nnginx-app   Deployment\/nginx-app   cpu: 96%\/80%    1         3         3          49m\nnginx-app   Deployment\/nginx-app   cpu: 151%\/80%   1         3         3          49m\nnginx-app   Deployment\/nginx-app   cpu: 100%\/80%   1         3         3          50m\nnginx-app   Deployment\/nginx-app   cpu: 1%\/80%     1         3         3          50m\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%     1         3         3          50m\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%     1         3         3          55m\nnginx-app   Deployment\/nginx-app   cpu: 0%\/80%     1         3         1          55m\n<\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left\">As a summary, the Horizontal Pod Autoscaler (HPA) for the <code>nginx-app<\/code> deployment responded dynamically to varying levels of CPU utilization over a period of time. Initially, with low CPU usage, it maintained 1 replica. As CPU demand increased, the HPA gradually scaled up, reaching a peak of 3 replicas when CPU usage spiked to 179%. 
Throughout this process, the HPA adjusted the replica count against the configured target of 80% CPU utilization. CPU utilization stayed well above the target after the third replica came up. Even so, the HPA did not add more pods because the deployment had already reached the configured maximum of 3 replicas. This demonstrates the HPA&#8217;s ability to automatically scale the deployment out to handle increased workload and back in once demand drops, ensuring efficient resource utilization.<\/p>\n\n\n\n<p class=\"has-text-align-left\">Note that the brief dips in CPU utilization during the test (for example, 0% at 46m and 49m) did not trigger a scale-down. Once the <code>ab<\/code> load tester was stopped, CPU utilization dropped to 0% around 50m, but the HPA only scaled the deployment back down to 1 replica at 55m. This delay of about five minutes matches the HPA&#8217;s default scale-down stabilization window of 300 seconds, which prevents the replica count from flapping when load fluctuates.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"using-vpa-to-autoscale-kubernetes-containers\">Using VPA to Autoscale Kubernetes Containers<\/h3>\n\n\n\n<p class=\"has-text-align-left\">In our next guide, we will learn how to install and use VPA to manage the resource requests of containers in a Kubernetes cluster.<\/p>\n\n\n\n<p class=\"has-text-align-left\"><a href=\"https:\/\/kifarunix.com\/kubernetes-resource-optimization-with-vertical-pod-autoscaler-vpa\/\" target=\"_blank\" rel=\"noreferrer noopener\">Kubernetes Resource Optimization with Vertical Pod Autoscaler (VPA)<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"conclusion\">Conclusion<\/h2>\n\n\n\n<p class=\"has-text-align-left\">In this blog post, you have learnt about the two major types of autoscaling in Kubernetes, horizontal and vertical scaling, along with the API resources that implement them, HPA and VPA. As the simulation above shows, Kubernetes provides robust autoscaling capabilities through the Horizontal Pod Autoscaler (HPA). HPA scales the number of pod replicas based on metrics such as CPU utilization or custom metrics, ensuring applications can handle varying workloads efficiently.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial serves as a guide to mastering Kubernetes Autoscaling. We&#8217;ll explore the two main techniques of Kubernetes scaling: horizontal scaling and vertical scaling. 
Horizontal<\/p>\n","protected":false},"author":10,"featured_media":23028,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"rank_math_lock_modified_date":false,"footnotes":""},"categories":[1668,1076,121],"tags":[7553,7555,7552,7554],"class_list":["post-23008","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kubernetes","category-containers","category-howtos","tag-horizontal-scaling","tag-hpa-vs-vpa","tag-kubernetes-hpa-vs-vpa","tag-vertical-scale","generate-columns","tablet-grid-50","mobile-grid-100","grid-parent","grid-50","resize-featured-image"],"_links":{"self":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/23008"}],"collection":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/comments?post=23008"}],"version-history":[{"count":31,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/23008\/revisions"}],"predecessor-version":[{"id":23182,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/posts\/23008\/revisions\/23182"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/media\/23028"}],"wp:attachment":[{"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/media?parent=23008"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/categories?post=23008"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kifarunix.com\/wp-json\/wp\/v2\/tags?post=23008"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}