How to Use Kubernetes Horizontal Pod Autoscaler?
source link: https://www.geeksforgeeks.org/how-to-use-kubernetes-horizontal-pod-autoscaler/
Pre-requisites: Introduction to Kubernetes
The process of automatically scaling resources in and out is called autoscaling. Kubernetes has three different kinds of autoscaler: the Cluster Autoscaler, the Horizontal Pod Autoscaler, and the Vertical Pod Autoscaler. In this article, we're going to look at the Horizontal Pod Autoscaler.
A running workload can be scaled manually by changing the replicas field in its manifest file. Manual scaling is fine when you can anticipate load spikes in advance, or when load changes gradually over long periods of time, but requiring manual intervention to handle sudden, unpredictable traffic increases isn't ideal.
To solve this problem, Kubernetes has a resource called the Horizontal Pod Autoscaler (HPA) that can monitor pods and scale them automatically as soon as it detects an increase in CPU or memory usage (based on a defined metric). Horizontal pod autoscaling is the process of automatically scaling the number of pod replicas managed by a controller, based on the usage of the defined metric, to match demand.
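Concretely, the HPA controller uses the formula documented in the Kubernetes autoscaling docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick shell sketch of that arithmetic (the numbers here are made up for illustration):

```shell
# HPA scaling formula: desiredReplicas = ceil(currentReplicas * current / target)
current_replicas=2
current_cpu=90   # observed average CPU utilization across pods (%)
target_cpu=30    # the target utilization configured on the HPA
desired=$(awk -v r="$current_replicas" -v c="$current_cpu" -v t="$target_cpu" \
  'BEGIN { d = r * c / t; if (d > int(d)) d = int(d) + 1; print d }')
echo "$desired"   # 2 * 90 / 30 = 6 replicas
```

So a deployment at triple its CPU target triples its replica count (capped by maxReplicas).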
Setup a Cluster
The following steps set up a local cluster with the metrics pipeline that autoscaling requires.
1. Start your cluster
$ minikube start
2. Enable metrics-server addon to collect metrics of resources
$ minikube addons enable metrics-server
3. Edit the metrics-server deployment and add the --kubelet-insecure-tls argument (this skips TLS certificate verification against the kubelet, which is acceptable on a local minikube cluster)
$ kubectl -n kube-system edit deploy metrics-server
containers:
- args:
  - --cert-dir=/tmp
  - --secure-port=8448
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-insecure-tls
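Once the metrics-server pod restarts with the new argument, it's worth confirming the metrics pipeline works before relying on it (it can take a minute or so for the first metrics to appear):

```shell
# Both commands query the metrics.k8s.io API served by metrics-server;
# if they return usage numbers instead of an error, the HPA will be able
# to read pod metrics too.
kubectl top nodes
kubectl top pods --all-namespaces
```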
4. Let's create a deployment for demo purposes. I chose Nginx as our application, with 1 replica; this deployment requests 100 millicores of CPU per pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
  labels:
    app: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: nginx
        image: nginx:1.23-alpine
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 200m
            memory: 20Mi
          requests:
            cpu: 100m
            memory: 10Mi
$ kubectl create -f nginx-deploy.yaml
Scaling Based on CPU Usage
One of the most important metrics for defining autoscaling is CPU usage. Say the CPU usage of the processes running inside your pods reaches 100%: they can't keep up with demand anymore. To solve this, you can either increase the amount of CPU a pod may use (vertical scaling) or increase the number of pods (horizontal scaling) so that the average CPU usage comes down. Enough talking; let's create a Horizontal Pod Autoscaler resource based on CPU usage and see it in action.
1. Create a Horizontal Pod Autoscaler resource for our deployment.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: webserver-cpu-hpa
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webserver
  targetCPUUtilizationPercentage: 30
Let's understand what these attributes mean:
- maxReplicas – Maximum number of replicas the autoscaler can scale out to
- minReplicas – Minimum number of replicas the autoscaler can scale in to
- scaleTargetRef – The target resource to act upon, in our case the webserver deployment
- targetCPUUtilizationPercentage – Target CPU utilization; the autoscaler adjusts the number of pods so that each pod utilizes about 30% of its requested CPU
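The same autoscaler can also be created imperatively, without writing a manifest, using kubectl autoscale; it produces an equivalent autoscaling/v1 HPA object:

```shell
# Imperative equivalent of the webserver-cpu-hpa manifest
# (the generated HPA is named after the deployment, i.e. "webserver")
kubectl autoscale deployment webserver --min=1 --max=5 --cpu-percent=30
```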
Now create the resource
$ kubectl create -f nginx-deploy-cpu-hpa.yaml
Let’s put some load on our deployment so that we can see scaling in action
2. First of all, expose our application as a NodePort service so that we have an endpoint to load test against.
$ kubectl expose deploy webserver \
    --type=NodePort --port=8080 \
    --target-port=80
3. Now comes the interesting part: load testing. For load testing, I'm using the siege tool. Here, 250 concurrent users simulate load for 2 minutes; you can adjust these numbers as needed.
$ siege -c 250 -t 2m http://127.0.0.1:58421
(replace http://127.0.0.1:58421 with the NodePort service address)
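If siege isn't installed, a throwaway busybox pod can generate comparable load from inside the cluster (a common pattern from the official Kubernetes HPA walkthrough; this assumes the service created above is named webserver):

```shell
# Run an interactive pod that requests the service in a tight loop;
# interrupt it with Ctrl+C and the pod is cleaned up automatically (--rm)
kubectl run -i --tty load-generator --rm --image=busybox:1.28 \
  --restart=Never -- /bin/sh -c \
  "while sleep 0.01; do wget -q -O- http://webserver:8080; done"
```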
Open another terminal and watch the resources; you will see an increase in the number of pods. Keep an eye on the pod count, because as soon as the HPA detects that CPU usage exceeds the target, it will create more pods to handle the load.
$ watch -n 1 kubectl get po,hpa
Since the load crosses the target, the HPA increases the number of replicas from 1 to 2. Once the load test ends and CPU usage drops back toward 0, the HPA scales the deployment back down to the minimum replica count (1) defined in the HPA manifest file; scale-down is deliberately gradual, so this may take a few minutes.
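To see why the HPA made each decision, you can inspect the resource's events:

```shell
# Shows current metric values vs. the target, plus recent scaling
# events and the reasons for them
kubectl describe hpa webserver-cpu-hpa
```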
Scaling Based on Memory Usage
This time we'll configure an HPA based on memory usage.
1. Creating a Horizontal Pod Autoscaler resource based on memory usage
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webserver-mem-hpa
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webserver
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 2Mi
Here we set averageValue to just 2Mi because the Nginx deployment is very lightweight; with a realistic threshold, its memory usage would never climb high enough for us to see memory-based scaling in action.
$ kubectl create -f nginx-deploy-mem-hpa.yaml
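Before generating load, it's worth checking that the new HPA can read the memory metric at all; the TARGETS column should show a value rather than &lt;unknown&gt;:

```shell
# If TARGETS shows <unknown>, metrics-server isn't reporting
# pod memory usage yet
kubectl get hpa webserver-mem-hpa
```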
Again, run the load test and watch the resources in another terminal. Once average memory usage exceeds the target, the HPA spins up new pod replicas.