How to display Kubernetes request and limit in Grafana / Prometheus properly
source link: https://gist.github.com/max-rocket-internet/6a05ee757b6587668a1de8a5c177728b
CPU: percentage of limit
A lot of people land here when trying to find out how to calculate CPU usage correctly in Prometheus, myself included! So I'll post what I eventually ended up using, as I think it's still a little difficult to tie together all the snippets of info here and elsewhere.
This is specific to k8s and containers that have CPU limits set.
To show CPU usage as a percentage of the limit given to the container, this is the Prometheus query we used to create nice graphs in Grafana:
sum(rate(container_cpu_usage_seconds_total{name!~".*prometheus.*", image!="", container_name!="POD"}[5m])) by (pod_name, container_name) /
sum(container_spec_cpu_quota{name!~".*prometheus.*", image!="", container_name!="POD"}/container_spec_cpu_period{name!~".*prometheus.*", image!="", container_name!="POD"}) by (pod_name, container_name)
It returns a number between 0 and 1, so either format the left Y axis as percent (0.0-1.0) or multiply by 100 in the query to get a percentage.
Note that we added some filtering here to get rid of noise: name!~".*prometheus.*", image!="", container_name!="POD". The name!~".*prometheus.*" filter is just because we aren't interested in the CPU usage of all the Prometheus exporters running in our k8s cluster. The image!="" filter drops the aggregate cgroup series that cAdvisor emits, and container_name!="POD" excludes each pod's pause container.
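The quota/period division works because Kubernetes maps a container's CPU limit onto the kernel's CFS scheduler: container_spec_cpu_quota is the CPU time allowed per container_spec_cpu_period (both in microseconds), so quota divided by period is the limit in cores. A minimal sketch of the arithmetic, with made-up sample values:

```python
# Sketch of the arithmetic behind the query above, using made-up sample values.
cpu_quota_us = 50_000    # container_spec_cpu_quota for a 0.5-core limit
cpu_period_us = 100_000  # container_spec_cpu_period (CFS default, 100ms)
limit_cores = cpu_quota_us / cpu_period_us  # 0.5 cores

# rate(container_cpu_usage_seconds_total[5m]) is CPU seconds used per second,
# i.e. average cores consumed over the window (made-up value here).
usage_cores = 0.35

fraction_of_limit = usage_cores / limit_cores
print(fraction_of_limit)               # 0.7 -> format the Y axis as percent (0.0-1.0)
print(round(fraction_of_limit * 100))  # 70 -> or multiply by 100 in the query
```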
CPU: show as cores with request/limit lines
Some applications have a small request and a large limit (to save money) or use an HPA, so showing usage only as a percentage of the limit is not always useful.
So what we do now is display the CPU usage in cores and then add a horizontal line for each of the request and limit. This shows more information and also shows the usage in the same metric that is used in k8s: CPU cores.
CPU usage
Legend: {{container_name}} in {{pod_name}}
Query: sum(rate(container_cpu_usage_seconds_total{pod_name=~"deployment-name-[^-]*-[^-]*$", image!="", container_name!="POD"}[5m])) by (pod_name, container_name)
CPU limit
Legend: limit
Query: sum(kube_pod_container_resource_limits_cpu_cores{pod=~"deployment-name-[^-]*-[^-]*$"}) by (pod)
CPU request
Legend: request
Query: sum(kube_pod_container_resource_requests_cpu_cores{pod=~"deployment-name-[^-]*-[^-]*$"}) by (pod)
You will need to edit these 3 queries for your environment so that only pods from a single deployment are returned, e.g. replace deployment-name.
The pod request/limit metrics come from kube-state-metrics.
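The pod_name pattern works because a Deployment's pods are named <deployment>-<replicaset-hash>-<pod-hash>, so the regex appends two dash-free segments to the deployment name. A small hypothetical helper to illustrate this (deployment_pod_regex is not part of the dashboard, just a sketch); note that Prometheus fully anchors =~ matchers, which re.fullmatch mirrors here:

```python
import re

def deployment_pod_regex(deployment: str) -> str:
    """Build the pod-name pattern used in the queries above.

    A Deployment's pods are named <deployment>-<replicaset-hash>-<pod-hash>,
    so we append two dash-free segments and anchor the end.
    """
    return f"{deployment}-[^-]*-[^-]*$"

pattern = deployment_pod_regex("deployment-name")
print(pattern)  # deployment-name-[^-]*-[^-]*$

# Prometheus anchors =~ regexes at both ends, like re.fullmatch:
print(bool(re.fullmatch(pattern, "deployment-name-7d4b9c6f9-x2k8p")))        # True
print(bool(re.fullmatch(pattern, "deployment-name-extra-7d4b9c6f9-x2k8p")))  # False
```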
We then add 2 series overrides in Grafana to hide the request and limit series in the tooltip and legend.
Queries to show memory and CPU as percentage of both request and limit
Note: in Kubernetes 1.16+ the cAdvisor labels pod_name and container_name on the container_* metrics were renamed to pod and container (matching kube-state-metrics); adjust the label names in these queries to match your cluster's version.
Percentage of CPU request:
round(
100 *
sum(
rate(container_cpu_usage_seconds_total{image!="", container!="POD"}[5m])
) by (pod, container, namespace, slave)
/
sum(
kube_pod_container_resource_requests_cpu_cores
) by (pod, container, namespace, slave)
)
Percentage of CPU limit:
round(
100 *
sum(
rate(container_cpu_usage_seconds_total{image!="", container!="POD"}[5m])
) by (pod, container, namespace, slave)
/
sum(
container_spec_cpu_quota{image!="", container!="POD"} / container_spec_cpu_period{image!="", container!="POD"}
) by (pod, container, namespace, slave)
)
Percentage of memory request:
round(
100 *
sum(container_memory_working_set_bytes{image!="", container!="POD"}) by (container, pod, namespace, slave)
/
sum(kube_pod_container_resource_requests_memory_bytes > 0) by (container, pod, namespace, slave)
)
Percentage of memory limit:
round(
100 *
sum(container_memory_working_set_bytes{image!="", container!="POD"}) by (container, pod, namespace, slave)
/
sum(container_spec_memory_limit_bytes{image!="", container!="POD"} > 0) by (container, pod, namespace, slave)
)
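As a sanity check on the units, the memory queries are a plain ratio of byte values; the > 0 filter drops containers with no limit or request set (container_spec_memory_limit_bytes is 0 for unlimited containers, which would otherwise divide to +Inf). A sketch with made-up numbers:

```python
# Sketch of the memory-percentage arithmetic with made-up sample values.
working_set_bytes = 150 * 1024**2  # container_memory_working_set_bytes: 150 MiB
request_bytes = 256 * 1024**2      # kube_pod_container_resource_requests_memory_bytes: 256 MiB
limit_bytes = 512 * 1024**2        # container_spec_memory_limit_bytes: 512 MiB

# The "> 0" filter in the queries exists because limit_bytes is 0 when no
# limit is set, and dividing by 0 would produce +Inf series in the graph.
pct_of_request = round(100 * working_set_bytes / request_bytes)
pct_of_limit = round(100 * working_set_bytes / limit_bytes)
print(pct_of_request)  # 59
print(pct_of_limit)    # 29
```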